Publications
2023
- EnCoDe: Enhancing Compressed Deep Learning Models Through Feature Distillation and Informative Sample Selection. Rebati Gaire, Sepehr Tabrizchi, and Arman Roohi. In 2023 International Conference on Machine Learning and Applications (ICMLA), 2023.
This paper presents EnCoDe, a novel technique that merges active learning, model compression, and knowledge distillation to optimize deep learning models. The method tackles issues such as generalization loss, resource intensity, and data redundancy that usually impede compressed models’ performance. It actively integrates valuable samples for labeling, thus enhancing the student model’s performance while economizing on labeled data and computational resources. EnCoDe’s utility is empirically validated on the SVHN and CIFAR-10 datasets, demonstrating improved model compactness, enhanced generalization, reduced computational complexity, and lessened labeling effort. In our evaluations, applied to compressed versions of the VGG11 and AlexNet models, EnCoDe consistently outperforms baselines even when trained with only 60% of the total training samples. Thus, it establishes an effective framework for enhancing the accuracy and generalization capabilities of compressed models, which is especially beneficial in situations with limited resources and scarce labeled data.
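The abstract above combines distillation with active sample selection. Below is a minimal sketch of that interplay, assuming a PyTorch setup; the entropy-based acquisition rule, function names, and hyperparameters are illustrative assumptions, not the paper's exact method.

```python
# Sketch of an EnCoDe-style combination: a compressed student distills from
# a teacher while an acquisition step picks informative unlabeled samples.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft-label KL distillation."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def select_informative(student, unlabeled_loader, budget, device="cpu"):
    """Rank unlabeled samples by predictive entropy; return top indices.
    Assumes the loader yields (image, _) pairs and does not shuffle."""
    student.eval()
    scores = []
    with torch.no_grad():
        for idx, (x, _) in enumerate(unlabeled_loader):
            probs = F.softmax(student(x.to(device)), dim=1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            scores.extend((e.item(), idx * unlabeled_loader.batch_size + i)
                          for i, e in enumerate(entropy))
    scores.sort(reverse=True)
    return [i for _, i in scores[:budget]]
```

A full loop would alternate the two steps: distill on the current labeled pool, then move the top-scoring unlabeled samples into it for annotation.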
- Histogram of oriented gradients meet deep learning: A novel multi-task deep network for 2D surgical image semantic segmentation. Binod Bhattarai, Ronast Subedi, Rebati Raman Gaire, and 2 more authors. Medical Image Analysis, 2023.
We present a novel deep multi-task learning method for medical image segmentation. Existing multi-task methods demand ground-truth annotations for both the primary and auxiliary tasks. In contrast, we propose to generate the pseudo-labels of an auxiliary task in an unsupervised manner. To generate the pseudo-labels, we leverage Histograms of Oriented Gradients (HOGs), one of the most widely used and powerful hand-crafted features for detection. Together with the ground-truth semantic segmentation masks for the primary task and the pseudo-labels for the auxiliary task, we learn the parameters of the deep network to minimize the losses of both tasks jointly. We apply our method to two powerful and widely used semantic segmentation networks, UNet and U2Net, training them in a multi-task setup. To validate our hypothesis, we performed experiments on two different medical image segmentation datasets. From extensive quantitative and qualitative results, we observe that our method consistently improves performance compared to the counterpart method. Moreover, our method is the winner of the FetReg EndoVis sub-challenge on semantic segmentation organised in conjunction with MICCAI 2021. Code and implementation details are available at https://github.com/thetna/medical_image_segmentation.
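As a concrete illustration of the pseudo-label idea, here is a minimal sketch assuming scikit-image's `hog` and a PyTorch joint loss; using the HOG visualization image as a dense regression target and the 0.5 auxiliary weight are assumptions, not the paper's exact formulation.

```python
# Sketch: derive a dense HOG map per image as an unsupervised auxiliary
# target, then optimize segmentation and HOG regression jointly.
import numpy as np
import torch
import torch.nn.functional as F
from skimage.feature import hog

def hog_pseudo_label(image_hw):
    """image_hw: 2D grayscale array. Returns a per-pixel HOG rendering."""
    _, hog_map = hog(
        image_hw,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        visualize=True,  # second return value is a dense HOG image
    )
    return torch.from_numpy(hog_map.astype(np.float32))

def joint_loss(seg_logits, seg_mask, hog_pred, hog_target, aux_weight=0.5):
    """Primary segmentation loss plus weighted auxiliary HOG regression."""
    primary = F.cross_entropy(seg_logits, seg_mask)   # (N,C,H,W) vs (N,H,W)
    auxiliary = F.mse_loss(hog_pred, hog_target)
    return primary + aux_weight * auxiliary
```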
- Why is the winner the best? Matthias Eisenmann, Annika Reinke, Vivienn Weru, and 8 more authors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.
- Placental vessel segmentation and registration in fetoscopy: literature review and MICCAI FetReg2021 challenge findings. Sophia Bano, Alessandro Casella, Francisco Vasconcelos, and 8 more authors. Medical Image Analysis, 2023.
Fetoscopy laser photocoagulation is a widely adopted procedure for treating Twin-to-Twin Transfusion Syndrome (TTTS). The procedure involves photocoagulation of pathological anastomoses to restore a physiological blood exchange among twins. The procedure is particularly challenging for the surgeon due to the limited field of view, poor manoeuvrability of the fetoscope, poor visibility due to amniotic fluid turbidity, and variability in illumination. These challenges may lead to increased surgery time and incomplete ablation of pathological anastomoses, resulting in persistent TTTS. Computer-assisted intervention (CAI) can provide TTTS surgeons with decision support and context awareness by identifying key structures in the scene and expanding the fetoscopic field of view through video mosaicking. Research in this domain has been hampered by the lack of high-quality data to design, develop and test CAI algorithms. Through the Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge, which was organized as part of the MICCAI2021 Endoscopic Vision (EndoVis) challenge, we released the first large-scale multi-center TTTS dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms, with a focus on creating drift-free mosaics from long-duration fetoscopy videos. For this challenge, we released a dataset of 2060 images, pixel-annotated for vessel, tool, fetus, and background classes, from 18 in-vivo TTTS fetoscopy procedures, along with 18 short video clips of an average length of 411 frames, for developing placental scene segmentation and frame registration techniques for mosaicking. Seven teams participated in this challenge, and their model performance was assessed on an unseen test dataset of 658 pixel-annotated images from 6 fetoscopic procedures and 6 short clips. For the segmentation task, the baseline was the top performer overall (aggregated mIoU of 0.6763) and was the best on the vessel class (mIoU of 0.5817), while team RREB was the best on the tool (mIoU of 0.6335) and fetus (mIoU of 0.5178) classes. For the registration task, the baseline performed better overall than team SANO, with an overall mean 5-frame SSIM of 0.9348. Qualitatively, it was observed that team SANO performed better in planar scenarios, while the baseline was better in non-planar scenarios. The detailed analysis showed that no single team performed best on all 6 test fetoscopic videos. The challenge provided an opportunity to create generalized solutions for fetoscopic scene understanding and mosaicking. In this paper, we present the findings of the FetReg2021 challenge, alongside a detailed literature review of CAI in TTTS fetoscopy. Through this challenge, its analysis, and the release of multi-center fetoscopic data, we provide a benchmark for future research in this field.
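For readers unfamiliar with the ranking metric quoted above, a minimal per-class mIoU sketch follows; averaging only over classes present in either label map is an illustrative assumption, not necessarily the challenge's exact aggregation.

```python
# Sketch: mean intersection-over-union across segmentation classes.
import numpy as np

def mean_iou(pred, gt, num_classes=4):
    """pred, gt: integer label maps of identical shape (e.g. vessel, tool,
    fetus, background encoded as 0..3)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```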
- A client-server deep federated learning for cross-domain surgical image segmentation. Ronast Subedi, Rebati Raman Gaire, Sharib Ali, and 3 more authors. In MICCAI Workshop on Data Engineering in Medical Imaging, 2023.
This paper presents a solution to the cross-domain adaptation problem for 2D surgical image segmentation, explicitly considering the privacy protection of distributed datasets belonging to different centers. Deep learning architectures in medical image analysis necessitate extensive training data for better generalization. However, obtaining sufficient diagnostic and surgical data is still challenging, mainly due to the inherent cost of data curation and the need for expert data annotation. Moreover, increased privacy and legal-compliance concerns can make data sharing across clinical sites or regions difficult. Another ubiquitous challenge medical datasets face is the inevitable domain shift among data collected at different centers. To this end, we propose a client-server deep federated architecture for cross-domain adaptation. A server hosts a set of immutable parameters common to both the source and target domains. Each client holds its own domain-specific parameters and makes requests to the server both while learning those parameters and at inference time. We evaluate our framework on two benchmark datasets, demonstrating applicability in computer-assisted interventions for endoscopic polyp segmentation and diagnostic skin lesion detection and analysis. Our extensive quantitative and qualitative experiments demonstrate the superiority of the proposed method compared to competitive baseline and state-of-the-art methods. We will make the code available upon the paper’s acceptance.
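A minimal sketch of the parameter split described above, assuming a PyTorch implementation: the server's shared parameters are frozen and common to all domains, while each client trains only its own head. The module shapes and the location of the split are illustrative assumptions, not the paper's exact design.

```python
# Sketch: immutable shared backbone (server) + trainable per-domain head (client).
import torch
import torch.nn as nn

class SharedServerBackbone(nn.Module):
    """Parameters hosted by the server, common to all domains."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        for p in self.parameters():
            p.requires_grad_(False)  # clients never update these

    def forward(self, x):
        return self.features(x)

class DomainClient(nn.Module):
    """Client-side, domain-specific segmentation head."""
    def __init__(self, server: SharedServerBackbone, num_classes: int):
        super().__init__()
        self.server = server  # stands in for a remote request to the server
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        with torch.no_grad():  # server features are fixed
            feats = self.server(x)
        return self.head(feats)

# Each client optimizes only its own domain-specific parameters:
server = SharedServerBackbone()
client = DomainClient(server, num_classes=2)
optimizer = torch.optim.Adam(client.head.parameters(), lr=1e-3)
```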
- SenTer: A reconfigurable processing-in-sensor architecture enabling efficient ternary MLP. Sepehr Tabrizchi, Rebati Gaire, Shaahin Angizi, and 1 more author. In Proceedings of the Great Lakes Symposium on VLSI 2023, 2023.
Recently, Intelligent IoT (IIoT), including various sensors, has gained significant attention for its capability to sense, decide, and act by leveraging artificial neural networks (ANNs). Nevertheless, to achieve acceptable accuracy and high performance in visual systems, a power- and delay-efficient architecture is required. In this paper, we propose an ultra-low-power processing-in-sensor architecture, named SenTer, realizing low-precision ternary multi-layer perceptron (MLP) networks that can operate in detection and classification modes. Moreover, SenTer supports two activation functions based on user needs and the desired accuracy-energy trade-off. SenTer performs all the required computations for the MLP’s first layer in the analog domain and then submits the results to a co-processor. It therefore significantly reduces the overhead of analog buffers, data conversion, and transmission power consumption by using only one ADC. Additionally, our simulation results demonstrate acceptable accuracy on various datasets compared to full-precision models.
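As a software-level illustration of the ternary MLP that SenTer accelerates, here is a minimal NumPy sketch; the threshold rule (0.7 times the mean absolute weight, a common ternary-weight-network heuristic) is an assumption, not SenTer's circuit-level method.

```python
# Sketch: weights quantized to {-1, 0, +1}, so the first layer reduces to
# signed accumulation with no multiplications, matching what an analog
# in-sensor stage can compute cheaply.
import numpy as np

def ternarize(weights, delta_scale=0.7):
    """Map float weights to {-1, 0, +1} using a magnitude threshold."""
    delta = delta_scale * np.abs(weights).mean()
    ternary = np.zeros_like(weights, dtype=np.int8)
    ternary[weights > delta] = 1
    ternary[weights < -delta] = -1
    return ternary

def ternary_layer(x, ternary_weights, bias=None):
    """First-layer compute: additions/subtractions only, then activation."""
    out = x @ ternary_weights.astype(x.dtype)
    if bias is not None:
        out = out + bias
    return np.maximum(out, 0.0)  # one of two selectable activations

rng = np.random.default_rng(0)
w = rng.standard_normal((784, 128))
x = rng.standard_normal((1, 784))
y = ternary_layer(x, ternarize(w))
```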
2022
- GAN-Based Two-Step Pipeline for Real-World Image Super-Resolution. Rebati Raman Gaire, Ronast Subedi, Ashim Sharma, and 3 more authors. In ICT with Intelligent Applications: Proceedings of ICTIS 2021, Volume 1, 2022.
Most prior work on single-image super-resolution depends on pairs of high-resolution images and their bicubically downsampled low-resolution counterparts. Such methods have achieved outstanding results on this formulation of the problem, yet they struggle to generalize to real-world low-resolution images. Real-world low-resolution images exhibit a wide variety of degradations, and modeling these degradations accurately is a challenging task. Although some works have been proposed to address this problem, their results still lack fine perceptual details. Here, we use a GAN-based two-step pipeline to address this challenging problem of real-world image super-resolution. First, we train a GAN that transforms real-world low-resolution images into the space of bicubically downsampled images of the same size. This network is trained with real-world low-resolution images as input and bicubically downsampled versions of their corresponding high-resolution images as ground truth. Then, we employ the nESRGAN+ network, trained on bicubically downsampled low-resolution and high-resolution image pairs, to super-resolve the transformed bicubic-like images. Hence, the first network transforms the wide variety of degraded images into the bicubic space, and the second network upscales the first network’s output by a factor of four. We show the effectiveness of this approach by evaluating its output on various benchmark test datasets and comparing our results with other works, showing that our method outperforms prior work in both qualitative and quantitative comparisons. We have published our source code and trained models here for further research and improvement.
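A minimal sketch of the two-step inference pipeline described above, assuming both trained generators are available as PyTorch modules; the function names and checkpoint paths are hypothetical.

```python
# Sketch: step 1 maps a real-world LR image into bicubic-like space at the
# same resolution; step 2 (a 4x SR generator such as nESRGAN+) upscales it.
import torch

def super_resolve(lr_image, degradation_remover, sr_generator):
    """Run the two-step pipeline on a (1, 3, H, W) tensor in [0, 1]."""
    with torch.no_grad():
        bicubic_like = degradation_remover(lr_image)  # step 1: same size
        sr_image = sr_generator(bicubic_like)         # step 2: 4x upscale
    return sr_image.clamp(0.0, 1.0)

# Usage, assuming the trained generators were saved as full modules:
# step1 = torch.load("degradation_remover.pt")    # hypothetical checkpoint
# step2 = torch.load("nesrgan_plus_generator.pt") # hypothetical checkpoint
# sr = super_resolve(lr, step1.eval(), step2.eval())
```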