Application of visual transformer in renal image analysis

Yin, Yuwei; Tang, Zhixian; Weng, Huachun

doi:10.1186/s12938-024-01209-z

BioMedical Engineering OnLine

Table 1 Comparison of kidney image segmentation algorithm performance

From: Application of visual transformer in renal image analysis

Algorithms	Datasets	Evaluation indicators/results	Main views and contributions	Limitations
TransUNet [29]	Synapse 2015/ACDC	Synapse (DSC: 77.48%; Kidney (R): 81.87%; Kidney (L): 77.02%; HD: 31.69 mm)/ACDC (DSC: 89.71%)	TransUNet is the first successful attempt to introduce a Transformer into medical image segmentation; it combines a CNN and a Transformer in the encoder	The Transformer leads to a dramatic increase in the number of model parameters
IB-TransUNet [68]	Synapse 2015	DSC: Kidney (R): 79.87%; Kidney (L): 83.89%	Using the UNet model to combine the information bottleneck (IB) with the Transformer	More advantages in learning small organ features
Swin-Unet [32]	Synapse 2015	DSC: 79.13%; HD: 21.55 mm	The information bottleneck block is introduced in the encoder; a hierarchical Swin Transformer with shifted windows is used as the encoder to extract contextual features. An asymmetric Swin Transformer decoder with a patch expanding layer is designed to perform the upsampling operation	Higher dependency on large and diverse datasets; large number of parameters and high complexity
AgDenseU-Net 2.5D [60]	KiTS 2021	DSC: Kidney: 95% Tumor: 87.8% Cyst: 74.6%	Combining the features of AggRes (which enhances feature representation by aggregating residual connectivity and attention mechanisms) and DenseU-Net (which efficiently performs multi-scale feature fusion)	Higher computation and memory consumption, longer training time
LeViT-UNet [69]	Synapse/ACDC	Synapse (DSC: 78.53%; Kidney (R): 80.25%; Kidney (L): 84.61%; HD: 16.84 mm)/ACDC (DSC: 90.32%)	Using LeViT as the encoder of LeViT-UNet, combining the LeViT Transformer with U-Net	Some metrics do not reach SOTA, and segmentation performance is sacrificed to some extent to reduce computational complexity
ViTBIS [70]	Synapse 2015	DSC: 80.45%	Adding the Concat operator for merging features	The dataset is more homogeneous, with fewer baselines for comparison
TransClaw U-Net [33]	Synapse 2015	Synapse (DSC: 78.09%; HD: 26.38 mm)	Claw U-Net combined with a Transformer; dual-path decoder design	Relatively homogeneous datasets
AFTer-UNet [71]	Thorax-85/BCV/SegTHOR thorax	Thorax-85 (DSC: 92.32%)/BCV (DSC: 81.02%)/SegTHOR thorax (DSC: 92.10%)	Both intra- and inter-slice long-range cues are considered to guide segmentation	Axial information is naturally available mainly for 3D volumes
TransBTSV2 [19]	KiTS 2019/BraTS 2019/BraTS 2020/LiTS 2017	KiTS 2019 (DSC: Kidney: 97.37%; Tumor: 83.69%; Composite: 90.53%)	Not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a powerful and efficient 3D baseline for the volumetric segmentation of medical images	Mainly for 3D medical image segmentation tasks
UNETR [31]	BTCV/MSD	BTCV (AVG: 89.1%)/MSD (DSC: 71.1%; HD95: 8.822 mm)	The Transformer encoder operates on embedded 3D patches to capture long-range dependencies efficiently; the skip-connected decoder combines the extracted representations at different resolutions and predicts the segmentation output	Mainly for 3D medical image segmentation
DBT-UNETR [72]	BTCV	AVG: 80.3%	An improved SwinUNETR is proposed based on UNETR with Swin Transformer as an alternative to Transformer	No significant improvement in performance compared to UNETR
nnFormer [37]	Synapse 2015/ACDC	Synapse (DSC: 87.40%)/ACDC (DSC: 91.78%)	Utilizing a combination of interleaved convolution and self-attention operations	Little performance gain on the ACDC dataset
HiFormer [73]	Synapse 2015	DSC: 80.69%	Two multi-scale representations were designed based on the Swin Transformer module and a CNN encoder, and the Double-Level Fusion (DLF) module was designed to finely fuse the global and local features of the two representations	Single dataset
MPSHT [74]	Synapse 2015/ACDC	Synapse (DSC: 79.76%; Kidney: 80.77%; HD: 21.55 mm)/ACDC (DSC: 91.80%)	Based on a hybrid CNN-Transformer model, to which a progressive sampling module is added	Segmentation accuracy needs to be improved
DSGA-Net [75]	Synapse 2015/BraTS 2020/ACDC	Synapse (DSC: 81.24%)/BraTS 2020 (DSC: 85.82%)/ACDC (DSC: 91.34%)	Adds a Depth Separable Gated Visual Transformer (DSG-ViT) module to the encoder and proposes a Mixed Three-branch Attention (MTA) module	Considerable computational burden; consumes large amounts of GPU memory
MedNeXt [76]	BTCV/AMOS22/KiTS19/BraTS21/AVG	BTCV (DSC: 88.76%)/AMOS22 (DSC: 91.77%)/KiTS19 (DSC: 91.02%)/BraTS21 (DSC: 91.49%)/AVG (DSC: 88.01%)	The use of ConvNeXt 3D and the extension of ConvNeXt blocks to upsampling and downsampling layers represents a modern deep architecture for medical image segmentation	Deep Networks Dedicated to Medical Image Segmentation
MESTrans [77]	COVID-DS36/GlaS/Synapse/I2CVB	COVID-DS36 (DSC: 81.23%)/GlaS (DSC: 89.95%; IoU: 82.39%)/Synapse (DSC: 77.48%; HD: 31.69 mm)/I2CVB (DSC: 92.3%; IoU: 85.8%)	Proposes a Multi-scale Embedding Block (MEB) and a multi-layer Spatial Attention Transformer structure (SATrans) to adjust the receptive field. Proposes a Feature Fusion Module (FFM) for global learning between shallow and deep features	The performance of small organ segmentation needs to be improved
ST-Unet [78]	Synapse 2015/ISIC 2018	Synapse 2015 (DSC: 78.86%; HD: 20.37 mm)/ISIC 2018 (F1: 90.94%; mIoU: 85.26%)	Proposes a new Cross-Layer Feature Enhancement (CLFE) module for cross-layer feature learning, with spatial and channel squeeze-and-excitation modules to highlight the saliency of specific regions	The accuracy of segmentation needs to be improved
COTRNet [79]	KiTS 2021	DSC: Kidney: 92.28%; Tumor: 55.28%; Cyst: 50.52%	Utilizing a pre-trained ResNet to build the encoder, in addition to adding deep supervision	The accuracy of segmentation for masses and tumors needs to be improved
CS-Unet [80]	Synapse 2015	DSC: 82.21%; Kidney (R): 79.52%; Kidney (L): 85.28%	Design of a convolutional Swin Transformer (CST) module that merges convolution with multi-head self-attention and feed-forward networks	Faces the challenge of dealing with long-range dependencies
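
The table reports overlap metrics such as DSC (Dice similarity coefficient) and IoU without defining them. As a minimal sketch, assuming binary masks flattened into sequences of 0/1 labels, the two metrics can be computed as below. The function name `dice_and_iou` and the toy masks are hypothetical and are not taken from any of the cited works. Scoring two empty masks as 1.0 is a convention chosen here for illustration.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (DSC, IoU) for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels/voxels")

    # Count foreground elements in each mask and in their overlap.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection

    if union == 0:
        # Both masks are empty: scored as perfect agreement (an assumed convention).
        return 1.0, 1.0

    dsc = 2 * intersection / (pred_size + truth_size)  # DSC = 2|A∩B| / (|A| + |B|)
    iou = intersection / union                         # IoU = |A∩B| / |A∪B|
    return dsc, iou


# Hypothetical toy masks: 3 predicted voxels, 3 true voxels, 2 shared.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dsc, iou = dice_and_iou(pred, truth)
print(f"DSC = {dsc:.2%}, IoU = {iou:.2%}")
```

The HD values in the table are Hausdorff distances, which measure boundary distance in millimetres rather than overlap, so they are not covered by this sketch.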

ISSN: 1475-925X