Fig. 4 | BioMedical Engineering OnLine

Fig. 4

From: Advantages of transformer and its application for medical image segmentation: a survey

Key components of the ViT and Swin Transformer. a The ViT architecture: input feature maps are split into patches, which are linearly projected and processed by the Transformer; an MLP then classifies the result. b Details of the ViT encoder, emphasizing the multi-head attention modules. c Evolution of the feature map in the Swin Transformer during W-MSA and SW-MSA computation, highlighting the cyclic shift operation used to integrate the shifted-window feature maps. d The Swin Transformer block and its computational process
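
The cyclic shift in panel c can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names `cyclic_shift` and `window_partition`, the 4x4 toy grid, and the window size of 2 are all assumptions chosen for illustration. The sketch partitions the grid into regular windows (the W-MSA layout), rolls it by half a window, partitions again (the SW-MSA layout), and then reverses the roll.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` positions, wrapping around."""
    h = len(grid)
    w = len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid, window):
    """Split a 2D grid into non-overlapping window x window blocks, row-major."""
    h = len(grid)
    w = len(grid[0])
    return [
        [row[c:c + window] for row in grid[r:r + window]]
        for r in range(0, h, window)
        for c in range(0, w, window)
    ]


# Hypothetical toy "feature map": entries are position indices 0..15.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
window = 2            # assumed window size for this toy example
shift = window // 2   # shift by half the window size

regular_windows = window_partition(feature_map, window)   # W-MSA layout
shifted_map = cyclic_shift(feature_map, shift)
shifted_windows = window_partition(shifted_map, window)   # SW-MSA layout
restored = cyclic_shift(shifted_map, -shift)              # reverse cyclic shift
```

In this toy case the first shifted window groups positions 5, 6, 9 and 10, which sat in four different regular windows, so attention within it connects previously separate windows. The last shifted window holds positions wrapped around from the grid corners, which are not spatially adjacent; the Swin Transformer handles such windows with an attention mask, which this sketch omits.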
