Fig. 4From: Advantages of transformer and its application for medical image segmentation: a surveyKey components of the ViT and Swin Transformer. a The ViT architecture, showcases the transformation of input feature maps into patches, followed by linear mapping and processing through the Transformer. The result undergoes classification via an MLP. b The details of the ViT encoder, emphasizing the integration of multihead attention modules. c The feature map evolution in Swin Transformer during W-MSA and SW-MSA computation, highlighting the cyclic shift operation for integrating shifted window feature maps. d Swin Transformer Block, outlining its computational processBack to article page