Table 1 Transformer applications in histopathological image classification tasks

From: A survey of Transformer applications for histopathological image analysis: New developments and future directions

| Method | Tissue | Dataset | Challenge | Highlight | ACC / F1 / AUC (%) |
|---|---|---|---|---|---|
| ScoreNet [16] | Breast | BRACS, BACH and CAMELYON16 | The huge size of WSIs and the cost of exhaustive localized annotations | Efficient transformer-based architecture with local and global attention mechanisms | – / 81.10 / – |
| BreaST-Net [51] | Breast | BreakHis | Differentiating subtypes of benign and malignant cancers | Ensemble of Swin transformers | 99.60 / 99.50 / 99.40 |
| HATNet [52] | Breast | Custom | Diagnostic variability and misdiagnosis of breast cancer | End-to-end ViTs with self-attention mechanism | 71.00 / 70.00 / – |
| dMIL-transformer [53] | Breast (LNM) | CAMELYON16/17 and SLN-Breast | Accounting for the morphology and spatial distribution of cancerous regions | Two-stage double max–min MIL transformer architecture | 89.23 / 84.83 / 91.67 |
| ASI-DBNet [54] | Brain | UHP | Lack of precision and accuracy in grading brain tumors | Adaptive sparse interactive ResNet–ViT dual network | 95.24 / 95.23 / 96.83 |
| Ding et al. [55] | Brain | NCT-CRC-HE, BreakHis and LDCH | Aliasing caused by downsampling operations and the smoothing of discontinuities | ViT-based network with wavelet position embedding | 99.01 / – / – |
| DT-DSMIL [56] | Colorectal | Custom | Data annotation | Weakly supervised ViT-based MIL | 93.50 / 94.37 / 97.69 |
| IMGL-VTNet [57] | Gastric | IMGL | Identifying intestinal metaplasia (IM) glands | Multi-scale deformable transformer | – / 94.00 / – |
| tRNAsformer [58] | Kidney | TCGA | Gathering the information needed to learn WSI representations | Transformer-based learning to predict RNA-sequence expression | 96.25 / 96.25 / – |
| i-ViT [59] | Kidney | TCGA-KIRP | Capturing cellular and cell-layer-level patterns | Instance-based vision transformer network | 93.01 / 93.60 / – |
| GTP [46] | Lung | CPTAC, TCGA and NLST | Label noise | Graph-transformer with vision transformer | 91.20 / – / 97.70 |
| FDTrans [60] | Lung | TCGA-NSCLC | Large intra-class differences and a lack of annotated datasets | Frequency-domain transformer-based architecture | 92.33 / 94.64 / 93.16 |
| Yacob et al. [45] | Skin | Custom | Time-consuming diagnosis and inter-pathologist variability | Weakly supervised approach using a graph-transformer | 93.50 / – / – |
| KAT [61] | Stomach | Gastric-2K and Endometrial-2K | Over-smoothing and high computational complexity | Kernel attention transformer | 94.90 / – / 98.30 |
| DT-MIL [62] | Lung and breast | CPTAC-LUAD and BREAST-LNM | Learning an effective WSI representation | Deformable transformer model for MIL | – / 96.92 / 99.06 |
| TCNN [1] | Breast, lung, etc. | MDD and RWD | Artifacts in WSIs | Transformer combined with a CNN | 96.90 / 97.40 / 98.50 |
| CWC-transformer [63] | Breast and lung | CAMELYON16, TCGA-LUNG and MSK | Loss of spatial information and difficulties of feature extraction in WSIs | Combination of transformer and CNN | 92.59 / – / 94.88 |
| TransPath [64] | Breast, lung, etc. | TCGA, PAIP, PatchCam, etc. | Data annotation | Self-supervised transformer-based network | 95.85 / 95.82 / 97.79 |
| TransMIL [65] | Breast, lung and kidney | CAMELYON16 and TCGA (NSCLC and RCC) | Correlation among instances, huge WSI size and the lack of pixel-level annotations | Transformer-based multiple-instance learning (MIL) | 94.66 / – / 98.82 |
| DecT [66] | Breast and endometrium | BreakHis, BACH and UC | Ignoring the staining properties of histopathological images | Color deconvolution with transformer architecture | 93.02 / 93.89 / – |
| LA-MIL [44] | Colorectal and stomach | TCGA-CRC and TCGA-STAD | Quadratic complexity of transformer architectures with respect to sequence length | MIL local-attention graph-based transformer model | – / – / – |
| Prompt-MIL [67] | Breast and colorectal | TCGA (BRCA and CRC) and BRIGHT | Overfitting and a lack of annotated data | Prompt-tuning MIL transformer | 93.47 / – / – |
| HAG-MIL [68] | Breast, gastric, lung, etc. | CAMELYON16, IMGC, TCGA-RCC and NSCLC | Locating the most discriminative patches | Hierarchical attention-guided MIL transformer framework | 91.40 / 89.40 / 98.20 |
| MI-Zero [69] | Breast, cell and lung | TCGA (BRCA, NSCLC and RCC), etc. | Computational issues and the scarcity of large-scale publicly available datasets | Transformer-based visual-language pre-training for MI zero-shot transfer | 70.20 / – / – |
| MEGT [47] | Kidney and breast | TCGA-RCC and CAMELYON16 | Learning multi-scale image representations from gigapixel WSIs | Multi-scale efficient graph-transformer network | 96.91 / 96.26 / 97.30 |
| MSPT [70] | Breast and lung | TCGA-NSCLC and CAMELYON16 | Uneven representation between negative and positive instances in bags | Multi-scale prototypical transformer network | 95.36 / – / 98.69 |
| GLAMIL [71] | Breast, lung and kidney | TCGA (RCC and NSCLC) and CAMELYON16 | Overfitting, WSI-level feature aggregation and imbalanced data | Local-to-global spatial learning | 95.01 / – / 99.26 |
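
Many entries above (e.g., dMIL-transformer [53], DT-MIL [62], TransMIL [65], LA-MIL [44], Prompt-MIL [67], HAG-MIL [68]) share one pattern: a gigapixel WSI is treated as a bag of pre-extracted patch embeddings, and a transformer aggregates the bag into a single slide-level prediction. The sketch below illustrates only that generic pattern, not the implementation of any surveyed method; the class name, dimensions and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerMIL(nn.Module):
    """Minimal sketch: transformer-based MIL over pre-extracted patch features."""

    def __init__(self, feat_dim=1024, embed_dim=512, num_heads=8,
                 num_layers=2, num_classes=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)     # embed patch features
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)  # slide-level classifier

    def forward(self, bag):
        # bag: (batch, num_patches, feat_dim); one WSI = one bag of instances
        x = self.proj(bag)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)                 # prepend a class token
        x = self.encoder(x)                            # self-attention over instances
        return self.head(x[:, 0])                      # predict from the class token

# Usage: one slide as a bag of 500 patch embeddings from a frozen backbone.
model = TransformerMIL()
logits = model(torch.randn(1, 500, 1024))              # -> shape (1, 2)
```

Because self-attention mixes all instances jointly, such a model can exploit correlations among patches, the property that table entries like TransMIL [65] cite as their motivation over independence-assuming MIL pooling.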