Method | Tissue | Dataset | Challenge | Highlight | ACC / F1 / AUC (%) |
---|---|---|---|---|---|
ScoreNet [16] | Breast | BRACS, BACH and CAMELYON16 | The huge size of WSIs and the cost of exhaustive localized annotations | Efficient transformer-based architecture with local and global attention mechanisms | –/ 81.10 /– |
BreaST-Net [51] | Breast | BreakHis | Differentiating subtypes of benign and malignant cancers | Ensemble of Swin transformers | 99.60 / 99.50 / 99.40 |
HATNet [52] | Breast | Custom | Diagnostic variability and misdiagnosis of breast cancer | End-to-end ViTs with self-attention mechanism | 71.00 / 70.00 /– |
dMIL-Transformer [53] | Breast (LNM) | CAMELYON16/17 and SLN-Breast | Accounting for the morphology and spatial distribution of cancerous regions | Two-stage double max–min MIL transformer architecture | 89.23 / 84.83 / 91.67 |
ASI-DBNet [54] | Brain | UHP | Lack of precision and accuracy in brain tumor grading | Adaptive sparse interactive ResNet–ViT dual network | 95.24 / 95.23 / 96.83 |
Ding et al. [55] | Brain | NCT-CRC-HE, BreaKHis and LDCH | Aliasing caused by downsampling operations and discontinuity smoothing | ViT-based network with wavelet position embedding | 99.01 /–/– |
DT-DSMIL [56] | Colorectal | Custom | Data annotations | Weakly supervised ViT-based MIL | 93.50 / 94.37 / 97.69 |
IMGL-VTNet [57] | Gastric | – | The problem of identifying IM glands | Multi-scale deformable transformer | –/ 94.00 /– |
tRNAsformer [58] | Kidney | TCGA | Gathering the information needed to learn WSI representations | Transformer-based learning to predict RNA-sequence expression | 96.25 / 96.25 /– |
i-ViT [59] | Kidney | TCGA-KIRP | Capturing cellular and cell-layer level patterns | Instance-based Vision Transformer network | 93.01 / 93.60 /– |
GTP [46] | Lung | CPTAC, TCGA and NLST | Label noise | Graph-transformer with vision transformer | 91.20 /–/ 97.70 |
FDTrans [60] | Lung | TCGA-NSCLC | Large intra-class differences and a lack of annotated datasets | Frequency domain transformer-based architecture | 92.33 / 94.64 / 93.16 |
Yacob et al. [45] | Skin | Custom | Time-consuming diagnosis and inter-pathologist variability | Weakly supervised approach using graph-transformer | 93.50 /–/– |
KAT [61] | Stomach | Gastric-2K, Endometrial-2K | Over-smoothing and high computational complexity | Kernel attention transformer | 94.90 /–/ 98.30 |
DT-MIL [62] | Lung and breast | CPTAC-LUAD and BREAST-LNM | The problem of learning an effective WSI representation | Deformable transformer model for MIL | –/ 96.92 / 99.06 |
TCNN [1] | Breast, Lung, etc. | MDD and RWD | Artifacts in WSIs | Transformer with CNN | 96.90 / 97.40 / 98.50 |
CWC-Transformer [63] | Breast and Lung | CAMELYON16, TCGA-LUNG and MSK | Loss of spatial information and difficulties of feature extraction in WSIs | Combination of transformer and CNN | 92.59 /–/ 94.88 |
TransPath [64] | Breast, Lung, etc. | TCGA, PAIP, PatchCam, etc. | Data annotation | Self-supervised learning transformer-based network | 95.85 / 95.82 / 97.79 |
TransMIL [65] | Breast, Lung and Kidney | CAMELYON16, TCGA (NSCLC and RCC) | Correlation among instances, the huge size of WSIs, and the lack of pixel-level annotations | Transformer-based multiple-instance learning (MIL) | 94.66 /–/ 98.82 |
DecT [66] | Breast, Endometrium | BreakHis, BACH, and UC | Not taking into account the staining properties of histopathological images | Color deconvolution with transformer architecture | 93.02 / 93.89 /– |
LA-MIL [44] | Colorectal and stomach | TCGA-CRC and TCGA-STAD | Quadratic complexity of transformer architectures with respect to sequence length | MIL local attention graph-based transformer model | –/–/– |
Prompt-MIL [67] | Breast and colorectal | TCGA (BRCA and CRC) and BRIGHT | Overfitting and a lack of annotated data | Prompt-tuning MIL transformer | 93.47 /–/– |
HAG-MIL [68] | Breast, Gastric, Lung, etc. | CAMELYON16, IMGC, TCGA-RCC and NSCLC | Difficulty of locating the most discriminative patches | Hierarchical attention-guided MIL transformer framework | 91.40 / 89.40 / 98.20 |
MI-Zero [69] | Breast, cell, and lung | TCGA (BRCA, NSCLC and RCC), etc. | Computational issues and a scarcity of large-scale publicly available datasets | Transformer-based visual-language pre-training for MI zero-shot transfer | 70.20 /–/– |
MEGT [47] | Kidney and breast | TCGA-RCC and CAMELYON16 | The problem of learning multi-scale image representations from gigapixel WSIs | Multi-scale efficient graph transformer-based network | 96.91 / 96.26 / 97.30 |
MSPT [70] | Breast and lung | TCGA-NSCLC and CAMELYON16 | Uneven representation of negative and positive instances within bags | Multi-scale prototypical transformer-based network | 95.36 /–/ 98.69 |
GLAMIL [71] | Breast, lung, and kidney | TCGA (RCC and NSCLC) and CAMELYON16 | Overfitting, WSI-level feature aggregation, and imbalanced data | Local-to-global spatial learning | 95.01 /–/ 99.26 |
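Several entries above (e.g., TransMIL [65], DT-MIL [62], HAG-MIL [68], GLAMIL [71]) share one core recipe: treat a WSI as a bag of patch embeddings, mix instance information with self-attention, pool to a single bag vector, and score the slide with a linear head. The NumPy sketch below illustrates only this generic pattern; all dimensions, weight matrices, and function names are illustrative assumptions, not the implementation of any cited method.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a bag of patch embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n_patches, n_patches)
    return softmax(scores, axis=-1) @ V       # (n_patches, d)

def mil_slide_logits(patches, Wq, Wk, Wv, w_cls):
    """Aggregate patch embeddings into slide-level logits:
    attention correlates instances, mean-pooling forms the bag
    representation, and a linear head produces class scores."""
    H = self_attention(patches, Wq, Wk, Wv)   # (n_patches, d)
    bag = H.mean(axis=0)                      # (d,)
    return bag @ w_cls                        # (n_classes,)

rng = np.random.default_rng(0)
d, n, c = 16, 32, 2                           # embed dim, patches, classes
patches = rng.normal(size=(n, d))             # stand-in for patch features
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
logits = mil_slide_logits(patches, Wq, Wk, Wv, rng.normal(size=(d, c)))
print(logits.shape)  # (2,)
```

The weakly supervised appeal noted in the Challenge column follows from this structure: only the slide-level label supervises `w_cls`, so no pixel- or patch-level annotation is required.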