Method | Tissue | Dataset | Challenge | Highlight | C-index (%) |
---|---|---|---|---|---|
TransSurv [97] | Colorectal | TCGA-CRC and NCT-CRC-HE | Inability of the previous models to extract useful predictive features from the multi-modal data | Transformer-based multi-modal feature fusion network | 82.20 |
PG-TFNet [98] | Colorectal | TCGA-CRC | Inability to make use of the powerful representation learning capabilities of the neural networks | Transformer-based multi-modal feature fusion network | 81.60 |
ESAT [99] | Lung | NLST and CHCAMS | Using a pre-selected subset of main patches or patch clusters as input instead of using the entire WSIs | ViT backbone combined with convolution operations | 73.00 |
MCAT [26] | Bladder, Breast, Lung, Uterine | BLCA, UCEC, BRCA, GBMLGG, LUAD | Computational complexity and large data heterogeneity gap between genomics and WSIs | Multimodal Co-Attention Transformer for survival prediction | 65.30 |
HiMT [100] | Bladder, Breast, Lung, Brain, etc. | BLCA, BRCA, UCEC, LUAD, LGG, etc. | High computational cost of extracting patches from WSIs, which results in a large bag size | Hierarchical multi-modal Transformer framework | 67.30 |
MaskHIT [82] | Breast, Lung, etc. | TCGA | Huge number of network parameters and insufficient labeled data | Masked pre-training of Transformers | 61.20 |
SURVPATH [101] | Breast, Bladder, Stomach, etc. | TCGA | Capturing dense multimodal interactions between different modalities | Memory-efficient multimodal Transformer | 62.90 |
Surformer [102] | Bladder, Breast, Lung, etc. | TCGA (BLCA, BRCA, LUAD, etc.) | Weak interpretability of previous computational pathology models | Pattern-perceptive survival Transformer-based network | 68.70 |
HVTSurv [103] | Bladder, Breast, Lung, etc. | TCGA (BLCA, BRCA, LUAD, etc.) | Difficulty of exploring contextual, spatial, and hierarchical interactions in the patient-level bag | Hierarchical ViT-based architecture | 63.40 |
HMCAT [104] | Low Grade Glioma | TCGA-GBMLGG | The significant disparity between the spatial scales of radiology images and WSIs | Hierarchical multimodal co-attention Transformer-based network | 79.60 |
AMIGO [3] | Ovarian and Bladder | InUIT and MIBC | Ignoring specific details regarding the individual cells in a tile image | Sparse multi-modal graph Transformer-based network | 61.00 |
SeTranSurv [25] | Breast, Lung, Ovarian | OV, LUSC, and BRCA | Ignoring the important role of spatial information in patches and the correlation between patches and WSIs | Integration of patch features through self-supervised learning and Transformer | 70.50 |
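
For readers unfamiliar with the metric in the last column, the following is a minimal sketch of how a concordance index (C-index) can be computed from predicted risk scores. It assumes Harrell's pairwise definition and, as a simplification, skips pairs with identical observed times; the function name `concordance_index` and the toy inputs are illustrative and are not taken from any of the listed methods, whose evaluation protocols may differ.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-index: share of comparable pairs ranked correctly.

    times  -- observed follow-up times
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Simplifying assumption: pairs with tied times are skipped.
        if times[i] == times[j]:
            continue
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the earlier subject had an event.
        if not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the third one is censored.
c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
print(f"C-index: {100 * c:.2f}%")  # prints "C-index: 80.00%"
```

A value of 50% corresponds to random ranking and 100% to perfect ranking, which is the scale used for the percentages reported in the table.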