From: Deep learning for gastroscopic images: computer-aided techniques for clinicians
Target disease | Main purpose | Reference | Imaging modality | DL task type | Dataset information | Network architecture | Result |
---|---|---|---|---|---|---|---|
GC | Detection of GC | Wang et al. [17] | WLI | Image classification | A total of 1350 images depicting cancer (highly suspicious) and 103,514 normal images. Train:validation:test = 6:2:2 | AlexNet, GoogLeNet, VGGNet | Sensitivity: 79.62% (missed-diagnosis rate: 20.38%). Specificity: 78.48% (misdiagnosis rate: 21.51%) |
GC | Detection of GC | Hirasawa et al. [52] | WLI, CE, NBI | Object detection | Training dataset: 13,584 endoscopic images of gastric cancer. Testing dataset: 2296 stomach images collected from 69 consecutive patients with 77 gastric cancer lesions | SSD | The CNN required 47 s to analyse the 2296 test images. It correctly diagnosed 71 of 77 gastric cancer lesions, for an overall sensitivity of 92.2%. A total of 161 noncancerous lesions were detected as gastric cancer, resulting in a positive predictive value of 30.6%. 70 of the 71 lesions (98.6%) with a diameter of 6 mm or more, as well as all invasive cancers, were correctly detected |
GC | Detection of GC | Ishioka et al. [53] | WLI, CE, NBI | Object detection | Training dataset: 13,584 endoscopic images of gastric cancer. Testing dataset: video images collected from 68 endoscopic submucosal dissection procedures for early gastric cancer in 62 patients | SSD | The CNN correctly diagnosed 64 of 68 lesions (94.1%). The median time for lesion detection was 1 s (range: 0–44 s) after the lesions first appeared on the screen |
GC | Detection of GC | Ikenoyama et al. [55] | WLI, CE, NBI | Object detection | Training dataset: 13,584 endoscopic images from 2639 gastric cancer lesions. Testing dataset: 2940 images from 140 cases | SSD | The average diagnostic times for analysing the 2940 test endoscopic images were 45.5 ± 1.8 s for the CNN and 173.0 ± 66.0 min for the endoscopists. The sensitivity, specificity, and positive and negative predictive values of the CNN were 58.4%, 87.3%, 26.0%, and 96.5%, respectively; for the 67 endoscopists, these values were 31.9%, 97.2%, 46.2%, and 94.9%. The CNN had a significantly higher sensitivity than the endoscopists (by 26.5%) |
GC | Detection of GC | Luo et al. [72] | WLI | Semantic segmentation | A total of 1,036,496 endoscopy images from 84,424 individuals. Train:validation:test = 8:1:1 | DeepLabv3+ | The diagnostic accuracy in identifying upper gastrointestinal cancers was 0.955 in the internal validation set, 0.927 in the prospective set, and 0.915–0.977 across the five external validation sets. The diagnostic sensitivity was similar to that of the expert endoscopist (0.942 vs. 0.945) and superior to that of competent (0.858) and trainee (0.722) endoscopists. Positive predictive value: 0.814 (system), 0.932 (expert), 0.974 (competent), 0.824 (trainee). Negative predictive value: 0.978 (system), 0.980 (expert), 0.951 (competent), 0.904 (trainee) |
GC | Diagnosis of GC | Sakai et al. [16] | WLI | Image classification | Training dataset: 9587 cancer images and 9800 normal images. Testing dataset: 4653 cancer images and 4997 normal images | GoogLeNet | Accuracy: 87.6%. Sensitivity: 80.0%. Specificity: 94.8% |
GC | Diagnosis of GC | Cho et al. [26] | WLI | Image classification | Training dataset: 4205 images from 1057 patients. Testing dataset: 812 images from 212 patients; an additional 200 images from 200 patients were collected for prospective validation | Inception-ResNet-v2 | The weighted average accuracy of the model reached 84.6% for the five-category classification. The mean areas under the curve (AUCs) of the model for differentiating gastric cancer and neoplasm were 0.877 and 0.927, respectively. In prospective validation, the model performed worse than the best endoscopist (five-category accuracy: 76.4% vs. 87.6%; cancer: 76.0% vs. 97.5%; neoplasm: 73.5% vs. 96.5%; P < 0.001). However, there was no significant difference between the model and the worst-performing endoscopist in differentiating gastric cancer (accuracy: 76.0% vs. 82.0%) or neoplasm (AUC: 0.776 vs. 0.865) |
GC | Diagnosis of GC | Lee et al. [27] | WLI | Image classification | Training dataset: 200 ulcer images, 337 cancer images, 180 normal images. Testing dataset: 20 ulcer images, 30 cancer images, 20 normal images | ResNet-50, VGGNet-16, Inception-v4 | The AUCs were 0.95, 0.97, and 0.85 for the three classifiers; ResNet-50 showed the highest performance. The cases involving normal images (normal vs. ulcer and normal vs. cancer) yielded accuracies above 90%, whereas ulcer vs. cancer classification yielded a lower accuracy of 77.1% |
GC | Diagnosis of GC | Li et al. [34] | ME-NBI | Image classification | Training dataset: 386 images of noncancerous lesions and 1702 images of early gastric cancer. Testing dataset: 341 endoscopic images (171 noncancerous lesions and 170 early gastric cancers) | Inception-v3 | The sensitivity, specificity, and accuracy of the CNN system in diagnosing early gastric cancer were 91.18%, 90.64%, and 90.91%, respectively. There was no significant difference in specificity or accuracy between the CNN and the experts; however, the diagnostic sensitivity of the CNN was significantly higher than that of the experts. The diagnostic sensitivity, specificity, and accuracy of the CNN were all significantly higher than those of the nonexperts |
GC | Diagnosis of GC | Horiuchi et al. [31] | ME-NBI | Image classification | Training dataset: 1492 EGC and 1078 gastritis images. Testing dataset: 151 EGC and 107 gastritis images | GoogLeNet | Accuracy: 85.3%. Sensitivity: 95.4%. Specificity: 71.0%. PPV: 82.3%. NPV: 91.7%. The overall test speed was 51.83 images/s (0.02 s/image) |
GC | Diagnosis of GC | Horiuchi et al. [32] | ME-NBI | Image classification | Training dataset: 1492 cancerous and 1078 noncancerous images obtained using ME-NBI. Testing dataset: 174 videos (87 cancerous and 87 noncancerous). The system was compared with 11 experts skilled in diagnosing EGC using ME-NBI, each with more than 1 year of clinical experience | GoogLeNet | AUC: 0.8684. Accuracy: 85.1%. Sensitivity: 87.4%. Specificity: 82.8%. PPV: 83.5%. NPV: 86.7%. The CAD system was significantly more accurate than two experts, significantly less accurate than one expert, and not significantly different from the remaining eight experts |
GC | Diagnosis of GC | Hu et al. [33] | ME-NBI | Image classification | A total of 1777 ME-NBI images from 295 cases, collected from 3 centres: training cohort (TC, n = 170), internal test cohort (ITC, n = 73), and external test cohort (ETC, n = 52). The model was compared with eight endoscopists of varying experience | VGG-19 | AUC: 0.808 in the ITC and 0.813 in the ETC. Predictive performance was similar to that of the senior endoscopists (accuracy: 0.770 vs. 0.755; sensitivity: 0.792 vs. 0.767; specificity: 0.745 vs. 0.742) and better than that of the junior endoscopists (accuracy: 0.770 vs. 0.728). After referring to the system's results, the endoscopists' average diagnostic ability improved significantly in terms of accuracy, sensitivity, PPV, and NPV |
GC | Diagnosis of GC | Liu et al. [36] | ME-NBI | Image classification | A total of 3871 ME-NBI images, including 1130 CGT, 1114 LGN, and 1627 EGC; tenfold cross-validation | ResNet-50, VGG-16, Inception-v3, Inception-ResNet-v2 | ResNet-50 performed best among the four networks. Accuracy: 0.96. F1-scores: 0.92, 0.91, and 0.99 for classifying ME-NBI images into CGT, LGN, and EGC, respectively |
GC | Diagnosis of GC | Ueyama et al. [38] | ME-NBI | Image classification | Training dataset: 5574 ME-NBI images (3797 EGCs, 1777 noncancerous mucosae and lesions). Testing dataset: 2300 ME-NBI images (1430 EGCs, 870 noncancerous mucosae and lesions) | ResNet-50 | The AI-assisted CNN-CAD system required 60 s to analyse the 2300 test images. Accuracy: 98.7%. Sensitivity: 98%. Specificity: 100%. PPV: 100%. NPV: 96.8%. All misdiagnosed EGC images were of low quality or showed superficially depressed, intestinal-type intramucosal cancers that were difficult to distinguish from gastritis, even by experienced endoscopists |
GC | Diagnosis of GC | Zhang et al. [39] | WLI | Image classification, semantic segmentation | Training dataset: 21,217 gastroscopic images of peptic ulcer (PU), early gastric cancer (EGC), high-grade intraepithelial neoplasia (HGIN), advanced gastric cancer (AGC), gastric submucosal tumours (SMTs), and normal gastric mucosa without lesions. Testing dataset: 1091 images. The CNN's diagnoses were compared with those of 10 endoscopists with over 8 years of experience in endoscopic diagnosis | ResNet-34, DeepLabv3 | The diagnostic specificity and PPV of the CNN were higher than those of the endoscopists for the EGC and HGIN images (specificity: 91.2% vs. 86.7%; PPV: 55.4% vs. 41.7%). The diagnostic accuracy of the CNN was close to that of the endoscopists for the lesion-free, EGC and HGIN, PU, AGC, and SMT images. The CNN's image recognition time was 42 s for the entire test set |
GC | Determining the invasion depth of GC | Zhu et al. [29] | WLI | Image classification | Training dataset: 790 images. Testing dataset: 203 images | ResNet-50 | At a threshold value of 0.5: sensitivity 76.47%, specificity 95.56%, AUC 0.94, overall accuracy 89.16%, PPV 89.66%, NPV 88.97%. The CNN-CAD system achieved significantly higher accuracy (by 17.25%) and specificity (by 32.21%) than human endoscopists |
GC | Determining the invasion depth of GC | Cho et al. [30] | WLI | Image classification | Internal dataset: a total of 2899 images. Train:validation:test = 8:1:1. External dataset: 206 images for testing | DenseNet-161 | In the internal test, the mean AUC for discriminating submucosal invasion was 0.887; in the external test, the mean AUC also reached 0.887. Clinical simulation showed that 6.7% of patients who underwent gastrectomy in the external test would have been accurately qualified by the algorithm for potential endoscopic resection, avoiding unnecessary surgery |
GC | Delineating the margin of GC | An et al. [73] | WLI, CE, ME-NBI | Semantic segmentation | Training dataset: WLI: 343 images from 260 patients; CE: 546 images from 67 patients. Testing dataset: WLI: 321 images from 218 patients; CE: 34 images from 14 patients | UNet++ | The system had an accuracy of 85.7% on the CE images and 88.9% on the WLI images under an overlap ratio threshold of 0.60, compared with manual markers labelled by the experts. On the ESD videos, the resection margins predicted by the system covered all areas of high-grade intraepithelial neoplasia and cancer. The minimum distance between the margins predicted by the system and the histological cancer boundary was 3.44 ± 1.45 mm, outperforming the resection margin based on ME-NBI |
GC | Detection of GC; anatomical classification | Wu et al. [18] | NBI, BLI, WLI | Image classification | Training dataset: 3170 gastric cancer and 5981 benign images for detecting GC; 24,549 images of different parts of the stomach for monitoring blind spots. Testing dataset: 100 gastric cancer and 100 benign images for detecting GC; 170 images for monitoring blind spots | VGG-16, ResNet-50 | The DCNN identified EGC from nonmalignancy with an accuracy of 92.5%, a sensitivity of 94.0%, a specificity of 91.0%, a PPV of 91.3%, and an NPV of 93.8%. The DCNN classified gastric locations into 10 or 26 parts with accuracies of 90% and 65.9%, respectively |
GC | Detection of GC; determining the invasion depth of GC | Yoon et al. [19] | WLI | Image classification | A total of 11,539 images (896 T1a-EGC, 809 T1b-EGC, and 9834 non-EGC). Train:validation:test = 6:2:2 | VGG-16 | AUC for EGC detection: 0.981. AUC for depth prediction: 0.851 |
GC | Detection of GC; delineating the margin of GC | Shibata et al. [82] | WLI | Image classification, semantic segmentation | A total of 1208 healthy and 533 cancer images; fivefold cross-validation | Mask R-CNN | For the detection task, the sensitivity was 96.0% with 0.10 false positives (FPs) per image. For the segmentation task, the average Dice index was 71% |
GC | Classifying the type of GC; delineating the margin of GC | Ling et al. [35] | ME-NBI | Image classification | For CNN1, identifying EGC differentiation status: training dataset of 2217 images from 145 EGC patients; testing dataset of 1870 images from 139 EGC patients. The performance of CNN1 was then compared with that of experts using 882 images from 58 EGC patients. For CNN2, delineating EGC margins: training dataset of 928 images from 132 EGC patients; testing dataset of 742 images from 87 EGC patients | CNN1: VGG-16 and ResNet-50; CNN2: UNet++ | The system predicted the differentiation status of EGCs with an accuracy of 83.3% on the testing dataset. In the man-machine contest, CNN1 performed significantly better than the five experts (86.2% vs. 69.7%). The system delineated EGC margins with an accuracy of 82.7% for differentiated EGC and 88.1% for undifferentiated EGC under an overlap ratio of 0.80. In unprocessed EGC videos, the system achieved real-time diagnosis of EGC differentiation status and EGC margin delineation in ME-NBI endoscopy |
HP | Detection of HP | Itoh et al. [14] | WLI | Image classification | 179 upper gastrointestinal endoscopy images obtained from 139 patients (65 HP-positive and 74 HP-negative). Training dataset: 149 images, expanded to 596 by data augmentation. Testing dataset: the remaining 30 images (15 from HP-negative and 15 from HP-positive patients) | GoogLeNet | Sensitivity: 86.7%. Specificity: 86.7%. AUC: 0.956 |
HP | Detection of HP | Nakashima et al. [15] | WLI, BLI, LCI | Image classification | Training dataset: per modality (WLI, BLI, LCI), 162 original images plus 486 copies rotated by 90, 180, and 270 degrees, for a total of 648. Testing dataset: 60 images per modality | GoogLeNet | AUC for WLI: 0.66. AUC for BLI: 0.96. AUC for LCI: 0.95 |
HP | Detection of HP | Zheng et al. [20] | WLI | Image classification | Training dataset: 11,729 gastric images. Testing dataset: 3755 gastric images | ResNet-50 | The AUC for a single gastric image was 0.93, with sensitivity, specificity, and accuracy of 81.4%, 90.1%, and 84.5%, respectively, at an optimal cut-off value of 0.3. The AUC for multiple gastric images per patient was 0.97, with sensitivity, specificity, and accuracy of 91.6%, 98.6%, and 93.8%, respectively, at an optimal cut-off value of 0.4 |
HP | Diagnosis of HP | Nakashima et al. [37] | LCI, WLI | Image classification | Training dataset: 6639 WLI images and 6248 LCI images from 395 subjects. Testing dataset: videos of 120 subjects | A 22-layer skip-connection architecture | LCI-CAD system accuracy: 84.2% for uninfected, 82.5% for currently infected, and 79.2% for post-eradication status. WLI-CAD system accuracy: 75.0% for uninfected, 77.5% for currently infected, and 74.2% for post-eradication status. The LCI-CAD system demonstrated significantly superior diagnostic accuracy to that of the WLI-CAD system, and diagnostic accuracy comparable to that of experienced endoscopists |
HP | Diagnosis of HP | Shichijo et al. [28] | WLI | Object detection | Training dataset: 98,564 endoscopic images from 5236 patients (742 H. pylori-positive, 3649 -negative, and 845 -eradicated). Testing dataset: 23,699 images from 847 patients (70 positive, 493 negative, and 284 eradicated) | GoogLeNet | 80% (465/582) of negative diagnoses, 84% (147/174) of eradicated diagnoses, and 48% (44/91) of positive diagnoses were accurate. The time needed to diagnose all 23,699 images was 261 s |
GP | Detection of GP | Zhang et al. [54] | WLI | Object detection | Training dataset: 708 images. Testing dataset: 50 images | SSD | The model achieves real-time polyp detection at 50 frames per second (FPS) with a mean average precision (mAP) of 90.4%. It improves polyp detection recall by over 10%, especially for small polyps |
GIM | Diagnosis of GIM | Yan et al. [22] | NBI, ME-NBI | Image classification | Training dataset: 1880 endoscopic images (1048 GIM and 832 non-GIM) from 336 patients. Testing dataset: 477 pathologically confirmed images (242 GIM and 235 non-GIM) from 80 patients | EfficientNet-B4 | AUC: 0.928. Sensitivity: 91.9%. Specificity: 86.0%. Accuracy: 88.8% |
CAG | Diagnosis of CAG | Zhang et al. [21] | White-light i-Scan | Image classification | A total of 5470 images of the gastric antrum from 1699 patients (3042 depicting atrophic gastritis and 2428 not); fivefold cross-validation. The deep learning model's diagnoses were compared with those of three experts | DenseNet-121 | Accuracy: 0.942. Sensitivity: 0.945. Specificity: 0.940. The detection rates of mild, moderate, and severe atrophic gastritis were 93%, 95%, and 99%, respectively. The diagnostic performance of the CNN model was higher than that of the experts |
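Most classification results in the table (sensitivity, specificity, PPV, NPV, accuracy) are derived from a 2x2 confusion matrix. A minimal sketch of those definitions, not code from any cited study; the Hirasawa et al. [52] counts below are taken from the table, but the true-negative count is not reported there, so it is set to 0 purely so the call runs:

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the screening metrics reported throughout the table
    from raw confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hirasawa et al. [52]: 71 of 77 lesions detected (TP=71, FN=6),
# 161 noncancerous lesions flagged as cancer (FP=161); TN unknown, set to 0.
m = binary_metrics(tp=71, fp=161, tn=0, fn=6)
print(round(m["sensitivity"], 3))  # 0.922, matching the reported 92.2%
print(round(m["ppv"], 3))          # 0.306, matching the reported 30.6%
```

This also shows why a detector can have high sensitivity yet a low PPV: the 161 false positives dominate the 71 true positives in the PPV denominator.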
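Several studies split their data by fixed ratios such as 6:2:2 (Wang et al. [17], Yoon et al. [19]) or 8:1:1 (Luo et al. [72], Cho et al. [30]). A minimal image-level sketch of such a split, for illustration only (it is not the published protocol of any cited study; endoscopic datasets are typically split per patient rather than per image so that near-duplicate frames cannot leak between subsets):

```python
import random

def split_dataset(items, ratios=(6, 2, 2), seed=42):
    """Shuffle and partition a dataset into train/validation/test
    subsets according to a ratio such as 6:2:2 or 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    total = sum(ratios)
    n = len(items)
    train_end = n * ratios[0] // total
    val_end = train_end + n * ratios[1] // total
    return items[:train_end], items[train_end:val_end], items[val_end:]

train, val, test = split_dataset(range(1000), ratios=(6, 2, 2))
print(len(train), len(val), len(test))  # 600 200 200
```

The per-patient variant would shuffle and partition patient IDs first, then assign each patient's images to that patient's subset.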
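Liu et al. [36], Shibata et al. [82], and Zhang et al. [21] instead report tenfold or fivefold cross-validation. A generic index-based sketch of k-fold partitioning (illustrative only; the cited studies do not publish their fold code):

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation:
    each of the n samples appears in exactly one test fold."""
    # Distribute the remainder so fold sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

# Ten folds over 3871 images (the size of the Liu et al. [36] dataset):
folds = list(kfold_indices(3871, 10))
print(len(folds))        # 10
print(len(folds[0][1]))  # 388 (3871 = 9 * 387 + 388)
```

The metric reported for a cross-validated study is then the mean of the per-fold scores, so it uses every image for testing exactly once.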