
Table 1 Disease-related application of deep learning to gastroscopic image processing

From: Deep learning for gastroscopic images: computer-aided techniques for clinicians

Each entry below lists the target disease, main purpose, and reference on one line, followed by the imaging modality, DL task type, dataset information, network architecture, and results.

GC · Detection of GC · Wang et al. [17]
Modality: WLI. Task: image classification.
Dataset: 1350 images depicting (highly suspicious) cancer and 103,514 normal images; train:validation:test = 6:2:2.
Architecture: AlexNet, GoogLeNet, VGGNet.
Results: sensitivity 79.622%; specificity 78.48%; misdiagnosis rates 20.377% and 21.51% (the complements of sensitivity and specificity, respectively).
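A 6:2:2 partition like the one above is commonly produced by two stratified splits; a minimal scikit-learn sketch, using hypothetical file names and labels since the study's index is not public:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the study's image paths and binary labels
# (1 = suspicious-cancer image, 0 = normal image).
paths = [f"img_{i:06d}.png" for i in range(1000)]
labels = [1 if i < 13 else 0 for i in range(1000)]  # roughly 1:77 imbalance

# Stage 1: carve off 60% for training; stage 2: split the remaining 40%
# evenly into validation and test. Stratifying on the label keeps the
# heavy class imbalance consistent across all three partitions.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, train_size=0.6, stratify=labels, random_state=0)
val_p, test_p, val_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.5, stratify=rest_y, random_state=0)

print(len(train_p), len(val_p), len(test_p))  # 600 200 200
```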

GC · Detection of GC · Hirasawa et al. [52]
Modality: WLI, CE, NBI. Task: object detection.
Dataset: training, 13,584 endoscopic images of gastric cancer; testing, 2296 stomach images collected from 69 consecutive patients with 77 gastric cancer lesions.
Architecture: SSD.
Results: the CNN required 47 s to analyse the 2296 test images and correctly diagnosed 71 of 77 gastric cancer lesions (overall sensitivity 92.2%); 161 noncancerous lesions were detected as gastric cancer, giving a positive predictive value of 30.6%; 70 of the 71 lesions (98.6%) with a diameter of 6 mm or more, as well as all invasive cancers, were correctly detected.
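The lesion-level sensitivity and positive predictive value quoted above follow directly from the reported counts, as this worked sketch shows:

```python
# Counts reported by Hirasawa et al.: 71 of 77 cancer lesions detected,
# plus 161 noncancerous lesions flagged as cancer.
true_positives = 71
total_lesions = 77
false_positives = 161

sensitivity = true_positives / total_lesions                # 0.922
ppv = true_positives / (true_positives + false_positives)   # 0.306
print(f"sensitivity {sensitivity:.1%}, PPV {ppv:.1%}")      # 92.2%, 30.6%
```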

GC · Detection of GC · Ishioka et al. [53]
Modality: WLI, CE, NBI. Task: object detection.
Dataset: training, 13,584 endoscopic images of gastric cancer; testing, video images from 68 endoscopic submucosal dissection procedures for early gastric cancer in 62 patients.
Architecture: SSD.
Results: the CNN correctly diagnosed 64 of 68 lesions (94.1%); median time to lesion detection was 1 s (range 0-44 s) after the lesion first appeared on screen.

GC · Detection of GC · Ikenoyama et al. [55]
Modality: WLI, CE, NBI. Task: object detection.
Dataset: training, 13,584 endoscopic images from 2639 gastric cancer lesions; testing, 2940 images from 140 cases.
Architecture: SSD.
Results: the average diagnostic time for the 2940 test images was 45.5 ± 1.8 s for the CNN versus 173.0 ± 66.0 min for the endoscopists. Sensitivity, specificity, PPV, and NPV were 58.4%, 87.3%, 26.0%, and 96.5% for the CNN versus 31.9%, 97.2%, 46.2%, and 94.9% for the 67 endoscopists; the CNN's sensitivity was significantly higher (by 26.5%).

GC · Detection of GC · Luo et al. [72]
Modality: WLI. Task: semantic segmentation.
Dataset: 1,036,496 endoscopy images from 84,424 individuals; train:validation:test = 8:1:1.
Architecture: DeepLabv3+.
Results: diagnostic accuracy for identifying upper gastrointestinal cancers was 0.955 in the internal validation set, 0.927 in the prospective set, and 0.915-0.977 across the five external validation sets. Diagnostic sensitivity was similar to that of the expert endoscopist (0.942 vs. 0.945) and superior to that of competent (0.858) and trainee (0.722) endoscopists. PPV was 0.814 for the system versus 0.932, 0.974, and 0.824 for the expert, competent, and trainee endoscopists; NPV was 0.978 versus 0.980, 0.951, and 0.904, respectively.
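For orientation, a minimal inference sketch for a segmentation network of this family; note that torchvision ships plain DeepLabv3 rather than the DeepLabv3+ used in the study, so this is a stand-in, not the authors' pipeline:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in for the study's DeepLabv3+: torchvision provides DeepLabv3
# without the "+" decoder. Two classes: background vs. suspected cancer.
model = deeplabv3_resnet50(num_classes=2)
model.eval()

with torch.no_grad():
    frame = torch.rand(1, 3, 480, 480)   # one RGB endoscopy frame
    logits = model(frame)["out"]         # shape (1, 2, 480, 480)
    mask = logits.argmax(dim=1)          # per-pixel class labels
```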

GC · Diagnosis of GC · Sakai et al. [16]
Modality: WLI. Task: image classification.
Dataset: training, 9587 cancer and 9800 normal images; testing, 4653 cancer and 4997 normal images.
Architecture: GoogLeNet.
Results: accuracy 87.6%; sensitivity 80.0%; specificity 94.8%.

GC · Diagnosis of GC · Cho et al. [26]
Modality: WLI. Task: image classification.
Dataset: training, 4205 images from 1057 patients; testing, 812 images from 212 patients, plus 200 images from 200 patients collected for prospective validation.
Architecture: Inception-ResNet-v2.
Results: weighted average accuracy reached 84.6% for the five-category classification; mean AUCs for differentiating gastric cancer and neoplasm were 0.877 and 0.927, respectively. In prospective validation, the model underperformed the best endoscopist (five-category accuracy 76.4% vs. 87.6%; cancer 76.0% vs. 97.5%; neoplasm 73.5% vs. 96.5%; P < 0.001), but did not differ significantly from the worst-performing endoscopist in differentiating gastric cancer (accuracy 76.0% vs. 82.0%) or neoplasm (AUC 0.776 vs. 0.865).

GC · Diagnosis of GC · Lee et al. [27]
Modality: WLI. Task: image classification.
Dataset: training, 200 ulcer, 337 cancer, and 180 normal images; testing, 20 ulcer, 30 cancer, and 20 normal images.
Architecture: ResNet-50, VGGNet-16, Inception-v4.
Results: AUCs were 0.95, 0.97, and 0.85 for the three classifiers, with ResNet-50 performing best. Classifications involving normal images (normal vs. ulcer and normal vs. cancer) achieved accuracies above 90%, whereas ulcer vs. cancer classification reached a lower accuracy of 77.1%.
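Per-task AUCs such as these are computed from each model's predicted probabilities over the test images; a minimal sketch with hypothetical scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical binary task (e.g., normal vs. cancer): ground-truth labels
# and one classifier's predicted probability of the positive class.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.2, 0.7, 0.3])

print("AUC:", roc_auc_score(y_true, y_prob))
```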

GC · Diagnosis of GC · Li et al. [34]
Modality: ME-NBI. Task: image classification.
Dataset: training, 386 images of noncancerous lesions and 1702 images of early gastric cancer; testing, 341 endoscopic images (171 noncancerous lesions, 170 early gastric cancers).
Architecture: Inception-v3.
Results: sensitivity, specificity, and accuracy for diagnosing early gastric cancer were 91.18%, 90.64%, and 90.91%, respectively. Specificity and accuracy did not differ significantly between the CNN and the experts, but the CNN's sensitivity was significantly higher; its sensitivity, specificity, and accuracy were all significantly higher than those of the nonexperts.

GC · Diagnosis of GC · Horiuchi et al. [31]
Modality: ME-NBI. Task: image classification.
Dataset: training, 1492 EGC and 1078 gastritis images; testing, 151 EGC and 107 gastritis images.
Architecture: GoogLeNet.
Results: accuracy 85.3%; sensitivity 95.4%; specificity 71.0%; PPV 82.3%; NPV 91.7%; overall test speed 51.83 images/s (0.02 s/image).

GC · Diagnosis of GC · Horiuchi et al. [32]
Modality: ME-NBI. Task: image classification.
Dataset: training, 1492 cancerous and 1078 noncancerous ME-NBI images; testing, 174 videos (87 cancerous, 87 noncancerous). The system was compared with 11 experts skilled in diagnosing EGC with ME-NBI, each with more than 1 year of clinical experience.
Architecture: GoogLeNet.
Results: AUC 0.8684; accuracy 85.1%; sensitivity 87.4%; specificity 82.8%; PPV 83.5%; NPV 86.7%. The CAD system was significantly more accurate than two experts, significantly less accurate than one, and not significantly different from the remaining eight.

GC · Diagnosis of GC · Hu et al. [33]
Modality: ME-NBI. Task: image classification.
Dataset: 1777 ME-NBI images from 295 cases collected at 3 centres: training cohort (TC, n = 170), internal test cohort (ITC, n = 73), and external test cohort (ETC, n = 52). The model was compared with eight endoscopists of varying experience.
Architecture: VGG-19.
Results: AUC 0.808 in the ITC and 0.813 in the ETC. Predictive performance was similar to that of the senior endoscopists (accuracy 0.770 vs. 0.755; sensitivity 0.792 vs. 0.767; specificity 0.745 vs. 0.742) and better than that of the junior endoscopists (accuracy 0.770 vs. 0.728). After referring to the system's results, the endoscopists' average accuracy, sensitivity, PPV, and NPV improved significantly.

GC · Diagnosis of GC · Liu et al. [36]
Modality: ME-NBI. Task: image classification.
Dataset: 3871 ME-NBI images (1130 CGT, 1114 LGN, 1627 EGC); tenfold cross-validation.
Architecture: ResNet-50, VGG-16, Inception-v3, Inception-ResNet-v2.
Results: ResNet-50 performed best among the four networks, with an accuracy of 0.96 and F1-scores of 0.92, 0.91, and 0.99 for classifying ME-NBI images as CGT, LGN, and EGC, respectively.
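A minimal sketch of stratified tenfold evaluation with per-class F1, as used above; a simple linear classifier on toy features stands in for the study's fine-tuned CNNs:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy features/labels standing in for ME-NBI images of the three classes
# (0 = CGT, 1 = LGN, 2 = EGC); the study trained CNNs, not this model.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))
y = rng.integers(0, 3, size=300)

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append(f1_score(y[test_idx], pred, average=None))  # per-class F1

print("mean per-class F1:", np.mean(scores, axis=0))
```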

GC · Diagnosis of GC · Ueyama et al. [38]
Modality: ME-NBI. Task: image classification.
Dataset: training, 5574 ME-NBI images (3797 EGCs; 1777 noncancerous mucosa and lesions); testing, 2300 ME-NBI images (1430 EGCs; 870 noncancerous mucosa and lesions).
Architecture: ResNet-50.
Results: the AI-assisted CNN-CAD system required 60 s to analyse the 2300 test images. Accuracy 98.7%; sensitivity 98%; specificity 100%; PPV 100%; NPV 96.8%. All misdiagnosed EGC images were either of low quality or showed superficially depressed, intestinal-type intramucosal cancers that are difficult to distinguish from gastritis even for experienced endoscopists.

GC · Diagnosis of GC · Zhang et al. [39]
Modality: WLI. Task: image classification; semantic segmentation.
Dataset: training, 21,217 gastroscopic images of peptic ulcer (PU), early gastric cancer (EGC), high-grade intraepithelial neoplasia (HGIN), advanced gastric cancer (AGC), gastric submucosal tumours (SMTs), and normal gastric mucosa without lesions; testing, 1091 images. The CNN's diagnoses were compared with those of 10 endoscopists, each with over 8 years of experience in endoscopic diagnosis.
Architecture: ResNet-34, DeepLabv3.
Results: for the EGC and HGIN images, the CNN's specificity and PPV exceeded those of the endoscopists (specificity 91.2% vs. 86.7%; PPV 55.4% vs. 41.7%); its accuracy was close to the endoscopists' for the lesion-free, EGC and HGIN, PU, AGC, and SMT images. Image recognition took 42 s for the whole test set.

GC · Determining the invasion depth of GC · Zhu et al. [29]
Modality: WLI. Task: image classification.
Dataset: training, 790 images; testing, 203 images.
Architecture: ResNet-50.
Results: at a threshold value of 0.5, sensitivity was 76.47%, specificity 95.56%, AUC 0.94, overall accuracy 89.16%, PPV 89.66%, and NPV 88.97%. The CNN-CAD system achieved significantly higher accuracy (by 17.25%) and specificity (by 32.21%) than human endoscopists.
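Threshold-dependent metrics like those above derive from the 2x2 confusion matrix at the chosen cut-off; a sketch with hypothetical invasion-depth probabilities:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predicted probabilities of deeper-than-mucosa invasion.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])

y_pred = (y_prob >= 0.5).astype(int)  # the study's threshold value of 0.5
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
print(sensitivity, specificity, ppv, npv)
```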

GC · Determining the invasion depth of GC · Cho et al. [30]
Modality: WLI. Task: image classification.
Dataset: internal, 2899 images with train:validation:test = 8:1:1; external, 206 images for testing.
Architecture: DenseNet-161.
Results: the mean AUC for discriminating submucosal invasion was 0.887 in both the internal and external tests. In a clinical simulation on the external test set, 6.7% of patients who had undergone gastrectomy were accurately qualified by the algorithm for potential endoscopic resection, avoiding unnecessary surgery.

GC · Delineating the margin of GC · An et al. [73]
Modality: WLI, CE, ME-NBI. Task: semantic segmentation.
Dataset: training, 343 WLI images from 260 patients and 546 CE images from 67 patients; testing, 321 WLI images from 218 patients and 34 CE images from 14 patients.
Architecture: UNet++.
Results: at an overlap ratio threshold of 0.60 against the experts' manual labels, accuracy was 85.7% on CE images and 88.9% on WLE images. On ESD videos, the resection margins predicted by the system covered all areas of high-grade intraepithelial neoplasia and cancer. The minimum distance between the predicted margins and the histological cancer boundary was 3.44 ± 1.45 mm, outperforming the resection margin based on ME-NBI.
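The overlap-ratio criterion above scores a predicted margin as correct when its overlap with the expert annotation exceeds 0.60; a numpy sketch using intersection-over-union as the overlap measure (the paper's exact ratio definition is an assumption here):

```python
import numpy as np

def overlap_ratio(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union of two binary margin masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 0.0

# Toy 8x8 masks standing in for predicted and expert-labelled margins.
pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
truth = np.zeros((8, 8), bool); truth[3:7, 2:6] = True
correct = overlap_ratio(pred, truth) >= 0.60  # An et al.'s threshold
print(correct)
```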

GC · Detection of GC; anatomical classification · Wu et al. [18]
Modality: NBI, BLI, WLI. Task: image classification.
Dataset: training, 3170 gastric cancer and 5981 benign images for GC detection, plus 24,549 images of different parts of the stomach for blind-spot monitoring; testing, 100 gastric cancer and 100 benign images for GC detection, plus 170 images for blind-spot monitoring.
Architecture: VGG-16, ResNet-50.
Results: the DCNN identified EGC from nonmalignancy with accuracy 92.5%, sensitivity 94.0%, specificity 91.0%, PPV 91.3%, and NPV 93.8%, and classified gastric locations into 10 or 26 parts with accuracies of 90% and 65.9%, respectively.

GC · Detection of GC; determining the invasion depth of GC · Yoon et al. [19]
Modality: WLI. Task: image classification.
Dataset: 11,539 images (896 T1a-EGC, 809 T1b-EGC, 9834 non-EGC); train:validation:test = 6:2:2.
Architecture: VGG-16.
Results: AUC 0.981 for EGC detection; AUC 0.851 for depth prediction.

GC · Detection of GC; delineating the margin of GC · Shibata et al. [82]
Modality: WLI. Task: image classification; semantic segmentation.
Dataset: 1208 healthy and 533 cancer images; fivefold cross-validation.
Architecture: Mask R-CNN.
Results: for detection, sensitivity was 96.0% with 0.10 false positives (FPs) per image; for segmentation, the average Dice index was 71%.
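The Dice index used for the segmentation arm compares the predicted and expert lesion masks; a minimal numpy sketch with toy masks:

```python
import numpy as np

def dice_index(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity between two binary masks of equal shape."""
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

# Toy 4x4 masks; real inputs would be per-image lesion segmentations.
pred = np.array([[0,1,1,0],[0,1,1,0],[0,0,0,0],[0,0,0,0]], dtype=bool)
truth = np.array([[0,1,1,0],[0,1,0,0],[0,0,0,0],[0,0,0,0]], dtype=bool)
print(dice_index(pred, truth))  # 2*3 / (4+3) ~= 0.857
```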

GC · Classifying the type of GC; delineating the margin of GC · Ling et al. [35]
Modality: ME-NBI. Task: image classification.
Dataset: for CNN1 (identifying EGC differentiation status), training on 2217 images from 145 EGC patients and testing on 1870 images from 139 EGC patients, with CNN1 then compared against experts on 882 images from 58 EGC patients; for CNN2 (delineating EGC margins), training on 928 images from 132 EGC patients and testing on 742 images from 87 EGC patients.
Architecture: VGG-16 and ResNet-50 (CNN1); UNet++ (CNN2).
Results: the system predicted the differentiation status of EGCs with 83.3% accuracy on the testing dataset, and in the man-machine contest CNN1 significantly outperformed the five experts (86.2% vs. 69.7%). It delineated EGC margins with 82.7% accuracy for differentiated and 88.1% for undifferentiated EGC at an overlap ratio of 0.80, and achieved real-time diagnosis of differentiation status and margin delineation in unprocessed ME-NBI videos.

HP · Detection of HP · Itoh et al. [14]
Modality: WLI. Task: image classification.
Dataset: 179 upper gastrointestinal endoscopy images from 139 patients (65 HP-positive, 74 HP-negative); training, 149 images, expanded to 596 by data augmentation; testing, the remaining 30 images (15 from HP-negative and 15 from HP-positive patients).
Architecture: GoogLeNet.
Results: sensitivity 86.7%; specificity 86.7%; AUC 0.956.

HP · Detection of HP · Nakashima et al. [15]
Modality: WLI, BLI, LCI. Task: image classification.
Dataset: per modality group (WLI, BLI, LCI): training, the original 162 images plus 486 copies rotated by 90, 180, and 270 degrees, for a total of 648; testing, 60 images.
Architecture: GoogLeNet.
Results: AUC 0.66 for WLI, 0.96 for BLI, and 0.95 for LCI.
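The fourfold expansion described above (each image plus its 90, 180, and 270 degree rotations) is standard rotation augmentation; a minimal Pillow sketch, with a dummy frame standing in for a real endoscopic image:

```python
from PIL import Image

def rotations(img: Image.Image):
    """Yield the original image plus its 90/180/270-degree rotations."""
    yield img
    for angle in (90, 180, 270):
        yield img.rotate(angle, expand=True)

# Dummy frame in place of one endoscopic image; 162 source images
# x 4 orientations = 648 training images per modality group.
frame = Image.new("RGB", (64, 64))
augmented = list(rotations(frame))
print(len(augmented))  # 4
```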

HP · Detection of HP · Zheng et al. [20]
Modality: WLI. Task: image classification.
Dataset: training, 11,729 gastric images; testing, 3755 gastric images.
Architecture: ResNet-50.
Results: for a single gastric image, AUC was 0.93 with sensitivity 81.4%, specificity 90.1%, and accuracy 84.5% at an optimal cut-off value of 0.3; for multiple gastric images per patient, AUC was 0.97 with sensitivity 91.6%, specificity 98.6%, and accuracy 93.8% at an optimal cut-off value of 0.4.
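The per-patient figures above aggregate several image-level probabilities per patient before applying the cut-off; a sketch using the mean as the aggregation rule (the paper's exact rule is not stated here, so this is an assumption), with hypothetical patients and scores:

```python
import numpy as np

# Hypothetical image-level H. pylori probabilities grouped by patient.
patient_probs = {
    "patient_a": [0.82, 0.71, 0.90],
    "patient_b": [0.12, 0.30, 0.25, 0.18],
}

CUTOFF = 0.4  # the study's optimal per-patient cut-off value
for pid, probs in patient_probs.items():
    score = float(np.mean(probs))  # assumed aggregation: mean over images
    label = "positive" if score >= CUTOFF else "negative"
    print(pid, label, round(score, 2))
```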

HP · Diagnosis of HP · Nakashima et al. [37]
Modality: LCI, WLI. Task: image classification.
Dataset: training, 6639 WLI and 6248 LCI images from 395 subjects; testing, videos of 120 subjects.
Architecture: a 22-layer skip-connection architecture.
Results: the LCI-CAD system achieved accuracies of 84.2% for uninfected, 82.5% for currently infected, and 79.2% for post-eradication status, versus 75.0%, 77.5%, and 74.2% for the WLI-CAD system. The LCI-CAD system's diagnostic accuracy was significantly superior to that of the WLI-CAD system and comparable to that of experienced endoscopists.

HP · Diagnosis of HP · Shichijo et al. [28]
Modality: WLI. Task: object detection.
Dataset: training, 98,564 endoscopic images from 5236 patients (742 H. pylori-positive, 3649 H. pylori-negative, 845 H. pylori-eradicated); testing, 23,699 images from 847 patients (70 positive, 493 negative, 284 eradicated).
Architecture: GoogLeNet.
Results: 80% (465/582) of negative diagnoses, 84% (147/174) of eradicated diagnoses, and 48% (44/91) of positive diagnoses were accurate; diagnosing all 23,699 images took 261 s.

GP · Detection of GP · Zhang et al. [54]
Modality: WLI. Task: image classification.
Dataset: training, 708 images; testing, 50 images.
Architecture: SSD.
Results: the model achieves real-time polyp detection at 50 frames per second (FPS) with a mean average precision (mAP) of 90.4%, and improves polyp detection recall by over 10%, especially for small polyps.

GIM · Diagnosis of GIM · Yan et al. [22]
Modality: NBI, ME-NBI. Task: image classification.
Dataset: training, 1880 endoscopic images (1048 GIM, 832 non-GIM) from 336 patients; testing, 477 pathologically confirmed images (242 GIM, 235 non-GIM) from 80 patients.
Architecture: EfficientNet-B4.
Results: AUC 0.928; sensitivity 91.9%; specificity 86.0%; accuracy 88.8%.

CAG · Diagnosis of CAG · Zhang et al. [21]
Modality: white-light i-Scan. Task: image classification.
Dataset: 5470 images of the gastric antrum from 1699 patients (3042 depicting atrophic gastritis, 2428 not); fivefold cross-validation. The deep learning model's diagnoses were compared with those of three experts.
Architecture: DenseNet-121.
Results: accuracy 0.942; sensitivity 0.945; specificity 0.940. Detection rates for mild, moderate, and severe atrophic gastritis were 93%, 95%, and 99%, respectively; the CNN's diagnostic performance exceeded that of the experts.

Abbreviations: WLI, white-light imaging; CE, chromoendoscopy; NBI, narrow-band imaging; GC, gastric cancer; SSD, single-shot multibox detector; CNN, convolutional neural network; HP, Helicobacter pylori; AUC, area under the curve; BLI, blue-light imaging; LCI, linked colour imaging; DCNN, deep convolutional neural network; EGC, early gastric cancer; FPS, frames per second; mAP, mean average precision; GP, gastric polyp; CAD, computer-aided diagnosis; WLE, white-light endoscopy; ESD, endoscopic submucosal dissection; ME, magnifying endoscopy; PPV, positive predictive value; NPV, negative predictive value; CGT, chronic gastritis; LGN, low-grade neoplasia; AI, artificial intelligence; GIM, gastric intestinal metaplasia; PU, peptic ulcer; HGIN, high-grade intraepithelial neoplasia; AGC, advanced gastric cancer; SMTs, submucosal tumours; CAG, chronic atrophic gastritis