
Table 1 Disease-related application of deep learning to gastroscopic image processing

From: Deep learning for gastroscopic images: computer-aided techniques for clinicians

Each entry below lists the target disease, main purpose, and reference on one line, followed by the imaging modality, DL task type, dataset information, network architecture, and results.

GC · Detection of GC · Wang et al. [17]
Modality: WLI. Task: image classification.
Dataset: 1350 images depicting (highly suspicious) cancer and 103,514 normal images; train:validation:test = 6:2:2.
Architecture: AlexNet, GoogLeNet, VGGNet.
Results: sensitivity 79.622%; specificity 78.48%; misdiagnosis rates 20.377% and 21.51% (the complements of sensitivity and specificity, respectively).
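A 6:2:2 partition like the one above is commonly produced by two stratified splits; a minimal scikit-learn sketch, using hypothetical file names and labels since the study's index is not public:

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the study's image paths and binary labels
# (1 = suspicious-cancer image, 0 = normal image).
paths = [f"img_{i:06d}.png" for i in range(1000)]
labels = [1 if i < 13 else 0 for i in range(1000)]  # roughly 1:77 imbalance

# Stage 1: carve off 60% for training; stage 2: split the remaining 40%
# evenly into validation and test. Stratifying on the label keeps the
# heavy class imbalance consistent across all three partitions.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, train_size=0.6, stratify=labels, random_state=0)
val_p, test_p, val_y, test_y = train_test_split(
    rest_p, rest_y, test_size=0.5, stratify=rest_y, random_state=0)

print(len(train_p), len(val_p), len(test_p))  # 600 200 200
```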

GC · Detection of GC · Hirasawa et al. [52]
Modality: WLI, CE, NBI. Task: object detection.
Dataset: training, 13,584 endoscopic images of gastric cancer; testing, 2296 stomach images collected from 69 consecutive patients with 77 gastric cancer lesions.
Architecture: SSD.
Results: the CNN required 47 s to analyse the 2296 test images and correctly diagnosed 71 of 77 gastric cancer lesions (overall sensitivity 92.2%); 161 noncancerous lesions were detected as gastric cancer, giving a positive predictive value of 30.6%; 70 of the 71 lesions (98.6%) with a diameter of 6 mm or more, as well as all invasive cancers, were correctly detected.
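The lesion-level sensitivity and positive predictive value quoted above follow directly from the reported counts, as this worked sketch shows:

```python
# Counts reported by Hirasawa et al.: 71 of 77 cancer lesions detected,
# plus 161 noncancerous lesions flagged as cancer.
true_positives = 71
total_lesions = 77
false_positives = 161

sensitivity = true_positives / total_lesions                # 0.922
ppv = true_positives / (true_positives + false_positives)   # 0.306
print(f"sensitivity {sensitivity:.1%}, PPV {ppv:.1%}")      # 92.2%, 30.6%
```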

GC · Detection of GC · Ishioka et al. [53]
Modality: WLI, CE, NBI. Task: object detection.
Dataset: training, 13,584 endoscopic images of gastric cancer; testing, video images from 68 endoscopic submucosal dissection procedures for early gastric cancer in 62 patients.
Architecture: SSD.
Results: the CNN correctly diagnosed 64 of 68 lesions (94.1%); median time to lesion detection was 1 s (range 0-44 s) after the lesion first appeared on screen.

GC · Detection of GC · Ikenoyama et al. [55]
Modality: WLI, CE, NBI. Task: object detection.
Dataset: training, 13,584 endoscopic images from 2639 gastric cancer lesions; testing, 2940 images from 140 cases.
Architecture: SSD.
Results: the average diagnostic time for the 2940 test images was 45.5 ± 1.8 s for the CNN versus 173.0 ± 66.0 min for the endoscopists. Sensitivity, specificity, PPV, and NPV were 58.4%, 87.3%, 26.0%, and 96.5% for the CNN versus 31.9%, 97.2%, 46.2%, and 94.9% for the 67 endoscopists; the CNN's sensitivity was significantly higher (by 26.5%).

GC · Detection of GC · Luo et al. [72]
Modality: WLI. Task: semantic segmentation.
Dataset: 1,036,496 endoscopy images from 84,424 individuals; train:validation:test = 8:1:1.
Architecture: DeepLabv3+.
Results: diagnostic accuracy for identifying upper gastrointestinal cancers was 0.955 in the internal validation set, 0.927 in the prospective set, and 0.915-0.977 across the five external validation sets. Diagnostic sensitivity was similar to that of the expert endoscopist (0.942 vs. 0.945) and superior to that of competent (0.858) and trainee (0.722) endoscopists. PPV was 0.814 for the system versus 0.932, 0.974, and 0.824 for the expert, competent, and trainee endoscopists; NPV was 0.978 versus 0.980, 0.951, and 0.904, respectively.
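For orientation, a minimal inference sketch for a segmentation network of this family; note that torchvision ships plain DeepLabv3 rather than the DeepLabv3+ used in the study, so this is a stand-in, not the authors' pipeline:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in for the study's DeepLabv3+: torchvision provides DeepLabv3
# without the "+" decoder. Two classes: background vs. suspected cancer.
model = deeplabv3_resnet50(num_classes=2)
model.eval()

with torch.no_grad():
    frame = torch.rand(1, 3, 480, 480)   # one RGB endoscopy frame
    logits = model(frame)["out"]         # shape (1, 2, 480, 480)
    mask = logits.argmax(dim=1)          # per-pixel class labels
```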

GC · Diagnosis of GC · Sakai et al. [16]
Modality: WLI. Task: image classification.
Dataset: training, 9587 cancer and 9800 normal images; testing, 4653 cancer and 4997 normal images.
Architecture: GoogLeNet.
Results: accuracy 87.6%; sensitivity 80.0%; specificity 94.8%.

GC · Diagnosis of GC · Cho et al. [26]
Modality: WLI. Task: image classification.
Dataset: training, 4205 images from 1057 patients; testing, 812 images from 212 patients, plus 200 images from 200 patients collected for prospective validation.
Architecture: Inception-ResNet-v2.
Results: weighted average accuracy reached 84.6% for the five-category classification; mean AUCs for differentiating gastric cancer and neoplasm were 0.877 and 0.927, respectively. In prospective validation, the model underperformed the best endoscopist (five-category accuracy 76.4% vs. 87.6%; cancer 76.0% vs. 97.5%; neoplasm 73.5% vs. 96.5%; P < 0.001), but did not differ significantly from the worst-performing endoscopist in differentiating gastric cancer (accuracy 76.0% vs. 82.0%) or neoplasm (AUC 0.776 vs. 0.865).

GC · Diagnosis of GC · Lee et al. [27]
Modality: WLI. Task: image classification.
Dataset: training, 200 ulcer, 337 cancer, and 180 normal images; testing, 20 ulcer, 30 cancer, and 20 normal images.
Architecture: ResNet-50, VGGNet-16, Inception-v4.
Results: AUCs were 0.95, 0.97, and 0.85 for the three classifiers, with ResNet-50 performing best. Classifications involving normal images (normal vs. ulcer and normal vs. cancer) achieved accuracies above 90%, whereas ulcer vs. cancer classification reached a lower accuracy of 77.1%.
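Per-task AUCs such as these are computed from each model's predicted probabilities over the test images; a minimal sketch with hypothetical scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical binary task (e.g., normal vs. cancer): ground-truth labels
# and one classifier's predicted probability of the positive class.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.2, 0.7, 0.3])

print("AUC:", roc_auc_score(y_true, y_prob))
```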

GC · Diagnosis of GC · Li et al. [34]
Modality: ME-NBI. Task: image classification.
Dataset: training, 386 images of noncancerous lesions and 1702 images of early gastric cancer; testing, 341 endoscopic images (171 noncancerous lesions, 170 early gastric cancers).
Architecture: Inception-v3.
Results: sensitivity, specificity, and accuracy for diagnosing early gastric cancer were 91.18%, 90.64%, and 90.91%, respectively. Specificity and accuracy did not differ significantly between the CNN and the experts, but the CNN's sensitivity was significantly higher; its sensitivity, specificity, and accuracy were all significantly higher than those of the nonexperts.

GC · Diagnosis of GC · Horiuchi et al. [31]
Modality: ME-NBI. Task: image classification.
Dataset: training, 1492 EGC and 1078 gastritis images; testing, 151 EGC and 107 gastritis images.
Architecture: GoogLeNet.
Results: accuracy 85.3%; sensitivity 95.4%; specificity 71.0%; PPV 82.3%; NPV 91.7%; overall test speed 51.83 images/s (0.02 s/image).

GC · Diagnosis of GC · Horiuchi et al. [32]
Modality: ME-NBI. Task: image classification.
Dataset: training, 1492 cancerous and 1078 noncancerous ME-NBI images; testing, 174 videos (87 cancerous, 87 noncancerous). The system was compared with 11 experts skilled in diagnosing EGC with ME-NBI, each with more than 1 year of clinical experience.
Architecture: GoogLeNet.
Results: AUC 0.8684; accuracy 85.1%; sensitivity 87.4%; specificity 82.8%; PPV 83.5%; NPV 86.7%. The CAD system was significantly more accurate than two experts, significantly less accurate than one, and not significantly different from the remaining eight.

GC · Diagnosis of GC · Hu et al. [33]
Modality: ME-NBI. Task: image classification.
Dataset: 1777 ME-NBI images from 295 cases collected at 3 centres: training cohort (TC, n = 170), internal test cohort (ITC, n = 73), and external test cohort (ETC, n = 52). The model was compared with eight endoscopists of varying experience.
Architecture: VGG-19.
Results: AUC 0.808 in the ITC and 0.813 in the ETC. Predictive performance was similar to that of the senior endoscopists (accuracy 0.770 vs. 0.755; sensitivity 0.792 vs. 0.767; specificity 0.745 vs. 0.742) and better than that of the junior endoscopists (accuracy 0.770 vs. 0.728). After referring to the system's results, the endoscopists' average accuracy, sensitivity, PPV, and NPV improved significantly.

GC · Diagnosis of GC · Liu et al. [36]
Modality: ME-NBI. Task: image classification.
Dataset: 3871 ME-NBI images (1130 CGT, 1114 LGN, 1627 EGC); tenfold cross-validation.
Architecture: ResNet-50, VGG-16, Inception-v3, Inception-ResNet-v2.
Results: ResNet-50 performed best among the four networks, with an accuracy of 0.96 and F1-scores of 0.92, 0.91, and 0.99 for classifying ME-NBI images as CGT, LGN, and EGC, respectively.
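A minimal sketch of stratified tenfold evaluation with per-class F1, as used above; a simple linear classifier on toy features stands in for the study's fine-tuned CNNs:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy features/labels standing in for ME-NBI images of the three classes
# (0 = CGT, 1 = LGN, 2 = EGC); the study trained CNNs, not this model.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))
y = rng.integers(0, 3, size=300)

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append(f1_score(y[test_idx], pred, average=None))  # per-class F1

print("mean per-class F1:", np.mean(scores, axis=0))
```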

GC · Diagnosis of GC · Ueyama et al. [38]
Modality: ME-NBI. Task: image classification.
Dataset: training, 5574 ME-NBI images (3797 EGCs; 1777 noncancerous mucosa and lesions); testing, 2300 ME-NBI images (1430 EGCs; 870 noncancerous mucosa and lesions).
Architecture: ResNet-50.
Results: the AI-assisted CNN-CAD system required 60 s to analyse the 2300 test images. Accuracy 98.7%; sensitivity 98%; specificity 100%; PPV 100%; NPV 96.8%. All misdiagnosed EGC images were either of low quality or showed superficially depressed, intestinal-type intramucosal cancers that are difficult to distinguish from gastritis even for experienced endoscopists.

GC · Diagnosis of GC · Zhang et al. [39]
Modality: WLI. Task: image classification; semantic segmentation.
Dataset: training, 21,217 gastroscopic images of peptic ulcer (PU), early gastric cancer (EGC), high-grade intraepithelial neoplasia (HGIN), advanced gastric cancer (AGC), gastric submucosal tumours (SMTs), and normal gastric mucosa without lesions; testing, 1091 images. The CNN's diagnoses were compared with those of 10 endoscopists, each with over 8 years of experience in endoscopic diagnosis.
Architecture: ResNet-34, DeepLabv3.
Results: for the EGC and HGIN images, the CNN's specificity and PPV exceeded those of the endoscopists (specificity 91.2% vs. 86.7%; PPV 55.4% vs. 41.7%); its accuracy was close to the endoscopists' for the lesion-free, EGC and HGIN, PU, AGC, and SMT images. Image recognition took 42 s for the whole test set.

GC · Determining the invasion depth of GC · Zhu et al. [29]
Modality: WLI. Task: image classification.
Dataset: training, 790 images; testing, 203 images.
Architecture: ResNet-50.
Results: at a threshold value of 0.5, sensitivity was 76.47%, specificity 95.56%, AUC 0.94, overall accuracy 89.16%, PPV 89.66%, and NPV 88.97%. The CNN-CAD system achieved significantly higher accuracy (by 17.25%) and specificity (by 32.21%) than human endoscopists.
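Threshold-dependent metrics like those above derive from the 2x2 confusion matrix at the chosen cut-off; a sketch with hypothetical invasion-depth probabilities:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predicted probabilities of deeper-than-mucosa invasion.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])

y_pred = (y_prob >= 0.5).astype(int)  # the study's threshold value of 0.5
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
print(sensitivity, specificity, ppv, npv)
```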

GC · Determining the invasion depth of GC · Cho et al. [30]
Modality: WLI. Task: image classification.
Dataset: internal, 2899 images with train:validation:test = 8:1:1; external, 206 images for testing.
Architecture: DenseNet-161.
Results: the mean AUC for discriminating submucosal invasion was 0.887 in both the internal and external tests. In a clinical simulation on the external test set, 6.7% of patients who had undergone gastrectomy were accurately qualified by the algorithm for potential endoscopic resection, avoiding unnecessary surgery.

GC · Delineating the margin of GC · An et al. [73]
Modality: WLI, CE, ME-NBI. Task: semantic segmentation.
Dataset: training, 343 WLI images from 260 patients and 546 CE images from 67 patients; testing, 321 WLI images from 218 patients and 34 CE images from 14 patients.
Architecture: UNet++.
Results: at an overlap ratio threshold of 0.60 against the experts' manual labels, accuracy was 85.7% on CE images and 88.9% on WLE images. On ESD videos, the resection margins predicted by the system covered all areas of high-grade intraepithelial neoplasia and cancer. The minimum distance between the predicted margins and the histological cancer boundary was 3.44 ± 1.45 mm, outperforming the resection margin based on ME-NBI.
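The overlap-ratio criterion above scores a predicted margin as correct when its overlap with the expert annotation exceeds 0.60; a numpy sketch using intersection-over-union as the overlap measure (the paper's exact ratio definition is an assumption here):

```python
import numpy as np

def overlap_ratio(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union of two binary margin masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union if union else 0.0

# Toy 8x8 masks standing in for predicted and expert-labelled margins.
pred = np.zeros((8, 8), bool); pred[2:6, 2:6] = True
truth = np.zeros((8, 8), bool); truth[3:7, 2:6] = True
correct = overlap_ratio(pred, truth) >= 0.60  # An et al.'s threshold
print(correct)
```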

GC · Detection of GC; anatomical classification · Wu et al. [18]
Modality: NBI, BLI, WLI. Task: image classification.
Dataset: training, 3170 gastric cancer and 5981 benign images for GC detection, plus 24,549 images of different parts of the stomach for blind-spot monitoring; testing, 100 gastric cancer and 100 benign images for GC detection, plus 170 images for blind-spot monitoring.
Architecture: VGG-16, ResNet-50.
Results: the DCNN identified EGC from nonmalignancy with accuracy 92.5%, sensitivity 94.0%, specificity 91.0%, PPV 91.3%, and NPV 93.8%, and classified gastric locations into 10 or 26 parts with accuracies of 90% and 65.9%, respectively.

GC · Detection of GC; determining the invasion depth of GC · Yoon et al. [19]
Modality: WLI. Task: image classification.
Dataset: 11,539 images (896 T1a-EGC, 809 T1b-EGC, 9834 non-EGC); train:validation:test = 6:2:2.
Architecture: VGG-16.
Results: AUC 0.981 for EGC detection; AUC 0.851 for depth prediction.

GC · Detection of GC; delineating the margin of GC · Shibata et al. [82]
Modality: WLI. Task: image classification; semantic segmentation.
Dataset: 1208 healthy and 533 cancer images; fivefold cross-validation.
Architecture: Mask R-CNN.
Results: for detection, sensitivity was 96.0% with 0.10 false positives (FPs) per image; for segmentation, the average Dice index was 71%.
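The Dice index used for the segmentation arm compares the predicted and expert lesion masks; a minimal numpy sketch with toy masks:

```python
import numpy as np

def dice_index(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity between two binary masks of equal shape."""
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

# Toy 4x4 masks; real inputs would be per-image lesion segmentations.
pred = np.array([[0,1,1,0],[0,1,1,0],[0,0,0,0],[0,0,0,0]], dtype=bool)
truth = np.array([[0,1,1,0],[0,1,0,0],[0,0,0,0],[0,0,0,0]], dtype=bool)
print(dice_index(pred, truth))  # 2*3 / (4+3) ~= 0.857
```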

GC · Classifying the type of GC; delineating the margin of GC · Ling et al. [35]
Modality: ME-NBI. Task: image classification.
Dataset: for CNN1 (identifying EGC differentiation status), training on 2217 images from 145 EGC patients and testing on 1870 images from 139 EGC patients, with CNN1 then compared against experts on 882 images from 58 EGC patients; for CNN2 (delineating EGC margins), training on 928 images from 132 EGC patients and testing on 742 images from 87 EGC patients.
Architecture: VGG-16 and ResNet-50 (CNN1); UNet++ (CNN2).
Results: the system predicted the differentiation status of EGCs with 83.3% accuracy on the testing dataset, and in the man-machine contest CNN1 significantly outperformed the five experts (86.2% vs. 69.7%). It delineated EGC margins with 82.7% accuracy for differentiated and 88.1% for undifferentiated EGC at an overlap ratio of 0.80, and achieved real-time diagnosis of differentiation status and margin delineation in unprocessed ME-NBI videos.

HP · Detection of HP · Itoh et al. [14]
Modality: WLI. Task: image classification.
Dataset: 179 upper gastrointestinal endoscopy images from 139 patients (65 HP-positive, 74 HP-negative); training, 149 images, expanded to 596 by data augmentation; testing, the remaining 30 images (15 from HP-negative and 15 from HP-positive patients).
Architecture: GoogLeNet.
Results: sensitivity 86.7%; specificity 86.7%; AUC 0.956.

HP · Detection of HP · Nakashima et al. [15]
Modality: WLI, BLI, LCI. Task: image classification.
Dataset: per modality group (WLI, BLI, LCI): training, the original 162 images plus 486 copies rotated by 90, 180, and 270 degrees, for a total of 648; testing, 60 images.
Architecture: GoogLeNet.
Results: AUC 0.66 for WLI, 0.96 for BLI, and 0.95 for LCI.
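The fourfold expansion described above (each image plus its 90, 180, and 270 degree rotations) is standard rotation augmentation; a minimal Pillow sketch, with a dummy frame standing in for a real endoscopic image:

```python
from PIL import Image

def rotations(img: Image.Image):
    """Yield the original image plus its 90/180/270-degree rotations."""
    yield img
    for angle in (90, 180, 270):
        yield img.rotate(angle, expand=True)

# Dummy frame in place of one endoscopic image; 162 source images
# x 4 orientations = 648 training images per modality group.
frame = Image.new("RGB", (64, 64))
augmented = list(rotations(frame))
print(len(augmented))  # 4
```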

HP · Detection of HP · Zheng et al. [20]
Modality: WLI. Task: image classification.
Dataset: training, 11,729 gastric images; testing, 3755 gastric images.
Architecture: ResNet-50.
Results: for a single gastric image, AUC was 0.93 with sensitivity 81.4%, specificity 90.1%, and accuracy 84.5% at an optimal cut-off value of 0.3; for multiple gastric images per patient, AUC was 0.97 with sensitivity 91.6%, specificity 98.6%, and accuracy 93.8% at an optimal cut-off value of 0.4.
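The per-patient figures above aggregate several image-level probabilities per patient before applying the cut-off; a sketch using the mean as the aggregation rule (the paper's exact rule is not stated here, so this is an assumption), with hypothetical patients and scores:

```python
import numpy as np

# Hypothetical image-level H. pylori probabilities grouped by patient.
patient_probs = {
    "patient_a": [0.82, 0.71, 0.90],
    "patient_b": [0.12, 0.30, 0.25, 0.18],
}

CUTOFF = 0.4  # the study's optimal per-patient cut-off value
for pid, probs in patient_probs.items():
    score = float(np.mean(probs))  # assumed aggregation: mean over images
    label = "positive" if score >= CUTOFF else "negative"
    print(pid, label, round(score, 2))
```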

HP · Diagnosis of HP · Nakashima et al. [37]
Modality: LCI, WLI. Task: image classification.
Dataset: training, 6639 WLI and 6248 LCI images from 395 subjects; testing, videos of 120 subjects.
Architecture: a 22-layer skip-connection architecture.
Results: the LCI-CAD system achieved accuracies of 84.2% for uninfected, 82.5% for currently infected, and 79.2% for post-eradication status, versus 75.0%, 77.5%, and 74.2% for the WLI-CAD system. The LCI-CAD system's diagnostic accuracy was significantly superior to that of the WLI-CAD system and comparable to that of experienced endoscopists.

HP · Diagnosis of HP · Shichijo et al. [28]
Modality: WLI. Task: object detection.
Dataset: training, 98,564 endoscopic images from 5236 patients (742 H. pylori-positive, 3649 H. pylori-negative, 845 H. pylori-eradicated); testing, 23,699 images from 847 patients (70 positive, 493 negative, 284 eradicated).
Architecture: GoogLeNet.
Results: 80% (465/582) of negative diagnoses, 84% (147/174) of eradicated diagnoses, and 48% (44/91) of positive diagnoses were accurate; diagnosing all 23,699 images took 261 s.

GP · Detection of GP · Zhang et al. [54]
Modality: WLI. Task: image classification.
Dataset: training, 708 images; testing, 50 images.
Architecture: SSD.
Results: the model achieves real-time polyp detection at 50 frames per second (FPS) with a mean average precision (mAP) of 90.4%, and improves polyp detection recall by over 10%, especially for small polyps.

GIM · Diagnosis of GIM · Yan et al. [22]
Modality: NBI, ME-NBI. Task: image classification.
Dataset: training, 1880 endoscopic images (1048 GIM, 832 non-GIM) from 336 patients; testing, 477 pathologically confirmed images (242 GIM, 235 non-GIM) from 80 patients.
Architecture: EfficientNet-B4.
Results: AUC 0.928; sensitivity 91.9%; specificity 86.0%; accuracy 88.8%.

CAG · Diagnosis of CAG · Zhang et al. [21]
Modality: white-light i-Scan. Task: image classification.
Dataset: 5470 images of the gastric antrum from 1699 patients (3042 depicting atrophic gastritis, 2428 not); fivefold cross-validation. The deep learning model's diagnoses were compared with those of three experts.
Architecture: DenseNet-121.
Results: accuracy 0.942; sensitivity 0.945; specificity 0.940. Detection rates for mild, moderate, and severe atrophic gastritis were 93%, 95%, and 99%, respectively; the CNN's diagnostic performance exceeded that of the experts.

Abbreviations: WLI, white-light imaging; CE, chromoendoscopy; NBI, narrow-band imaging; GC, gastric cancer; SSD, single-shot multibox detector; CNN, convolutional neural network; HP, Helicobacter pylori; AUC, area under the curve; BLI, blue-light imaging; LCI, linked colour imaging; DCNN, deep convolutional neural network; EGC, early gastric cancer; FPS, frames per second; mAP, mean average precision; GP, gastric polyp; CAD, computer-aided diagnosis; WLE, white-light endoscopy; ESD, endoscopic submucosal dissection; ME, magnifying endoscopy; PPV, positive predictive value; NPV, negative predictive value; CGT, chronic gastritis; LGN, low-grade neoplasia; AI, artificial intelligence; GIM, gastric intestinal metaplasia; PU, peptic ulcer; HGIN, high-grade intraepithelial neoplasia; AGC, advanced gastric cancer; SMTs, submucosal tumours; CAG, chronic atrophic gastritis