Skip to main content

Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study

Abstract

Background

This systematic review and meta-analysis were conducted to objectively evaluate the evidence of machine learning (ML) in the patient diagnosis of Intracranial Hemorrhage (ICH) on computed tomography (CT) scans.

Methods

Until May 2023, systematic searches were conducted in ISI Web of Science, PubMed, Scopus, Cochrane Library, IEEE Xplore Digital Library, CINAHL, Science Direct, PROSPERO, and EMBASE for studies that evaluated the diagnostic precision of ML model-assisted ICH detection. Patients with and without ICH as the target condition who were receiving CT-Scan were eligible for the research, which used ML algorithms based on radiologists' reports as the gold reference standard. For meta-analysis, pooled sensitivities, specificities, and a summary receiver operating characteristics curve (SROC) were used.

Results

At last, after screening the title, abstract, and full paper, twenty-six retrospective and three prospective, and two retrospective/prospective studies were included. The overall (Diagnostic Test Accuracy) DTA of retrospective studies with a pooled sensitivity was 0.917 (95% CI 0.88–0.943, I2 = 99%). The pooled specificity was 0.945 (95% CI 0.918–0.964, I2 = 100%). The pooled diagnostic odds ratio (DOR) was 219.47 (95% CI 104.78–459.66, I2 = 100%). These results were significant for the specificity of the different network architecture models (p-value = 0.0289). However, the results for sensitivity (p-value = 0.6417) and DOR (p-value = 0.2187) were not significant. The ResNet algorithm has higher pooled specificity than other algorithms with 0.935 (95% CI 0.854–0.973, I2 = 93%).

Conclusion

This meta-analysis on DTA of ML algorithms for detecting ICH by assessing non-contrast CT-Scans shows the ML has an acceptable performance in diagnosing ICH. Using ResNet in ICH detection remains promising prediction was improved via training in an Architecture Learning Network (ALN).

Background

A potentially fatal disorder known as intracranial hemorrhage (ICH) occurs in 25 per 100,000 yearly, which is related to 2 million strokes globally and has an estimated incidence [1]. There is a variety of fundamental (80–85%) and secondary (15–20%) underlying causes of ICH [2]. The most frequent non-traumatic secondary causes include brain tumors, ischemic strokes, and vascular malformations. Hospital admissions for ICH have grown during the past ten years, primarily because of the elderly population, insufficient blood pressure (BP), and increased use of blood thinners management [3, 4]. In such a way that, rational decrease of BP is an important factor to manage these patients, specifictly for lower than 15 mL ICH volume [5, 6]. The revascularization in the acute phase of strokes can improve the symptoms and better prognosis of these patients [7]. The tissue plasminogen activator (tPA) is the main treatment for ischemic stroke. Moreover, the clot in the blood vessel can be removed by thrombectomy technique that catheter intervent upper of femur; then, using angioplasty blocked artery can be opened up [8, 9].

Neuroimaging is, therefore, essential for the diagnosis of acute ICH because perchance challenging to differentiate it from other diseases, such as ischemic stroke [10]. The successful procedure of a non-contrast computed tomography (CT) for the cerebrum, an accessible and quick technique for diagnosing ICH, are crucial component of the ICH diagnostic process. Fundamental ICH features such as location, edema, ventricular system expansion, and midline shift are morphologically revealed by a CT-Scan [11]. However, more significant CT-Scan usage could delay the identification of ICH, and a growing burden in radiology departments could lead to job-related stress and burnout. In contrast, it has been discovered that artificial intelligence (AI) can improve radiology practice by lowering the amount of effort required [12,13,14].

Today, the efficiency of machine learning (ML) algorithms, especially improving deep learning (DL) algorithms for computer vision, has advanced significantly. The CT-Scan, one of the most well-known imaging modalities, and has seen considerable breakthroughs in ML and its application [15, 16]. Support vector machine (SVM), Convolutional neural network (CNN), random forest (RF), and conditional random field (CRF) are the most prominent ML algorithms for recognizing brain bleeding from visual data. Even though a great deal of work has already been accomplished in this field, there is still room for growth. Additional research is required to improve the accuracy, precision, and resilience of ML-based brain segmentation [17, 18]. A meta-analysis reported DTA of AI for the detection of ICH; however, this study did not report subgroups for distinguishing between Algorithms and also types of ICH [19]. Therefore, this systematic review and meta-analysis were conducted to objectively evaluate the evidence of ML in the patient diagnosis of ICH on CT scans.

Results

Study selection & characteristics

Following the primary search, 1,405 studies were recognized after removing duplicated studies. At last, after screening the title, abstract, and full paper, twenty-six retrospective and three prospective, and two retrospective/prospective studies were included [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]; then, twenty-nine studies were included in the final quantitative analysis, and the other studies were excluded because no diagnostic accuracy was reported (Fig. 1) [20,21,22,23,24,25,26, 28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46, 48,49,50,51,52]. The machine learning networks were classified into, Support vector machine (SVM), Random Forest (RF), k-nearest neighbors’ algorithm (k-NN), VGG-16, Logistic Regression (LR), ResNet-18, AlexNet, DenseNet-121, eXtreme Gradient Boosting (XGBoost), Decision Tree (DT), and Deep Learning (DL) included Convolutional Neural Network (CNN); ResNet34, ResNet50, ResNet18, ResNet-v2, GoogleNet (Table 1).

Fig. 1
figure 1

Study Flow Diagram showing how to extract articles

Table 1 Summary of findings for all studies included in the qualitative synthesis

Risk of bias

The validity and the possibility of bias for the included studies were evaluated with the QUADAS-2 (Fig. 2). One high-risk bias was reported in all the included studies [20]. When the publication bias is very low, the points will be symmetrically distributed around the true effect of an inverted funnel, as shown in Fig. 3.

Fig. 2
figure 2figure 2

A. Risk of bias and applicability concerns graph; review authors' judgments about each domain presented as percentages across included studies. B. Risk of bias and applicability concerns summary; review authors' judgments about each domain for each included study

Fig. 3
figure 3

Funnel plot showing the low likelihood of publication bias in all included studies

Diagnostic test accuracy (DTA) of all included studies

Retrospective studies

The overall DTA of the 26 retrospective studies and 904,755 scans was estimated using a univariate meta-analysis with a pooled sensitivity was 0.917 (95% CI 0.88 to 0.943, I2 = 99%) (Fig. 4) [20,21,22,23,24,25,26, 28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43, 45, 46, 48,49,50]. The pooled specificity was 0.945 (95% CI 0.918 to 0.964, I2 = 100%) (Fig. 5). The pooled diagnostic odds ratios (DOR) was 219.47 (95% CI 104.78 to 459.66, I2 = 100%) (Additional file 1: Figure S1). The LR+ ranges from 12.639 to 20.784 with pooled mean of 16.208 (Table 2), and LR ranges from 0.072 to 0.123 with pooled mean of 0.094. The AUC of 0.971 was reported for the SROC via the bivariate model (Fig. 6). The overall accuracy was 90.3 (ranges from 87.24 to 93.01), the precision was 76.24 (ranges from 66.71 to 86.32), and the F1-score was 79.14 (ranges from 70.9 to 86.48) (Table 2).

Fig. 4
figure 4

Univariate sub-group analysis of sensitivity with random model based on retrospective studies

Fig. 5
figure 5

Univariate sub-group analysis of specificity with random model based on retrospective studies

Table 2 DTA estimated from all the studies included in the meta-analysis using (2 \(\times\) 2) confusion table
Fig. 6
figure 6

The SROC of the bivariate for DTA based on retrospective studies

Prospective studies

The overall DTA of the five prospective studies and 104,397 scans was estimated using a univariate meta-analysis with a pooled sensitivity was 0.886 (95% CI 0.613–0.975, I2 = 100%) (Fig. 7) [24, 29, 33, 40, 44]. The pooled specificity was 0.967 (95% CI 0.937–0.983, I2 = 100%) (Fig. 8). The pooled DOR was 227.71 (95% CI 27.82–1863.51, I2 = 100%) (Additional file 1: Figure S2). The LR+ ranges from 6.054 to 87.029 with pooled mean of 22.953 (Table 2), and LR ranges from 0.005 to 1.932 with pooled mean of 0.101. The AUC of 0.98 was reported for the SROC via the bivariate model (Fig. 9).

Fig. 7
figure 7

Univariate sub-group analysis of sensitivity with random model based on prospective studies

Fig. 8
figure 8

Univariate sub-group analysis of specificity with random model based on prospective studies

Fig. 9
figure 9

The SROC of the bivariate for DTA based on prospective studies

The overall accuracy was 93.69 (ranges from 90.31 to 97.2), the precision was 75.58 (ranges from 55.23 to 91.18), and the F1-score was 77.26 (ranges from 56.23 to 91.32) (Table 2).

DTA Based on network architecture

The Network Architecture analysis was divided into ResNet, RF, and SVM [20,21,22,23,24,25,26, 28, 30,31,32,33,34,35,36,37,38,39, 41,42,43, 45, 46, 48,49,50]. These results were significant for the specificity of the different network architecture models (p-value = 0.0289). However, the results for sensitivity (p-value = 0.6417) and DOR (p-value = 0.2187) were not significant (Additional file 1: Figures S3–S5).

DTA based on ICH types

The ICH types of analysis were divided into EDH, SDH, IPH, IVH, SAH, and CPH [21, 25, 33, 36, 38, 46, 49, 50]. These results were significant for the results for specificity (p-value < 0.0001) and DOR (p-value = 0.0009). However, the sensitivity of different ICH types (p-value = 0.4564) was insignificant (Additional file 1: Figures S6–S8).

DTA based on data sources

The data sources analysis was divided into single [20, 22, 24, 26, 27, 30, 32,33,34, 36,37,38,39, 41, 42, 47, 48, 50] or multiple [21, 23,24,25, 27, 28, 31, 34, 35, 43, 45,46,47,48,49]. These results were not significant for the sensitivity (p-value = 0.6879), specificity (p-value = 0.6494), and DOR (p-value = 0.7272) (Additional file 1: Figures S9–S11).

The data sources analysis was divided into benchmark [26, 28, 31, 32, 36, 38, 42, 46] or real-time data [20,21,22,23,24,25, 27, 30, 33,34,35, 37, 39, 41, 43, 45, 47,48,49,50]. These results were not significant for the sensitivity (p-value = 0.1017), specificity (p-value = 0.5189), and DOR (p-value = 0.1285) (Additional file 1: Figures S12–S14).

Discussion

Detection of ICH by ML in systematic studies may decrease the time to diagnosis, which is crucial for clinical because approximately most of ICH in accordance with death occurs within the primary hours [53]. This meta-analysis demonstrated that ResNet algorithms could detect ICHs accurately with retrospective and non-randomized data [22, 31, 33, 37, 38, 50].

In this current study, ML has been used in ICH non-contrast CT-Scans with different architecture models. The resulting pooled sensitivity, specificity, DOR, AUC, accuracy, and precision were 0.917 (95% CI 0.88 to 0.943, I2 = 99%), 0.945 (95% CI 0.918 to 0.964, I2 = 100%), 219.47 (95% CI 104.78 to 459.66, I2 = 100%), 0.971, 90.3 (ranges from 87.24 to 93.01), and 76.24 (ranges from 66.71 to 86.32), respectively.

Practical ML is characterized by high accuracy measures such as AUC, sensitivity, and specificity, which can accurately categorize illness suspects and non-suspects. This meta-analysis revealed a combined AUC of 0.971. On the other hand, the high AUC of the included trials could not correctly represent the performance of the algorithm's therapeutic benefit [54]. Initially, the range of AUC among studies was 0.608 to 1 that Neural Networks (NNs) learning such as CNN, ResNet, and RNN had a higher rate from other ML algorithms [20, 21, 23, 24, 26,27,28,29, 31, 33, 37,38,39, 43, 44, 46, 49]. In other words, this result suggested that NNs algorithms in the big data can improve the rate of AUC which it is a useful way to detect a good model and positive and negative target classes.

DL models were shown to have a pooled sensitivity of 87.00% (95% confidence interval: 83.00–90.20%) and specificity of 92.50% (95% confidence interval: 85.10–96.40%) when compared to the gold standard by Liu et al. (2019), who pooled 14 out-of-sample external validation experiments [55].

To interpret the results, a DOR of 219.47 (95% CI 104.78–459.66, I2 = 100%) generally means using ML in diagnosing ICH is valuable. Due to the necessity of reporting the convergence of the results along with the accuracy, precision is also mentioned. Precision equal to 76.24 (ranges from 66.71 to 86.32) indicates a relative convergence besides the accuracy of 90.3 (ranges from 87.24 to 93.01). These results show that ML can be diagnosed with ICH in healthy patients. Also, likelihood ratios are important factors that could help improve clinical judgment and show the range of disease frequencies, and LR+ greater than 10 produces a greater pretest probability. The LR less than 0.1 has conclusive changes in the post-test possibility [56]. The pooled positive LR+ and LR range from 12.639 to 20.784 with a mean of 16.208 and 0.072 to 0.123 with a pooled mean of 0.094, respectively. The pooled LR+ of 16.208 means that diagnosis of ICH is 16.208 times more likely to be diagnosed while ML is used; likewise, the pooled LR of 0.094 means ICH has a higher likelihood of negative test for the ML algorithm than healthy patients. The pooled F1 score of this study was 79.14 (ranging from 70.9 to 86.48). The F1 score is a numerical score between 0 and 100; the closer this number is to 100, the more valuable the method studied [57]. This score results from the average weight of recall and precision, which has a significant place in data interpretation. It can be reduced the number of false negatives and positives.

The sub-group analysis based on the ML architecture and algorithms was done to assess these factors' influence on the DTA results. The network architecture analysis results showed significance for the specificity of the different network architecture models (p-value = 0.0289). However, the results for sensitivity (p-value = 0.6417) and DOR (p-value = 0.2187) were not significant. Thus, the ResNet algorithm has higher pooled specificity than other algorithms 0.935 (95% CI 0.854 to 0.973, I2 = 93%). Between studies, CNN architectures included specialized neural networks and ensemble learning [58]. However, this study focuses on CNNs for detecting ICHs in general, and it may not be acceptable to extend the results to other AI projects [25]. To increase the number of entirely connected layers from one to five, Lee et al. 2019 combined a final CNN made up of VGG16, ResNet50, Inception-v3, and Inception ResNet-v2 utilizing ResNet18 with only minor alterations [33]. It has been demonstrated that standard ImageNet architectures such as ResNet18 do not significantly outperform smaller and simpler CNNs [59]. However, by averaging many transfer models, the performance of an ensemble of transfer models may be enhanced. Chang et al. (2018) used a hybrid 3D/2D CNN pyramid with a proprietary mask R-CNN architecture as its backbone to detect and segment ICHs [60]. Medical imaging can use finely tuned 3D networks, which have shown exceptional performance in a variety of applications; however, 3D networks need a large dataset and several training parameters, with the image depth volume varying from 20 to 400 slices per scan, which is more demanding in terms of computation efficiency [25].

Besides, the sub-group analysis based on the ICH types was significant for specificity (p-value < 0.0001) and DOR (p-value = 0.0009). However, the sensitivity of different ICH types (p-value = 0.4564) was insignificant. Thus, EDH has higher pooled specificity and DOR than other ICH types 0.99 (95% CI 0.947–0.998, I2 = 100%) and 616.79 (95% CI 91.76–4145.99, I2 = 97%). However, there were no significant differences between data sources (single versus multiple or benchmark versus real-time).

Misdetection of ICHs, which are difficult to distinguish from bone or undiscovered microbleeds in trauma imaging, is another therapeutically significant and relevant issue [61]. Using image processing techniques, the skull and face were removed from NCTCs in Kuo et al. 2019 research. They achieved 100% sensitivity in an external test set of 200 NCTCs, which was likely made possible by the simplicity of detecting bleeding when only intracranial structures were considered [62]. Patients excluded or removed because of picture artifacts might improve the algorithm. NCTCs are familiar with patient-related imaging artifacts in CT, such as metallic materials, human movements, and incomplete projections. In addition, the diversity of CT scanners and image reconstruction methods makes direct comparisons between research challenging [33].

Limitations

Developing a clinical environment where an ML supports the radiologist could improve diagnostic efficacy and should be assessed from a socioeconomic and patient standpoint [63]. The deployment of MLs in clinical operations necessitates a sophisticated configuration coupled with medical imaging systems. Just one of the included articles assessed midline shift [25]. Therefore, this outcome couldn’t analyze. This would be important clinically, as its value > 5 mm may be an indication for urgent neurosurgical review.

Additionally, the findings of the I-squared analysis make it clear that combining the data from these studies may not be appropriate, underscoring the dearth of external validation research. Due to factors like scanning methodology, scanner types, algorithm designs, and reference standards, it is not easy to compare different research, which reduces the generalizability and validity of the findings. The judgment of articles may have been tainted by subjective bias since writers' degrees of experience varied. The creation of additional prospective studies in this area may significantly advance future research since, in addition to the different causes of variability, the use of retrospective studies was the study's most noticeable limitation.

Conclusion

This meta-analysis on DTA of ML algorithms for detecting ICH by assessing non-contrast CT-Scans shows the ML has an acceptable performance in diagnosing ICH. Using ResNet in ICH detection remains promising prediction was improved via training in an Architecture Learning Network (ALN). However, further studies with greater homogeneity are needed to draw more accurate conclusions about the results of DTA of ML in ICH.

Methods

Protocol and registration

This meta-analysis study was reported according to Preferred Reporting Items for Systematic Reviews-Diagnostic Test Accuracy (PRISMA-DTA) guideline [64].

Eligibility criteria

Original studies were eligible if they met all the following predefined inclusion criteria: a) patients undergoing non-contrast brain computed tomography (CT) scan for the detection of acute or chronic Intracranial hemorrhage (ICH), such as intraparenchymal hemorrhage (IPH), subdural hemorrhage (SDH), epidural hemorrhage (EDH), intraventricular hemorrhage (IVH), and subarachnoid hemorrhage (SAH), or b) using a gold standard (Radiologists) to report the ICH.

Information sources

Until May 2023, systematic searches were conducted in ISI Web of Science, PubMed, Scopus, Cochrane Library, IEEE Xplore Digital Library, CINAHL, Science Direct, PROSPERO, and EMBASE for studies that evaluated the diagnostic precision of ML model-assisted ICH detection.

Search strategy

One knowledgeable librarian [KSH] established and refined search tactics through team discussion. “Deep Learning,” “Machine Learning,” “Artificial Intelligence,” “Intracranial Hemorrhages,” “intraparenchymal hemorrhage,” “epidural hemorrhage,” “subdural hemorrhage,” “subarachnoid hemorrhage,” “intraventricular hemorrhage,” “Diagnosis,” “Meta-Analysis,” and “Computerized Tomography” were among the kwywords. Moreover, conferences, editorials, commentaries, reviews, guidelines, book chapters, technical articles, and papers with inadequate citation standards that did not match the conceptual framework of the study were rejected.

Summary measures

ICHs versus HCs that were true positive (TP, true ICH, predicted to be ICH), true negative (TN, non-ICH predicted to be non-ICH), false positive (FP, non-ICH predicted to be ICH), or false negative (FN, ICH, predicted to be non-ICH) were extracted for meta-analysis purposes. The original study's inclusion criteria were utilized to obtain data for the meta-analysis on detecting ICH. In addition, the publication year, the nation where the research was conducted, the study methodology, the number of patients, and their ages were recovered. The primary outcomes were diagnostic accuracy = ((TP + TN)/(TP + FN + FP + TN)), specificity = TN/(FP + TN), sensitivity = TP/(TP + FN), precision = (TP/TP + FP), F1- Score = 2 × (Precision × Recall/Precision + Recall), negative likelihood ratio (LR) = (1-sensitivity/specificity), positive likelihood ratio (LR+) = (sensitivity/1- specificity), DOR = (LR+/LR), and the AUC of ML on detecting ICH in the patients, ICH versus healthy controls (HCs) [65, 66]. Comparing the accuracy, sensitivity, and specificity of ML and CT-Scan were the subgroup analysis.

Risk of bias across studies

Two independent reviewers utilized the updated Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) instrument to evaluate all studies' quality and potential bias. Communication resolved conflicts, and a third reviewer and reviewers independently assessed the first included papers. Two categories were considered: bias susceptibility and patient selection, index test, and comparative benchmark application. In the flow and pace areas, bias was evaluated.

Additional analyses

Using the Random Effects Model (RE) technique, a univariate meta-analysis was conducted for each modality's sensitivity and specificity to determine its diagnostic accuracy [67]. The RE model was chosen because of the suspected high proportion of heterogeneity. The primary endpoints were sensitivity, specificity, a summary of receiver operating characteristics (SROC) curve, and diagnostic odds ratio (DOR). Point estimates and 95% confidence intervals (CIs) for each study were calculated to ensure consistency of sensitivity and specificity. A bivariate meta-analysis of sensitivity and specificity used R version 4.1.2 (R Foundation for Statistics Computing, Vienna, Austria, 2021) and RStudio version 1.4.1717 to obtain the SROC curve. This includes the "mada" and "meta" R packages implemented. Then the average AUC of SROC was estimated [68, 69]. The secondary outcomes comprised the positive and negative likelihood ratios, precision, and F1 score. Cochran's Q test and I2 statistics were utilized to evaluate statistical heterogeneity between studies. 0–40% indicates insignificant non-uniformity, 30%–60% indicates moderate non-uniformity, and 75–100% indicates considerable non-uniformity for Q statistics. A funnel chart was used to examine and depict publication bias (32). All p-values are derived from two-sided tests, and p-values of 0.05 are statistically significant. Screening based on machine learning algorithms, ICH types, retrospective or prospective study design, and acute or chronic ICHs was used to perform subgroup analysis. Using the Cochrane Review Manager version 5.4 (RevMan 5.4) program, bias cross-study risk and applicability concern charts were assessed.

Availability of data and materials

Not applicable.

References

  1. An SJ, Kim TJ, Yoon BW. Epidemiology, risk factors, and clinical features of intracerebral hemorrhage: an update. J Stroke. 2017;19(1):3–10.

    Article  Google Scholar 

  2. Rindler RS, et al. Neuroimaging of intracerebral hemorrhage. Neurosurgery. 2020;86(5):E414–23.

    Article  Google Scholar 

  3. Hong JM, Kim DS, Kim M. Hemorrhagic transformation after ischemic stroke: mechanisms and management. Front Neurol. 2021;12: 703258.

    Article  Google Scholar 

  4. Ginat DT. Analysis of head CT scans flagged by deep learning software for acute intracranial hemorrhage. Neuroradiology. 2020;62(3):335–40.

    Article  MathSciNet  Google Scholar 

  5. Shi L, et al. Blood pressure management for acute intracerebral hemorrhage: a meta-analysis. Sci Rep. 2017;7(1):14345.

    Article  MathSciNet  Google Scholar 

  6. Rabinstein AA. Optimal Blood Pressure After Intracerebral Hemorrhage: Still a Moving Target. Stroke. 2018;49(2):275–6.

    Article  Google Scholar 

  7. Rha JH, Saver JL. The impact of recanalization on ischemic stroke outcome: a meta-analysis. Stroke. 2007;38(3):967–73.

    Article  Google Scholar 

  8. Leng T, Xiong ZG. Treatment for ischemic stroke: From thrombolysis to thrombectomy and remaining challenges. Brain Circ. 2019;5(1):8–11.

    Article  Google Scholar 

  9. Hughes RE, Tadi P, Bollu PC. TPA Therapy. In: StatPearls. StatPearls Publishing Copyright ©: StatPearls Publishing LLC.: Treasure Island (FL); 2023.

    Google Scholar 

  10. Sporns PB, et al. Neuroimaging of acute intracerebral hemorrhage. J Clin Med. 2021;10(5):1086.

    Article  Google Scholar 

  11. Vidhya V, et al. Automated detection and screening of traumatic brain injury (TBI) using computed tomography images: a comprehensive review and future perspectives. Int J Environ Res Public Health. 2021;18(12):6499.

    Article  Google Scholar 

  12. Rao B, et al. Utility of artificial intelligence tool as a prospective radiology peer reviewer—detection of unreported intracranial hemorrhage. Acad Radiol. 2021;28(1):85–93.

    Article  Google Scholar 

  13. Hosny A, et al. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500–10.

    Article  Google Scholar 

  14. Derevianko A, et al. The use of artificial intelligence (AI) in the radiology field: what is the state of doctor0patient communication in cancer diagnosis? Cancers. 2023;15(2):470.

    Article  MathSciNet  Google Scholar 

  15. Rana M, Bhushan M. Machine learning and deep learning approach for medical image analysis: diagnosis to detection. Multimed Tools Appl. 2022;24:1–39.

    Google Scholar 

  16. Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):160.

    Article  Google Scholar 

  17. Lee JY, et al. Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm. Sci Rep. 2020;10(1):20546.

    Article  Google Scholar 

  18. Kundisch A, et al. Deep learning algorithm in detecting intracranial hemorrhages on emergency computed tomographies. PLoS ONE. 2021;16(11): e0260560.

    Article  Google Scholar 

  19. Matsoukas S, et al. Accuracy of artificial intelligence for the detection of intracranial hemorrhage and chronic cerebral microbleeds: a systematic review and pooled analysis. Radiol Med. 2022;127(10):1106–23.

    Article  Google Scholar 

  20. Abe D, et al. A prehospital triage system to detect traumatic intracranial hemorrhage using machine learning algorithms. JAMA Netw Open. 2022;5(6): e2216393.

    Article  Google Scholar 

  21. Alis D, et al. A joint convolutional-recurrent neural network with an attention mechanism for detecting intracranial hemorrhage on noncontrast head CT. Sci Rep. 2022;12(1):2084.

    Article  Google Scholar 

  22. Altuve M, Pérez A. Intracerebral hemorrhage detection on computed tomography images using a residual neural network. Phys Med. 2022;99:113–9.

    Article  Google Scholar 

  23. Arbabshirani MR, et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit Med. 2018;1:9.

    Article  Google Scholar 

  24. Chang PD, et al. Hybrid 3D/2D convolutional neural network for hemorrhage evaluation on head CT. AJNR Am J Neuroradiol. 2018;39(9):1609–16.

    Article  Google Scholar 

  25. Chilamkurthy S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet. 2018;392(10162):2388–96.

    Article  Google Scholar 

  26. Cortes-Ferre L, et al. Deep Learning Applied to Intracranial Hemorrhage Detection. J Imaging. 2022;9(2):37. https://doi.org/10.3390/jimaging9020037.

    Article  Google Scholar 

  27. Danilov G, et al. Classification of Intracranial Hemorrhage Subtypes Using Deep Learning on CT Scans. Stud Health Technol Inform. 2020;272:370–3.

    Google Scholar 

  28. Grewal M, et al. Radnet: Radiologist level accuracy using deep learning for hemorrhage detection in ct scans in 2018 IEEE 15th International symposium on biomedical imaging (ISBI 2018). IEEE. 2018. https://doi.org/10.48550/arXiv.1710.04934.

    Article  Google Scholar 

  29. Hopkins BS, et al. Mass deployment of deep neural network: real-time proof of concept with screening of intracranial hemorrhage using an open data set. Neurosurgery. 2022;90(4):383–9.

    Article  Google Scholar 

  30. Kau T, et al. FDA-approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single-center study. Neuroradiology. 2022;64(5):981–90.

    Article  Google Scholar 

  31. Kumaravel P, et al. A simplified framework for the detection of intracranial hemorrhage in CT brain images using deep learning. Curr Med Imaging. 2021;17(10):1226–36.

    Article  Google Scholar 

  32. Kuo W, et al. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc Natl Acad Sci U S A. 2019;116(45):22737–45.

    Article  Google Scholar 

  33. Lee H, et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat Biomed Eng. 2019;3(3):173–82.

    Article  Google Scholar 

  34. Majumdar A, et al. Detecting Intracranial Hemorrhage with Deep Learning. Annu Int Conf IEEE Eng Med Biol Soc. 2018;2018:583–7.

    Google Scholar 

  35. McLouth J, et al. Validation of a deep learning tool in the detection of intracranial hemorrhage and large vessel occlusion. Front Neurol. 2021;12:656112–656112.

    Article  Google Scholar 

  36. Phaphuangwittayakul A, et al. An optimal deep learning framework for multi-type hemorrhagic lesions detection and quantification in head CT images for traumatic brain injury. Appl Intell. 2022;52(7):7320–38.

    Article  Google Scholar 

  37. Rao BN, et al. Deep transfer learning for automatic prediction of hemorrhagic stroke on CT images. Comput Math Methods Med. 2022;2022:3560507.

    Article  Google Scholar 

  38. Salehinejad H, et al. A real-world demonstration of machine learning generalizability in the detection of intracranial hemorrhage on head computerized tomography. Sci Rep. 2021;11(1):17051–17051.

    Article  Google Scholar 

  39. Schmitt N, et al. Automated detection and segmentation of intracranial hemorrhage suspect hyperdensities in non-contrast-enhanced CT scans of acute stroke patients. Eur Radiol. 2022;32(4):2246–54.

    Article  Google Scholar 

  40. Seyam M, et al. Utilization of artificial intelligence-based intracranial hemorrhage detection on emergent noncontrast CT images in clinical workflow. Radiol Artif Intell. 2022;4(2): e210168.

    Article  Google Scholar 

  41. Tang Z, et al. Deep learning-based prediction of hematoma expansion using a single brain computed tomographic slice in patients with spontaneous intracerebral hemorrhages. World Neurosurg. 2022. https://doi.org/10.1016/j.wneu.2022.05.109.

    Article  Google Scholar 

  42. Tharek A, et al. Intracranial hemorrhage detection in CT scan using deep learning. Asian J Med Technol. 2022;2(1):1–18.

    Article  Google Scholar 

  43. Trevisi G, et al. Machine learning model prediction of 6-month functional outcome in elderly patients with intracerebral hemorrhage. Neurosurg Rev. 2022. https://doi.org/10.1007/s10143-022-01802-7.

    Article  Google Scholar 

  44. Uchida K, et al. Development of machine learning models to predict probabilities and types of stroke at prehospital stage: the Japan urgent stroke triage score using machine learning (JUST-ML). Transl Stroke Res. 2022;13(3):370–81.

    Article  Google Scholar 

  45. Voter AF, et al. Diagnostic accuracy and failure mode analysis of a deep learning algorithm for the detection of intracranial hemorrhage. J Am Coll Radiol. 2021;18(8):1143–52.

    Article  Google Scholar 

  46. Wang X, et al. A deep learning algorithm for automatic detection and classification of acute intracranial hemorrhages in head CT scans. NeuroImage Clinical. 2021;32:102785–102785.

    Article  Google Scholar 

  47. Xu J, et al. Deep network for the automatic segmentation and quantification of intracranial hemorrhage on CT. Front Neurosci. 2021;14:541817–541817.

    Article  Google Scholar 

  48. Xu X, et al. Prognostic prediction of hypertensive intracerebral hemorrhage using CT radiomics and machine learning. Brain and behavior. 2021;11(5):e02085–e02085.

    Article  Google Scholar 

  49. Ye H, et al. Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. Eur Radiol. 2019;29(11):6191–201.

    Article  MathSciNet  Google Scholar 

  50. Zhou Q, et al. Transfer learning of the ResNet-18 and DenseNet-121 model used to diagnose intracranial hemorrhage in CT scanning. Curr Pharm Des. 2022;28(4):287–95.

    Article  Google Scholar 

  51. Neves G, et al. External validation of an artificial intelligence device for intracranial hemorrhage detection. World Neurosurg. 2023;173:e800–7.

    Article  Google Scholar 

  52. Abrigo JM, et al. Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong. Hong Kong Med J. 2023;29(2):112–20.

    Article  Google Scholar 

  53. O’Neill TJ, et al. Active reprioritization of the reading worklist using artificial intelligence has a beneficial effect on the turnaround time for interpretation of head CT with intracranial hemorrhage. Radiol Artif Intell. 2021;3(2): e200024.

    Article  Google Scholar 

  54. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125(7):605–13.

    Article  Google Scholar 

  55. Liu X, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1(6):e271–97.

    Article  Google Scholar 

  56. Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature III. How to use an article about a diagnostic test B. What are the results and will they help me in caring for my patients? The evidence-based medicine working group. JAMA. 1994;271(9):703–7.

    Article  Google Scholar 

  57. Goutte C, Gaussier E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. in European conference on information retrieval. Berlin: Springer; 2005.

    Google Scholar 

  58. Daugaard Jorgensen M, et al. Convolutional neural network performance compared to radiologists in detecting intracranial hemorrhage from brain computed tomography: A systematic review and meta-analysis. Eur J Radiol. 2022;146: 110073.

    Article  Google Scholar 

  59. Raghu M, et al. Transfusion: Understanding transfer learning for medical imaging. Adv Neural Inf Process Syst. 2019. https://doi.org/10.4855/arXiv.1902.07208.

    Article  Google Scholar 

  60. Singh SP, et al. 3D deep learning on medical images: a review. Sensors. 2020;20(18):5097.

    Article  Google Scholar 

  61. Samek W, et al. Evaluating the visualization of what a deep neural network has learned. IEEE Trans Neural Netw Learn Syst. 2017;28(11):2660–73.

    Article  MathSciNet  Google Scholar 

  62. Barrett JF, Keat N. Artifacts in CT: recognition and avoidance. Radiographics. 2004;24(6):1679–91.

    Article  Google Scholar 

  63. Nagendran M, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368: m689.

    Article  Google Scholar 

  64. McInnes MDF, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–96.

    Article  Google Scholar 

  65. Cronin P, et al. How to Perform a Systematic Review and Meta-analysis of Diagnostic Imaging Studies. Acad Radiol. 2018;25(5):573–93.

    Article  Google Scholar 

  66. Manikandan R, Dorairajan LN. How to appraise a diagnostic test. Indian J Urol. 2011;27(4):513–9.

    Article  Google Scholar 

  67. Shim SR, Kim SJ, Lee J. Diagnostic test accuracy: application and practice using R software. Epidemiol Health. 2019;41: e2019007.

    Article  Google Scholar 

  68. Doebler P, Holling H. Meta-analysis of diagnostic accuracy with mada. R Packag. 2015;1:15.

    MATH  Google Scholar 

  69. Guo J, Riebler A. meta4diag: Bayesian bivariate meta-analysis of diagnostic test studies for routine practice. arXiv Prepr. 2015. https://doi.org/10.48550/arXiv.1512.06220.

    Article  Google Scholar 

Download references

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

KSH, MM, SHS, MT, and PP: designed data collection tools, monitored data collection, wrote the statistical analysis plan, cleaned and analyzed the data, and drafted and revised the paper. KSH, JM, and MM: wrote the statistical analysis plan, and cleaned and analyzed the data. KSH, MM, SH.S, MT, and PP: implemented the study, analyzed the data, drafted, and revised the paper.

Corresponding author

Correspondence to Kiarash Shirbandi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Univariate sub-group analysis of DOR with random model based on retrospective studies. Figure S2. Univariate sub-group analysis of DOR with random model based on prospective studies. Figure S3. Univariate sub-group analysis of specificity with random model based on Network Architecture. G represents sub-group analysis of data, when g = 0 (CNN), g = 1 (ResNet), g = 2 (RF), and g = 3 (SVM). Figure S4. Univariate sub-group analysis of sensitivity with random model based on Network Architecture. G represents sub-group analysis of data, when g = 0 (CNN), g = 1 (ResNet), g = 2 (RF), and g = 3 (SVM). Figure S5. Univariate sub-group analysis of DOR with random model based on Network Architecture. G represents sub-group analysis of data, when g = 0 (CNN), g = 1 (ResNet), g = 2 (RF), and g = 3 (SVM). Figure S6. Univariate sub-group analysis of specificity with random model based on ICH types. G represents sub-group analysis of data, when g = 0 (EDH), g = 1 (SDH), g = 2 (IPH), g = 3 (IVH), g = 4 (SAH), and g = 5 (CPH). Figure S7. Univariate sub-group analysis of DOR with random model based on ICH types. G represents sub-group analysis of data, when g = 0 (EDH), g = 1 (SDH), g = 2 (IPH), g = 3 (IVH), g = 4 (SAH), and g = 5 (CPH). Figure S8. Univariate sub-group analysis of sensitivity with random model based on ICH types. G represents sub-group analysis of data, when g = 0 (EDH), g = 1 (SDH), g = 2 (IPH), g = 3 (IVH), g = 4 (SAH), and g = 5 (CPH). Figure S9. Univariate sub-group analysis of sensitivity with random model based on single or multiple center. G represents sub-group analysis of data, when g = 0 (Single), and g = 1 (Multiple). Figure S10. Univariate sub-group analysis of specificity with random model based on single or multiple center. G represents sub-group analysis of data, when g = 0 (Single), and g = 1 (Multiple). Figure S11. Univariate sub-group analysis of DOR with random model based on single or multiple center. G represents sub-group analysis of data, when g = 0 (Single), and g = 1 (Multiple). Figure S12. Univariate sub-group analysis of sensitivity with random model based on benchmark or real-time data. G represents sub-group analysis of data, when g = 0 (benchmark), and g = 1 (real-time data). Figure S13. Univariate sub-group analysis of specificity with random model based on benchmark or real-time data. G represents sub-group analysis of data, when g = 0 (benchmark), and g = 1 (real-time data). Figure S14. Univariate sub-group analysis of DOR with random model based on benchmark or real-time data. G represents sub-group analysis of data, when g = 0 (benchmark), and g = 1 (real-time data).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Maghami, M., Sattari, S.A., Tahmasbi, M. et al. Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study. BioMed Eng OnLine 22, 114 (2023). https://doi.org/10.1186/s12938-023-01172-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12938-023-01172-1

Keywords