Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study

Maghami, Masoud; Sattari, Shahab Aldin; Tahmasbi, Marziyeh; Panahi, Pegah; Mozafari, Javad; Shirbandi, Kiarash

doi:10.1186/s12938-023-01172-1

Review
Open access
Published: 04 December 2023

Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study

Masoud Maghami¹,
Shahab Aldin Sattari²,
Marziyeh Tahmasbi³,
Pegah Panahi¹,
Javad Mozafari^4,6 &
…
Kiarash Shirbandi ORCID: orcid.org/0000-0002-1055-6606⁵

BioMedical Engineering OnLine volume 22, Article number: 114 (2023) Cite this article

1320 Accesses
1 Citations
1 Altmetric
Metrics details

Abstract

Background

This systematic review and meta-analysis were conducted to objectively evaluate the evidence of machine learning (ML) in the patient diagnosis of Intracranial Hemorrhage (ICH) on computed tomography (CT) scans.

Methods

Until May 2023, systematic searches were conducted in ISI Web of Science, PubMed, Scopus, Cochrane Library, IEEE Xplore Digital Library, CINAHL, Science Direct, PROSPERO, and EMBASE for studies that evaluated the diagnostic precision of ML model-assisted ICH detection. Patients with and without ICH as the target condition who were receiving CT-Scan were eligible for the research, which used ML algorithms based on radiologists' reports as the gold reference standard. For meta-analysis, pooled sensitivities, specificities, and a summary receiver operating characteristics curve (SROC) were used.

Results

At last, after screening the title, abstract, and full paper, twenty-six retrospective and three prospective, and two retrospective/prospective studies were included. The overall (Diagnostic Test Accuracy) DTA of retrospective studies with a pooled sensitivity was 0.917 (95% CI 0.88–0.943, I² = 99%). The pooled specificity was 0.945 (95% CI 0.918–0.964, I² = 100%). The pooled diagnostic odds ratio (DOR) was 219.47 (95% CI 104.78–459.66, I² = 100%). These results were significant for the specificity of the different network architecture models (p-value = 0.0289). However, the results for sensitivity (p-value = 0.6417) and DOR (p-value = 0.2187) were not significant. The ResNet algorithm has higher pooled specificity than other algorithms with 0.935 (95% CI 0.854–0.973, I² = 93%).

Conclusion

This meta-analysis on DTA of ML algorithms for detecting ICH by assessing non-contrast CT-Scans shows the ML has an acceptable performance in diagnosing ICH. Using ResNet in ICH detection remains promising prediction was improved via training in an Architecture Learning Network (ALN).

Background

A potentially fatal disorder known as intracranial hemorrhage (ICH) occurs in 25 per 100,000 yearly, which is related to 2 million strokes globally and has an estimated incidence [1]. There is a variety of fundamental (80–85%) and secondary (15–20%) underlying causes of ICH [2]. The most frequent non-traumatic secondary causes include brain tumors, ischemic strokes, and vascular malformations. Hospital admissions for ICH have grown during the past ten years, primarily because of the elderly population, insufficient blood pressure (BP), and increased use of blood thinners management [3, 4]. In such a way that, rational decrease of BP is an important factor to manage these patients, specifictly for lower than 15 mL ICH volume [5, 6]. The revascularization in the acute phase of strokes can improve the symptoms and better prognosis of these patients [7]. The tissue plasminogen activator (tPA) is the main treatment for ischemic stroke. Moreover, the clot in the blood vessel can be removed by thrombectomy technique that catheter intervent upper of femur; then, using angioplasty blocked artery can be opened up [8, 9].

Neuroimaging is, therefore, essential for the diagnosis of acute ICH because perchance challenging to differentiate it from other diseases, such as ischemic stroke [10]. The successful procedure of a non-contrast computed tomography (CT) for the cerebrum, an accessible and quick technique for diagnosing ICH, are crucial component of the ICH diagnostic process. Fundamental ICH features such as location, edema, ventricular system expansion, and midline shift are morphologically revealed by a CT-Scan [11]. However, more significant CT-Scan usage could delay the identification of ICH, and a growing burden in radiology departments could lead to job-related stress and burnout. In contrast, it has been discovered that artificial intelligence (AI) can improve radiology practice by lowering the amount of effort required [12,13,14].

Today, the efficiency of machine learning (ML) algorithms, especially improving deep learning (DL) algorithms for computer vision, has advanced significantly. The CT-Scan, one of the most well-known imaging modalities, and has seen considerable breakthroughs in ML and its application [15, 16]. Support vector machine (SVM), Convolutional neural network (CNN), random forest (RF), and conditional random field (CRF) are the most prominent ML algorithms for recognizing brain bleeding from visual data. Even though a great deal of work has already been accomplished in this field, there is still room for growth. Additional research is required to improve the accuracy, precision, and resilience of ML-based brain segmentation [17, 18]. A meta-analysis reported DTA of AI for the detection of ICH; however, this study did not report subgroups for distinguishing between Algorithms and also types of ICH [19]. Therefore, this systematic review and meta-analysis were conducted to objectively evaluate the evidence of ML in the patient diagnosis of ICH on CT scans.

Results

Study selection & characteristics

Following the primary search, 1,405 studies were recognized after removing duplicated studies. At last, after screening the title, abstract, and full paper, twenty-six retrospective and three prospective, and two retrospective/prospective studies were included [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]; then, twenty-nine studies were included in the final quantitative analysis, and the other studies were excluded because no diagnostic accuracy was reported (Fig. 1) [20,21,22,23,24,25,26, 28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46, 48,49,50,51,52]. The machine learning networks were classified into, Support vector machine (SVM), Random Forest (RF), k-nearest neighbors’ algorithm (k-NN), VGG-16, Logistic Regression (LR), ResNet-18, AlexNet, DenseNet-121, eXtreme Gradient Boosting (XGBoost), Decision Tree (DT), and Deep Learning (DL) included Convolutional Neural Network (CNN); ResNet34, ResNet50, ResNet18, ResNet-v2, GoogleNet (Table 1).

Table 1 Summary of findings for all studies included in the qualitative synthesis

Full size table

Risk of bias

The validity and the possibility of bias for the included studies were evaluated with the QUADAS-2 (Fig. 2). One high-risk bias was reported in all the included studies [20]. When the publication bias is very low, the points will be symmetrically distributed around the true effect of an inverted funnel, as shown in Fig. 3.

Diagnostic test accuracy (DTA) of all included studies

Retrospective studies

The overall DTA of the 26 retrospective studies and 904,755 scans was estimated using a univariate meta-analysis with a pooled sensitivity was 0.917 (95% CI 0.88 to 0.943, I² = 99%) (Fig. 4) [20,21,22,23,24,25,26, 28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43, 45, 46, 48,49,50]. The pooled specificity was 0.945 (95% CI 0.918 to 0.964, I² = 100%) (Fig. 5). The pooled diagnostic odds ratios (DOR) was 219.47 (95% CI 104.78 to 459.66, I² = 100%) (Additional file 1: Figure S1). The LR⁺ ranges from 12.639 to 20.784 with pooled mean of 16.208 (Table 2), and LR⁻ ranges from 0.072 to 0.123 with pooled mean of 0.094. The AUC of 0.971 was reported for the SROC via the bivariate model (Fig. 6). The overall accuracy was 90.3 (ranges from 87.24 to 93.01), the precision was 76.24 (ranges from 66.71 to 86.32), and the F1-score was 79.14 (ranges from 70.9 to 86.48) (Table 2).

Table 2 DTA estimated from all the studies included in the meta-analysis using (2 \(\times\) 2) confusion table

Full size table

Prospective studies

The overall DTA of the five prospective studies and 104,397 scans was estimated using a univariate meta-analysis with a pooled sensitivity was 0.886 (95% CI 0.613–0.975, I² = 100%) (Fig. 7) [24, 29, 33, 40, 44]. The pooled specificity was 0.967 (95% CI 0.937–0.983, I² = 100%) (Fig. 8). The pooled DOR was 227.71 (95% CI 27.82–1863.51, I² = 100%) (Additional file 1: Figure S2). The LR⁺ ranges from 6.054 to 87.029 with pooled mean of 22.953 (Table 2), and LR⁻ ranges from 0.005 to 1.932 with pooled mean of 0.101. The AUC of 0.98 was reported for the SROC via the bivariate model (Fig. 9).

The overall accuracy was 93.69 (ranges from 90.31 to 97.2), the precision was 75.58 (ranges from 55.23 to 91.18), and the F1-score was 77.26 (ranges from 56.23 to 91.32) (Table 2).

DTA Based on network architecture

The Network Architecture analysis was divided into ResNet, RF, and SVM [20,21,22,23,24,25,26, 28, 30,31,32,33,34,35,36,37,38,39, 41,42,43, 45, 46, 48,49,50]. These results were significant for the specificity of the different network architecture models (p-value = 0.0289). However, the results for sensitivity (p-value = 0.6417) and DOR (p-value = 0.2187) were not significant (Additional file 1: Figures S3–S5).

DTA based on ICH types

The ICH types of analysis were divided into EDH, SDH, IPH, IVH, SAH, and CPH [21, 25, 33, 36, 38, 46, 49, 50]. These results were significant for the results for specificity (p-value < 0.0001) and DOR (p-value = 0.0009). However, the sensitivity of different ICH types (p-value = 0.4564) was insignificant (Additional file 1: Figures S6–S8).

DTA based on data sources

The data sources analysis was divided into single [20, 22, 24, 26, 27, 30, 32,33,34, 36,37,38,39, 41, 42, 47, 48, 50] or multiple [21, 23,24,25, 27, 28, 31, 34, 35, 43, 45,46,47,48,49]. These results were not significant for the sensitivity (p-value = 0.6879), specificity (p-value = 0.6494), and DOR (p-value = 0.7272) (Additional file 1: Figures S9–S11).

The data sources analysis was divided into benchmark [26, 28, 31, 32, 36, 38, 42, 46] or real-time data [20,21,22,23,24,25, 27, 30, 33,34,35, 37, 39, 41, 43, 45, 47,48,49,50]. These results were not significant for the sensitivity (p-value = 0.1017), specificity (p-value = 0.5189), and DOR (p-value = 0.1285) (Additional file 1: Figures S12–S14).

Discussion

Detection of ICH by ML in systematic studies may decrease the time to diagnosis, which is crucial for clinical because approximately most of ICH in accordance with death occurs within the primary hours [53]. This meta-analysis demonstrated that ResNet algorithms could detect ICHs accurately with retrospective and non-randomized data [22, 31, 33, 37, 38, 50].

In this current study, ML has been used in ICH non-contrast CT-Scans with different architecture models. The resulting pooled sensitivity, specificity, DOR, AUC, accuracy, and precision were 0.917 (95% CI 0.88 to 0.943, I² = 99%), 0.945 (95% CI 0.918 to 0.964, I² = 100%), 219.47 (95% CI 104.78 to 459.66, I² = 100%), 0.971, 90.3 (ranges from 87.24 to 93.01), and 76.24 (ranges from 66.71 to 86.32), respectively.

Practical ML is characterized by high accuracy measures such as AUC, sensitivity, and specificity, which can accurately categorize illness suspects and non-suspects. This meta-analysis revealed a combined AUC of 0.971. On the other hand, the high AUC of the included trials could not correctly represent the performance of the algorithm's therapeutic benefit [54]. Initially, the range of AUC among studies was 0.608 to 1 that Neural Networks (NNs) learning such as CNN, ResNet, and RNN had a higher rate from other ML algorithms [20, 21, 23, 24, 26,27,28,29, 31, 33, 37,38,39, 43, 44, 46, 49]. In other words, this result suggested that NNs algorithms in the big data can improve the rate of AUC which it is a useful way to detect a good model and positive and negative target classes.

DL models were shown to have a pooled sensitivity of 87.00% (95% confidence interval: 83.00–90.20%) and specificity of 92.50% (95% confidence interval: 85.10–96.40%) when compared to the gold standard by Liu et al. (2019), who pooled 14 out-of-sample external validation experiments [55].

To interpret the results, a DOR of 219.47 (95% CI 104.78–459.66, I² = 100%) generally means using ML in diagnosing ICH is valuable. Due to the necessity of reporting the convergence of the results along with the accuracy, precision is also mentioned. Precision equal to 76.24 (ranges from 66.71 to 86.32) indicates a relative convergence besides the accuracy of 90.3 (ranges from 87.24 to 93.01). These results show that ML can be diagnosed with ICH in healthy patients. Also, likelihood ratios are important factors that could help improve clinical judgment and show the range of disease frequencies, and LR⁺ greater than 10 produces a greater pretest probability. The LR⁻ less than 0.1 has conclusive changes in the post-test possibility [56]. The pooled positive LR⁺ and LR⁻ range from 12.639 to 20.784 with a mean of 16.208 and 0.072 to 0.123 with a pooled mean of 0.094, respectively. The pooled LR⁺ of 16.208 means that diagnosis of ICH is 16.208 times more likely to be diagnosed while ML is used; likewise, the pooled LR⁻ of 0.094 means ICH has a higher likelihood of negative test for the ML algorithm than healthy patients. The pooled F1 score of this study was 79.14 (ranging from 70.9 to 86.48). The F1 score is a numerical score between 0 and 100; the closer this number is to 100, the more valuable the method studied [57]. This score results from the average weight of recall and precision, which has a significant place in data interpretation. It can be reduced the number of false negatives and positives.

The sub-group analysis based on the ML architecture and algorithms was done to assess these factors' influence on the DTA results. The network architecture analysis results showed significance for the specificity of the different network architecture models (p-value = 0.0289). However, the results for sensitivity (p-value = 0.6417) and DOR (p-value = 0.2187) were not significant. Thus, the ResNet algorithm has higher pooled specificity than other algorithms 0.935 (95% CI 0.854 to 0.973, I² = 93%). Between studies, CNN architectures included specialized neural networks and ensemble learning [58]. However, this study focuses on CNNs for detecting ICHs in general, and it may not be acceptable to extend the results to other AI projects [25]. To increase the number of entirely connected layers from one to five, Lee et al. 2019 combined a final CNN made up of VGG16, ResNet50, Inception-v3, and Inception ResNet-v2 utilizing ResNet18 with only minor alterations [33]. It has been demonstrated that standard ImageNet architectures such as ResNet18 do not significantly outperform smaller and simpler CNNs [59]. However, by averaging many transfer models, the performance of an ensemble of transfer models may be enhanced. Chang et al. (2018) used a hybrid 3D/2D CNN pyramid with a proprietary mask R-CNN architecture as its backbone to detect and segment ICHs [60]. Medical imaging can use finely tuned 3D networks, which have shown exceptional performance in a variety of applications; however, 3D networks need a large dataset and several training parameters, with the image depth volume varying from 20 to 400 slices per scan, which is more demanding in terms of computation efficiency [25].

Besides, the sub-group analysis based on the ICH types was significant for specificity (p-value < 0.0001) and DOR (p-value = 0.0009). However, the sensitivity of different ICH types (p-value = 0.4564) was insignificant. Thus, EDH has higher pooled specificity and DOR than other ICH types 0.99 (95% CI 0.947–0.998, I² = 100%) and 616.79 (95% CI 91.76–4145.99, I² = 97%). However, there were no significant differences between data sources (single versus multiple or benchmark versus real-time).

Misdetection of ICHs, which are difficult to distinguish from bone or undiscovered microbleeds in trauma imaging, is another therapeutically significant and relevant issue [61]. Using image processing techniques, the skull and face were removed from NCTCs in Kuo et al. 2019 research. They achieved 100% sensitivity in an external test set of 200 NCTCs, which was likely made possible by the simplicity of detecting bleeding when only intracranial structures were considered [62]. Patients excluded or removed because of picture artifacts might improve the algorithm. NCTCs are familiar with patient-related imaging artifacts in CT, such as metallic materials, human movements, and incomplete projections. In addition, the diversity of CT scanners and image reconstruction methods makes direct comparisons between research challenging [33].

Limitations

Developing a clinical environment where an ML supports the radiologist could improve diagnostic efficacy and should be assessed from a socioeconomic and patient standpoint [63]. The deployment of MLs in clinical operations necessitates a sophisticated configuration coupled with medical imaging systems. Just one of the included articles assessed midline shift [25]. Therefore, this outcome couldn’t analyze. This would be important clinically, as its value > 5 mm may be an indication for urgent neurosurgical review.

Additionally, the findings of the I-squared analysis make it clear that combining the data from these studies may not be appropriate, underscoring the dearth of external validation research. Due to factors like scanning methodology, scanner types, algorithm designs, and reference standards, it is not easy to compare different research, which reduces the generalizability and validity of the findings. The judgment of articles may have been tainted by subjective bias since writers' degrees of experience varied. The creation of additional prospective studies in this area may significantly advance future research since, in addition to the different causes of variability, the use of retrospective studies was the study's most noticeable limitation.

Conclusion

This meta-analysis on DTA of ML algorithms for detecting ICH by assessing non-contrast CT-Scans shows the ML has an acceptable performance in diagnosing ICH. Using ResNet in ICH detection remains promising prediction was improved via training in an Architecture Learning Network (ALN). However, further studies with greater homogeneity are needed to draw more accurate conclusions about the results of DTA of ML in ICH.

Methods

Protocol and registration

This meta-analysis study was reported according to Preferred Reporting Items for Systematic Reviews-Diagnostic Test Accuracy (PRISMA-DTA) guideline [64].

Eligibility criteria

Original studies were eligible if they met all the following predefined inclusion criteria: a) patients undergoing non-contrast brain computed tomography (CT) scan for the detection of acute or chronic Intracranial hemorrhage (ICH), such as intraparenchymal hemorrhage (IPH), subdural hemorrhage (SDH), epidural hemorrhage (EDH), intraventricular hemorrhage (IVH), and subarachnoid hemorrhage (SAH), or b) using a gold standard (Radiologists) to report the ICH.

Information sources

Until May 2023, systematic searches were conducted in ISI Web of Science, PubMed, Scopus, Cochrane Library, IEEE Xplore Digital Library, CINAHL, Science Direct, PROSPERO, and EMBASE for studies that evaluated the diagnostic precision of ML model-assisted ICH detection.

Search strategy

One knowledgeable librarian [KSH] established and refined search tactics through team discussion. “Deep Learning,” “Machine Learning,” “Artificial Intelligence,” “Intracranial Hemorrhages,” “intraparenchymal hemorrhage,” “epidural hemorrhage,” “subdural hemorrhage,” “subarachnoid hemorrhage,” “intraventricular hemorrhage,” “Diagnosis,” “Meta-Analysis,” and “Computerized Tomography” were among the kwywords. Moreover, conferences, editorials, commentaries, reviews, guidelines, book chapters, technical articles, and papers with inadequate citation standards that did not match the conceptual framework of the study were rejected.

Summary measures

ICHs versus HCs that were true positive (TP, true ICH, predicted to be ICH), true negative (TN, non-ICH predicted to be non-ICH), false positive (FP, non-ICH predicted to be ICH), or false negative (FN, ICH, predicted to be non-ICH) were extracted for meta-analysis purposes. The original study's inclusion criteria were utilized to obtain data for the meta-analysis on detecting ICH. In addition, the publication year, the nation where the research was conducted, the study methodology, the number of patients, and their ages were recovered. The primary outcomes were diagnostic accuracy = ((TP + TN)/(TP + FN + FP + TN)), specificity = TN/(FP + TN), sensitivity = TP/(TP + FN), precision = (TP/TP + FP), F1- Score = 2 × (Precision × Recall/Precision + Recall), negative likelihood ratio (LR⁻) = (1-sensitivity/specificity), positive likelihood ratio (LR⁺) = (sensitivity/1- specificity), DOR = (LR⁺/LR), and the AUC of ML on detecting ICH in the patients, ICH versus healthy controls (HCs) [65, 66]. Comparing the accuracy, sensitivity, and specificity of ML and CT-Scan were the subgroup analysis.

Risk of bias across studies

Two independent reviewers utilized the updated Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) instrument to evaluate all studies' quality and potential bias. Communication resolved conflicts, and a third reviewer and reviewers independently assessed the first included papers. Two categories were considered: bias susceptibility and patient selection, index test, and comparative benchmark application. In the flow and pace areas, bias was evaluated.

Additional analyses

Using the Random Effects Model (RE) technique, a univariate meta-analysis was conducted for each modality's sensitivity and specificity to determine its diagnostic accuracy [67]. The RE model was chosen because of the suspected high proportion of heterogeneity. The primary endpoints were sensitivity, specificity, a summary of receiver operating characteristics (SROC) curve, and diagnostic odds ratio (DOR). Point estimates and 95% confidence intervals (CIs) for each study were calculated to ensure consistency of sensitivity and specificity. A bivariate meta-analysis of sensitivity and specificity used R version 4.1.2 (R Foundation for Statistics Computing, Vienna, Austria, 2021) and RStudio version 1.4.1717 to obtain the SROC curve. This includes the "mada" and "meta" R packages implemented. Then the average AUC of SROC was estimated [68, 69]. The secondary outcomes comprised the positive and negative likelihood ratios, precision, and F1 score. Cochran's Q test and I² statistics were utilized to evaluate statistical heterogeneity between studies. 0–40% indicates insignificant non-uniformity, 30%–60% indicates moderate non-uniformity, and 75–100% indicates considerable non-uniformity for Q statistics. A funnel chart was used to examine and depict publication bias (32). All p-values are derived from two-sided tests, and p-values of 0.05 are statistically significant. Screening based on machine learning algorithms, ICH types, retrospective or prospective study design, and acute or chronic ICHs was used to perform subgroup analysis. Using the Cochrane Review Manager version 5.4 (RevMan 5.4) program, bias cross-study risk and applicability concern charts were assessed.

Availability of data and materials

Not applicable.

References

An SJ, Kim TJ, Yoon BW. Epidemiology, risk factors, and clinical features of intracerebral hemorrhage: an update. J Stroke. 2017;19(1):3–10.
Article Google Scholar
Rindler RS, et al. Neuroimaging of intracerebral hemorrhage. Neurosurgery. 2020;86(5):E414–23.
Article Google Scholar
Hong JM, Kim DS, Kim M. Hemorrhagic transformation after ischemic stroke: mechanisms and management. Front Neurol. 2021;12: 703258.
Article Google Scholar
Ginat DT. Analysis of head CT scans flagged by deep learning software for acute intracranial hemorrhage. Neuroradiology. 2020;62(3):335–40.
Article MathSciNet Google Scholar
Shi L, et al. Blood pressure management for acute intracerebral hemorrhage: a meta-analysis. Sci Rep. 2017;7(1):14345.
Article MathSciNet Google Scholar
Rabinstein AA. Optimal Blood Pressure After Intracerebral Hemorrhage: Still a Moving Target. Stroke. 2018;49(2):275–6.
Article Google Scholar
Rha JH, Saver JL. The impact of recanalization on ischemic stroke outcome: a meta-analysis. Stroke. 2007;38(3):967–73.
Article Google Scholar
Leng T, Xiong ZG. Treatment for ischemic stroke: From thrombolysis to thrombectomy and remaining challenges. Brain Circ. 2019;5(1):8–11.
Article Google Scholar
Hughes RE, Tadi P, Bollu PC. TPA Therapy. In: StatPearls. StatPearls Publishing Copyright ©: StatPearls Publishing LLC.: Treasure Island (FL); 2023.
Google Scholar
Sporns PB, et al. Neuroimaging of acute intracerebral hemorrhage. J Clin Med. 2021;10(5):1086.
Article Google Scholar
Vidhya V, et al. Automated detection and screening of traumatic brain injury (TBI) using computed tomography images: a comprehensive review and future perspectives. Int J Environ Res Public Health. 2021;18(12):6499.
Article Google Scholar
Rao B, et al. Utility of artificial intelligence tool as a prospective radiology peer reviewer—detection of unreported intracranial hemorrhage. Acad Radiol. 2021;28(1):85–93.
Article Google Scholar
Hosny A, et al. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500–10.
Article Google Scholar
Derevianko A, et al. The use of artificial intelligence (AI) in the radiology field: what is the state of doctor0patient communication in cancer diagnosis? Cancers. 2023;15(2):470.
Article MathSciNet Google Scholar
Rana M, Bhushan M. Machine learning and deep learning approach for medical image analysis: diagnosis to detection. Multimed Tools Appl. 2022;24:1–39.
Google Scholar
Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):160.
Article Google Scholar
Lee JY, et al. Detection and classification of intracranial haemorrhage on CT images using a novel deep-learning algorithm. Sci Rep. 2020;10(1):20546.
Article Google Scholar
Kundisch A, et al. Deep learning algorithm in detecting intracranial hemorrhages on emergency computed tomographies. PLoS ONE. 2021;16(11): e0260560.
Article Google Scholar
Matsoukas S, et al. Accuracy of artificial intelligence for the detection of intracranial hemorrhage and chronic cerebral microbleeds: a systematic review and pooled analysis. Radiol Med. 2022;127(10):1106–23.
Article Google Scholar
Abe D, et al. A prehospital triage system to detect traumatic intracranial hemorrhage using machine learning algorithms. JAMA Netw Open. 2022;5(6): e2216393.
Article Google Scholar
Alis D, et al. A joint convolutional-recurrent neural network with an attention mechanism for detecting intracranial hemorrhage on noncontrast head CT. Sci Rep. 2022;12(1):2084.
Article Google Scholar
Altuve M, Pérez A. Intracerebral hemorrhage detection on computed tomography images using a residual neural network. Phys Med. 2022;99:113–9.
Article Google Scholar
Arbabshirani MR, et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit Med. 2018;1:9.
Article Google Scholar
Chang PD, et al. Hybrid 3D/2D convolutional neural network for hemorrhage evaluation on head CT. AJNR Am J Neuroradiol. 2018;39(9):1609–16.
Article Google Scholar
Chilamkurthy S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet. 2018;392(10162):2388–96.
Article Google Scholar
Cortes-Ferre L, et al. Deep Learning Applied to Intracranial Hemorrhage Detection. J Imaging. 2022;9(2):37. https://doi.org/10.3390/jimaging9020037.
Article Google Scholar
Danilov G, et al. Classification of Intracranial Hemorrhage Subtypes Using Deep Learning on CT Scans. Stud Health Technol Inform. 2020;272:370–3.
Google Scholar
Grewal M, et al. Radnet: Radiologist level accuracy using deep learning for hemorrhage detection in ct scans in 2018 IEEE 15th International symposium on biomedical imaging (ISBI 2018). IEEE. 2018. https://doi.org/10.48550/arXiv.1710.04934.
Article Google Scholar
Hopkins BS, et al. Mass deployment of deep neural network: real-time proof of concept with screening of intracranial hemorrhage using an open data set. Neurosurgery. 2022;90(4):383–9.
Article Google Scholar
Kau T, et al. FDA-approved deep learning software application versus radiologists with different levels of expertise: detection of intracranial hemorrhage in a retrospective single-center study. Neuroradiology. 2022;64(5):981–90.
Article Google Scholar
Kumaravel P, et al. A simplified framework for the detection of intracranial hemorrhage in CT brain images using deep learning. Curr Med Imaging. 2021;17(10):1226–36.
Article Google Scholar
Kuo W, et al. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc Natl Acad Sci U S A. 2019;116(45):22737–45.
Article Google Scholar
Lee H, et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat Biomed Eng. 2019;3(3):173–82.
Article Google Scholar
Majumdar A, et al. Detecting Intracranial Hemorrhage with Deep Learning. Annu Int Conf IEEE Eng Med Biol Soc. 2018;2018:583–7.
Google Scholar
McLouth J, et al. Validation of a deep learning tool in the detection of intracranial hemorrhage and large vessel occlusion. Front Neurol. 2021;12:656112–656112.
Article Google Scholar
Phaphuangwittayakul A, et al. An optimal deep learning framework for multi-type hemorrhagic lesions detection and quantification in head CT images for traumatic brain injury. Appl Intell. 2022;52(7):7320–38.
Article Google Scholar
Rao BN, et al. Deep transfer learning for automatic prediction of hemorrhagic stroke on CT images. Comput Math Methods Med. 2022;2022:3560507.
Article Google Scholar
Salehinejad H, et al. A real-world demonstration of machine learning generalizability in the detection of intracranial hemorrhage on head computerized tomography. Sci Rep. 2021;11(1):17051–17051.
Article Google Scholar
Schmitt N, et al. Automated detection and segmentation of intracranial hemorrhage suspect hyperdensities in non-contrast-enhanced CT scans of acute stroke patients. Eur Radiol. 2022;32(4):2246–54.
Article Google Scholar
Seyam M, et al. Utilization of artificial intelligence-based intracranial hemorrhage detection on emergent noncontrast CT images in clinical workflow. Radiol Artif Intell. 2022;4(2): e210168.
Article Google Scholar
Tang Z, et al. Deep learning-based prediction of hematoma expansion using a single brain computed tomographic slice in patients with spontaneous intracerebral hemorrhages. World Neurosurg. 2022. https://doi.org/10.1016/j.wneu.2022.05.109.
Article Google Scholar
Tharek A, et al. Intracranial hemorrhage detection in CT scan using deep learning. Asian J Med Technol. 2022;2(1):1–18.
Article Google Scholar
Trevisi G, et al. Machine learning model prediction of 6-month functional outcome in elderly patients with intracerebral hemorrhage. Neurosurg Rev. 2022. https://doi.org/10.1007/s10143-022-01802-7.
Article Google Scholar
Uchida K, et al. Development of machine learning models to predict probabilities and types of stroke at prehospital stage: the Japan urgent stroke triage score using machine learning (JUST-ML). Transl Stroke Res. 2022;13(3):370–81.
Article Google Scholar
Voter AF, et al. Diagnostic accuracy and failure mode analysis of a deep learning algorithm for the detection of intracranial hemorrhage. J Am Coll Radiol. 2021;18(8):1143–52.
Article Google Scholar
Wang X, et al. A deep learning algorithm for automatic detection and classification of acute intracranial hemorrhages in head CT scans. NeuroImage Clinical. 2021;32:102785–102785.
Article Google Scholar
Xu J, et al. Deep network for the automatic segmentation and quantification of intracranial hemorrhage on CT. Front Neurosci. 2021;14:541817–541817.
Article Google Scholar
Xu X, et al. Prognostic prediction of hypertensive intracerebral hemorrhage using CT radiomics and machine learning. Brain and behavior. 2021;11(5):e02085–e02085.
Article Google Scholar
Ye H, et al. Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. Eur Radiol. 2019;29(11):6191–201.
Article MathSciNet Google Scholar
Zhou Q, et al. Transfer learning of the ResNet-18 and DenseNet-121 model used to diagnose intracranial hemorrhage in CT scanning. Curr Pharm Des. 2022;28(4):287–95.
Article Google Scholar
Neves G, et al. External validation of an artificial intelligence device for intracranial hemorrhage detection. World Neurosurg. 2023;173:e800–7.
Article Google Scholar
Abrigo JM, et al. Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong. Hong Kong Med J. 2023;29(2):112–20.
Article Google Scholar
O’Neill TJ, et al. Active reprioritization of the reading worklist using artificial intelligence has a beneficial effect on the turnaround time for interpretation of head CT with intracranial hemorrhage. Radiol Artif Intell. 2021;3(2): e200024.
Article Google Scholar
Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125(7):605–13.
Article Google Scholar
Liu X, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1(6):e271–97.
Article Google Scholar
Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the medical literature III. How to use an article about a diagnostic test B. What are the results and will they help me in caring for my patients? The evidence-based medicine working group. JAMA. 1994;271(9):703–7.
Article Google Scholar
Goutte C, Gaussier E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. in European conference on information retrieval. Berlin: Springer; 2005.
Google Scholar
Daugaard Jorgensen M, et al. Convolutional neural network performance compared to radiologists in detecting intracranial hemorrhage from brain computed tomography: A systematic review and meta-analysis. Eur J Radiol. 2022;146: 110073.
Article Google Scholar
Raghu M, et al. Transfusion: Understanding transfer learning for medical imaging. Adv Neural Inf Process Syst. 2019. https://doi.org/10.4855/arXiv.1902.07208.
Article Google Scholar
Singh SP, et al. 3D deep learning on medical images: a review. Sensors. 2020;20(18):5097.
Article Google Scholar
Samek W, et al. Evaluating the visualization of what a deep neural network has learned. IEEE Trans Neural Netw Learn Syst. 2017;28(11):2660–73.
Article MathSciNet Google Scholar
Barrett JF, Keat N. Artifacts in CT: recognition and avoidance. Radiographics. 2004;24(6):1679–91.
Article Google Scholar
Nagendran M, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368: m689.
Article Google Scholar
McInnes MDF, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–96.
Article Google Scholar
Cronin P, et al. How to Perform a Systematic Review and Meta-analysis of Diagnostic Imaging Studies. Acad Radiol. 2018;25(5):573–93.
Article Google Scholar
Manikandan R, Dorairajan LN. How to appraise a diagnostic test. Indian J Urol. 2011;27(4):513–9.
Article Google Scholar
Shim SR, Kim SJ, Lee J. Diagnostic test accuracy: application and practice using R software. Epidemiol Health. 2019;41: e2019007.
Article Google Scholar
Doebler P, Holling H. Meta-analysis of diagnostic accuracy with mada. R Packag. 2015;1:15.
MATH Google Scholar
Guo J, Riebler A. meta4diag: Bayesian bivariate meta-analysis of diagnostic test studies for routine practice. arXiv Prepr. 2015. https://doi.org/10.48550/arXiv.1512.06220.
Article Google Scholar

Download references

Funding

Not applicable.

Author information

Authors and Affiliations

Medical Doctor (MD), School of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
Masoud Maghami & Pegah Panahi
Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Shahab Aldin Sattari
Department of Medical Imaging and Radiation Sciences, School of Allied Medical Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
Marziyeh Tahmasbi
Department of Emergency Medicine, School of Medicine, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
Javad Mozafari
Independent Medical Imaging Researcher, Tehran, Iran
Kiarash Shirbandi
Department of Radiology, Resident (MD), EUREGIO-KLINIK Albert-Schweitzer-Straße GmbH, Nordhorn, Germany
Javad Mozafari

Authors

Masoud Maghami
View author publications
You can also search for this author in PubMed Google Scholar
Shahab Aldin Sattari
View author publications
You can also search for this author in PubMed Google Scholar
Marziyeh Tahmasbi
View author publications
You can also search for this author in PubMed Google Scholar
Pegah Panahi
View author publications
You can also search for this author in PubMed Google Scholar
Javad Mozafari
View author publications
You can also search for this author in PubMed Google Scholar
Kiarash Shirbandi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

KSH, MM, SHS, MT, and PP: designed data collection tools, monitored data collection, wrote the statistical analysis plan, cleaned and analyzed the data, and drafted and revised the paper. KSH, JM, and MM: wrote the statistical analysis plan, and cleaned and analyzed the data. KSH, MM, SH.S, MT, and PP: implemented the study, analyzed the data, drafted, and revised the paper.

Corresponding author

Correspondence to Kiarash Shirbandi.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Univariate sub-group analysis of DOR with random model based on retrospective studies. Figure S2. Univariate sub-group analysis of DOR with random model based on prospective studies. Figure S3. Univariate sub-group analysis of specificity with random model based on Network Architecture. G represents sub-group analysis of data, when g = 0 (CNN), g = 1 (ResNet), g = 2 (RF), and g = 3 (SVM). Figure S4. Univariate sub-group analysis of sensitivity with random model based on Network Architecture. G represents sub-group analysis of data, when g = 0 (CNN), g = 1 (ResNet), g = 2 (RF), and g = 3 (SVM). Figure S5. Univariate sub-group analysis of DOR with random model based on Network Architecture. G represents sub-group analysis of data, when g = 0 (CNN), g = 1 (ResNet), g = 2 (RF), and g = 3 (SVM). Figure S6. Univariate sub-group analysis of specificity with random model based on ICH types. G represents sub-group analysis of data, when g = 0 (EDH), g = 1 (SDH), g = 2 (IPH), g = 3 (IVH), g = 4 (SAH), and g = 5 (CPH). Figure S7. Univariate sub-group analysis of DOR with random model based on ICH types. G represents sub-group analysis of data, when g = 0 (EDH), g = 1 (SDH), g = 2 (IPH), g = 3 (IVH), g = 4 (SAH), and g = 5 (CPH). Figure S8. Univariate sub-group analysis of sensitivity with random model based on ICH types. G represents sub-group analysis of data, when g = 0 (EDH), g = 1 (SDH), g = 2 (IPH), g = 3 (IVH), g = 4 (SAH), and g = 5 (CPH). Figure S9. Univariate sub-group analysis of sensitivity with random model based on single or multiple center. G represents sub-group analysis of data, when g = 0 (Single), and g = 1 (Multiple). Figure S10. Univariate sub-group analysis of specificity with random model based on single or multiple center. G represents sub-group analysis of data, when g = 0 (Single), and g = 1 (Multiple). Figure S11. Univariate sub-group analysis of DOR with random model based on single or multiple center. G represents sub-group analysis of data, when g = 0 (Single), and g = 1 (Multiple). Figure S12. Univariate sub-group analysis of sensitivity with random model based on benchmark or real-time data. G represents sub-group analysis of data, when g = 0 (benchmark), and g = 1 (real-time data). Figure S13. Univariate sub-group analysis of specificity with random model based on benchmark or real-time data. G represents sub-group analysis of data, when g = 0 (benchmark), and g = 1 (real-time data). Figure S14. Univariate sub-group analysis of DOR with random model based on benchmark or real-time data. G represents sub-group analysis of data, when g = 0 (benchmark), and g = 1 (real-time data).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Maghami, M., Sattari, S.A., Tahmasbi, M. et al. Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study. BioMed Eng OnLine 22, 114 (2023). https://doi.org/10.1186/s12938-023-01172-1

Download citation

Received: 20 July 2023
Accepted: 17 November 2023
Published: 04 December 2023
DOI: https://doi.org/10.1186/s12938-023-01172-1

Diagnostic test accuracy of machine learning algorithms for the detection intracranial hemorrhage: a systematic review and meta-analysis study

Abstract

Background

Methods

Results

Conclusion

Background

Results

Study selection & characteristics

Risk of bias

Diagnostic test accuracy (DTA) of all included studies

Retrospective studies

Prospective studies

DTA Based on network architecture

DTA based on ICH types

DTA based on data sources

Discussion

Limitations

Conclusion

Methods

Protocol and registration

Eligibility criteria

Information sources

Search strategy

Summary measures

Risk of bias across studies

Additional analyses

Availability of data and materials

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher's Note

Supplementary Information

Additional file 1: Figure S1.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BioMedical Engineering OnLine

Contact us