Assessment of inter-examiner agreement and variability in the manual classification of auditory brainstem response
- Kheline FP Naves†1,
- Adriano A Pereira†1,
- Slawomir J Nasuto†2,
- Ieda PC Russo†3 and
- Adriano O Andrade†1
© Naves et al.; licensee BioMed Central Ltd. 2012
Received: 11 June 2011
Accepted: 10 October 2012
Published: 22 November 2012
The analysis of the Auditory Brainstem Response (ABR) is of fundamental importance to the investigation of the auditory system behaviour, though its interpretation has a subjective nature because of the manual process employed in its study and the clinical experience required for its analysis. When analysing the ABR, clinicians are often interested in the identification of ABR signal components referred to as Jewett waves. In particular, the detection and study of the time when these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. Significant differences in inter-examiner results may lead to completely distinct clinical interpretations of the state of the auditory system. In this context, the aim of this research was to evaluate the inter-examiner agreement and variability in the manual classification of ABR.
A total of 160 ABR data samples were collected at four different stimulus intensities (80 dBHL, 60 dBHL, 40 dBHL and 20 dBHL) from 10 normal-hearing subjects (5 men and 5 women, aged 20 to 52 years). Four examiners with expertise in the manual classification of ABR components participated in the study. The Bland-Altman statistical method was employed for the assessment of inter-examiner agreement and variability. The mean, standard deviation and error for the bias, which is the difference between examiners’ annotations, were estimated for each pair of examiners. Scatter plots and histograms were employed for data visualization and analysis.
In most comparisons the differences between examiners’ annotations were below 0.1 ms, which is clinically acceptable. In four cases, a large error and standard deviation (>0.1 ms) were found, indicating the presence of outliers and thus discrepancies between examiners.
Our results quantify the inter-examiner agreement and variability of the manual analysis of ABR data, and they also allow for the determination of different patterns of manual ABR analysis.
Keywords: Auditory brainstem response, ABR classification, Inter-examiner variability
The study of the Auditory Brainstem Response (ABR) is an important tool for the evaluation of the auditory capacity and plasticity, as well as for the investigation of the integrity of the structures involved in the transmission of electrical impulses through the auditory system [1–3]. The classical process of analysis of the ABR consists in the identification of relevant temporal and morphological features of the Jewett waves, which are basic components of the ABR. The waves I, III and V are characterized by presenting the most evident positive peaks in the whole signal, and they are usually employed for the evaluation of the integrity of the auditory pathway [4–6].
When the objective of the ABR exam is the investigation of electro-physiological thresholds, wave V is the most relevant, as it remains evident in the signal even at low stimulus intensity (e.g., 20 dB). Currently, ABR analysis can be employed in distinct contexts. For instance, it can be used for the determination of electro-physiological thresholds in children, diagnosis of neural dysfunctions [1, 7], intra-operative monitoring, cardiac surgery, staging of coma, detection of degenerative diseases that produce hearing impairment, and in the diagnosis of auditory disorders that cannot be identified by tonal audiometry (e.g., in some motor deficiencies).
The most common use of ABR analysis in clinical practice is the diagnosis of early hearing loss, particularly in newborns and children. According to the World Health Organization (WHO), 1.4 million children worldwide suffer from hearing problems. Olusanya et al. reported that 855 babies are born every day in developing countries with hearing loss and with little expectation of being diagnosed. A late diagnosis may hamper the patient's cognitive development and language skills, consequently delaying learning and emotional development [11, 12]. Another relevant application of ABR analysis is in the identification of diseases in the auditory nerve, such as tumor (schwannoma), neuropathy, dys-synchrony and degenerative diseases affecting the brainstem.
In most clinical situations, the ABR waves are identified through a manual assessment. The identification of the ABR components depends on many variables, such as the experimental protocol employed, the clinical condition of the subject and, more importantly, the previous experience of the examiner. The manual analysis of the ABR yields inconsistency in the results obtained by distinct examiners [13–15]. This makes the identification of the Jewett waves prone to error and can contribute to the erroneous diagnosis of some diseases. The consequences of an imprecise diagnosis are numerous; for instance, it may lead to inadequate treatment or even delay the discovery of a serious illness.
In this context, given the importance of the ABR analysis and the subjective nature of its interpretation, the main objective of this study was to evaluate the inter-examiner agreement and variability in the manual classification of ABR. The examiners focused their analysis on classical features (i.e., temporal and morphological) manually extracted from the signal, as it is practiced in the clinical routine.
The results of this study quantify the variability found in the responses given by the examiners. Such results can be useful for highlighting the necessity of continuing training and standardization of procedures used for the interpretation of the ABR in the clinical practice. In the future, they can also be employed in the development of more accurate intelligent algorithms used for the automatic detection of the ABR waves.
Experience in years for each examiner
ABR data were collected with a commercial amplifier, the Evoked Potential System (EP) from Bio-Logic, USA. Prior to the positioning of electrodes on the scalp of the subject, the skin was properly cleansed and abraded. The electrodes were positioned according to the International 10–20 System proposed by Jasper in 1958, at M1 (right mastoid), M2 (left mastoid), Cz (active) and Fz (ground). Two differential channels were recorded: Channel 1 (M1-Cz), representing information detected from the right ear, and Channel 2 (M2-Cz), from the left ear.
The signals were collected at a sample rate of 37,101 Hz, meaning that the time interval between two consecutive samples was 0.027 ms. Each signal, resulting from an auditory stimulus, lasted 13.824 ms (or 512 samples). In this study we work with the averaged ABR, which is obtained by averaging 2000 such responses. This process can be seen as a filter that reduces background activity and highlights the signal of interest. Click stimuli were presented at intensities of 80, 60, 40 and 20 dBHL to each ear. The stimulus rate was set to 21 cycles/s, as commonly used in clinical practice. This procedure was repeated twice, resulting in 160 ABR samples.
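The acquisition figures above can be checked with a few lines of arithmetic, and the noise-reducing effect of averaging can be illustrated on synthetic data. The following is a minimal sketch, not the authors' processing chain: the waveform `template`, the noise level `NOISE_SD` and the helper `averaged_response` are hypothetical names and values chosen only for illustration, and the subject, ear, intensity and repetition counts are taken from the text.

```python
import math
import random
import statistics

# Values stated in the text.
SAMPLE_RATE_HZ = 37_101
SAMPLES_PER_SWEEP = 512
SWEEPS_PER_AVERAGE = 2000

sample_interval_ms = 1000.0 / SAMPLE_RATE_HZ                # about 0.027 ms
sweep_duration_ms = SAMPLES_PER_SWEEP * sample_interval_ms  # about 13.8 ms

# 10 subjects x 2 ears x 4 intensities x 2 repetitions
n_recordings = 10 * 2 * 4 * 2


def averaged_response(template, n_sweeps, noise_sd, rng):
    """Average n_sweeps noisy copies of template, point by point."""
    acc = [0.0] * len(template)
    for _ in range(n_sweeps):
        for i, value in enumerate(template):
            acc[i] += value + rng.gauss(0.0, noise_sd)
    return [total / n_sweeps for total in acc]


rng = random.Random(0)
# Hypothetical toy waveform (one smooth bump); NOT a physiological ABR model.
template = [math.exp(-((i - 200) / 20.0) ** 2) for i in range(SAMPLES_PER_SWEEP)]
NOISE_SD = 5.0  # hypothetical background noise, arbitrary units

averaged = averaged_response(template, SWEEPS_PER_AVERAGE, NOISE_SD, rng)
residual = [a - t for a, t in zip(averaged, template)]
residual_sd = statistics.pstdev(residual)
# For independent noise, averaging N sweeps shrinks the noise SD by sqrt(N).
expected_sd = NOISE_SD / math.sqrt(SWEEPS_PER_AVERAGE)

print(f"sample interval: {sample_interval_ms:.4f} ms")
print(f"sweep duration:  {sweep_duration_ms:.3f} ms")
print(f"recordings:      {n_recordings}")
print(f"residual noise SD {residual_sd:.3f} vs expected {expected_sd:.3f}")
```

Using the unrounded sampling interval gives a sweep duration of about 13.80 ms; the 13.824 ms quoted in the text corresponds to 512 samples at the rounded interval of 0.027 ms.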
The examiners followed their individual criteria and professional experience, and their analyses consisted of the manual classification of waves I, II, III, IV and V. The results of this classification were the identification of the peak of the wave (amplitude) and its corresponding time of occurrence. Based on these results it was possible to estimate the inter-examiner agreement and variability by using the Bland-Altman statistical method. This method has been cited in more than 11,500 research studies, highlighting its relevance in medical research.
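To make the Bland-Altman quantities concrete, the sketch below computes the bias (mean of the paired differences), its standard deviation, an error term and the conventional 95% limits of agreement for one pair of examiners. The latency values and the function name `bland_altman` are hypothetical. The error is taken here to be the standard error of the mean difference, which is an assumption, since the article's own definition of the error is not reproduced in this text.

```python
import math
import statistics


def bland_altman(latencies_a, latencies_b):
    """Bland-Altman summary for paired annotations from two examiners (ms)."""
    diffs = [a - b for a, b in zip(latencies_a, latencies_b)]
    n = len(diffs)
    bias = statistics.mean(diffs)   # mean difference between examiners
    sd = statistics.stdev(diffs)    # sample SD of the differences
    # Assumption: "error" is the standard error of the mean difference.
    error = sd / math.sqrt(n)
    # Conventional 95% limits of agreement: bias +/- 1.96 SD.
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return {"n": n, "bias": bias, "sd": sd, "error": error, "limits": limits}


# Hypothetical wave V latencies (ms) annotated by two examiners.
examiner_1 = [5.53, 5.60, 5.48, 5.72, 5.55]
examiner_2 = [5.50, 5.62, 5.51, 5.70, 5.58]

summary = bland_altman(examiner_1, examiner_2)
print(summary)
```

In a Bland-Altman plot, each difference is drawn against the mean of the two annotations, with horizontal lines at the bias and at the two limits of agreement.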
Data consistency analysis
The results include the analysis of 160 ABR samples. In the graph, the shaded areas span the minimum and maximum latency values obtained for each wave and intensity. In addition, the standard deviation of the samples is presented together with a central tendency (i.e., the mean) and its 95% confidence interval estimated by means of the bootstrap.
The visual inspection of the graph reveals that the latency increases as the intensity decreases. This behavior is in accordance with findings reported in the literature, which discusses the differences in the ABR patterns as a function of the intensity [1, 18–20]. Another relevant observation is that at the 80 dBHL intensity, the ABR signal has a relatively high signal-to-noise ratio, which allows for a more precise evaluation of the waves, as they are more evident. For this reason, at high intensity the latency is an important discriminatory feature of the Jewett waves. Note in the graph that at this intensity there is no overlap between the shaded areas and the central tendencies of the waves.
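A bootstrap confidence interval for the mean latency can be obtained by resampling the annotations with replacement and reading off percentiles of the resampled means. The sketch below uses the percentile bootstrap, which is one common variant and is assumed here, since the text does not state which variant was used. The latency values, the number of resamples and the function name `bootstrap_mean_ci` are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Mean of each resample drawn with replacement, sorted for percentiles.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical wave V latencies (ms) at one stimulus intensity.
latencies = [5.48, 5.50, 5.53, 5.55, 5.58, 5.60, 5.62, 5.51, 5.70, 5.72]

ci_low, ci_high = bootstrap_mean_ci(latencies)
print(f"mean = {statistics.fmean(latencies):.3f} ms, "
      f"95% CI = [{ci_low:.3f}, {ci_high:.3f}] ms")
```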
However, as we decrease the intensity, the visual detection of some waves is impaired. For instance, the examiners could not visually detect the presence of waves I and II at the 20 dBHL intensity. Wave III is more evident at the intensities of 80, 60 and 40 dBHL. At 20 dBHL the number of detections was significantly smaller. Waves IV and V remain evident at all intensities, but they tend to overlap at 20 dBHL, as their detection becomes more difficult (because the signal amplitude tends to decrease at this intensity). The number of detections is significantly lower at low intensity. This happens because of the way the neurons are activated by low stimulus intensity.
In general, waves I, II and III are less evident at lower intensity, unlike waves IV and V, which are evident even at low intensity and are therefore employed in auditory threshold detection studies.
The experimental results given in Figure 1 are in accordance with those found in literature [1, 19, 21, 22], showing, therefore, the consistency of our data set and the visual detection of the Jewett waves executed by the examiners.
Inter-examiner agreement and variability analysis
The mean, standard deviation and error (see (1)) of the bias are presented for each pair of examiners.
Discussion and conclusion
The main objective of this study was to investigate the inter-examiner agreement and variability in the manual analysis of ABR provided by four seasoned examiners.
The motivation for this research comes from our own clinical experience, which has shown that subjectivity and lack of standards in the interpretation of ABR are common and can lead to erroneous and inaccurate diagnosis of disorders that affect the auditory system. This subjectivity is also reported in many published research works [13, 23].
The first stage of our analysis was to verify whether the latency values obtained by the examiners were compatible with those reported in the literature. The results presented in Figure 1 depict all information provided by the examiners.
They are consistent with patterns described in other studies. For the intensity of 80 dBHL we obtained the following mean values for the Jewett waves: 1.56 ms (wave I), 3.77 ms (wave III) and 5.53 ms (wave V). Antonelli reported that the normal average values of latency at the 100 dBSPL (Sound Pressure Level) intensity for waves I, III and V are respectively equal to 1.54 ms, 3.73 ms and 5.52 ms. Hernandez evaluated the behavior of waves generated at different stimulus intensities. At the intensities of 90, 70, 50, 30 and 10 dBHL wave V was always detected, and the average latency values were 1.49 ms, 3.73 ms and 5.53 ms for waves I, III and V, respectively. These results indicate the coherence of the manual analysis provided by the examiners in this research.
Another problem we had to face in our analysis was the establishment of acceptable threshold levels for the variation of the latency of Jewett waves. There is some disagreement in the literature, as some authors report a variation of 0.1 ms as acceptable, whereas others report 0.2 ms [1, 5, 13, 21, 23]. In addition, some studies concerning the development of automatic systems for the detection of Jewett waves have considered latency differences between 0.1 ms and 0.2 ms as acceptable for the validation of these systems [24–27].
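Because the literature reports both 0.1 ms and 0.2 ms as acceptable, it can help to evaluate a set of inter-examiner differences against both tolerances at once. The sketch below reports the fraction of differences inside each tolerance and lists the annotations that exceed it. The difference values and the function names `fraction_within` and `flag_outliers` are hypothetical and serve only to illustrate the check.

```python
def fraction_within(differences_ms, tolerance_ms):
    """Fraction of latency differences whose magnitude is within tolerance."""
    inside = sum(abs(d) <= tolerance_ms for d in differences_ms)
    return inside / len(differences_ms)


def flag_outliers(differences_ms, tolerance_ms):
    """Return (index, difference) pairs whose magnitude exceeds tolerance."""
    return [(i, d) for i, d in enumerate(differences_ms) if abs(d) > tolerance_ms]


# Hypothetical differences (ms) between two examiners for one wave.
differences = [0.03, -0.05, 0.00, 0.11, -0.08, 0.80, 0.02, -0.15]

for tolerance in (0.1, 0.2):
    share = fraction_within(differences, tolerance)
    outliers = flag_outliers(differences, tolerance)
    print(f"tolerance {tolerance} ms: {share:.3f} within, outliers {outliers}")
```

With these illustrative values, five of the eight differences fall within 0.1 ms and seven within 0.2 ms, while a single annotation differing by 0.80 ms exceeds both tolerances.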
In our study, even though the application of the statistical paired t-test did not confirm a null bias for all pairs of examiners and waves, the analysis of the mean, standard deviation and error showed that the discrepancies between examiners are below those considered clinically acceptable (see Figures 3 and 5).
The analysis of the error is important because it allows for the detection of outliers. For instance, the large error (0.014) found for examiners E3 and E4, in comparison to other results, reveals the presence of an outlier (0.8 ms), which is not clinically acceptable (see Figures 2 and 3). Large errors like this may lead to misdiagnosis, which can have serious consequences for patients.
Besides the error it is also important to assess the standard deviation, as it is a measure of data variability. For instance, for wave IV, for examiners E1 and E3 and also for examiners E2 and E3, the standard deviation is relatively large (>0.1 ms, see Table 2), which is also indicative of discrepancies between examiners.
In general, when considering the variables involved in the process of ABR analysis, such as subjectivity and the number of years of experience of examiners, our results showed that there is consistency between the annotations provided by the examiners. In most comparisons the variability found in the results was not clinically relevant, since it was below 0.1 ms, though a more detailed study of the cases that presented large error and standard deviation suggested relevant discrepancies (e.g., outliers) between examiners.
A relevant finding of the study was that the experience in years in ABR analysis was not a determinant criterion for agreement between examiners. In our investigations examiners with different experience showed compatible results, as can be seen in Tables 1 and 2. However, the largest disagreements between examiners’ annotations (see Table 1, cells marked with ‘§’) involved examiner E3, who is the least seasoned examiner. This may suggest that this examiner needs further training in ABR analysis.
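The distinction drawn above, between a bias that is statistically different from zero and one that is clinically relevant, can be illustrated with a paired t statistic computed by hand. The sketch below uses hypothetical latencies and a hypothetical function name `paired_t_statistic`. It returns only the statistic and the degrees of freedom; turning these into a p-value requires the t distribution (for example via `scipy.stats.ttest_rel`, which is not used here to keep the example free of third-party dependencies).

```python
import math
import statistics


def paired_t_statistic(latencies_a, latencies_b):
    """Paired t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(latencies_a, latencies_b)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return bias / standard_error, n - 1, bias


# Hypothetical wave I latencies (ms) from two examiners on the same recordings.
examiner_a = [1.56, 1.60, 1.52, 1.58, 1.55, 1.61]
examiner_b = [1.54, 1.57, 1.51, 1.55, 1.54, 1.57]

t_value, dof, bias = paired_t_statistic(examiner_a, examiner_b)
print(f"t = {t_value:.2f} with {dof} degrees of freedom, bias = {bias:.3f} ms")
```

In this illustrative case the differences are small but consistently of the same sign, so the t statistic is large even though the bias is only about 0.02 ms, well below the 0.1 ms tolerance discussed in the text.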
Occasional large differences between examiners may happen due to many factors: (i) misdetection of the peak of the ABR wave during the manual process of data analysis; (ii) lack of standardization during the process of peak identification; and (iii) introduction of error during the annotation procedure which involves transferring data from the computer screen to a spreadsheet.
The main contributions of this research were: (i) determination of patterns of manual annotations, for different stimulus intensity and waves, for a specific group of examiners; (ii) the proposal of a method capable of detecting examiners that have different patterns of ABR analysis; (iii) the possibility of applying the results to the development and evaluation of automatic systems for detecting ABR waves.
The authors would like to thank the Brazilian government for the financial support for this research, in particular the Foundation for Research Support of the State of Minas Gerais (FAPEMIG), The National Council for Scientific and Technological Development (CNPq) and the Coordination for the Improvement of Higher Education Personnel (CAPES).
- Hood LJ: Clinical Applications of the Auditory Brainstem Response. San Diego: Singular Publishing Group Inc.; 1998.
- Nodarse EM, Abalo MCP, López GS: Métodos de pesquisaje de las pérdidas auditivas a edades tempranas. Revista Electrónica de Audiologia 2006, 3: 9–18.
- Eggermont JJ: Electric and Magnetic Fields of Synchronous Neural Activity. In Auditory Evoked Potentials: Basic Principles and Clinical Application. Edited by: Burkard RF, Eggermont JJ, Don M. Baltimore: Lippincott Williams & Wilkins; 2007:2–21.
- Hall JW: New Handbook of Auditory Evoked Responses. Boston: Pearson Education, Inc.; 2006.
- Misulis KE: Potencial Evocado de Spehlmann. 2nd edition. Rio de Janeiro, Brazil: Revinter Ltda; 2003.
- Schwanke D: Exame de Potenciais Evocados Auditivos Utilizando Processador Digital de Sinais - DSPEA. Dissertação de Mestrado. Porto Alegre, Brazil: Universidade Federal do Rio Grande do Sul, Instituto de Informática; 2000.
- Sininger YS: Source Analysis of Auditory Evoked Potentials and Fields. In The use of Auditory Brainstem Response in Screening for Hearing Loss and Audiometric Threshold Prediction. Edited by: Burkard RF, Eggermont JJ, Don M. Baltimore: Lippincott Williams & Wilkins; 2007:254–274.
- Martin WH, Shi BYB: Intraoperative monitoring. In Auditory Evoked Potentials: Basic Principles and Clinical Application. Edited by: Burkard RF, Eggermont JJ, Don M. Philadelphia: Lippincott Williams & Wilkins; 2007:355–384.
- Katz J: Audiologia clínica. 3rd edition. New York: Manole; 1989.
- Garcia BG, Gaffney C, Chacon S, Gaffney M: Overview of newborn hearing screening activities in Latin America. Rev Panam Salud Publica 2011, 29: 145–152.
- Chomsky N: Three factors in language design. Linguistic Inquiry 2005, 36: 1–22. 10.1162/0024389052993655
- Fitch WT, Hauser MD, Chomsky N: The evolution of the language faculty: clarifications and implications. Cognition 2005, 97: 179–210. 10.1016/j.cognition.2005.02.005
- Vidler M, Parker D: Auditory brainstem response threshold estimation: subjective threshold estimation by experienced clinicians in a computer simulation of the clinical test. Int J Audiol 2004, 43: 417–429. 10.1080/14992020400050053
- American Academy of Pediatrics: Newborn and infant hearing loss: detection and intervention. Pediatrics 1999, 103: 527–530.
- Junqueira CAO, Colafêmina JF: Investigação da estabilidade inter e intra-examinador na identificação do P300 auditivo: análise de erros. Rev Bras Otorrinolaringol 2002, 68: 468–478. 10.1590/S0034-72992002000400004
- Fernandez R, George F: Validating the Bland-Altman Method of Agreement. Western Users of SAS Software; Long Beach California: Long Beach; 2012. [http://www.wuss.org/proceedings09/09WUSSProceedings/papers/pos/POS-Fernandez.pdf]
- Porto MAA, Azevedo MF, Gil D: Auditory evoked potentials in premature and full-term infants. Braz J Otorhinolaryngol 2011, 77: 622–627.
- Don M, Ponton CW, Eggermont JJ, Kwong B: The effects of sensory hearing loss on cochlear filter times estimated from auditory brainstem response latencies. Acoustical Society of America 1998, 104: 2280–2289. 10.1121/1.423741
- Hernández JD, Castro FZ, Prat JJB: Normalización de los potenciales evocados auditivos del tronco cerebral I: resultados en una muestra de adultos normoyentes. Revista Electrónica de Audiologia 2003, 2: 13–18.
- Cd M, Manjón M, Vinuales M, Menéndez C: Estúdio morfológico de los potenciales evocados auditivos de tronco Del encéfalo. Influência de la posición Del eletrodo de referência. Rev Neurol 2002, 34: 84–88.
- Vannier E, Adam O, Motsch J-F: Objective detection of brainstem auditory evoked potentials with a priori information from higher presentation levels. Artif Intell Med 2002, 25: 283–301. 10.1016/S0933-3657(02)00029-5
- Antonelli AR, Bellotto R, Grandori F: Audiologic diagnosis of central versus eighth nerve and cochlear auditory impairment. Audiology 1987, 4: 209–226.
- Don M: Quantitative approaches for defining the quality and threshold of auditory brainstem responses. IEEE Engineering In Medicine & Biology Society 1989, 2: 0761–0762.
- Jacquin A, Causevic E, John ER, Prichep LS: Optimal denoising of brainstem auditory evoked response (BAER) for Automatic peak identification and brainstem assessment. IEEE; 2006:1723–1726.
- Acır N, Özdamar Ö, Güzeliş C: Automatic classification of auditory brainstem responses using SVM-based feature selection algorithm for threshold detection. Eng Appl Artif Intell 2006, 19: 209–218. 10.1016/j.engappai.2005.08.004
- Boston JR: Automated interpretation of brainstem auditory evoked potentials: a prototype system. IEEE Trans Biomed Eng 1989, 36: 528–532. 10.1109/10.24254
- Bradley AP, Wilson WJ: On wavelet analysis of auditory evoked potentials. Clin Neurophysiol 2004, 115: 1114–1128. 10.1016/j.clinph.2003.11.016
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.