Most heart diseases are associated with and reflected by the sounds that the heart produces. Heart auscultation, defined as
listening to the heart sound, has been a very important method for the early diagnosis of cardiac dysfunction. Traditional auscultation requires substantial clinical experience and good listening skills. The emergence of the electronic stethoscope has paved the way for a new field of computer-aided auscultation. This article provides an in-depth study of (1) the electronic stethoscope technology, and (2) the methodology for diagnosis of cardiac disorders based on computer-aided auscultation. The paper is based on a comprehensive review of (1) literature articles, (2) market (state-of-the-art) products, and (3) smartphone stethoscope apps. It covers in depth every key component of the computer-aided system with electronic stethoscope, from sensor design, front-end circuitry, denoising algorithm, heart sound segmentation, to the final machine learning techniques. Our intent is to provide an informative and illustrative presentation of the electronic stethoscope, which is valuable and beneficial to academics, researchers and engineers in the technical field, as well as to medical professionals to facilitate its use clinically. The paper provides the technological and medical basis for the development and commercialization of a real-time integrated heart sound detection, acquisition and quantification system.
Heart disease is the leading cause of death in most countries in the world. In 2012, cardiovascular diseases killed 17.5 million people, i.e., three in every ten deaths . The valvular heart disease (VHD) is one typical type of cardiovascular disease causing significant indisposition and adverse effect on the functionality and long term life of the patient . In the reduction of deaths from heart diseases (particularly VHD), diagnosis plays a vital role. The electrocardiogram (ECG) is a powerful and common screening tool for heart diseases. It is relatively inexpensive, non-invasive, and easy to use. However, it does have some limitations, one of which is the difficulty in detecting structural abnormalities in heart valves and defects characterized by heart murmurs . Nowadays, the magnetic and ultrasound scanner can take detailed and even moving images of the heart. Echocardiogram (echo) uses the principle of rebounding waves to create a moving picture of the heart. It provides information about the size, shape, structure and function of the heart. Cardiac magnetic resonance imaging (MRI) uses radio waves, magnets and a computer to create pictures of the heart as it beats. The MRI test produces both still and moving pictures of the heart and major blood vessels . A computed tomography (CT) scan of the heart, on the other hand, is an imaging method that uses x-rays to create detailed pictures of the heart and its blood vessels. Recently, many advanced geometry processing techniques have been developed for the MRI and CT based 3D and 4D heart model reconstruction, to further understand and visualize what we are unable to obtain from static 2D images [5–8]. However, the major disadvantages of echo, MRI and CT are their high cost and the need of specialized personnel to operate the complex machines. These equipments are usually only affordable in large hospitals in the big cities. According to WHO , nearly 80% of deaths due to cardiovascular disease occur in low- and middle-income countries. The availability of the medical imaging equipments in these countries is quite low.
Therefore, it is very important to have a cost effective and accurate method for the early detection of cardiac illnesses. Heart auscultation, defined as listening and interpretation of the heart sound (HS), has been a very important method for the early diagnosis of heart diseases  by capturing abnormal HSs. A phonocardiogram (PCG) is a plot of high fidelity recording of the HS with the help of the machine called phonocardiograph. The HS and PCG are often used interchangeably in the literature. Throughout our study, unless otherwise stated, the term HS is used. Despite the advantages of low cost and easy operation, auscultation of the heart has been traditionally limited by three factors. Firstly, as the HS contains a mixture of high frequency (HF) and low frequency (LF) acoustic signals with low amplitude, it is highly required for the stethoscope (or the sensor used in the stethoscope) to have a high selective sensitivity. Secondly, the HS data recorded with the stethoscope is often corrupted with noise, which can prohibit accurate and effective diagnosis of heart diseases. Thirdly, the interpretation of the HS is very subjective, and it depends largely on the experience, skills, and hearing ability of the physician. Therefore, the need for advancement of heart auscultation is highly evident.
The emergence of electronic stethoscope has opened a new field named “computer-aided auscultation”. With the recent developments in the technology, from acoustic sensor design, advanced digital signal processing to the computer based machine learning techniques, the acoustic based automatic diagnosis of cardiac dysfunction by electronic stethoscope has attracted much attention in recent years. This paper provides a comprehensive review of literature articles, market (state-of-the-art) products, and smartphone stethoscope apps covering every key component of computer-aided cardiac dysfunction detection system using the electronic stethoscope. We believe that the information provided in this paper will be valuable and beneficial to not only the researchers and engineers in the technical field, but also to medical professionals.
This paper is organized as follows. The characteristics of the HS are first discussed. The paper subsequently gives the overall system design and the organization of the review. The detailed results of the review study are then presented, followed by discussions and recommendations. Finally concluding remarks are reported.
Characteristics of heart sounds
HSs are generated by the beating heart and the resultant flow of blood through it . In healthy adults, there are two normal HSs (as illustrated in Figure 1): the first HS (S1), produced by the closing of the atrioventricular valves; and the second HS (S2), caused by the closure of the semilunar valves. In the case of abnormal HS, there could be other several signal activities between S1 and S2 such as S3, S4, murmur, etc. The third HS (S3) is a rare extra sound caused by a sudden deceleration of blood flow into the left ventricle from the left atrium. This sound is normal in children and adults up to age 35–40 years. After the age of 40, a third HS is usually abnormal and correlates with dysfunction or volume overload of the ventricles . The fourth HS (S4) is caused by the vibration of valves, supporting structures and the ventricular walls. S4 is proved to be a sign or symptom of heart failure during diastolic period. In general, the frequency of S1 is lower than that of S2, and the duration of S1 is longer than that of S2. The S3 occurs from 0.1 to 0.2 s after S2, while S4 occurs from 0.07 to 0.1 s before S1—both of them are low pitched. In addition to these HSs, numerous heart murmurs may arise mainly from heart problems or diseases. The murmurs are extra or unusual sound heard during a heartbeat and broadly classified as systolic, diastolic and continuous [13, 14].
The complete types and characteristics of the HSs are listed in Figure 2. Each heart disorder is associated with one or more HSs. The disorders that are associated with each sound are also detailed in Figure 2. The summary of characteristics of some common murmurs is provided in Table 1.
In some miscellaneous HSs, the opening snap is a high-pitched diastolic sound produced by rapid opening of mitral valve in mitral stenosis (MS) or tricuspid valve in tricuspid stenosis (TS). The ejection sound (ES) is the most common early systolic sound which results from abnormal sudden halting of the semilunar cusps as they open during early systole. The mid-systolic click (MSC) is a HF sound in mid systole that results from the abrupt halting of prolapsing mitral valve leaflets’ excursion into the atrium by chordae .
During cardiac auscultation, the physicians are particularly interested in abnormal sounds, and various types of murmurs, which may suggest the presence of a cardiac pathology and also provide diagnostic information.
Figure 3 shows one example of the state-of-the-art electronic stethoscopes, which provides the choice of bell, diaphragm and wide mode to pick the right frequency for better body sound acquisition. It allows the physician to record the HS of their patients directly onto their PC or laptop for further visualization and analysis. With this capability, healthcare givers can have better understanding and interpretation of the HS for better-quality medical services. More detailed discussion on the commercially available electronic stethoscopes can be found in the later section.
Usually, there are three main modules, namely data acquisition module, pre-processing module and signal processing module, in the computer-based cardiac dysfunction detection system using electronic stethoscope, as displayed in the flow chart of Figure 4 . For the data acquisition module, an electronic stethoscope records the HS and the associated electronics converts it into digital signals and sends it to the pre-processing module. In the pre-processing module, the filtered and interference reduced HS signal is normalized and segmented. Feature extraction and classification are carried out in the signal processing module. The output of the system is the classification result for clinical diagnostic decision making. Detailed descriptions on the three main modules and their sub-modules are provided as follows.
Heart sound data acquisition module
The HS acquisition stage creates the digital HS data for further processing.
Electronic stethoscope sensor The HS signals are directly collected from patients by using an electronic stethoscope. Some commonly used transducers in the stethoscope are microphone, piezoelectric sensors, etc. The sound signals from the heart are converted to analog electrical signals.
Amplifier and filter Amplification and filtering are the two major aspects in any signal acquisition system. Usually, a pre-amplifier with a small gain is used to suppress the 50 or 60 Hz interference from power lines. An anti-aliasing filter is then employed to prevent aliasing effect. In some system designs, the filter section is built with a band pass filter circuit having the frequency range of most HS signals. The use of band pass filter with proper passband selection not only prevents aliasing, but also removes some of the noises outside the passband. In post amplification, the filtered signal is amplified to the level range required by the analog-to-digital converter.
Analog-to-digital converter The amplified and filtered analog signal is converted to digital signal by the analog-to-digital converter. The sampling frequency and bit resolution can be set by the system designer. Usually, a higher sampling rate and bit resolution will provide greater accuracy, at the cost of more bandwidth required and power consumption.
Heart sound pre-processing module
In this stage, the digital HS signal will undergo noise reduction, normalization and segmentation.
Signal denoising unit A digital filter is sometimes used to extract the signal within the frequency band of interest from the noisy data. In order to equip the system with even better denoising capability, some advanced artifacts removal techniques are generally utilized such that the output signal-to-noise ratio (SNR) can be further improved.
Normalization and segmentation In data acquisition, different sampling and acquisition locations normally result in a signal variation. Thus, the HS signals are normalized to a certain scale, so that the expected amplitude of the signal is not affected from the data acquisition locations and different samples. After getting the normalized signals, the HS signals are segmented into cycles which are ready for HS components detection and features extraction.
Heart sound signal processing module
The feature extraction and classification are conducted in this stage.
Feature extraction Signal processing is carried out to convert the raw data to some type of parametric representation. This parametric representation, called feature, is then used for further analysis and processing.
Classification A classifier, trained with the extracted features, is used to categorize the data and assist the medical specialist for clinical diagnostic decision making.
Therefore, the processing blocks (shown in Figure 4) form the core units of a computer-aided HS measurement and analysis system. Based on extensive study, we have found that the study for the automated detection of various heart pathological conditions and diseases from HS signal mainly focuses on three stages: (1) HS acquisition system and sensor design (2) denoising and segmentation of HS signals, and (3) appropriate feature extraction and automatic interpretation of HS.
Organization of review
The review process covers three main aspects, namely articles, state-of-the-art products and smartphone stethoscope apps review.
The purpose of the ‘article review’ is to provide in-depth summary and evaluation of the recent progress in computer-based HS analysis from the engineering perspective. For this purpose, relevant articles were initially identified from searches of various electronic resources, such as IEEE, Springer, Elsevier, PubMed and ACM digital library databases. The search was carried out by using the keyword “heart sound denoising” and “heart sound classification”. A selection criterion was finalized, and every article was selected according to the selection criteria. A total of 21 and 53 articles, respectively, were included in the final selection of articles for HS denoising and classification. Furthermore, some brief studies on the HS recording sensor and HS segmentation methods have also been done, and their summaries are provided for the completeness of the review covering all major components of the overall HS analyzing system.
The objective of ‘market and stethoscope apps review’ is to validate the needs, problems and gaps from the medical and market point of view as well as from a technology push perspective. The prior art search for market and smartphone stethoscope apps was performed by using the web search engines with the keyword “electronic stethoscope” and “stethoscope apps”, respectively. A total of eight products and five apps were selected for analysis and comparison.
The literature articles review results consist of four subsections:
Sensors for HS recording in data acquisition module
Heart sound denoising in pre-processing module
Heart sound segmentation in pre-processing module
Feature extraction and classificationof HS in signal processing module
Sensors for heart sound recording
There are generally two types of stethoscopes: the traditional acoustic stethoscope and electronic stethoscope. The acoustic stethoscope operates on the transmission of sound from the chest piece, via air-filled hollow tubes, to the listener’s ears. The chest piece usually consists of two parts that can be placed against the patient for sensing sound: the bell transmitting LF sounds and the diaphragm transmitting HF sounds. One problem with acoustic stethoscope is that the sound level is extremely low. An electronic stethoscope overcomes the low sound levels by electronically amplifying the body sounds. Electronic stethoscopes convert the acoustic sound waves obtained through the chest piece into electrical signals which can then be amplified for optimal listening. The converted electrical signals can also be digitalized for further processing and transmission. It has been shown that compared with the conventional stethoscope, the electronic stethoscope is considered to be better for hearting HSs, even though no benefit can be found for breathing sounds .
Unlike acoustic stethoscopes, transducers or sensors in electronic stethoscopes vary widely. The simplest method of sound detection is achieved by placing a microphone in the chest piece. The microphone, mounted behind the stethoscope diaphragm, picks up the sound pressure created by the stethoscope diaphragm, and converts it to electrical signals. The microphone itself has a diaphragm, and thus the acoustic transmission path comprises of the stethoscope diaphragm, air inside the stethoscope housing, and finally the microphone diaphragm. The existence of two diaphragms and the intervening air path result in excess ambient noise pickup by the microphone, as well as inefficient acoustic energy transfer .
The piezo-electric sensors operate on a somewhat different principle than merely sensing diaphragm sound pressure [20–24]. Piezo-electric sensors produce electrical energy by deformation of a crystal substance. In one case, the diaphragm motion deforms a piezoelectric sensor crystal, which is mechanically coupled to the stethoscope diaphragm, and an electrical signal results. The problem with this sensor is that the conversion mechanism produces signal distortion compared with sensing the pure motion of the diaphragm. The resulting sound is thus somewhat different in tone and distorted compared with that obtained by an acoustic stethoscope.
The capacitive-type sensor based on the Micro-electro-mechanical system (MEMS) technology detects acoustic pressure with a change in its nominal capacitance value [25–29]. Figure 5a shows the working principle of the capacitive MEMS sensor. The center piece which is the diaphragm is a suspended weight (proof mass) that is free to move. This proof mass is electrically isolated from a static fixed structure depicted by the fixed comb in Figure 5b, thus having a nominal capacitance value between them. When the diaphragm is subjected to an acoustic pressure source, it will start to move in harmony with the source thus causing changes in its nominal capacitance value. The capacitive MEMS sensor has the advantage of small size, mass production and better temperature stability. In addition, it is compatible with conventional complementary metal-oxide-semiconductor (CMOS) technology; hence, when combined with integrated circuit it makes it possible to develop high performance HS sensor system.
The signal captured by the sensor of the electronic stethoscope is fed into the pre-amplifier and anti-aliasing circuit as discussed in earlier section. Figure 6 shows the two stages of the signal path from the stethoscope pickup with a microphone as the sensor ; therein, the signal from the microphone goes through an amplifier with a gain of 20, followed by an anti-aliasing low pass filter using OPA134 with a cut-off frequency at 2 kHz. The audio CODEC block and signal denoising block are also shown in Figure 6; the CODEC block is a single device that encodes analog sound signal as digital signals and decodes digital back into analog; the denoising block is used to suppress the unwanted noises. The detailed discussion on the HS denoising will be provided in the following subsection.
Heart sound denoising
HSs are very often disturbed by various factors, which can prohibit their automated analysis. In general, the HS noise can be divided into two groups: external and internal . The external disturbances include a wide frequency and intensity spectrum of noise caused by speech and noise caused by motion, whereas the group of signals with internal origin disturbances consists of mainly signals caused by digestive and respiratory processes. Moreover, there are many other types of noise, which during the measurement may occur occasionally, such as vocal (coughing, laughing), physiological (muscle movements, swallowing), sensor (rubbing) and ambient (knocking at the door, ambient music, phone ringing, footsteps) . Due to the existence of such noises, some components of the HS may become extremely hard to hear during auscultation—especially the murmurs which have lower amplitude and similar characteristics to noise. Therefore, the development of precise noise removal algorithms, which are capable to work in noisy environments, is of great importance and is the research subject of many scientists.
A total of 21 articles for HS noise reduction techniques were selected for review studies, and the overview of nine (out of 21) articles is tabulated in Table 2. Generally speaking, the HS denoising algorithms can be categorized into two groups: denoising in time domain and transform domain.
Time domain denoising An adaptive noise canceller (ANC) is employed for external noise reduction [33, 43, 44]. The ANC, as shown in Figure 7, consists of two sound transducers. The first one, called primary input, is used to pick up the noise corrupted HS. The second one, named reference input, is used to only pick up the environmental noise correlated in some unknown way with the primary noise. The reference input is filtered and subtracted from the primary input to obtain the signal estimate.
Since both HS and environmental noise are time-varying, the filter uses an adaptive algorithm to change the value of the filter coefficients, so that it acquires a better approximation of the signal after each iteration. Some of the most popular of such adaptive algorithms are least mean square (LMS) algorithm, normalized LMS (NLMS) algorithm, subband LMS (S-LMS) algorithm, subband NLMS (S-NLMS) algorithm and recursive least square (RLS) algorithm . To further enhance the HS, a type I Chebyshev infinite impulse response (IIR) band pass filter was also used in  to filter out the HF and LF bands, and only keep the major frequency band of interest.
One major drawback of the ANC is that it needs two input channels (e.g., two microphone sensors) as a primary signal and a reference signal. However, a single stethoscope can provide just one input sound. To overcome this issue, the authors in  have proposed a single input ANC (SIANC) for suppression of breath sound in a cardiac auscultation sound. The proposed SIANC uses a reference generation system which generates a reference signal using the primary signal. The reference generation system consists of a HS detector, a control unit and a reference generator. The HS detector is used to detect the presence of major HS components, i.e., S1 and S2. When S1 or S2 is detected, the reference signal is taken from the reference generator output, and hence the main HS components are excluded from the reference signal. However, the SIANC is only suitable for normal HS denoising. Its performance will be largely degraded if the original HS comes from a patient with certain heart disease, where murmur signal exists between S1 and S2. The ultimate objective of any ANC is to only suppress the external noise not produced by the heart. With SIANC, the murmur signal can be easily transferred to the reference generator and subsequently taken as part of the reference signal, resulting in the elimination of murmur from the final output HS.
Apart from ANC, the adaptive line enhancer (ALE) is also used for denoising the HS. The main advantage of it is that, unlike ANC, it does not require any reference signal to eliminate the noise signal, and thus the additional sensor becomes unnecessary. As seen from Figure 8, the reference signal in ALE is the delayed version of the primary signal. In , two types of ALE were implemented, namely LMS-ALE and RLS-ALE. The study shows that RLS-ALE is a method superior to the LMS-ALE for HS denoising. The former is found to be more consistent and accurate though computationally expensive.
One challenge with the conventional LMS and NLMS is the proper choice of step size, since there is a trade-off between the convergence rate and the misadjustment error which is the ratio of excess error to the optimum error . A larger step size can be used to speed up the convergence, but this will increase the misadjustment error. To overcome this issue, the authors in  proposed to implement the ALE with variable step-size Griffith LMS. This method not only facilitates faster convergence, but also smaller convergence error/misadjustment.
Transform domain denoising The signal transform is defined as converting the signal from one domain (normally time domain) to another domain, with the purpose of extracting the characteristic information embedded within the time series that is otherwise not readily observable in its original form. The Fourier analysis forms the basis, as the most popular transformation technique to find the frequency content of a signal. The Fourier transform (FT) reveals the frequency composition of a time series by transforming it from the time domain into the frequency domain. The FT is usually implemented in the form of the fast FT algorithm (FFT). One drawback of FT is that it is not suited for analyzing non-stationary signals because it does not reveal how the signal’s frequency contents vary with time. On the other hand, as most real world signals are generally non-stationary in nature (such as HS signals), a new signal processing technique that is able to handle the non-stationarity of a signal is needed. The short-time FT (STFT) is one solution to overcome the limitations of the FT . The STFT analyzes a small section of the (stationary or non-stationary) signal at a time, which is known as windowing. The time domain signal is decomposed into a 2D time–frequency representation using STFT, and the variations of the frequency content of that signal within the window function are revealed. The problem with STFT is a compromise in resolution. A wide window gives better frequency resolution but poor time resolution. A narrow window results in good time resolution but poor frequency resolution. This resolution problem of STFT is solved by wavelet transform (WT)  which uses variable-length window. It was developed as a method to obtain high resolution in both time and frequency. A WT in which the wavelets are discretely sampled is known as the discrete WT (DWT).
One type of HS denoising algorithm using FFT and STFT is called the spectral subtraction (SS) based methods . In this method, an average signal spectrum and average noise spectrum are estimated in parts of the recording and subtracted from each other, so that average SNR is improved. One drawback of the classical SS is that it often distorts the signal and introduces certain artifacts called “musical noise”. In this regard, Ephraim and Malah  have proposed a minimum mean-square-error (MMSE) based short-time spectral amplitude (STSA) estimator that is closely related to SS. This technique uses a “decision-directed” approach to estimate the SNR on-line in the spectral domain. The SNR is then used to determine a Wiener filter gain applied to the spectral amplitudes for estimating the signal. A number of improved algorithms based on the MMSE–STSA have been proposed for HS denoising application [37, 50]. For stationary noises, these methods can achieve a high amount of noise suppression; however, the amount of non-stationary noise suppression tends to be dependent on the speed with which the noise spectrum is updated.
The most widely used noise reduction methods of HS are wavelet denoising algorithms which are based on DWT [39, 40, 51–58]. The general wavelet denoising procedure is as follows:
Decomposition Apply DWT to the noisy signal to produce the multi-level wavelet coefficients. The WT will compress the energy in a signal into few large components, whereas the noise is disorderly and characterized by small coefficients scattered throughout the transform.
Thresholding Select appropriate threshold value at each level, and threshold method to neglect the smaller coefficients from the wavelet-decomposed details.
Reconstruction Apply inverse DWT to the resultant wavelet coefficients to obtain a denoised signal.
One assumption made in  is that the environmental noise can be modeled as white Gaussian noise. However, according to the authors of , some color noises can also be found in the real HS signals collected by using their self-produced HS recording device. More specifically, based on the study of signal spectrum, the power spectral densities (PSD) of the noise registered during real HS signal measurements is very similar to the PSD of red or pink noise. Therefore, some investigations on the DWT parameters selection have been done in  with the noise characterized and modeled as colored. And it is concluded that the best results of wavelet denoising are obtained by the use of wavelet Coiflet 5 at the 10 decomposition level, minimax thresholding algorithm, and multiple level rescaling function.
Some shortcomings of DWT are that it has the shift variant property, and the wavelet coefficients are oscillatory in nature . A small shift in the input signal may completely change the wavelet coefficient oscillation pattern around singularities. On the other hand, due to the non-ideal wavelet filter, some redundant frequencies will still be left after being firstly filtered by a wavelet filter [53, 54]. Hence, a biomedical signal denoising algorithm based on the dual-tree complex wavelet transform was presented in , with the important additional properties including an anti-aliasing effect and nearly shift invariant. Experiment results show the effectiveness of the method in terms of SNR improvements.
The performance of DWT with Stein’s Unbiased Risk Estimate (SURE) shrinkage and Bayes shrinkage were studied and compared to that of LMS-ALE and RLS-ALE . It is evident from the analysis that RLS-ALE and DWT with SURE shrinkage give the best results with HS recovery of 95–98% and noise reduction of 88–92%.
It was mentioned earlier that the adoption of ANC algorithm usually requires two sensors, whereas only 1 input channel is needed for DWT thanks to the un-necessity of reference signal input. However, a novel DWT based method which allows a more effective cancellation of noises was proposed , by which significant improvement of signal quality can be achieved at the cost of one additional microphone sensor. Two identical high-sensitivity electret microphones are employed in the system, wherein the first microphone records the signals of heart, internal noises as well as external noises; the second microphone measures the external noises. The internal noises and external noises are suppressed in a two-step sequential processing with wavelet decomposition done on both the recorded signal channels.
The efficiency of wavelet based denoising techniques greatly depends on the choice of threshold parameter. A very large threshold cuts too many coefficients, resulting in some useful signal loss. On the other hand, a too small threshold value allows many coefficients to be included in reconstruction whereby excess noise will be retained. Besides the categorization of thresholding techniques as hard thresholding and soft thresholding, the threshold choosing methods can also be divided into two types according to their wavelet level dependence: global thresholding and level-dependent thresholding. The former chooses a single value of threshold to be applied globally to all empirical wavelet coefficients, while the latter selects different thresholds for each wavelet level. Some improved global thresholding function in wavelet domain can be found in [55, 56], and the wavelet subband dependent thresholding functions were proposed in [57, 58] with encouraging results reported.
Besides WT, the empirical mode decomposition (EMD) , as another method for denoising, has become rather popular. The EMD process is a way to decompose a signal into so-called intrinsic mode functions (IMF), and obtain instantaneous frequency data. It has proven to be quite versatile in a broad range of applications for extracting signals from data generated in noisy non-linear and non-stationary processes. Mode mixing is one of the drawbacks of the original EMD, wherein a single IMF may consist of signals of widely disparate scales or a signal of similar scales may reside in different IMFs. In order to overcome this problem, the Ensemble EMD (EEMD) constitutes a noise-assisted data approach proposed in . This algorithm defines IMF components as an ensemble of trials, each consisting of original signals contaminated with white noise of finite amplitude. A number of denoising methods of HS using EMD or EEMD can be found in [41, 42]. It is concluded that the EEMD seems to be a good solution for HS denoising, provided the useful subtle features have been detected previously .
Another two recently proposed HS denoising methods [61, 62] are also worth further exploration and investigation on their effectiveness with more real noisy HS data, especially abnormal HS data. A denoising and segmentation technique for the second HS S2 using matching pursuit (MP) was presented earlier . The MP is an algorithm used to decompose a signal into a linear expansion of waveforms that are selected from a dictionary of time–frequency functions (called atoms) . MP can be applied for denoising, analysis and synthesis of the HS. One drawback of MP based algorithm is the lack of explicit models. This has motivated the authors in  to develop a new dynamical model for HS, which is capable of synthesizing realistic HS signals. A new HS denoising framework using extended Kalman smoother was established subsequently based on the proposed HS dynamical model. Simulation results demonstrate the superior performance compared to that of WT denoising methods . Furthermore, this framework may also be helpful for joint ECG–PCG real-time processing for clinical application. However, the major limitation of the method is that the dynamical model was built to only model the main HS components, i.e., S1 and S2. Hence performance degradation may be encountered if other components of the HS also exist.
It was mentioned earlier in this section that the lung sound (LS) constitutes one component of the HS noise. The LS is produced by the structures of the lungs during breathing. In fact, even though most of the denoising techniques discussed above are claimed to be able to suppress the LS from the HS, analyzing the nature of the LS and how they affect HS often requires another independent study. Among all the solutions, blind source separation or specifically the independent component analysis (ICA) techniques are most widely adopted. The basic idea of ICA is to find a demixing matrix to extract out the desired signal (the HS) from the linearly mixing signal (the HS and LS). The details can be found in [64, 65] and the references therein.
Heart sound segmentation
Signal segmentation is usually carried out after denoising stage. The simplest way of segmentation is through the use of sliding window . This technique simply partitions the whole HS data into a number of segments with the same duration, without considering the positions of starting point and ending point for each segment. The more commonly adopted segmentation technique in HS analysis is performed on the basis of the cardiac cycle, since in most cases the activities in the HS signal relating to given disease are contained in a single interval of cardiac cycle. Specifically, the segmentation algorithm divides the HS data into portions, each of which consists of four parts: S1, the systole, S2 and the diastole.
The majority of the attempts in segmenting the HSs use the ECG signal and/or carotid pulse as the reference signal [67–71]. This method, often called indirect segmentation, demarcates the HS boundaries using the QRS complex and T-waves of ECG signal. The disadvantage of indirect segmentation is that, apart from the employment of ECG electrodes, the complete segmentation may become difficult because the T-wave is too weak to be identified in some patients . Another disadvantage of using ECG as a reference is that the timing between electrical and mechanical activities in a cardiac cycle is not constant for all patients, because of a variety of pathological conditions .
The direct segmentation methods are well established in the literature, where the cardiac cycle boundary is located based on the HS signal itself without a reference to the ECG. The most famous method of direct segmentation is the envelogram based approach presented in  and illustrated in Figure 9, where the following four steps are performed:
Calculate the envelope of the HS, using normalized average Shannon energy; Figure 9a shows the original HS signal and its normalized average Shannon energy.
Detect the peaks of the envelope with a threshold. Several additions have been made in this procedure to reduce the peak detection errors, such as rejecting extra peaks and recovering “lost” peaks. The "lost" peaks are recovered by lowering the threshold (Figure 9b).
Establish the one-to-one correspondence between the peaks and the S1 and S2, based on the fact that the largest interval of a recording is the diastolic period, as depicted in Figure 9c.
Form the cardiac cycle using the S1–S1 intervals.
The envelogram based method in  was extended in  such that, instead of performing the four-steps processing on the original HS data, a DWT was carried out and some wavelet coefficients were selected to cue the same segmentation procedure. The advantage of this extension is the lower sensitivity to ambient noises and recording locations. The basic theory behind the methods in [74, 75] is that a transform (Shannon energy and DWT) is performed to take the raw time domain HS data into another domain where S1 and S2 are emphasized. Other types of such transformation have been reported over the years for HS segmentation application, to name a few, STFT , PSD , homomorphic filtering, K-means clustering , MP , and EMD .
Feature extraction and classification of heart sound
The feature extraction and classification of HS are the two key stages for automatic HS interpretation and diagnosis of heart dysfunction. A total of 53 articles were selected for review studies and the overview of nine (out of 53) articles is tabulated in Table 3.
Instrumentation for heart sound recording The sensors that are used most often in HS classification are electronic stethoscope with microphone and piezoelectric sensor, which can acquire a wide range of frequencies between 0 and 2,000 Hz. Few notable recording instruments used by earlier researchers are electronic stethoscopes from Welch–Allyn (piezoelectric sensor), HP (piezoelectric sensor), 3 M (piezoelectric sensor) and Cardionics (microphone). More details on the commercially available HS recording instruments will be provided later in the market review.
Analyzed heart sounds/disorders Cardiac dysfunction is diagnosed through the detection and analysis of abnormal HSs, such as S3, S4 and most importantly the murmurs. In the section of “Characteristics of heart sounds”, we listed the 12 common murmurs. In some previous research work done by the researchers [90–104], the HS signals are analyzed and classified as normal HS and murmur, based on which the HS data are simply categorized as normal and abnormal. With more sophisticated algorithms, the classifiers can even distinguish different types of murmurs. The four distinct classes of murmur that are most commonly considered by the classifiers are MS, mitral regurgitation (MR), aortic stenosis (AS) and aortic regurgitation (AR) (e.g., [81, 82, 105–109]).
Test heart sound data Various normal and abnormal HS data collected from public HS database were used for algorithm validation in 17 out of the 53 selected studies. For most of the rest studies (apart from ten studies in which the type of test data was not mentioned), real HS data recorded from several to dozens of subjects (normal subjects and patients) were used to test the effectiveness of the algorithms proposed therein. Here one should note the differences between the “number of HSs” and “number of subjects” that are commonly mentioned in the literature. The number of HSs is usually much larger than the number of subjects, since multiple pieces of HS may come from single subject recording at different time or different chest locations.
Methods for feature extraction The extraction of features, which is the process of identifying distinctive properties from a signal, plays a major role in the effective classification of HS. The features can be extracted from the signals in one of the four domains: time domain, frequency domain, statistical domain and time–frequency domain. Some typical feature extraction techniques used in computer-based HS analysis are:
Time domain: timing, intensity, frequency location over time and shape , zero crossing rate, transition ratio , power, instantaneous energy , systole duration, diastole duration , durations of S1 and S2 , etc.
Frequency domain: spectral power based features [84, 113], instantaneous frequency , dominant maximum frequency [111, 114], etc.
The MFCC and DWT based features are most widely used for HS classifications, and the results presented recently in the literature have demonstrated their effectiveness. In fact, to achieve the best classification performance, some mixtures of features from different domains are always employed. A set of optimal features extracted from time, frequency and statistical domain was introduced in . In , the whole feature set was formed by a total of 18 time domain features, 9 DWT features and five entropy features. By using the method presented in , a feature vector of dimension 100 can be created with four from the statistical domain, 88 from the morphological domain, and eight belonging to the frequency domain. A large number of features were extracted in  and they are depicted in Figure 10, showing a combination of three features from time domain (Shannon energy), time–frequency domain (wavelet detail), and statistical domain (variance fractal dimension).
Methods for classification HS classification is a challenging task due to the nature of non-stationary property of the HS signal and large variations for the HS signals belonging to the same category. According to the review results, the artificial neural network (ANN) and support vector machine (SVM) algorithms are the classification techniques that are mostly used. The accuracy provided in  is 99% in classifying normal HS, MR, MS, pulmonary stenosis (PS), AR and summation gallop (SG) using ANN. The recognition rate reported in  was 95.56% using SVM in classification of normal HS, MS, AS and ventricular septal defect (VSD). The ANN has the ability to adapt well with complex non-linear data (such as HS) and classify it accurately and effectively. SVM, which is a new classification technique, has been used as a classification tool with a great deal of success in various applications areas. Other methods that have been used in the classification of HSs are Gaussian mixture model (GMM), hidden Markov model (HMM), k-nearest neighbors (KNN), decision trees and Bayesian networks, etc. Interested readers can refer to  and the references therein, for the detailed discussions and comparisons of different classification techniques.
The structure of the classifier depends on the goal of the system which may be either to screen normal from abnormal HS or to identify a specific heart disease. Figure 11 depicts three approaches to conduct HS classification, assuming the normal HS (NHS), AR, AS, MR and MS are those to be classified. The objective of multiple independent binary classifications (Figure 11a) is to categorize each of the heart disorder against NHS or a different heart disorder [81, 83, 84, 88]. The more commonly adopted approaches are multi-class classification (Figure 11b) [82, 85, 86] and binary hierarchical classification (Figure 11c) [87, 89], where the former classifies the HS instances into one of more than two classes; and the latter distinguishes one class from the remaining classes at each hierarchy level.
Proper validation of a classification method is important to see its effectiveness. Usually, the available data are divided into training set and testing set, e.g., in , 90% of the HSs (both normal and abnormal) were applied as training samples and the remaining 10% as testing samples; in , 70% of the HSs were randomly selected for training while 30% were taken for testing. Two commonly selected validation techniques are “repeated k-fold cross validation” [81, 84, 91, 94, 98, 99, 117, 128], and “leave-one-out” method [85, 88, 107].
Table 3 lists the summaries of classification performances in terms of sensitivity and specificity for nine selected articles, and the following observations can be made (here only AR, AS, MR and MS are considered since they are the most commonly diagnosed heart disorders):
8, 9, 8 and 6 out of the nine selected articles considered the diagnosis of AR, AS, MR and MS, respectively.
The mean (sensitivities, specificities) for diagnosing AR, AS, MR and MS are (89.8, 98.0%), (88.4, 98.3%), (91.0, 97.52%) and (92.2, 99.29%).
The good classification performance obtained suggests that these techniques are potentially useful for medical application, even though it is still premature to look at their real diagnostic value as will be discussed in later sections.
Extreme learning machine (ELM), as another new machine learning method, has attracted extensive attention recently due to its remarkable advantages such as fast operation, straightforward solution and strong generalization . However, the use of ELM in HS analysis is found to be very limited in the literature. The diagnostic potential in this domain of ELM has definitely not been sufficiently explored as yet, and so further research is required in this direction.
Market (state-of-the-art) products review results
The review results on the state-of-the-art products are provided in this section, validating the needs, problems and gaps from the medical and market point of view as well as from a technology push perspective with an overview tabulated in Table 4.
Development of the electronic stethoscope is gaining an edge over traditional stethoscope mainly due to the advanced sensor technologies, digital signal processing techniques as well as the digital sound transmission capabilities of digital stethoscopes. Most stethoscope manufacturers are focusing on developing the devices with enhanced acoustics, better performance and innovative designs. One example of the state-of-the-art electronic stethoscope has been given in Figure 1.
Unlike acoustic stethoscopes, which are all based on the same physics, transducers in electronic stethoscopes vary widely:
The simplest and least effective method of sound detection is achieved by placing a microphone in the chest piece. This method suffers from ambient noise interference and has fallen out of favor.
Another method, used in Welch–Allyn’s Meditron stethoscope, comprises placement of a piezoelectric crystal at the head of a metal shaft, the bottom of the shaft making contact with a diaphragm.
3 M also uses a piezoelectric crystal placed within a foam behind a thick rubber-like diaphragm.
Thinklabs’ Rhythm 32 inventor, Clive Smith uses an electromagnetic diaphragm with a conductive inner surface to form a capacitive sensor. This diaphragm responds to sound waves identically to a conventional acoustic stethoscope, with changes in an electric field replacing changes in air pressure. This preserves the sound of an acoustic stethoscope with the benefits of amplification.
Almost all electronic stethoscopes in the market are equipped with configurable filters with different frequency response modes for listening to the heart, lungs and even other human body sounds. These band pass filters with user-selectable passband frequencies are easy to implement and cost-effective. Besides the basic filtering mechanism, novel sensor design and noise reduction algorithm are also implemented in the products from 3 M, Thinklabs, Welch–Allyn and JABES to suppress the ambient noise, as well as the friction noise due to either patient’s body motion or physician’s hand tremor.
Currently this is no product offering for a truly specialized stethoscope tailored to the automatic heart disease diagnosis. Most of the products offer a wide range of generalist features for general auscultations. Nevertheless, thanks to the signal recording and transmitting capability offered by these products, the digital HS data can be recorded locally and transferred to a PC for visualization and further analysis. Some electronic stethoscopes can become portable by establishing connections to other handheld devices like mobile phone or wirelessly transmitting the sound signals to a remote processing unit through Bluetooth. Figure 12 shows an electronic stethoscope with a mobile phone  to view the HSs and send the sound recordings instantly for telemedicine applications.
Rijuven is a new entrant providing a “generalist” stethoscope which also encompasses the S3, S4 and murmur sound detection. The CardioSleeve stethoscope attachment enhances a stethoscope by providing synchronized digital auscultation with ECG. With acoustic HS and ECG data, the analyzing software implemented in a remote device can be utilized to visualize the HS and detect murmur and cardiac dysfunction.
Smartphone stethoscope apps review results
With the rapid development of smartphone technology, mobile health (mHealth) can support daily practice for health services and information. The mHealth applications include the use of mobile devices in collecting clinical health data, delivery of healthcare information to practitioners, researchers and patients, real-time monitoring of patient vital signs and direct provision of care (via mobile telemedicine) . This section provides a brief review summary on the newly emerged smartphone stethoscope apps. The summaries of five typical stethoscope apps are provided in Table 5.
SensiCardiac , shown in Figure 13a, is declared as the world’s most accurate cardiac murmur assessment solution. Some of the key techniques involved in SensiCardiac are: HS identification, localization and segmentation, HS features extraction in time and frequency domain, and HS classification using ANN. The ANN in SensiCardiac was trained on a data set of nearly 2,000 HSs to objectively distinguish between pathological and innocent murmurs. Several independent clinical studies were conducted, and the results showed that a sensitivity and specificity of above 80% was achievable. It should be noted here that SensiCardiac is to be used as an aid to a physician in interpreting the data, and not intended as a sole means of diagnosis. StethoCloud , illustrated above in Figure 13b, is a cloud-based service that turns a Windows smartphone into a digital stethoscope. StethoCloud takes a relatively simple approach to replace stethoscopes and targets a common childhood killer. Different from SensiCardiac which performs the data analysis locally, both the noise suppression and data analysis for StethoCloud happen in the cloud using Windows Azure . Thinklabs’s app  is one of the earliest developed medical stethoscope apps and it is used to record and visualize HSs on the smartphone.
All of the previous three apps require the smartphone to be connected with an external stethoscope, while some other stethoscope apps use the sensitive enough microphones embedded in the smartphone [146, 147]. Specifically, the microphone is placed on the key auscultation points. The app then enhances the sounds that the microphone hears on the smartphone; these sounds can then be recorded, saved and transmitted. Despite the advantages of non-necessity of external hardware, however, two main drawbacks exist for this type of stethoscope apps: firstly, the microphone is not easy to be placed in the exactly correct orientation and place, and secondly, the recorded sounds tend to be more prone to noise artifacts compared to those acquired using external advanced stethoscope.
Discussions and recommendations
Since its invention 200 years ago, the stethoscope has been an invaluable bedside tool for auscultating HSs and over the years has undergone very few changes to both itself and the way in which it is used . The stethoscope is employed by all doctors and nurses from primary to tertiary care. In the past decade, some fascinating debate has been sparked as to whether the stethoscope can be replaced by handheld ultrasound. Supporters of the handheld ultrasound-dominated view of future healthcare have pointed out that the use of ultrasound at the point of care can truly enhance patient care and outcomes by expediting diagnoses and treatment . On the other hand, many others have advocated that handheld ultrasound devices be used as a sequel to or as an extension of the stethoscope, and not as a replacement for the stethoscope . Stethoscopes have their primary use in cardiac diagnosis, which can be upgraded by electronic stethoscope. It is simply not possible for every nurse and doctor (and intern) to employ handheld echocardiography system universally, in place of the stethoscope. This is because handheld echocardiography requires highly trained personnel for data acquisition and interpretation ; also, costs will prohibit the exclusive use of handheld echocardiography systems . Further at this stage, echocardiography cannot provide automated cardiac assessment, and requires extensive computerized analysis of heart function followed by development of indices for clinical diagnosis. So based on these considerations, there is a tremendous scope for the design of the very portably adoptable electronic stethoscope, so that each and every doctor and nurse can employ it for automated diagnosis in the clinical setting.
The emergence and further development of the computer-aided electronic stethoscope can become a big step ahead of the traditional acoustic stethoscope, by allowing physicians to hear and view, record and transfer the HS data, and obtain diagnostic assessment. With recent technological advancements on signal processing algorithms and machine learning techniques, the computer-aided diagnosis of cardiac dysfunction using electronic stethoscope will become a reality, and represent another tremendous step forward toward finding a truly acoustic based diagnostic tool in all medical clinics as well as in bedside use. The new intelligent computer-based stethoscope system can contribute to a new auscultatory semiology, based on reliable methods of signal analysis and automatic interpretation, and will be the preliminary point of care to be followed by other points of care modality like portable ultrasound. Hence, we are advocating that this new intelligent computer-based stethoscope system be made available for universal use in clinics and hospitals.
Generally, the ultimate goal of future research and development is to make the electronic stethoscope a truly diagnostic device, which interprets the HSs and offers diagnosis based on reliable and validated correspondence between HSs and cardiac and valvular disorders. To date, however, the computer based HS analysis has not yet been developed to a level that can be used for automated reliable diagnosis in the clinical setting. Almost all the available electronic stethoscopes and smartphone stethoscope apps in the market are only for general auscultation, i.e., hearing, recording and transferring the HS data. They are still far away from being called diagnostic devices.
In fact, the way that regulators would treat an electronic stethoscope depends critically on the claims made for it. Since the device would be used for diagnosis and treatment of disease, the electronic stethoscope would be treated as a medical device in US. The regulatory demands would be very different for:
An “electrically amplified device used to project the sounds associated with the heart, arteries, and veins and other internal organs ”; most of the state-of-the-art products fall into this category [16, 135, 137, 140, 141].
A device that interprets sounds and is to be used as an aid to a physician in interpreting the data; some devices of this nature already exist [139, 154, 155].
A device that interprets sounds and offers diagnoses on its own, which would require extensive clinical data to get past the Food and Drug Administration (FDA).
The main factors limiting the electronic stethoscope to become a truly diagnostic device, in our opinion, are not only the diagnostic accuracy that the computer based system can provide, but more importantly the system’s reliability, robustness and transportability.
Firstly, the cardiac auscultations are usually performed in an environment which is not well controlled. The acoustic HS sensed by the stethoscope can be easily corrupted by the interferences and noises with versatile types and occurring at any time. The effectiveness of the computer based system can be largely affected by the existence of noises, due to the fact that the heart murmur signals has large frequency range and similar characteristics with background noises. The situation becomes even worse if the HS signals of interest are embedded in the high-amplitude noises. These problems can be eventually overcome by using HS recording sensors with high selective sensitivity and robust denoising algorithm, and these two solutions are part of the ongoing research trends in this field. As seen from the denoising algorithm review, the performance of existing noise reduction methods are evaluated mostly by using the clean HS signal from web HS databases combined with computer simulated noises; some are using clean real HS signal superimposed with simulated noises. The computer simulated noises, which are normally assumed as Gaussian distributed wide sense stationary, may largely deviate from the real noises that are usually encountered in the auscultation process. Therefore, more elegant noise suppression algorithms have to be developed and tested with various types of real interferences and noises.
Secondly, HS classification is a challenging task due to the nature of non-stationary property of the heart murmurs and large variations for the murmurs associating with the same heart disease. The characteristics of murmurs could vary for different patients with the same cardiac dysfunction, or even change for a single patient during different cardiac cycles. Hence, advanced feature extraction and classification techniques with robustness to the murmur variations are highly required to make the system more reliable. The number and choice of features is one of the most critical steps to the success of a classifier . Selecting too little features will result in data under-fit, i.e., a lack of information to classify the more complex events. While at the other extreme, having too many features will lead to data over-fit, making the system have poor predictive performance, as it can exaggerate minor fluctuations in the data. As stated in , the number of features in the classifiers should be minimized and several approaches can be recommended to achieve this. Another important issue is to select independent training and testing data set for validation of the classification system. For HS analysis, different HS records from the same patient cannot be considered to be sufficiently independent, even if the records were from different chest positions or from different days . Therefore, if repeated k-fold cross validation is adopted, it is not valid to simply divide all the HS segments into k folds. The proper procedure is as follows:
Divide the enrolled subjects into k folds on an average.
In each validation round, the HS segments associated with the subjects in one of the k folds are taken for validation, after the HS segments from the subjects in the remaining folds are used for training.
Sometimes, the more reliable cross validation accuracy can be obtained by averaging the accuracy over multiple times of k rounds.
This procedure is illustrated in Figure 14, where 50 times of tenfold cross validation are conducted.
Thirdly, as noted in [156, 157], computational transportability is a critical issue when applying machine learning methods. By computational transportability of a computer-aided auscultation system, it means the possibility for other experts or clinicians in different institutions to apply the proposed system to their own dataset or patients. Studies on the automatic HS analysis addressing this issue are rare. The HS classifier is said to be highly transportable if it can be directly applied by users in another hospital or another lab without any further training process. Unfortunately, this might not be the case in every system or in every application. For example, if the classifier is to be applied to a different racial group, re-training might need to be conducted to cater for differences in physiology and anatomy in the different population groups, which limits its use as a universal diagnostic device. One possible solution is to build a model that allows the system to perform well when transferred to new institutions with minimal tuning through re-training. For example in , a layered HMM representation has been formulated with the ability to decouple different levels of analysis for training and inference. Each level of the hierarchy is trained independently, with different feature vectors and time granularities. In consequence, the lowest, signal-analysis layer, that is most sensitive to new conditions, can be retrained to tune the parameters, while leaving the higher-level layers unchanged. Another issue related to transportability is the reproducibility of the results. Unlike conventional statistical model, machine learning systems are usually less transparent and poorly documented. Moreover, the system might produce different results if implemented in different software environments, which further makes it less reproducible . Hence, it is highly recommended that the software codes are properly documented and the implementations of the method are made publicly available.
Fourthly, fairly extensive and well-designed clinical trials have to be conducted, in order for the computer-aided auscultation system and device to get approval from FDA, and to be accepted by the medical professionals. Although preliminary results are promising, the clinical evaluation procedures of HS classification systems still have limitations. Most of the studies have been performed in the laboratory environment and the system has been tested with a limited set of data. In , a very preliminary study was conducted on the automatic HS analysis to classify AS, AR, MS and MR. Therein, 250 cardiac periods of HS data from HS simulator were used for system training and testing. Even though encouraging results were reported (with sensitivity and specificity of 97.5 and 100%), validations with larger sets of real HS data are definitely necessary to produce more reliable and stable performance evaluation. Furthermore, to determine the real clinical usefulness of the computer-aided auscultation system, it would be required to have a controlled clinical trial with blinded method. Table 6 gives the clinical trials with electronic stethoscope listed in the ClinicalTrials.gov registry. In , the cardiac sonospectrographic analyzer (CSA) output and the CT data of 200 patients were prospectively analyzed and compared in a double-blinded manner. The overall sensitivity of 89.5% and specificity of 57.7% for the CSA to predict coronary artery disease were statistically significant. Similarly in , the AS acceleration indices obtained from HS data for 50 patients were correlated double blindly to standard transthoracic echo measurements to predict the severity of AS.
Lastly, in most of the present research work, the diagnosing process of cardiac dysfunction with HS is usually performed offline. The recorded HS data is stored locally and transferred to a PC for signal interpretation. Here two approaches could be considered to further improve the electronic stethoscope system:
Stethoscope with embedded autonomous analysis and interpretation of HS. The signal processing including noise suppression and event recognition can be implemented by using a low power digital signal processor. With this embedded system design, the stethoscope can become not only valuable to the physician but also simple for home use by the patients themselves, who are usually not considered as the target user of the stethoscope.
Stethoscope coupled with a hosting device or a server for sophisticated analysis (coupled to a PC with a Bluetooth link) for the use of professionals, in order to improve performance of clinical medical diagnosis.
The development and commercialization of real-time computer based HS analysis system will be a major area for future research approaches.
Comprehensive suggestions for future development of computer-aided auscultation system from the cardiologist’s point of view have been presented in . Some key suggestions therein are summarized below:
The system must be easy to use by minimally trained personnel in a primary healthcare setting.
The automatic signal analysis must be performed in real-time to allow for immediate patient disposition.
Simultaneous HS data recording at multiple chest sites with multiple sensors are more beneficial in terms of diagnostic accuracy, since the results from a single HS signal can be cross-referenced with those obtained from other sites.
Cardiac auscultation constitutes an important part of clinical medicine. Recently, techniques associated with advances in electronic stethoscope have been developed to process auscultatory HS signals as well as analyze and clarify the resulting sounds, to make diagnosis based on quantifiable medical assessment. The availability of automatic interpretation of HSs opens interesting perspectives in the context of cardiovascular diagnostic measures.
The stethoscope is used by each and every doctor and nurse. So if we can incorporate into the electronic stethoscope (Figures 3, 12 and 13), the technology of automated detection of cardiac disorders (Figure 4), then it will become even more useful. As soon as the doctor applies the stethoscope, right away will s/he be able to know the disorders.
This comprehensive paper is designed to review recent technological advances, summarize promising innovations and perspectives in the field of computer-aided HS processing and analysis, investigate the gaps between technology and its real clinical application, and suggest probable areas for future research to mature the technology and to unlock the clinical value.
Molcer PS, Kecskes I, Delic V, Domijan E, Domijan M. Examination of formant frequencies for further classification of heart murmurs. In: Proceedings of the International Symposium on Intelligent Systems and Informatics; 2010. p. 575–8.
Zhong L, Zhang JM, Zhao XD, Tan RS, Wan M. Automatic localization of the left ventricle from cardiac cine magnetic resonance imaging: a new spectrum-based computer-aided tool. PLoS One. 2014;9(4):e92382.
Lim CW, Su Y, Yeo SY, Ng GM, Nguyen VT, Zhong L, et al. Automatic 4D reconstruction of patient-specific cardiac mesh with 1-to-1 vertex correspondence from segmented contours lines. PLoS One. 2013;9(4):e93747.
Cui HF, Wang DS, Wan M, Zhang JM, Zhao XD, Tan RS, et al. Coronary artery segmentation via hessian filter and curve skeleton extraction. In: Proceeding of 36th Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBS); 2014.
Smith C. Transducer for sensing body sounds. US patent: US6661897B2. 2003.
Torres-Pereira L, Ruivo P, Torres-Pereira C, Couto C. A noninvasive telemetric heart rate monitoring system based on phonocardiography. In: Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE), vol. 3; 1997. p. 856–9.
Grundlehner B, Buxi D. Methods to characterize sensors for capturing body sounds. In: International Conference on Body Sensor Networks (BSN); 2011. p. 59–64.
Popov B, Sierra G, Telfort V, Agarwal R, Lanzo V. Estimation of respiratory rate and heart rate during treadmill tests using acoustic sensor. In: Proceeding of 27th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2005. p. 5884–7.
Surtel W, Maciejewski M, Maciejewska B. Processing of simultaneous biomedical signal data in circulatory system conditions diagnosis using mobile sensors during patient activity. In: Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA); 2013. p. 163–7.
Hu YT, Xu Y: An ultra-sensitive wearable accelerometer for continuous heart and lung sound monitoring. In: Proceeding of 34th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2012. p. 694–7.
Bassiachvili E, Stewart R, Shavezipur M, Patricia N. A MEMS capacitive, passively powered heart rate monitor: design and analysis. In: Microsystems and Nanoelectronics Research Conference (MNRC); 2008. p. 81–4.
Chai KTC, Han D, Ravinder PS, Duy DP, Chin YP, Jian WL, et al. 118-db dynamic range, continuous-time, opened-loop capacitance to voltage converter readout for capacitive MEMS accelerometer. In: IEEE Asian Solid-State Circuits Conference; 2010. p. 345–8.
Miles RN, Cui W, Su QT, Homentcovschi D. A MEMS low-noise sound pressure gradient microphone with capacitive sensing. J Microelectromech Syst. 2014;24(1):241–8.
Chao CT, Maneetien N, Wang CJ. On the construction of an electronic stethoscope with real-time heart sound de-noising feature. In: Proceeding of 35th International Conference on Telecommunication and Signal Processing (TSP); 2012. p. 521–4.
Varady P. Wavelet-based adaptive denoising of phonocardiographic records. In: Proceeding of 23rd International Conference on IEEE Engineering in Medicine and Biology Society (EMBS), vol. 2; 2001. p. 1846–9.
Kumar D, Carvalho P, Antunes M, Paiva RP, Henriques J. Noise detection during heart sound recording using periodicity signatures. Physiol Meas. 2011;32:599–618.
Lee YJ, Kim PU, Lee G, Cho JH, Kim MN. Single input ANC for suppression of breath sound. In: World Academy of Science, Engineering and Technology; 2010. p. 1150.
Basak KS, Mandal S, Manjunatha M, Chatterjee J, Ray AK. Phonocardiogram signal analysis using adaptive line enhancer methods on mixed signal processor. In: International Conference on Signal Processing and Communications (SPCOM); 2010. p. 1–5.
Hegde VN, Deekshit R, Satyanarayana PS. Random noise cancellation in biomedical signals using variable step size griffith lms adaptive line enhancer. J Mech Med Biol. 2012;12(4):599–618.
Paul AS, Wan EA, Nelson AT. Noise reduction for heart sounds using a modified minimum-mean squared error estimator with ECG gating. In: Proceeding of 28th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2006. p. 3385–90.
Mandal S, Basak K, Mandana KM, Ray AK, Chatterjee J, Mahadevappa M. Development of cardiac prescreening device for rural population using ultralow-power embedded system. IEEE Trans Biomed Eng. 2011;58(3):745–9.
Gavrovska A, Slavkovic M, Reljin I, Reljin B. Application of wavelet and EMD-based denoising to phonocardiograms. In: International Symposium on Signals, Circuits and Systems (ISSCS); 2013. p. 1–4.
Sun H, Chen W, Gong J. An improved empirical mode decomposition-wavelet algorithm for phonocardiogram signal denoising and its application in the first and second heart sound extraction. In: 6th International Conference on Biomedical Engineering and Informatics (BMEI); 2013. p. 187–91.
Jatupaiboon N, Pan-ngum S, Israsena P. Electronic stethoscope prototype with adaptive noise cancellation. In: Proceeding of 8th International Conference on ICT and Knowledge Engineering; 2010. p. 32–6.
Bai YW, Lu CL. The embedded digital stethoscope uses the adaptive noise cancellation filter and the type I chebyshev IIR bandpass filter to reduce the noise of the heart sound. In: Proceedings of 7th International Workshop on Enterprise Networking and Computing in Healthcare Industry (HEALTHCOM); 2005. p. 278–81.
Leng S, Ser W. Adaptive beamformer derived from a constrained null steering design. Sig Process. 2010;90:1530–41.
Sudo T, Tanaka H, Sugimoto C, Kohno R. A study on single-channel non-stationary noise suppression for cardiac sound. In: 8th International Symposium on Medical Information and Communication Technology (ISMICT); 2014. p. 1–5.
Gradolewski D, Redlarski G. Wavelet-based denoising method for real phonocardiography signal recorded by mobile devices in noisy environment. Comput Biol Med. 2014;52:119–29.
Zhang L, Yang XH. The application of an improved wavelet threshold denoising method in heart sound signal. In: Cross Strait Quad-Regional Radio Science and Wireless Technology Conference (CSQRWC), vol 2; 2011. p. 1115–7.
Zhao XM, Cao GT. A novel de-noising method for heart sound signal using improved thresholding function in wavelet domain. In: International Conference on Future BioMedical Information Engineering (FBIE); 2009. p. 65–8.
Agrawal KA, Jha K, Sharma S, Kumar A, Chourasia VS. Wavelet subband dependent thresholding for denoising of phonocardiographic signals. In: signal processing: algorithms, architectures, arrangements, and applications (SPA); 2013. p. 158–62.
Mandal S, Chatterjee J, Ray AK. A new framework for wavelet based analysis of acoustical cardiac signals. In: Proceeding of 32nd International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2010. p. 494–8.
Huang NE, Shen Z, Long SR, Wu MC, Shig HH, Zheng QN, et al. The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis. Proc R Soc A. 1998;454:903–55.
Hedayioglu FM, Jafari G, Mattos SS, Plumbley MD, Coimbra MT. Denoising and segmentation of the second heart sound using matching pursuit. In: Proceeding of 34th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2012. p. 3440–3.
Almasi AM, Shamsollahi B, Senhadji L. Bayesian denoising framework of phonocardiogram based on a new dynamical model. IRBM. 2013;34(3):214–25.
Ayari F, Ksouri M, Alouani AT. Lung sound extraction from mixed lung and heart sounds FASTICA algorithm. In: 16th IEEE Mediterranean Electrotechnical Conference (MELECON); 2012. p. 339–42.
Pourazad MT, Moussavi Z, Farahmand F, Ward RK. Heart sounds separation from lung sounds using independent component analysis. In: Proceeding of 27th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2005, p. 2736–9.
Shamsuddin N, Mustafa MN, Husin S, Taib MN: Classification of heart sounds using a multilayer feed-forward neural network. In: Asian Conference on Sensors and the International Conference on New Techniques in Pharmaceutical and Biomedical Research; 2005. p. 87–90.
Yuenyong S, Kongpravechnon W, Tungpimolrut K, Nishihara A. Automatic heart sound analysis for tele-cardiac auscultation. In: ICCAS-SICE; 2009. p. 1599–604.
Lehner RJ, Rangayyan RM. A three-channel microcomputer system for segmentation and characterization of the phonocardiogram. IEEE Trans Biomed Eng. 1987;34(6):485–9.
Liang H, Lukkarinen S, Hartimo I. Heart sound segmentation algorithm based on heart sound envelogram. In: Computers in cardiology; 1997. p. 105–8.
Liang H, Lukkarinen S, Hartimo I. A heart sound segmentation algorithm using wavelet decomposition and reconstruction. In: Proceeding of 19th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 1997. p. 1630–3.
Liang H, Lukkarinen S, Hartimo I. A boundary modification method for heart sound segmentation algorithm. In: Computers in cardiology; 1998. p. 593–5.
Haghighi-Mood A, Torry JN. A sub-band energy tracking algorithm for heart sound segmentation. In: Computers in cardiology; 1995. p. 501–4.
Gupta CN, Palaniappan R, Swaminathan S, Krishnan SM. Neural network classification of homomorphic segmented heart sounds. Appl Soft Comput. 2007;7(1):286–97.
Nieblas CI, Alonso MA, Conte R, Villarreal S: High performance heart sound segmentation algorithm based on matching pursuit. In: IEEE Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE); 2013. p. 96–100.
Papadaniil CD, Hadjileontiadis LJ. Efficient heart sound segmentation and extraction using ensemble empirical mode decomposition and kurtosis features. IEEE J Biomed Health Inf. 2014;18(4):1138–52.
Maragoudakis M, Loukis E. Automated aortic and mitral valves diseases diagnosis from heart sound signals using novel ensemble classification techniques. In: 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), vol. 2; 2010. p. 267–73.
Strunic SL, Rios-Gutierrez F, Alba-Flores R, Nordehn G, Burns S. Detection and classification of cardiac murmurs using segmentation techniques and artificial neural networks. In: IEEE Symposium on Computational Intelligence and Data Mining (CIDM); 2007. p. 397–404.
Kumar D, Carvalho P, Antunes M, Paiva RP, Henriques J. Heart murmur classification with feature selection. In: Proceeding of 31st International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2009. p. 4566–9.
Kumar D, Carvalho P, Couceiro R, Antunes M, Paiva RP, Henriques J. Heart murmur classification using complexity signatures. In: 20th International Conference on Pattern Recognition (ICPR); 2010. p. 2564–7.
Kwak C, Kwon OW. Cardiac disorder classification by heart sound signals using murmur likelihood and hidden markov model state likelihood. IET Signal Proc. 2012;6(4):326–34.
Moukadem A, Dieterlen A, Brandt C. Shannon entropy based on the s-transform spectrogram applied on the classification of heart sounds. In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP); 2013. p. 704–8.
Amiri AM, Armano G. Early diagnosis of heart disease using classification and regression trees. In: International Joint Conference on Neural Networks (IJCNN); 2013. p. 1–4.
Liang H, Hartimo I. A heart sound feature extraction algorithm based on wavelet decomposition and reconstruction. In: Proceeding of 20th International Conference IEEE Engineering in Medicine and Biology Society (EMBS), vol 3; 1998. p. 1539–42.
Phatiwuttipat P, Kongprawechon W, Tungpimolrut K, Yuenyong S. Cardiac auscultation analysis system with neural network and SVM technique. In: 8th International Conference on Electrical Engineering, Electronics, Computer, Telecommunications and Information Technology (ECTI-CON); 2011. p. 1027–30.
Markaki M, Germanakis I, Stylianou Y. Automatic classification of systolic heart murmurs. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2013. p. 1301–5.
Avendano-Valencia D, Martinez-Tabares F, Acosta-Medina D, Godino-Llorente I, Castellanos-Dominguez G. TFR-based feature extraction using PCA approaches for discrimination of heart murmurs. In: Proceeding of 31st International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2009. p. 5665–8.
Zeljković V, Druzgalski C, Bojic M, Mayorga P, Zhang D. Diagnostic spectral segregation of cardiac sounds features. In: Pan American Health Care Exchanges (PAHCE); 2014. p. 1–7.
Ari S, Hembram K, Saha G. Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier. Expert Syst Appl. 2010;37:8019–26.
Quiceno-Manrique AF, Godino-Llorente JI, Blanco-Velasco M, Castellanos-Dominguez G. Selection of dynamic features based on time-frequency representations for heart murmur detection from phonocardiographic signals. Ann Biomed Eng. 2010;38(1):118–37.
Leung TS, White PR, Collis WB, Brown E, Salmon AP. Classification of heart sounds using time-frequency method and artificial neural networks. In: Proceeding of 22nd International Conference on IEEE Engineering in Medicine and Biology Society (EMBS), vol 2; 2000. p. 988–91.
Ning TK, Ning J, Atanasov N, Hsieh K. A fast heart sounds detection and heart murmur classification algorithm. In: 11th International Conference on Signal Processing (ICSP), vol 3; 2012. p. 1629–32.
Hadi HM, Mashor MY, Suboh MZ, Mohamed MS. Classification of heart sound based on s-transform and neural network. In: 10th International Conference on Information Sciences Signal Processing and Their Applications (ISSPA); 2010. p. 189–92.
Hadi HM, Mashor MY, Mohamed MS, Tat KB. Classification of heart sounds using wavelets and neural networks. In: 5th International Conference on Electrical Engineering, Computing Science and Automatic Control; 2008. p. 177–80.
Elbedwehy MN, Zawbaa HM, Ghali N, Hassanien AE. Detection of heart disease using binary particle swarm optimization. In: Federated Conference on Computer Science and Information Systems (FedCSIS); 2012. p. 177–82.
Stasis AC, Loukis EN, Pavlopoulos SA, Koutsouris D. A decision tree-based method, using auscultation findings, for the differential diagnosis of aortic stenosis from mitral regurgitation. In: Computers in cardiology; 2003. p. 769–72.
Kao WC, Wei CC. Automatic phonocardiograph signal analysis for detecting heart valve disorders. Expert Syst Appl. 2011;38(6):6458–68.
Sharif Z, Zainal MS, Sha’ameri AZ, Salleh SHS. Analysis and classification of heart sounds and murmurs based on the instantaneous energy and frequency estimations. In: Proceedings TENCON, vol 2; 2000. p. 130–4.
Brusco M, Nazeran H. Development of an intelligent PDA-based wearable digital phonocardiograph. In: Proceeding of 27th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS), vol 4; 2005. p. 3506–9.
Abdel-Alim O, Hamdy N, El-Hanjouri MA. Heart diseases diagnosis using heart sounds. In: Proceedings of the Nineteenth National Radio Science Conference (NRSC); 2002. p. 634–40.
Wu CH, Lo CW, Wang JF. Computer-aided analysis and classification of heart sounds based on neural networks and time analysis. In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 5; 1995. p. 3455–8.
Bentley PM, McDonnell JTE, Grant PM. Classification of native heart valve sounds using the choi-williams time-frequency distribution. In: Proceeding of 17th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS), vol 2; 1995. p. 1083–4.
Hussain S, Salleh, Kamarulafizam I, Noor AM, Harris AA, Oemar H, et al. Classification of heart sound based on multipoint auscultation system. In: 8th International Workshop on Systems, Signal Processing and Their Applications (WoSSPA); 2013. p. 174–9.
Fu WJ, Yang XH, Wang YT. Heart sound diagnosis based on DTW and MFCC. In: 3rd International Congress on Image and Signal Processing (CISP), vol. 6; 2010. p. 2920–3.
Vepa J. Classification of heart murmurs using cepstral features and support vector machines. In: Proceeding of 31st International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2009. p. 2539–42.
Gupta CN, Palaniappan R, Rajan S, Swaminathan S, Krishnan SM. Segmentation and classification of heart sounds. In: Canadian Conference on Electrical and Computer Engineering; 2005. p. 1674–7.
Say O, Dokur Z, Olmez T. Classification of heart sounds by using wavelet transform. In: Proceeding of the Second Joint Engineering in Medicine and Biology, 24th Annual Conference and the Annual Fall Meeting of the Biomedical Engineering Society (EMBS/BMES) Conference, vol 1; 2002. p. 128–9.
Karimi M, Amirfattahi R, Sadri S, Marvasti SA. Noninvasive detection and classification of coronary artery occlusions using wavelet analysis of heart sounds with neural networks. In: The 3rd IEE International Seminar on Medical Applications of Signal Processing; 2005. p. 117–20.
Zin ZM, Hussain-Salleh S, Sulaiman MD. Wavelet analysis and classification of mitral regurgitation and normal heart sounds based on artificial neural networks. In: Proceedings of 7th International Symposium on Signal Processing and Its Applications, vol 2; 2003. p. 619–20.
Voss A, Herold J, Schroeder R, Nasticzky R, Mix A, Ulrich P, et al. Diagnosis of aortic valve stenosis by correlation analysis of wavelet filtered heart sounds. In: Proceeding of 25th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS), vol 3; 2003. p. 2873–6.
Cota NG, Palaniappan R, Swaminathan S. Classification of homomorphic segmented phonocardiogram signals using grow and learn network. In: Proceeding of 27th International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2005. p. 4251–4.
Turkoglu I, Arslan A: An intelligent pattern recognition system based on neural network and wavelet decomposition for interpretation of heart sounds. In: Proceeding of 23rd International Conference on IEEE Engineering in Medicine and Biology Society (EMBS), vol 2; 2001. p. 1747–50.
Choi SJ, Shin YK, Cheong YJ, Lee GJ, Park HK. Wavelet packet based features for insufficient murmur identification. In: International Conference on Control Automation and Systems (ICCAS); 2010. p. 1184–7.
Rios-Gutierrez F, Alba-Flores R, Ejaz K, Nordehn G, Andrisevic N, Burns S. Classification of four types of common murmurs using wavelets and a learning vector quantization network. In: International Joint Conference on Neural Networks (IJCNN); 2006. p. 2206–13.
Omran S, Tayel M. A heart sound segmentation and feature extraction algorithm using wavelets. In: First International Symposium on Control, Communications and Signal Processing; 2004. p. 235–8.
Safara R, Doraisamy S, Azman A, Jantan A, Ranga S. Wavelet packet entropy for heart murmurs classification. Adv Bioinform. 2012;2012:327269.
Ahlström C. Processing of the phonocardiographic signal—methods for the intelligent stethoscope. Thesis, Department of Biomedical Engineering, Institute of Technology, Linköping University, Sweden, 2006.
Kotsiantis SB. Supervised machine learning: a review of classification techniques. Informatica. 2007;31:249–68.
Germanakos P, Mourlas C, Samaras G. A mobile agent approach for ubiquitous and personalized eHealth information systems. In: Proceedings of the Workshop on ‘Personalization for e-Health’ of the 10th International Conference on User Modeling (UM’05); 2005. p. 67–70.
Mahnke CB. Automated heart sound analysis/computer-aided auscultation: a cardiologist’s perspective and suggestions for future development. In: Proceeding of 31st International Conference on IEEE Engineering in Medicine and Biology Society (EMBS); 2009. p. 3115–8.
SL carried out studies on heart sound denosing, heart sound classification, market product review, stethoscope apps review, and wrote the manuscript. RST carried out studies on characteristics of heart sounds and clinical application of electronic stethoscope. KTCC and CW carried out studies on sensors for heart sound recording and hardware implementation of computer-aided auscultation system. DG helped conceive the review methodology design and critically revise the manuscript. LZ conceived of the review methodology design, carried out studies on the overall system and critically revised the manuscript. All authors read and approved the final manuscript.
The authors appreciate the support of Duke-NUS/SingHealth Academic Medicine Research Institute and the medical editing assistance of Taara Madhavan (Associate, Clinical Sciences, Duke-NUS Graduate Medical School).
Compliance with ethical guidelines
Competing interestsThe authors declare that they have no competing interest.
Authors and Affiliations
National Heart Research Institute Singapore, National Heart Centre Singapore, 5 Hospital Drive, Singapore, 169609, Singapore
Shuang Leng, Ru San Tan & Liang Zhong
Cardiovascular and Metabolic Disorders Program, Duke-NUS Graduate Medical School, 8 College Road, Singapore, 169857, Singapore
Ru San Tan & Liang Zhong
Institute of Microelectronics, A*STAR, 11 Science Park Road, Singapore, 117685, Singapore
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.