Skip to main content

An automatic segmentation method for heart sounds



There are two major challenges in automated heart sound analysis: segmentation and classification. An efficient segmentation is capable of providing valuable diagnostic information of patients. In addition, it is crucial for some feature-extraction based classification methods. Therefore, the segmentation of heart sound is of significant value.


This paper presents an automatic heart sound segmentation method that combines the time-domain analysis, frequency-domain analysis and time–frequency-domain analysis. Employing this method, the boundaries of heart sound components are first located, and the components are then recognized. Finally, the heart sounds are divided into several segments on the basis of the results of boundary localization and component identification.


In order to evaluate the performance of the proposed method, quantitative experiments are performed on an authoritative heart sound database. The experimental results show that the boundary localization has a sensitivity (Se) of 100%, a positive predictive value (PPV) of 99.3% and an accuracy (Acc) of 99.93%. Moreover, the Se, PPV and Acc of component identification reach 98.63, 99.86 and 98.49%, respectively.


The proposed method shows reliable performance on the segmentation of heart sounds. Compared with previous works, this method can be applied to not only normal heart sounds, but also the sounds with S3, S4 and murmurs, thus greatly increasing the applied range.


Cardiovascular disease (CAD) remains the leading cause of death worldwide. In China, CAD accounted for more than 40% of deaths in 2014 [1]. Heart sounds, which are generated by the beating of heart, are considered as an important signal for detecting cardiovascular problems. In general, a normal heart sound comprises two components, namely the first heart sound (S1) and the second heart sound (S2). In some special cases, three additional components, namely the third heart sound (S3), fourth heart sound (S4), and murmurs, may appear together or separately [2]. Heart sounds have been used to diagnose cardiovascular problems for hundreds of years. Even currently, auscultation remains a crucial approach for physicians to learn about patients’ heart health. However, this conventional diagnostic approach is highly subjective and relies largely on physicians’ experience [3]. Thus, an accurate and efficient automated heart sound analysis system is required.

There are two major challenges in developing an automated heart sound analysis tool: segmentation and classification. The primary operations of segmentation are positioning the boundaries of heart sound components and identifying the component types, namely S1, S2, S3, and S4. Several segmentation methods have been reported. However, most of the methods focus on either boundary detection or component identification. In addition, many methods only apply on normal heart sounds, thus severely limiting the application of these methods.

Liang et al. [4] proposed a Shannon-entropy-based heart sound segmentation method to recognize S1 and S2, but they did not study S3, S4, and murmurs. Kumar et al. [5] presented an S3 detection algorithm based on wavelet transform; however, this method could not detect the boundaries of the heart sound components. Moukadem et al. [6] developed an S-transform-based heart sound segmentation method; however, this method can only be applied to normal heart sounds. Springer et al. [7] successfully addressed the segmentation problem of noisy S1 and S2 by utilizing a logistic regression–hidden semi-Markov model. Nevertheless, this study, as most other studies, did not address the segmentation of heart sounds containing S3 and S4. In addition to the aforementioned studies, several other researchers have also contributed to the automated segmentation of heart sounds. For example, Naseri et al. [8] proposed a heart sound segmentation method based on the frequency-energy method, which had excellent performance for various heart sounds. Boutana et al. [9] presented a time–frequency-analysis–based heart sound segmentation method and validated the efficiency of the method through some pathological heart sounds. Tang et al. [10] reported a dynamic-clustering-based segmentation method, which was effective for both normal and abnormal heart sounds. However, these methods all have some of the aforementioned limitations.

Based on the advantages and drawbacks of the methods mentioned above, the present study is focused on developing an automated heart sound segmentation algorithm that can position the boundaries and recognize the components for both normal heart sounds and heart sounds with S3, S4 and murmurs.

In this study, a method that combines time-domain analysis, frequency-domain analysis, and time–frequency-domain analysis was developed, and some novel strategies were proposed. The complete method comprises four parts, namely preprocessing, murmur elimination, boundary detection, and component identification. The overall process is presented in Fig. 1. First, the noisy raw heart sound is standardized in the preprocessing phase. Then, the signal goes through murmur elimination algorithm, where any possible murmurs are considerably eliminated without affecting the characteristics of the heart sound signal. In the following boundary detection phase, the onset and offset of each heart sound component are positioned and marked automatically. Finally, in the component identification phase, the cardiac cycle is calculated first, and the heart sound components are then recognized in each heartbeat on the basis of cardiac cycle.

Fig. 1
figure 1

Block diagram of the proposed method



To validate the proposed method, two quantitative experiments were performed, and the heart sounds from the University of Michigan’s Heart Sound & Murmur Library were employed [11]. In the first experiment, 12 types of heart sounds with variable murmurs were used to verify the effect of murmur elimination. The total length of these sounds was 759 s, and the cardiac cycle number reached 888. The second experiment was performed to quantify the effectiveness of boundary detection and component identification. For this purpose, 16 types of sounds, including 2 normal and 14 abnormal types, were used, constituting a total of 1039 s (1251 cycles). The heart sound signals were recorded at 44.1 kHz, and each one lasted for at least 56 s. For the convenience of displaying and testing, each sound was divided into three-heart-cycle segments. Moreover, every two adjacent segments had an overlap of one cardiac cycle.



Because the maximal frequency of the heart sounds did not exceed 1 kHz [12], the heart sound signals were down-sampled to 2 kHz according to the Nyquist-Shannon sampling theorem.


A wavelet-based denoising method with soft threshold was utilized. The db4 wavelet was chosen as mother wavelet because its shape is similar to that of a heart sound signal, and the scale was established as 7 based on signal–noise-ratio (SNR) and normalized root mean squared error (NRMSE) [13]. The denoised heart sound was written as \(HS\left( n \right)\).


To standardize the processing of heart sound, the heart sound signals were normalized as

$$HS_{N} \left( n \right) = \frac{HS\left( n \right)}{{{ \hbox{max} }\left( {\left| {HS\left( n \right)} \right|} \right)}} , \quad n = 0,1, \ldots ,N - 1$$

where N is the total number of sampling points.

Unless otherwise specified, the expressions “original heart sound” and “original signal” appearing in the following refer to HSN(n).

Murmur elimination

Different from the conventional constant-cutoff-frequency-based murmur elimination method (usually 200 Hz) [8, 14], this paper presents a novel low-pass filter to remove the murmurs, namely the automatic-cutoff-frequency low pass filter (ALPF), whose cutoff frequency is calculated by analyzing the fast Fourier transform (FFT) of the heart sound.

Assuming that the modulus of the FFT sequence of \(HS_{N} \left( n \right)\) is \(FFT_{H} \left( n \right)\), n = 0, 1, …, N − 1, the envelope of \(FFT_{H} \left( n \right)\) is obtained by the moving average method as follows:

$$E_{FFT} \left( n \right) = \frac{1}{{L_{F} + n + 1}}\sum\nolimits_{k = 0}^{{n + L_{F} }} {FFT_{H} \left( k \right)} ,\quad 0 \le n \le L_{F} - 1$$
$$E_{FFT} \left( n \right) = \frac{1}{{2L_{F} + 1}}\sum\nolimits_{{k = n - L_{F} }}^{{n + L_{F} }} {FFT_{H} \left( k \right)} , \quad L_{F} \le n \le N - L_{F} - 1$$
$$E_{FFT} \left( n \right) = \frac{1}{{L_{F} + N - n}}\sum\nolimits_{{k = n - L_{F} }}^{{n + L_{F} }} {FFT_{H} \left( k \right)} ,\quad N - L_{F} \le n \le N - 1$$

where N is the sampling number of FFTH(n), and LF is the neighborhood radius of point n, LF N. Note that if LF is too large, some adjacent local peaks of \(E_{FFT} \left( n \right)\) are merged together. Considering this and experimental results, LF is set to 5. The envelope is also normalized using (1) and is written as \(E_{F} \left( n \right)\). Unless otherwise specified, the “FFT coefficient” and “FFT envelope” appearing hereinafter refer to \(E_{F} \left( n \right)\).

For the heart sounds without murmurs,\(E_{F} \left( n \right)\) is generally comprised of primary peak and side peak (see in Fig. 2b). The primary peak is located in the low-frequency area with the highest amplitude, and the side peak is located in the relatively higher-frequency area with much smaller amplitude. Thus, most energy of heart sounds is concentrated within the frequencies of primary peak, and filtering out the side peak does not affect the shape of heart sounds (see in Fig. 2c, d). Therefore, the primary peak preserves the information of heart sound components (S1, S2, S3 and S4).

Fig. 2
figure 2

The performance of ALFP. a Is a normal heart sound, c is the filtered normal sound by ALFP, e is a heart sound with S3 and systolic murmurs, g is the murmur-eliminated sound by ALPF. b, d, f and h Are the FFT of a, c, e and g, respectively. In b, d, f and h, the yellow curves are the FFT sequences, and the blue curves are the envelopes of them. In b, the points marked by purple diamonds are two examples of spikes. In b and f, the points marked by red squares and green squares are the primary peak points and the last valley points with FFT coefficient smaller than 0.2, respectively

For the heart sounds with murmurs, the primary peak still exists, but the side peak is usually merged with the frequency band of murmurs (see in Fig. 2f). Considering that the murmurs usually have higher frequency than the primary peak, the frequency of primary peak’s ending point is an appropriate estimation of the cutoff frequency to remove murmurs.

In this study, the ending point of primary peak is searched in the range of 20 to 200 Hz and is determined as the first valley point (the lowest point between two adjacent spikes) after the primary peak point with FFT coefficient smaller than 0.2. The value 0.2 is determined based on experimental experience. If there is no such an ending point between 20 and 200 Hz, the cutoff frequency is set to 200 Hz.

\(HS_{N} \left( n \right)\) is filtered using this frequency, and the filtered sound is then normalized; the normalized sound is denoted as HSNLP.

Boundary detection

Envelope extraction

After removing the interference of murmurs, the next step is to calculate the onsets and offsets of heart sound components. Because the envelope can reduce the complexity of computing while preserving the location information of the signals, the closing operation of mathematical morphology is utilized to obtain the envelope in the proposed method and is defined as

$$\left[ {f \cdot g} \right]\left( n \right) = \left\{ {\left[ {f \oplus g} \right] \otimes g} \right\}\left( n \right)$$

where f(n) refers to the input signal, and g(n) is a structure element; denotes the dilation operation, and denotes the erosion operation. The lengths of f(n) and g(n) are P and Q respectively, and generally P > Q [15, 16].

The choice of structure element g(n) directly affects the shape of the envelope obtained by this operation [17]. In this study, in order to make the shape of envelope simple, g(n) is designed as

$$g\left( n \right) = 0, \quad 0 \le n \le Q$$

Moreover, Q, the length of \(g\left( n \right)\), is also a significant parameter. According to physiological knowledge, in general, the time interval between adjacent heart sound components (S1S2 interval, S2S3 interval, S4S1 interval, etc.) is no less than 100 ms [8, 18], which corresponds to 200 points under the sampling rate of 2 kHz. Thus, Q should be smaller than 200. In this study, Q was set as 30 on the basis of the experimental results.

The \(f\left( n \right)\) in (3) is replaced by HSNLP, and the heart sound envelope is obtained. The envelope is then normalized. The result is denoted as ENLP.

Although the ALPF can effectively reduce murmurs, the filtered heart sound may retain some low-amplitude residuals, which are also reflected in the envelope and affect the detection of boundaries. In this method, three operations are used to overcome this problem.

Thresholding processing

The first operation is thresholding.

For some murmurs, such as early systolic murmurs, their onsets partially overlap the offsets of S1s. After filtering by ALPF, the residues of murmurs may still be linked with S1s. Consequently, the murmur residues and S1s may be mixed together in \(E_{NLP}\). Therefore, the purpose of thresholding is to separate the boundaries of murmurs from those of S1s to conduct the subsequent operations. Consequently, a small threshold can complete this task. Due to the use of small threshold, the boundary information of the heart sound components can be preserved as much as possible.

The envelope \(\tilde{E}_{NLP}\) after thresholding is calculated as

$$\tilde{E}_{NLP} = \left\{ {\begin{array}{ll} E_{NLP} ,& \quad E_{NLP} \ge \theta \\ 0,&\quad E_{NLP} < \theta \\ \end{array} } \right.$$

where θ = min(θAθC), \(\theta_{A} = \lambda \cdot \sqrt {\frac{1}{N}\mathop \sum \nolimits_{n = 0}^{N - 1} \left[ {E_{NLP} \left( n \right) - \mu } \right]^{2} }\), \(\lambda = 0.8, \mu = \frac{1}{N}\mathop \sum \nolimits_{m = 0}^{N - 1} E_{NLP} \left( m \right)\) and θC = 0.025. The parameter θA enables the algorithm to determine an appropriate threshold based on the characteristics of the signal itself, and θC is a reference threshold, preventing error in θA; the value of θC is determined by experimental results.

Opening operation

After thresholding, the boundaries of heart sound components and interfering residues are separated. The remaining residues are considerably smaller than the heart sound components in both duration and amplitude. Because the opening operation of mathematical morphology can reduce spikes, it is utilized as the second stage of residue removal. The opening operation is defined as [19]

$$\left[ {f \circ g} \right]\left( n \right) = \left\{ {\left[ {f \otimes g} \right] \oplus g} \right\}\left( n \right)$$

The g(n) in opening operation is still defined using (4), but the length of g(n) is denoted as Q’. Besides, the parameters in (6) are same as those in (3).

As in the closing operation, Q’ is also a significant parameter. Because the average durations of S1 and S2 are approximately 100 ms and the durations of S3 and S4 are approximately 50 ms [8], which correspond to 200 and 100 sampling points, respectively. Thus, Q’ must be smaller than 100; in this study, it was set to 50 based on experimental results. The results of the opening operation are normalized, and the normalized signal is denoted as \(\tilde{E}_{NO}\).

Energy thresholding processing

Finally, to completely exclude the interference components from the envelope, the energy thresholding approach is used. Considering that \(\tilde{E}_{NO}\) is comprised of points of zero and non-zero, i.e., the regions located by S1, S2, S3, S4 and few murmur residues are non-zero, and the rest regions are zero. Therefore, the boundaries of each component (including the interference components escaping the opening operation) can be automatically obtained as follows:

$$If \tilde{E}_{NO} \left( i \right) = = 0 \& \& \tilde{E}_{NO} \left( {i + 1} \right) \ne 0,\quad i = 0,1, \ldots N - 2$$

then, the ith point is determined as a potential onset;

$$If \tilde{E}_{NO} \left( {j - 1} \right) \ne 0 \& \& \tilde{E}_{NO} \left( j \right) = = 0,\quad j = 1,2, \ldots N - 1$$

the jth point is determined as a potential offset. The potential onset and the potential offset always appear in pairs and these onsets and offsets are denoted as SPT(i) and EPT(j) respectively, where \(i, j = 0,1, \ldots ,M - 1\). M is the number of potential onsets/offset.

Consequently, the energy of each component can be obtained as

$$E_{k} = \sum\nolimits_{i = SPT\left( k \right)}^{EPT\left( k \right)} {\tilde{E}_{NO} {}^{2}\left( i \right)} ,\quad k = 0,1, \ldots ,M - 1$$

Assuming for each Ek,

$$If \ E_{k} < \eta$$

then the kth potential onset and offset are considered invalid and removed from \(SPT\) and EPT. Moreover, the points between these invalid onsets and offsets are determined as interference parts and are set to zero using (8)

$$\tilde{E}_{NO} \left( n \right) = \left\{ {\begin{array}{l} {0,\ {SPT}\left( k \right) \le n \le EPT\left( k \right), \ 0 \le k \le M - 1, \ if \ E_{k} < \eta } \\ {\tilde{E}_{NO} \left( n \right), \ else } \\ \end{array} } \right.$$

In this inequality, \(\eta\) is set to 0.25 based on experimental experience. The remaining potential points are determined as the final onsets and offsets of the heart sound components.

Component identification

Cardiac cycle calculation

The last step is to recognize the heart sound components. Considering the quasi-periodic nature of heart sounds, this step can be more efficiently accomplished if the cardiac cycle is calculated. In some studies, the cardiac cycle was calculated by using the partial autocorrelation function (PACF) [10, 15]. However, because of the inherent defects of PACF, the calculating results are not satisfactory. In order to overcome the shortcomings of PACF, this study proposes a cardiac cycle calculation method based on the unbiased autocorrelation function (UACF), considerably improving the applicability.

The PACF and UACF are defined as (9) and (10), respectively.

$$R\left( m \right) = \frac{1}{N}\sum\nolimits_{n = 0}^{N - m - 1} {\tilde{E}_{NO} \left( n \right)\tilde{E}_{NO} \left( {n + m} \right)}$$
$$R^{\prime}\left( m \right) = \frac{1}{N - m}\sum\nolimits_{n = 0}^{N - m - 1} {\tilde{E}_{NO} \left( n \right)\tilde{E}_{NO} \left( {n + m} \right)}$$

where \(m = 0,1, \ldots N - 1, N\) is the sampling number of \(\tilde{E}_{NO}\). Figure 3 demonstrates the PACF and UACF.

Fig. 3
figure 3

PACF and UACF of heart sound: a is the thresholded heart sound envelope \(\tilde{E}_{NO}\); b is the PACF of a; c is the UACF of a. The peaks that are marked by red circles are primary peaks, while the peaks unmarked are side peaks

For PACF, as m increases, the number of sampling points involved in multiplication and accumulation decreases gradually. Consequently, the primary peaks of the PACF exhibit gradual attenuation (see in Fig. 3b), causing the side peaks between the primary peaks to be interference of the cardiac cycle calculation. By contrast, the UACF only averages the terms that are involved in multiplication and accumulation, thereby overcoming the drawback of primary peak’s attenuation in the PACF.

Figure 3c shows that there is a relatively significant difference in amplitude between the primary peaks and side peaks. To further expand the amplitude difference, square energy, which is defined in (11), is utilized. After obtaining the energy signal, thresholding is used to remove the side peaks. The energy signal of the UACF and the energy signal after thresholding are denoted as \(E_{{R^{\prime}}}\) and \(\tilde{E}_{{R^{\prime}}}\), respectively.

$$E_{{R^{\prime}}} \left( m \right) = R^{{\prime}^{2}} \left( m \right),\quad m = 0,2, \ldots, N - 1$$
$$\tilde{E}_{{R^{\prime}}} = \left\{ {\begin{array}{ll} E_{{R^{\prime}}} ,&\quad E_{{R^{\prime}}} \ge \sigma \\ 0,&\quad E_{{R^{\prime}}} < \sigma \\ \end{array} } \right.$$

where \(\sigma = 0.4 \times { \hbox{max} }\left( {E_{{R^{\prime}}} } \right)\).

The last peak of \(\tilde{E}_{{R^{\prime}}}\) is likely to be a side peak that escaped from thresholding because of its relatively higher amplitude (see in Fig. 4c). This kind of side peak is usually generated by the multiplication and accumulation of S1 and S2. In order to calculate the cardiac cycle accurately, the last peak of \(\tilde{E}_{{R^{\prime}}}\) is forcibly removed regardless of whether it is a side peak or a primary peak (see in Fig. 4d). Finally, the UACF of \(\tilde{E}_{{R^{\prime}}}\) with the last peak removed is calculated, and this UACF is denoted as RFinal (see in Fig. 4e). Then, thresholding is performed on RFinal with a threshold of 0.5 × max (RFinal), leaving only primary peaks in the thresholded RFinal. The intervals between adjacent peaks of thresholded RFinal are then calculated and averaged to obtain the average cardiac cycle. Figure 4 shows the main procedures of cardiac cycle calculation.

Fig. 4
figure 4

The main procedures of cardiac cycle calculation: a is R(m), the UACF of a heart sound with S4; b is \(E_{{R^{\prime}}} \left( m \right)\), the energy signal of R(m); c is \(\tilde{E}_{{R^{\prime}}} \left( m \right)\); d is c without the last peak; e is RFinal, the UACF of d

Component recognition

After determining the cardiac cycle, the heart sound components can be recognized beat by beat. Under normal circumstances, in terms of the number of components in one heartbeat, heart sounds can be divided into three categories: two components (S1 and S2), three components (S1, S2, S3, or S4) and four components (S1, S2, S3, and S4). Moreover, in the time domain, S1S2 interval < S2S1 interval, S2S3 interval < S3S1 interval, and S2S4 interval > S4S1 interval, whereas in the frequency domain, the frequencies of S1 and S2 are usually higher than those of S3 and S4 [8, 15]. These priori knowledge along with the mentioned three categories can be utilized to recognize the heart sound components.

Figure 5 presents the overall process of component recognition. The algorithm finds a heartbeat based on cardiac cycle and automatically obtains its component number. Then, the recognition processing is performed according to the component number found.

Fig. 5
figure 5

Flow diagram of heart sound component identification

For the convenience of expression, in the following part, the jth component of the ith heartbeat is denoted as C i j , and the time interval between C p m and C q n is denoted as T p, q mn .

Assuming that the \(i\) th heartbeat is being analyzed, then the recognition processing can be expressed as follows.

Two components: In this case, T i, i12 and T i, i+121 are calculated. If T i, i12 < T i, i+121 , C i1 is identified as S1, and C j2 is identified as S2; otherwise, C i1 is S2, and C j2 is S1.

Three components: In this case, because S3 and S4 are less than S1 and S2 in frequency, these three components can be recognized using three steps: detecting S3/S4, recognizing S1 and S2, and identifying S3/S4.

Time–frequency analysis is required for the detection of S3 and/or S4. In this study, S-transform is employed to accomplish this task. Assuming that the signal to be analyzed is h(n), n = 0, 1, …, N − 1, its Fourier transform is \(H\left( {k/NT} \right),k = 0,1, \ldots ,N - 1,\) where T is the sampling interval. Then, the S-transform of \(h\left( n \right)\) is given by [20].

$$\left\{ {\begin{array}{l} {S\left( {jT,\frac{n}{NT}} \right) = \sum\nolimits_{m = 0}^{N - 1} {H\left( {\frac{m + n}{NT}} \right)e^{{ - \frac{{2\pi^{2} m^{2} }}{{n^{2} }}}} e^{{\frac{i2\pi mj}{N}}} } ,\quad n \ne 0} \\ {S\left( {jT,0} \right) = \frac{1}{N}\sum\nolimits_{m = 0}^{N - 1} {\left( {\frac{m}{NT}} \right)} ,\quad n = 0} \\ \end{array} } \right.$$

where jmn = 0, 1, …, N − 1.

To obtain sufficient time–frequency information, the S-transform is directly performed on the original heart sound signal HSN(n). The result is a complex matrix and is written as S. Suppose that the element of the mth row in the nth column of the matrix is Sm,n = a + jb, then the modulus of \(S_{m,n}\) is defined as.

$$\left| S \right|_{m,n} = |S_{m,n} | = \sqrt {a^{2} + b^{2} }$$

The modulus of each element of S is calculated, and the modulus matrix |S| is then obtained. Unless otherwise specified, both the expression “S transform” and “S coefficient” in the following text refer to \(\left| S \right|\).

The two-dimensional |S| matrix is relatively difficult to analyze; thus, the one-dimensional instantaneous frequency is utilized to detect S3 and S4, and is defined as.

$$f_{H} \left( n \right) = \frac{{\mathop \sum \nolimits_{m = 0}^{N - 1} F\left( m \right) \cdot \left| S \right|_{m,n} }}{{\mathop \sum \nolimits_{m = 0}^{N - 1} \left| S \right|_{m,n} }}$$

where \(F\left( m \right) = \frac{m}{NT} = \frac{{m \cdot f_{s} }}{N}\) represents the frequency of the mth row of the S-transform, and fs is the sampling rate. Because the maximal frequency of the sampled signal is half of the sampling frequency, \(F\left( m \right)\) was set as \(\frac{1}{2}\frac{m}{NT} = \frac{{m \cdot f_{s} }}{2N},m = 0,1, \ldots ,N - 1\) in this study.

Furthermore, the boundaries of heart sound components are obtained in the boundary detection phase (i.e., SPT and EPT). Therefore, by extracting the parts between these onsets and offsets from fH(n), the instantaneous frequency of heart sound components is obtained (see in Fig. 6(3)).

Fig. 6
figure 6

Time–frequency analysis: a is a heart sound with murmurs and S4; b is the instantaneous frequency of this heart sound; c is the instantaneous frequency only for the heart sound components; d is the average instantaneous frequency of c. “NA” represents “normalized amplitude”

Finally, the average instantaneous frequency is calculated. Figure 6a–d present an example of these operations. It is observed that the average instantaneous frequency of S4 is significantly lower than those of S1 and S2. Similarly, the instantaneous frequency of S3 is also the lowest among S1, S2 and S3. Consequently, the component with the lowest frequency of each heartbeat is S3 or S4, which means that the previous component is S2 and the next is S1.

Without loss of generality, assuming that C i2 is the component with the lowest frequency, C i1 and C i3 can be easily determined as S2 and S1, respectively. Then, T i, i12 and T i, i23 are calculated and compared. If T i, i12  < T i, i23 , C i2 is recognized as S4; otherwise, if T i, i12  > T i, i23 , C i2 is S3.

Four components: In this case, the four components are arranged in the order: S1 → S2 → S3 → S4 → ··· S1 → S2. Thus, the first one of the two components with the lowest instantaneous frequencies is S3, and the second is S4. Moreover, the component preceding the detected S3 and S4 is S2, and the following component is S1.

After all the components in the ith heartbeat are recognized, the algorithm finds the next beat, and performs the same recognition processing. When all the heartbeats are analyzed, the recognition process is finished.

Finally, the heart sounds are divided into segments. For the sounds containing S1 and S2 only (including the sounds with murmurs), they are segmented into S1, S1S2 interval, S2 and S2S1 interval. For the sounds containing extra S3/S4 (sounds with murmurs are also included), they are segmented into S1, S1S2 interval, S2, S2S3/S2S4 intervals, S3/S4, S3S1/S4S1 intervals.


Experimental setup

Murmur elimination evaluation

For the evaluation of murmur elimination, a novel index, the signal murmur ratio (SMR), is proposed as follows:

$$SMR = 10 \cdot lg\frac{{\mathop \sum \nolimits_{n \in U} X^{2} \left( n \right)}}{{\mathop \sum \nolimits_{m \in V} X^{2} \left( m \right)}}$$

where X(n) is the heart sound signal, \(U\) is the region where heart sound components (S1, S2, S3, and S4) are located, and V is the region where murmurs are located. To ensure the validity of the indicator, both U and \(V\) are manually determined from the original signal HSN(n). Thereby, for a certain heart sound, U and V are constant when calculating SMRs of the murmur-reduced heart sound and the non-murmur-reduced heart sound (see in Fig. 7). A higher SMR indicates a smaller murmur energy proportion in signal \(X\left( n \right)\), which means a stronger murmur removing effect.

Fig. 7
figure 7

Murmur elimination effect for heart sound containing S4 and systolic murmurs using the three methods. a Is the non-murmur-reduced heart sound, and bd show the sounds filtered by ALPF, WT and 200-LPF, respectively. In the calculation of \(SMR, U = U1 + U2 + \cdot\cdot\cdot + U6\), and \(V = V1 + V2 + V3\)

Boundary detection and component identification evaluation

To validate the performance of boundary detection and component identification, three classical metrics, sensitivity (Se), positive predictive value (PPV), and accuracy (Acc) are calculated. These three indices are defined as follows:

$${\text{Sensitivity}}\left( {\text{Se}} \right) = \frac{\text{TP}}{{{\text{TP}} + {\text{FN}}}} \times 100{\text{\% }}$$
$${\text{Positive predictive value }}\left( {\text{PP}} \right) = \frac{\text{TP}}{{{\text{TP}} + {\text{FP}}}} \times 100{\text{\% }}$$
$${\text{Accuracy }}\left( {\text{Acc}} \right) = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} \times 100{\text{\% }}$$

In this experiment, the onsets and offsets of heart sound components were automatically marked with green circles and red stars by the algorithm (see in Fig. 8c), respectively. In addition, the names of these components were automatically labeled above them (see in Fig. 8d). The results of boundary localization and component identification were judged by human and quantified using (17)–(19), in which the FN means false negative, FP means false positive, TN means true negative and TP means true positive.

Fig. 8
figure 8

The results of murmur elimination, boundary detection and component identification for heart sound containing S4 and systolic murmurs

Results of murmur elimination

The performance of ALPF was compared with two existing methods. The first method is the conventional low-pass filter with a constant cutoff frequency of 200 Hz (200-LPF), and the second method is the wavelet package transform method with a db10 mother wavelet and a scale of 5 (WPT) [14]. In the WPT method, the heart sound components were reconstructed by nodes (5, 0) to (5, 5) to remove interference components such as murmurs. The SMR of the heart sounds filtered by the ALPF, WPT and 200-LPF are denoted as SMRA, SMRWPT, and SMR200, respectively. Moreover, the SMR of the original heart sounds HSN(n) was calculated and denoted as \(SMR_{Ori}\). In addition, the SMR gains of these three methods were achieved by subtracting SMROri from \(SMR_{A}\), \(SMR_{WPT}\) and \(SMR_{200}\), respectively. All the experimental parameters of these three methods were identical.

Tables 1 and 2 show the statistical results of SMR and SMR gains. It is observed that the proposed ALPF has the highest score of SMR and SMR gains. The overall performances were evaluated by two indices, arithmetic average (AA) and weighted average (WA). AA was achieved by simply averaging the SMRs of 12 types of heart sounds and WA was affected by the weight of each type’s cycle number. In terms of SMR, the AA (± SD) and WA (± SD) of SMRA reach 26.95 ± 4.30 db and 26.92 ± 4.14 db, respectively; by contrast, the AA and WA are 24.42.07 ± 4.54 and 24.38 ± 4.43 for SMRWPT and 22.61 ± 5.86 and 22.61 ± 5.29 for SMR200. In terms of SMR gains, the heart sounds filtered by the ALPF have an arithmetic average and weighted average SMR gain (± SD) of 8.66 ± 4.52 db and 8.59 ± 4.37 db; by contrast, the arithmetic average and weighted average SMR gains (± SD) are 6.07 ± 3.88 and 5.99 ± 3.76 for WPT and 4.26 ± 4.51 and 4.21 ± 4.51 for 200-LPF.

Table 1 Statistics of SMR
Table 2 Statistics of SMR gains

Figure 7 shows an example of the murmur elimination results using the three methods.

Results of boundary detection and component identification

A detailed quantitative description of the boundary positioning result is shown in Table 3. It is observed that the proposed method achieves a sensitivity of 100% for all 16 types of heart sounds, and both the PPV and accuracy are 99.93%, indicating an outstanding performance.

Table 3 Quantitative estimation of positioning

Table 4 presents the quantitative data of component identification. Because of the good positioning effect, the identification performance is high, with an average Se of 98.63%, an average PPV of 99.86%, and an average accuracy of 98.49%.

Table 4 Quantitative estimation of identification

Figure 8 shows the complete process of the proposed method, including murmur elimination, boundary detection, and component identification. The first component (S4) in Fig. 8d is not labeled. This is because the identification of S4 requires information obtained from the previous cardiac cycle. However, this cycle lacks the information. The unlabeled S4 component is identified in the previous segment. Therefore, there is no problem of missed detection.


Tables 1 and 2 show the results of murmur elimination, all three methods exhibit an improvement in SMR compared with SMROri. As a result of selecting the appropriate cutoff frequency, the proposed ALPF has the highest SMR gain among all the tested methods. The arithmetic and weighted average SMR gains of ALPF reach 8.66db and 8.59db, which are 42.67% and 43.41% higher than WPT’s 6.07db and 5.99db, and 103.29% and 104.04% higher than 200-LPF’s 4.26db and 4.21db, respectively, indicating optimal murmur elimination performance.

In order to verify the superiority of the proposed method for murmur elimination, in this study, the paired two-sample t-test was performed on the SMR gains calculated for 12 types of heart sounds filtered with these three methods. In the paired two-sample t-test, the significance level α was set to 0.01. Under this condition, the rejection region is |t| ≥ tα/2(n − 1) = t0.005(11) = 3.1058, and tA,WPT = 5.5499 > 3.1058, tA,200 = 5.5039 > 3.1058, and tWPT,200 = 2.6273 < 3.1058. Thus, the proposed murmur elimination method was verified to have significant advantages compared with WPT and 200-LPF.

For boundary detection results shown in Table 3, among all 16 types of heart sounds, the proposed method failed on one signal labeled as a mid-systolic click: two click components were not removed in boundary detection process and were erroneously positioned as true components. Time–frequency analysis was then performed to determine the reason for this.

The time–frequency analysis revealed that these click components have a relatively high number of overlaps with S1 and S2 in frequency, resulting in the failure in removing them. Compared with this sound, the other signals’ interference components (clicks, murmurs, and residues of noise) show a smaller overlap with the true components; thus, the positioning score is higher. In fact, setting a higher threshold θ can completely remove the clicks, resulting in a higher performance score. However, the higher threshold causes a larger deviation between the calculated boundaries and the real values. After weighing the pros and cons, this approach was abandoned.

The identification results of Table 4 shows that the fourth signal (Umich No. 4) is consistent with the result of positioning. A difference appears in the result of the 13th signal (Umich No. 14). The FN of the signal reaches 36, resulting in a low Se and low Acc of 73.91%. Initially, this result was puzzling because the signal was labeled as normal. After a comprehensive analysis, the rhythm of some parts of this heart sound was found to be irregular, resulting in the failure of heart cycle calculation in these parts. Therefore, components of these parts could not be recognized.

For the performance evaluation, because the cycle number of each heart sound was very close and the performance of each type of signal was analyzed separately, the utilized dataset did not suffer from the problem of unbalance.

Some other operating characteristics obtained in this study and 11 previous studies are listed in Table 5 for comparison. Both the Se and PPV for positioning in literature [6] consist of two values separated by a slash. The two values are for normal and pathological cases of heart sounds, respectively. As mentioned in the “Background”, a complete segmentation is comprised of boundary detection and component identification. However, several existing works only focus on one of them. Moreover, some studies could be applied only to the basic situation, for example, heart sounds without murmurs. Therefore, in addition to achieving excellent performance, the proposed approach achieves the functions of both boundary localization and component identification and can be applied in more conditions.

Table 5 A comprehensive comparison with other studies


This paper presents an accurate heart sound segmentation algorithm that combines time-domain, frequency-domain and time–frequency-domain analysis. Compared to existing studies, this method is applicable to a wide range of heart sounds, from normal to those containing S3, S4 and various murmurs. To verify this method, quantitative experiments were performed using the University of Michigan’s Heart Sound & Murmur Library, an authoritative open database. The experimental materials incorporated two types of normal heart sounds and 14 types of abnormal heart sounds. The results show that the boundary localization has an average Se of 100%, an average PPV of 99.3% and an average Acc of 99.93%. Moreover, the Se, PPV and Acc of the component identification reach 98.63%, 99.86% and 98.49%, respectively, indicating outstanding performance of the proposed method. There are still some shortcomings of this work. For example, the component identification relies on the success of cardiac cycle calculation; therefore, this method cannot be applied to the heart sound with severe arrhythmia because of the failure to achieving accurate cardiac cycle by using UACF. The study of segmentation provides a good basis for extracting significant features of heart sounds. Therefore, the further study will focus on the classification of heart sounds.


  1. Chen W, Gao R, Liu L. Report on cardiovascular diseases in China 2014. Chin Circ J. 2015;30:617–22.

    Google Scholar 

  2. Tseng YL, Ko PY, Jaw FS. Detection of the third and fourth heart sounds using Hilbert-Huang transform. Biomed Eng Online. 2012;11:8.

    Article  Google Scholar 

  3. Yuenyong S, Nishihara A, Kongprawechnon W, et al. A framework for automatic heart sound analysis without segmentation. Biomed Eng Online. 2011;10:13.

    Article  Google Scholar 

  4. Liang H, Lukkarinen S, Hartimo I. Heart sound segmentation algorithm based on heart sound envelogram. In: Computers in Cardiology 1997. Lund: IEEE; 1997. 7–10 September 1997, p. 105–8.

  5. Kumar D, et al. Third heart sound detection using wavelet transform-simplicity filter. In: Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE. IEEE, 2007. p. 1277–81.

  6. Moukadem A, et al. A robust heart sounds segmentation module based on S-transform. Biomed Signal Process Control. 2013;8:273–81.

    Article  Google Scholar 

  7. Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans Biomed Eng. 2016;63:822–32.

    Google Scholar 

  8. Naseri H, Homaeinezha MR. Detection and boundary identification of phonocardiogram sounds using an expert frequency-energy based metric. Ann Biomed Eng. 2013;41:279–92.

    Article  Google Scholar 

  9. Boutana D, Benidir M, Barkat B. Segmentation and identification of some pathological phonocardiogram signals using time-frequency analysis. IET Signal Process. 2011;5:527–37.

    Article  MathSciNet  Google Scholar 

  10. Tang H, et al. Segmentation of heart sounds based on dynamic clustering. Biomed Signal Process Control. 2012;7:509–16.

    Article  Google Scholar 

  11. Heart Sound & Murmur Library. University of Michigan, US. Accessed 7 Apr 2017.

  12. Choi S, Jiang Z. Comparison of envelope extraction algorithms for cardiac sound signal segmentation. Expert Syst Appl. 2008;34:1056–69.

    Article  Google Scholar 

  13. Qiu S, et al. A stimulus artifact removal technique for SEMG signal processing during functional electrical stimulation. IEEE Trans Biomed Eng. 2015;62:8.

    Article  Google Scholar 

  14. Zheng Y, Guo X, Ding X. A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification. Expert Syst Appl. 2015;42:2710–21.

    Article  Google Scholar 

  15. Chen Y, Duan H. A QRS complex detection algorithm based on mathematical morphology and envelope. In: Engineering in Medicine and Biology Society. IEEE-EMBS 2005. 27th Annual International Conference of the. IEEE, 2006. p. 4654–57.

  16. Chu CHH, Delp EJ. Impulsive noise suppression and background normalization of electrocardiogram signals using morphological operators. IEEE Trans Biomed Eng. 1989;36:262–73.

    Article  Google Scholar 

  17. Trahanias PE. An approach to QRS complex detection using mathematical morphology. IEEE Trans Biomed Eng. 1993;40:201–5.

    Article  Google Scholar 

  18. Barma S, et al. Detection of the third heart sound based on nonlinear signal decomposition and time–frequency localization. IEEE Trans Biomed Eng. 2016;63:1718–27.

    Article  Google Scholar 

  19. Hu JL, Bao SD. An approach to QRS complex detection based on multiscale mathematical morphology. In: Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on. Vol. 2. IEEE, 2010.

  20. Stockwell RG, Mansinha L, Lowe RP. Localization of the complex spectrum: the S transform. IEEE Trans Signal Process. 1996;44:998–1001.

    Article  Google Scholar 

  21. Varghees VN, Ramachandran KI. A novel heart sound activity detection framework for automated heart sound analysis. Biomed Signal Process Control. 2014;13:174–88.

    Article  Google Scholar 

  22. Sepehri AA, et al. A novel method for pediatric heart sound segmentation without using the ECG. Comput Methods Appl Mech Eng. 2010;99:43–8.

    Google Scholar 

  23. Pedrosa J, Castro A, Vinhoza TTV. Automatic heart sound segmentation and murmur detection in pediatric phonocardiograms. In: Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE. IEEE, 2014.

  24. Wang P, et al. Phonocardiographic signal analysis method using a modified hidden markov model. Ann Biomed Eng. 2007;35:367–74.

    Article  Google Scholar 

  25. Schmidt SE, et al. Segmentation of heart sound recordings from an electronic stethoscope by a duration dependent hidden-markov model. In: Computers in cardiology. Bologna: IEEE; 14–17 September 2008, p. 345–48.

  26. Moukadem A, et al. Localization of heart sounds based on S-transform and radial basis function neural network. In: 15th Nordic-Baltic Conference on Biomedical Engineering and Medical Physics (NBC 2011). Springer: Heidelberg, 2011.

  27. Zhong L, et al. A robust envelope extraction algorithm for cardiac sound signal segmentation. In: Bioinformatics and Biomedical Engineering, 2011 5th International Conference on. IEEE, 2011.

  28. Wang X, et al. Detection of the first and second heart sound using heart sound energy. In: Biomedical Engineering and Informatics, 2009. BMEI’09. 2nd International Conference on. IEEE, 2009.

Download references

Authors’ contributions

QL developed the main algorithm. XW and XM provided professional verification of the experimental results. All authors read and approved the final manuscript.


The authors would like to thank Science and Technology Commission of Shanghai Municipality for financial support.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The datasets analyzed during the current study are available in the [Heart Sound & Murmur Library of UMich] repository, (

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.


This work received financial support from Science and Technology Commission of Shanghai Municipality, Grant No.15DZ2251700 and 13DZ1941802.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Xiaomei Wu.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, Q., Wu, X. & Ma, X. An automatic segmentation method for heart sounds. BioMed Eng OnLine 17, 106 (2018).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: