
Automatic online detection of atrial fibrillation based on symbolic dynamics and Shannon entropy



Atrial fibrillation (AF) is one of the most common and debilitating arrhythmias worldwide, with a major impact on morbidity and mortality. Detecting AF is therefore crucial for preventing both acute and chronic cardiac rhythm disorders.


Our objective is to devise a method for real-time, automated detection of AF episodes in electrocardiograms (ECGs). This method utilizes RR intervals, and it involves several basic operations of nonlinear/linear integer filters, symbolic dynamics and the calculation of Shannon entropy. Using novel recursive algorithms, online analytical processing of this method can be achieved.


Four publicly-accessible sets of clinical data (Long-Term AF, MIT-BIH AF, MIT-BIH Arrhythmia, and MIT-BIH Normal Sinus Rhythm Databases) were selected for investigation. The first database is used as a training set; in accordance with the receiver operating characteristic (ROC) curve, the best performance using this method was achieved at the discrimination threshold of 0.353: the sensitivity (Se), specificity (Sp), positive predictive value (PPV) and overall accuracy (ACC) were 96.72%, 95.07%, 96.61% and 96.05%, respectively. The other three databases are used as testing sets. Using the obtained threshold value (i.e., 0.353), for the second set, the obtained parameters were 96.89%, 98.25%, 97.62% and 97.67%, respectively; for the third database, these parameters were 97.33%, 90.78%, 55.29% and 91.46%, respectively; finally, for the fourth set, the Sp was 98.28%. The existing methods were also employed for comparison.


Overall, the test results indicate that the newly developed approach outperforms the other available techniques on these databases under the various experimental situations assessed, and suggest that our technique could be of practical use for clinicians in the future.


Atrial fibrillation (AF) is recognized as the most common clinically encountered arrhythmia in adults [1], which affects approximately 0.4% of the general population. The prevalence of this tachyarrhythmia increases with age, with less than 1% affected in persons under the age of 60 years and in excess of 6% for those over the age of 80 years [2, 3]. Atrial fibrillation is associated with a high risk of stroke, heart disease (e.g., congestive cardiac failure), and cardiovascular mortality [1, 4]. There is also a close relationship between AF and obesity [5], obstructive sleep apnea [6], and long-term alcoholism [7], which reciprocally bear cumulative risks for promoting the development of AF [1]. The early identification of AF appears to be crucial for patients with cardiovascular disease, especially for stroke patients to whom the secondary stroke prevention is of primary importance.

Issues relating to the clinical significance of rhythm classification and the impetus for improving the accuracy of atrial tachyarrhythmia estimation have motivated the development of innovative computerized AF detectors. Since the early 1980s, a series of sophisticated methods have been investigated to cope with the challenges of AF detection [8–25]. Most of them are based upon two main characteristics of this type of arrhythmia as seen in a surface electrocardiogram (ECG): (i) RR (R-wave peak to R-wave peak) interval irregularity (i.e., chaotic behavior of heart rate variability), and (ii) P-wave absence (PWA) or F-wave substitution (i.e., very low amplitude waveforms of odd morphologies) resulting from abnormally rapid atrial activity (AA). Although P waves or cardiac AA can be an alternative clue in the detection of AF, the absence or presence of P waves is not readily identifiable because various types of high-intensity noise often coexist in ECGs, which may lead to low predictive accuracy. In addition, the relationship between AA in the surface ECG and the diverse mechanisms of AF has not yet been well delineated [3]. Because of the challenges in detecting AA in ECG measurements, detection techniques based on inferences from RR intervals are preferred, as they produce relatively robust outcomes [21–23, 25].

In this study, a reliable method for the fully automated detection of AF episodes from surface ECGs is proposed. The method comprises a three-pass procedure. In the first pass, the RR interval sequence is pre-processed with nonlinear and integer filters to generate low- and high-scale reference sequences. In the second pass, the information in the RR interval sequence is compressed by symbolic dynamics, using the sequences obtained from the first pass, to yield a symbolic sequence. In the third pass, Shannon entropy is computed on the symbolic sequence to decide whether or not AF is present at the current cardiac beat. The following sections also discuss the key points of online processing, namely the recursive realization that permits beat-by-beat classification. Finally, we quantitatively compare the performance of the newly developed technique with that of current state-of-the-art techniques on four widely used clinical databases under various experimental situations.


Pre-processing of the $RR_n$ series

A. Median filter

A median filter is implemented by windowing the acquired data, ranking the samples in the window, and outputting the median of the sorted samples. Considering an RR interval ($RR_n$) sequence $x_n$, as shown in Figure 1(a), the output $y_n$ of this nonlinear filter is given by

$$y_n = \operatorname{median}\{x_{n-w}, \ldots, x_n, \ldots, x_{n+w}\} \tag{1}$$

where the window has a fixed width of $2w+1$. From the perspective of signal processing, the time delay of the median filter is $w$. A window size of 17 is used herein, giving a delay of 8 samples. The median filter brings two advantages: (i) it suppresses unwanted outliers, which are mostly caused by erroneously detected (or missed) R-wave peaks; and (ii) it preserves sharp edges (i.e., onsets and terminations of AF episodes) without extensively blurring the context.

Figure 1

Example of the application of this method for detecting AF. (a) Raw RR interval sequence $x_n$; (b) Low scale reference $xl_n$; (c) High scale reference $xh_n$; (d) Difference $\Delta RR_n = x_n - xl_n$; (e) The distribution of symbols $sy_n$; (f) The relevant word sequence of $sy_n$ in (e); and (g) The distribution of the SE $\mathcal{H}''(A)$.

B. Integer filter for low scale reference

Subsequently, we filter the output $y_n$ of the median filter with a low-pass filter of the form

$$H_l(z) = \frac{1 - z^{-16}}{1 - z^{-1}} \tag{2}$$

where the gain is $Gain_1 = 16 = 2^4$, and the intrinsic delay of $H_l(z)$ is 7.5 samples. This low-pass filter smooths the $y_n$ resulting from the previous median filtering. Another benefit of the low-pass filter is the removal of fluctuations around the current sample that are possibly caused by Respiratory Sinus Arrhythmia (RSA). Let $xl_n$ be the output of this filter, as illustrated in Figure 1(b).

C. Integer filter for high scale reference

Another low-pass filter $H_h(z)$ is then applied to the output $xl_n$ of the previous low-pass filter $H_l(z)$,

$$H_h(z) = \frac{1 - z^{-32} - z^{-64} + z^{-96}}{1 - 2z^{-1} + z^{-2}} \tag{3}$$

where the gain is $Gain_2 = 2048 = 2^{11}$, and the delay of $H_h(z)$ is 47 samples. This low-pass filter generates a reference RR sequence of a larger scale, which is used in the definition of the symbolic series explained in the following subsection. The resulting output, denoted by $xh_n$, is shown in Figure 1(c).

As we have seen, the time delays of $x_n$ and $xl_n$ are −62.5 and −47 samples with respect to $xh_n$, respectively. To keep the filtered data synchronized, $x_n$ and $xl_n$ from here on denote the corresponding time-delay-corrected sequences. The difference $\Delta RR_n = x_n - xl_n$ is then taken between these delay-corrected sequences, as seen in Figure 1(d).
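As a quick check of the stated gains and delays, both integer filters can be expanded as cascades of moving sums. The derivation below is only a sketch; it assumes the standard result that a length-$M$ moving sum has DC gain $M$ and a delay of $(M-1)/2$ samples, and it uses no quantities beyond those given in the text.

```latex
\begin{aligned}
H_l(z) &= \frac{1 - z^{-16}}{1 - z^{-1}} = \sum_{k=0}^{15} z^{-k}
  &&\Rightarrow\ Gain_1 = 16 = 2^{4},\quad \text{delay} = \tfrac{16-1}{2} = 7.5,\\
H_h(z) &= \frac{(1 - z^{-32})(1 - z^{-64})}{(1 - z^{-1})^{2}}
  = \Bigl(\sum_{k=0}^{31} z^{-k}\Bigr)\Bigl(\sum_{k=0}^{63} z^{-k}\Bigr)
  &&\Rightarrow\ Gain_2 = 32 \times 64 = 2048 = 2^{11},\quad \text{delay} = 15.5 + 31.5 = 47,\\
\text{delay}(x_n \to xh_n) &= 8 + 7.5 + 47 = 62.5 .
\end{aligned}
```

The last line adds the median-filter delay of 8 samples to the two filter delays, which reproduces the 62.5-sample offset between $x_n$ and $xh_n$.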

Symbolic dynamics of $\Delta RR_n$

The purpose of employing symbolic dynamics is to describe the dynamic behavior of $\Delta RR_n$ with respect to $xh_n$. Symbolic dynamics encodes the variation of $RR_n$ into a series over a small alphabet, with each symbol representing an instantaneous state. The thresholds are defined as $thre_1 = xh_n \times 2^{-4}$ (implemented as $thre_1 = xh_n \gg 4$), $thre_2 = xh_n \times 2^{-3}$ (implemented as $thre_2 = xh_n \gg 3$), $thre_3 = thre_1 + thre_2$, $thre_4 = xh_n \times 2^{-2}$ (implemented as $thre_4 = xh_n \gg 2$) and $thre_5 = thre_4 + thre_1$. The mapping function of the symbol transform can therefore be defined as,

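$$sy_n = \begin{cases}
0 & \text{if } \Delta RR_n < -thre_4 \\
1 & \text{else if } \Delta RR_n < -thre_3 \\
2 & \text{else if } \Delta RR_n < -thre_2 \\
3 & \text{else if } \Delta RR_n < -thre_1 \\
4 & \text{else if } \Delta RR_n < thre_1 \\
5 & \text{else if } \Delta RR_n < thre_2 \\
6 & \text{else if } \Delta RR_n < thre_3 \\
7 & \text{else if } \Delta RR_n < thre_4 \\
8 & \text{else if } \Delta RR_n < thre_5 \\
9 & \text{other cases}
\end{cases} \tag{4}$$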

The raw RR sequence $x_n$ is thus quantized into the symbol sequence $sy_n$, with symbols drawn from the predefined “alphabet” in Eq. (4) (i.e., 0 to 9). Recalling Figure 1(a)-(d) and scanning the distribution of calculated symbols in Figure 1(e), we confirm that most normal beats are defined as zero symbols, and possible abnormal beats (arrhythmias, e.g., AF) are defined as non-zero symbols by the transform Eq. (4).

To facilitate the analysis of sy n , the widely used 3-symbol template (i.e., a word consists of 3 successive symbols) is applied to examine entropic properties. The word value can then be calculated by a novel operator as defined below,

$$wv_n = (sy_{n-2} \times 2^{8}) + (sy_{n-1} \times 2^{4}) + sy_n \tag{5}$$

where $sy_{n-2} \times 2^{8}$ and $sy_{n-1} \times 2^{4}$ are implemented as $sy_{n-2} \ll 8$ and $sy_{n-1} \ll 4$, and $0 \le wv_n \le 2457$. Figure 2 briefly illustrates the transformation of the symbol sequence with the template and the corresponding word, while Figure 1(f) depicts the word sequence of the $sy_n$ shown in Figure 1(e).

Figure 2

Schematic illustrating the symbol definition and the word transformation by Eq. (5).
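The shift-based thresholds and the word packing of Eq. (5) can be sketched in a few lines. This is a minimal illustration, not the authors' C++ code; the function names `thresholds` and `word_value` are hypothetical, and the high-scale reference is assumed to be a non-negative integer.

```python
def thresholds(xh):
    """Return (thre1, ..., thre5) for one integer high-scale reference value xh."""
    thre1 = xh >> 4            # xh * 2**-4
    thre2 = xh >> 3            # xh * 2**-3
    thre3 = thre1 + thre2
    thre4 = xh >> 2            # xh * 2**-2
    thre5 = thre4 + thre1
    return thre1, thre2, thre3, thre4, thre5


def word_value(sy_n2, sy_n1, sy_n):
    """Pack three successive symbols (each in 0..9) into one word value, as in Eq. (5)."""
    return (sy_n2 << 8) + (sy_n1 << 4) + sy_n
```

With all three symbols equal to 9 the packed value is 9·256 + 9·16 + 9 = 2457, which matches the stated upper bound on the word value.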

Shannon entropy

Shannon entropy (SE) is a statistical tool that quantifies the information content of a time series. For the sake of completeness, we define the discrete probability space of a dynamic system as $A = (A \mid P)$. The total number of elements in $A$ is $N$. The characteristic elements can then be defined as $A = \{a_1, \ldots, a_k\}$, with the relevant probability set $P = \{p_1, \ldots, p_k\}$ ($1 \le k \le N$). Each element $a_i$ has probability $p_i = N_i/N$ ($0 < p_i \le 1$, $\sum_{i=1}^{k} p_i = 1$), where $N_i$ is the number of occurrences of the element $a_i$ in $A$. Thus, the SE of $A$ is defined as [26],

$$\mathcal{H}(A) = -\sum_{i=1}^{k} p_i \log_2 p_i \tag{6}$$

By Jensen’s inequality, we can prove $\mathcal{H}(A) \le \log_2 k \le \log_2 N$, with equality if $p_i \equiv 1/N$ for all $i$ and $k \equiv N$. A normalized form of $\mathcal{H}(A)$ can then be expressed as,

$$\mathcal{H}'(A) = -\frac{1}{\log_2 N} \sum_{i=1}^{k} p_i \log_2 p_i \tag{7}$$

where $\log_2 N$ is set to 1 if $N \equiv 1$. Eq. (7) is also referred to as the normalized entropy, since the entropy is divided by the maximum entropy $\log_2 N$. A coarser version of $\mathcal{H}'(A)$ can be defined as,

$$\mathcal{H}''(A) = -\frac{k}{N \log_2 N} \sum_{i=1}^{k} p_i \log_2 p_i \tag{8}$$

Currently, the dynamic $A$ consists of the 127 consecutive word elements from $wv_{n-126}$ to $wv_n$ (the bin size in this case is $N = 127$). By determining the characteristic set $A$ and the relevant probability set $P$ from these elements, we can calculate the SE $\mathcal{H}''(A)$. The presence of AF is then detectable: the rhythm is labeled AF if $\mathcal{H}''(A)$ exceeds a discrimination threshold, and non-AF otherwise, as can be seen in Figure 1(g). We use the training database to determine the optimal discrimination threshold by investigating threshold settings within the range [0.0, 1.0]; the best performing threshold of 0.353 is thus derived and employed for the performance assessment on the different testing databases.
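To make the definition concrete, the coarse entropy of Eq. (8) can be evaluated directly for one window of words. The sketch below is a straightforward floating-point reference, not the recursive integer scheme of the paper; the function name `coarse_entropy` is hypothetical and the window is assumed to be non-empty.

```python
from collections import Counter
from math import log2


def coarse_entropy(words):
    """Directly evaluate Eq. (8) for a non-empty window of word values."""
    n = len(words)                       # bin size N
    counts = Counter(words)              # N_i for each distinct word a_i
    k = len(counts)                      # number of distinct words
    log_n = log2(n) if n > 1 else 1.0    # convention: log2(N) := 1 when N == 1
    h = -sum((c / n) * log2(c / n) for c in counts.values())   # Eq. (6)
    return (k / n) * h / log_n           # Eq. (8)
```

A window of 127 identical words gives 0, and a window of 127 distinct words gives 1, which are the two extremes of the [0.0, 1.0] range over which the threshold is searched.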

Key issues of online processing

At first sight, Eqs. (1)–(5) and (8) suggest that this AF detection technique is computationally demanding. However, these challenges can be overcome by recursive algorithms that process the data beat by beat in real time.

A. Pseudo-recursive median filtering

The median filter in Eq. (1) can be implemented with a so-called pseudo-recursive method. For input $x_i$, we define $S = \{s_r : 1 \le r \le 2w+1\}$ as the sorted array of the successive elements from $x_{i-2w-1}$ to $x_{i-1}$; the output $y_i$ is then obtained by the following steps:

➀ A binary search is used to find the position $m$ of the sample $x_{i-2w-1}$, which will leave the window (i.e., $s_m = x_{i-2w-1}$) as $x_i$ simultaneously enters it;

➁ The binary search is applied again to find the position $t$ at which the input $x_i$ needs to be placed (i.e., $s_t < x_i \le s_{t+1}$);

➂ From position $m$ to position $t$, each $s_r$ is replaced with the adjacent $s_{r \pm 1}$ (the “±” indicates whether the element is taken from the right or the left, with “+” and “−” denoting the element to the right and to the left, respectively);

➃ The element $s_t$ is replaced with $x_i$;

➄ The median $s_{w+1}$ of the updated $S$ becomes the output $y_i$.

For the following input $x_{i+1}$, we repeat steps ➀ to ➄ and obtain the new output $s_{w+1}$ (i.e., $y_{i+1}$), as shown in Figure 3; the sorting uses the binary search twice. Compared with the traditional median filter, the computational complexity decreases from approximately $O(n^2)$ to $O(n)$.

Figure 3

Schema of the pseudo-recursive median filtering (the rightward case).
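The pseudo-recursive update can be sketched with a sorted list that is edited in place instead of being re-sorted for every beat. This is an illustration under stated assumptions rather than the authors' code: the function name `sliding_median` is hypothetical, and the element shifting of the middle steps is delegated to Python's list deletion and `bisect.insort`, which move the elements between the two located positions.

```python
from bisect import bisect_left, insort
from collections import deque


def sliding_median(samples, w):
    """Return the median of every full window of 2*w + 1 consecutive samples."""
    size = 2 * w + 1
    arrival = deque()        # window contents in arrival order
    sorted_win = []          # the same contents, kept sorted (the array S)
    medians = []
    for x in samples:
        if len(arrival) == size:
            oldest = arrival.popleft()
            # Binary search for the departing sample, then drop it.
            del sorted_win[bisect_left(sorted_win, oldest)]
        arrival.append(x)
        # Binary search for the insertion point, shift, and insert x.
        insort(sorted_win, x)
        if len(arrival) == size:
            medians.append(sorted_win[w])    # middle element of S
    return medians
```

Output `j` of this sketch is the median centred on input sample `j + w`, which reflects the delay of `w` samples stated for the filter. For `w = 8` each update touches at most 17 elements.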

B. Recursive implementation of integer filters

The recursive implementation (also referred to as the “difference equation”) of the filter $H_l(z)$ can be expressed as,

$$xl_n = xl_{n-1} + y_n - y_{n-16} \tag{9}$$

Eq. (9) involves 1 integer addition, 3 integer subtractions and 1 integer right-shift operation, the latter being $xl_n \gg 4$ (as $Gain_1 = 2^4$) to offset the gain of $H_l(z)$.

The filter $H_h(z)$ can then be computed recursively using

$$xh_n = (xh_{n-1} \times 2) - xh_{n-2} + xl_n - xl_{n-32} - xl_{n-64} + xl_{n-96} \tag{10}$$

where $xh_{n-1} \times 2$ is implemented as $xh_{n-1} \ll 1$. Eq. (10) involves 2 integer additions, 8 integer subtractions, 1 integer left-shift operation and 1 integer right-shift operation, the latter being $xh_n \gg 11$ (as $Gain_2 = 2^{11}$) to offset the gain of $H_h(z)$.
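The two difference equations translate almost literally into integer code. The sketch below makes several assumptions that the text does not spell out: samples before the start of the record are taken as zero, the recursions run on unscaled accumulators while the right shifts are applied only to the outputs, and the second filter is fed the gain-corrected $xl_n$. The names `integer_filters` and `past` are hypothetical.

```python
def integer_filters(y):
    """Return (xl, xh): gain-corrected outputs of H_l and H_h for integer input y."""
    xl_raw, xh_raw = [], []      # unscaled recursion states
    xl, xh = [], []              # outputs after the gain-offsetting shifts

    def past(seq, n, k):
        # Value k samples back; zero before the start of the record (assumption).
        return seq[n - k] if n - k >= 0 else 0

    for n, yn in enumerate(y):
        # Eq. (9): running sum over the last 16 median-filtered samples.
        acc_l = past(xl_raw, n, 1) + yn - past(y, n, 16)
        xl_raw.append(acc_l)
        xl.append(acc_l >> 4)                    # offset Gain1 = 2**4

        # Eq. (10): second-order recursion on the low-scale reference.
        acc_h = ((past(xh_raw, n, 1) << 1) - past(xh_raw, n, 2)
                 + xl[n] - past(xl, n, 32) - past(xl, n, 64) + past(xl, n, 96))
        xh_raw.append(acc_h)
        xh.append(acc_h >> 11)                   # offset Gain2 = 2**11
    return xl, xh
```

For a constant input, both outputs settle to the input value once the start-up transient has passed, which is a simple way to confirm that the shifts by 4 and 11 bits cancel the filter gains.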

C. Mapping the definition of $-\frac{1}{\log_2 N} p_i \log_2 p_i$

Investigating the dynamic $A$, we immediately see that, for a bin of size $N$, each characteristic symbol can only have a probability $p_i = i/N$ ($1 \le i \le N$, i.e., $1/N \le p_i \le 1$). Along these lines, a probability array $PiMap$ can be pre-calculated,

$$PiMap[127] = -\frac{Cons}{\log_2 N}\,\{p_1 \log_2 p_1, \ldots, p_{63} \log_2 p_{63}, p_{64} \log_2 p_{64}, \ldots, p_{127} \log_2 p_{127}\} \doteq \{7874, \ldots, 71790, 71291, \ldots, 0\}$$

where $Cons = 1000000$ is a constant that converts decimal fractions into integers, $N = 127$, and $\doteq$ denotes taking the integer part of each $-\frac{Cons}{\log_2 N} p_i \log_2 p_i$.

Notably, for each cardiac cycle screened, this predefined $PiMap$ reduces the work to a single lookup of the integer $PiMap[i]$ at index $i$, rather than evaluating $-\frac{1}{\log_2 N} p_i \log_2 p_i$ with arithmetic and logarithmic operations. This pre-calculation significantly decreases computation time.
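The lookup table can be regenerated offline in a few lines, which also serves as a check on the quoted entries. This sketch assumes a table indexed directly by the count `i`, with an extra entry 0 at index 0; that indexing and the names `build_pimap` and `PIMAP` are illustrative choices, not taken from the paper.

```python
from math import log2

N = 127             # bin size
CONS = 1_000_000    # scaling constant that turns fractions into integers


def build_pimap(n=N, cons=CONS):
    """Return a table whose entry i is int(-cons * p * log2(p) / log2(n)), p = i / n."""
    table = [0] * (n + 1)            # table[0] stays 0 (a count of zero contributes nothing)
    for i in range(1, n + 1):
        p = i / n
        table[i] = int(cons * (-p * log2(p)) / log2(n))
    return table


PIMAP = build_pimap()
```

The entries at indices 1, 63, 64 and 127 come out as 7874, 71790, 71291 and 0, in agreement with the values listed in the text.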

D. Recursive implementation of $\mathcal{H}''(A)$

We define a buffer array $nu_{wv_i}$ ($wv_i \le 2457$) to store the number of occurrences of the $i$th characteristic element $wv_i$ in space $A$. When the input $wv_n$ enters $A$ (i.e., $wv_n$ becomes the rightmost element), the leftmost element $wv_{n-127}$ simultaneously leaves $A$; see Figure 2 for clarity. The change in the SE of $A$ is therefore determined purely by $nu_{wv_n}$ and $nu_{wv_{n-127}}$ in the dynamic $A$, so $\mathcal{H}''(A)$ can be calculated recursively in three steps (➀–➂),

where $sh_n$ and $sh_n''$ represent $\mathcal{H}'(A)$ and $\mathcal{H}''(A)$, respectively; $PiMap[i] = 0$ is fixed for the case $i \equiv 0$; and $127000000 = N \cdot Cons = 127 \times 1000000$. For the next input $wv_{n+1}$, steps ➀–➂ are executed again to obtain $sh_{n+1}''$. From an online processing perspective, the time delays of $sh_n''$ are 64 and 126.5 samples with respect to $xh_n$ and $x_n$, respectively.

The overall logic of the recursive realization is shown in Figure 4. With these recursive algorithms, the AF detector consists of only a few basic operations, such as integer addition/subtraction, integer comparison and integer shifting. In effect, calculating $sh_n''$ and classifying the current beat $x_n$ requires only 1 multiplication and 1 division, both within the scaling by $k/127000000$, together with 1 floating-point comparison between $sh_n''$ and a threshold. Consequently, a useful computational efficiency can be achieved.

Figure 4

Flowchart of the recursive realization of this detector for beat-by-beat assessing AF.
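One possible way to realize such a sliding update is sketched below. It is a reconstruction from the surrounding description, not the paper's own algorithm listing: the class name `SlidingEntropy`, its attributes, and the exact order of operations are assumptions. Only the two counts that change are touched per beat, and the entropy is recovered from an integer running sum with one multiplication and one division.

```python
from collections import deque
from math import log2

N, CONS = 127, 1_000_000
# Lookup table as described in the text, with entry 0 fixed to 0.
PIMAP = [0] + [int(CONS * (-(i / N) * log2(i / N)) / log2(N)) for i in range(1, N + 1)]


class SlidingEntropy:
    """Maintain the coarse entropy over the last N word values (illustrative)."""

    def __init__(self):
        self.window = deque()        # the last N words, oldest on the left
        self.count = [0] * 2458      # occurrences of each word value 0..2457
        self.k = 0                   # number of distinct words in the window
        self.sh = 0                  # integer sum of PIMAP[count] over distinct words

    def _change(self, wv, delta):
        old = self.count[wv]
        new = old + delta
        self.count[wv] = new
        self.sh += PIMAP[new] - PIMAP[old]
        if old == 0:
            self.k += 1              # a new distinct word appeared
        elif new == 0:
            self.k -= 1              # a distinct word vanished

    def push(self, wv):
        """Add one word; the result is meaningful once N words have been seen."""
        if len(self.window) == N:
            self._change(self.window.popleft(), -1)   # leftmost word departs
        self.window.append(wv)
        self._change(wv, +1)                          # new word enters
        return self.k * self.sh / (N * CONS)          # k * sh / 127000000
```

A beat would then be labeled AF when the returned value exceeds the discrimination threshold. Because the table entries are truncated to integers, the result can differ from the floating-point definition in the last few decimal places.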

Materials and evaluation

Clinical ECG data sets

Performance of this new AF detection method is evaluated using four popular sets of clinical ECGs (the Long-Term AF Database [LTAFDB], the MIT-BIH AF Database [AFDB], the MIT-BIH Arrhythmia Database [MITDB], and the MIT-BIH Normal Sinus Rhythm Database [NSRDB]) [27]. The LTAFDB database is used as the initial training set, while the other three databases are used as the testing sets. The contents of these databases are summarized in Table 1. All reference annotations of the four databases are examined in this study.

Table 1 Four publicly-accessible sets of clinical data are selected for evaluation

Performance metrics

The performance of our newly developed algorithm and of existing methods is investigated in terms of sensitivity (Se), specificity (Sp), positive predictive value (PPV), and overall accuracy (ACC),

$$Se = \frac{TP}{TP + FN}, \quad PPV = \frac{TP}{TP + FP}, \quad Sp = \frac{TN}{TN + FP}, \quad ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

where, for a specific data set, TP (true positive) is the number of AF beats correctly detected as AF, TN (true negative) is the number of non-AF beats correctly detected as non-AF, FP (false positive) is the number of non-AF beats incorrectly detected as AF, and FN (false negative) is the number of AF beats incorrectly detected as non-AF. Se is thus the proportion of true AF beats correctly identified as AF, Sp is the proportion of true non-AF beats correctly identified as non-AF, PPV is the proportion of AF detections that are true positives, and ACC is the overall accuracy of the method. We consider Se and Sp the main metrics, while PPV and ACC are complementary.
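For reference, the four metrics follow directly from the beat counts. The helper below is a minimal sketch with a hypothetical name, `af_metrics`; it returns fractions rather than percentages and assumes that none of the denominators is zero.

```python
def af_metrics(tp, tn, fp, fn):
    """Return Se, Sp, PPV and ACC as fractions from beat-level counts."""
    return {
        "Se": tp / (tp + fn),                       # sensitivity
        "Sp": tn / (tn + fp),                       # specificity
        "PPV": tp / (tp + fp),                      # positive predictive value
        "ACC": (tp + tn) / (tp + tn + fp + fn),     # overall accuracy
    }
```

For a database without any AF annotation, such as NSRDB, `tp + fn` is zero, so only Sp is defined; this is why a single figure is reported for that set.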

Results and discussion

The values of the SE $\mathcal{H}''(A)$ for AF (519687 beats) and non-AF (701887 beats) annotations in the AFDB database (a total of 1221574 beats for all 25 records) can be seen in Figure 5. It is apparent that $\mathcal{H}''(A)$ discriminates AF well.

Figure 5

Histogram distribution of $\mathcal{H}''(A)$ for annotated AF and non-AF beats of the AFDB database.

Receiver operating characteristic (ROC) curves are widely used in the medical field to determine the optimal discrimination threshold for clinical tests. In this work, the LTAFDB database is used as the training set to obtain the optimal threshold for our algorithm. The threshold is varied from 0.0 to 1.0 in increments of 0.001 on the training set, and the values of Se, Sp, 1−Sp, PPV and ACC are calculated for each threshold setting. We thus obtain the ROC curve shown in Figure 6. In the ROC space of Figure 6, a is the point of perfect classification, at which Se and Sp are both equal to 100%, and b is the point of best performance of our method on the ROC curve, i.e., the point with the shortest distance to a. The parameters at position b are a discrimination threshold of 0.353 and values for Se, Sp, PPV and ACC of 96.72%, 95.07%, 96.61% and 96.05%, respectively. We therefore take the best performing threshold value of 0.353 for quantitative assessment when our method is applied to the other three testing databases.

Figure 6

ROC curve of the training set of LTAFDB database when our method was applied with the various threshold values from 0.0 to 1.0 in increments of 0.001. Based upon the results portrayed here, the best performing threshold of 0.353 is used for performance assessment.
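The selection rule, namely the ROC point closest to the perfect classifier, can be sketched as follows. The function name `best_threshold` is hypothetical, and the numbers in the usage example are made-up illustrations, not results from the paper.

```python
from math import hypot


def best_threshold(points):
    """Pick the threshold whose ROC point (1 - Sp, Se) lies closest to (0, 1).

    points: iterable of (threshold, se, sp) with se and sp as fractions in [0, 1].
    """
    return min(points, key=lambda p: hypot(1.0 - p[2], 1.0 - p[1]))[0]


# Hypothetical sweep results, for illustration only.
sweep = [(0.20, 0.99, 0.80), (0.35, 0.96, 0.95), (0.50, 0.85, 0.99)]
chosen = best_threshold(sweep)
```

In this toy sweep the middle setting is chosen, since it balances the two error rates best. In the paper the same criterion is applied to the full 0.001-step sweep on the training set.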

For our newly presented method, the statistical results on the testing sets (the AFDB, MITDB and NSRDB databases) are summarized in Table 2. Specifically, for the AFDB set, the calculated Se, Sp, PPV and ACC are 96.89%, 98.25%, 97.62% and 97.67%, respectively. For the AFDB set with records “00735” and “03665” omitted, the parameters are 96.82%, 98.06%, 97.61% and 97.50%, respectively. For the AFDB set with records “04936” and “05091” omitted, the parameters are 97.83%, 98.19%, 97.56% and 98.04%, respectively. For the MITDB data set, the parameters are 97.33%, 90.78%, 55.29% and 91.46%. It is important to recognize that, for the MITDB data set, the PPV (55.29%) is low, which indicates that many of the beats detected as AF are false positives. For the combined databases, the parameters are 96.89%, 98.27%, 92.30% and 98.03%, respectively for the AFDB+NSRDB set, and 97.53%, 98.26%, 90.09% and 98.16% for the AFDB+NSRDB set. For the NSRDB set, the only calculated parameter, Sp, is 98.28%, as there is no manual AF annotation in the NSRDB database.

Table 2 Statistical results of this method for three testing databases (at the threshold of 0.353)
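The low PPV on MITDB is what one would expect when AF beats make up only a small share of the data. As a rough consistency check, the definitions of Se, Sp and PPV can be combined to estimate that share; the symbol $\pi$ (the fraction of beats annotated as AF) is introduced here for illustration and is not a quantity reported in the paper.

```latex
\mathrm{PPV} = \frac{Se\,\pi}{Se\,\pi + (1 - Sp)(1 - \pi)}
\;\Longrightarrow\;
\frac{\pi}{1 - \pi} = \frac{\mathrm{PPV}}{1 - \mathrm{PPV}} \cdot \frac{1 - Sp}{Se}
= \frac{0.5529}{0.4471} \cdot \frac{0.0922}{0.9733} \approx 0.117,
\qquad \pi \approx 0.105 .
```

With $\pi \approx 0.105$, the implied accuracy $Se\,\pi + Sp\,(1 - \pi) \approx 0.915$ agrees with the reported ACC of 91.46%, so the four MITDB figures are mutually consistent with AF accounting for roughly a tenth of the evaluated beats.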

The existing algorithms for the AF detection are also investigated using the same databases (i.e., the same records and the same reference annotations), and using the same evaluation metrics. Table 3 shows a collection of latest published results from prior literature. The list is not intended to be exhaustive, and more complete investigations are available in [19, 28].

Table 3 Overview of published results of the existing methods using the same databases

We first introduce the methods based on the variability of RR intervals (RRI) [10, 11, 13, 21–23, 25].

Kikillus, et al [10] conducted a Markov modeling (MM) technique to identify AF. The calculated test results of Se and Sp were 94.1% (+2.79%, values in parentheses are the differences between our results and the reported results, hereinafter the same) and 93.4% (+4.87%) for the AFDB+NSRDB database.

The method introduced by Dash, et al [11] relies on the combination of the root mean square of successive differences (RMSSD), the turning points ratio (TPR) and SE. AF was considered present if given threshold-based conditions were satisfied. For the AFDB database, the calculated Se and Sp values were 94.4% (+3.43%) and 95.1% (+3.09%), respectively; for the MITDB set, they were 90.2% (+7.13%) and 91.2% (-0.42%), respectively. On the MITDB set, the Sp is slightly better than that of our method; however, the rate of AF identification, Se, is substantially lower.

Tateno, et al [13] presented a novel technique using the Kolmogorov-Smirnov test. With the AFDB data set chosen for evaluation, the calculated Se, Sp and PPV values were 94.4% (+2.49%), 97.2% (+1.05%) and 96.0% (+1.62%), respectively. The corresponding values re-investigated by other researchers were 91.20% (+5.69%), 96.08% (+2.17%) and 90.32% (+7.30%) [19], respectively.

Lian, et al [21] developed an AF detector based on the map of RR intervals versus change of RR intervals (RdR). For the AFDB and MITDB sets, the Se and Sp values were 95.8% (+1.09%) and 96.4% (+1.85%), and 98.9% (-1.57%) and 78.8% (+11.98%), respectively. The calculated Sp for the NSRDB database was 90.0% (+8.28%). By comparison, when tested on the MITDB set, the Se is slightly higher than that of our new method; there is, however, a markedly lower rate of non-AF detection, Sp.

An attractive approach to AF detection was initiated by Huang, et al [23]. It utilized a histogram of Δ RR n and standard deviation (SD) analysis. The calculated Se and Sp were 96.1% (+0.79%) and 98.1% (+0.15%), when the AFDB set was assessed. The calculated Sp was 97.9% (+0.38%) for the NSRDB database. It provided the closest performance to that of this newly proposed method, as can be seen in Tables 2 and 3.

Lee, et al [25] investigated three statistical techniques to determine the presence of AF; the best performance was achieved when sample entropy was employed. Using the AFDB+NSRDB data set, the calculated Se, Sp and ACC were 97.26% (+0.27%), 95.91% (+2.35%) and 96.14% (+2.02%), respectively.

Parvaresh, et al [20] evaluated three classifiers for AF screening by using autoregressive modeling (AR). Within this method, AR coefficients of 15-second segments of ECGs were taken as features. When tested with the AFDB set, the best performance occurred at the so-called LDA classifier: the calculated Se, Sp and PPV were 96.14% (+0.68%), 93.20% (+4.86%) and 90.09% (+7.52%), respectively.

Slocum, et al [12] published a method based on AA. Frequency spectrum analysis (FSA) of the remainder obtained by canceling the ventricular activity from the surface ECG was applied to differentiate rhythms. Because there is no constant phase relationship between the atrial and ventricular activities, the performance of this type of technique is not high. The AF detection method based only on AA showed inferior performance, as can be clearly seen from Table 3: evaluated on the AFDB set, the calculated Se, Sp and PPV values were 62.80% (+34.02%), 77.46% (+20.60%) and 64.90% (+32.71%), respectively [19].

Other methods that take advantage of multiple characteristics (i.e., RRI/AA and FSA/PWA) have also been developed [14–16], and were re-investigated in [19]. The data consistently indicate that these techniques have relatively lower performance, as can be seen in Table 3. The accuracy of techniques based on multiple features (or on the AA feature alone) has been limited by the practical challenges of reliably determining AA (and/or PWA). A common rule of thumb is that, as a whole, techniques based solely on RRI are likely to yield better results than those that rely on inferences from multiple features (or from the AA feature alone) of the surface ECG, since the R-wave peak is the most prominent characteristic of an ECG recording and the least susceptible to various kinds of noise.

It is often difficult to determine a perfect discrimination threshold for AF episode classification, and it is therefore worth analyzing whether varying the discrimination threshold significantly influences the performance of our method. For the testing databases, the performance of our method is investigated at various threshold settings. Discrimination threshold values from 0.20 to 0.50 in increments of 0.001 are tested for each of the data sets. Plots of the corresponding results can be seen in Figure 7, which clearly shows that our new method performs best with the threshold in the range from 0.30 to 0.36. Any threshold in this range is sufficient for investigating the performance of the method. Therefore, the calculated best performing threshold value (i.e., 0.353), derived from the ROC curve of the training set (i.e., the LTAFDB database), is appropriate for performance evaluation.

Figure 7

Distributions of Se, Sp, PPV and ACC with respect to various threshold settings when our method was applied to different testing sets. (a) Results of the AFDB set; (b) Results of the AFDB database with records “00735” and “03665” omitted; (c) Results of the AFDB database with records “04936” and “05091” omitted; (d) Results of the MITDB database; (e) Results of the NSRDB database; (f) Results of the AFDB+NSRDB database and (g) Results of the AFDB+NSRDB database.

In summary, the results of this study demonstrate that the combination of nonlinear/linear integer filters, symbolic dynamics and SE yields a robust detector. This new detector exhibited a higher detection rate than previous methods. This could possibly lead to incorporation into computerized ECG interpretation systems to improve the reliability of arrhythmia classification.

A special issue on computational complexity

The computation time of our method is also investigated. Our technique is implemented in the C++ programming language. Table 4 displays the computation time taken by our method on the available databases; details of the desktop test environment are given in the footnote of Table 4. The computation time is far less than the total duration of the records in each database, so the time consumption is negligible: typically about 0.116 seconds per 24 hours of data processed. Larburu, et al [19, 28] investigated a variety of existing methods on a computer server and concluded that the method proposed by Cerutti, et al [29] had the lowest computation time, approximately 0.36 seconds per 1 hour of data processed (on the AFDB set, the relevant Se, Sp and PPV were 96.10% (+0.72%), 81.55% (+16.51%) and 75.76% (+21.85%) [19]; see [19, 28, 29] for details). This implies that our method is especially suitable for real-time, long-term ECG monitoring. In addition, as big data comes of age, our newly developed method shows promise of practical use.

Table 4 The computation time of the processing of this method

Benefits and limitations

In this study, we use a discrimination threshold of 0.353 for AF classification. Of note, Figure 7 shows that increasing the threshold improves Sp but decreases Se, whereas decreasing it improves Se but decreases Sp. A compromise is thus necessary, and the threshold can readily be tuned to the needs of a concrete application. Even so, comparison with the latest detection methods on each database confirms that a discrimination threshold of 0.353 is adequate for this new method to perform better under various situations.

It is commonly accepted that the assessment of AF involves many time-consuming routines, owing to the statistical analysis of irregular/chaotic arrhythmia characteristics. Substantial benefits can be achieved in this AF detector through properly designed recursive algorithms, together with the novel predefined set of $-\frac{1}{\log_2 N} p_i \log_2 p_i$ values used to calculate $\mathcal{H}''(A)$, which may markedly reduce computational complexity.

The bin size $N$ was set to 127 in this study because a small number of words inside a small bin might reduce the accuracy of estimating the probability distribution of the words $wv_n$ [30]. However, sporadic AF episodes of relatively short duration (e.g., ten seconds) might then go undetected (false negatives), which is a potential limitation. This is an inherent technical difficulty that needs to be overcome in the future, although AF episodes of very short duration are rare in practice. Nevertheless, it is essential to remember this limitation.

Once again, as stated in the previous section, a small PPV calculated from the MITDB database implies that this newly proposed approach needs to be further refined towards a universally applicable method.


As currently available techniques are only modestly effective in AF episode screening, we developed a fully automated detection method that aims to fulfill two essential needs: (i) earlier real-time identification of AF, and (ii) higher reliability of detection. Therefore, combined with a method available elsewhere for real-time R-wave detection [31], this newly proposed method could be used in intensive care units. The online realization is easy to implement and computationally attractive, as it consists of only a few basic operations: integer addition/subtraction, integer left/right shifting, integer comparison, one multiplication and one division within the scaling by $k/127000000$, and 1 floating-point comparison between $\mathcal{H}''(A)$ and the threshold for the rhythm classification. Several state-of-the-art methods have been briefly reviewed, along with their methodologies and detection accuracy. Our new method is evaluated and compared with these existing methods using the LTAFDB, AFDB, NSRDB, and MITDB databases under various situations. We have also presented explicit tables for quantitative assessment of the performance and computation times. Collectively, our results suggest that this AF detector outperforms the existing methods with respect to the performance metrics Se, Sp, PPV and ACC. It is also worth emphasizing that a few reference annotations of these data sets are themselves imprecise, as in the AFDB set. Therefore, extensive sets of exact reference annotations are still needed for investigation.


Please visit the “;mkt=zh-CN#cid=498A9A3CCEE3B366&id=498A9A3CCEE3B366%21132” for the compiled C++ dynamic link library files or contact the author for them.


  1. Mathew S, Patel J, Joseph S: Atrial fibrillation: mechanistic insights and treatment options. Eur J Intern Med 2009, 20(7):672–681. 10.1016/j.ejim.2009.07.011

  2. Fuster V, Rydén L, Asinger R, Cannom D, Crijns H, Frye R, Halperin J, Neal Kay G, Klein W, Lévy S, Mcnamara R, Prystowsky E, Samuel Wann L, George Wyse D, Gibbons R, Antman E, Alpert J, Faxon D, Gregoratos G, Hiratzka L, Jacobs A, Russell R, Smith S, Alonso-Garcia A, Blomström-Lundqvist C, De Backer G, Flather M, Hradec J, Oto A, Parkhomenko A, et al.: ACC/AHA/ESC guidelines for the management of patients with atrial fibrillation: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the European Society of Cardiology Committee for Practice Guidelines and Policy Conferences (Committee to Develop Guidelines for the Management of Patients With Atrial fibrillation) developed in collaboration with the North American Society of Pacing and Electrophysiology. J Am Coll Cardiol 2001, 38(4):1231–1265. 10.1016/S0735-1097(01)01587-X

  3. Petrutiu S, Ng J, Nijm G, Al-Angari H, Swiryn S, Sahakian A: Atrial fibrillation and waveform characterization. IEEE Eng Med Biol Mag 2006, 25(6):24–30.

  4. Hylek E, Go A, Chang Y, Jensvold N, Henault L, Selby J, Singer D: Effect of intensity of oral anticoagulation on stroke severity and mortality in atrial fibrillation. N Engl J Med 2003, 349(11):1019–1026. 10.1056/NEJMoa022913

  5. Miyasaka Y, Barnes M, Gersh B, Cha S, Bailey K, Abhayaratna W, Seward J, Tsang T: Secular trends in incidence of atrial fibrillation in Olmsted County, Minnesota, 1980 to 2000, and implications on the projections for future prevalence. Circulation 2006, 114(2):119–125. 10.1161/CIRCULATIONAHA.105.595140

  6. Gami A, Pressman G, Caples S, Kanagala R, Gard J, Davison D, Malouf J, Ammash N, Friedman P, Somers V: Association of atrial fibrillation and obstructive sleep apnea. Circulation 2004, 110(4):364–367. 10.1161/01.CIR.0000136587.68725.8E

  7. Mukamal K, Tolstrup J, Friberg J, Jensen G, Grønbæk M: Alcohol consumption and risk of atrial fibrillation in men and women the Copenhagen City Heart Study. Circulation 2005, 112(12):1736–1742. 10.1161/CIRCULATIONAHA.105.547844

  8. Moody G, Mark R: A new method for detecting atrial fibrillation using RR intervals. In Computers in Cardiology 1983, Volume 10. Aachen: IEEE Computer Society Press; 1983:227–230.

  9. Artis S, Mark R, Moody G: Detection of atrial fibrillation using artificial neural networks. In Computers in Cardiology 1991. Venice: IEEE Computer Society Press; 1991:173–176.

  10. Kikillus N, Hammer G, Lentz N, Stockwald F, Bolz A: Three different algorithms for identifying patients suffering from atrial fibrillation during atrial fibrillation free phases of the ECG. In Computers in Cardiology 2007. Durham: IEEE; 2007:801–804.

  11. Dash S, Chon K, Lu S, Raeder E: Automatic real time detection of atrial fibrillation. Ann Biomed Eng 2009, 37(9):1701–1709. 10.1007/s10439-009-9740-z

  12. Slocum J, Sahakian A, Swiryn S: Diagnosis of atrial fibrillation from surface electrocardiograms based on computer-detected atrial activity. J Electrocardiol 1992, 25:1–8.

  13. Tateno K, Glass L: Automatic detection of atrial fibrillation using the coefficient of variation and density histograms of RR and ΔRR intervals. Med Biol Eng Comput 2001, 39(6):664–671. 10.1007/BF02345439

  14. Schmidt R, Harris M, Novac D, Perkhun M: Atrial fibrillation detection. [WO Patent 2,008,007,236. Jan 18, 2008]

  15. Couceiro R, Carvalho P, Henriques J, Antunes M, Harris M, Habetha J: Detection of atrial fibrillation using model-based ECG analysis. In 19th International Conference on Pattern Recognition, 2008. Tampa: IEEE; 2008:1–5.

  16. Babaeizadeh S, Gregg R, Helfenbein E, Lindauer J, Zhou S: Improvements in atrial fibrillation detection for real-time monitoring. J Electrocardiol 2009, 42(6):522–526. 10.1016/j.jelectrocard.2009.06.006

  17. Park J, Lee S, Jeon M: Atrial fibrillation detection by heart rate variability in Poincare plot. Biomed Eng Online 2009, 8(38):1–12.

  18. Yaghouby F, Ayatollahi A, Bahramali R, Yaghouby M, Alavi A: Towards automatic detection of atrial fibrillation: A hybrid computational approach. Comput Biol Med 2010, 40(11):919–930.

  19. Larburu N, Lopetegi T, Romero I: Comparative study of algorithms for Atrial Fibrillation detection. In Computing in Cardiology 2011. Hangzhou: IEEE; 2011:265–268.

  20. Parvaresh S, Ayatollahi A: Automatic atrial fibrillation detection using autoregressive modeling. In 2011 International Conference on Biomedical Engineering and Technology. Kuala Lumpur: APCBEES; 2011:4–5.

  21. Lian J, Wang L, Muessig D: A simple method to detect atrial fibrillation using RR intervals. Am J Cardiol 2011, 107(10):1494–1497. 10.1016/j.amjcard.2011.01.028

  22. Lake D, Moorman J: Accurate estimation of entropy in very short physiological time series: the problem of atrial fibrillation detection in implanted ventricular devices. Am J Physiol Heart Circ Physiol 2011, 300:H319–H325. 10.1152/ajpheart.00561.2010

  23. Huang C, Ye S, Chen H, Li D, He F, Tu Y: A novel method for detection of the transition between atrial fibrillation and sinus rhythm. IEEE Trans Biomed Eng 2011, 58(4):1113–1119.

  24. Yaghouby F, Ayatollahi A, Bahramali R, Yaghouby M: Robust genetic programming-based detection of atrial fibrillation using RR intervals. Expert Syst 2012, 29(2):183–199.

  25. Lee J, Reyes B, McManus D, Mathias O, Chon K: Atrial fibrillation detection using an iPhone 4S. IEEE Trans Biomed Eng 2013, 60:203–206.

  26. Shannon C: A mathematical theory of communication. Bell Syst Tech J 1948, 27(3):379–423. 10.1002/j.1538-7305.1948.tb01338.x

  27. The Physionet ECG database [Online]. Available:

  28. Larburu N: Comparative study of algorithms for atrial fibrillation detection. Master’s thesis, Public University of Navarra, Pamplona, Spain, 2011, [Online]. Available:

  29. Cerutti S, Mainardi L, Porta A, Bianchi A: Analysis of the dynamics of RR interval series for the detection of atrial fibrillation episodes. In Computers in Cardiology 1997. Lund: IEEE Computer Society Press; 1997:77–80.

  30. Voss A, Kurths J, Kleiner H, Witt A, Wessel N, Saparin P, Osterziel K, Schurath R, Dietz R: The application of methods of non-linear dynamics for the improved and predictive recognition of patients threatened by sudden cardiac death. Cardiovasc Res 1996, 31(3):419–433. 10.1016/0008-6363(96)00008-9

  31. Pan J, Tompkins W: A real-time QRS detection algorithm. IEEE Trans Biomed Eng 1985, 32(3):230–236.


This work was supported in part by the National Basic Research Program 973 (2010CB732606), the Guangdong Innovation Research Team Fund for Low-Cost Healthcare Technologies in China, the External Cooperation Program of the Chinese Academy of Sciences (GJHZ1212), the Key Lab for Health Informatics of Chinese Academy of Sciences, the Enhancing Program of Key Laboratories of Shenzhen City (ZDSY20120617113021359), and the Supportive Program of “Peacock Program” of Shenzhen City for GIRTF-LCHT Team.

Author information



Corresponding authors

Correspondence to Xiaolin Zhou or Yuanting Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

XZ developed the algorithm and drafted the manuscript. HD collected the clinical ECG data and revised the manuscript. BU, EP and YZ provided suggestions and comments as well as much help in revising the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Zhou, X., Ding, H., Ung, B. et al. Automatic online detection of atrial fibrillation based on symbolic dynamics and Shannon entropy. BioMed Eng OnLine 13, 18 (2014).


  • ECG
  • RR interval
  • Atrial fibrillation
  • Nonlinear filter
  • Integer filter
  • Symbolic dynamics
  • Shannon entropy