
# Automatic online detection of atrial fibrillation based on symbolic dynamics and Shannon entropy

*BioMedical Engineering OnLine*
**volume 13**, Article number: 18 (2014)

## Abstract

### Background

Atrial fibrillation (AF) is the most common and debilitating arrhythmia worldwide, with a major impact on morbidity and mortality. The detection of AF is therefore crucial in preventing both acute and chronic cardiac rhythm disorders.

### Objective

Our objective is to devise a method for the real-time, automated detection of AF episodes in electrocardiograms (ECGs). The method uses RR intervals and involves only a few basic operations: nonlinear/linear integer filtering, symbolic dynamics and the calculation of Shannon entropy. Novel recursive algorithms allow the method to be run online.

### Results

Four publicly accessible sets of clinical data (Long-Term AF, MIT-BIH AF, MIT-BIH Arrhythmia, and MIT-BIH Normal Sinus Rhythm Databases) were selected for investigation. The first database was used as the training set; according to its receiver operating characteristic (ROC) curve, the best performance of the method was achieved at a discrimination threshold of 0.353, where the sensitivity (*Se*), specificity (*Sp*), positive predictive value (*PPV*) and overall accuracy (*ACC*) were 96.72%, 95.07%, 96.61% and 96.05%, respectively. The other three databases were used as testing sets. With the same threshold (i.e., 0.353), these parameters were 96.89%, 98.25%, 97.62% and 97.67%, respectively, for the second database and 97.33%, 90.78%, 55.29% and 91.46%, respectively, for the third; for the fourth database, the *Sp* was 98.28%. Existing methods were also evaluated for comparison.

### Conclusions

Overall, the test results indicate that the newly developed approach outperforms existing methods on these databases under the various experimental conditions assessed, and suggest that the technique could be of practical use to clinicians in the future.

## Background

Atrial fibrillation (AF) is recognized as the most common clinically encountered arrhythmia in adults [1], affecting approximately 0.4% of the general population. The prevalence of this tachyarrhythmia increases with age, from less than 1% in persons under 60 years of age to more than 6% in those over 80 years of age [2, 3]. AF is associated with a high risk of stroke, heart disease (e.g., congestive cardiac failure), and cardiovascular mortality [1, 4]. AF is also closely related to obesity [5], obstructive sleep apnea [6], and long-term alcoholism [7], which reciprocally carry cumulative risks of promoting the development of AF [1]. The early identification of AF therefore appears to be crucial for patients with cardiovascular disease, especially stroke patients, for whom secondary stroke prevention is of primary importance.

Issues relating to the clinical significance of rhythm classification, and the impetus to improve the accuracy of atrial tachyarrhythmia estimation, have motivated the development of innovative computerized AF detectors. Since the early 1980s, a series of sophisticated methods have been investigated to address the challenges of AF detection [8–25]. Most of them are based on two main characteristics of this arrhythmia in the surface electrocardiogram (ECG): (*i*) irregularity of the RR (R-wave peak to R-wave peak) intervals (i.e., chaotic behavior of heart rate variability), and (*ii*) the absence of P waves (PWA) or their replacement by F waves (i.e., very low-amplitude waveforms of odd morphologies) resulting from abnormal, rapid atrial activity (AA). Although P waves or cardiac AA can provide an alternative clue for detecting AF, the presence or absence of P waves is not readily identifiable, because various types of high-intensity noise often coexist in ECGs; this can lead to low predictive accuracy. In addition, the relationship between AA in the surface ECG and the diverse mechanisms of AF has not yet been well delineated [3]. Because of these challenges in detecting AA in ECG measurements, detection techniques based on inferences from RR intervals are preferred, as they produce relatively robust outcomes [21–23, 25].

In this study, a reliable method for the fully automated detection of AF episodes from surface ECGs is proposed. The method consists of a three-pass procedure. In the first pass, the RR interval sequence is pre-processed with nonlinear and integer filters to generate low- and high-scale reference sequences. In the second pass, the information in the RR interval sequence is compressed into a symbolic sequence by symbolic dynamics, using the reference sequences obtained in the first pass. In the third pass, the Shannon entropy of the symbolic sequence is calculated to determine whether AF is present at the current cardiac beat. The following sections also discuss key aspects of online processing, namely the recursive realization that enables beat-by-beat classification. Ultimately, we quantitatively compare the performance of our newly developed technique with that of current state-of-the-art techniques on four widely used clinical databases under various experimental conditions.

## Methodology

### Pre-processing of the $RR_n$ series

#### A. Median filter

A median filter is implemented by windowing the acquired data, ranking the samples in the window, and outputting the median of the sorted samples. For an RR interval ($RR_n$) sequence $x_n$, as shown in Figure 1(a), the output $y_n$ of this nonlinear filter is given by

$$y_n = \operatorname{median}\{x_{n-2w}, x_{n-2w+1}, \ldots, x_n\}, \tag{1}$$

where the window has a fixed width of $2w+1$. From a signal-processing perspective, the time delay of the median filter is $w$ samples. A window size of 17 is used herein, giving a delay of 8 samples. The median filter brings two advantages: (*i*) it suppresses unwanted outliers, which are mostly caused by erroneously detected (or missed) R-wave peaks; and (*ii*) it preserves sharp edges (i.e., the onsets and terminations of AF episodes) without extensively blurring the context.
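For reference, the median filter of Eq. (1) can also be evaluated directly, without recursion. The C++ sketch below assumes integer RR intervals (e.g., in milliseconds) and the causal window $x_{n-2w},\ldots,x_n$, whose median is aligned with $x_{n-w}$; the function name `median_filter_naive` is hypothetical and not taken from the authors' implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch (hypothetical helper name): direct causal median filter of width 2w+1.
// Each output is the median of x[n-2w..n] for n = 2w, 2w+1, ...; it corresponds
// to the input sample x[n-w], i.e. the filter delays the sequence by w samples.
// Samples before the window is full produce no output.
std::vector<int> median_filter_naive(const std::vector<int>& x, std::size_t w) {
    const std::size_t width = 2 * w + 1;
    std::vector<int> y;
    if (x.size() < width) return y;
    std::vector<int> window(width);
    for (std::size_t n = width - 1; n < x.size(); ++n) {
        // Copy the current window x[n-2w..n] ...
        const auto first = x.begin() + static_cast<std::ptrdiff_t>(n + 1 - width);
        std::copy(first, first + static_cast<std::ptrdiff_t>(width), window.begin());
        // ... and partially sort it so that window[w] holds the median.
        std::nth_element(window.begin(),
                         window.begin() + static_cast<std::ptrdiff_t>(w),
                         window.end());
        y.push_back(window[w]);
    }
    assert(y.size() == x.size() - width + 1);
    return y;
}
```

With `w = 8` this corresponds to the 17-sample window used here. Because every output copies and partially re-sorts the whole window, this direct form is the baseline that a recursive update of a sorted window improves on.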

#### B. Integer filter for low scale reference

Subsequently, the median-filter output $y_n$ is smoothed with a low-pass filter of the form

where the gain is $\mathit{Gain1}=16=2^{4}$ and the intrinsic delay of $H_l(z)$ is 7.5 samples. This low-pass filter smooths $y_n$ after median filtering. It also suppresses fluctuations around the current sample that may be caused by respiratory sinus arrhythmia (RSA). Let $\mathit{xl}_n$ be the output of this filter, as illustrated in Figure 1(b).

#### C. Integer filter for high scale reference

Another low-pass filter, $H_h(z)$, is then applied to the output $\mathit{xl}_n$ of the previous low-pass filter $H_l(z)$:

where the gain is $\mathit{Gain2}=2048=2^{11}$ and the intrinsic delay of $H_h(z)$ is 47 samples. This filter generates a larger-scale reference RR sequence, which is used to define the symbolic series, as explained in the following subsection. The resulting output, denoted by $\mathit{xh}_n$, is shown in Figure 1(c).

Consequently, $x_n$ and $\mathit{xl}_n$ lead $\mathit{xh}_n$ by 62.5 and 47 samples, respectively (i.e., their time delays relative to $\mathit{xh}_n$ are −62.5 and −47 samples). To synchronize the filtered data, let $x_n'$ and $\mathit{xl}_n'$ denote the delay-corrected versions of $x_n$ and $\mathit{xl}_n$, respectively. The difference $\Delta RR_n = x_n' - \mathit{xl}_n'$, shown in Figure 1(d), is then the deviation of each aligned RR interval from its low-scale reference.
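The delay figures above can be checked by adding the delays of the cascaded stages, using $w=8$ for the 17-sample median filter and the stated delays of $H_l(z)$ and $H_h(z)$. Here $\tau(a \rightarrow b)$ is notation introduced only for this check and denotes how many samples $b$ lags behind $a$; the same bookkeeping reproduces the 126.5-sample delay quoted for $\mathit{sh}_n''$ in the recursive implementation:

$$
\begin{aligned}
\tau(x_n \rightarrow \mathit{xh}_n) &= w + \tau(H_l) + \tau(H_h) = 8 + 7.5 + 47 = 62.5 \text{ samples},\\
\tau(\mathit{xl}_n \rightarrow \mathit{xh}_n) &= \tau(H_h) = 47 \text{ samples},\\
\tau(x_n \rightarrow \mathit{sh}_n'') &= \tau(x_n \rightarrow \mathit{xh}_n) + 64 = 126.5 \text{ samples}.
\end{aligned}
$$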

### Symbolic dynamics of $\Delta RR_n$

Symbolic dynamics is employed to describe the dynamic behavior of $\Delta RR_n$ relative to $\mathit{xh}_n$. It encodes the variations of $RR_n$ as a series drawn from a small set of symbols, each representing an instantaneous state. The thresholds are defined as $\mathit{thre1}=\mathit{xh}_n\times 2^{-4}$ (implemented as $\mathit{xh}_n \gg 4$), $\mathit{thre2}=\mathit{xh}_n\times 2^{-3}$ (implemented as $\mathit{xh}_n \gg 3$), $\mathit{thre3}=\mathit{thre1}+\mathit{thre2}$, $\mathit{thre4}=\mathit{xh}_n\times 2^{-2}$ (implemented as $\mathit{xh}_n \gg 2$) and $\mathit{thre5}=\mathit{thre4}+\mathit{thre1}$. The mapping function of the symbol transform is then defined as,

The raw RR sequence $x_n$ is thus quantized into a symbol sequence $\mathit{sy}_n$ using the predefined “alphabet” of Eq. (4) (i.e., the symbols 0 to 9). Comparing Figure 1(a)-(d) with the distribution of symbols in Figure 1(e) confirms that Eq. (4) maps most normal beats to the symbol 0, and possibly abnormal beats (arrhythmias, e.g., AF) to non-zero symbols.
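Because the five thresholds are fixed fractions of $\mathit{xh}_n$ (1/16, 1/8, 3/16, 1/4 and 5/16), they can be computed with shifts and additions only. The C++ sketch below shows this for a non-negative integer $\mathit{xh}_n$; it covers only the thresholds, not the mapping of Eq. (4) from $\Delta RR_n$ to the symbols, and the function name `symbol_thresholds` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch (hypothetical helper name): thresholds derived from the high-scale
// reference xh_n using shifts only.
//   thre1 = xh/16, thre2 = xh/8, thre3 = thre1 + thre2 (about 3xh/16),
//   thre4 = xh/4,  thre5 = thre4 + thre1 (about 5xh/16).
// Assumes xh_n is a non-negative integer, e.g. an RR interval in ms.
std::array<std::int32_t, 5> symbol_thresholds(std::int32_t xh) {
    assert(xh >= 0);  // right shift equals floor division only for xh >= 0
    const std::int32_t thre1 = xh >> 4;
    const std::int32_t thre2 = xh >> 3;
    const std::int32_t thre3 = thre1 + thre2;
    const std::int32_t thre4 = xh >> 2;
    const std::int32_t thre5 = thre4 + thre1;
    return {thre1, thre2, thre3, thre4, thre5};
}
```

Note that each shift truncates, so for example `thre3` for `xh = 810` is 50 + 101 = 151, whereas 3 × 810 / 16 = 151.875; integer truncation of this kind is the price of avoiding multiplications.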

To facilitate the analysis of $\mathit{sy}_n$, the widely used 3-symbol template (i.e., a word consisting of 3 successive symbols) is applied to examine its entropic properties. The word value is calculated with the following operator:

$$\mathit{wv}_n = \mathit{sy}_{n-2}\times 2^{8} + \mathit{sy}_{n-1}\times 2^{4} + \mathit{sy}_n, \tag{5}$$

where $\mathit{sy}_{n-2}\times 2^{8}$ and $\mathit{sy}_{n-1}\times 2^{4}$ are implemented as $\mathit{sy}_{n-2} \ll 8$ and $\mathit{sy}_{n-1} \ll 4$. Because each symbol lies between 0 and 9, $0\le \mathit{wv}_n\le 9\times 2^{8}+9\times 2^{4}+9=2457$. Figure 2 briefly illustrates how the template transforms the symbol sequence into the corresponding words, while Figure 1(f) depicts the word sequence of the $\mathit{sy}_n$ shown in Figure 1(e).
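The word computation amounts to packing three symbols into separate 4-bit fields, $\mathit{wv}_n = (\mathit{sy}_{n-2} \ll 8) + (\mathit{sy}_{n-1} \ll 4) + \mathit{sy}_n$, consistent with the shifts and the bound of 2457 given above. A minimal C++ sketch, assuming the symbols are stored as small unsigned integers from 0 to 9 (the helper name `word_value` is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Sketch (hypothetical helper name): packs three successive symbols (each 0..9)
// into one word value, wv_n = sy_{n-2} * 2^8 + sy_{n-1} * 2^4 + sy_n.
// Each symbol occupies its own 4-bit field (one hexadecimal digit), so
// distinct 3-symbol patterns always map to distinct word values.
std::uint16_t word_value(std::uint8_t sy_n2, std::uint8_t sy_n1, std::uint8_t sy_n) {
    assert(sy_n2 <= 9 && sy_n1 <= 9 && sy_n <= 9);
    return static_cast<std::uint16_t>((sy_n2 << 8) + (sy_n1 << 4) + sy_n);
}
```

For example, the symbols (1, 2, 3) give the hexadecimal word 0x123 = 291, and (9, 9, 9) gives the maximum, 2457.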

### Shannon entropy

Shannon entropy (SE) is a statistical tool that quantifies the information content of a time series. For completeness, we define the discrete probability space of a dynamic system as $\mathbf{A}=(A\,|\,P)$, where the total number of elements in $\mathbf{A}$ is $N$. The characteristic (distinct) elements are $A=\{a_1,\cdots,a_k\}$, with the associated probability set $P=\{p_1,\cdots,p_k\}$ ($1\le k\le N$). Each element $a_i$ has probability $p_i=N_i/N$ ($0<p_i\le 1$, $\sum_{i=1}^{k}p_i=1$), where $N_i$ is the number of occurrences of $a_i$ in $\mathbf{A}$. The SE of $\mathbf{A}$ is then defined as [26]

$$\mathcal{H}(\mathbf{A}) = -\sum_{i=1}^{k} p_i\log_2 p_i. \tag{6}$$

By Jensen’s inequality, $\mathcal{H}(\mathbf{A})\le \log_2 k\le \log_2 N$, with equality throughout if and only if $k=N$ and $p_i=1/N$ for all $i$. The entropy can therefore be normalized to the range [0, 1] as

$$\mathcal{H}'(\mathbf{A}) = -\frac{1}{\log_2 N}\sum_{i=1}^{k} p_i\log_2 p_i, \tag{7}$$
where $\log_2 N$ is set to 1 when $N=1$. Eq. (7) is also referred to as the normalized entropy, since the entropy is divided by its maximum value $\log_2 N$. A coarser version of $\mathcal{H}'(\mathbf{A})$ is defined as

$$\mathcal{H}''(\mathbf{A}) = \frac{k}{N}\,\mathcal{H}'(\mathbf{A}). \tag{8}$$
Here, the dynamic $\mathbf{A}$ consists of the 127 most recent consecutive words, from $\mathit{wv}_{n-126}$ to $\mathit{wv}_n$ (i.e., the bin size is $N=127$). By determining the characteristic set $A$ and the associated probability set $P$ from these elements, we can calculate the SE $\mathcal{H}''(\mathbf{A})$. AF is then detected by labeling the rhythm as AF if $\mathcal{H}''(\mathbf{A})$ exceeds a discrimination threshold and as non-AF otherwise, as shown in Figure 1(g). The training database is used to determine the optimal discrimination threshold by testing threshold settings within the range [0.0, 1.0]; the best-performing threshold thus obtained, 0.353, is used to assess performance on the different testing databases.
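As a reference for the recursive version described later, $\mathcal{H}(\mathbf{A})$, $\mathcal{H}'(\mathbf{A})$ and $\mathcal{H}''(\mathbf{A})$ can be evaluated directly for one window of words. The C++ sketch below assumes that the coarse entropy is $\mathcal{H}''(\mathbf{A})=(k/N)\,\mathcal{H}'(\mathbf{A})$, which is consistent with the factor $k/127000000$ used in the recursive implementation but is not otherwise confirmed here; the names `window_entropy` and `is_af` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sketch (hypothetical names).
// Assumption: H''(A) = (k / N) * H'(A), with k the number of distinct words
// in the window and N the window length.
struct WindowEntropy {
    double h;         // H(A) in bits
    double h_norm;    // H'(A) = H(A) / log2 N
    double h_coarse;  // H''(A), under the assumption above
};

WindowEntropy window_entropy(const std::vector<int>& words) {
    const std::size_t n_total = words.size();
    assert(n_total > 0);
    // N_i: number of occurrences of each distinct word a_i.
    std::unordered_map<int, std::size_t> counts;
    for (int w : words) ++counts[w];

    double h = 0.0;
    for (const auto& entry : counts) {
        const double p = static_cast<double>(entry.second) / static_cast<double>(n_total);
        h -= p * std::log2(p);
    }
    // Convention from the text: log2 N is taken as 1 when N == 1.
    const double log2n = (n_total == 1) ? 1.0 : std::log2(static_cast<double>(n_total));
    const double h_norm = h / log2n;
    const double k = static_cast<double>(counts.size());
    return {h, h_norm, (k / static_cast<double>(n_total)) * h_norm};
}

// Labels the current beat as AF when H''(A) exceeds the discrimination threshold.
bool is_af(double h_coarse, double threshold = 0.353) { return h_coarse > threshold; }
```

A window of 127 identical words gives an entropy of 0 (regular rhythm), while 127 distinct words give $\mathcal{H}'=\mathcal{H}''=1$. Recomputing all counts for every beat, as done here, is what the recursive realization avoids.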

### Key issues of online processing

At first sight, Eqs. (1)–(5) and (8) make this AF detection technique computationally demanding. However, these demands can be met with carefully designed recursive algorithms that process the data beat by beat in real time.

#### A. Pseudo-recursive median filtering

The median filter in Eq. (1) can be implemented with a so-called pseudo-recursive method. For input $x_i$, let $S=\{s_r\uparrow : 1\le r\le 2w+1\}$ be the array, sorted in ascending order, of the successive elements $x_{i-2w-1}$ to $x_{i-1}$. The output $y_i$ is obtained by steps ➊-➎ below:

➊ A binary search is used to find the position $m$ of the sample $x_{i-2w-1}$ that will leave the window (i.e., $s_m=x_{i-2w-1}$), while $x_i$ simultaneously enters the window;

➋ A second binary search finds the position $t$ at which the input $x_i$ must be placed (i.e., $s_t<x_i\le s_{t+1}$);

➌ For positions $m$ to $t$, each $s_r$ is replaced with its neighbor $s_{r\pm 1}$, where $+$ and $-$ indicate that the element is taken from the right and from the left, respectively; this shifts the elements between $m$ and $t$ by one position;

➍ Replace the element $s_t$ with $x_i$;

➎ The median $s_{w+1}$ of the updated $S$ is the output $y_i$.

For the next input $x_{i+1}$, steps ➊ to ➎ are repeated to obtain the new output $s_{w+1}$ (i.e., $y_{i+1}$), as shown in Figure 3; each update uses the binary search twice. Compared with a traditional median filter, which re-sorts the window for every sample, the computational complexity per output sample is reduced from approximately $O(n^2)$ to $O(n)$, where $n=2w+1$ is the window width.
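The idea of keeping the window sorted and updating it with two binary searches can be sketched in C++ as follows. This is a simplified illustration, not the authors' code: instead of the single combined shift between positions $m$ and $t$ in step ➌, it erases the departing sample and then inserts the new one, which is still $O(n)$ per sample. The class name `SlidingMedian` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Sketch (hypothetical class name): sliding median of width 2w+1 that keeps a
// sorted copy S of the window, in the spirit of the pseudo-recursive method.
// Simplification: erase + insert (two O(n) shifts) instead of one combined shift.
class SlidingMedian {
public:
    explicit SlidingMedian(std::size_t w) : w_(w) {}

    // Feeds x_i. Returns true and writes y_i once 2w+1 samples have been seen.
    bool push(int x, int& y) {
        history_.push_back(x);
        if (history_.size() > 2 * w_ + 1) {
            // Step 1: binary-search the departing sample x_{i-2w-1} in S and remove it.
            const int leaving = history_.front();
            history_.pop_front();
            const auto pos = std::lower_bound(sorted_.begin(), sorted_.end(), leaving);
            assert(pos != sorted_.end() && *pos == leaving);
            sorted_.erase(pos);
        }
        // Steps 2 to 4: binary-search the position t with s_t < x_i <= s_{t+1}
        // and insert x_i there, keeping S sorted.
        sorted_.insert(std::lower_bound(sorted_.begin(), sorted_.end(), x), x);
        if (sorted_.size() < 2 * w_ + 1) return false;
        y = sorted_[w_];  // Step 5: median s_{w+1} (index w when counting from 0)
        return true;
    }

private:
    std::size_t w_;
    std::deque<int> history_;  // samples currently in the window, oldest first
    std::vector<int> sorted_;  // S: the same samples in ascending order
};
```

Usage: construct `SlidingMedian m(8)` for the 17-sample window and call `m.push(rr, y)` once per beat; the first 16 calls return `false` while the window fills.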

#### B. Recursive implementation of integer filters

The recursive implementation (also referred to as the “difference equation”) of the filter $H_l(z)$ can be expressed as

Eq. (9) requires 1 integer addition, 3 integer subtractions and 1 integer right shift ($\mathit{xl}_n \gg 4$, since $\mathit{Gain1}=2^{4}$) to offset the gain of $H_l(z)$.

The filter $H_h(z)$ can then be computed recursively using

where $\mathit{xh}_{n-1}\times 2$ is implemented as $\mathit{xh}_{n-1} \ll 1$. Eq. (10) requires 2 integer additions, 8 integer subtractions, 1 integer left shift and 1 integer right shift ($\mathit{xh}_n \gg 11$, since $\mathit{Gain2}=2^{11}$) to offset the gain of $H_h(z)$.
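The difference equations (9) and (10) are not shown in this text, so the following C++ sketch does not reproduce them. It illustrates the general idea under an explicit assumption: integer moving sums computed recursively ($s_n=s_{n-1}+u_n-u_{n-L}$) followed by a right shift that removes the gain. A 16-sample moving sum has the gain ($2^4$) and delay (7.5 samples) stated for $H_l(z)$, and a 32-sample moving sum cascaded with a 64-sample one has the gain ($32\times 64=2^{11}$) and delay ($15.5+31.5=47$ samples) stated for $H_h(z)$. The operation counts quoted above differ from those of this sketch, so the original filters are probably structured differently; all class names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Recursive moving sum of length L: s_n = s_{n-1} + u_n - u_{n-L}.
// Gain L, delay (L-1)/2 samples; one addition and one subtraction per sample.
class MovingSum {
public:
    explicit MovingSum(std::size_t length) : length_(length) { assert(length > 0); }
    std::int64_t push(std::int64_t u) {
        delay_line_.push_back(u);
        sum_ += u;
        if (delay_line_.size() > length_) {
            sum_ -= delay_line_.front();  // drop u_{n-L}
            delay_line_.pop_front();
        }
        return sum_;
    }
private:
    std::size_t length_;
    std::deque<std::int64_t> delay_line_;
    std::int64_t sum_ = 0;
};

// Hypothetical low-scale filter (NOT the authors' Eq. (9)):
// 16-sample moving sum, gain 16 = 2^4, delay 7.5 samples.
struct LowScaleFilter {
    MovingSum ms{16};
    std::int64_t push(std::int64_t y) { return ms.push(y) >> 4; }  // remove gain 2^4
};

// Hypothetical high-scale filter (NOT the authors' Eq. (10)):
// 32- and 64-sample moving sums in cascade, gain 2048 = 2^11, delay 47 samples.
struct HighScaleFilter {
    MovingSum ms32{32};
    MovingSum ms64{64};
    std::int64_t push(std::int64_t xl) { return ms64.push(ms32.push(xl)) >> 11; }
};
```

Once their delay lines are full, both filters return a constant input unchanged, which confirms that the right shifts exactly cancel the gains.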

#### C. Mapping the definition of $-\frac{1}{\log_2 N}p_i\log_2 p_i$

Because the dynamic $\mathbf{A}$ always contains $N$ words, the probability of each characteristic element can only take one of the values $p_i=i/N$ ($1\le i\le N$, i.e., $1/N\le p_i\le 1$). Accordingly, a probability array $\mathit{PiMap}$ can be pre-calculated:

$$\mathit{PiMap}[i] \stackrel{\lfloor\cdot\rfloor}{=} -\frac{\mathit{Cons}}{\log_2 N}\,\frac{i}{N}\log_2\frac{i}{N}, \qquad 1\le i\le N,$$

where $\mathit{Cons}=1000000$ is a constant that converts decimal floating-point values into integers, $N=127$, and $\stackrel{\lfloor\cdot\rfloor}{=}$ indicates taking the integer part of each $-\frac{\mathit{Cons}}{\log_2 N}p_i\log_2 p_i$.

Notably, for each cardiac cycle screened, this predefined $\mathit{PiMap}$ reduces the work to a table look-up: the integer $\mathit{PiMap}[i]$ is read at index $i$ instead of calculating $-\frac{1}{\log_2 N}p_i\log_2 p_i$ with arithmetic and logarithmic operations. This pre-computation significantly decreases calculation times.
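A minimal C++ sketch of the pre-computed table, following the definition of $\mathit{PiMap}$ with $\mathit{Cons}=1000000$ and $N=127$, and setting $\mathit{PiMap}[0]=0$ as stated for the recursive algorithm; the function name `build_pimap` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kN = 127;                  // bin size N
constexpr std::int64_t kCons = 1000000;  // scaling constant Cons

// Sketch (hypothetical helper name):
// PiMap[i] = floor(-(Cons / log2 N) * p_i * log2 p_i) with p_i = i / N,
// and PiMap[0] = 0 (a word with zero occurrences contributes nothing).
std::array<std::int64_t, kN + 1> build_pimap() {
    std::array<std::int64_t, kN + 1> table{};  // table[0] stays 0
    const double log2n = std::log2(static_cast<double>(kN));
    for (int i = 1; i <= kN; ++i) {
        const double p = static_cast<double>(i) / kN;
        const double term = -(static_cast<double>(kCons) / log2n) * p * std::log2(p);
        assert(term >= 0.0);  // -p log2 p is non-negative for 0 < p <= 1
        table[i] = static_cast<std::int64_t>(std::floor(term));
    }
    return table;
}
```

As a check: since $-\frac{1}{\log_2 N}\cdot\frac{1}{N}\log_2\frac{1}{N}=\frac{1}{N}$, the entry for $i=1$ is $\lfloor 10^6/127\rfloor=7874$, and the entry for $i=N$ is 0 because $\log_2 1=0$.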

#### D. Recursive implementation of $\mathcal{H}''(\mathbf{A})$

We define a counter array $\mathit{nu}_{\mathit{wv}_i}$ ($0\le \mathit{wv}_i\le 2457$) that stores the number of occurrences of the $i$th characteristic element $\mathit{wv}_i$ in the space $\mathbf{A}$. When the input $\mathit{wv}_n$ enters $\mathbf{A}$ (becoming its rightmost element), the leftmost element $\mathit{wv}_{n-127}$ simultaneously leaves $\mathbf{A}$; see Figure 2. The change in the SE $\mathcal{H}'(\mathbf{A})$ is therefore determined solely by $\mathit{nu}_{\mathit{wv}_n}$ and $\mathit{nu}_{\mathit{wv}_{n-127}}$, and $\mathcal{H}''(\mathbf{A})$ can be calculated recursively by the algorithm below,

where $\mathit{sh}_n'$ and $\mathit{sh}_n''$ represent $\mathcal{H}'(\mathbf{A})$ and $\mathcal{H}''(\mathbf{A})$, respectively; the mark $^{(*)}$ indicates that $\mathit{PiMap}[i]=0$ is fixed for the case $i=0$; and $127000000=N\times\mathit{Cons}=127\times 1000000$. For the next input $\mathit{wv}_{n+1}$, steps ➀-➂ are executed again to obtain $\mathit{sh}_{n+1}''$. From an online-processing perspective, $\mathit{sh}_n''$ is delayed by 64 and 126.5 samples with respect to $\mathit{xh}_n$ and $x_n$, respectively.

The overall architecture of the recursive realization is shown in Figure 4. With recursive algorithms, the AF detector consists only of basic operations such as integer addition/subtraction, integer comparison and integer shifting. In effect, computing $\mathit{sh}_n''$ and classifying the current beat $x_n$ require only 1 multiplication and 1 division (in the factor $\frac{k}{127000000}\cdot$), together with 1 floating-point comparison between $\mathit{sh}_n''$ and the threshold. The detector is therefore computationally efficient.
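The recursive update can be sketched in C++ as follows. This is one possible realization consistent with the description above, not the authors' algorithm: only the counts of the entering word $\mathit{wv}_n$ and the leaving word $\mathit{wv}_{n-127}$ change, so $\mathit{sh}_n'$ is corrected by swapping the corresponding $\mathit{PiMap}$ entries, and the number $k$ of distinct words is updated whenever a count reaches or leaves zero. The sketch assumes $\mathit{sh}_n''=k\cdot\mathit{sh}_n'/127000000$ (i.e., $\mathcal{H}''=(k/N)\,\mathcal{H}'$) and, as a further assumption, that the window initially holds 127 zero words; the class name `RecursiveEntropy` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <deque>
#include <vector>

// Sketch (hypothetical class name) of a recursive H''(A) update.
// Assumptions: sh''_n = k * sh'_n / (N * Cons); window starts as N zero words.
class RecursiveEntropy {
public:
    static constexpr int kN = 127;                  // bin size N
    static constexpr std::int64_t kCons = 1000000;  // scaling constant Cons
    static constexpr int kMaxWord = 2457;           // largest word value

    RecursiveEntropy() : counts_(kMaxWord + 1, 0) {
        // PiMap[i] = floor(-(Cons / log2 N) p_i log2 p_i), p_i = i / N; PiMap[0] = 0.
        const double log2n = std::log2(static_cast<double>(kN));
        for (int i = 1; i <= kN; ++i) {
            const double p = static_cast<double>(i) / kN;
            pimap_[i] = static_cast<std::int64_t>(std::floor(-(kCons / log2n) * p * std::log2(p)));
        }
        for (int i = 0; i < kN; ++i) window_.push_back(0);
        counts_[0] = kN;
        k_ = 1;
        sh1_ = pimap_[kN];  // = 0: a single repeated word has zero entropy
    }

    // Slides the window by one word (wv_n in, wv_{n-127} out); returns sh''_n.
    double push(int wv) {
        assert(wv >= 0 && wv <= kMaxWord);
        remove_one(window_.front());
        window_.pop_front();
        add_one(wv);
        window_.push_back(wv);
        // One multiplication and one division: k * sh'_n / 127000000.
        return static_cast<double>(k_) * static_cast<double>(sh1_) / (kN * kCons);
    }

private:
    void remove_one(int w) {
        sh1_ -= pimap_[counts_[w]];
        --counts_[w];
        sh1_ += pimap_[counts_[w]];
        if (counts_[w] == 0) --k_;  // word no longer present
    }
    void add_one(int w) {
        sh1_ -= pimap_[counts_[w]];
        ++counts_[w];
        sh1_ += pimap_[counts_[w]];
        if (counts_[w] == 1) ++k_;  // new distinct word
    }

    std::array<std::int64_t, kN + 1> pimap_{};
    std::vector<int> counts_;  // nu: occurrences of each word value in A
    std::deque<int> window_;   // the N most recent words, oldest first
    int k_ = 0;                // number of distinct words in A
    std::int64_t sh1_ = 0;     // sh'_n, i.e. Cons * H'(A) up to truncation
};
```

Each beat costs a constant number of integer additions, subtractions and table look-ups, independent of $N$; the truncation in $\mathit{PiMap}$ introduces an error of at most $k/\mathit{Cons}$ in $\mathcal{H}'$.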

## Materials and evaluation

### Clinical ECG data sets

The performance of the new AF detection method is evaluated on four widely used sets of clinical ECGs (the Long-Term AF Database [LTAFDB], the MIT-BIH AF Database [AFDB], the MIT-BIH Arrhythmia Database [MITDB], and the MIT-BIH Normal Sinus Rhythm Database [NSRDB]) [27]. The LTAFDB database is used as the training set, and the other three databases are used as testing sets. The contents of these databases are summarized in Table 1. All reference annotations of the four databases are examined in this study.

### Performance metrics

The performance of our newly developed algorithm and of existing methods is assessed in terms of sensitivity (*Se*), specificity (*Sp*), positive predictive value (*PPV*), and overall accuracy (*ACC*):

$$Se=\frac{TP}{TP+FN},\qquad Sp=\frac{TN}{TN+FP},\qquad PPV=\frac{TP}{TP+FP},\qquad ACC=\frac{TP+TN}{TP+TN+FP+FN},$$

where, for a given data set, *TP* (true positive) is the number of AF beats correctly detected as AF, *TN* (true negative) is the number of non-AF beats correctly detected as non-AF, *FP* (false positive) is the number of non-AF beats incorrectly detected as AF, and *FN* (false negative) is the number of AF beats incorrectly detected as non-AF. Thus *Se* is the proportion of true AF beats that are correctly identified as AF, *Sp* is the proportion of true non-AF beats that are correctly identified as non-AF, *PPV* is the proportion of beats detected as AF that are truly AF, and *ACC* is the proportion of all beats that are classified correctly. We consider *Se* and *Sp* as the main metrics, while *PPV* and *ACC* are complementary.
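The four metrics follow directly from the beat-level confusion counts, as in the C++ sketch below (hypothetical names). Returning NaN when a denominator is zero is a choice made here for illustration, for example for *Se* on NSRDB, which has no AF annotations.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Sketch (hypothetical names): beat-level performance metrics, in percent.
struct Metrics {
    double se;   // sensitivity
    double sp;   // specificity
    double ppv;  // positive predictive value
    double acc;  // overall accuracy
};

// Returns 100 * num / den, or NaN when the denominator is zero
// (e.g. Se on a database without AF annotations).
inline double percent(double num, double den) {
    return den > 0.0 ? 100.0 * num / den : std::numeric_limits<double>::quiet_NaN();
}

Metrics compute_metrics(std::uint64_t tp, std::uint64_t tn, std::uint64_t fp, std::uint64_t fn) {
    const double TP = static_cast<double>(tp), TN = static_cast<double>(tn);
    const double FP = static_cast<double>(fp), FN = static_cast<double>(fn);
    return {percent(TP, TP + FN), percent(TN, TN + FP), percent(TP, TP + FP),
            percent(TP + TN, TP + TN + FP + FN)};
}
```

For example, 90 true positives, 80 true negatives, 20 false positives and 10 false negatives give *Se* = 90%, *Sp* = 80%, *PPV* ≈ 81.8% and *ACC* = 85%.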

### Results and discussion

The values of SE ${\mathcal{\mathscr{H}}}^{\mathrm{\prime \prime}}(\mathbf{A})$ for AF (519687 beats) and non-AF (701887 beats) annotations in the AFDB database (a total of 1221574 beats for all of the 25 records) can be seen in Figure 5. It is apparent that ${\mathcal{\mathscr{H}}}^{\mathrm{\prime \prime}}(\mathbf{A})$ discriminates AF well.

Receiver operating characteristic (ROC) curves are widely used in the medical field to determine the optimal discrimination threshold for clinical tests. In this work, the LTAFDB database is used as the training set to obtain the optimal threshold for our algorithm. The threshold is varied from 0.0 to 1.0 in increments of 0.001 on the training set, and *Se*, *Sp*, 1−*Sp*, *PPV* and *ACC* are calculated for each setting, which yields the ROC curve shown in Figure 6. In the ROC space of Figure 6, point *a* represents perfect classification, at which *Se* and *Sp* both equal 100%, and point *b* is the best-performing point of our method on the ROC curve, i.e., the point closest to *a*. At point *b*, the discrimination threshold is 0.353 and the values of *Se*, *Sp*, *PPV* and *ACC* are 96.72%, 95.07%, 96.61% and 96.05%, respectively. We therefore use this best-performing threshold of 0.353 for the quantitative assessment of our method on the other three (testing) databases.
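The threshold selection can be sketched as follows: for each candidate threshold from 0.000 to 1.000 in steps of 0.001, compute *Se* and *Sp* on the training beats and keep the threshold whose ROC point $(1-Sp, Se)$ is nearest to the perfect-classification point *a* = (0, 1). This minimal C++ illustration assumes that per-beat entropy scores and reference labels are already in memory; the function name `best_threshold` is hypothetical, and ties are resolved in favor of the smallest threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Sketch (hypothetical function name): ROC-based threshold selection.
// scores[i] is the H''(A) value of beat i and is_af[i] its reference label;
// a beat is classified as AF when its score exceeds the threshold.
double best_threshold(const std::vector<double>& scores, const std::vector<bool>& is_af) {
    assert(scores.size() == is_af.size());
    double best_t = 0.0;
    double best_dist = std::numeric_limits<double>::infinity();
    for (int step = 0; step <= 1000; ++step) {
        const double t = step / 1000.0;
        double tp = 0, tn = 0, fp = 0, fn = 0;
        for (std::size_t i = 0; i < scores.size(); ++i) {
            const bool predicted_af = scores[i] > t;
            if (predicted_af) {
                if (is_af[i]) ++tp; else ++fp;
            } else {
                if (is_af[i]) ++fn; else ++tn;
            }
        }
        const double se = (tp + fn > 0) ? tp / (tp + fn) : 0.0;
        const double sp = (tn + fp > 0) ? tn / (tn + fp) : 0.0;
        // Euclidean distance from the ROC point (1 - Sp, Se) to a = (0, 1).
        const double dist = std::hypot(1.0 - sp, 1.0 - se);
        if (dist < best_dist) {  // strict '<': ties keep the smaller threshold
            best_dist = dist;
            best_t = t;
        }
    }
    return best_t;
}
```

Each threshold requires a full pass over the training beats, so the sweep costs 1001 passes; sorting the scores once would make it faster, but the direct form mirrors the procedure described above more closely.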

For our newly presented method, the statistical results on the testing sets (the AFDB, MITDB and NSRDB databases) are summarized in Table 2. Specifically, for the AFDB set, the *Se*, *Sp*, *PPV* and *ACC* are 96.89%, 98.25%, 97.62% and 97.67%, respectively. For the AFDB^{‡} set (^{‡} indicates that records “00735” and “03665” are omitted), they are 96.82%, 98.06%, 97.61% and 97.50%, respectively. For the AFDB^{†} set (^{†} indicates that records “04936” and “05091” are omitted), they are 97.83%, 98.19%, 97.56% and 98.04%, respectively. For the MITDB data set, they are 97.33%, 90.78%, 55.29% and 91.46%. It is important to recognize that the *PPV* on the MITDB data set (55.29%) is low, indicating that many of the beats detected as AF are false positives. For the combined databases, the parameters are 96.89%, 98.27%, 92.30% and 98.03% for the AFDB+NSRDB set, and 97.53%, 98.26%, 90.09% and 98.16% for the AFDB^{†}+NSRDB set. For the NSRDB set, only *Sp* can be calculated (98.28%), because the NSRDB database contains no manual AF annotations.

Existing AF detection algorithms are also evaluated on the same databases (i.e., the same records and the same reference annotations) with the same evaluation metrics. Table 3 shows a collection of the latest published results from prior literature. The list is not intended to be exhaustive; more complete investigations are available in [19, 28].

We first introduce the methods based on the variability of RR intervals (RRI) [10, 11, 13, 21–23, 25].

Kikillus, *et al*. [10] used a Markov modeling (MM) technique to identify AF. The reported *Se* and *Sp* were 94.1% (+2.79%; values in parentheses are the differences between our results and the reported results, here and below) and 93.4% (+4.87%), respectively, for the AFDB+NSRDB database.

The method introduced by Dash, *et al*. [11] combines the root mean square of successive differences (RMSSD), the turning points ratio (TPR) and SE; AF is considered present if given threshold-based conditions are satisfied. For the AFDB^{†} database, the reported *Se* and *Sp* were 94.4% (+3.43%) and 95.1% (+3.09%), respectively, and for the MITDB set they were 90.2% (+7.13%) and 91.2% (-0.42%), respectively. Thus, on the MITDB set, the *Sp* of this method is slightly higher than ours, but its rate of AF identification (*Se*) is markedly lower.

Tateno, *et al*. [13] presented a novel technique based on the Kolmogorov-Smirnov test. Evaluated on the AFDB data set, the reported *Se*, *Sp* and *PPV* were 94.4% (+2.49%), 97.2% (+1.05%) and 96.0% (+1.62%), respectively; when re-evaluated by other researchers [19], the corresponding values were 91.20% (+5.69%), 96.08% (+2.17%) and 90.32% (+7.30%), respectively.

Lian, *et al*. [21] developed an AF detector based on the map of RR intervals versus the change in RR intervals (RdR). The *Se* and *Sp* were 95.8% (+1.09%) and 96.4% (+1.85%) for the AFDB set, and 98.9% (-1.57%) and 78.8% (+11.98%) for the MITDB set, respectively. The *Sp* for the NSRDB database was 90.0% (+8.28%). On the MITDB set, the *Se* of this method is slightly higher than that of our new method, but its rate of non-AF detection (*Sp*) is markedly lower.

An attractive approach to AF detection was introduced by Huang, *et al*. [23], which uses a histogram of $\Delta RR_n$ together with standard deviation (SD) analysis. On the AFDB set, the *Se* and *Sp* were 96.1% (+0.79%) and 98.1% (+0.15%), and the *Sp* for the NSRDB database was 97.9% (+0.38%). This method provides the performance closest to that of our newly proposed method, as can be seen in Tables 2 and 3.

Lee, *et al*. [25] investigated three statistical techniques to determine the presence of AF; the best performance was achieved with sample entropy. On the AFDB^{†}+NSRDB data set, the *Se*, *Sp* and *ACC* were 97.26% (+0.27%), 95.91% (+2.35%) and 96.14% (+2.02%), respectively.

Parvaresh, *et al*. [20] evaluated three classifiers for AF screening based on autoregressive (AR) modeling, using the AR coefficients of 15-second ECG segments as features. On the AFDB^{‡} set, the best performance was obtained with the linear discriminant analysis (LDA) classifier: the *Se*, *Sp* and *PPV* were 96.14% (+0.68%), 93.20% (+4.86%) and 90.09% (+7.52%), respectively.

Slocum, *et al*. [12] published a method based on AA: frequency spectrum analysis (FSA) of the residual signal obtained by cancelling the ventricular activity from the surface ECG was used to differentiate rhythms. Because there is no constant phase relationship between atrial and ventricular activities, the performance of this type of technique tends to be limited. As Table 3 shows, this AA-only method performed clearly worse: on the AFDB^{‡} set, its *Se*, *Sp* and *PPV* were 62.80% (+34.02%), 77.46% (+20.60%) and 64.90% (+32.71%), respectively [19].

Other methods that exploit multiple characteristics (i.e., RRI/AA and FSA/PWA) have also been developed [14–16] and were re-evaluated in [19]. Recent data consistently show that these techniques have relatively lower performance, as can be seen in Table 3. The accuracy of multi-feature (or AA-only) techniques has been limited by the practical difficulty of reliably determining AA (and/or PWA). A common rule of thumb is therefore that techniques based solely on RRI are likely to yield better results than those relying on multiple features (or only AA) of the surface ECG, since the R-wave peak is the most prominent characteristic of an ECG recording and the least susceptible to various kinds of noise.

It is often difficult to determine a perfect discrimination threshold for AF episode classification, so it is worth analyzing whether varying the threshold significantly influences the performance of our method. For each testing database, discrimination thresholds from 0.20 to 0.50 in increments of 0.001 are tested. The corresponding results, plotted in Figure 7, show that our method performs best with thresholds between 0.30 and 0.36, and that any threshold in this range is adequate for assessing its performance. The best-performing threshold of 0.353, derived from the ROC curve of the training set (i.e., the LTAFDB database), is therefore appropriate for performance evaluation.

In summary, the results of this study demonstrate that combining nonlinear/linear integer filters, symbolic dynamics and SE yields a robust detector, which exhibited a higher detection rate than previous methods. It may therefore be suitable for incorporation into computerized ECG interpretation systems to improve the reliability of arrhythmia classification.

### A note on computational complexity

The computation time of our method is also investigated. Our technique is implemented in C++. Table 4 lists the computation time of our method on the available databases; details of the desktop test environment are given in the footnote of Table 4. The computation time is far shorter than the total duration of the records in each database and is therefore negligible: typically about 0.116 seconds per 24 hours of data processed. Larburu, *et al*. [19, 28] evaluated a variety of existing methods on a computer server and concluded that the method proposed by Cerutti, *et al*. [29] had the lowest computation time, approximately 0.36 seconds per 1 hour of data processed (on the AFDB^{‡} set, its *Se*, *Sp* and *PPV* were 96.10% (+0.72%), 81.55% (+16.51%) and 75.76% (+21.85%) [19]; see [19, 28, 29] for details). This suggests that our method is especially suitable for real-time, long-term ECG monitoring. Moreover, as large volumes of ECG data become increasingly common, the low computational cost of our method may be of practical use.

## Benefits and limitations

In this study, we use a discrimination threshold of 0.353 for AF classification. As Figure 7 shows, increasing the threshold improves *Sp* but decreases *Se*, whereas decreasing it improves *Se* but decreases *Sp*. A compromise is therefore necessary; on the other hand, this trade-off makes it easy to tailor the threshold to a specific application. Nevertheless, comparison with the latest detection methods on each database confirms that a threshold of 0.353 gives our method better performance under various conditions.

It is commonly accepted that the assessment of AF involves many time-consuming routines because of the statistical analysis of irregular, chaotic arrhythmia characteristics. Substantial benefits can be achieved by implementing this AF detector with properly designed recursive algorithms and a novel predefined set of values of $-\frac{1}{\log_2 N}p_i\log_2 p_i$ for the calculation of $\mathcal{H}''(\mathbf{A})$, which markedly reduces computational complexity.

The bin size $N$ was set to 127 in this study because, in general, a small number of words in a small bin ($\ll N$) may reduce the accuracy of estimating the probability distribution of the words $\mathit{wv}_n$ [30]. However, a bin of this size may cause false-negative detections for sporadic AF episodes of relatively short duration (e.g., ten seconds), which is a potential limitation. This inherent technical difficulty remains to be overcome in future work; although AF episodes of very short duration are rare in practice, this limitation should be kept in mind.

As noted in the previous section, the low *PPV* obtained on the MITDB database indicates that the proposed approach needs further refinement before it can be considered universally applicable.

## Conclusions

As currently available techniques are only modestly effective in AF episode screening, we developed a fully automated detection method that aims to fulfill two essential needs: (*i*) earlier, real-time identification of AF, and (*ii*) more reliable detection. Combined with an existing method for real-time R-wave detection [31], the proposed method could be used in intensive care units. The online realization is easy to implement and computationally attractive, as it consists only of basic operations such as integer addition/subtraction, integer left/right shifts, integer comparison, one multiplication and one division in the factor $\frac{k}{127000000}\cdot$, and one floating-point comparison between $\mathcal{H}''(\mathbf{A})$ and the threshold for rhythm classification. Several state-of-the-art methods have been briefly reviewed, together with their methodologies and detection accuracy. Our new method was evaluated and compared with these existing methods using the LTAFDB, AFDB, NSRDB, and MITDB databases under various conditions, and explicit tables are provided for the quantitative assessment of performance and computation time. Collectively, our results suggest that this AF detector outperforms the existing methods with respect to the performance metrics *Se*, *Sp*, *PPV* and *ACC*. It is also worth emphasizing that some reference annotations in these data sets, for example in the AFDB set, are themselves imprecise. Extensive sets of exact reference annotations are therefore still needed for further investigation.

## Appendix

The compiled C++ dynamic link library files are available at https://onedrive.live.com/?gologin=1&mkt=zh-CN#cid=498A9A3CCEE3B366&id=498A9A3CCEE3B366%21132 or from the author on request.

## References

1. Mathew S, Patel J, Joseph S: **Atrial fibrillation: mechanistic insights and treatment options.** *Eur J Intern Med* 2009, **20**(7):672–681. 10.1016/j.ejim.2009.07.011
2. Fuster V, Rydén L, Asinger R, Cannom D, Crijns H, Frye R, Halperin J, Neal Kay G, Klein W, Lévy S, Mcnamara R, Prystowsky E, Samuel Wann L, George Wyse D, Gibbons R, Antman E, Alpert J, Faxon D, Gregoratos G, Hiratzka L, Jacobs A, Russell R, Smith S, Alonso-Garcia A, Blomström-Lundqvist C, De Backer G, Flather M, Hradec J, Oto A, Parkhomenko A, *et al*.: **ACC/AHA/ESC guidelines for the management of patients with atrial fibrillation: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the European Society of Cardiology Committee for Practice Guidelines and Policy Conferences (Committee to Develop Guidelines for the Management of Patients With Atrial fibrillation) developed in collaboration with the North American Society of Pacing and Electrophysiology.** *J Am Coll Cardiol* 2001, **38**(4):1231–1265. 10.1016/S0735-1097(01)01587-X
3. Petrutiu S, Ng J, Nijm G, Al-Angari H, Swiryn S, Sahakian A: **Atrial fibrillation and waveform characterization.** *IEEE Eng Med Biol Mag* 2006, **25**(6):24–30.
4. Hylek E, Go A, Chang Y, Jensvold N, Henault L, Selby J, Singer D: **Effect of intensity of oral anticoagulation on stroke severity and mortality in atrial fibrillation.** *N Engl J Med* 2003, **349**(11):1019–1026. 10.1056/NEJMoa022913
5. Miyasaka Y, Barnes M, Gersh B, Cha S, Bailey K, Abhayaratna W, Seward J, Tsang T: **Secular trends in incidence of atrial fibrillation in Olmsted County, Minnesota, 1980 to 2000, and implications on the projections for future prevalence.** *Circulation* 2006, **114**(2):119–125. 10.1161/CIRCULATIONAHA.105.595140
6. Gami A, Pressman G, Caples S, Kanagala R, Gard J, Davison D, Malouf J, Ammash N, Friedman P, Somers V: **Association of atrial fibrillation and obstructive sleep apnea.** *Circulation* 2004, **110**(4):364–367. 10.1161/01.CIR.0000136587.68725.8E
7. Mukamal K, Tolstrup J, Friberg J, Jensen G, Grønbæk M: **Alcohol consumption and risk of atrial fibrillation in men and women: the Copenhagen City Heart Study.** *Circulation* 2005, **112**(12):1736–1742. 10.1161/CIRCULATIONAHA.105.547844
8. Moody G, Mark R: **A new method for detecting atrial fibrillation using RR intervals.** In *Computers in Cardiology 1983, Volume 10*. Aachen: IEEE Computer Society Press; 1983:227–230.
9. Artis S, Mark R, Moody G: **Detection of atrial fibrillation using artificial neural networks.** In *Computers in Cardiology 1991*. Venice: IEEE Computer Society Press; 1991:173–176.
10. Kikillus N, Hammer G, Lentz N, Stockwald F, Bolz A: **Three different algorithms for identifying patients suffering from atrial fibrillation during atrial fibrillation free phases of the ECG.** In *Computers in Cardiology 2007*. Durham: IEEE; 2007:801–804.
11. Dash S, Chon K, Lu S, Raeder E: **Automatic real time detection of atrial fibrillation.** *Ann Biomed Eng* 2009, **37**(9):1701–1709. 10.1007/s10439-009-9740-z
12. Slocum J, Sahakian A, Swiryn S: **Diagnosis of atrial fibrillation from surface electrocardiograms based on computer-detected atrial activity.** *J Electrocardiol* 1992, **25**:1–8.
13. Tateno K, Glass L: **Automatic detection of atrial fibrillation using the coefficient of variation and density histograms of RR and ΔRR intervals.** *Med Biol Eng Comput* 2001, **39**(6):664–671. 10.1007/BF02345439
14. Schmidt R, Harris M, Novac D, Perkhun M: **Atrial fibrillation detection.** [WO Patent 2,008,007,236. Jan 18, 2008]
15. Couceiro R, Carvalho P, Henriques J, Antunes M, Harris M, Habetha J: **Detection of atrial fibrillation using model-based ECG analysis.** In *19th International Conference on Pattern Recognition, 2008*. Tampa: IEEE; 2008:1–5.
16. Babaeizadeh S, Gregg R, Helfenbein E, Lindauer J, Zhou S: **Improvements in atrial fibrillation detection for real-time monitoring.** *J Electrocardiol* 2009, **42**(6):522–526. 10.1016/j.jelectrocard.2009.06.006
17. Park J, Lee S, Jeon M: **Atrial fibrillation detection by heart rate variability in Poincare plot.** *Biomed Eng Online* 2009, **8**(38):1–12.
18. Yaghouby F, Ayatollahi A, Bahramali R, Yaghouby M, Alavi A: **Towards automatic detection of atrial fibrillation: A hybrid computational approach.** *Comput Biol Med* 2010, **40**(11):919–930.
19. Larburu N, Lopetegi T, Romero I: **Comparative study of algorithms for Atrial Fibrillation detection.** In *Computing in Cardiology 2011*. Hangzhou: IEEE; 2011:265–268.
20. Parvaresh S, Ayatollahi A: **Automatic atrial fibrillation detection using autoregressive modeling.** In *2011 International Conference on Biomedical Engineering and Technology*. Kuala Lumpur: APCBEES; 2011:4–5.
21. Lian J, Wang L, Muessig D: **A simple method to detect atrial fibrillation using RR intervals.** *Am J Cardiol* 2011, **107**(10):1494–1497. 10.1016/j.amjcard.2011.01.028
22. Lake D, Moorman J: **Accurate estimation of entropy in very short physiological time series: the problem of atrial fibrillation detection in implanted ventricular devices.** *Am J Physiol Heart Circ Physiol* 2011, **300**:H319–H325. 10.1152/ajpheart.00561.2010
23. Huang C, Ye S, Chen H, Li D, He F, Tu Y: **A novel method for detection of the transition between atrial fibrillation and sinus rhythm.** *IEEE Trans Biomed Eng* 2011, **58**(4):1113–1119.
24. Yaghouby F, Ayatollahi A, Bahramali R, Yaghouby M: **Robust genetic programming-based detection of atrial fibrillation using RR intervals.** *Expert Syst* 2012, **29**(2):183–199.
25. Lee J, Reyes B, McManus D, Mathias O, Chon K: **Atrial fibrillation detection using an iPhone 4S.** *IEEE Trans Biomed Eng* 2013, **60**:203–206.
26. Shannon C: **A mathematical theory of communication.** *Bell Syst Tech J* 1948, **27**(3):379–423. 10.1002/j.1538-7305.1948.tb01338.x
27. **The Physionet ECG database** [Online]. Available: http://physionet.org/physiobank/database/
28. Larburu N: **Comparative study of algorithms for atrial fibrillation detection.** *Master’s thesis*, Public University of Navarra, Pamplona, Spain, 2011. [Online]. Available: http://academica-e.unavarra.es/handle/2454/4136
29. Cerutti S, Mainardi L, Porta A, Bianchi A: **Analysis of the dynamics of RR interval series for the detection of atrial fibrillation episodes.** In *Computers in Cardiology 1997*. Lund: IEEE Computer Society Press; 1997:77–80.
30. Voss A, Kurths J, Kleiner H, Witt A, Wessel N, Saparin P, Osterziel K, Schurath R, Dietz R: **The application of methods of non-linear dynamics for the improved and predictive recognition of patients threatened by sudden cardiac death.** *Cardiovasc Res* 1996, **31**(3):419–433. 10.1016/0008-6363(96)00008-9
31. Pan J, Tompkins W: **A real-time QRS detection algorithm.** *IEEE Trans Biomed Eng* 1985, **32**(3):230–236.

## Acknowledgements

This work was supported in part by the National Basic Research Program 973 (2010CB732606), the Guangdong Innovation Research Team Fund for Low-Cost Healthcare Technologies in China, the External Cooperation Program of the Chinese Academy of Sciences (GJHZ1212), the Key Lab for Health Informatics of Chinese Academy of Sciences, the Enhancing Program of Key Laboratories of Shenzhen City (ZDSY20120617113021359), and the Supportive Program of “Peacock Program” of Shenzhen City for GIRTF-LCHT Team.

## Additional information

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

XZ developed the algorithm and drafted the manuscript. HD collected the clinical ECG data and revised the manuscript. BU, EP and YZ provided suggestions and comments and helped substantially in revising the manuscript. All authors read and approved the final manuscript.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Zhou, X., Ding, H., Ung, B. *et al.* Automatic online detection of atrial fibrillation based on symbolic dynamics and Shannon entropy.
*BioMed Eng OnLine* **13**, 18 (2014). https://doi.org/10.1186/1475-925X-13-18


### Keywords

- ECG
- RR interval
- Atrial fibrillation
- Nonlinear filter
- Integer filter
- Symbolic dynamics
- Shannon entropy