Real time QRS complex detection using DFA and regular grammar

Background The detection of the QRS complex (the sequence of Q, R, and S peaks) is a crucial procedure in electrocardiogram (ECG) processing and analysis. We propose a novel approach for QRS complex detection based on deterministic finite automata with the addition of some constraints. This paper confirms that regular grammar is useful for extracting QRS complexes and interpreting normalized ECG signals. A QRS complex is treated as a pair of adjacent peaks that meet certain criteria of standard deviation and duration.
Results The proposed method was applied to several kinds of ECG signals from the standard MIT-BIH arrhythmia database. A total of 48 signals were used. For each input signal, several parameters were determined, such as QRS durations, RR distances, and peak amplitudes. The σRR and σQRS parameters were added to quantify the regularity of RR distances and QRS durations, respectively. The sensitivity rate of the suggested method was 99.74% and the specificity rate was 99.86%. Moreover, the variation of the sensitivity and specificity rates with the Signal-to-Noise Ratio was examined.
Conclusions Regular grammar with the addition of some constraints, together with deterministic automata, proved functional for ECG signal diagnosis. Compared to statistical methods, the use of grammar provides satisfactory and competitive results, with indices that are comparable to or even better than those cited in the literature.

the characteristics of the ECG signal. The algorithm detected the QRS complex and the T wave, and then the P wave. Gramatikov et al. [27] focused on the morphology of the QRS complex and used the Morlet wavelet transform for the analysis of ECG recordings in patients with left or right coronary stenosis. The detection of the QRS complexes can be performed by a simple thresholding of the signal in terms of amplitude as the R peaks are generally larger than the other waves. The amplitude of the T wave is sometimes similar to that of the R peak, which can cause errors in the final result and the detection rate.
Several QRS detection algorithms rely on the comparatively high energy of the QRS complex [28]. Most algorithms are based on neural networks, hidden Markov models, syntactic methods, etc. [29-40]. More detailed comparisons of QRS detection techniques, covering their effectiveness and computational complexity in the presence of artifacts, are available in the literature. Generally, QRS detection algorithms are based on signal derivatives, wavelets, filter banks or mathematical morphology [41-45]. These approaches are very effective, with accuracy rates that exceed 99%. Kohler et al. [46] established a detailed study summarizing the different techniques for QRS detection. The discussed methods were sorted by category and their performance was compared. Dotsinsky et al. [47] developed a heuristic algorithm applied to two-channel recordings from the AHA and MIT-BIH Arrhythmia databases.
Few approaches were based on the grammatical formalism [48]. Gao et al. [49] affirmed that the use of grammar, compared to statistical methods, provides more flexibility in applications. The syntactic approaches can efficiently represent the signal structures and consequently facilitate data retrieval by means of their structures. The main advantage of these methods is that the representation is concise. The syntactic approaches can better represent the ECG structures and therefore facilitate information recovery. As grammar clearly represents hierarchical structures using non-terminal and terminal nodes, the input data seem to be a structured scene having a hierarchical order. Moreover, the syntactic approaches can describe a large set of complex patterns utilizing small sets of simple primitives and grammatical rules. Kokai et al. [50] used grammar for QRS complex classification and distinction between QRS and non-QRS patterns. Panagiotis et al. [51] applied a syntactic method for ECG recognition and the measurement of the associated parameters. However, those methods were very sensitive to noise. Several morphologies generated erroneous peaks and thus hindered the grammatical description of the signal. The authors also did not use the grammar formalism during the extraction phase of the peaks. Peak recognition was performed using another method independently of grammar. Hamdi et al. [52] presented a context-free grammar to describe an entire ECG signal. However, context-free grammar could not represent all the different kinds of ECG signals. The author focused only on normal cases and the method was applied on signal of short durations. Furthermore, the author compared his method with the old techniques of Holsinger [53] and Fraden and Neuman [54]. Hanieh et al. [55] proposed a method to detect atrial arrhythmia. The suggested method modelled arrhythmia by a regular expression. 
The input signal was transformed into a character string in which each character represented an ECG signal component. Different experiments on MIT-BIH arrhythmia database show the efficiency of the method and the detection algorithm compared to conventional approaches. However, this algorithm has a sensitivity rate that does not exceed 96.3%.
The present work is based on learning automata to recognize rest phases, negative peaks and positive peaks. The QRS complex is described by automata. Several parameters are determined, such as the number of QRS complexes, the QRS durations, the RR distances and the amplitudes of the peaks.
The remainder of the paper is organized as follows. "Methods" section explains the material and the proposed method. "Results and discussion" section presents and discusses the obtained results, and a comparative study in terms of sensitivity rates was performed on several statistical methods. "Conclusion" section concludes the paper.

Method overview
The suggested method recognizes the QRS complex in an ECG waveform based on grammar formalism. The grammatical formalism can efficiently represent the ECG and consequently facilitate the retrieval of signal features. The main advantage of this method is that it represents several QRS complexes in a concise way. It can better represent the QRS complex structures and therefore facilitate the recovery of several parameters. As regular grammar clearly represents hierarchical structures using a set of symbols and regular expressions, the ECG input can be viewed as a structured scene having a hierarchical order. Furthermore, the proposed method can describe a large set of QRS complexes using sets of simple primitives, grammatical rules, and deterministic finite automata. Figure 1 summarizes all the steps. The input signal is filtered and centered, and its amplitude is normalized. Then, the lexical analysis step recognizes tokens including positive and negative peaks. A QRS complex is treated as a pair of adjacent peaks that satisfy certain criteria of standard deviation. It is described using deterministic automata and regular expressions. Finally, the analyzer computes the RR distances, the QRS complex durations, the standard deviation of RR distances, and the standard deviation of QRS durations, and generates a report according to sampling frequency, time and amplitude values.
A raw ECG signal S[n] is typically noisy and contains many artifacts, hence the need for preprocessing phases to reduce noise and facilitate the subsequent lexical analysis. The band-pass filter reduces the influence of muscle noise, 60 Hz interference, baseline wander, and T wave interference. The desirable pass-band to maximize the QRS energy is approximately 5-15 Hz [30].
The following mathematical equations describe the various steps of the preprocessing phase: band-pass filtering, signal centering, and normalization of signal amplitude. An example is displayed later in Fig. 2, where a normalized and centered ECG signal representing a tachycardia is filtered by a band-pass filter.
Step 1: Band-pass filtering of the signal S[n], where H[n] is a band-pass filter with a 5-15 Hz pass-band.
Step 2: Signal centering, where the m parameter is the signal length.
Step 3: Normalization of the signal amplitude.
Figure 2 presents an example of a real ECG signal before and after the filtering process. The input signal was recorded from one patient with tachycardia. Preprocessing eliminated the artifacts and centered the signal.
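The three preprocessing steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the band-pass stage is a simple cascade of first-order high-pass and low-pass filters standing in for H[n] (the actual filter design is not given here), centering is assumed to subtract the mean over the m samples, and normalization is assumed to divide by the largest absolute amplitude so that the result lies in [−1, 1].

```python
import math


def band_pass(signal, fs, low_hz=5.0, high_hz=15.0):
    """Stand-in for H[n]: first-order high-pass at low_hz, then low-pass at high_hz."""
    dt = 1.0 / fs
    rc_hp = 1.0 / (2.0 * math.pi * low_hz)
    rc_lp = 1.0 / (2.0 * math.pi * high_hz)
    a_hp = rc_hp / (rc_hp + dt)   # high-pass smoothing factor
    a_lp = dt / (rc_lp + dt)      # low-pass smoothing factor
    out = []
    hp_prev = 0.0
    lp_prev = 0.0
    x_prev = signal[0] if signal else 0.0
    for x in signal:
        hp = a_hp * (hp_prev + x - x_prev)    # removes slow baseline drift
        lp = lp_prev + a_lp * (hp - lp_prev)  # attenuates fast noise
        out.append(lp)
        hp_prev, lp_prev, x_prev = hp, lp, x
    return out


def center(signal):
    """Step 2 (assumed form): subtract the mean of the m samples."""
    m = len(signal)
    mean = sum(signal) / m
    return [s - mean for s in signal]


def normalize(signal):
    """Step 3 (assumed form): scale so the largest absolute amplitude is 1."""
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        return list(signal)
    return [s / peak for s in signal]


def preprocess(signal, fs):
    """Filter, center and normalize a raw ECG sample sequence."""
    return normalize(center(band_pass(signal, fs)))
```

For a record sampled at 360 Hz, `preprocess(samples, 360)` would return a sequence of the same length with zero mean and values in [−1, 1], which is the form expected by the lexical analysis.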

Grammatical analysis of the signal
The preprocessed signal is handled as a sequence of amplitude values belonging to the bounded interval [−1, 1]. The normalized amplitude is described as a sequence of almost nil, negative and positive values; i.e., the signal is treated as a language in which the QRS complex is a sequence of lexemes.
The alphabet Σ = {0,1,2,3,4,5,6,7,8,9, -,.} contains all symbols that can represent a normalized amplitude belonging to the bounded interval [−1, 1]. The regular expressions then perform the lexical analysis of the signal. In fact, the deterministic automata and the regular expressions represent the rest phase, the positive peak and the negative peak, and make up the QRS complex with the addition of some constraints of standard deviation.
Mathematically, a positive or negative peak must show a standard deviation σ that is greater than a threshold σ1.
Given the sampling frequency Fe, a peak, a wave or a rest phase is made of a sequence of k normalized samples {a_1, a_2, …, a_k} having an average amplitude ā. The standard deviation σ and the duration Δ of the sequence are computed from these samples.
Figure 3 plots the standard deviations of several Q, R and S peaks as well as P and T waves. Figure 3 confirms that both R and S peaks show large standard deviations that are higher than 0.2. The Q peak has standard deviations that are higher than 0.1, while both P and T waves have very low standard deviations, below 0.05. Based on Fig. 3, the threshold is set to σ1 = 0.1. Starting from this value, we can distinguish between the peaks and the waves. A QRS complex is thus treated as a pair of adjacent peaks that satisfy the criteria of standard deviation. Figure 4 plots the durations of several Q, R and S peaks as well as P and T waves for several ECG signal recordings. This confirms that the durations of these peaks are shorter than 0.1 s, while the durations of both P and T waves are longer than 0.1 s.
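The two criteria can be illustrated with a short Python sketch. The exact formulas are not reproduced here, so the sketch assumes the population standard deviation (division by k) and a duration of k / Fe seconds; the function names are hypothetical, and the thresholds 0.1 and 0.1 s are the values of σ1 and Δ1 read from Figs. 3 and 4.

```python
import math

SIGMA_1 = 0.1  # standard-deviation threshold (sigma_1, from Fig. 3)
DELTA_1 = 0.1  # duration threshold in seconds (Delta_1, from Fig. 4)


def std_dev(samples):
    """Population standard deviation of k normalized samples (assumed form)."""
    k = len(samples)
    mean = sum(samples) / k
    return math.sqrt(sum((a - mean) ** 2 for a in samples) / k)


def duration(samples, fe):
    """Duration in seconds of k samples at sampling frequency fe (assumed k / fe)."""
    return len(samples) / fe


def is_peak(samples, fe):
    """A segment counts as a peak if it varies strongly and is short."""
    return std_dev(samples) > SIGMA_1 and duration(samples, fe) < DELTA_1
```

With Fe = 360 Hz, a sharp five-sample segment such as `[0.0, 0.5, 1.0, 0.5, 0.0]` has σ ≈ 0.37 and lasts about 0.014 s, so it is classified as a peak, whereas a low, 0.2 s long bump fails both tests and is treated as a wave.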
Based on Fig. 4, Δ1 = 0.1 s is defined as a threshold. Starting from this value, we can distinguish between the peaks and the waves. Thus, a QRS complex is treated as a pair of adjacent peaks that satisfy the criteria of standard deviation and duration.
A Deterministic Finite Automaton (DFA) on an alphabet Σ is a quadruple (Q, δ, q0, F) where:
• Q is a finite set of states.
• δ is a transition function from Q × Σ to Q.
• q0 is the start state.
• F is a subset of Q called the set of final states.
The DFA consists of a finite set of states (often denoted Q), a finite set Σ of symbols (alphabet), a transition function that takes as argument a state and a symbol and returns a state (often denoted δ), a start state often denoted q0, and a set of final or accepting states (often denoted F). We have q0 ∈ Q and F ⊆ Q. Grammatically, the symbol '∊' means an empty word having zero length, '*' means 'zero or more times' , '+' means 'one or more times' , and the symbol '?' means 'zero or one time' .
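As an illustration of this definition, the sketch below runs a DFA given as a transition table and builds one hypothetical automaton over the alphabet Σ that accepts the textual form of an amplitude in [−1, 1] (for example "-0.53" or "1.0"). The states and transitions are chosen for this example only and are not the automata of Figs. 5 to 7; missing table entries stand for an implicit rejecting state.

```python
DIGITS = "0123456789"


def run_dfa(transitions, start, finals, word):
    """Return True if the DFA (transitions, start, finals) accepts word."""
    state = start
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:  # no transition defined: implicit dead state
            return False
    return state in finals


def build_amplitude_dfa():
    """Hypothetical DFA for -?(0(.d+)? | 1(.0+)?), i.e. values in [-1, 1]."""
    t = {}
    t[(0, "-")] = 1              # optional leading minus sign
    for s in (0, 1):
        t[(s, "0")] = 2          # integer part 0
        t[(s, "1")] = 3          # integer part 1
    t[(2, ".")] = 4
    for d in DIGITS:
        t[(4, d)] = 5            # first fractional digit after "0."
        t[(5, d)] = 5            # further fractional digits
    t[(3, ".")] = 6
    t[(6, "0")] = 7              # after "1." only zeros are allowed
    t[(7, "0")] = 7
    return t, 0, {2, 3, 5, 7}
```

Calling `run_dfa(*build_amplitude_dfa(), "-0.53")` follows the path 0, 1, 2, 4, 5, 5 and ends in a final state, while "1.5" is rejected because state 6 has no transition on the symbol 5.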
The following regular expression and the deterministic automaton (Fig. 5) describe a normalized positive peak:
The following regular expression and the deterministic automaton (Fig. 6) describe a normalized negative peak:
The following regular expression and the deterministic automaton (Fig. 7) describe a normalized and short rest phase separating the peaks:
The start state is q0 = 0. The finite set of states is Q = {0, 1, 2, 3, 4, 5}. The set of final states is F = {3}. The transition functions are:
The regular expression below and Fig. 8 describe a normalized QRS complex. Q is the first peak, pointing down, which is not always visible on the plot. The R peak is the second one. It is of high amplitude and directed upward. The S peak is the last one, and it is directed downward. Grammatically, the QRS complex is treated as a sequence of negative and positive peaks which may be separated by a very short rest phase. It should be noted that the above regular expression and the deterministic automaton allow the Q peak and the rest phases to be absent.
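The structure of a QRS complex described above can be sketched at the token level. The sketch assumes that lexical analysis has already turned the signal into a string of one-letter tokens; the encoding (`n` for a negative peak, `p` for a positive peak, `z` for a short rest phase, `w` for a P or T wave) and the pattern itself are hypothetical and only mirror the verbal description: an optional Q peak, then R, then S, with optional rest phases in between.

```python
import re

# Optional Q (negative peak) with optional rest, then R (positive peak),
# optional rest, then S (negative peak). Hypothetical token encoding.
QRS = re.compile(r"(?:nz?)?pz?n")


def find_qrs(tokens):
    """Return (start, end) token index pairs of each QRS complex found."""
    return [(m.start(), m.end()) for m in QRS.finditer(tokens)]
```

For the token string `"wnpnzwzzpznw"`, `find_qrs` returns `[(1, 4), (8, 11)]`: the first complex contains a Q peak, and the second has no Q peak but a rest phase between R and S.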

Results
In this section, the method described above was applied to several real ECG signals representing different patients and taken from the standard MIT-BIH arrhythmia database. For all the input signals, the QRS complexes were detected, the Q, R and S peaks were separated and the RR distances were measured. The RR distance refers to the duration between two successive R peaks. Furthermore, a comparative study with several methods [30, 37, 38, 44, 56-64] was performed with regard to QRS complex detection. Table 1 shows an application to several real ECG signals to extract the QRS complex. The average sensitivity (Se) of the proposed method was 99.74% and the average specificity (Sp) was 99.86%. The average False Detection Rate (FDR) and the average False Negative Rate (FNR) were 0.14% and 0.26%, respectively.
Hamdi et al. BioMed Eng OnLine (2017) 16:31
For each recording, the standard deviation of the RR distances, denoted σRR, and the standard deviation of the QRS durations, denoted σQRS, were computed. These standard deviations measure how far the obtained values spread around their average value, with n being the total number of RR distances. The σRR and σQRS parameters were added to quantify the regularity of RR distances and QRS durations, respectively. A low σRR meant that all the RR distances were stable. A low σQRS meant that all the QRS durations were also stable.
In order to validate the proposed method, we used several kinds of ECG signals from the MIT-BIH arrhythmia database. These signals had a 360 Hz sampling frequency, a gain of 200 ADC units per mV and a baseline of 1024 ADC units. For each input signal, several parameters were determined, such as the number of QRS complexes, the RR distances, the QRS durations, the standard deviation of RR distances, the standard deviation of QRS durations, and the peak amplitudes (Table 1).
According to these results, a σRR lower than 0.1 meant that all the RR distances were regular, whereas a σRR higher than 0.1 meant that the RR distances were irregular. Similarly, a σQRS lower than 0.1 meant that all the QRS durations were regular, and a σQRS higher than 0.1 meant that they were irregular. Figures 9 and 10 show the result obtained from a portion of an ECG representing an irregular beat rate. The various indicators of the signal (RR distance; QRS complex; Q, R and S amplitudes) are displayed. The average RR and QRS values are 0.84 and 0.03 s respectively. However, the RR distances are irregular. In fact, the standard deviation of the RR distances is σRR = 0.15. This high value shows that the RR distance is not stable.
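The regularity test can be sketched as follows. The sketch assumes that the R peaks are available as sample indices, that σRR is the population standard deviation over the n RR distances (the exact formula is not reproduced here), and that 0.1 is the regularity threshold quoted above; the function names are hypothetical.

```python
import math


def rr_distances(r_indices, fe):
    """RR distances in seconds between successive R-peak sample indices."""
    return [(b - a) / fe for a, b in zip(r_indices, r_indices[1:])]


def std_dev(values):
    """Population standard deviation over the n values (assumed form)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def is_regular(values, threshold=0.1):
    """Values are considered regular when their spread is below the threshold."""
    return std_dev(values) < threshold
```

For example, R peaks at samples 0, 360, 720 and 1080 of a 360 Hz record give three RR distances of 1.0 s and σRR = 0, which is regular, while peaks at samples 0, 180, 720 and 900 give distances of 0.5, 1.5 and 0.5 s and σRR ≈ 0.47, which is irregular. The same `std_dev` and `is_regular` functions apply to a list of QRS durations to obtain σQRS.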
The QRS complexes have regular durations of less than 0.1 s. Indeed, the standard deviation of the QRS durations is σQRS = 0.01. In this case, this low value indicates that the QRS duration is stable. Figures 11 and 12 show the results obtained from an ECG portion representing a regular beat rate. The average RR and QRS values are 0.46 and 0.02 s respectively. The RR distances are regular and the standard deviation of the RR distances is σRR = 0.00. This low value shows that the RR distance is stable.
The QRS complexes have regular durations of less than 0.1 s, the standard deviation of the QRS durations being σQRS = 0.00. This low value indicates that the QRS duration is stable.

Noise sensitivity
In this section, we examined the sensitivity of the present method to noise by adding noise at different levels to the ECG recordings. Table 2 shows the variation of the sensitivity and specificity rates according to the Signal-to-Noise Ratio (SNR).
For SNR values greater than 40 dB, the method provided high sensitivity values that exceeded 99%. For SNR values greater than 30 dB, the method yielded sensitivity values which exceeded 97%. For SNR values lower than 24 dB, the sensitivity decreased to 90%. Figure 13 shows the variation of the sensitivity rate with the SNR for different ECG recordings from the MIT-BIH database (records 100, 101, 102, 103 and 105). For SNR values lower than 20 dB, the method provided sensitivity rates lower than 50%.
Overall, the method is reliable for SNR values above 30 dB, where the sensitivity exceeds 97%, and it degrades quickly once the SNR drops below 24 dB.
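A noise experiment of this kind can be sketched in Python. The type of noise added in the study is not specified, so the sketch assumes white Gaussian noise, defines signal and noise power as the mean of the squared samples, and rescales the noise so that the requested SNR in dB is met exactly; the function names and the `seed` parameter are hypothetical.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sample sequence."""
    return sum(s * s for s in signal) / len(signal)


def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled to reach the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * v for s, v in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy sequence relative to its clean version."""
    noise = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(noise))
```

Running the detector on `add_noise(record, snr_db)` for a range of `snr_db` values, and recomputing the sensitivity each time, would produce a curve of the kind shown in Fig. 13.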

Comparison of performance
In order to compare the detection algorithm with other works in the literature, its detection performance was compared with that of several algorithms tested and validated on the MIT-BIH database. Those algorithms varied and each one was based on an appropriate technique. Table 3 shows a comparative study with several methods [30, 37, 38, 44, 56-64] applied to the same MIT-BIH database in terms of sensitivity rates.
Based on the results presented in Table 3, all the above-mentioned algorithms have good QRS complex detection capability, with a sensitivity that exceeds 99%. Similarly, the proposed method provided satisfactory and competitive results and could be considered for QRS complex detection in the ECG signal.

Discussion
In summary, few approaches based on grammatical formalism have been used for ECG signal processing. The proposed method confirmed that regular grammar can be extended to the recognition of negative and positive peaks. The QRS complex is treated as a pair of adjacent peaks which satisfy certain criteria of standard deviation and duration. Various parameters were determined, such as the number of QRS complexes, the QRS durations, the RR distances, and the standard deviations σRR and σQRS. Compared with usual methods, the proposed approach showed that grammar can represent the QRS structures efficiently. The syntactic approach can describe different types of ECG signals from the standard MIT-BIH arrhythmia database. The average sensitivity (Se) of the proposed method was 99.74% and the average specificity (Sp) was 99.86%. The average False Detection Rate (FDR) and the average False Negative Rate (FNR) were 0.14% and 0.26%, respectively. These results are encouraging and can be further improved by enhancing preprocessing.
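The four rates are linked by simple identities, which the following sketch makes explicit. The formulas are not spelled out in this text, so the sketch assumes the definitions commonly used for beat detectors: Se = TP / (TP + FN), and Sp computed as TP / (TP + FP), which is consistent with the reported FDR being the complement of Sp. The function name is hypothetical, and any counts passed to it are illustrative rather than taken from Table 1.

```python
def detection_rates(tp, fp, fn):
    """Return Se, Sp, FNR and FDR in percent from beat counts.

    tp: correctly detected beats, fp: false detections, fn: missed beats.
    Assumption: Sp is computed as TP / (TP + FP).
    """
    se = tp / (tp + fn)
    sp = tp / (tp + fp)
    return {
        "Se": 100.0 * se,
        "Sp": 100.0 * sp,
        "FNR": 100.0 * fn / (tp + fn),  # equals 100 - Se
        "FDR": 100.0 * fp / (tp + fp),  # equals 100 - Sp
    }
```

Under these definitions, Se = 99.74% implies FNR = 0.26%, and Sp = 99.86% implies FDR = 0.14%, which matches the values reported for the proposed method.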
We used σRR and σQRS to quantify the regularity of the RR distances and QRS durations. We defined a threshold of 0.1 above which these two variables indicate irregularity.
In order to study noise sensitivity, the method was applied to different ECG recordings for different SNR values, and the variation of the sensitivity and specificity rates with the SNR was examined. When the SNR values were greater than 40 dB, the method gave high sensitivity values which exceeded 99%. When the SNR values were lower than 24 dB, the sensitivity decreased to 90%.

Conclusion
In this paper, the DFA proved useful for QRS complex recognition and ECG signal interpretation. A QRS complex is treated as a pair of adjacent peaks that satisfy certain criteria of standard deviation. This method recognizes the QRS complex in an ECG waveform. The QRS complex is described using deterministic automata and regular expressions. For an input signal, the various indicators, such as the QRS complex durations, the RR distances, σRR and σQRS, were deduced. The σRR and σQRS parameters were added to quantify the regularity of the RR distances and QRS durations, respectively. This work is aimed at assisting medical diagnosis and providing clinical decision aid for ECG analysis.
Currently, we are working on improving preprocessing and we will propose other grammatical rules to represent distinct pathological cases. We are also working on a hybrid method based on grammar and statistics to ensure a good performance in all cases. The σRR and σQRS variables will be analyzed further on a large-scale population in order to provide a fine classification of pathologies.

Authors' contributions
SH carried out the studies, participated in the sequence alignment and drafted the manuscript. ABA participated in the design of the study and performed the statistical analysis. MHB conceived of the study, and participated in its design and coordination. All authors read and approved the final manuscript.