### Data preprocessing

In both databases, lead V₁ was chosen for the analysis because previous works have shown that AA is dominant in this lead [20]. The signal was preprocessed with forward/backward (zero-phase) filtering: first, highpass filtering with a 0.5 Hz cut-off frequency to remove baseline wander; next, lowpass filtering with a 70 Hz cut-off frequency to reduce high-frequency noise; and, finally, notch filtering at 50 Hz to remove powerline interference [21]. In addition, reliable analysis of the AA from surface ECG recordings requires that ventricular activity be cancelled first [10]. Although a variety of techniques exist for this purpose, an adaptive singular value QRST cancellation template was applied [22].
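The preprocessing chain above can be sketched as follows. This is a minimal illustration assuming SciPy, a 1 kHz sampling rate, and filter orders of our own choosing (the original work does not report them); the function name `preprocess_ecg` is likewise hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, iirnotch, filtfilt

def preprocess_ecg(x, fs=1000.0):
    """Zero-phase (forward/backward) filtering chain for one ECG lead."""
    # Highpass at 0.5 Hz to remove baseline wander (SOS form for numerical stability)
    sos_hp = butter(4, 0.5, btype="highpass", fs=fs, output="sos")
    x = sosfiltfilt(sos_hp, x)
    # Lowpass at 70 Hz to reduce high-frequency noise
    sos_lp = butter(4, 70.0, btype="lowpass", fs=fs, output="sos")
    x = sosfiltfilt(sos_lp, x)
    # Notch at 50 Hz to remove powerline interference
    b, a = iirnotch(50.0, 30.0, fs=fs)
    return filtfilt(b, a, x)
```

Filtering forward and then backward cancels the phase distortion of each IIR stage, which is why the zero-phase variants (`sosfiltfilt`, `filtfilt`) are used rather than single-pass filtering.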

### Wavelet Transform

From a mathematical perspective, the *wavelet* is a smooth, quickly vanishing oscillating function with good localization in both time and frequency. A *wavelet family* *Ψ*_{a,b}(*t*) is the set of elementary functions generated by dilations and translations of a unique admissible *mother wavelet* *Ψ*(*t*) [23], i.e.

{\Psi}_{a,b}\left(t\right)={\left|a\right|}^{-\frac{1}{2}}\Psi \left(\frac{t-b}{a}\right),

(1)

where *a*, *b* ∈ ℝ, *a* ≠ 0, are the scale and translation parameters, respectively, and *t* is the time. As *a* increases, the wavelet becomes wider; as it decreases, the wavelet narrows. Thus, one has a unique analytic pattern and its replications at different scales and with variable time localization.

The Discrete Wavelet Transform (DWT) is the sampled version of the Continuous Wavelet Transform (CWT) on a dyadic grid employing orthonormal wavelet basis functions [23]. Hence, the parameters *a* and *b* are sampled using a logarithmic discretization of the scale (*a* = 2^{m}), which in turn fixes the step size taken between the *b* locations. To link *b* to *a*, each location *b* is moved in discrete steps proportional to the scale (*b* = *n*·2^{m}). Thus, the discretized mother wavelet is

{\Psi}_{m,n}\left(k\right)={2}^{-\frac{m}{2}}\Psi ({2}^{-m}k-n),

(2)

where *m* and *n* are the new discrete scale and translation parameters, respectively, and *k* is the discrete time instant. Hence, the wavelet decomposition of the AA signal, *x*_{AA}(*k*), can be defined as its correlation with the chosen wavelet family *Ψ*_{m,n}(*k*) for each *m* and *n*, i.e.

{C}_{m}\left(n\right)=\sum _{k}{x}_{\mathrm{AA}}\left(k\right)\cdot {\Psi}_{m,n}\left(k\right).

(3)

The decomposition results in wavelet coefficients *C*, which depend on scale and position. In fact, a vector of wavelet coefficients *C*_{m} is obtained for each analyzed discrete scale *m*. The information stored in the wavelet coefficient vectors is not repeated elsewhere and allows complete regeneration of the original signal without redundancy, because this discretization of the mother wavelet employs orthonormal basis functions [23].
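As an illustration of the decomposition in Eq. (3), the sketch below uses PyWavelets, assuming a Daubechies-4 mother wavelet purely as an example (the actual family used is selected later in the paper) and a helper name `wavelet_scale_coeffs` of our own choosing.

```python
import numpy as np
import pywt

def wavelet_scale_coeffs(x_aa, wavelet="db4", level=7):
    """Multilevel DWT of the AA signal; returns the detail coefficient
    vector C_m for each discrete scale m = 1..level (cf. Eq. 3)."""
    coeffs = pywt.wavedec(x_aa, wavelet, level=level)
    # pywt orders the output as [A_level, D_level, D_{level-1}, ..., D_1]
    return {m: coeffs[level - m + 1] for m in range(1, level + 1)}
```

Because the basis is orthonormal, the full coefficient set reconstructs the signal exactly (`pywt.waverec` inverts `pywt.wavedec` without loss), which is the non-redundancy property stated above.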

### Central Tendency Measure

Chaotic equations are sometimes used to generate graphs. Thus, plotting *x*(*n* + 2)−*x*(*n* + 1) versus *x*(*n* + 1)−*x*(*n*) produces a scatter plot of the first differences of the data, *x*(*n*) being the value of a time series at time *n*. This plot, centered around the origin, gives a graphical representation of the degree of variability in the time series and is useful in modeling biological systems, such as hemodynamics and heart rate variability [14]. With this approach, rather than labeling a time series as chaotic or not chaotic, its degree of variability or chaos is evaluated.

The CTM is computed from the first differences scatter plot of the data by selecting a circular region of radius *ρ* around the origin, counting the number of points that fall within that region, and dividing by the total number of points. A low CTM value indicates a large amount of dispersion, whereas a high value indicates concentration near the centre. Given *N* data points from a time series, *N*−2 is the total number of points in the scatter plot. Then, the CTM can be computed as [14]

\mathrm{CTM}=\frac{\sum _{i=1}^{N-2}\delta \left(i\right)}{N-2},

(4)

where

\delta \left(i\right)=\left\{\begin{array}{ll}1,& \text{if}\quad \sqrt{{\left(x(i+2)-x(i+1)\right)}^{2}+{\left(x(i+1)-x\left(i\right)\right)}^{2}}<\rho ,\\ 0,& \text{otherwise}.\end{array}\right.

(5)
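Equations (4) and (5) translate directly into a few lines of NumPy; the following is a minimal sketch with a function name of our own choosing.

```python
import numpy as np

def ctm(x, rho):
    """Central Tendency Measure (Eqs. 4-5): fraction of first-difference
    scatter-plot points lying inside a circle of radius rho."""
    dx = np.diff(x)                       # dx[i] = x(i+1) - x(i)
    # Point i has coordinates (dx[i], dx[i+1]); there are N-2 points in total
    r = np.sqrt(dx[:-1] ** 2 + dx[1:] ** 2)
    return float(np.mean(r < rho))
```

For a perfectly regular series all points collapse onto a single location, so CTM jumps from 0 to 1 as *ρ* crosses that point's distance from the origin; irregular series spread their points and yield intermediate values.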

In the present work, *f* wave organization was estimated by computing the CTM from the first differences scatter plot of the wavelet coefficients vector for the scale containing the typical AF frequency range. For this purpose, a seven-level wavelet decomposition was applied to the AA signal. The seventh discrete scale provided a good match to the AA frequency band of interest [24], i.e. 4–8 Hz. In fact, the dominant atrial frequency (DAF), i.e., the highest amplitude frequency, was within this range for all the analyzed patients. Furthermore, previous works have shown that values of this frequency below 4 Hz or above 8 Hz are unusual [25].

A good match between the wavelet scale and the AF frequency range is relevant to preserve the essential behavior of the arrhythmia. In this respect, the DAF has proved to be a concomitant indicator of atrial refractoriness [26], thus revealing clinically interesting information about AF progression under different therapies [16]. Note that, for sampling rates different from 1 kHz, the seventh wavelet scale may not match the typical AF frequency range properly and, consequently, the performance of the proposed method could be notably reduced. Thus, resampling to 1 kHz has to be considered as a preliminary step before new input ECG recordings are processed with the proposed method.
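The prerequisite resampling step can be sketched with SciPy's polyphase resampler; this is one possible implementation, with the rational-ratio handling and the function name being our own assumptions.

```python
import numpy as np
from fractions import Fraction
from scipy.signal import resample_poly

def resample_to_1khz(x, fs_in, fs_out=1000):
    """Resample an ECG record to 1 kHz so that the seventh DWT scale
    spans roughly the 4-8 Hz AF band (fs / 2^8 to fs / 2^7)."""
    frac = Fraction(fs_out, int(fs_in)).limit_denominator(1000)
    return resample_poly(x, frac.numerator, frac.denominator)
```

At 1 kHz the seventh detail scale of a dyadic DWT covers approximately 1000/2⁸ ≈ 3.9 Hz to 1000/2⁷ ≈ 7.8 Hz, which is why this exact rate matters for the method.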

### Optimal parameters selection

The performance of the proposed method depends on the chosen wavelet function and the radius *ρ* selected for the computation of the CTM. However, there are no established rules for the choice of their optimal values [2]. In this respect, a cautious and still exploratory approach is to test different wavelet families and then compare their efficiency in the specific problem [27]. Unfortunately, in each electrocardiographic application where the WT has been used, a different wavelet family was chosen [4]. In this study, several orthogonal wavelet families were tested, because only in an orthogonal basis can any signal be uniquely decomposed and the decomposition inverted without losing information [23]. Thus, all the different functions from the Haar, Daubechies, Coiflet, Biorthogonal, Reverse Biorthogonal and Symlet wavelet families were tested.

On the other hand, an approach similar to that developed by Hornero *et al*. [28] was used to select the optimal *ρ* value. Thus, the CTM was computed for radius values of 0.1, 0.2, …, 10 times the standard deviation of the analyzed data, i.e., the wavelet coefficients vector associated with the seventh discrete scale. Normalizing *ρ* in this way provides translation and scale invariance, in the sense that the CTM remains unchanged under uniform process magnification, reduction or a constant shift to higher or lower values. In this respect, note that the interesting information in the wavelet coefficients vector is carried by its temporal variations, rather than by its absolute amplitude, since the amplitude depends only on the similarity between the AA under study and the selected mother wavelet. Hence, amplitude gives no useful information about *f* wave variability.
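The radius sweep just described can be sketched as below (a minimal illustration; the helper names are our own, and the CTM routine follows Eqs. 4–5).

```python
import numpy as np

def ctm(x, rho):
    """CTM: fraction of first-difference scatter points within radius rho."""
    dx = np.diff(x)
    r = np.sqrt(dx[:-1] ** 2 + dx[1:] ** 2)
    return float(np.mean(r < rho))

def ctm_radius_sweep(c7, factors=np.arange(0.1, 10.05, 0.1)):
    """CTM for rho = k * std(c7), k = 0.1, 0.2, ..., 10.
    Scaling rho by the coefficients' standard deviation makes the sweep
    invariant to uniform magnification or constant shifts of the data."""
    sd = np.std(c7)
    return {round(float(k), 1): ctm(c7, k * sd) for k in factors}
```

Because enlarging the circle can only admit more points, the CTM is non-decreasing in *k*, which makes the sweep a well-behaved one-dimensional search for the most discriminative radius.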

### Statistical analysis

As the number of episodes considered in both analyzed databases was not notably large, the proposed method was validated by a resubstitution approach, i.e., it was learned from all the available data and then tested on the same set of data. The CTM value providing maximum discrimination between groups for a specific *ρ* value and wavelet function, that is, the optimum threshold, was obtained by means of a receiver operating characteristic (ROC) curve. The ROC curve is a graphical representation of the trade-off between sensitivity and specificity as the discrimination threshold is varied. For PAF patients, sensitivity (i.e. the true positive rate) was the proportion of non-terminating PAF episodes correctly discerned, whereas specificity (i.e. the true negative rate) was the percentage of terminating episodes properly identified. Similarly, for the ECV outcome analysis, sensitivity was the proportion of patients relapsing to AF appropriately classified and specificity was the percentage of patients remaining in NSR after ECV accurately predicted. The total number of PAF patients and ECVs correctly classified was taken as the diagnostic accuracy of each prediction, and the CTM value providing the highest accuracy was selected as the optimum threshold. The area under the ROC curve (AROC) was also computed; the AROC is a single number summarizing performance, with an area of 1 for a perfect test and 0.5 for a worthless one.
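The threshold search and AROC computation can be sketched as follows. This is a minimal NumPy illustration (not the authors' code); the positive class is taken, by assumption, to be the non-terminating/relapsing group, and the AROC is obtained through the standard Mann-Whitney rank identity.

```python
import numpy as np

def roc_optimum(scores, labels):
    """Sweep candidate thresholds over the CTM scores (labels: 1 = positive
    class). Returns the accuracy-maximizing threshold, its accuracy, and AROC."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_thr, best_acc = None, -1.0
    for thr in np.unique(scores):
        pred = scores >= thr                  # classify as positive at/above threshold
        acc = np.mean(pred == labels)         # overall diagnostic accuracy
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    # AROC via the rank-sum (Mann-Whitney U) identity
    ranks = scores.argsort().argsort() + 1
    n1, n0 = labels.sum(), (1 - labels).sum()
    auc = (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    return best_thr, float(best_acc), float(auc)
```

Sensitivity and specificity at the chosen threshold follow directly as `np.mean(pred[labels == 1])` and `np.mean(~pred[labels == 0])`, respectively.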

To verify the representativeness of the used databases and the generality of the results obtained by the resubstitution approach, a leave-one-out cross-validation (LOOCV) scheme was also used. In this procedure, the method was trained and tested a number of times equal to the number of patients in the database. In each iteration, all the data except a single observation were used for training, and the model was tested on that single observation. In addition, significant differences between terminating and non-terminating PAF episodes, and between patients who remained in NSR and those who relapsed to AF, were evaluated using a Student's *t*-test. All the groups had normal and homoscedastic distributions, as the Shapiro–Wilk and Levene tests proved, respectively. A two-tailed value of *p* < 0.05 was considered statistically significant.
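The LOOCV loop reduces, for a single-threshold classifier like this one, to re-selecting the optimum threshold on all patients but one and classifying the held-out patient; a minimal sketch (our own simplification, not the authors' code) follows.

```python
import numpy as np

def loocv_accuracy(scores, labels):
    """Leave-one-out cross-validation of a threshold classifier: in each
    fold, pick the accuracy-maximizing threshold on the remaining patients,
    then classify the held-out patient with it."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    correct = 0
    for i in range(len(scores)):
        mask = np.arange(len(scores)) != i
        tr_s, tr_l = scores[mask], labels[mask]
        thrs = np.unique(tr_s)
        accs = [np.mean((tr_s >= t) == tr_l) for t in thrs]
        thr = thrs[int(np.argmax(accs))]      # training-fold optimum threshold
        correct += int((scores[i] >= thr) == labels[i])
    return correct / len(scores)
```

Note that LOOCV accuracy can be lower than resubstitution accuracy even on separable data, since removing a borderline patient shifts the trained threshold away from it.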