 Research
 Open Access
A knowledge discovery methodology from EEG data for cyclic alternating pattern detection
 Fátima Machado^{1},
 Francisco Sales^{2},
 Clara Santos^{3},
 António Dourado^{1} and
 C. A. Teixeira^{1}
https://doi.org/10.1186/s129380180616z
© The Author(s) 2018
 Received: 8 October 2018
 Accepted: 11 December 2018
 Published: 18 December 2018
Abstract
Background
Detection and quantification of cyclic alternating pattern (CAP) components have the potential to serve as a disease biomarker. Few methods exist to discriminate all the different CAP components; those that do lack adequate sensitivities and are often evaluated by accuracy (AC), which is not an appropriate measure for imbalanced datasets.
Methods
We describe a knowledge discovery in data (KDD) methodology aimed at the development of automatic CAP scoring approaches. Automatic CAP scoring was approached from two perspectives: the binary distinction between A-phases and B-phases, and the multiclass classification of the different CAP components. The most important KDD stages are: extraction of 55 features, feature ranking/transformation, and classification. Classification is performed by (i) support vector machine (SVM), (ii) k-nearest neighbors (kNN), and (iii) discriminant analysis. We report the weighted accuracy (WAC), which accounts for class imbalance.
Results
The study includes 30 subjects from the CAP Sleep Database of Physionet. The best alternative for discriminating the different A-phase subtypes involved feature ranking by the minimum redundancy maximum relevance algorithm (mRMR) and classification by SVM, with a WAC of 51%. For the binary discrimination between A-phases and B-phases, kNN with mRMR ranking achieved the best WAC of 80%.
Conclusions
We describe a KDD methodology that, to the best of our knowledge, is applied here to CAP scoring for the first time. In particular, the full discrimination of the three A-phase subtypes is a new perspective, since past works attempted multiclass approaches only by grouping different subtypes. We also considered the weighted accuracy, in addition to simple accuracy, resulting in a more trustworthy performance assessment. Overall, better subtype sensitivities than other published approaches were achieved.
Keywords
 Cyclic alternating pattern
 A-phase detection
 EEG processing
 Knowledge discovery in data
Background
A cyclic alternating pattern (CAP) sequence is composed of a succession of CAP cycles, each consisting of two types of phases: A-phases and B-phases. In lighter stages of coma, A-phases are closely related to hyperventilation, restlessness and increased pulse rate, and can be associated with increased muscle activity. In contrast, autonomic and muscular activities are attenuated during B-phases. Investigations also found that CAP is a physiological component of non-rapid eye movement (NREM) sleep stages and does not occur, under normal conditions, in rapid eye movement (REM) sleep [1, 2]. CAP sequences tend to appear in association with dynamic sleep events such as a change of sleep stage, falling asleep, or arousal without awakening [3]. However, some pathological conditions generate CAP sequences in REM sleep; CAP therefore has potential as a prognostic element for such diseases. Higher CAP rates are present in some insomniac patients [4] and in some epilepsies [5], among other disorders. Thus, CAP quantification could be used as a disease biomarker, for example for seizure prediction [6].

Subtype A1: It is composed of high-voltage slow waves. This subtype is characterized by an increase in amplitude of at least a third relative to the normal background activity. The synchronized EEG pattern must occupy more than 80% of the epoch for it to be classified as A1. The waveforms associated with this subtype are bursts and K-complex sequences.

Subtype A2: This subtype has elements from subtypes A1 and A3 and is therefore composed of a mixture of fast and slow rhythms. The elements from subtype A1 must occupy between 50 and 80% of the length of an entire A-phase. In this subtype, the typical waveforms are polyphasic bursts.

Subtype A3: Rapid low-voltage rhythms prevail in this subtype, and there is an increase in frequency compared to the background. K-alpha, EEG arousals, and polyphasic bursts are the EEG waveforms associated with this subtype.
It should also be mentioned that the A-phase subtypes present different signatures in the conventional EEG frequency bands: delta (δ = [1, 4] Hz), theta (θ = [4, 8] Hz), alpha (α = [8, 13] Hz), sigma (σ = [13, 16] Hz) and beta (β = [16, 35] Hz) [8].
Different algorithms have been proposed in recent years for the automatic scoring of A-phases and B-phases. Barcaro et al. proposed a detection method for A-phases [9–11]; more precisely, in [10, 11] these authors considered only three classes: B, A1 and A2/A3 phases. Subtypes A2 and A3 were merged because of their similarity, as reported in the literature. The rationale of the methodology was the same in both papers [10, 11]: an amplitude feature was computed for each of the conventional frequency bands and then compared with a threshold. When one of the five features crossed the threshold, a tentative A-phase detection was made. The type assigned to each detected A-phase (A1 or A2/A3) depended on which features crossed the threshold. The best reported results were obtained with the methodology presented in [11], which achieved a correctness of 83.5% for the binary discrimination between A-phases and B-phases, and 73.7% for the multiclass distinction between the A1 and A2/A3 subtypes.
A model based on feedback loops that simulates EEG activity was proposed in [12]. This methodology was used to detect K-complexes and vertex waves, and it detected CAP sequences by considering four rhythm generators, which corresponded to different frequency bands (δ, α, θ and σ). The results obtained with this algorithm pointed to a mean sensitivity of 90% [12].
A method based on wavelets and a genetic algorithm was proposed in [13] to identify A-phases independently of the subtype. The signal was decomposed into five signals corresponding to the different frequency bands using the discrete wavelet transform. An amplitude-based feature was computed for each decomposed signal, and the A-phases were detected by comparing the features to a threshold. The reported accuracy was 79%.
Stam and van Dijk [14] published a study that defined the synchronization likelihood (SL) between two EEG channels and proposed a method to calculate this value. This measurement was applied to A1 detection [15], where the signal was filtered between 0.25 and 2.5 Hz. They concluded that, during sleep, the levels of SL in this frequency range fluctuated significantly with CAP occurrence. This feature performed well in distinguishing the A1 subtype from the background [16], although only for NREM stage 2. For the other sleep stages, this feature cannot distinguish A1 or the other subtypes [15, 16].
In 2012, a method using different classifiers was proposed for the binary discrimination between A-phases and B-phases, using seven EEG features. The classifiers considered were support vector machine (SVM), linear discriminant analysis (LDA), Adaboost, and artificial neural networks (ANN) [17]. Five of the seven features were related to the signal amplitude filtered in the conventional frequency bands. The other two features were the Hjorth activity and the EEG variance. The reported accuracies for LDA, SVM, Adaboost, and ANN were 84.9 ± 4.9%, 81.9 ± 7.8%, 79.4 ± 5.5% and 81.5 ± 6.4%, respectively. The sensitivities for LDA, SVM, Adaboost, and ANN were 72.5 ± 10.9%, 70.1 ± 8.6%, 68.5 ± 6.7% and 72.9 ± 7.5%, respectively. Finally, the specificities for LDA, SVM, Adaboost, and ANN were 86.6 ± 6.3%, 84.0 ± 11.1%, 79.3 ± 9.4% and 82.3 ± 7.1%, respectively.
However, the approaches previously reported did not perform well enough to score the microstructure correctly without a posteriori revision by a technician. Moreover, good results were only obtained for the distinction between A-phases and B-phases, and not for the multiclass discrimination between all the different subtypes. Another drawback was the use of simple accuracy, which leads to an overestimation of performance on imbalanced datasets, as is the case in A-phase scoring.
Here, we describe a knowledge discovery in data (KDD) methodology that is applied to CAP scoring for the first time. The KDD encompasses the extraction of multiple EEG features, different preprocessing options, and different pattern recognition techniques. The KDD aims to identify suitable processing alternatives for the binary distinction between A-phases and B-phases, and also for the multiclass classification of the different CAP components, i.e. the three A-phase subtypes as well as the B-phases. The full discrimination of the three A-phase subtypes is a new perspective, since past works attempted multiclass approaches only by grouping different subtypes, for example A2 and A3. We also considered the weighted accuracy, in addition to simple accuracy, resulting in a more trustworthy performance assessment. The next section describes the methods used, the database and the data characteristics. The “Results” section describes the classification performance achieved for the different options considered. Insights about the results and future directions are given in the “Discussion” and “Conclusion” sections.
Methods
Database
The study was carried out on 30 subjects with nocturnal frontal lobe epilepsy (NFLE), 14 females and 16 males, aged between 14 and 67 years (mean = 31.03 ± 11.64). The dataset is available from the CAP Sleep Database [18], which comprises several one-night polysomnographic recordings from patients with different pathologies and has been used in several past studies [10, 11, 13, 17, 19]. The recordings were acquired at the Sleep Disorders Center of the Ospedale Maggiore of Parma, Italy. The polysomnographic data include at least three EEG channels (F3 or F4, C3 or C4 and O1 or O2, referenced to electrodes placed on the earlobes, labeled as A1 or A2), two EOG channels, three electromyography (EMG) signals, respiration signals and the ECG. The sampling rates of the recordings vary from 128 to 512 Hz, depending on the patient.
The macrostructure and microstructure scorings were annotated by neurophysiologists and supplied together with the raw EEG data. The macrostructure was annotated according to the R&K rules [20], while CAP was scored in agreement with the Terzano reference atlas [7].
Knowledge discovery methodology in data
Filtering
The original signal was filtered to obtain its components in six frequency bands, i.e. the 0.3–35 Hz band [yielding a signal referred to in this work as the broadband signal (BB)] and the five conventional frequency sub-bands. The 0.3–35 Hz filter was chosen to agree with usual clinical practice [21]. The five conventional frequency bands considered were: delta (δ: 0.3–4 Hz), theta (θ: 4–8 Hz), alpha (α: 8–13 Hz), sigma (σ: 13–16 Hz) and beta (β: 16–35 Hz); they are referred to as the conventional frequency bands [8] throughout this article.
The EEG signal was filtered using a third-order bandpass Butterworth filter [22], chosen for its flat, ripple-free frequency response in both the passband and the stopband.
Feature extraction
Macro–micro structure descriptor
This means that \({\text{MMSD}}_{\varphi }\) results from the combination of two primary features, \(C_{\varphi ,\tau }\) and \(C_{{\varphi ,\tau_{0} }}\), that were also considered in this work. In general terms, \(C_{\varphi ,\tau }\) represents the mean amplitude at a given time instant, computed by taking into account the past samples over an interval of size \(\tau\). If \(\tau\) is long enough, it represents the background of the signal, while if it is short it is related to the instantaneous signal activity. It was reported that if \(\tau = 60\;s\) and \(\tau_{0} = 2\;s\), \(C_{\varphi ,\tau }\) and \(C_{{\varphi ,\tau_{0} }}\) are generally related to the sleep macrostructure and microstructure, respectively [9]. Therefore, these were the values used to compute \({\text{MMSD}}_{\varphi }\), \(C_{{\varphi ,\tau_{0} }}\) and \(C_{\varphi ,\tau }\). Given the dependence of \({\text{MMSD}}_{\varphi }\) on the sleep microstructure, it has been used to classify A-phases [9–11].
On the one hand, given the selected \(\tau\) value, \({\text{MMSD}}_{\varphi }\) and \(C_{\varphi ,\tau }\) were computed using a window of 60 s with an overlap of 59 s between consecutive windows (98% overlap). On the other hand, given \(\tau_{0}\), \(C_{{\varphi ,\tau_{0} }}\) was computed using a window of 2 s with an overlap of 1 s between consecutive windows (50% overlap).
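As a rough illustration of the windowing just described, the sketch below computes the two mean-amplitude features once per second and combines them. Two points are assumptions, not taken from the text above: amplitude is measured as the mean absolute value, and the descriptor is formed as the relative difference \((C_{\tau_0} - C_{\tau})/C_{\tau}\). The function names are hypothetical.

```python
def windowed_mean_amplitude(x, fs, window_s, step_s=1.0):
    """Mean absolute amplitude over the past `window_s` seconds, one value per step.

    The first value is produced once a full window is available.
    """
    win = int(window_s * fs)
    step = int(step_s * fs)
    return [sum(abs(v) for v in x[end - win:end]) / win
            for end in range(win, len(x) + 1, step)]


def mmsd(x, fs, tau=60.0, tau0=2.0):
    """Relative difference between short-term and long-term amplitude (assumed form)."""
    c_long = windowed_mean_amplitude(x, fs, tau)    # background (macrostructure)
    c_short = windowed_mean_amplitude(x, fs, tau0)  # transient (microstructure)
    if not c_long:
        return []
    # Both series advance in 1 s steps and end at the same instant, so the
    # last len(c_long) short-window values line up with the long-window ones.
    c_short = c_short[-len(c_long):]
    return [(s - l) / l if l > 0 else 0.0 for s, l in zip(c_short, c_long)]
```

With \(\tau = 60\) s and a 1 s step, consecutive windows overlap by 59 s, as in the text; the short window with \(\tau_0 = 2\) s overlaps by 1 s.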
Teager energy operator
Zero-crossing rate
Zero-crossing rate (ZCR) is a measure of the dominant frequency of a signal, obtained in the time domain by counting the number of baseline crossings in a fixed time interval [30]. The ZCR is a fast, intuitive and low-complexity way to obtain information about the signal frequency over a short period of time [31, 32]. This feature has been widely used in different applications [33], in particular in sleep staging [30, 34]. To compute this feature, a non-overlapping moving window of 1 s duration was used.
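A minimal sketch of this count, assuming the baseline is zero (reasonable for a band-pass filtered signal) and that a crossing is any sign change between consecutive samples. The function name is hypothetical.

```python
def zero_crossings_per_window(x, fs, window_s=1.0):
    """Count sign changes in consecutive non-overlapping windows."""
    win = int(window_s * fs)
    counts = []
    for start in range(0, len(x) - win + 1, win):
        seg = x[start:start + win]
        # A crossing occurs when two neighbouring samples lie on
        # opposite sides of the zero baseline.
        counts.append(sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0)))
    return counts
```

Because the window lasts 1 s, the count per window can be read directly as crossings per second.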
Lempel–Ziv complexity
Lempel–Ziv complexity (LZC) is a metric used to evaluate the randomness of finite sequences, proposed by Lempel and Ziv in 1976 [35], and it has been used to characterize sleep [36, 37]. To compute the LZC, a numerical sequence has to be transformed into a symbolic sequence. A frequent approach is to convert the signal, x[n], into a binary sequence, P = s[n], by comparing the signal with a threshold, T_{d}. The points whose value is greater than T_{d} are converted to 1, and the others to 0. Afterwards, a dictionary is built from the sequences present in s[n]. The size of the dictionary is proportional to the LZC. The methodology used to compute this feature is described in [38]. LZC, like ZCR, was computed for the six signals (the broadband signal and the five frequency bands) using a non-overlapping moving window of 1 s.
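The binarization and dictionary-building steps can be sketched as follows. This is a plain implementation of the 1976 exhaustive parsing, not necessarily the procedure of [38]; the threshold is passed in by the caller because its value is not stated here, and the function names are hypothetical.

```python
def binarize(x, threshold):
    """Map samples above the threshold to '1' and the rest to '0'."""
    return "".join("1" if v > threshold else "0" for v in x)


def lempel_ziv_complexity(bits):
    """Number of phrases in the Lempel-Ziv (1976) parsing of a binary string."""
    n = len(bits)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Grow the candidate phrase while it can still be copied from
        # the part of the sequence that precedes its last symbol.
        while i + length <= n and bits[i:i + length] in bits[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases
```

For the sequence `0001101001000101`, the parsing is `0 | 001 | 10 | 100 | 1000 | 101`, giving a complexity of 6.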
Discrete-time short-time Fourier transform
The window w[n] is assumed to be nonzero only in an interval of length N_{w} and is referred to as the analysis window. The sequence x[m]w[n − m] is called a short section of x[m] at time n [40]. In this work, a spectrum was obtained for each window, and from it three features were extracted: the frequency of maximum energy (\(Max\_freq\)), the frequency of mean energy (\(Mean\_freq\)) and the area under the magnitude spectrum curve (\(Spec\_area\)). The spectrum was computed using a window length of 3 s, with an overlap of 2 s between windows.
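The three spectral features can be sketched for a single window as below. Several choices are assumptions rather than details from the text: a rectangular analysis window, a direct (slow) DFT, the energy-weighted mean frequency as the reading of "frequency of mean energy", and trapezoidal integration for the area. The function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(segment, fs):
    """One-sided DFT magnitude of a windowed segment (rectangular window)."""
    n = len(segment)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * m / n)
                    for m, v in enumerate(segment))
        freqs.append(k * fs / n)
        mags.append(abs(coeff))
    return freqs, mags


def spectral_features(segment, fs):
    """Return (Max_freq, Mean_freq, Spec_area) for one analysis window."""
    freqs, mags = magnitude_spectrum(segment, fs)
    energy = [m * m for m in mags]
    max_freq = freqs[energy.index(max(energy))]
    total = sum(energy)
    # Assumed interpretation: energy-weighted mean frequency.
    mean_freq = sum(f * e for f, e in zip(freqs, energy)) / total if total > 0 else 0.0
    df = fs / len(segment)
    # Trapezoidal area under the magnitude spectrum.
    spec_area = sum((a + b) / 2 * df for a, b in zip(mags, mags[1:]))
    return max_freq, mean_freq, spec_area
```

In the setting described above, `segment` would hold 3 s of signal and the window would advance by 1 s, giving the stated 2 s overlap.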
Empirical mode decomposition
In empirical mode decomposition (EMD), a given signal is decomposed into intrinsic mode functions (IMFs), each of which represents an embedded characteristic oscillation on a separate timescale. The application of EMD requires a continuous, zero-mean signal in which the number of maxima equals the number of minima. EMD has been used in EEG applications, for example for the classification of mental tasks [23], and in automatic sleep staging using ECG [41].
In this work, EMD was computed for 12 decomposition levels using the procedure available in [42]. To overcome problems with EEG segmentation, EMD was applied to the entire signal, which was segmented afterwards using consecutive windows of 1 s without overlap. The features derived from EMD were the average values of the different IMFs in each window.
Shannon entropy
Shannon entropy (ShEnt) evaluates the randomness (complexity) of a signal by computing its amplitude distribution and the probability of a given value occurring. The higher the probability of the values, the less information the signal carries, resulting in a smaller entropy. ShEnt has been widely used in EEG processing, for example to distinguish between normal and epileptic EEG [43]. This feature has been shown to be related to the sleep macrostructure and microstructure [44], and it has been applied in automatic sleep staging [45–48]. A non-overlapping moving window of 1 s duration was applied to the broadband signal, and the ShEnt was then computed for each window.
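A small sketch of this computation, assuming the amplitude distribution is estimated with an equal-width histogram. The number of bins (`n_bins`) is a hypothetical parameter that the text does not specify.

```python
import math


def shannon_entropy(segment, n_bins=16):
    """Shannon entropy (in bits) of the amplitude histogram of a segment."""
    lo, hi = min(segment), max(segment)
    if hi == lo:
        return 0.0  # a constant segment carries no information
    counts = [0] * n_bins
    for v in segment:
        # Map the sample to a bin; the maximum falls into the last bin.
        idx = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts if c)
```

A segment concentrated in one bin gives an entropy near 0, while a segment spread evenly over all bins gives the maximum, `log2(n_bins)`.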
Fractal dimension
Fractal dimension (FD) quantifies the number of times the same sequence appears in a signal. A signal can be composed of basic building blocks that form a pattern, and the FD quantifies the number of these building blocks. The algorithm proposed by Higuchi [49] is generally used to estimate the FD of EEG signals.
Fractal dimension is related to the sleep macrostructure and microstructure [44] and has been used in the automatic staging of sleep patterns [45, 47]. The broadband signal was divided into epochs of 1 s, and the FD was then computed for each epoch.
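For reference, a compact sketch of Higuchi's estimator is given below. It follows the standard published definition (mean curve length at scale k, then the slope of log length against log 1/k); the maximum scale `k_max` is a hypothetical parameter, since the value used in this work is not stated.

```python
import math


def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D sequence with Higuchi's method."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offsets 0..k-1
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalised curve length for offset m at scale k.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        if lengths:
            mean_len = sum(lengths) / len(lengths)
            if mean_len > 0:
                log_inv_k.append(math.log(1.0 / k))
                log_len.append(math.log(mean_len))
    if len(log_inv_k) < 2:
        return float("nan")  # not enough points to fit a slope
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den  # least-squares slope = FD estimate
```

A straight line yields an estimate of 1, and increasingly irregular signals approach 2.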
Variance
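Variance was computed by segmenting the EEG signal with a non-overlapping moving window of 1 s duration. As listed in the feature summary, it was computed for the broadband signal and for the five conventional frequency bands.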
Summary
Table 1 Computed EEG features
| Measure | Acronym | BB | \(\delta\) | \(\theta\) | \(\alpha\) | \(\sigma\) | \(\beta\) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Macro–micro structure descriptor | \({\text{MMSD}}_{\varphi }\) |  | ✓ | ✓ | ✓ | ✓ | ✓ |
|  | \(C_{\varphi ,\tau }\) |  | ✓ | ✓ | ✓ | ✓ | ✓ |
|  | \(C_{{\varphi ,\tau_{0} }}\) |  | ✓ | ✓ | ✓ | ✓ | ✓ |
| Teager energy operator | \({\text{TEO}}_{\varphi }\) |  | ✓ | ✓ | ✓ | ✓ | ✓ |
| Zero-crossing rate | \({\text{ZCR}}_{\varphi }\) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Lempel–Ziv complexity | \({\text{LZC}}_{\varphi }\) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Discrete-time short-time Fourier transform | \(Max\_freq\) | ✓ |  |  |  |  |  |
|  | \(Mean\_freq\) | ✓ |  |  |  |  |  |
|  | \(Spec\_area\) | ✓ |  |  |  |  |  |
| Empirical mode decomposition | \({\text{EMD}}_{l}\) | ✓ |  |  |  |  |  |
| Shannon entropy | ShEnt | ✓ |  |  |  |  |  |
| Fractal dimension | FD | ✓ |  |  |  |  |  |
| Variance | \(s_{\varphi }^{2}\) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Feature post-extraction processing
Firstly, all features except MMSD, TEO and EMD were smoothed using a causal moving average FIR filter of order 30 [50]. These three features were excluded because they detect changes in amplitude and frequency, and important information could have been lost had the smoothing been applied.
Secondly, samples more than four standard deviations away from the mean were considered outliers, and their values were replaced by the median of the feature. With the frequently used value of three standard deviations, the A-phases would likely have been considered outliers; the maximum distance allowed was therefore extended. Finally, the values of each feature were normalized to the [0, 1] range.
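The three post-extraction steps can be sketched as below. Two details are assumptions: at the start of the series the moving average uses only the samples available so far (a strict FIR filter would assume zeros), and the population standard deviation is used for the outlier rule. The function names are hypothetical.

```python
import statistics


def moving_average(x, order=30):
    """Causal moving average over the current sample and the previous `order` samples."""
    out = []
    for i in range(len(x)):
        start = max(0, i - order)
        out.append(sum(x[start:i + 1]) / (i + 1 - start))
    return out


def replace_outliers(x, n_std=4.0):
    """Replace samples more than `n_std` standard deviations from the mean by the median."""
    mean = statistics.mean(x)
    std = statistics.pstdev(x)
    median = statistics.median(x)
    return [median if abs(v - mean) > n_std * std else v for v in x]


def min_max_normalize(x):
    """Rescale a feature to the [0, 1] range."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]
```

Applied in this order, a smoothed feature is cleaned of extreme values before rescaling, so that a single outlier does not compress the rest of the range.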
Feature ranking and transformation

Minimum redundancy maximum relevance (mRMR): This algorithm ranks the features according to two objectives: obtaining the highest correlation between the selected features and the class labels (maximum relevance) and reducing the redundancy between features (minimum redundancy). This technique is described in detail in reference [51].

Principal component analysis (PCA): PCA finds the directions in a d-dimensional feature space along which the data present the highest variance, guaranteeing that these directions are mutually orthogonal. The final step, the reduction phase, encompasses the selection of some of these directions and the projection of the data onto them. The directions are found by computing the eigenvectors of the data covariance matrix [52].
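To make the two mRMR objectives concrete, the sketch below ranks features greedily by relevance minus mean redundancy. It is only an illustration: the absolute Pearson correlation stands in for the dependence measure (the algorithm in [51] is formulated with mutual information), the difference criterion is one of several variants, and the function names are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def mrmr_rank(features, labels):
    """Greedy ranking; `features` maps a feature name to its list of values."""
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    ranked, remaining = [], set(features)

    def score(name):
        if not ranked:
            return relevance[name]
        # Mean absolute correlation with the features already selected.
        redundancy = sum(abs(pearson(features[name], features[r]))
                         for r in ranked) / len(ranked)
        return relevance[name] - redundancy

    while remaining:
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

In this toy criterion, a feature that is nearly a copy of an already selected one is pushed down the ranking even if it is itself highly relevant.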
Classification
Different linear and nonlinear classification methods were used in this work to determine which methods and conditions yield the best performance. The previous steps produced 55 features, each with a value and a class label for every second; the models were therefore trained to classify the EEG signal second by second. The class labels were derived from the original annotations provided with the raw EEG data, as described in the “Database” subsection. The classification methods considered are described next.
Discriminant analysis
Discriminant analysis (DA) [53] has been used in different areas, such as statistics, pattern recognition and machine learning. Consider a feature vector x of size \(\left( {d \times 1} \right)\) and c classes \(\omega_{k}\) (\(k = 1,2, \ldots ,c\)). To find the best discriminant function, g, the first step is to define the function type. It can be linear, \(g\left( x \right) = w^{T} x + w_{0}\), in which case the classifier performs a linear discriminant analysis (LDA), or quadratic, \(g\left( x \right) = w_{0} + \sum_{i = 1}^{d} w_{i} x_{i} + \sum_{i = 1}^{d} \sum_{j = 1}^{d} w_{ij} x_{i} x_{j}\), in which case a quadratic discriminant analysis (QDA) is implemented; here w is the weight vector and \(w_{0}\) is the bias.
In a multiclass problem with c classes, where \(c > 2\), the discrimination procedure is to compute the c discriminant functions. The linear discriminant functions are given by \(g_{k} \left( x \right) = w_{k}^{T} x + w_{k,0}\). Each point is assigned to class \(\omega_{k}\) if \(g_{k} \left( x \right)\) takes the highest value among all the discriminant functions.
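The decision rule for the linear case can be sketched as follows. The weights and biases are assumed to have been estimated beforehand (their estimation is not shown), and the function name is hypothetical.

```python
def linear_discriminant_predict(x, weights, biases):
    """Return the index k of the class whose g_k(x) = w_k^T x + w_k0 is largest."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return scores.index(max(scores))
```

For example, with three classes and two features, `weights` is a list of three 2-element vectors and `biases` a list of three scalars; the function returns 0, 1 or 2.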
k-Nearest neighbors (kNN)
k-Nearest neighbors is a nonparametric method, which means that no assumption is made about the underlying pattern distributions. Unlike DA, kNN does not search for the best function to divide the space into regions. Instead, the training data are stored in a matrix containing the features and the labels assigned to each pattern. To label a new point, x, it is compared with the k closest training points. The class assigned to x is the most prevalent class among these k points [54].
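A minimal sketch of this voting rule, assuming the Euclidean distance (the distance metric is not specified in the text). The function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=25):
    """Label `x` with the most frequent class among its k nearest training points."""
    neighbours = sorted(
        ((math.dist(p, x), label) for p, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

No model is fitted; all the work happens at prediction time, which is why the full training set has to be kept in memory.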
Support vector machines
In its native formulation support vector machine (SVM) finds a decision hyperplane that maximizes the margin that separates the two different classes [55].
The SVM is defined as a two-class classifier, but A-phase classification is a multiclass problem. The usual approach to multiclass SVM classification is to combine several binary SVM classifiers. Different methods exist, but in this work we used the one-against-all multiclass approach [56]. This method transforms the multiclass problem, with c classes, into a series of c binary subproblems that can be solved by the binary SVM. The i-th classifier output function \(\rho_{i}\) is trained taking the examples from \(c_{i}\) as 1 and the examples from all other classes as − 1. For a new example x, this method assigns x to the class associated with the largest value of \(\rho_{i}\) [56].
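The one-against-all scheme itself is independent of the binary classifier, so it can be sketched without an SVM implementation. Below, each \(\rho_{i}\) is represented by an arbitrary callable that returns a decision value; the training of those callables is not shown, and the function names are hypothetical.

```python
def binary_labels(labels, positive):
    """Relabel a multiclass problem for the classifier of class `positive`."""
    return [1 if y == positive else -1 for y in labels]


def one_vs_all_predict(x, decision_functions):
    """Pick the class whose binary classifier gives the largest output.

    `decision_functions` maps each class label to a callable rho_i(x).
    """
    return max(decision_functions, key=lambda label: decision_functions[label](x))
```

For the four-class problem considered here, four such relabelled training sets would be built, one per class (B-phase, A1, A2, A3).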
Post-processing
The only post-processing step implemented was the validation of the A-phase duration, which must lie between 2 s and 60 s. To this end, the number of consecutive 1 s epochs classified as A-phases was counted. If a single isolated epoch was classified as an A-phase, or if more than 60 consecutive epochs were classified as A-phases, those epochs were considered background signal, i.e. B-phases.
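This duration rule can be sketched on a sequence of per-second labels. One assumption is made for the multiclass case: any run of consecutive non-B labels counts as one A-phase, regardless of subtype. The function name and the label `"B"` are hypothetical.

```python
def enforce_a_phase_duration(labels, min_len=2, max_len=60, background="B"):
    """Relabel A-phase runs shorter than `min_len` or longer than `max_len` epochs."""
    out = list(labels)
    i = 0
    while i < len(out):
        if out[i] == background:
            i += 1
            continue
        # Find the end of the current run of A-phase epochs.
        j = i
        while j < len(out) and out[j] != background:
            j += 1
        run = j - i
        if run < min_len or run > max_len:
            out[i:j] = [background] * run
        i = j
    return out
```

For example, `["B", "A", "B", "A", "A", "B"]` becomes `["B", "B", "B", "A", "A", "B"]`: the isolated epoch is discarded and the 2 s run is kept.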
Performance evaluation
Table 2 Generic confusion matrix, where “^” indicates predictions provided by the algorithms
| Predicted \ True | \(c_{1}\) | \(c_{2}\) | \(\cdots\) | \(c_{K}\) |
| --- | --- | --- | --- | --- |
| \(\widehat{c}_{1}\) | \(n_{1,1}\) | \(n_{1,2}\) | \(\cdots\) | \(n_{1,K}\) |
| \(\widehat{c}_{2}\) | \(n_{2,1}\) | \(n_{2,2}\) | \(\cdots\) | \(n_{2,K}\) |
| \(\vdots\) | \(\cdots\) | \(\cdots\) | \(\cdots\) | \(\cdots\) |
| \(\widehat{c}_{K}\) | \(n_{K,1}\) | \(n_{K,2}\) | \(\cdots\) | \(n_{K,K}\) |
For the binary problem c = {A-phase, B-phase}, while for the multiclass problem c = {B-phase, A1, A2, A3}. In Table 2, the diagonal terms \(n_{ij}\), where \(i = j\), correspond to the instances where the algorithm’s output was consistent with the real class label, i.e. the true positive patterns. The values \(n_{ij}\), where \(i \ne j\), are the numbers of instances misclassified by the algorithm, which can be considered false positives or false negatives depending on the class under analysis.

True positive (\(TP_{k}\)): number of instances correctly classified as class k;

False positive (\(FP_{k}\)): number of instances classified as k when in fact they belong to another class;

True negative (\(TN_{k}\)): number of instances correctly not classified as k;

False negative (\(FN_{k}\)): number of instances assigned to other classes when in fact they belong to class k;
AC is also presented, although it is a poorer performance measure for imbalanced data, to allow a better comparison with the results reported in the literature.
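The per-class counts above can be read off a confusion matrix laid out with predictions in rows and true classes in columns, as sketched below. The last function, the mean of the per-class sensitivities, is only one common imbalance-aware summary and is included as an assumption; the exact WAC formula used in this work is not reproduced in the text above and may differ. All function names are hypothetical.

```python
def per_class_counts(cm, k):
    """Return (TP, FP, TN, FN) for class k; cm[i][j] = predicted i, true j."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                  # rest of the predicted-k row
    fn = sum(row[k] for row in cm) - tp   # rest of the true-k column
    tn = total - tp - fp - fn
    return tp, fp, tn, fn


def sensitivity(cm, k):
    tp, _, _, fn = per_class_counts(cm, k)
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(cm, k):
    _, fp, tn, _ = per_class_counts(cm, k)
    return tn / (tn + fp) if tn + fp else 0.0


def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def mean_class_sensitivity(cm):
    """Assumed imbalance-aware summary: every class weighs the same."""
    return sum(sensitivity(cm, k) for k in range(len(cm))) / len(cm)
```

With a heavily imbalanced matrix, `accuracy` is dominated by the majority class, while the class-averaged figure drops when a minority class is poorly detected.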
Results
In total, 55 features were computed. The performance obtained when building a classifier with different numbers of features and principal components was evaluated, and the results are shown in the following subsections. A straightforward approach was implemented, which worked as follows: (i) at the start, only the two most important features or principal components were considered; (ii) the next feature/principal component was then introduced at each subsequent step; (iii) this continued until all the features/components were used as classifier inputs.
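The incremental procedure in steps (i) to (iii) amounts to a simple sweep over the ranked list. In the sketch below, `evaluate` is a hypothetical callable that trains and scores a classifier on the given subset (for example, returning the WAC); it is not part of the original work.

```python
def sweep_feature_counts(ranked, evaluate, start=2):
    """Score the top-n features for n = start..len(ranked) and return the best n."""
    results = {n: evaluate(ranked[:n]) for n in range(start, len(ranked) + 1)}
    best_n = max(results, key=results.get)
    return best_n, results
```

The same function applies to principal components if `ranked` holds the components ordered by explained variance.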
Multiclass A-phase classification
mRMR
Feature ranking with mRMR did not return exactly the same result for all patients, which is evidence of inter-individual variability. Analyzing the ranking sequences of all patients, the most prevalent ranking sequence was (from higher to lower importance): \({\text{LZC}}_{BB}\), \(C_{\beta ,\tau }\), \(C_{\sigma ,\tau }\), \(C_{{\beta ,\tau_{0} }}\), \(C_{{\sigma ,\tau_{0} }}\), \(C_{\delta ,\tau }\), \({\text{ZCR}}_{BB}\), \(C_{{\delta ,\tau_{0} }}\), \(Max\_freq\), \(C_{{\alpha ,\tau_{0} }}\), \(C_{\theta ,\tau }\), \({\text{MMSD}}_{{}}\), \(C_{{\theta ,\tau_{0} }}\), \(C_{\alpha ,\tau }\), \({\text{TEO}}_{\delta }\), \({\text{MMSD}}_{{}}\), \({\text{EMD}}_{1}\), \(s_{\delta }^{2}\), \({\text{MMSD}}_{\sigma }\), \({\text{EMD}}_{2}\), \({\text{EMD}}_{6}\), \({\text{MMSD}}_{\alpha }\), \({\text{EMD}}_{5}\), \({\text{EMD}}_{7}\), \({\text{MMSD}}_{\beta }\), \(s_{\theta }^{2}\), \({\text{MMSD}}_{\sigma }\), \(Spec\_area\), \(s_{BB}^{2}\), \({\text{EMD}}_{10}\), \({\text{EMD}}_{4}\), \(s_{\beta }^{2}\), \({\text{EMD}}_{11}\), \({\text{EMD}}_{12}\), ShEnt, \({\text{EMD}}_{3}\), \(Mean\_freq\), \(s_{\sigma }^{2}\), \({\text{TEO}}_{\theta }\), \({\text{EMD}}_{9}\), \({\text{EMD}}_{8}\), \(s_{\alpha }^{2}\), \({\text{TEO}}_{\beta }\), \({\text{TEO}}_{\sigma }\), \({\text{TEO}}_{\alpha }\), \({\text{ZCR}}_{\delta }\), \({\text{ZCR}}_{\theta }\), \({\text{ZCR}}_{\alpha }\), \({\text{ZCR}}_{\sigma }\), \({\text{ZCR}}_{\beta }\), \({\text{LZC}}_{\delta }\), \({\text{LZC}}_{\theta }\), \({\text{LZC}}_{\alpha }\), \({\text{LZC}}_{\sigma }\), \({\text{LZC}}_{\beta }\) and FD.
Table 3 Mean confusion matrix for an SVM classifier using the 40 top-ranked mRMR features, with C = 2^{−1} and γ = 2^{−1}
| Predicted \ True | B-phase | A1 | A2 | A3 |
| --- | --- | --- | --- | --- |
| B-phase | 76 ± 5 | 15 ± 8 | 21 ± 12 | 27 ± 11 |
| A1 | 11 ± 4 | 59 ± 14 | 27 ± 13 | 17 ± 10 |
| A2 | 11 ± 6 | 22 ± 12 | 44 ± 15 | 32 ± 11 |
| A3 | 7 ± 2 | 5 ± 4 | 8 ± 8 | 24 ± 10 |
PCA
Table 4 Mean confusion matrix for an SVM classifier using 30 principal components, with C = 2^{−5} and γ = 2^{−9}
| Predicted \ True | B-phase | A1 | A2 | A3 |
| --- | --- | --- | --- | --- |
| B-phase | 61 ± 3 | 5 ± 2 | 9 ± 4 | 26 ± 9 |
| A1 | 17 ± 4 | 60 ± 17 | 35 ± 12 | 25 ± 8 |
| A2 | 16 ± 4 | 32 ± 15 | 53 ± 10 | 38 ± 9 |
| A3 | 6 ± 2 | 2 ± 3 | 3 ± 3 | 13 ± 5 |
Binary classification of A-phase vs B-phase
For the two-class problem, where it was only necessary to distinguish the A-phases from the B-phases regardless of the A-phase subtype, all the classifiers applied to the previous problem are analyzed in the following.
mRMR
With DA, the highest WAC was 72%, obtained with an LDA classifier using the 43 best features. The corresponding SP, SE and AC were 65%, 78% and 67%, respectively. Using kNN with \(k = 25\), the best performance was obtained for 55 features, with AC equal to 75% and WAC equal to 80%. Under these conditions, sensitivity and specificity were 82% and 68%, respectively. Considering the SVM with 40 features, the highest WAC achieved was 78% for \(C = 2^{-1}\) and \(\gamma = 2^{-1}\), with AC, SP and SE of 76%, 79% and 77%, respectively.
PCA
Combining the principal components with a DA classifier, the highest WAC obtained was 76%, for a QDA with 15 dimensions. In this case SE, SP and AC were 79%, 74% and 73%, respectively. Regarding kNN, the highest WAC, 75%, was achieved for 54 components and \(k = 25\); SE, SP and AC were 86%, 65% and 68%, respectively. For SVM, the best result was obtained for 40 principal components with \(C = 2^{-5}\) and \(\gamma = 2^{-11}\). The achieved WAC was 75%, with AC, SE and SP of 73%, 76% and 73%, respectively.
Summary
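Summary of the best results obtained for all the main methodological options considered (#F.: number of features; #P. C.: number of principal components)
| Classifier | Multiclass, mRMR: #F. | AC (%) | WAC (%) | Multiclass, PCA: #P. C. | AC (%) | WAC (%) | Binary, mRMR: #F. | AC (%) | WAC (%) | Binary, PCA: #P. C. | AC (%) | WAC (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DA | 30 | 61 | 47 | 12 | 68 | 49 | 43 | 67 | 72 | 15 | 73 | 76 |
| kNN | 30 | 70 | 46 | 40 | 61 | 47 | 55 | 75 | 80 | 54 | 68 | 75 |
| SVM | 40 | 71 | 51 | 30 | 56 | 47 | 40 | 76 | 78 | 40 | 73 | 75 |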
Discussion
Comparison of the best classifier model proposed in this work with those presented in the literature that distinguish among A-phase subtypes
Most A3 phases were arousals, which are events with characteristics of the awake stage. The classifiers were therefore trained with B-phases similar to A3 phases, which led to many A3 phases being wrongly classified as B-phases. This can be seen in the confusion matrices in Tables 3 and 4. The typical waveform associated with A1 is the K-complex, for which the extracted features are well suited; this explains the better results for this subtype compared with A2 and A3.
Subtype sensitivities can be improved by taking context information into account. Information from the macrostructure could be important to improve A3 discrimination, given that A3 is confused with the awake state. Another aspect that is important to take into account is the subject's condition. For example, NFLE patients, the population considered in this work, are characterized by a significant enhancement of all A-phase subtypes when compared with controls [58]. In addition, A-phases offer favorable conditions for the occurrence of nocturnal motor seizures [6] and for the generation of paroxysmal EEG features that can be used as biomarkers.
In general, the accuracies reported in the literature are higher than those reported in this work for the various methods. This is because those studies reported the standard accuracy without accounting for class imbalance and therefore, again, gave more weight to the B-phases, which are more numerous than the A-phases. Considering WAC enabled a more appropriate estimate of algorithm performance. In fact, when WAC is considered instead of AC, the algorithms presented in [9–11] show much lower performance values, as can be observed in Table 7.
Conclusion
We propose a multistep KDD to determine appropriate processing options for automatic CAP scoring. We extracted several features and evaluated two different feature selection/reduction methods and different classification methods. We concluded that using the features ranked by mRMR, without any transformation, leads to better results. The SVM was the best classification method for the full discrimination of all CAP components, while kNN performed better when the goal is only to discriminate A-phases from B-phases.
The proposed KDD is applied to CAP scoring for the first time. In particular, the full discrimination of the three A-phase subtypes is a new perspective, since past works attempted multiclass approaches only by grouping different subtypes. We also considered the weighted accuracy, in addition to simple accuracy, resulting in a more trustworthy performance assessment. Overall, better subtype sensitivities than other published approaches were achieved.
Future steps to improve classification will include the consideration of context information related to CAP classification, for example sleep stage and the subject's medical condition. Another future step is the analysis of the benefits of microstructure staging as a disease biomarker, for example as a precursor of epileptic seizures.
Declarations
Authors’ contributions
FM implemented all the algorithms and performed the main analysis. FS and CS contributed clinical knowledge, including validation of the results. AD and CT supervised FM's work and implemented major manuscript changes. All authors read and approved the final manuscript.
Acknowledgements
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
All of the datasets used in this article are freely available at https://physionet.org/pn6/capslpdb/.
Consent for publication
Not applicable.
Ethics approval and consent to participate
We used a publicly available database; thus, this section is not applicable.
Funding
The authors would like to thank the financial support of Liga Portuguesa Contra a Epilepsia that awarded the scientific grant “EPICAPLongterm evaluation of the cyclic alternating pattern (CAP) as a precursor of epileptic seizures”.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
 Parrino L, Ferri R, Bruni O, Terzano MG. Cyclic alternating pattern (CAP): the marker of sleep instability. Sleep Med Rev. 2012;16:27–45.
 Terzano MG, Parrino L. Origin and significance of the cyclic alternating pattern (CAP). Sleep Med Rev. 2000;4(1):101–23.
 Terzano MG, Mancia D, Salati MR, Costani G, Decembrino A, Parrino L. The cyclic alternating pattern as a physiologic component of normal NREM sleep. Sleep. 1985;8(2):137–45.
 Thomas RJ. Arousals in sleep-disordered breathing: patterns and implications. Sleep. 2003;26(8):1042–7.
 Parrino L, Halasz P, Tassinari CA, Terzano MG. CAP, epilepsy and motor events during sleep: the unifying role of arousal. Sleep Med Rev. 2006;10:267–85.
 Parrino L, Smerieri A, Spaggiari MC, Terzano MG. Cyclic alternating pattern (CAP) and epilepsy during sleep: how a physiological rhythm modulates a pathological event. Clin Neurophysiol. 2000;111(SUPPL. 2):S39–46.
 Terzano MG, Parrino L, Smerieri A, Chervin R, Chokroverty S, Guilleminault C, et al. Erratum: “Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep” (Sleep Med (2001) vol. 2 (6) (537–553)). Sleep Med. 2002;3(2):185.
 Tatum WO. Ellen R. Grass lecture: extraordinary EEG. Neurodiagn J. 2014;54(1):3–21.
 Barcaro U, Navona C, Belloli S, Bonanni E, Gneri C, Murri L. A simple method for the quantitative description of sleep microstructure. Electroencephalogr Clin Neurophysiol. 1998;106(5):429–32.
 Navona C, Barcaro U, Bonanni E, Di Martino F, Maestri M, Murri L. An automatic method for the recognition and classification of the A-phases of the cyclic alternating pattern. Clin Neurophysiol. 2002;113(11):1826–31.
 Barcaro U, Bonanni E, Maestri M, Murri L, Parrino L, Terzano MG. A general automatic method for the analysis of NREM sleep microstructure. Sleep Med. 2004;5(6):567–76.
 Rosa AC, Parrino L, Terzano MG. Automatic detection of cyclic alternating pattern (CAP) sequences in sleep: preliminary results. Clin Neurophysiol. 1999;110(4):585–92.
 Largo R, Munteanu C, Rosa A. CAP event detection by wavelets and GA tuning. In: Intelligent signal processing, 2005 IEEE international workshop on, vol. 2, no. 1. 2005. pp. 44–8.
 Stam CJ, Van Dijk BW. Synchronization likelihood: an unbiased measure of generalized synchronization in multivariate data sets. Phys D Nonlinear Phenom. 2002;163(3–4):236–51.
 Ferri R, Rundo F, Bruni O, Terzano MG, Stam CJ. Dynamics of the EEG slow-wave synchronization during sleep. Clin Neurophysiol. 2005;116(12):2783–95.
 Ferri R, Rundo F, Bruni O, Terzano MG, Stam CJ. Regional scalp EEG slow-wave synchronization during sleep cyclic alternating pattern A1 subtypes. Neurosci Lett. 2006;404(3):352–7.
 Mariani S, Manfredini E, Rosso V, Grassi A, Mendez MO, Alba A, et al. Efficient automatic classifiers for the detection of A phases of the cyclic alternating pattern in sleep. Med Biol Eng Comput. 2012;50(4):359–72.
 Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet. Circulation. 2000;101(23):E215–20.
 Ferri R, Bruni O, Miano S, Smerieri A, Spruyt K, Terzano MG. Interrater reliability of sleep cyclic alternating pattern (CAP) scoring and validation of a new computer-assisted CAP scoring method. Clin Neurophysiol. 2005;116(3):696–707.
 Rechtschaffen A, Kales A. A manual of standardized terminology, techniques and scoring system for sleep stages of human subjects. Washington, DC: Public Heal Serv US Gov Print Off; 1968.
 Iber C, Ancoli-Israel S, Chesson AL, Quan S. The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications. Westchester, IL: American Academy of Sleep Medicine; 2007. p. 59.
 Butterworth S. On the theory of filter amplifiers. Wirel Eng. 1930;7:536–41.
 Kaleem M, Guergachi A, Krishnan S. Application of a variation of empirical mode decomposition and teager energy operator to EEG signals for mental task classification. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS. 2013. p. 965–8.
 Kvedalen E. Signal processing using the Teager energy operator and other nonlinear operators. Signal Process. 2010;9120(May):121.
 Bahoura M, Rouat J. Wavelet speech enhancement based on time-scale adaptation. Speech Commun. 2006;48(12):1620–37.
 Jabloun F, Çetin AE, Erzin E. Teager energy based feature parameters for speech recognition in car noise. IEEE Signal Process Lett. 1999;6(10):259–61.
 Lauer RT, Prosser LA. Use of the Teager–Kaiser energy operator for muscle activity detection in children. Ann Biomed Eng. 2009;37(8):1584–93.
 Duman F, Erdamar A, Erogul O, Telatar Z, Yetkin S. Efficient sleep spindle detection algorithm with decision tree. Expert Syst Appl. 2009;36(6):9980–5.
 Erdamar A, Duman F, Yetkin S. A wavelet and teager energy operator based method for automatic detection of K-Complex in sleep EEG. Expert Syst Appl. 2012;39(1):1284–90.
 Carrozzi M, Accardo A, Bouquet F. Analysis of sleep-stage characteristics in full-term newborns by means of spectral and fractal parameters. Sleep. 2004;27(7):1384–93.
 Cai H. Fast frequency measurement algorithm based on zero crossing method. In: ICCET 2010 - 2010 international conference on computer engineering and technology, proceedings. 2010.
 Djurić MB, Djurišić ŽR. Frequency measurement of distorted signals using Fourier and zero crossing techniques. Electr Power Syst Res. 2008;78(8):1407–15.
 Aye YY. Speech recognition using Zero-crossing features. In: Proceedings - 2009 international conference on electronic computer technology, ICECT 2009. 2009. p. 689–92.
 Drinnan MJ, Murray A, White JE, Smithson AJ, Griffiths CJ, Gibson GJ. Automated recognition of EEG changes accompanying arousal in respiratory sleep disorders. Sleep. 1996;19(4):296–303.
 Lempel A, Ziv J. On the complexity of finite sequences. IEEE Trans Inf Theory. 1976;22:75–81.
 Abásolo D, Simons S, Morgado da Silva R, Tononi G, Vyazovskiy VV. Lempel-Ziv complexity of cortical activity during sleep and waking in rats. J Neurophysiol. 2015;113(7):2742–52.
 Casali AG, Gosseries O, Rosanova M, Boly M, Sarasso S, Casali KR, et al. A theoretically based index of consciousness independent of sensory processing and behavior. Sci Transl Med. 2013;5(198):198ra105.
 Aboy M, Hornero R, Abásolo D, Álvarez D. Interpretation of the Lempel-Ziv complexity measure in the context of biomedical signal analysis. IEEE Trans Biomed Eng. 2006;53(11):2282–8.
 Bracewell RN. The Fourier transform and its applications. New York: McGraw-Hill; 1986. p. 1–4.
 Allen J. Short term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Trans Acoust. 1977;25(3):235–8.
 Ebrahimi F, Setarehdan SK, Ayala-Moyeda J, Nazeran H. Automatic sleep staging using empirical mode decomposition, discrete wavelet transform, time-domain, and nonlinear dynamics features of heart rate variability signals. Comput Methods Programs Biomed. 2013;112(1):47–57.
 Rutkowski TM, Mandic DP, Cichocki A, Przybyszewski AW. EMD approach to multichannel EEG Data - The amplitude and phase synchrony analysis technique. Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics). 2008. p. 122–9.
 Kannathal N, Choo ML, Acharya UR, Sadasivan PK. Entropies for detection of epilepsy in EEG. Comput Methods Programs Biomed. 2005;80(3):187–94.
 Chouvarda I, Mendez MO, Rosso V, Bianchi AM, Parrino L, Grassi A, et al. Predicting EEG complexity from sleep macro and microstructure. Physiol Meas. 2011;32(8):1083–101.
 Chouvarda I, Mendez MO, Alba A, Bianchi AM, Grassi A, Arce-Santana E, et al. Nonlinear analysis of the change points between A and B phases during the Cyclic Alternating Pattern under normal sleep. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS. 2012. p. 1049–52.
 Chouvarda I, Mendez MO, Rosso V, Bianchi AM, Parrino L, Grassi A, et al. CAP sleep in insomnia: new methodological aspects for sleep microstructure analysis. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS. 2011. p. 1495–8.
 Koley B, Dey D. An ensemble system for automatic sleep stage classification using single channel EEG signal. Comput Biol Med. 2012;42(12):1186–95.
 Rodríguez-Sotelo JL, Osorio-Forero A, Jiménez-Rodríguez A, Cuesta-Frau D, Cirugeda-Roldán E, Peluffo D. Automatic sleep stages classification using EEG entropy features and unsupervised pattern analysis techniques. Entropy. 2014;16(12):6573–89.
 Higuchi T. Approach to an irregular time series on the basis of the fractal theory. Phys D Nonlinear Phenom. 1988;31(2):277–83.
 Savitzky A, Golay MJE. Smoothing and differentiation of data by simplified least squares procedures. Anal Chem. 1964;36(8):1627–39.
 Peng H, Long F, Ding C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226–38.
 Shlens J. A tutorial on principal component analysis. arXiv preprint arXiv:1404.1100. 2014;1–13.
 Izenman AJ. Linear discriminant analysis. In: Modern multivariate statistical techniques. 2013. p. 237–80.
 Altman NS. An introduction to Kernel and nearest-neighbor nonparametric regression. Am Stat. 1992;46(3):175–85.
 Vapnik V, Chervonenkis A. Ordered risk minimization. Autom Remote Control. 1974;34:1226–35.
 Duan KB, Keerthi SS. Which is the best multiclass SVM method? An empirical study. Mult Classif Syst. 2005;3541:278–85.
 Picard RR, Cook RD. Cross-validation of regression models. J Am Stat Assoc. 1984;79(387):575–83.
 Parrino L, De Paolis F, Milioli G, Gioi G, Grassi A, Riccardi S, et al. Distinctive polysomnographic traits in nocturnal frontal lobe epilepsy. Epilepsia. 2012;53(7):1178–84.