Epileptic seizure classifications using empirical mode decomposition and its derivative

Background: Epilepsy is one of the most common neurological disorders associated with disruption of brain activity. In the classification and detection of epileptic seizures, electroencephalography (EEG) measurements, which record the electrical activities of the brain, are frequently used. Empirical mode decomposition (EMD) and its derivative, ensemble EMD (EEMD) are recently developed methods used to decompose non-stationary and nonlinear signals such as EEG into a finite number of oscillations called intrinsic mode functions (IMFs). Our main objective in this study is to present a hybrid IMF selection method combining four different approaches (energy, correlation, power spectral distance, and statistical significance measures), and investigate the effect of selected IMFs extracted by EMD and EEMD on the classification. We have applied the proposed IMF selection approach on the classification of EEG signals recorded from epilepsy patients who are under treatment at our collaborator hospital. Multichannel EEG signals collected from epilepsy patients are decomposed into IMFs, and then IMF selection was performed. Finally, time- and spectral-domain, and nonlinear features are extracted and feature sets are created for the classification.

Results: The maximum classification accuracies obtained using various combinations of IMFs were 94.56%, 95.63%, 96.8%, and 96.25% for SVM, KNN, naive Bayes, and logistic regression classifiers, respectively, by using EMD analysis; whereas, the EEMD approach has provided maximum classification accuracies of 96.06%, 97%, 97%, and 96.25% for SVM, KNN, naive Bayes, and logistic regression, respectively. Classification performance with the same features obtained using direct EEG signals instead of the decomposed IMFs was worse than the aforementioned 2 approaches for every combination.

Conclusion: Simulation results demonstrate that the proposed IMF selection approach affects the classification results. Also, EEMD provides a robust method for feature extraction from EEG signals in order to classify pre-seizure and seizure segments.


Background
Epilepsy is a neurological disorder associated with disruption of brain activity that affects approximately 50 million people worldwide [1,2]. Epileptic seizures are typically detected by neurologists through visual examination of long-term electroencephalogram (EEG) signals. However, this method is very time-consuming and prone to error. Moreover, epileptic seizures are initiated in different brain lobes in different individuals, so it is not possible to assume a standard focus center for the studies. Therefore, long-term EEG recordings are needed to detect epileptic seizures and determine the focus center [2-5].
Since visual examination of long-term EEG data makes it difficult to diagnose the disease, automatic seizure detection has become a very popular research area and various signal processing methods have been applied to solve this problem [2,5,6].
Many types of seizure detection and classification algorithms have been proposed in the literature [5]. These studies are briefly discussed in "Related studies" section. In the present study, a classification model for epileptic EEG data based on empirical mode decomposition (EMD) and its derivative, ensemble EMD (EEMD), is introduced. Our aim is to distinguish pre-seizure and seizure epileptic EEG signals by classifying the features extracted from selected intrinsic mode functions (IMFs) of EMD or EEMD. Simulations are performed to evaluate the effectiveness of selecting the IMFs based on several metrics, as opposed to using the first several IMFs for the classification.
The rest of the paper is organized as follows. The review of some of the previous related work is given in "Related studies" section. Experimental results of the proposed method are shown in "Results" section. Discussion of the results is reported in "Discussion" section. The description of the data set, EMD algorithm, EEMD algorithm and the details of the proposed methodology are discussed in "Materials and methods" section.

Related studies
Epileptic seizure detection and classification studies have been reported frequently in the literature using various signal processing and classification methods. A variety of features such as temporal, spectral, statistical and nonlinear features are exploited to improve the detection and classification performance.
Several methods have been presented for the detection and classification of seizure and seizure-free EEG segments by using time and frequency domain features such as energy [7], exponential energy [8], matrix determinant [2], spectral power of Hjorth's mobility components [9], cross-correlation, power spectral density [10], subband spectral powers [11], average value, maximum value, and minimum value [5]. Furthermore, several studies may be found in the literature using the wavelet transform and its derivative approaches [6,12].
Methods based on weighted multiscale Renyi permutation entropy (WMRPE), weighted permutation entropy (WPE), fuzzy entropy (FuzzyEn), sigmoid entropy, and approximate entropy (ApEn) have also been frequently applied to this problem [13-15]. Additionally, nonlinear parameters such as the fractal dimension, the scaling exponent obtained with detrended fluctuation analysis (DFA), and Hurst's exponent have been utilized in many studies, and successful results have been obtained for the detection and classification of seizure and seizure-free epileptic EEG signals [16,17].
Time-frequency analysis methods such as EMD, EEMD, multivariate empirical mode decomposition (MEMD), and complete ensemble empirical mode decomposition (CEEMD), which were developed for the analysis of nonlinear and non-stationary signals, have been successfully applied to the detection or classification of seizure and seizure-free epileptic EEG signals in many studies [1,18-28]. These methods decompose a given signal into a finite number of zero-mean oscillations called intrinsic mode functions (IMFs). One of the major problems when using EMD and similar decomposition methods is how to choose which IMFs to use in the classification algorithms. In most studies, the first several IMFs, known to have high-frequency oscillations, are automatically selected for feature extraction [19-22]. It may be argued that the literature lacks methods for selecting the best IMFs when using EMD and similar decompositions.

Results
EEG signals including pre-seizure and seizure segments obtained from 10-channel EEG recordings of 16 epilepsy patients who are under treatment at Izmir Katip Celebi University School of Medicine, Department of Neurology, were analyzed using EMD, and EEMD approaches and various classifiers. The hybrid IMF selection process including energy, correlation, power spectral distance, and statistical significance measures was carried out for EMD and EEMD approaches in order to identify the IMFs that best represent the original signal as described in "Selection of intrinsic mode functions (IMFs)" section. After the IMF selection process, time-domain (energy, mean value, skewness, and kurtosis) and spectral-domain (total power, spectral entropy, 1st, 2nd, and 3rd moments), and nonlinear (Hurst exponent and Higuchi fractal dimension) feature-sets were created using the selected three IMFs (IMF1, IMF3, IMF2) obtained by EMD, and EEMD approaches, and the EEG signal itself. In addition, we also performed simulations to compare the performance of our proposed approach with that of Discrete Wavelet Transforms (DWT). Since three selected IMFs of EMD and EEMD approaches are used for feature extraction and classification, three-level decomposition is used for DWT utilizing Daubechies4 (db4) mother wavelet function [23]. Finally, SVM, KNN, naive Bayes, and logistic regression classifiers are used for the classification, and the results are evaluated.
Performance evaluation results of our proposed approach are given in Tables 1, 2, 3, and 4. In these tables, IMF1, IMF2, or IMF3 indicates that the features for classification are calculated using the corresponding IMF; IMF1-3 denotes that the features are extracted using all three IMFs, whereas IMF1-IMF2 indicates that the features are extracted from IMF1 and IMF2. Additionally, AC+DC1-3 indicates that the features are extracted from the approximation coefficient (AC) and the 3 detail coefficients (DC). Highlighted values in table cells indicate the best performance in accuracy for each approach (Tables 1, 2, 3) and classifier (Table 4). Table 1 summarizes the performance evaluation of time-domain features used for classification. Using the time-domain features calculated from IMF1-IMF3 (the two most favorable IMFs) of EMD, we obtain 97.18% classification accuracy and 97.14% F-score with the logistic regression classifier. With EEMD, the logistic regression algorithm yields the highest accuracy (98.13%) and F-score (98.13%) using the time-domain features calculated from IMF1-IMF3, while the SVM algorithm performs the worst (accuracy: 62.44%, F-score: 60.80%) for the same features calculated from IMF2. When the same features are calculated from the subbands obtained using DWT, we achieve 94.25% accuracy and 94.31% F-score with the KNN classifier. To reveal the effect of decomposition, we analyzed the EEG signal itself and repeated the above feature extraction and classification. Using the time-domain features and the KNN classifier, we obtain 89.75% accuracy and 89.96% F-score, whereas the SVM performed very poorly (accuracy: 53.94%, F-score: 45.26%).
We give the performance metrics for spectral features used in classification for different IMF combinations in Table 2. We observe that naive Bayes provides 96.88% accuracy and 96.77% F-score using spectral features calculated from IMF1-IMF3 of EMD. However, higher classification performance is obtained by the same features calculated from IMF2-IMF3 of EEMD with logistic regression. While 95% accuracy and 94.87% F-score were obtained from the spectral features of DWT using the naive Bayes classifier, 93.31% accuracy and 93.37% F-score were achieved using the same features obtained from the EEG signal itself.
Classification results using nonlinear features are given in Table 3. The results suggest that the nonlinear features extracted from IMF1-4 of EMD provided a classification performance of 95% accuracy and 95.01% F-score using KNN and SVM. However, the EEMD approach provided 92.94% accuracy and 92.90% F-score using the same features with SVM. Using the features obtained from the EEG signal itself, an accuracy of 69.38% and an F-score of 68.95% are obtained with KNN. On the other hand, 87.50% accuracy and 87.42% F-score were obtained using the nonlinear features of the DWT approach with the logistic regression classifier.
In order to determine the effect of IMF selection on the classification performance and to compare the approaches, the classification is performed with the combination of time, spectral, and nonlinear features. The classification results are shown in Table 4. In EMD approach, the SVM provided the maximum classification accuracy (94.56%) using combined features of IMF1-IMF2. However, KNN (95.63%), naive Bayes (96.88%), and logistic regression (96.25%) classifiers resulted in the highest accuracies using combined features of IMF1-IMF3.
On the other hand, in the EEMD approach, the SVM (96.06%) and logistic regression (96.25%) classifiers provided the highest classification accuracy for the combined features of IMF1. While KNN (97.06%) achieves the best performance using the combined features of IMF1-3, naive Bayes (97%) yields its maximum classification accuracy using the combined features of IMF1-IMF3.
DWT approach provided maximum classification accuracy of 94.56% with naive Bayes classifier for the combined features of subbands. Notice that by using the same features extracted from the EEG signal (the last row), KNN (93.25%) provides the best classification performance. We also observed that the classification performance of the combined feature-set created by using the EEG signal is worse than the EMD and EEMD approaches. Furthermore, the highest classification performance for all classifiers is achieved using features extracted by EEMD approach. Apart from the selected first 3 IMF, the success of the classification was not improved when the features obtained using the 4th IMF were included in the classification process.
In order to investigate the channel-based performance of our approaches, the classification is performed for the 10 channels separately using the total features of IMF1-3. The mean classification accuracies for the channels in the left (Fp1-F7, F7-T1, T1-T3, T3-T5, Fp1-F3 channels) and right (Fp2-F8, F8-T2, T2-T4, T4-T6, Fp2-F4 channels) hemispheres are calculated. The classification accuracies of the EEMD-based and EEG-signal-based approaches are higher in the left hemisphere for all four classifiers (shown in Fig. 1b, c). These results are supported by the clinical information about the epileptic focus areas of the patients in our study, shown in Table 5. However, in the EMD-based approach, the classification accuracy is higher for the left hemisphere only for the KNN and naive Bayes classifiers (shown in Fig. 1a).

Discussion
In our proposed study, the main objective is to present a hybrid IMF selection method and explore the effect of selected IMFs extracted by EMD and EEMD on the classification performance. Our approach investigates the advantage of using EEMD, where noise-added versions of the signal are decomposed to eliminate the well-known mode-mixing problem of EMD. The problem of mode mixing can be described as the occurrence of very different oscillations in one mode, or very similar oscillations in different modes. The EEMD method has been developed to overcome this shortcoming of EMD. As such, in our experiments we included EEMD as well as EMD to compare their classification performance. We have applied the proposed IMF selection approach on the classification of EEG signals recorded from epilepsy patients who are under treatment at our collaborator hospital. We have used 10-channel EEG signals recorded from 16 patients, providing a total of 160 pre-seizure and 160 seizure (320 total) EEG segments. In addition, 4 time-domain, 5 frequency-domain, and 2 nonlinear features are extracted from each selected IMF of those EEG segments. The time-domain, spectral-domain, and nonlinear features obtained from the selected three IMFs (IMF1, IMF3 and IMF2; in this order) were classified using support vector machine (SVM), K-nearest neighbor (KNN), naive Bayes, and logistic regression classifiers, and the performances of the EMD and EEMD approaches were compared. Then, by using this selection approach, we explore the advantages of IMF selection in either EMD or EEMD approaches as opposed to using the first several IMFs (IMF1-4).

Fig. 1 Hemisphere-based mean classification accuracy for (a) EMD approach, (b) EEMD approach, and (c) EEG signals. Here, left and right hemispheres were represented with blue and red, respectively.

In order to reveal the advantages of using the EMD or EEMD approaches, the same features were extracted from the EEG signal itself and from the subbands obtained by the DWT approach, and the classification process was repeated.
The performance of the SVM classifier with the time-domain feature-set was found to be poor for both approaches. When the nonlinear feature-set was used, the success of all four classifiers was found to be low in both approaches. Using the spectral feature-set, we obtain higher accuracies for all classifiers except logistic regression. This suggests that epileptic seizures cause distinctive changes in the frequency domain. In addition, when the IMF-based classification results were evaluated, we noticed that the success of the classification performed only with the features obtained from the combination of selected IMFs was higher than or similar to that of the first 4 IMFs taken without a selection step (except for the nonlinear feature set). This shows that the IMF selection process helps improve the classification performance, as the selected IMFs carry the most useful information for the discrimination between the seizure and pre-seizure segments of EEG signals. The classification accuracy obtained using the EMD or EEMD approaches with each feature-set is higher than that of the features obtained directly from the EEG signals and from the subbands of DWT, for all four classifiers. The computational complexity of EMD and its derivatives, compared with classical approaches such as DWT and the fast Fourier transform (FFT), is generally considered a disadvantage. Contrary to this common view, if the number of sifting steps in the EMD algorithm is fixed at 10, the computational complexity is O(N log N), which is the same as the computational complexity of the FFT, where O denotes the order of computation and N is the signal sample size. In the EEMD approach, the number of ensembles additionally enters the computational complexity [29]. Therefore, in signal processing applications, EMD-based approaches may be preferred considering the trade-off between performance and computational cost.
Evaluating the channel-based classification performances, the classification success of the features obtained by EEMD approach was found to be higher than other approaches for all 4 classifiers (shown in Fig. 1).
The innovative contributions of our study can be highlighted as follows:
• We propose a hybrid IMF selection method considering different approaches such as energy, correlation, power spectral distance, and statistical significance test.
• We demonstrate the advantages of using the IMFs selected by the proposed approach, for either EMD or EEMD, as opposed to simply taking the first several IMFs.
• We investigate the performance improvement obtained by using ensemble EMD in the classification of epileptic seizures as compared to traditional EMD, the EEG signal itself, and DWT-based approaches.

Conclusion
There are many studies in the literature on the detection and classification of epileptic seizures. Many of them use EMD and the derivative approaches used in our study [1,18-28]. EMD and its extensions (ensemble, multivariate and others) are suitable for the analysis of nonlinear and non-stationary signals such as EEG. In these methods, EEG signals are decomposed into IMFs, which are zero-mean oscillations. Determining which of these IMFs contain useful information is vital for the success of the analysis. In most of the previous studies, the first 5 IMFs [19,22] or the first 4 IMFs [1,17,20,25] have been selected because they contain high-frequency information. In other words, no IMF selection process was performed in the initial stage of these studies. On the other hand, there are several IMF selection procedures presented in the literature based on energy, correlation coefficient, power spectrum, and statistical significance [24,30-33]. If the signal to be analyzed contains noise, the energy and correlation coefficient of the IMFs in which the noise component is dominant will be high and misleading [30]. Therefore, the use of any one of these IMF selection methods alone is not sufficient to determine the appropriate IMFs. In our study, we propose a hybrid IMF selection approach considering energy, correlation, power spectral distance, and statistical significance measures. We explore the advantages of the proposed IMF selection in either EMD or EEMD approaches as opposed to using IMFs chosen without a selection step. In our epileptic EEG classification experiments, the proposed EMD- and EEMD-based approaches outperformed the EEG-based and DWT-based approaches for all classifiers and feature sets we used. The selection algorithm for both EMD and EEMD suggests IMF1, IMF3 and IMF2, in this order. We use these IMFs separately and in combination for feature extraction and evaluate the classification performance.
The classification performance of selected IMFs and their combinations was generally higher than the classification success of randomly selected IMF1-4. It is obvious that in another signal processing problem, the selection algorithm may yield a completely different set of IMFs. Hence the use of first k IMFs in the classification process, as generally done in previous studies, is not the best approach. In our simulations, highest classification accuracies were obtained by using the EEMD approach where the discriminative information about epileptic seizures in the channels may be revealed more clearly (shown in Fig. 1). Note that, working with 3 or more IMFs increases both the computational load and processing time. It may be concluded that performing an IMF selection procedure before obtaining the features directly affects the success and computational load of the study.

Proposed approach
In this study, we present a pre-seizure and seizure classification method using EMD- and EEMD-based feature extraction and various classifiers, as depicted in Fig. 2. EEG data recorded from diagnosed epilepsy patients are labeled by physicians and divided into pre-seizure and seizure sections. These EEG segments are decomposed into intrinsic mode functions (IMFs) using both EMD and EEMD methods for each EEG channel separately. Subsequently, the optimum IMFs that best represent the signal are selected by combining several selection approaches. Following the IMF selection process, temporal, spectral, statistical and nonlinear features are calculated from the selected IMFs. Finally, the extracted features are classified using naive Bayes, K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression methods.

Analysis of EEG signals using EMD and EEMD methods
We applied empirical mode decomposition (EMD) and ensemble EMD methods for the analysis of EEG signals in our study. In the following, we present a brief introduction to these decomposition methods.

Empirical mode decomposition (EMD)
Empirical mode decomposition, which produces a collection of zero-mean oscillations called intrinsic mode functions (IMFs), is used as an adaptive time-frequency signal analysis method. For nonlinear and non-stationary processes, it is applied as a feature extraction and noise reduction method in signal processing applications. The most important property of the EMD method is that the sum of the obtained IMFs, together with the final residue, gives the original signal. An IMF must satisfy two conditions: (1) the number of zero crossings and the number of extrema must be equal or differ by at most one, and (2) the mean value of the upper and lower envelopes must be zero. The process of extracting the IMFs in the EMD algorithm, called sifting, can be performed as shown in Algorithm 1 [19,24].
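Algorithm 1 is not reproduced in this text, so the following is only a minimal sketch of the sifting idea under stated assumptions, not the authors' implementation. It uses piecewise-linear envelopes in place of the cubic-spline envelopes of standard EMD to stay within the Python standard library, and it fixes the number of sifting steps at 10. All function names (`local_extrema`, `envelope`, `sift`, `emd`) and the test signal are hypothetical.

```python
import math


def local_extrema(x):
    """Return indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(knots, x):
    """Piecewise-linear envelope through x[knots] (simplification of the
    cubic-spline envelope); held constant outside the first and last knot."""
    env = [0.0] * len(x)
    for t in range(len(x)):
        if t <= knots[0]:
            env[t] = x[knots[0]]
        elif t >= knots[-1]:
            env[t] = x[knots[-1]]
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * x[a] + w * x[b]
    return env


def sift(x, n_sift=10):
    """Repeatedly subtract the mean of the upper and lower envelopes."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = envelope(maxima, h)
        lower = envelope(minima, h)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=8, n_sift=10):
    """Extract IMFs one by one until the residue has too few extrema."""
    residue = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break
        imf = sift(residue, n_sift)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


# Toy signal (assumption): a slow and a fast sinusoid.
n = 256
signal = [math.sin(2 * math.pi * 4 * t / n) + 0.5 * math.sin(2 * math.pi * 30 * t / n)
          for t in range(n)]
imfs, residue = emd(signal)

# Completeness check: the IMFs plus the residue reproduce the input.
reconstructed = [sum(imf[t] for imf in imfs) + residue[t] for t in range(n)]
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
```

The completeness check at the end mirrors the property stated above: whatever envelope interpolation is used, the extracted components and the residue add back up to the original signal.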

Ensemble empirical mode decomposition (EEMD)
Although the standard EMD algorithm provides successful results in signal processing applications as a time-frequency analysis method, it suffers from a problem called "mode mixing". The problem of mode mixing can be described as the occurrence of very different oscillations in one mode, or very similar oscillations in different modes. The ensemble empirical mode decomposition (EEMD) method has been developed to overcome this problem. In the EEMD method, Gaussian white noise is added to the signal to be analyzed and the signal is decomposed into the intrinsic mode functions (IMF) using the EMD method. Due to the statistical properties of Gaussian white noise, the continuity of the signal is obtained in different frequency regions, so that the problem of mode mixing is reduced. The process of the EEMD algorithm is demonstrated in Algorithm 2 [28]. In the proposed method, we had a 10-channel and two-epoch EEG signal for each patient (total number of patients is 16). Hence the size of the pre-seizure and seizure EEG data set was 16 × 10. Maximum numbers of obtained IMFs after applying the EMD and EEMD were 16 and 15, respectively. Therefore, since it would be time-consuming and meaningless to obtain features from all IMFs, IMF selection process was performed before the feature extraction.
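Algorithm 2 is not reproduced in this text. As a hedged summary of the standard EEMD formulation, the ensemble step can be written as follows, where the ensemble size $M$, the white-noise realizations $w^{(m)}$, the noise amplitude $\varepsilon$, and the residue $r^{(m)}$ are notation introduced here for illustration rather than the authors' symbols.

```latex
\begin{align}
x^{(m)}[n] &= x[n] + \varepsilon\, w^{(m)}[n], \qquad m = 1, 2, \ldots, M, \\
x^{(m)}[n] &= \sum_{i} \mathrm{IMF}_i^{(m)}[n] + r^{(m)}[n]
  \quad \text{(EMD applied to each noisy copy)}, \\
\mathrm{IMF}_i[n] &= \frac{1}{M} \sum_{m=1}^{M} \mathrm{IMF}_i^{(m)}[n].
\end{align}
```

Because the added noise is zero-mean and independent across trials, its contribution tends to cancel in the ensemble average as $M$ grows, while the signal-related oscillations are retained.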
The discrete wavelet transform (DWT) has been widely used for the analysis of non-stationary signals [23]. In our study, we use the DWT-based approach for feature extraction and classification of epileptic EEG segments to investigate the advantages of the proposed EMD- and EEMD-based approaches. The DWT decomposes a given signal x[n] into detail and approximation coefficients by using a mother wavelet function [23,35]. In our study, the Daubechies4 (db4) mother wavelet and a 3-level subband decomposition are used.

Selection of intrinsic mode functions (IMFs)
In this study, we propose a hybrid IMF selection method by using energy-based, correlation-based, PSD distance-based, and t-test-based approaches. Pre-seizure and seizure epileptic EEG data of 16 patients recorded from 10 channels were decomposed into IMFs using both the EMD and EEMD approaches (example signals are shown in Fig. 3); then the IMF selection procedure described in the following was executed.

Fig. 3 (a) Surface pre-seizure EEG signal and its first three IMFs obtained using EMD; (b) surface seizure EEG signal and its first three IMFs obtained using EMD; (c) surface pre-seizure EEG signal and its first three IMFs obtained using EEMD; (d) surface seizure EEG signal and its first three IMFs obtained using EEMD

Energy-based selection method
The energy of each IMF is calculated as shown in Eq. (1). Since a higher-energy IMF is considered to be a better representative of the original signal, the IMFs were ranked from the highest to the lowest energy [30].
Here, $\mathrm{IMF}_i$ is the $i$th IMF and $E_{\mathrm{IMF}_i}$ is the energy of this IMF.
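Eq. (1) does not appear in this text. Assuming the usual definition of discrete signal energy, i.e. the sum of squared samples of $\mathrm{IMF}_i$, the energy-based ranking could be sketched as follows. The function names and the toy IMF values are illustrative only.

```python
def imf_energy(imf):
    """Energy of one IMF, assumed here to be the sum of squared samples."""
    return sum(v * v for v in imf)


def rank_by_energy(imfs):
    """Return 1-based IMF indices ordered from highest to lowest energy."""
    energies = [imf_energy(imf) for imf in imfs]
    return sorted(range(1, len(imfs) + 1),
                  key=lambda i: energies[i - 1], reverse=True)


# Toy IMFs (assumption): energies are 0.04, 4.0 and 1.0 respectively.
toy_imfs = [
    [0.1, -0.1, 0.1, -0.1],
    [1.0, -1.0, 1.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
]
energy_ranking = rank_by_energy(toy_imfs)  # [2, 3, 1]
```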

The correlation-based selection method
The correlation coefficient between each IMF and the original signal is calculated as shown in Eq. (2). Since an IMF with a high correlation coefficient is considered to be a good representative of the original signal, the IMFs are ranked from the highest to the lowest correlation coefficient [31].
Here, $C_{x,\mathrm{IMF}_i}$ is the cross-covariance of the original signal and the $i$th IMF, $\sigma_x$ and $\sigma_{\mathrm{IMF}_i}$ are the standard deviations of the original signal and $\mathrm{IMF}_i$, respectively, and $\rho$ denotes the correlation coefficient.
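Eq. (2) does not appear in this text, but the definitions above correspond to the standard correlation coefficient, $\rho = C_{x,\mathrm{IMF}_i} / (\sigma_x \sigma_{\mathrm{IMF}_i})$. A minimal sketch of the correlation-based ranking under that assumption is given below; the function names and the two-component toy signal are hypothetical.

```python
import math


def correlation(x, imf):
    """Cross-covariance of x and imf divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_i = sum(imf) / n
    cov = sum((a - mean_x) * (b - mean_i) for a, b in zip(x, imf)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_i = math.sqrt(sum((b - mean_i) ** 2 for b in imf) / n)
    return cov / (std_x * std_i)


def rank_by_correlation(x, imfs):
    """Return 1-based IMF indices ordered from highest to lowest correlation with x."""
    rhos = [correlation(x, imf) for imf in imfs]
    return sorted(range(1, len(imfs) + 1),
                  key=lambda i: rhos[i - 1], reverse=True)


# Toy components (assumption): a weak one and a dominant one, mutually orthogonal.
weak = [0.2, 0.2, -0.2, -0.2, 0.2, 0.2, -0.2, -0.2]
strong = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x = [a + b for a, b in zip(weak, strong)]

rho_weak = correlation(x, weak)
rho_strong = correlation(x, strong)
correlation_ranking = rank_by_correlation(x, [weak, strong])  # dominant component first
```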

The PSD distance-based selection method
Another IMF selection method, based on the power spectral densities (PSDs) of the original signal and the IMFs, was also utilized. The distances between the estimated PSDs are calculated using the Kullback-Leibler distance (KLD), as shown in Eq. (3). The IMF whose PSD has the minimum distance to the PSD of the original signal is considered to be the best representative of the original signal. Hence, the IMFs are ranked from the lowest to the highest PSD distance [32,33].
where $S_x(\cdot)$ is the power spectrum of the original signal, $S_{\mathrm{IMF}_i}(\cdot)$ is the power spectrum of the $i$th IMF, and $\mathrm{dis}_{\mathrm{KLD}}(x, \mathrm{IMF}_i)$ denotes the KLD between the power spectrum of the original signal and that of the $i$th IMF.
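Eq. (3) does not appear in this text, and the exact form of the distance the authors used (for example, whether it is symmetrized) cannot be recovered from it. The sketch below therefore assumes one common form: both power spectra are normalized to sum to one and the divergence $\sum_f p(f) \ln\big(p(f)/q(f)\big)$ is computed, with the PSD estimated by a plain periodogram. The small constant `eps`, the function names, and the toy signals are assumptions made for illustration.

```python
import cmath
import math


def periodogram(x):
    """One-sided periodogram |DFT|^2 / N via a direct DFT (adequate for short signals)."""
    n = len(x)
    psd = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(s) ** 2 / n)
    return psd


def normalize(psd, eps=1e-12):
    """Scale a PSD to sum to one; eps keeps empty bins away from log(0)."""
    total = sum(psd) + eps * len(psd)
    return [(v + eps) / total for v in psd]


def kld(p, q):
    """Kullback-Leibler divergence between two normalized spectra."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def psd_distance(x, imf):
    """Assumed form of dis_KLD(x, IMF_i): divergence from the signal PSD to the IMF PSD."""
    return kld(normalize(periodogram(x)), normalize(periodogram(imf)))


# Toy example (assumption): a weak fast component and a dominant slow component.
n = 64
fast = [0.3 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
x = [a + b for a, b in zip(fast, slow)]

candidate_imfs = [fast, slow]
distances = [psd_distance(x, imf) for imf in candidate_imfs]
# Rank from the lowest to the highest PSD distance (1-based indices).
psd_ranking = sorted(range(1, len(candidate_imfs) + 1), key=lambda i: distances[i - 1])
```

In this toy case the slow component carries most of the spectral power of `x`, so it has the smaller distance and is ranked first.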

Statistical significance-based selection method
We also use the t-test statistical significance measure for the selection of the best IMFs. The t-test is based on the principle of generating a null hypothesis that a single sample data set comes from a normal distribution. In this statistical significance test, two values are calculated: the h-value and the p-value. Here, the h-value indicates whether the distribution of the data is normal, and the p-value indicates the statistical significance of the data. If the p-value is greater than the specified threshold α (often chosen as 0.05 or 5% in the literature), the distribution of the data can be interpreted as normal (null hypothesis is satisfied, h-value = 0). Otherwise, if the p-value is less than that threshold, the distribution of the data may not be interpreted as normal (null hypothesis is not satisfied, h-value = 1). The p-values of the data whose distribution is known to be normal (h-value = 0) can be used as a statistical significance measure. It has previously been recommended to select the IMFs with high p-values in order to create a feature set with improved classification performance [24]. As such, we calculate the p-value for every IMF by applying the t-test.
Since the p-value obtained here shows the statistical significance of IMFs, the IMFs are ranked from the high to low p-value IMF. Table 6 shows the results of the above four selection approaches for one of the patients and one EEG channel.
These procedures were applied to the pre-seizure and seizure EEG data of 10 different channels of each patient separately. Finally, 40 metrics for 10 channels are calculated for each patient. All ranking matrices were combined and a 1280 × 16-dimensional ranking matrix for all pre-seizure and seizure EEG data was obtained. To determine the first priority selected IMFs for all signals, the histogram of the 1st column of the ranking matrix was calculated. The resulting histogram is shown in Fig. 4. Examining the histogram shown in Fig. 4, we observe that IMF1 is the first priority selected IMF, IMF3 is the second, and IMF2 is the third. In our simulation, we choose these three IMFs (IMF1, IMF3, IMF2) for feature extraction.

Table 6 Example of IMF ranking matrix. Each row shows the ranking of the obtained IMFs according to that feature. Here, the 7th IMF has the highest energy while the 12th IMF has the lowest energy. The 7th IMF has the highest correlation coefficient while the 2nd IMF has the lowest correlation coefficient. The 1st IMF has the lowest PSD distance while the 12th IMF has the highest PSD distance. The 3rd IMF has the highest p value while the 12th IMF has the lowest p value.
The histogram shown in Fig. 4 suggests IMF1, IMF3 and IMF2 in this order.
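The aggregation step described above (stack all rankings, then take the histogram of the first column) can be sketched as follows. The ranking rows are made-up toy values, not data from the study, and `select_imfs` is a hypothetical helper name.

```python
from collections import Counter


def select_imfs(ranking_rows, n_select=3):
    """Count how often each IMF is ranked first and keep the most frequent ones."""
    first_choices = Counter(row[0] for row in ranking_rows)
    selected = [imf for imf, _ in first_choices.most_common(n_select)]
    return selected, first_choices


# Each row is one ranking (one criterion applied to one EEG segment).
# Toy values only: IMF1 is ranked first 4 times, IMF3 3 times, IMF2 twice, IMF4 once.
ranking_rows = [
    [1, 3, 2, 4],
    [1, 2, 3, 4],
    [3, 1, 2, 4],
    [1, 3, 4, 2],
    [3, 2, 1, 4],
    [2, 1, 3, 4],
    [1, 2, 4, 3],
    [3, 1, 4, 2],
    [2, 3, 1, 4],
    [4, 1, 2, 3],
]
selected, histogram = select_imfs(ranking_rows)  # selected == [1, 3, 2]
```

In the study itself the stacked matrix has 1280 rows, which matches 16 patients × 10 channels × 2 segment types × 4 selection criteria.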

Classification of pre-seizure and seizure EEG segments
In this section, we present a method to classify the pre-seizure and seizure segments of EEG signals collected from epilepsy patients. These EEG signals are introduced in detail in "Data set" section. Using the selected IMFs that best represent the EEG signals, we extract a set of features.

Feature extraction
Time-domain, spectral, and nonlinear features were obtained using the selected IMFs and original EEG signals to obtain feature sets.
• Time-domain feature set: after the IMF selection process was carried out, the time-domain feature data set was created using the EEG signals directly, using the first three IMFs obtained by the EMD and EEMD methods, and using the subbands of DWT. Energy, mean value, skewness, and kurtosis values were calculated for the 3 IMFs, the DWT subbands, and the EEG signals in the time domain [8,23].
In the above equations, X[n] indicates the EEG signal or IMFs, N is the size of the signal or IMFs. E denotes the energy, µ is the mean value; S indicates the skewness, K is the kurtosis value.
In the EMD-and EEMD-based approaches a total of 320 × 12 size, and DWT -based approach a total of 320 × 16 size feature sets were obtained. Applying the same procedure to the EEG signal itself, a total of 320 × 14 size feature set for pre-seizure and seizure EEG data was obtained. Higuchi fractal dimension (HFD) used to calculate the fractal dimension (FD) directly from time-series signals. The most important parameter that must be determined for the calculation of Higuchi fractal dimension is k(max) . The HFD values calculated in a given k(max) range are plotted against this range in order to determine the optimal value for the k(max) parameter. The k value that the obtained curve reaches the saturation point is determined as k(max) [17,36].
In Eq. (16), X indicates the one-dimensional time-series EEG signal or the IMFs, and L[m, k], m = 1, 2, ..., k, indicates the curve length computed for interval k starting from initial sample m. In our study, the HFD values calculated for different k(max) values were plotted, and the resulting curve was observed to reach its saturation point at k(max) = 30.
In the EMD- and EEMD-based approaches, feature sets of size 320 × 6 were obtained, and in the DWT-based approach, a feature set of size 320 × 8. Applying the same procedure to the EEG signal itself yielded a feature set of size 320 × 2 for the pre-seizure and seizure EEG data.
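The HFD computation can be sketched as follows. This is a minimal pure-Python version of Higuchi's algorithm as it is commonly stated, not the code used in the study: for each interval k, the normalized curve lengths L[m, k] are averaged over the k starting samples, and the HFD is the least-squares slope of ln L(k) against ln(1/k). The function name `higuchi_fd` is hypothetical.

```python
import math

def higuchi_fd(x, k_max):
    """Estimate the Higuchi fractal dimension of a 1-D sequence x."""
    n = len(x)
    if k_max < 2 or n < 2 * k_max:
        raise ValueError("need k_max >= 2 and len(x) >= 2 * k_max")
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):                    # starting sample (0-based)
            n_int = (n - 1 - m) // k          # number of steps of size k
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_int + 1))
            norm = (n - 1) / (n_int * k)      # Higuchi's normalization factor
            lengths.append(dist * norm / k)   # L[m, k]
        mean_len = sum(lengths) / k           # L(k): average over m
        if mean_len <= 0.0:
            raise ValueError("curve length is zero (constant signal?)")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))
    # Least-squares slope of ln L(k) versus ln(1/k).
    mx = sum(log_inv_k) / k_max
    my = sum(log_len) / k_max
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    sxx = sum((a - mx) ** 2 for a in log_inv_k)
    return sxy / sxx
```

To reproduce the k(max) selection described above, `higuchi_fd` can be evaluated for a range of `k_max` values and the results plotted; the point at which the curve flattens is then taken as k(max).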

Classification
Features extracted from the selected IMFs of the EEG signals are used to discriminate between the pre-seizure and seizure segments of the EEG using support vector machine (SVM), K-nearest neighbor (KNN), naive Bayes, and logistic regression classifiers. In the following, we present the fundamentals of these classification methods.
• Support vector machine (SVM): SVM is a supervised machine learning algorithm that is frequently used in both classification and regression studies. In this algorithm, each sample of a data set containing n features is represented as a point in an n-dimensional space. Classification is then performed by finding the hyperplane that best separates the classes. Many hyperplanes can separate the two classes; what matters is choosing the one that yields the highest classification performance.
Let $(x_k, y_k)$, $k = 1, 2, \ldots, n$, be a separable set of samples, where $x_k$ is the feature vector of the $k$-th sample and $y_k \in \{-1, 1\}$ is its class label. The separating hyperplane can be formulated as $f(x) = \vec{w} \cdot x + c$, where $\vec{w}$ indicates the hyperplane parameters and $c$ the offset. The hyperplanes that separate the two classes with minimum error satisfy the condition $y_k[(\vec{w} \cdot x_k) + c] - 1 \ge 0$, $k = 1, 2, \ldots, n$. The main purpose is to achieve the maximum margin, where the margin is the distance between the support vectors belonging to the two different classes. Finally, data falling on different sides of the hyperplane are assigned to different classes [13,14,18,19,26].
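The separating-hyperplane condition can be checked numerically with a short sketch. The code below does not train an SVM; it only evaluates f(x) = w · x + c for a given w and c, tests the constraint y_k[(w · x_k) + c] − 1 ≥ 0 for every sample, and reports the margin width 2/‖w‖, which is the standard result for a hyperplane scaled this way rather than a formula stated in the text. All function names are hypothetical.

```python
import math

def decision_value(w, c, x):
    """f(x) = w . x + c for one sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + c

def satisfies_margin_constraints(w, c, samples, labels, tol=1e-12):
    """True if y_k * (w . x_k + c) - 1 >= 0 holds for every sample."""
    return all(y * decision_value(w, c, x) - 1.0 >= -tol
               for x, y in zip(samples, labels))

def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def classify(w, c, x):
    """Assign the class by the side of the hyperplane on which x falls."""
    return 1 if decision_value(w, c, x) >= 0 else -1
```

Among all (w, c) pairs that satisfy the constraints, SVM training selects the one with the largest `margin_width`, i.e., the smallest ‖w‖.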
• K-nearest neighbor (KNN): KNN is one of the learning-based pattern recognition methods. The data set is divided into training and test sets, and learning is performed on the training set. First, the distance between the sample to be classified and every sample in the training set is calculated. Then, the K nearest neighbors, i.e., those with the minimum distances, are determined. Finally, the most common class among these K nearest neighbors is assigned to the new sample. Various distance measures such as Euclidean, Manhattan, Minkowski, and Hamming can be used for the distance calculation [26,35,37]. In our study, the most commonly used measure, the Euclidean distance [shown in Eq. (19)], is used and K is chosen as 5.
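The three KNN steps (distance calculation, neighbor selection, majority vote) can be sketched as follows, using the Euclidean distance and K = 5 as in the study. The function names and the toy data in the usage note are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, query, k=5):
    """Return the majority class among the k training samples nearest to query."""
    if k > len(train_x):
        raise ValueError("k cannot exceed the number of training samples")
    # Step 1: distance from the query to every training sample.
    dists = sorted((euclidean(x, query), i) for i, x in enumerate(train_x))
    # Step 2: keep the k nearest; step 3: majority vote over their labels.
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]
```

For two classes, an odd K such as 5 avoids tied votes.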
• Naive Bayes: naive Bayes is a probabilistic classifier based on Bayes' theorem. Classification is performed by calculating the probability that a sample belongs to each class in the data set. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a sample in the feature set, where $n$ is the number of features, and let $\{M_1, M_2, \ldots, M_m\}$ represent the classes, where $m$ is the number of classes. The probability that a sample $X$ is a member of class $M_i$ is calculated as given in Eq. (20):
$$P(M_i \mid X) = \frac{P(X \mid M_i)\, P(M_i)}{P(X)} \tag{20}$$
Here, $P(M_i)$ indicates the class prior probability, $P(X)$ the prior probability of sample $X$, $P(X \mid M_i)$ the probability of $X$ conditioned on $M_i$, and $P(M_i \mid X)$ the probability of $M_i$ conditioned on $X$. The sample $X$ is then assigned to the class $M_i$ for which the class membership probability is highest [35,37].
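The class membership calculation can be illustrated with a small Gaussian naive Bayes sketch. The Gaussian likelihood for each feature is an assumption made here for illustration; the text does not state which likelihood model was used. Features are treated as conditionally independent given the class, so P(X | M_i) is the product of the per-feature likelihoods. All function names are hypothetical.

```python
import math

def fit_gaussian_nb(samples, labels, var_floor=1e-9):
    """Estimate the prior, per-feature means and variances for each class."""
    model = {}
    n_total = len(samples)
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        n_feat = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_floor
                     for j in range(n_feat)]
        model[cls] = (len(rows) / n_total, means, variances)
    return model

def nb_posteriors(model, x):
    """Return P(M_i | X) for every class via Bayes' theorem."""
    log_joint = {}
    for cls, (prior, means, variances) in model.items():
        total = math.log(prior)                      # log P(M_i)
        for xj, mu, var in zip(x, means, variances): # + log P(x_j | M_i)
            total += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        log_joint[cls] = total
    top = max(log_joint.values())                    # subtract max for stability
    unnorm = {cls: math.exp(v - top) for cls, v in log_joint.items()}
    evidence = sum(unnorm.values())                  # plays the role of P(X)
    return {cls: v / evidence for cls, v in unnorm.items()}

def nb_predict(model, x):
    """Assign x to the class with the highest posterior probability."""
    post = nb_posteriors(model, x)
    return max(post, key=post.get)
```

Working with log-probabilities avoids numerical underflow when many feature likelihoods are multiplied together.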
• Logistic regression: logistic regression (LR) is a frequently used statistical classification technique in which the probability $P_1$ of a dichotomous outcome event, i.e., one limited to two values such as yes/no, on/off, or 1/0, is related to a set of independent variables through the logit function given in Eq. (21). Here, $\beta_0$ is the intercept and $\{\beta_1, \ldots, \beta_n\}$ are the coefficients associated with the independent variables $\{X_1, X_2, \ldots, X_n\}$. Generally, in the logistic regression method, the maximum likelihood estimation (MLE) method is used to calculate the coefficients. The probability of the event as a function of the independent variables is nonlinear, as obtained by solving Eq. (21) for $P_1$ in Eq. (22) [38]:
$$P_1 = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n)}} \tag{22}$$
Here, $P_1 \in [0, 1]$ indicates the probability value.
If the linear combination on the right-hand side of Eq. (21) tends to −∞, the probability tends to 0 ($P_1 = 0$), and if it tends to ∞, the probability tends to 1.
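The relationship between the logit and the event probability, including the two limiting cases, can be checked with a short sketch. The function names are hypothetical, and the coefficients are supplied directly rather than estimated by MLE.

```python
import math

def sigmoid(z):
    """Map a linear predictor z to a probability, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p):
    """Log-odds ln(p / (1 - p)) for 0 < p < 1; the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

def event_probability(beta0, betas, x):
    """P_1 for intercept beta0, coefficients betas and independent variables x."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)
```

A sample is typically assigned to the event class when `event_probability` exceeds 0.5, which corresponds to a positive linear predictor; this threshold is a common convention, not one stated in the text.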

Performance evaluation
In this study, accuracy (ACC), sensitivity (SEN), specificity (SPE), and precision (PRE) were used as the performance criteria, together with the F-score, which combines precision and sensitivity. The fivefold cross-validation (CV) method was used to establish the performances of the classifiers.
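These criteria can be computed from the four entries of a binary confusion matrix, as in the following sketch. It uses the standard definitions and assumes that the seizure class is treated as the positive class, which the text does not state explicitly. The function name `classification_metrics` is hypothetical, and the sketch does not guard against empty classes (division by zero).

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute ACC, SEN, SPE, PRE and F-score from confusion-matrix counts.

    tp/fn: positive samples classified correctly/incorrectly,
    tn/fp: negative samples classified correctly/incorrectly.
    """
    acc = (tp + tn) / (tp + fn + tn + fp)
    sen = tp / (tp + fn)                     # true positive rate
    spe = tn / (tn + fp)                     # true negative rate
    pre = tp / (tp + fp)                     # positive predictive value
    f_score = 2 * pre * sen / (pre + sen)    # harmonic mean of PRE and SEN
    return {"ACC": acc, "SEN": sen, "SPE": spe, "PRE": pre, "F": f_score}
```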
In the k-fold CV method, the feature set is randomly separated into k folds of equal size. Of these k folds, (k − 1) are used for training and the remaining one is used for testing; no fold is reserved for validation. This process is repeated k times, and the accuracy is calculated separately for each iteration. The average accuracy over the k iterations is accepted as the CV accuracy [21,23].
$$\operatorname{logit}(P_1) = \ln\left(\frac{P_1}{1 - P_1}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n \tag{21}$$
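The k-fold CV procedure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the shuffling seed, the function names, and the `train_and_predict` callback (any function that trains on the training folds and returns predicted labels for the test fold) are all hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(samples, labels, train_and_predict, k=5, seed=0):
    """Average test accuracy over k train/test iterations."""
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        # The other (k - 1) folds form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        train_x = [samples[j] for j in train_idx]
        train_y = [labels[j] for j in train_idx]
        test_x = [samples[j] for j in test_idx]
        predicted = train_and_predict(train_x, train_y, test_x)
        correct = sum(p == labels[j] for p, j in zip(predicted, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

def majority_baseline(train_x, train_y, test_x):
    """Toy classifier: always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)
```

With 320 feature vectors and k = 5, each fold holds 64 samples, so every iteration trains on 256 samples and tests on 64.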