### Methods

#### L2 Norm based CSP

The basic idea of CSP is to find a group of spatial filters that maximize the variance of band-pass filtered EEG signals from one class [20, 21], while the variance from the other class are minimized. Let *ϕ*_{1} and *ϕ*_{2} be the recordings for the two tasks, the spatial filters are the projections, which is equivalent to maximize the following function,

J\left(w\right)=\frac{{w}^{T}{\varphi}_{1}^{T}{\varphi}_{1}w}{{w}^{T}{\varphi}_{2}^{T}{\varphi}_{2}w}=\frac{{w}^{T}{C}_{1}w}{{w}^{T}{C}_{2}w}

(1)

where *C*_{
1
} and *C*_{
2
} are the covariance matrix for the two tasks. Note that the scaling of the projection *w* will have no effect on the object value. Equation (1) can be transformed into a constrained optimization problem of the form:

\left\{\begin{array}{l}\underset{\mathit{w}}{arg\phantom{\rule{.2em}{0ex}}max}\phantom{\rule{1.5em}{0ex}}{w}^{T}{C}_{1}w\\ \mathit{subject}\phantom{\rule{0.5em}{0ex}}\mathit{to}\phantom{\rule{0.5em}{0ex}}{w}^{T}{C}_{2}w=1\end{array}\right.

(2)

By introducing the Lagrange multiplier, the objective function can be rewritten as:

L\left(w;\lambda \right)={w}^{T}{C}_{1}w-\lambda \left({w}^{T}{C}_{2}w-1\right)

(3)

By taking the derivative of (3) respect to *w* under the condition \frac{\partial L}{\partial w}=0, the objective projection *w* can be estimated using the generalized eigenvalue equation,

{C}_{1}w=\lambda {C}_{2}w

(4)

where *λ* denotes the eigenvalue of the generalized eigenvalue equation, and *w* is the corresponding eigen vector [3]. As for the multiple *m* spatial filters, the above equation (4) can be solved as:

{C}_{2}^{-1}{C}_{1}W=\mathit{\Sigma W}

(5)

where *W* is the matrix consisting of the eigen vectors of {C}_{2}^{-1}{C}_{1}, and ∑ = *diag*(*λ*_{1}, *λ*_{2},......, *λ*_{
m
}). The detailed implementation for CSP is shown in Figure 1.

#### L1 Norm based CSP

Obviously, the singular value decomposition can be used to find *W* in (5). Let Y={C}_{2}^{-1}{C}_{1}, the objective function of L2-SVD to find the eigen vector *W* is [15]:

\underset{W}{arg}\phantom{\rule{.2em}{0ex}}min\left|\right|Y-\mathit{WV}|{|}_{2}^{2},\phantom{\rule{0.5em}{0ex}}\mathit{with}\phantom{\rule{0.5em}{0ex}}V={W}^{T}Y

(6)

Equation (6) can be obtained by the fact that the objective *Y* can be decomposed as *Y* = *WSV*^{T},with the right singular matrix *V* = *Y*^{T}*WS*^{–1},and *WSV*^{T} = *WSS*^{− 1}*W*^{T}*Y* = *WW*^{T}*Y*. The dual problem of the objective function can be written as [16]:

\left\{\begin{array}{l}\underset{\mathit{W}}{arg}\phantom{\rule{0.2em}{0ex}}max\left|\right|{W}^{T}Y|{|}_{2}^{2}\\ \mathit{subject}\phantom{\rule{0.5em}{0ex}}\mathit{to}\phantom{\rule{0.5em}{0ex}}{W}^{T}W=I\end{array}\right.

(7)

where *I* is the identity matrix. As for (7), when the conventional singular value decomposition is used to find the spatial filter, it is in essence based on the *L2* norm strategy, which will be largely influenced by the outliers delivered into the variance matrix [16].

Obviously, *L1* norm will be more immune to those outliers than *L2* norm, and we will maximize the *L1* dispersion in the feature space to estimate the projection vector *W* as presented below.

The *L1* norm SVD will have the below objective function [15]:

\underset{W}{arg}\phantom{\rule{0.2em}{0ex}}min\left|\right|Y-\mathit{WV}|{|}_{1},\phantom{\rule{0.5em}{0ex}}\mathit{with}\phantom{\rule{0.5em}{0ex}}V={W}^{T}Y

(8)

Similarly, equation (8) can be transformed to below optimization problem,

\left\{\begin{array}{l}\underset{\mathit{W}}{arg}\phantom{\rule{0.2em}{0ex}}max\left|\right|{W}^{T}Y|{|}_{1}\\ \mathit{subject}\phantom{\rule{0.5em}{0ex}}\mathit{to}\phantom{\rule{0.5em}{0ex}}{W}^{T}W=I\end{array}\right.

(9)

Expression (9) implies to find the projection *W*, in which the projection of the original dataset will have the max dispersion in view of *L1* norm. The utilized *L1* norm based dispersion will be robust to the outlier introduced into the covariance matrix accounting for the estimation of spatial filters. Then, to solve the *L1* norm based SVD in equation (9), we adopt the fast *L1* norm iteration algorithm proposed in [15]. Obviously, the main difference between the two CSP filters is whether *L1* norm or *L2* norm based SVD decomposition is adopted for CSP filter estimation as shown in Figure 1, and the *L1* scheme may provide more robust immune ability for outliers.

### Material

#### Simulation study

##### Simulation dataset

To quantatively evaluate the performance of L1-SVD-CSP, we applied L1-SVD-CSP, the conventional CSP and the regularized CSP with Tikhonov regularization (TR-CSP) proposed in [16] to one public motor imagery EEG dataset, i.e., Dataset IVa of BCI Competition III, by adding the simulated outliers. This dataset consists of EEG signals recorded from five subjects using 118 electrodes [3, 22]. In each trial, a visual cue with respect to motor imagery was shown for 3.5 s, presenting three kinds of imageries, i.e., left hand, right hand and right foot. This dataset has no outlier contained, where the trials contaminated with obvious artifacts such as ocular, head movement have been discarded by the dataset provider [23]. The tasks for the imagery of right hand and left hand are adopted to evaluate the performance in current work. As for the concerned two tasks, the total number of EEG trials for each subject was 280, and the data were band-pass filtered between 0.05 and 200 Hz and down-sampled to 100 Hz [21]. Following the reported results in [24], the time interval between 0.5 s to 2.5 s after the trial onset was selected for task recognition.

#### Simulated outliers

As for above dataset, the trials are free of such outliers due to the ocular or the imperfect contact of electrodes, and we will add some simulated outliers to construct the datasets with outlier noise corruption with aim to quantatively evaluate the methodology performance.

Considering the intrinsic characteristics of outlier that usually has extremely large amplitude, we generated the outliers from the Gaussian distribution *N*(*μ* + 10*σ*, *Σ*) for each subject, where *μ* and *σ* denoted the mean and variance of EEG across all channels for the training and testing datasets, and Σ was the corresponding covariance matrix. We added the outliers to the dataset by varying the number of outliers from *0.01(m + n)* to *0.05(m + n)* with step of *0.01(m + n)*, where the trials and time position for outliers corruption are randomly determined, with *m* and *n* being the number of trials in the training and testing sets, respectively. The above strategy used for outlier generation is mainly to simulate the actual condition that recordings may be corrupted with outliers of different probabilities during experiment.

#### Evaluation index for simulation study

For this dataset, when the occurrence of outliers is defined, the outliers are randomly added for 50 times, and L1-SVD-CSP as will as the original CSP and TR-CSP are used to extract the related features. The most discriminative 3 pairs of optimal CSP spatial filters (i.e. 6 filters) in the projection matrix are selected to transform the band pass filtered EEG signal, and the logarithm of the variance of the transformed surrogate channel EEG signal serves as the final features for task recognition. Regularized parameter of TR-CSP is determined by the 10-fold cross-validation based on training set proposed in [16].

After features are extracted for each of the 50 runs, linear discrimination analysis (LDA) free of the setup parameter optimization is used for classification based on a 5-fold cross-validation, and the averaged accuracies of the 50 repetitions are used as the index to evaluate the performance. Based on the 50 repetitions, the McNemar test [25] is used to investigate whether the difference between each pair of the three CSPs is of statistical significance.

#### Evaluations on real BCI dataset

##### Real BCI dataset

The dataset comes from the MI BCI system developed in our group, consisting of EEG data from 13 subjects. During online experiment, subjects were required to sit in a comfortable armchair in front of a computer screen, and they were asked to perform motor imagery with left hand or right hand according to the instructions appeared on the screen. Motor imagery lasts for 5 seconds, and follows a 5 seconds rest. 15 Ag/AgCl electrodes covers sensorimotor area were used to record the EEG, and the signals were sampled with 1000 Hz and band pass filtered between 0.5 Hz and 45 Hz. 4 runs on the same day were recorded for each subject, with each run consisting of 50 trials, 25 trials for each class, and there is a 3 minutes break between the two consecutive runs. All experiments were performed in accordance with the Ethical Committee of University of Electronic Science and Technology of China (UESTC). Informed consent was obtained from all participants, according to the Declaration of Helsinki.

#### Preprocessing

All the EEG segments during motor imagery are selected for analysis, and those trials with outliers caused by ocular movement, head movement are involved in further analysis, without specific inspection to remove them. The specific frequency band for each subject is obtained by [26], and then used to design band pass filter for the EEG data. The LDA classifier still used the 6-dimensional features corresponding to the most discriminative three pairs of optimal CSP spatial filters as input for task recognition.