### Experimental Procedure and Data Collection

The same SEP data set tested in [9] is used in this study. Twenty-eight mature rats weighing between 260 and 280 grams were used. All the experimental procedures were performed under intravenous pentobarbital (0.05 mg/g) anaesthesia augmented by local 1% xylocaine infiltration. Additional pentobarbital was given at intervals and in amounts established in non-curarized rats to assure adequate anaesthesia.

To elicit cortical SEP, a constant current stimulator was used to apply a 5.1 Hz square wave 0.2 ms in duration to the hind paw. The current was set to cause mild twitch of limbs (3–10 mA). The cortical SEP was recorded from the skull at Cz-Fz. The signal was amplified 100,000 times with two amplifiers (SCXI-1120, National Instruments Co, TX, USA). Bandpass filtering between 2 Hz and 2000 Hz was used. All the SEP signals were acquired with a data acquisition card (DAQcard-1200, National Instruments Co, TX, USA) at 12 bit resolution and a sampling rate of 5000 Hz. To obtain a good SNR for the SEP signals, a total of 100 SEP responses were averaged for each trial.

In this study, all the following signal processing programs were developed in MATLAB environment (version 7.0, Mathworks, MA, USA) using the Pentium 4 PC platform (3.2G Hz, 1G bytes RAM).

### Matching Pursuit Decomposition

Given a discrete-time signal *x*(*n*), the MP method decomposes the signal *x*(*n*) into a linear combination of basic functions {*g*_{0}(*n*), *g*_{1}(*n*), ..., *g*_{M-1}(*n*)}, which are described by well-defined parameters from a very large and redundant dictionary **D**:

x(n)={\displaystyle {\sum}_{m=0}^{M-1}{a}_{m}{g}_{m}(n)}+e(n),

(1)

where *M* is the number of decomposed t-f components, *a*_{
m
}is the coefficient, *a*_{
m
}*g*_{
m
}(*n*) is the *m*-th t-f component and *e*(*n*) is the decomposition residue. Note that, in the MP algorithm, a signal's t-f components are in the form of the basis functions defined by the dictionary.

Although *x*(*n*) can be perfectly approximated using orthonormal functions such as the delta functions, we are interested in sparse approximation. That is, we aim to adaptively choose a finite number (*M*) of basic waveform functions, which are capable of explaining the signal's representative time-frequency features, from a redundant dictionary **D**. An optimal sparse approximation of the signal *x*(*n*) is obtained when the norm of the residue vector *e*(*n*) in (1) is minimized. It is generally achieved by an iterative algorithm as follows:

{\tilde{g}}^{(k)}(n)=\underset{{g}_{m}(n)\in \mathbf{D}}{\mathrm{arg}\mathrm{max}}\Vert \u3008{e}_{x}^{(k)}(n),{g}_{m}(n)\u3009\Vert ,

(2)

{e}_{x}^{(k+1)}(n)={e}_{x}^{(k)}(n)-\u3008{e}_{x}^{(k)}(n),{\tilde{g}}^{(k)}(n)\u3009{\tilde{g}}^{(k)}(n).

(3)

In the initial step, the waveform {\tilde{g}}^{(0)}(n) with the highest inner product with the signal *x*(*n*) [the initial residue {e}_{x}^{(0)}(n)=x(n)], which means that it accounts for the largest part of the signal energy, is found from the dictionary **D** to generate the main t-f component {a}^{(0)}{\tilde{g}}^{(0)}(n)=\u3008{\tilde{g}}^{(0)}(n),{e}_{x}^{0}(n)\u3009{\tilde{g}}^{(0)}(n). Then, the residual {e}_{x}^{(1)}(n) is computed as (3) and {\tilde{g}}^{(1)}(n) is obtained by matching it to {e}_{x}^{(1)}(n) as (2). The two steps are executed iteratively until some stopping criterion is reached. For instance, in this study the iteration will stop when decomposed components account for 99.5% of the signal energy.

The Gabor dictionary was employed in this study because Gabor functions exhibit good time-frequency localizations [14, 15]. A Gabor function is expressed as:

g(n)=K\cdot {e}^{-\pi {[(n-t)/s]}^{\text{2}}}\mathrm{cos}(2\pi f(n-t)+\varphi ),

(4)

where *K* is the normalization parameter to make ||*g*(*n*)|| = 1. In a Gabor function, {e}^{-\pi {[(n-t)/s]}^{\text{2}}} is the waveform envelope with center at time *t* and span described by *s*. The parameter *t* is the waveform's latency, which is defined as the time duration from the stimulus onset to the maximum of the waveform envelope. It can be seen that the basic functions used in MP algorithm are generated by dilating (with *s*), translating (with *t*), modulating (with *f*) and phase-shifting (with *φ*) an envelope function (a Gauss function). Therefore, the parameters to determine a Gabor function of (4) include *t* (latency), *f* (frequency), *s* (time span) and *φ* (phase). These parameters, together with *a* (amplitude) of (1), will constitute a parameter vector *u*= [*t*, *f*, *s*, *φ*, *a*]^{T}and will be used to characterize a decomposed t-f component *a*_{
m
}*g*_{
m
}(*n*).

Because the parameters are chosen from the Gabor dictionary **D**, the ideal size of the dictionary **D** should be infinite in theory to achieve perfect matching results. In practice, the dictionary **D** only includes limited candidate values to avoid heavy computational load, so the precision of the parameters is not high enough. This paper proposes to further refine the parameters by nonlinear least-squares (NLS) algorithm. We notice that, in fact, Eq. (2) is a NLS problem, and it can be solved by the Gauss-Newton method or the Levenberg-Marquardt method. The optimal parameters found by the greedy research can be used as the starting value for the NLS algorithms. By the NLS-based refining, the optimal parameters beyond the scope of **D** can be obtained and they have higher precision than the parameters obtained by the greedy search. The Gaussian-Newton NLS algorithm was adopted in this study and more details regarding the NLS algorithms were referred to [24].

### Identification and classification of t-f components

To further classify the t-f components and identify the stable SEP t-f components, a clustering method based on joint probability density function (PDF) estimation was used. The PDF-based clustering was employed in this study because it can handle the noise (i.e., the spurious and unstable t-f components) well [22]. However, because it is difficult to compute and illustrate PDF in a five-dimensional space defined by the Gabor parameter vector *u*= [*t*, *f*, *s*, *φ*, *a*]^{T}, a dimensionality reduction is first required. In this study, the five features are reduced to three: latency *t*, frequency *f*, and relative energy *ε*. These three features are employed because time-frequency analysis leads to a representation for the signal in the time-frequency-energy space. The energy of a t-f component is calculated as the sum of the squared magnitudes of the t-f component {E}_{m}={\displaystyle {\sum}_{n}{\left|{a}_{m}{g}_{m}(n)\right|}^{2}}={a}_{m}^{2}. That is, the energy of a t-f component is just the squared amplitude parameter, and the information on span and phase parameters is not included in the energy. Furthermore, the feature "relative energy" is introduced to make t-f components from different SEP signals comparable. For a t-f component decomposed from a SEP signal *x*(*n*), its relative energy was calculated as the ratio between the energy of the t-f component, *E*_{
m
}, to the total energy of signal *x*(*n*), *E*_{
total
}, i.e., *ε*_{
m
}= *E*_{
m
}/*E*_{
total
}, where {E}_{total}={\displaystyle {\sum}_{n}{\left|x(n)\right|}^{2}}. Accordingly, the joint PDF is calculated in the three-dimensional (3D) space described by the simplified parameter vector *v*= [*t*, *f*, *ε*]^{T}.

Suppose *N* t-f components in total are extracted from 28 rat SEP signals, and each component is described by a parameter vector *v*_{
i
}= [*t*_{
i
}, *f*_{
i
}, *ε*_{
i
}]^{T}, *i* = 1, 2, ..., *N*. The joint PDF in the 3D time-frequency-energy space was estimated by the conventional kernel density estimation algorithm using Gaussian kernel as:

P(v)={\displaystyle {\sum}_{i=1}^{N}\frac{\text{1}}{{\text{(2}\pi \text{)}}^{N\text{/2}}\mathrm{det}(\Delta )}{e}^{\text{-[}{(v-{v}_{i})}^{T}{\Delta}^{-1}(v-{v}_{i})]/2}},

(5)

where Δ is the bandwidth matrix to adjust the smoothness of the PDF and det(Δ) is the determinant of Δ. The bandwidth matrix should be carefully selected to avoid an under-smoothed or over-smoothed PDF. In our study, the bandwidth matrix was chosen as a diagonal matrix Δ = diag([*δt*, *δf*, *δε*]), where *δt*, *δf* and *δε* were optimally selected to minimize the asymptotic mean integrated squared error (AMISE) in respective dimension. More details of the kernel density estimation and optimal bandwidth selection can be found in [25].

All the peaks (local maxima with higher density than neighbouring points) in the 3D PDF were detected as potential locations of a cluster (a class of stable t-f components), and the region of a potential cluster is the peak region around one peak and with PDF values greater than *ξ* ‧ *P*_{
LocalMax
}, where *P*_{
LocalMax
}is the local peak value and *ξ* is a threshold parameter between 0 and 1. If the number of t-f components inside one peak region is larger than *η* ‧ *N*, where *η* is another threshold between 0 and 1 to determine the minimum occurrence of t-f components to confirm a cluster, these components were clustered into a class of stable SEP t-f component. Otherwise, the t-f components were recognized as unstable or spurious.

Two thresholds *η* and *ξ* work jointly to influence the clustering results. A smaller *η* means that the condition to be a cluster is becoming loose and more clusters can be discovered. A larger *η* tightens the condition so that fewer clusters will be identified. On the other hand, if the threshold *ξ* is larger, the region of a potential cluster will be smaller and fewer t-f components will be contained in the peak region. As a result, fewer clusters will be recognized when *ξ* become larger. On the contrary, a smaller *ξ* yields a broader region covering more t-f components and thus more clusters can be confirmed. However, if *ξ* is too small, these peak regions may overlap each other. In this situation, the peak region with a smaller peak value will be merged into the peak region with a larger peak value, which reduces the number of clusters. In addition, a too wide region implies unwanted large variances for parameters of t-f components in the peak region.

In our study, we first determined the threshold *η*, and *ξ* was chosen based on the given *η*. We believe a stable t-f component of SEP originates from an inherent response to the stimulus and it should occur in each single subject with similar properties. Even if the noise is very large, the stable t-f component should be detected in at least half of subjects. Thus, the threshold *η* was set as *η* = 0.5. As to the threshold *ξ*, its optimal value is difficult to be obtained in an analytical form. We tested and compared a series of values and set *ξ* as *ξ* = 0.5 because this value was capable of identifying a set of stable t-f components having concentrated regions in the feature space. Other threshold values far from *ξ* = 0.5 could not identify any cluster (*ξ* is too large) or could not identify clusters with concentrated regions (*ξ* is too small).

It should be also noted that some t-f components may be wrongly classified because of the dimension reduction. After detailed investigation on the experimental results, we found that a few anomaly values (outliers), which were quite different from data majority, often existed in parameters of clustered t-f components. The components with outlier parameters should be removed from the cluster because they had different nature from most other components in the cluster and they contributed considerably to the large SD values. In this paper, an easily-implemented outlier detection technique, the Grubbs' test (with significance level 0.05), was employed to identify the outliers [26]. If one parameter (or more) of a t-f component is far away from the rest in the cluster, the t-f component was labeled as an "outlier" and was removed from the cluster.