EMG-based facial gesture recognition through versatile elliptic basis function neural network

Background: Recently, the recognition of facial gestures from facial neuromuscular activity has been proposed for human machine interfacing applications. Facial electromyogram (EMG) analysis is a complicated field of biomedical signal processing in which accuracy and low computational cost are major concerns. In this paper, a very fast versatile elliptic basis function neural network (VEBFNN) is proposed to classify different facial gestures. The effectiveness of different facial EMG time-domain features is also explored to identify the most discriminating one.
Methods: EMGs of ten facial gestures were recorded from ten subjects using three pairs of surface electrodes in a bipolar configuration. The signals were filtered and segmented into distinct portions prior to feature extraction. Ten time-domain features, namely Integrated EMG, Mean Absolute Value, Mean Absolute Value Slope, Maximum Peak Value, Root Mean Square, Simple Square Integral, Variance, Mean Value, Wave Length, and Slope Sign Changes, were extracted from the EMGs. The statistical relationships between these features were investigated with the Mutual Information measure. Then, feature combinations including two to ten single features were formed based on the feature rankings assigned by the Minimum-Redundancy-Maximum-Relevance (MRMR) and Recognition Accuracy (RA) criteria. In the last step, VEBFNN was employed to classify the facial gestures. The effect of single features and feature sets on system performance was examined using two major metrics, recognition accuracy and training time. Finally, the proposed classifier was assessed against the conventional support vector machine (SVM) and multilayer perceptron neural network (MLPNN) methods.
Results: The average classification results showed that, among all single features and feature combinations, Maximum Peak Value achieved the best facial gesture recognition performance, with 87.1% accuracy.
Moreover, training VEBFNN was very fast: the training time did not exceed 0.105 seconds. The results also indicated that MRMR was a less suitable criterion than RA for building effective feature sets.
Conclusions: This work identified the most discriminating facial EMG time-domain feature for recognizing different facial gestures and suggests VEBFNN as a promising method for EMG-based facial gesture classification in the design of interfaces for human machine interaction systems.


Introduction
A recent report released by the World Health Organization (WHO) and the World Bank shows that more than one billion people with disabilities face substantial barriers in their daily lives [1]. To help these people, especially those with severe disabilities resulting from strokes, neurological diseases, and muscular dystrophy, human machine interaction (HMI) has been proposed as a promising way to improve their quality of life [2]. Controlling assistive devices such as wheelchairs [3] and prosthetic limbs [4] are examples in this area. Designing such devices requires reliable interfaces that act as a communication channel between humans and machines. Interfaces that rely on facial neuromuscular activity generated by facial gestures have recently been suggested. The goal is to recognize facial gestures from facial EMG signals and translate them into input commands that control the devices. The most recent approaches include: extracting three facial gestures during speech via four recording channels and transforming them into control commands [5]; controlling a hands-free wheelchair using five different facial myosignals [6]; using five facial gestures to design and control a virtual crane training system [7]; enhancing human computer interaction with six facial muscle EMG recordings acquired through eight superficial sensors [8]; using EMG- and vision-based HMI to control an intelligent wheelchair [9]; and controlling an electric wheelchair using six surface facial EMGs [10]. The reliability and flexibility of these systems depend directly on the number of classes (gestures) and on the methods used to analyze facial gesture EMGs.
EMG signals are stochastic and non-stationary, which makes their analysis complex [11]; thus, much investigation is still needed. Noise reduction, conditioning, smoothing, data windowing, segmentation, feature extraction, dimensionality reduction, and classification are the common stages of EMG pattern recognition. The facial gesture recognition rate depends mainly on the effectiveness of the EMG features and the classification algorithm, which are the focus of this paper.
To discriminate different muscle movements (gestures), the most prominent properties of the EMGs (features), i.e., those carrying enough information for classification, should be extracted. Various types of features, such as time-domain features, autoregressive coefficients, cepstral coefficients, and wavelet coefficients, have been applied to classify upper-limb EMG signals [12]. Other types of EMG features have been used in different applications [13][14][15]. According to previous studies on facial EMG signals, there are restrictions on analyzing them through their spectra: because the frequency components of different facial EMGs are similar, frequency-domain and time-frequency distribution algorithms cannot be used to classify facial gestures [16,17]. These methods are applicable only to muscle fatigue studies and to inferring changes in motor unit recruitment [18]. Time-domain characteristics are more appropriate for facial EMGs because they are easy to compute, are based on signal amplitudes, and offer high stability for EMG pattern recognition [16,19]. There are several time-domain feature extraction methods; however, to achieve good results, a feature must contain enough information to represent the significant properties of the signal while remaining simple enough for fast training and classification. The extracted features must then be learned by a classifier and assigned to distinct categories. Hence, a suitable classifier must be chosen to provide fast processing and accurate results. Table 1 reviews related studies on EMG-based facial gesture recognition systems.
In these studies, the number of classes and recording channels varied, and different facial gestures were considered. As can be seen from the table, only a few methods have been investigated for feature extraction and classification. Since this field is still at an early stage, it needs much more investigation.
Since little work has been reported on facial EMG analysis, this paper uses the same setup as [23] to further investigate the impact of different facial EMG features on facial gesture classification. Accordingly, the EMGs of ten facial gestures were characterized by extracting ten different time-domain features. The relationships between these features were examined with the Mutual Information (MI) measure. Moreover, MRMR and RA were employed to select and rank the features for constructing feature combinations.
Classifying the features with a fast, reliable, and accurate algorithm was another objective of this paper. Accordingly, a VEBFNN was applied to classify single features and feature combinations and to evaluate their effectiveness, in order to find the most discriminative one based on recognition performance and training time. Furthermore, the efficiency and robustness of this classifier for facial myoelectric signal classification were assessed by comparing it with the conventional SVM and multilayer perceptron neural network (MLPNN) methods.
The rest of this paper is organized as follows. The next section describes the materials used to record facial EMGs. Then, the methodology for analyzing the EMG signals is explained. Subsequently, experimental results, including statistical analysis and detailed discussions, are presented. Finally, a brief summary and recommendations for future work are given in the last section.

Methods and materials
The procedure of the current study was divided into several steps, as shown in Figure 1. The first step consisted of subject preparation, electrode placement, system setup, and EMG signal acquisition. Then, all recorded signals were conditioned and filtered prior to processing. Data windowing and segmentation were applied in the preprocessing step. Afterwards, ten different time-domain features were extracted from all EMG signals. Subsequently, the correlation between features was analyzed through MI measures, and feature combinations were constructed using two criteria, MRMR and RA. A very fast VEBFNN was used to train on and classify the features; this was the first time this algorithm had been employed to classify EMG signals. Finally, the experimental results were discussed to evaluate the effectiveness of each feature or combination and to find the most discriminative and accurate one, i.e., the one delivering the highest facial gesture recognition performance at the lowest computational load. Moreover, the efficiency of VEBFNN was assessed and compared with two other popular supervised classifiers, SVM and MLPNN.

Facial EMG acquisition
Subject preparation and electrode placement
EMGs are known to be among the most contaminated signals, with a low signal-to-noise ratio [11]. To obtain clean EMGs, several precautions were taken before recording. The subject's skin was cleaned with alcohol pads to remove dust and sweat and to reduce the fat layer. In addition, to obtain signals with higher amplitudes, the electrodes were placed on the appropriate sites [25]. EMGs were recorded through three channels using three pairs of round, pre-gelled Ag/AgCl surface electrodes. The first and third channels were placed over the left and right temporalis muscles, and the second channel was positioned over the frontalis muscle above the eyebrows (Figure 2). The electrodes were arranged in a bipolar configuration (2 cm inter-electrode distance) over the recording areas to reduce the noise common to both electrodes of each pair. Another electrode was placed on the bony part of the left wrist to eliminate motion artifacts.

System setup and data acquisition
The protocol of this experiment was approved by the Universiti Teknologi Malaysia Human Ethics Research Committee. Facial EMGs were captured with a BioRadio 150 (Clevemed) at a sampling frequency of ~1000 Hz. Unwanted artifacts from user movements and power-line interference were removed by the device software itself, using a filter with a low cut-off frequency of 0.1 Hz and a 50 Hz notch filter. Ten mentally and physically healthy volunteers (five male and five female, aged 26 to 41) were chosen for this work. Before data recording, all participants were trained to make the facial gestures. The gestures considered in this study were: smiling with both sides of the mouth, smiling with the left side of the mouth, smiling with the right side of the mouth, opening the mouth (saying 'a' as in the word apple), clenching the molars, gesturing 'notch' by raising the eyebrows, frowning, closing both eyes, closing the right eye, and closing the left eye. The subjects were asked to perform each facial gesture five times for two seconds (active signal), with 5 seconds of rest in between to eliminate the effect of muscle fatigue. Since the active part is the only useful portion of a signal for discriminating and recognizing different facial gestures, only 10 seconds (5 × 2 s) were considered for processing each gesture. Moreover, the signals were recorded synchronously on the three channels, resulting in a three-dimensional data set (3 × 10 s) for each gesture. Therefore, ten sets of 3 × 10 s active signals were obtained from each subject, who performed ten gestures.

EMG filtration and conditioning
To retain the most significant part of the signal spectrum, the signals were passed through a band-pass filter in the range of 30-450 Hz [7].
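A minimal sketch of the 30-450 Hz band-pass step, assuming a sampling rate of exactly 1000 Hz. The paper does not specify the filter design (type, order, or phase behavior), so the hypothetical helper fft_bandpass below uses an idealized brick-wall filter in the frequency domain as a stand-in, not the authors' filter.

```python
import numpy as np

def fft_bandpass(x, fs=1000.0, low=30.0, high=450.0):
    """Idealized band-pass: zero every FFT bin outside [low, high] Hz (hypothetical stand-in)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Remove slow drift (< 30 Hz) and high-frequency content (> 450 Hz).
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=x.size)

# Usage: 1 s of a 5 Hz drift plus a 100 Hz component, sampled at 1000 Hz.
fs = 1000.0
t = np.arange(1000) / fs
raw = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 100 * t)
filtered = fft_bandpass(raw, fs)  # only the 100 Hz component survives
```

Zeroing FFT bins is convenient for offline analysis of fixed-length recordings, but it is not causal, so a streaming HMI system would more likely use an IIR or FIR filter instead.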

Data windowing and segmentation
Because of the large amount of data available for processing, the most essential characteristics of the facial EMGs (features) should be extracted and used for further processing. Prior to feature extraction, the filtered signals were segmented into non-overlapping windows of 256 ms [26]. Since each channel contained a 10000 ms signal, 39 portions (10000 ÷ 256 ≈ 39) were obtained and prepared for feature extraction.
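The windowing arithmetic can be checked with a short sketch. It assumes a sampling rate of exactly 1000 Hz (so 256 ms corresponds to 256 samples) and that the trailing 16 samples that do not fill a complete window are discarded; the paper does not say how this remainder was handled, and the helper name segment is hypothetical.

```python
import numpy as np

def segment(signal, fs=1000.0, window_ms=256):
    """Split a 1-D signal into non-overlapping windows; a trailing partial window is dropped."""
    win = int(round(window_ms * fs / 1000.0))  # 256 samples at 1000 Hz
    n_windows = len(signal) // win             # 10000 // 256 = 39
    return np.asarray(signal[: n_windows * win]).reshape(n_windows, win)

# Synthetic stand-in for 10 s (5 x 2 s) of active EMG from one channel.
active = np.random.default_rng(0).standard_normal(10_000)
windows = segment(active)
print(windows.shape)  # (39, 256)
```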

Feature extraction
Feature extraction is an essential step in EMG processing that directly affects final system performance. Good features should highlight the most important properties and characteristics of the facial EMG signal, and they should have low computational cost so that they can be used in real-time applications. As mentioned earlier, a number of features with various complexity and efficiency have been suggested and used for EMG signals. In this paper, the ten time-domain features extracted from the segmented signals are listed in Table 2. Since the EMGs were segmented into 39 portions, 39 feature values were extracted for each gesture in each channel. Considering all three channels, each subject yielded, for each feature method, a set of 390 three-dimensional feature vectors (39 portions × 10 gestures).
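The features are simple to compute per window. The sketch below uses the common textbook definitions of these time-domain features, which may differ in detail (for example, the normalization of VAR or the signal units assumed by the SSC threshold ε = 0.02) from the authors' implementation; the function name time_domain_features is hypothetical. MAVS needs the following window, so it is computed only when next_seg is supplied.

```python
import numpy as np

def time_domain_features(seg, next_seg=None, eps=0.02):
    """Time-domain features of one EMG window using common textbook definitions."""
    seg = np.asarray(seg, dtype=float)
    absx = np.abs(seg)
    mav = absx.mean()
    feats = {
        "IEMG": absx.sum(),                       # sum of absolute values
        "MAV": mav,                               # mean absolute value
        "MPV": absx.max(),                        # maximum absolute peak value
        "RMS": np.sqrt(np.mean(seg ** 2)),        # root mean square
        "SSI": np.sum(seg ** 2),                  # simple square integral (energy)
        "VAR": np.var(seg, ddof=1),               # variance around the window mean
        "MV": seg.mean(),                         # mean value
        "WL": np.sum(np.abs(np.diff(seg))),       # cumulative waveform length
    }
    # SSC: count interior samples whose slope changes sign by at least eps.
    prod = (seg[1:-1] - seg[:-2]) * (seg[1:-1] - seg[2:])
    feats["SSC"] = int(np.sum(prod >= eps))
    # MAVS: MAV of the next window (k + 1) minus MAV of this window (k).
    if next_seg is not None:
        feats["MAVS"] = np.abs(np.asarray(next_seg, dtype=float)).mean() - mav
    return feats

f = time_domain_features([1.0, -2.0, 3.0, -1.0], next_seg=[2.0, 2.0, 2.0, 2.0])
```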
To investigate the correlation between the single features, their statistical dependence was measured in the form of MI, which is a more general measure than simple cross-correlation [27]. MI is an entropy-type quantity that measures the amount of information one random variable contains about another. It can be thought of as the reduction in uncertainty about one random variable given knowledge of the other. Thus, the more mutual information two random variables A and B share, the less uncertainty remains about A given B (or about B given A); zero mutual information means the variables are independent [28]. Given two features A and B, their MI is computed by

$$I(A;B) = \sum_{a}\sum_{b} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)}$$

where p(a, b) is the joint probability distribution function of A and B, and p(a) and p(b) are the marginal probability density functions of A and B, respectively.

Table 2 Time-domain features considered in this study (x_i is the ith sample of segment k, N is the segment length, and x̄ is the segment mean)
Feature | Equation | Description
MAV | $\mathrm{MAV}_k = \frac{1}{N}\sum_{i=1}^{N} |x_i|$ | It sums the absolute values of all samples in a segment and divides by the segment length.
MAVS | $\mathrm{MAVS}_k = \mathrm{MAV}_{k+1} - \mathrm{MAV}_k$ | It estimates the difference between the mean absolute values of the adjacent segments k + 1 and k.
RMS | $\mathrm{RMS}_k = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$ | It is modeled as an amplitude-modulated Gaussian random process whose RMS is related to constant-force, non-fatiguing contraction.
VAR | $\mathrm{VAR}_k = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$ | It measures how far the values in each segment lie from the mean.
WL | $\mathrm{WL}_k = \sum_{i=1}^{N-1} |x_{i+1} - x_i|$ | It is the cumulative length of the waveform over the segment; the resulting value reflects waveform amplitude, frequency, and duration.
IEMG | $\mathrm{IEMG}_k = \sum_{i=1}^{N} |x_i|$ | It sums the absolute values of the EMG signal (a signal power estimator).
SSC | $\mathrm{SSC}_k = \sum_{i=2}^{N-1} f\big[(x_i - x_{i-1})(x_i - x_{i+1})\big]$, where $f(z) = 1$ if $z \ge \varepsilon$ and $0$ otherwise | Given three consecutive samples x_{i-1}, x_i, and x_{i+1}, the slope sign change count is incremented if the condition is satisfied; the threshold is ε = 0.02.
MV | $\mathrm{MV}_k = \frac{1}{N}\sum_{i=1}^{N} x_i$ | It represents the EMG potential from any shift in the mean value.
SSI | $\mathrm{SSI}_k = \sum_{i=1}^{N} x_i^2$ | It determines the energy of the EMG in each segment.
MPV | $\mathrm{MPV}_k = \max_i |x_i|$ | It finds the maximum absolute peak value of the EMG.

It has been shown that a combination of several single features can achieve better recognition accuracy if the features provide complementary information [29]. In this work, combinations including two to ten features were constructed based on two feature selection concepts. In pattern recognition, feature selection aims to identify the subsets of data that are relevant to, and best characterize the statistical properties of, a target classification variable; this is normally called Maximum Relevance [30]. These subsets often contain features that are relevant but redundant. Among the common measures between features, such as similarity or correlation coefficients, MI can represent both relevance and redundancy [30]. The MRMR technique, which uses MI for feature selection, was first proposed by Peng et al. [30]. The relevance of a feature set A to the class C is defined as the average of all MI values between the individual features f_i and the class C:

$$D(A, C) = \frac{1}{|A|}\sum_{f_i \in A} I(f_i; C)$$

and the redundancy of all features in the set A is computed by:

$$R(A) = \frac{1}{|A|^2}\sum_{f_i, f_j \in A} I(f_i; f_j)$$

Then, MRMR is achieved by

$$\max_{A}\; \big[D(A, C) - R(A)\big]$$

In addition to MRMR, the single features were also selected and ranked by their individual power in terms of RA. Accordingly, feature combinations were constructed using the rankings assigned by MRMR as well as by RA. As stated earlier, each single feature had 3 dimensions (three channels), so the dimensions of the constructed combinations including 2, 3, 4, 5, 6, 7, 8, 9, and 10 features were 6, 9, 12, 15, 18, 21, 24, 27, and 30, respectively. For instance, the feature set for the single feature MPV was [mpv_ch1, mpv_ch2, mpv_ch3]^T, while the feature set including the two features MPV and MAV was [mpv_ch1, mpv_ch2, mpv_ch3, mav_ch1, mav_ch2, mav_ch3]^T.
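The paper does not state how MI was estimated or how the MRMR search was carried out. The sketch below assumes equal-width histogram discretization (8 bins, an arbitrary choice) for a plug-in MI estimate, and it uses the incremental (greedy) search of Peng et al., which at each step adds the feature maximizing its relevance I(f; C) minus its mean MI with the already-selected features, instead of searching all subsets. All function names are hypothetical.

```python
import numpy as np

def discretize(x, bins=8):
    """Equal-width binning of a continuous feature into integer codes 0..bins-1."""
    edges = np.histogram_bin_edges(x, bins=bins)
    return np.digitize(x, edges[1:-1])

def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits for two discrete (integer-coded) arrays."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal p(a)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal p(b)
    nz = p_ab > 0                          # 0 * log 0 is taken as 0
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a * p_b)[nz])))

def mrmr_rank(X, y, bins=8):
    """Greedy MRMR ranking of the columns of X (difference form: relevance - redundancy)."""
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(X.shape[1])])
    relevance = np.array([mutual_information(Xd[:, j], y) for j in range(X.shape[1])])
    ranking = [int(np.argmax(relevance))]
    remaining = sorted(set(range(X.shape[1])) - set(ranking))
    while remaining:
        scores = [relevance[j] - np.mean([mutual_information(Xd[:, j], Xd[:, s]) for s in ranking])
                  for j in remaining]
        best = remaining[int(np.argmax(scores))]
        ranking.append(best)
        remaining.remove(best)
    return ranking

# Toy data: 4 classes; feature 1 duplicates feature 0, feature 2 carries the other bit of the label.
y = np.repeat(np.arange(4), 100)
b0, b1 = (y // 2).astype(float), (y % 2).astype(float)
X = np.column_stack([b0, b0.copy(), b1])
ranking = mrmr_rank(X, y)  # expected ranking: [0, 2, 1]
```

In the toy example all three features are equally relevant, but the duplicate is pushed to last place because of its redundancy with feature 0, which is the behavior MRMR is designed to produce.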

Data classification
To recognize the facial gestures considered, the extracted features must be classified into distinct classes. A classifier must be able to cope with factors that remarkably affect EMG patterns over time, such as the intrinsic variation of EMG signals, electrode positions, sweat, and fatigue. More importantly, a proper classifier has to classify novel patterns accurately during online training with very low computational cost, to meet the real-time processing constraints that are a major prerequisite of HMI systems. Neural network-based classifiers have been reported to address these concerns appropriately for myoelectric feature classification [31]. In this study, a VEBFNN was employed to classify the facial EMG features. This method was proposed by Saichon Jaiyen, and its robustness was verified and validated on various data sets [32]. The main advantage of this supervised network is that it can learn a data set accurately in only one epoch and discards each datum once it has been processed, which makes it well suited to learning incoming patterns during online training. As reported, this training procedure is very fast compared with traditional neural networks such as MLPNN, and it needs only a small amount of memory [32]. The algorithm was also used to evaluate the effect of each facial EMG feature on system performance. The structure of this network, depicted in Figure 3, is the same as that of an RBF neural network and consists of three layers. In the input layer, the number of neurons equals the dimension of the feature vector, which was three for single features in this study: x_i, i = 1, 2, 3. The hidden layer, whose number of neurons was not defined in advance because neurons are formed during training, was divided into ten sub-hidden layers (the number of classes in the training data). The number of neurons in the output layer was also equal to the number of classes in the training data set (ten neurons).
The basis function of the neurons in the hidden layer is a hyperellipsoid, and the output of the kth neuron in the hidden layer for a given input X = [x_1, x_2, x_3]^T is calculated by the following equation:

$$\psi_k(X) = \sum_{i=1}^{3} \frac{\big[(X - C)^T u_i\big]^2}{a_i^2} - 1$$

This equation describes a 3-dimensional hyperellipsoid centered at C = [c_1, c_2, c_3]^T and oriented along the orthonormal basis {u_1, u_2, u_3}, which enables the neuron to cover neighboring data without translation or any change of size. The width of this hyperellipsoid along each axis is a_i, i = 1, 2, 3.
Since the input feature vectors of each sample lie in ℜ^3, their coordinates are expressed in the standard orthonormal basis [1, 0, 0]^T, [0, 1, 0]^T, and [0, 0, 1]^T. Therefore, the component x_i of an input vector X with respect to the new axes is computed by x_i = X^T u_i. Rotating the axes along the orthonormal basis vectors enables the neurons to cover nearby data without increasing their radius. Figure 4(a) shows how a VEBF neuron adjusts itself to cover new data, and Figure 4(b) shows the final position of the neuron.
As mentioned earlier, a feature set of size 3 × 390 (3 being the number of channels) was obtained in the feature extraction step for each subject and each feature method. For classification, each data set was shuffled and then divided into 300 and 90 three-dimensional feature vectors for the training and testing stages, respectively.
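The data layout can be illustrated with a short sketch that stacks the three channel columns of each selected feature (in the order [f1_ch1, f1_ch2, f1_ch3, f2_ch1, ...] used for the feature combinations) and then performs a shuffled 300/90 split. The feature values here are random placeholders, and build_feature_set and shuffle_split are hypothetical helpers, not the authors' code.

```python
import numpy as np

def build_feature_set(per_feature, names):
    """Concatenate the 3-channel columns of the chosen features in ranking order."""
    return np.hstack([per_feature[name] for name in names])

def shuffle_split(X, y, n_train=300, seed=0):
    """Shuffle the samples, then use the first n_train for training and the rest for testing."""
    idx = np.random.default_rng(seed).permutation(len(X))
    return X[idx[:n_train]], y[idx[:n_train]], X[idx[n_train:]], y[idx[n_train:]]

# Placeholder data for one subject: 390 samples (39 windows x 10 gestures), 3 channels per feature.
rng = np.random.default_rng(0)
per_feature = {name: rng.random((390, 3)) for name in ["MPV", "MAV"]}
labels = np.repeat(np.arange(10), 39)
X = build_feature_set(per_feature, ["MPV", "MAV"])  # 2 features x 3 channels = 6 columns
X_tr, y_tr, X_te, y_te = shuffle_split(X, labels)
```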
The orthonormal basis was computed from the eigenvectors of the covariance matrix. Since the training data were presented to the network one by one, the mean vector and covariance matrix were computed recursively. For N samples (300 for each feature set) X = {x_1, x_2, …, x_N}, with x_j ∈ ℜ^3, j = 1, …, N, the mean vector is updated by

$$\mu_{new} = \frac{N\,\mu_{old} + X_{N+1}}{N+1}$$

where μ_old is the mean vector of the data set X and X_{N+1} is the new data vector added to X. Then the covariance matrix was updated as follows:

$$S_{new} = \frac{N}{N+1}\,S_{old} + \frac{N}{(N+1)^2}\,(X_{N+1} - \mu_{old})(X_{N+1} - \mu_{old})^T$$

To find the orthonormal basis of the VEBF, principal component analysis was used: the eigenvalues {λ_1, λ_2, λ_3} and the corresponding eigenvectors {u_1, u_2, u_3} were computed from the resulting covariance matrix. Since the covariance matrix is symmetric, these eigenvectors are mutually orthogonal and, once normalized, form the orthonormal basis. The training procedure is described below.
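The recursive update can be checked numerically. This sketch assumes the population (1/N) covariance, which is consistent with starting a neuron at S = 0 and N = 1, and obtains the orthonormal basis with NumPy's symmetric eigensolver; update_stats is a hypothetical helper name.

```python
import numpy as np

def update_stats(mean, cov, n, x):
    """Add one sample x to a running mean and population covariance computed from n samples."""
    x = np.asarray(x, dtype=float)
    d = x - mean                                   # deviation from the *old* mean
    new_mean = (n * mean + x) / (n + 1)
    new_cov = (n / (n + 1)) * cov + (n / (n + 1) ** 2) * np.outer(d, d)
    return new_mean, new_cov, n + 1

rng = np.random.default_rng(1)
data = rng.standard_normal((300, 3))
mean, cov, n = data[0].copy(), np.zeros((3, 3)), 1  # first sample: S = 0, N = 1
for x in data[1:]:
    mean, cov, n = update_stats(mean, cov, n, x)

# Columns of U are the eigenvectors u_1, u_2, u_3 (orthonormal because cov is symmetric).
eigvals, U = np.linalg.eigh(cov)
```

The recursive result matches the batch mean and covariance, so each sample can be discarded after it is processed, which is what allows VEBFNN to train in a single pass.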

Training procedure
Consider X = {(x_j, t_j) | 1 ≤ j ≤ N}, a set of N = 300 training data, where x_j is a feature vector (x_j ∈ ℜ^3) and t_j is its target. Let Ω = {Ω_k | 1 ≤ k ≤ K} be the set of K neurons currently in the network. Each neuron has five parameters Ω_k = (C_k, S_k, N_k, A_k, d_k), where C_k is the center of the kth neuron, S_k is its covariance matrix, N_k is the number of data assigned to it, A_k is its width vector, and d_k is its class label. The whole training procedure can be summarized in the following six steps:
1) The width vector was initialized. Since three-dimensional feature vectors were used in the current study, a sphere with a radius of 0.5 was considered for simplicity, i.e., A_0 = [0.5, 0.5, 0.5]^T.
2) The network was fed with a training pair (x_j, t_j). When no neuron was in the network (K = 0), K was set to K + 1 and a new neuron Ω_K was created with the parameters C_K = x_j, S_K = 0, N_K = 1, d_K = t_j, and A_K = A_0; the training datum was then discarded. If K ≠ 0, the nearest hidden neuron Ω_k ∈ Ω with the same class label (d_k = t_j) was found, k = arg min_l ‖x_j − C_l‖, l = 1, 2, …, K, and its mean vector and covariance matrix were temporarily updated with x_j.
3) The orthonormal basis of Ω_k was calculated from its updated covariance matrix.
4) The output of the kth neuron was computed by
$$\psi_k(x_j) = \sum_{i=1}^{3} \frac{\big[(x_j - C_k)^T u_i\big]^2}{a_i^2} - 1$$
If ψ_k(x_j) ≤ 0, the neuron covered the datum, so its temporary parameters were adopted as its fixed parameters. Otherwise, if ψ_k(x_j) > 0, a new neuron was created.
5) Since new neurons can be added to the network automatically and these neurons may lie very close together, a merging strategy was used to prevent the network from growing to its maximum size (one neuron per datum). The details of this strategy are explained in [32].
6) If there were more training data, the algorithm was repeated from Step 2; otherwise, the procedure was finished.
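Steps 1 to 6 can be sketched as a single-pass training loop. This is a simplified illustration under stated assumptions, not the original VEBFNN implementation: the widths stay fixed at A_0, the merging strategy of Step 5 is omitted, a new neuron is also created when no neuron of the sample's class exists yet, and the prediction rule (label of the neuron with the smallest ψ) is a hypothetical choice. All names are hypothetical.

```python
import numpy as np

def psi(x, center, basis, widths):
    """Hyperellipsoid output sum_i ((x - C)^T u_i)^2 / a_i^2 - 1; <= 0 means x is covered."""
    proj = (x - center) @ basis  # column i of basis is u_i
    return float(np.sum((proj / widths) ** 2) - 1.0)

class Neuron:
    """Hidden neuron with center C, covariance S, count N, widths A and class label d."""
    def __init__(self, x, label, radius=0.5):
        self.center = np.array(x, dtype=float)
        self.cov = np.zeros((self.center.size, self.center.size))
        self.count = 1
        self.widths = np.full(self.center.size, radius)  # Step 1: sphere of radius 0.5
        self.label = label

def train_vebf(X, t, radius=0.5):
    """Single pass over the data; each sample is used once. Merging (Step 5) is omitted."""
    neurons = []
    for x, label in zip(np.asarray(X, dtype=float), t):
        same_class = [nr for nr in neurons if nr.label == label]
        if not same_class:  # K = 0, or no neuron of this class yet (assumption)
            neurons.append(Neuron(x, label, radius))
            continue
        nr = min(same_class, key=lambda cand: np.linalg.norm(x - cand.center))
        # Step 2: temporary recursive update of the nearest same-class neuron.
        n, d = nr.count, x - nr.center
        mean = (n * nr.center + x) / (n + 1)
        cov = (n / (n + 1)) * nr.cov + (n / (n + 1) ** 2) * np.outer(d, d)
        basis = np.linalg.eigh(cov)[1]  # Step 3: orthonormal basis
        if psi(x, mean, basis, nr.widths) <= 0:  # Step 4: covered, keep the update
            nr.center, nr.cov, nr.count = mean, cov, n + 1
        else:  # not covered: create a new neuron at x
            neurons.append(Neuron(x, label, radius))
    return neurons

def predict(neurons, x):
    """Hypothetical decision rule: label of the neuron with the smallest psi."""
    x = np.asarray(x, dtype=float)
    return min(neurons, key=lambda nr: psi(x, nr.center, np.linalg.eigh(nr.cov)[1], nr.widths)).label

# Two tight synthetic clusters standing in for two gesture classes.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.05, (50, 3)), rng.normal(3.0, 0.05, (50, 3))])
t = [0] * 50 + [1] * 50
neurons = train_vebf(X, t)
```

Because each sample either updates one neuron or spawns a new one, training cost grows linearly with the number of samples, which is consistent with the short training times reported for VEBFNN.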

Results and discussion
This section discusses the results of several experiments conducted during this study. First, the classification (training) and recognition (testing) accuracies achieved by VEBFNN for each feature over all subjects are presented. The impact of each feature on the performance of the recognition system is investigated and compared with the others. The computational load of the training stage for each feature is examined. The effect of each feature on the recognition of each facial gesture is explored. The sensitivity and stability over all subjects of the single features with high discrimination ratios are compared. The performances achieved by the most accurate and the least accurate features are visualized in confusion matrices. Statistical relationships between the EMG features are investigated through MI measures. The feature combinations constructed from the features selected by MRMR and RA are examined in terms of recognition accuracy and training time. In the last experiment, the efficiency and reliability of the VEBFNN algorithm are validated by comparison with two conventional classifiers, SVM and MLPNN.
Table 3 presents the classification and recognition accuracies obtained by VEBFNN for all features and participants. As can be seen, VEBFNN was trained well with all features, since the average classification accuracy over all subjects was above 90% for each feature. The highest training accuracy was achieved by MAV (98.5%). On the other hand, the testing results showed that the ability of VEBFNN to recognize facial gestures varied with the type of feature used. For instance, although WL reached a training accuracy of 92.8%, its average recognition accuracy was only 24.5%. The Maximum (Test) and Minimum (Test) entries indicate the best and the worst features for each participant based on their test performance.
Subjects 1, 2, 3, 6, 7, and 8 reached their maximum recognition performance using the MPV feature; subjects 4, 5, and 9 achieved their highest accuracy with IEMG; and subject 10 obtained the best results with RMS. Figure 5 shows the classification accuracy for all features averaged over all subjects, illustrating how different features affect recognition performance. As can be observed, using different features did not result in significant differences in training performance; in other words, all features were almost equally effective for training VEBFNN. In contrast, the test results, which reflect the real performance, showed noticeable changes in recognition accuracy across features. The figure shows that MAV, MAVS, RMS, IEMG, SSI, and MPV were discriminative and reliable features that contained essential information for classifying facial states. Among them, MPV attained the best performance, with a mean recognition accuracy of 87.1% and a standard deviation of 1.1% over all subjects, whereas WL obtained the lowest result, with 24.5% recognition accuracy. Table 3 also emphasizes the robustness of MPV and the weakness of WL through their Mean Absolute Error values over all subjects, 12.9% and 75.5%, respectively; therefore, they were selected as the most and the least accurate features. The distribution of these two features in the feature space is shown in Figure 6. The classes (gestures) were well discriminated by the MPV features, whereas with the WL features the classes were mixed and could not be distinguished from each other.
G1-G10 represent the following facial gestures: opening the mouth (saying 'a' in the word apple), clenching the molars, gesturing 'notch' by raising the eyebrows, closing both eyes, closing the left eye, closing the right eye, frowning, smiling with both sides of the mouth, smiling with left side of the mouth and smiling with right side of the mouth.

Computational load
Computational cost during training is an important factor in designing interfaces, especially for real-time applications. As can be seen in Figure 5, the training time was less than a second for every feature; specifically, the maximum was 0.105 seconds, when training with MPV and SSI. Overall, this experiment showed that VEBFNN was trained very fast with all of the EMG time-domain features considered, indicating that the computational cost of this classifier depends only weakly on the choice of feature. Hence, recognition accuracy was a more reliable metric for comparing the capability of features for facial gesture recognition.
Figure 5 Classification accuracy of the training/testing procedures for all features averaged over all subjects, and the time consumed during the training stage.

Effectiveness of features on recognition of each facial gesture
In this experiment, we investigated the effectiveness of different features for recognizing each facial gesture with the VEBFNN algorithm (Table 4). As can be seen, the best features for recognizing the facial gestures were: MV for G1; MPV for G2, G3, and G4; MAV, MAVS, IEMG, and MPV for G5; MAV and RMS for G6; MAV and MPV for G7; IEMG for G8; MAV and MAVS for G9; and IEMG for G10. According to this table, G3, G5, G7, G9, and G10 were recognized with 100% accuracy using different features. Moreover, G5 was the most distinguishable gesture, since it was accurately recognized with four features, whereas G1 was poorly detected with all features. MPV provided the highest accuracy for more gestures (5 out of 10) than any other feature. Therefore, it can be selected as the most proficient feature for single gesture recognition, while VAR was not effective enough, since it resulted in the lowest accuracies for recognizing G2, G6, G8, and G9. Table 4 also indicates that the same feature led to different classification ratios for G1-G10. This may be caused by various factors, such as differences in the involvement of muscles with a minor role in shaping each facial gesture; the signal magnitude of the muscles, which depends on the number of motor units (muscle fibers + motor neuron) and their firing rate; the action potentials resulting from different muscle movements; the signaling source of the facial gestures; and the innervation ratio of the muscles [33].

Analytical comparisons of features over subjects
Further analysis was carried out to understand the distribution of the accuracies obtained by VEBFNN over all participants for the features that provided high discrimination ratios. Figure 7 shows that MAV and IEMG had almost the same degree of dispersion, since their interquartile ranges were limited to a similar range. MPV formed a short box, which means that all subjects reached similar recognition ratios with this feature. In contrast, the long spread of accuracies for RMS indicates the high sensitivity of this feature across subjects. The symmetric boxes for RMS, IEMG, and SSI indicate that the accuracies achieved by different subjects split evenly around the median. The most significant point of the figure is the position of the MPV median, which shows that the recognition accuracy exceeded 87% for at least 5 subjects.

Performance visualization by confusion matrix
The training and testing performances of VEBFNN on the best and the worst single features are visualized as confusion matrices in Tables 5(a) and 5(b), respectively. These tables illustrate how MPV and WL samples were classified and misclassified during training and testing for all facial gestures. As indicated, the most significant interaction in Table 5(a) occurred between G1 and G8: in the training stage, 4.3% of the G1 samples were misclassified as G8. This affected the testing stage, in which only 36.7% of the G1 data were recognized correctly. The reason was the similar signaling source of these two gestures. Table 5(b) shows extensive interactions between all gestures during both training and testing, which emphasizes the weakness of WL for discriminating facial gestures.
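Percentages such as the 4.3% of G1 samples misclassified as G8 come from a row-normalized confusion matrix. The sketch below assumes rows are true gestures and columns are predicted gestures (the orientation of Table 5 is not stated) and encodes G1-G10 as 0-9; confusion_percent is a hypothetical helper.

```python
import numpy as np

def confusion_percent(true, pred, n_classes=10):
    """Row-normalized confusion matrix in percent.

    Entry [i, j] is the percentage of samples of gesture i (row) that were
    predicted as gesture j (column).
    """
    counts = np.zeros((n_classes, n_classes))
    np.add.at(counts, (np.asarray(true), np.asarray(pred)), 1)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Guard against classes with no samples (avoid division by zero).
    return 100.0 * counts / np.where(row_totals == 0, 1, row_totals)

# Toy example with two classes: one of four class-0 samples is predicted as class 1.
cm = confusion_percent([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], n_classes=2)
```

The diagonal then gives the per-gesture recognition rate, and off-diagonal entries in a row show which gestures absorb the errors.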

Statistical feature analysis
In this section, the statistical relationships between the single features, averaged over all subjects, were inspected by means of the MI measure (Figure 8). In this figure, brighter pixels stand for higher MI and hence greater relevance between features. Notably, the MI between MAV and MAVS was equal to 1, which indicates that they captured similar characteristics of the facial EMGs. The next highest relevance was found between RMS and MPV, followed by RMS and IEMG, whereas SSC and MV had the weakest relationship. Moreover, the very low relevance of WL to most of the features (MAV, MAVS, RMS, SSC, and MV) indicates either that it carries different facial EMG information or that it is weak at characterizing EMG patterns.

Effectiveness of feature combinations on system performance
This experiment examined the effect of feature combinations on system performance. The results achieved by these sets were also compared with those of the single feature MPV, which was identified earlier as the best. The combinations were formed from the rankings shown in Table 6, which were assigned to the single features using the MRMR and RA criteria. The rankings differ between the two criteria because MRMR selects features by considering their relationships with one another as well as with the classes, whereas RA ranks the features by their individual strength in recognizing the facial gestures. According to MRMR, MAV was the best feature, whereas according to RA this rank was taken by MPV. Moreover, MV was ranked second by MRMR because this criterion assumed that MV contributed complementary information to the combinations and might increase performance, even though this feature yielded very low accuracy on its own.
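The difference between the two criteria can be sketched as follows. `mrmr_rank` is a greedy MRMR in its mutual-information-difference form (relevance minus mean redundancy with the features already chosen), and `ra_rank` simply sorts features by their single-feature accuracy. Both take precomputed scores; the function names and the toy numbers are hypothetical, and how relevance and redundancy were computed in this study is not specified here.

```python
import numpy as np

def mrmr_rank(relevance, redundancy, names):
    """Greedy MRMR ranking: maximize relevance minus mean redundancy to chosen features."""
    relevance = np.asarray(relevance, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)
    selected = [int(np.argmax(relevance))]          # start with the most relevant feature
    remaining = [i for i in range(len(names)) if i != selected[0]]
    while remaining:
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return [names[i] for i in selected]

def ra_rank(accuracy, names):
    """Rank features by their individual recognition accuracy (highest first)."""
    order = np.argsort(-np.asarray(accuracy, dtype=float), kind="stable")
    return [names[i] for i in order]

# Toy scores: F3 is weak alone but barely redundant with F1.
names = ["F1", "F2", "F3"]
relevance = [0.9, 0.8, 0.3]
redundancy = [[1.0, 0.85, 0.05],
              [0.85, 1.0, 0.10],
              [0.05, 0.10, 1.0]]
```

Here `mrmr_rank` places F3 second because F2 is highly redundant with F1, while `ra_rank` on the same single-feature scores puts F3 last, which mirrors how MV was ranked second by MRMR despite its low individual accuracy.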
In this study, feature sets including two (C2) to ten (C10) features were constructed as shown in Table 7. Figure 9(a) shows the recognition accuracy and training time, averaged over all subjects, of the feature sets formed using MRMR. The recognition performance of all these combinations was very low, although it improved slightly as the number of features increased. In addition, the time needed to train the VEBFNN grew as more features were applied, without any considerable improvement in the final system performance. Figure 9(b), which shows the performance of the combinations formed via RA, indicates that adding more features generally resulted in lower accuracy and, as with MRMR, a heavier computational load during training. Considering C2 in Figure 9(a) and C9 in Figure 9(b), the accuracy decreased sharply when MV was added to the combinations. MV was selected second by MRMR as having maximum relevance and minimum redundancy, and it was expected to improve the system performance by its … MPV. The main reason was that, although some of the single features were individually powerful for classifying the gestures, their combinations not only formed less discriminative feature sets but also caused more overlap between the classes, which reduced the classification accuracy.

Table 7 Combinations including two to ten features based on MRMR and RA criteria
Combination | MRMR | RA
C2 | MAV, MV | MPV, MAV
C3 | MAV, MV, MPV | MPV, MAV, IEMG
C4 | MAV, MV, MPV, IEMG | MPV, MAV, IEMG, RMS
C5 | MAV, MV, MPV, IEMG, … | …
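The rows of Table 7 suggest that each combination Ck consists of the top k features of a ranking (C2 to C4 are nested prefixes of both rankings). Assuming that pattern holds for every k, the sets and their input matrices can be built as below. The function names are hypothetical, and storing each feature as a (samples x channels) array with one column per bipolar electrode pair is an assumption.

```python
import numpy as np

def prefix_combinations(ranking, k_min=2):
    """Map 'Ck' to the top-k features of a ranking, for k = k_min .. len(ranking)."""
    return {f"C{k}": list(ranking[:k]) for k in range(k_min, len(ranking) + 1)}

def build_feature_matrix(features, combo):
    """Concatenate per-feature (samples x channels) arrays column-wise for one combination."""
    return np.hstack([features[name] for name in combo])

# The first four features of each ranking, as listed in Table 7.
mrmr_combos = prefix_combinations(["MAV", "MV", "MPV", "IEMG"])
ra_combos = prefix_combinations(["MPV", "MAV", "IEMG", "RMS"])
```

With three channels per feature, each added feature widens the classifier input by three columns, which is consistent with the longer training times observed for larger combinations.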

VEBFNN efficiency assessment
The following experiment evaluated the robustness of VEBFNN in comparison with SVM and MLPNN. In Figure 10, … training the features, with a minimum of 7.35 seconds for training RMS. As expected, VEBFNN had the lowest computational cost, since its maximum training time was only 0.105 seconds (for MPV). As mentioned before, the purpose of our study was to identify a method that provides robust performance through a reliable trade-off between accuracy and time. Accordingly, although MLP achieved an accuracy of 88.2% using SSI, it could not be considered the best method because its training time was considerably higher, about 8.14 seconds. Therefore, VEBFNN with the MPV feature was recommended as the most effective classifier, since it achieved 87.1% accuracy (not meaningfully different from the 88.2% achieved by MLP) and required only 0.105 seconds for training.
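The accuracy-time trade-off behind this recommendation can be made explicit with a simple rule: among the results whose accuracy is within a tolerance of the best, pick the one with the shortest training time. The rule and the 1.5-percentage-point tolerance are a hypothetical formalization, not the procedure stated in this study; the two entries use the values reported above.

```python
from dataclasses import dataclass

@dataclass
class Result:
    classifier: str
    feature: str
    accuracy: float    # mean recognition accuracy, percent
    train_time: float  # mean training time, seconds

def pick_tradeoff(results, tol=1.5):
    """Fastest result whose accuracy is within `tol` percentage points of the best."""
    best = max(r.accuracy for r in results)
    near_best = [r for r in results if best - r.accuracy <= tol]
    return min(near_best, key=lambda r: r.train_time)

# Values reported in this section.
results = [
    Result("MLP", "SSI", 88.2, 8.14),
    Result("VEBFNN", "MPV", 87.1, 0.105),
]
choice = pick_tradeoff(results)
speedup = 8.14 / 0.105  # training-time ratio, MLP over VEBFNN
```

With these numbers the rule selects VEBFNN with MPV, whose training is more than 75 times faster; a tolerance tighter than the 1.1-point accuracy gap would select MLP instead, so the recommendation depends on how much accuracy one is willing to trade for speed.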
As stated earlier, facial myoelectric signals have been considered in several studies for designing HMI interfaces (Table 1). In [6-8,10,16,20-22,24], the number of facial gestures (classes) varied between 3 and 8, whereas our study improved the flexibility of such an interface by using ten classes. In terms of feature extraction, those studies focused on only a few types of EMG features [6-8,10,16,20-22,24], while this paper investigated and analyzed the characteristics of various facial EMG single features and feature combinations comprehensively. For classification, this work used VEBFNN, an accurate and very fast algorithm that was recently designed and proposed, whereas [6-8,10,16,20-22,24] employed traditional methods. It must be noted that directly comparing the overall performance of previous works with the results of this paper would not be fair, since the number of classes and participants, the signal recording protocols, and the facial gestures considered were not the same. In comparison with [23], which used a similar setup, VEBFNN achieved about 3% lower accuracy but was considerably faster than FCM.
To sum up, because real-time myoelectric control requires high levels of both accuracy and speed, a reliable trade-off between these two key factors must be considered. The main advantage of VEBFNN was that it needed only one epoch to learn new data, which resulted in a very fast training procedure (less than a second). This algorithm has been validated on different types of data [32], and its reliability and usefulness were also demonstrated for EMG-based facial gesture recognition in this study. Moreover, to find the best recognition performance, various facial EMG single features and feature combinations were evaluated, among which MPV was the most discriminative.
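The one-epoch property can be illustrated with a much-simplified classifier in the spirit of VEBFNN: each training sample is seen once, and it either updates the mean and covariance of the nearest same-class elliptic neuron (when it falls inside it) or creates a new neuron. This is not the published VEBFNN update rule; the class name, the distance threshold `radius`, and the variance floor `init_var` are hypothetical.

```python
import numpy as np

class OnePassEllipticClassifier:
    """Simplified one-pass elliptic-prototype classifier (illustrative, not VEBFNN itself)."""

    def __init__(self, radius=3.0, init_var=1.0):
        self.radius = radius      # hypothetical threshold on the elliptic distance
        self.init_var = init_var  # hypothetical isotropic variance floor for every neuron
        self.neurons = []         # each neuron: {"label", "n", "mean", "m2"}

    def _distance(self, neuron, x):
        """Mahalanobis-type distance of x to the neuron's ellipsoid."""
        cov = neuron["m2"] / neuron["n"] + self.init_var * np.eye(len(x))
        d = x - neuron["mean"]
        return float(np.sqrt(d @ np.linalg.solve(cov, d)))

    def partial_fit(self, x, label):
        """Learn one sample; no sample is revisited."""
        x = np.asarray(x, dtype=float)
        same = [nr for nr in self.neurons if nr["label"] == label]
        if same:
            nearest = min(same, key=lambda nr: self._distance(nr, x))
            if self._distance(nearest, x) <= self.radius:
                # Welford-style recursive update of the mean and scatter matrix.
                nearest["n"] += 1
                delta = x - nearest["mean"]
                nearest["mean"] = nearest["mean"] + delta / nearest["n"]
                nearest["m2"] = nearest["m2"] + np.outer(delta, x - nearest["mean"])
                return
        # No same-class neuron covers x: create a new neuron centred on it.
        self.neurons.append({"label": label, "n": 1, "mean": x.copy(),
                             "m2": np.zeros((len(x), len(x)))})

    def fit(self, X, y):
        for x, label in zip(X, y):  # a single epoch over the data
            self.partial_fit(x, label)
        return self

    def predict(self, X):
        labels = []
        for x in np.asarray(X, dtype=float):
            nearest = min(self.neurons, key=lambda nr: self._distance(nr, x))
            labels.append(nearest["label"])
        return labels

# Two well-separated toy classes (not EMG data).
X = np.array([[0, 0], [0.5, 0], [0, 0.5], [0.4, 0.4],
              [5, 5], [5.5, 5], [5, 5.5], [5.4, 5.4]], dtype=float)
y = [0, 0, 0, 0, 1, 1, 1, 1]
clf = OnePassEllipticClassifier().fit(X, y)
```

Because each sample triggers only a constant amount of work (one distance search and one rank-one update), training cost grows linearly with the number of samples, which is the property that makes a single-epoch learner fast.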

Conclusion and future work
In this paper, a reliable facial gesture recognition-based interface for human machine interfacing applications was presented. The effectiveness of ten EMG time-domain single features was explored and compared in order to find the most discriminating one. Statistical analysis was carried out by means of MI to reveal the degree of relevance between the features. The impact of feature combinations, formed based on the MRMR and RA criteria, on system performance was investigated and compared with that of the best single feature. A VEBFNN was proposed and evaluated for classifying facial gesture EMG signals. The best facial myoelectric feature identified in this study was MPV, which provided the highest discrimination between the facial gestures. With this feature, VEBFNN offered robust recognition performance, with 87.1% accuracy and a very fast training process of only 0.105 seconds. This study showed that MPV outperformed all the feature combinations constructed through either the MRMR or the RA criterion in terms of both accuracy and computational cost.
The findings of this study are intended to be applied in practice to processing and recognizing facial gesture EMGs in order to design reliable interfaces for HMI systems. They can also be applied in other fields that require analyzing and classifying EMG signals. This technology is intended to be used to control prostheses and assistive devices that aid people with disabilities. Designing trustworthy interfaces requires methods that are highly efficient in terms of both accuracy and computational cost. Therefore, a more thorough investigation of facial gesture EMG analysis is recommended, and other successful techniques from the field of biomedical signal processing will be examined in future work. Furthermore, as people with disabilities are the intended beneficiaries of this research, they will be the focus of future studies.