
EMG-based facial gesture recognition through versatile elliptic basis function neural network

Abstract

Background

Recently, the recognition of different facial gestures using facial neuromuscular activities has been proposed for human machine interfacing applications. Facial electromyogram (EMG) analysis is a complicated field in biomedical signal processing, in which accuracy and low computational cost are significant concerns. In this paper, a very fast versatile elliptic basis function neural network (VEBFNN) was proposed to classify different facial gestures. The effectiveness of different facial EMG time-domain features was also explored to identify the most discriminating one.

Methods

In this study, EMGs of ten facial gestures were recorded from ten subjects using three pairs of surface electrodes in a bipolar configuration. The signals were filtered and segmented into distinct portions prior to feature extraction. Ten different time-domain features, namely Integrated EMG, Mean Absolute Value, Mean Absolute Value Slope, Maximum Peak Value, Root Mean Square, Simple Square Integral, Variance, Mean Value, Wave Length, and Sign Slope Changes, were extracted from the EMGs. The statistical relationships between these features were investigated by the Mutual Information measure. Then, feature combinations including two to ten single features were formed based on the feature rankings appointed by the Minimum-Redundancy-Maximum-Relevance (MRMR) and Recognition Accuracy (RA) criteria. In the last step, VEBFNN was employed to classify the facial gestures. The effectiveness of the single features as well as the feature sets on system performance was examined with respect to two major metrics: recognition accuracy and training time. Finally, the proposed classifier was assessed and compared with the conventional methods, support vector machines (SVM) and the multilayer perceptron neural network (MLPNN).

Results

The average classification results showed that the best performance for recognizing facial gestures among all single and multi-features was achieved by Maximum Peak Value, with 87.1% accuracy. Moreover, the procedure was very fast: the training time for classification via VEBFNN was only 0.105 seconds. The results also indicated that, compared with RA, MRMR was not a suitable criterion for building more effective feature sets.

Conclusions

This work introduced the most discriminating facial EMG time-domain feature for the recognition of different facial gestures, and suggested VEBFNN as a promising method for EMG-based facial gesture classification to be used in designing interfaces for human machine interaction systems.

Introduction

A recent report released by the World Health Organization (WHO) and the World Bank shows that more than one billion people with disabilities face substantial barriers in their daily lives [1]. To help these people, especially those with severe disabilities resulting from stroke, neurological disease, or muscular dystrophy, human machine interaction (HMI) has been proposed as a promising way to improve their quality of life [2]. Controlling assistive devices such as wheelchairs [3] and prosthetic limbs [4] are instances in this area. Designing such devices requires reliable interfaces as a communication channel between humans and machines, and interfaces relying on the facial neuromuscular activity generated by facial gestures have recently been suggested. The goal is to recognize facial gestures through facial EMG signals and transform them into input commands to control the devices. The most recent approaches are: the extraction of three facial gestures during speech via four recording channels and their transformation into control commands [5]; controlling a hands-free wheelchair using five different facial myosignals [6]; the application of five facial gestures to design and control a virtual crane training system [7]; the enhancement of human computer interaction by applying six different facial muscle EMG recordings through eight superficial sensors [8]; the use of an EMG- and vision-based HMI to control an intelligent wheelchair [9]; and controlling an electric wheelchair using six surface facial EMGs [10]. The reliability and flexibility of these systems depend directly on the number of classes (gestures) and on the methods used to analyze the facial gesture EMGs.

EMG signals are stochastic and non-stationary, and their analysis is complex [11]; thus, much investigation is needed. Noise reduction, conditioning, smoothing, data windowing, segmentation, feature extraction, dimension reduction, and classification are the common stages in recognizing different EMG patterns. The facial gesture recognition rate depends mainly on the effectiveness of the EMG features and the classification algorithms, which are the focus of this paper.

To discriminate different muscle movements (gestures), the most prominent parts of the EMGs (features), which represent their characteristics with enough information for classification, should be extracted. Various types of features, such as time-domain features, autoregressive coefficients, cepstral coefficients, and wavelet coefficients, have been applied to classify upper-limb EMG signals [12], and other types of EMG features have been used in different applications [13–15]. According to previous studies on facial EMG signals, there are restrictions on analyzing them through their spectra: because the frequency components of facial EMGs are similar, facial gestures cannot be classified by frequency-domain or time-frequency distribution algorithms [16, 17]. Those methods are applicable only to muscle fatigue studies and to inferring changes in motor unit recruitment [18]. More appropriate characteristics of facial EMGs are time-domain ones, because they are easy to compute, work on signal amplitudes, and possess high stability for EMG pattern recognition [16, 19]. There are several methods of time-domain feature extraction; to achieve good results, however, a feature must contain enough information to represent the significant properties of the signal and must be simple enough for fast training and classification. The extracted features must then be trained and classified into distinct categories, so a suitable classifier must be chosen to provide a fast process and accurate results. Table 1 reviews related studies of EMG-based facial gesture recognition systems. In these studies, the numbers of classes and recording channels varied and different facial gestures were considered. As can be seen from the table, only a few methods have been investigated for feature extraction and classification; since this field of study is still at an early stage, it needs much more investigation.

Table 1 Related studies on facial gesture recognition

Since little work has been reported on facial EMG analysis, this paper adopts the same setup used in [23] to investigate further the impact of different facial EMG features on the classification of facial gestures. The characteristics of the EMGs of ten facial gestures were therefore explored by extracting ten different time-domain features. The relationships between these features were examined by means of the Mutual Information (MI) measure. Moreover, MRMR and RA were employed to select and rank the features for the purpose of constructing feature combinations.

Classifying the features with a fast, reliable, and accurate algorithm was another objective of this paper. Accordingly, a VEBFNN was applied to classify the single and multi-features and to evaluate their effectiveness, in order to find the most discriminative one based on recognition performance and training time. Furthermore, the efficiency and robustness of this classifier for facial myoelectric signal classification were inspected by assessing it against the conventional support vector machine (SVM) and multilayer perceptron neural network (MLPNN) methods.

The rest of this paper is organized as follows. The next section describes the materials needed to record facial EMGs. Then, the methodology for analyzing the EMG signals is explained. Subsequently, the experimental results, including statistical analysis and detailed discussion, are presented. Finally, a brief summary and recommendations for future work are given in the last section.

Methods and materials

The procedure of the current study was divided into several steps, as shown in Figure 1. The first step consisted of subject preparation, electrode placement, system setup, and EMG signal acquisition. All recorded signals were then conditioned and filtered prior to processing. Data windowing and segmentation were applied in the preprocessing step. Afterwards, ten different types of time-domain features were extracted from all EMG signals. Subsequently, the correlations among the features were analyzed through MI measures, and feature combinations were constructed according to two criteria, MRMR and RA. To train and classify the features, a very fast VEBFNN was used; this algorithm was employed for the first time to classify EMG signals. Finally, the experimental results were discussed in order to evaluate the effectiveness of each feature and combination and to find the most discriminative and accurate one, delivering the highest performance in terms of facial gesture recognition and computational load. Moreover, the efficiency of VEBFNN was assessed and compared with two other popular supervised classifiers, SVM and MLPNN.

Figure 1. System block diagram of the current study.

Facial EMG acquisition

Subject preparation and electrode placement

EMGs are known to be among the most contaminated signals, with a low signal-to-noise ratio [11]. To obtain clean EMGs, some precautions were taken before signal recording. The subject's skin was cleaned with alcohol pads to remove any dust or sweat and to reduce the effect of the fat layer. In addition, to obtain better signals with higher amplitudes, the electrodes were placed at the appropriate sites [25]. EMGs were recorded through three channels via three pairs of round pre-gelled Ag/AgCl surface electrodes. The first and third channels were placed on the left and right temporalis muscles, and the second channel was positioned on the frontalis muscle above the eyebrows (Figure 2). The electrodes were arranged in a bipolar configuration (2 cm inter-electrode distance) over the EMG recording areas to reduce any common noise between them. A further electrode was placed on the bony part of the left wrist to eliminate motion artifacts.

Figure 2. Electrode positions and muscles involved in the considered facial gestures.

System setup and data acquisition

The protocol of this experiment was approved by the Universiti Teknologi Malaysia Human Ethics Research Committee. In the present experiment, facial EMGs were captured via a BioRadio 150 (CleveMed), and the signals were recorded at a sampling frequency of ~1000 Hz. Through the activation of a filter with a 0.1 Hz low cut-off frequency and a 50 Hz notch filter, unwanted artifacts from user movements and power-line interference noise were removed by the device software itself.

Ten mentally and physically healthy volunteers, five male and five female, between the ages of 26 and 41, were chosen for this work. Before the data recording, all participants were trained to make the facial gestures. The gestures considered in this study were: smiling with both sides of the mouth, smiling with the left side of the mouth, smiling with the right side of the mouth, opening the mouth (saying 'a' as in the word apple), clenching the molars, gesturing 'notch' by raising the eyebrows, frowning, closing both eyes, closing the right eye, and closing the left eye. The subjects were asked to perform each facial gesture five times for two seconds (active signal), with a 5-second rest in between to eliminate the effect of muscle fatigue. Since the only part of the signal useful for discriminating and recognizing the different facial gestures is the active one, only 10 seconds (5 × 2 sec) were considered in the processing of each gesture. Moreover, the signals were recorded by the three channels synchronously, resulting in a three-dimensional data set (3 × 10 sec) for each gesture. Therefore, ten sets of 3 × 10 sec active signals were obtained from each subject, who performed ten gestures.

EMG filtration and conditioning

To retain the most significant part of the signal spectrum, the signals were passed through a band-pass filter with a range of 30–450 Hz [7].
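For illustration, a zero-phase Butterworth implementation of this filtering step is sketched below; the 30–450 Hz pass-band and the ~1000 Hz sampling rate follow the paper, while the filter family and order are assumptions, since the paper does not specify them.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000  # sampling frequency in Hz, as used in this study

def bandpass_emg(emg, low=30.0, high=450.0, fs=FS, order=4):
    """Zero-phase band-pass filter for one EMG channel (1-D array).
    The 30-450 Hz range follows the paper; the Butterworth order is assumed."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, emg)
```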

Data windowing and segmentation

Due to the huge amount of data available for processing, only the most essential characteristics of the facial EMGs (features) should be extracted and considered for further processing. Prior to feature extraction, the filtered signals were segmented into non-overlapping windows of 256 ms length [26]. Since each channel contained a 10000 ms signal, 39 portions (10000 ÷ 256 ≈ 39) were obtained and prepared for feature extraction.
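A minimal sketch of this windowing step, assuming each channel is a plain NumPy array sampled at 1000 Hz, so that one 256 ms window holds 256 samples and a 10000-sample recording yields 39 full windows:

```python
import numpy as np

def segment(channel, fs=1000, win_ms=256):
    """Split one EMG channel into non-overlapping windows of win_ms
    milliseconds; any trailing partial window is dropped."""
    win = int(fs * win_ms / 1000)      # 256 samples at 1000 Hz
    n = len(channel) // win            # e.g. 10000 // 256 = 39 segments
    return np.asarray(channel[:n * win]).reshape(n, win)
```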

Feature extraction

Feature extraction is an essential step in EMG processing, with a direct effect on final system performance. Good features should highlight the most important properties and characteristics of the facial EMG signal, and they should have low computational cost so that they can be used in real-time applications. As mentioned earlier, a number of different features of varying complexity and efficiency have been suggested and used for EMG signals. In this paper, the ten types of time-domain features extracted from the segmented EMGs were Mean Absolute Value Slope (MAVS), Simple Square Integral (SSI), Sign Slope Changes (SSC), Mean Value (MV), Maximum Peak Value (MPV), Integrated EMG (IEMG), Wave Length (WL), Mean Absolute Value (MAV), Root Mean Square (RMS), and Variance (VAR). The mathematical definitions and descriptions of these features are provided in Table 2. Since the EMGs were segmented into 39 portions, 39 feature values were extracted per gesture and per channel. Considering the three channels and ten gestures, a set of 390 three-dimensional feature vectors was obtained for each subject and each feature type.

Table 2 Time-Domain features considered in this study
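As a rough illustration of this step, the sketch below computes several of the listed features for a single window using their standard time-domain definitions; the exact formulas of Table 2 may differ in minor details such as normalization.

```python
import numpy as np

def td_features(seg):
    """A few common time-domain features for one EMG window (1-D array).
    These are the standard definitions, not necessarily identical to Table 2."""
    return {
        "IEMG": np.sum(np.abs(seg)),            # integrated EMG
        "MAV":  np.mean(np.abs(seg)),           # mean absolute value
        "MPV":  np.max(np.abs(seg)),            # maximum peak value
        "RMS":  np.sqrt(np.mean(seg ** 2)),     # root mean square
        "SSI":  np.sum(seg ** 2),               # simple square integral
        "VAR":  np.var(seg),                    # variance
        "WL":   np.sum(np.abs(np.diff(seg))),   # wave length
    }
```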

To investigate the correlation between the single features, their statistical dependence was measured in the form of MI, a more general measure than simple cross-correlation [27]. MI is an entropy-type quantity that measures the amount of information one random variable contains about another; it can be thought of as the reduction in uncertainty about one random variable given knowledge of the other. Thus, the greater the mutual information between two random variables A and B, the less uncertainty there is in A knowing B or in B knowing A; zero mutual information means the variables are independent [28]. Given two features A and B, their MI is computed by

$$MI(A; B) = \sum_{b \in B} \sum_{a \in A} p(a, b)\,\log \frac{p(a, b)}{p(a)\,p(b)} \qquad (1)$$

where p(a, b) is the joint probability distribution of A and B, and p(a) and p(b) are the marginal probability distributions of A and B, respectively.
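A simple histogram-based estimate of Eq. (1) for two sampled feature sequences might look as follows; the estimator and the bin count are illustrative choices, not taken from the paper.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of MI(A;B) in Eq. (1) for two 1-D sequences."""
    p_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = p_ab / p_ab.sum()                   # joint distribution p(a, b)
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal p(a)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal p(b)
    mask = p_ab > 0                            # avoid log(0) terms
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))
```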

It has been indicated that a combination of several single features can achieve better recognition accuracy if the features provide complementary information [29]. In this work, combinations including two to ten features were constructed by considering two feature selection concepts. In pattern recognition, feature selection aims to identify subsets of the data that are relevant to and best characterize the statistical properties of a target classification variable, a notion normally called Maximum Relevance [30]. These subsets often contain material that is relevant but redundant. Among the common measures between features, such as similarity or the correlation coefficient, MI can represent both relevance and redundancy [30]. The MRMR technique using MI for feature selection was first proposed by Peng et al. [30]. The relevance of a feature set A for the class C is defined as the average of all MI values between the individual features f_i and the class C:

$$D(A, C) = \frac{1}{|A|} \sum_{f_i \in A} MI(f_i; C) \qquad (2)$$

The redundancy among all features in the set A is computed by:

$$R(A) = \frac{1}{|A|^{2}} \sum_{f_i, f_j \in A} MI(f_i; f_j) \qquad (3)$$

Then, MRMR is achieved by $\max_{A}\left[D(A, C) - R(A)\right]$.
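Under the same assumptions, the MRMR objective D(A, C) − R(A) of Eqs. (2)-(3) can be scored for a candidate feature set as sketched below, reusing the mutual_information helper above and treating the class labels as just another discrete variable.

```python
import numpy as np

def mrmr_score(subset, features, labels):
    """D(A,C) - R(A) for a set of feature names; `features` maps each name
    to a 1-D array over samples, `labels` is the class vector."""
    relevance = np.mean([mutual_information(features[f], labels)
                         for f in subset])                       # Eq. (2)
    redundancy = np.mean([mutual_information(features[fi], features[fj])
                          for fi in subset for fj in subset])    # Eq. (3)
    return relevance - redundancy
```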

In addition to MRMR, the single features were also selected and ranked based on their individual power in terms of RA. Feature combinations were accordingly constructed using the rankings appointed by MRMR as well as by RA. As stated earlier, each single feature had three dimensions (three channels); the dimensions of the constructed combinations including 2, 3, 4, 5, 6, 7, 8, 9, and 10 features were therefore 6, 9, 12, 15, 18, 21, 24, 27, and 30, respectively. For instance, the feature set for the single feature MPV was [mpv_ch1, mpv_ch2, mpv_ch3]^T, while the set combining the two features MPV and MAV was [mpv_ch1, mpv_ch2, mpv_ch3, mav_ch1, mav_ch2, mav_ch3]^T.
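The stacking described here is just a column-wise concatenation of the per-channel feature arrays; a trivial sketch, assuming each single feature is stored as a (segments × 3) array with one column per channel:

```python
import numpy as np

def combine(*single_features):
    """Stack k single features, each of shape (segments, 3), into a
    (segments, 3 * k) combination, e.g. MPV + MAV -> 6 dimensions."""
    return np.hstack(single_features)
```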

Data classification

To recognize the considered facial gestures, the extracted features must be classified into distinct classes. A classifier must be able to cope with the factors that markedly affect EMG patterns over time, such as the intrinsic variation of EMG signals, electrode positions, sweat, and fatigue. More importantly, a proper classifier has to classify novel patterns accurately during online training with very low computational cost, to meet the real-time processing constraints that are a major prerequisite of HMI systems. It has been reported that neural network-based classifiers address these concerns appropriately for myoelectric feature classification [31]. In this study, a VEBFNN was employed to classify the facial EMG features. This method was proposed by Jaiyen et al., and its robustness has been verified and validated on various data sets [32]. The main advantage of this supervised network is that it can learn a data set accurately in only one epoch and discard each datum after it passes through, which makes it powerful for training on incoming patterns online. As reported, this training procedure is very fast in comparison with traditional neural networks such as MLPNN, and it needs only a small amount of memory [32]. The algorithm was also used to evaluate the effectiveness of each facial EMG feature on system performance. The structure of this network, depicted in Figure 3, is the same as that of an RBF neural network, consisting of three layers. In the input layer, the number of neurons was equal to the dimension of the feature vector, which was three in this study: x_i, i = 1, 2, 3. The hidden layer, in which the number of neurons was not defined in advance since neurons are formed during training, was divided into ten sub-hidden layers (the number of classes in the training data). The number of neurons in the output layer also equaled the number of classes in the training data set (ten neurons).

Figure 3. VEBF neural network structure.

The basis function of the neurons in the hidden layer is a hyperellipsoid, and the output of the k-th hidden neuron for a given input X = [x_1, x_2, x_3]^T is calculated by the following equation:

$$\psi_k(X) = \sum_{i=1}^{3} \frac{\left[(X - C)^T u_i\right]^2}{a_i^{2}} - 1 \qquad (4)$$

This equation describes a three-dimensional hyperellipsoid centered at C = [c_1, c_2, c_3]^T and rotated according to the orthonormal basis {u_1, u_2, u_3}, which enables the neuron to cover neighboring data without translation or any change of size. The width of the hyperellipsoid along each axis is a_i, i = 1, 2, 3.

Since the input feature vectors are in ℜ^3, the coordinates corresponding to these vectors are the standard orthogonal basis [1, 0, 0]^T, [0, 1, 0]^T, and [0, 0, 1]^T. Therefore, the component x_i of each input vector X with respect to the new axes is computed by x_i = X^T u_i. The rotation along the orthonormal basis vectors enables the neurons to cover all nearby data without increasing the radius. Figure 4(a) shows how the VEBF neuron adjusts itself to cover new data; the neuron finally settles as in Figure 4(b).
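A direct transcription of Eq. (4) might look as follows; here U is assumed to hold the orthonormal basis vectors u_i as columns, and a non-positive output means the input falls inside the neuron's hyperellipsoid.

```python
import numpy as np

def vebf_output(x, center, U, widths):
    """Eq. (4): output of one VEBF hidden neuron. U has u_i as columns,
    widths holds the semi-axes a_i."""
    proj = U.T @ (x - center)            # coordinates along the rotated axes
    return float(np.sum((proj / widths) ** 2) - 1.0)
```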

Figure 4. Data coverage by orthonormal basis rotation: (a) the neuron adjusting itself to cover new data; (b) its final position after covering the new data.

As mentioned earlier, a feature set of size 3×390 (3 being the number of channels) was obtained in the feature extraction step for each subject and each method. For classification, each data set was shuffled and then divided into 300×3 training and 90×3 testing portions.

The orthonormal basis was computed from the eigenvectors of the covariance matrix. Since the training data were introduced to the network one by one, the mean vector and covariance matrix were computed recursively. For N (300 for each feature set) samples X = {x_1, x_2, …, x_N}, in which x_j ∈ ℜ^3, j = 1, …, N, the mean vector is calculated by:

$$\mu_{new} = \frac{N}{N+1}\,\mu_{old} + \frac{X_{N+1}}{N+1} \qquad (5)$$

where μ_old is the mean vector of the data set X and X_{N+1} is the new data vector added to X.

Then the covariance matrix was computed as follows:

$$\tau_{new} = \frac{N}{N+1}\,\tau_{old} + \theta \qquad (6)$$

$$\theta = \frac{X_{N+1} X_{N+1}^T}{N+1} - \mu_{new}\,\mu_{new}^T + \mu_{old}\,\mu_{old}^T - \frac{\mu_{old}\,\mu_{old}^T}{N+1} \qquad (7)$$

To find the orthonormal basis for the VEBF, the concept of principal component analysis was used. The eigenvalues {λ_1, λ_2, λ_3} and the corresponding eigenvectors {u_1, u_2, u_3} were computed from the obtained covariance matrix; the set of eigenvectors, which are orthogonal, forms the orthonormal basis. The training procedure is described below.
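The recursive updates of Eqs. (5)-(7), followed by the eigendecomposition that yields the orthonormal basis, could be sketched as follows; the variable names are illustrative.

```python
import numpy as np

def update_stats(mu, cov, n, x):
    """Recursive mean and covariance updates (Eqs. 5-7) when sample x joins
    a neuron that already holds n samples."""
    mu_new = (n / (n + 1)) * mu + x / (n + 1)                     # Eq. (5)
    theta = (np.outer(x, x) / (n + 1) - np.outer(mu_new, mu_new)  # Eq. (7)
             + np.outer(mu, mu) - np.outer(mu, mu) / (n + 1))
    cov_new = (n / (n + 1)) * cov + theta                         # Eq. (6)
    return mu_new, cov_new

# The orthonormal basis is then the eigenvector set of the covariance matrix,
# e.g.: eigvals, U = np.linalg.eigh(cov_new)
```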

Training procedure

Consider that X = {(x_j, t_j) | 1 ≤ j ≤ N} is a set of N = 300 training data, where x_j is a feature vector (x_j ∈ ℜ^3) and t_j is its target. Let Ω = {Ω_k | 1 ≤ k ≤ m} be a set of m neurons. Each neuron has five parameters, Ω_k = (C_k, S_k, N_k, A_k, d_k), where C_k is the center of the k-th neuron, S_k is its covariance matrix, N_k is the number of data assigned to it, A_k is its width vector, and d_k is its class label. The whole training procedure can be summarized in the following six steps; a compact code sketch follows the list:

1) The width vector was initialized. Since three-dimensional feature vectors were used in the current study, a sphere with a radius of 0.5 was chosen for simplicity: A_0 = [0.5, 0.5, 0.5]^T.

2) The network was fed with the training data (x_j, t_j). If no neuron existed in the network (K = 0), K was incremented and a new neuron Ω_k was created with the parameters C_k = x_j, S_k = 0, N_k = 1, d_k = t_j, A_k = A_0; the trained datum was then discarded. If K ≠ 0, the nearest hidden-layer neuron Ω_k ∈ Ω was found such that d_k = t_j and k = arg min_l ‖x_j − C^(l)‖, l = 1, 2, …, K; its mean vector and covariance matrix were then updated.

3) The orthonormal basis for Ω_k was calculated.

4) The output of the k-th neuron was computed by

$$\psi_k(X_j) = \sum_{i=1}^{n} \frac{\left[(X_j - C_{new}^{k})^T u_i\right]^2}{(a_i^{k})^2} - 1 \qquad (8)$$

If ψ_k(X_j) ≤ 0, the neuron covered the datum, so the temporary parameters were set as its fixed parameters. Otherwise (ψ_k(X_j) > 0), a new neuron was created.

5) Since new neurons are added to the network automatically and may lie very close together, a merging strategy was used to prevent the network from growing to its maximal structure (one neuron per datum). The details of this strategy are explained in [32].

6) If any training data remained, the algorithm was repeated from Step 2; otherwise, the procedure was finished.
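Putting the steps together, a compact sketch of the training loop is given below, reusing the update_stats and vebf_output helpers sketched earlier. It deliberately omits the merging strategy of Step 5 and assumes a nearest-center prediction rule, so it is an approximation of the algorithm in [32] rather than a faithful reimplementation.

```python
import numpy as np

class VEBFNNSketch:
    """Simplified VEBF training (Steps 1-4 and 6; merging of Step 5 omitted)."""

    def __init__(self, dim=3, a0=0.5):
        self.a0 = np.full(dim, a0)     # Step 1: initial width vector A_0
        self.neurons = []              # each neuron: dict(C, S, N, A, d)

    def _spawn(self, x, t):
        self.neurons.append(dict(C=x.copy(), S=np.zeros((x.size, x.size)),
                                 N=1, A=self.a0.copy(), d=t))

    def fit(self, X, T):
        for x, t in zip(X, T):                        # one epoch only
            same_class = [nb for nb in self.neurons if nb["d"] == t]
            if not same_class:                        # Step 2: first neuron
                self._spawn(x, t)
                continue
            nb = min(same_class, key=lambda n: np.linalg.norm(x - n["C"]))
            mu_new, cov_new = update_stats(nb["C"], nb["S"], nb["N"], x)
            _, U = np.linalg.eigh(cov_new)            # Step 3: orthonormal basis
            if vebf_output(x, mu_new, U, nb["A"]) <= 0:   # Step 4 / Eq. (8)
                nb.update(C=mu_new, S=cov_new, N=nb["N"] + 1)
            else:                                     # datum not covered
                self._spawn(x, t)
        return self

    def predict(self, x):
        # Nearest-center rule; an assumption standing in for the output layer.
        nb = min(self.neurons, key=lambda n: np.linalg.norm(x - n["C"]))
        return nb["d"]
```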

Results and discussion

This section discusses the results of the several experiments conducted in this study. First, the classification and recognition accuracies achieved by VEBFNN for each feature (on training and testing data) over all subjects are presented, and the impact of each feature on recognition performance is compared with the others. The computational load consumed during the training stage with each feature is examined, and the effect of each feature on the recognition of each facial gesture is explored. The sensitivity and stability of the single features with high discrimination ratios are compared across subjects, and the performances achieved by the most and least accurate features are visualized in confusion matrices. Statistical relationships between the considered EMG features are investigated through MI measures. The feature combinations constructed from the features selected by MRMR and RA are examined in terms of recognition accuracy and training time. In the last experiment, the efficiency and reliability of the VEBFNN algorithm is validated by comparison with two conventional classifiers, SVM and MLPNN.

Classification and recognition accuracy

Table 3 presents the classification and recognition accuracy obtained by VEBFNN for all features and participants. As can be seen, VEBFNN was trained well by the different features, since the average classification accuracy over all subjects was above 90% for every feature. The maximum accuracy was achieved by MAV (98.5%). On the other hand, the results of the testing stage showed that the recognition ability of VEBFNN varied with the type of feature used. For instance, although WL was trained to 92.8% accuracy, its average recognition accuracy was only 24.5%. The Maximum (Test) and Minimum (Test) rows indicate the best and worst features for each participant based on test performance. Subjects 1, 2, 3, 6, 7, and 8 reached their maximum recognition performance with the MPV feature; subjects 4, 5, and 9 achieved their highest accuracy with IEMG; and subject 10 obtained the best results with RMS.

Table 3 Classification and recognition accuracy for each subject, Mean value, Standard deviation, and Mean absolute error (%)

Figure 5 shows the classification accuracy for all features averaged over all subjects and illustrates how different features affect recognition performance. As can be observed, using different features did not produce significant differences in training performance; in other words, all features trained VEBFNN almost equally well. By contrast, the test results revealed the real performance and showed noticeable changes in recognition accuracy across features. The figure indicates that MAV, MAVS, RMS, IEMG, SSI, and MPV were discriminative and reliable features containing essential information for the classification of facial states. Among them, MPV attained the best performance, with a mean recognition accuracy of 87.1% and a standard deviation of 1.1% over all subjects, whereas WL obtained the lowest result, with 24.5% recognition accuracy.

Figure 5. Classification accuracy of the training and testing procedures for all features averaged over all subjects, and the time consumed during the training stage.

Table 3 also highlights the robustness of MPV and the weakness of WL through their Mean Absolute Error values over all subjects, 12.9% and 75.5% respectively; they were therefore selected as the most and least accurate features. The distribution of these two features in the feature space is shown in Figure 6. The classes (gestures) were well discriminated in the MPV feature space; by contrast, the classes were mixed and could not be told apart in the WL feature space. G1-G10 represent the following facial gestures: opening the mouth (saying 'a' as in the word apple), clenching the molars, gesturing 'notch' by raising the eyebrows, closing both eyes, closing the left eye, closing the right eye, frowning, smiling with both sides of the mouth, smiling with the left side of the mouth, and smiling with the right side of the mouth.

Figure 6. Distribution of MPV and WL features in the feature space.

Computational load

Computation speed during the training procedure is an important factor when designing interfaces, especially for real-time applications. As can be seen in Figure 5, the training time consumed with every feature was below one second; the maximum was 0.105 seconds, for training MPV and SSI. Overall, this experiment showed that VEBFNN trained very quickly with all the considered EMG time-domain features, indicating that the classifier's computational cost depends only weakly on the feature type. Hence, recognition accuracy was the more informative metric for comparing the capability of features for facial gesture recognition.

Effectiveness of features on recognition of each facial gesture

In this experiment, we investigated the effectiveness of the different features for recognizing each facial gesture using the VEBFNN algorithm (Table 4). The best features for recognizing the facial gestures were as follows: MV for G1; MPV for G2, G3 and G4; MAV, MAVS, IEMG and MPV for G5; MAV and RMS for G6; MAV and MPV for G7; IEMG for G8; MAV and MAVS for G9; and IEMG for G10. According to this table, G3, G5, G7, G9 and G10 were recognized with 100% accuracy by at least one feature. G5 was the most distinguishable gesture, since it was recognized accurately by four features, whereas G1 was detected poorly by all features. MPV provided the highest accuracy for more gestures (5 out of 10) than any other feature and can therefore be selected as the most proficient feature for single-gesture recognition, while VAR was not effective enough, yielding the lowest accuracies for G2, G6, G8, and G9.

Table 4 Recognition accuracy achieved for facial gestures using different features averaged over all subjects (%)

Table 4 also indicates that, for a given feature, the gestures G1-G10 led to different classification rates. This may have several causes, such as differences in the involvement of muscles playing a minor role in shaping each facial gesture; the signal magnitude of the muscles, which depends on the number of motor units (muscle fibers plus motor neuron) and their firing rates; the action potentials resulting from different muscle movements; the signaling source of each facial gesture; and the innervation ratio of the muscles [33].

Analytical comparisons of features over subjects

Further work was carried out to understand the distributional characteristics obtained by VEBFNN over all participants for the features with high discrimination ratios: MAV, MAVS, RMS, IEMG, SSI, and MPV. Figure 7 shows that MAV and IEMG had almost the same degree of dispersion, since their interquartile ranges were similar. MPV produced a short box, meaning that all subjects reached similar recognition rates for this feature. In contrast, the wide spread of accuracies for RMS indicates this feature's high sensitivity across subjects. The symmetric boxes for RMS, IEMG, and SSI show that the accuracies achieved for different subjects split evenly about the median. A notable point is the position of the MPV median, which shows that recognition accuracy exceeded 87% for at least five subjects.

Figure 7. Analytical comparisons of the selected features over all subjects.

Performance visualization by confusion matrix

The training and testing performances of VEBFNN on the best and worst single features are visualized as confusion matrices in Tables 5(a) and (b), respectively. These tables illustrate how MPV and WL were classified and misclassified during the training and testing procedures for all facial gestures. The most significant interaction in Table 5(a) occurred between G1 and G8: in the training stage, G1 was misclassified as G8 in 4.3% of cases. This carried over to the testing stage, where only 36.7% of G1 data were recognized correctly; the reason is the similar signaling source of these two gestures. Table 5(b) shows extensive interactions between all gestures during both training and testing, emphasizing the weakness of WL for discriminating the facial gestures.

Table 5 Confusion matrices averaged over all subjects for (a) MPV and (b) WL features (%)

Statistical feature analysis

In this section, the statistical relationships between the single features, averaged over all subjects, were inspected by means of the MI measure (Figure 8). In this figure, brighter pixels stand for higher MI and greater relevance between features. Notably, the MI between MAV and MAVS equaled 1, showing that they captured similar characteristics of the facial EMGs. The next highest degrees of relevance were between RMS and MPV, followed by RMS and IEMG, whereas SSC and MV had the lowest relationship. Moreover, the very low relevance of WL to most of the features (MAV, MAVS, RMS, SSC, and MV) indicates either dissimilar facial EMG information or the weakness of this feature in characterizing EMG patterns.

Figure 8. Facial EMG feature correlations measured by Mutual Information, averaged over all subjects.

Effectiveness of feature combinations on system performance

This experiment examined the effectiveness of feature combinations on system performance, and the results were compared with the single feature MPV suggested earlier. The combinations were formed according to the rankings in Table 6, which were assigned to the single features using the MRMR and RA criteria. The rankings differ between the two criteria because MRMR selects features by considering the relationships among all of them, while RA ranks features by their individual strength in recognizing the facial gestures. According to MRMR, MAV was the best feature, whereas under RA this rank was taken by MPV. Moreover, MV reached second rank via MRMR, since this criterion assumed that MV would contribute complementary information to the combinations and might increase performance, even though this feature yielded very low accuracy on its own.

Table 6 Feature ranking based on MRMR and RA

In this study, feature sets including two (C2) to ten (C10) features were constructed, as shown in Table 7. The recognition accuracy and training time of the MRMR-based feature sets, averaged over all subjects, are presented in Figure 9(a). The recognition performance of all these combinations was quite low, although it improved slightly as the number of features increased. In addition, the time needed to train VEBFNN rose as more features were applied, without any considerable improvement in final system performance. Figure 9(b), which shows the performance of the feature combinations formed via RA, again demonstrates that applying more features generally resulted in lower accuracy and a higher computational load during training. Considering C2 in Figure 9(a) and C9 in Figure 9(b), the accuracy decreased sharply when MV was added to the combinations. This feature was selected by MRMR as the second-ranked one for maximum relevance and minimum redundancy and was supposed to improve system performance through complementary information; however, MV degraded performance because, as found earlier, it was very weak in terms of individual recognition accuracy. The feature sets formed with RA performed better than those constructed via MRMR, largely because MV participated in all the combinations suggested by MRMR. Finally, all the feature combinations considered in this study yielded lower recognition accuracy and consumed more training time than the single feature MPV. The main reason is that although some single features were individually powerful for classifying the gestures, their combinations not only produced less discriminative feature sets but also caused more data overlap between the classes, which reduced classification accuracy.

Table 7 Combinations including two to ten features based on MRMR and RA criteria
Figure 9. The effect of feature combinations on recognition accuracy and training time, considering (a) MRMR and (b) RA.

VEBFNN efficiency assessment

The following experiment evaluated the robustness of VEBFNN in comparison with SVM and MLPNN. In Figure 10(a), the recognition accuracies achieved by these classifiers are compared for the discriminative single features MAV, MAVS, RMS, IEMG, SSI, and MPV. VEBFNN clearly outperformed the other two classifiers when applying the MAV, MAVS, IEMG, and MPV features; all methods delivered almost identical accuracies for RMS; and MLPNN achieved the highest accuracy (88.2%) when classifying SSI. In addition, the computational load consumed by the classifiers during the training stage was examined (Figure 10(b)). Comparing all results, MLPNN required far more training time, with a minimum of 7.35 seconds (for RMS). As expected, VEBFNN consumed the least computation, its maximum training time being only 0.105 seconds (for MPV). As mentioned before, the purpose of this study was to identify the method providing robust performance through a reliable trade-off between accuracy and time. Accordingly, although MLPNN achieved 88.2% accuracy using SSI, it could not be counted as the best method, because its training time was significantly high, about 8.14 seconds. VEBFNN with the MPV feature was therefore recommended as the most effective classifier, achieving 87.1% accuracy (not meaningfully different from the 88.2% achieved by MLPNN) while consuming only 0.105 seconds in the training stage.

Figure 10. Comparison of the VEBFNN, SVM, and MLPNN classifiers over the selected features in terms of (a) recognition accuracy and (b) consumed training time.

As stated earlier, facial myoelectric signals have been considered in several studies for designing HMI interfaces (Table 1). In [6–8, 10, 16, 20–22, 24], the number of employed facial gestures (classes) varied between 3 and 8, whereas in our study the flexibility of the interface was improved by using ten classes. In terms of feature extraction, those studies focused on only a few types of EMG features, while this paper investigated and analyzed the characteristics of different facial EMG single and multi-features comprehensively. For the classification of EMG features, this work used the accurate and very fast VEBFNN algorithm, which was designed and proposed recently, whereas [6–8, 10, 16, 20–22, 24] employed traditional methods. It must be mentioned that directly comparing the overall performance of previous works with the results of this paper would not be fair, since the number of classes and participants, the signal recording protocols, and the considered facial gestures were not the same. When comparing with [23], in which a similar setup was used, it should be noted that despite the slightly lower accuracy (about 3%) achieved by VEBFNN, this classifier was considerably faster than FCM.

To sum up, since real-time myoelectric control requires high levels of accuracy and speed, a trustworthy trade-off must be struck between these two key factors. The main advantage of VEBFNN is that it needs only one epoch to learn new data, which results in a very fast training procedure (less than a second). The algorithm has been validated on different types of data [32], and its reliability and usefulness for EMG-based facial gesture recognition were also demonstrated in this study. Moreover, to find the best recognition performance, various facial EMG single features as well as feature combinations were evaluated, among which MPV was the most discriminative.

Conclusion and future works

In this paper, a reliable facial gesture recognition-based interface for human machine interfacing applications was presented. The effectiveness of ten EMG time-domain single features was explored and compared in order to find the most discriminating one. Statistical analysis was carried out by means of MI to reveal the degree of relevance between the features. The impact on system performance of feature combinations formed with the MRMR and RA criteria was investigated and compared with the best single feature. The application of a VEBFNN was proposed and evaluated for the classification of facial gesture EMG signals. The best facial myoelectric feature introduced in this study was MPV, which provided the highest discrimination ratio between the facial gestures. With this feature, VEBFNN offered robust recognition performance, with 87.1% accuracy, and a very fast training process of only 0.105 seconds. This study showed that MPV outperformed all the feature combinations constructed through either the MRMR or the RA criterion in terms of both accuracy and computational cost.

The findings of this study are meant to be applied in practice for processing and recognizing facial gesture EMGs so as to design reliable interfaces for HMI systems. They can also be applied in fields that require analyzing and classifying EMG signals for other purposes. This technology can be used to control prostheses and assistive devices that aid the disabled. Designing trustworthy interfaces requires methods that are highly efficient in terms of both accuracy and computational cost. In the future, a more thorough investigation of facial gesture EMG analysis is therefore recommended, and other successful techniques in the field of biomedical signal processing will be examined. Furthermore, as the disabled are intended to benefit from this research, they will be the focus of future studies.

Authors’ information

Mahyar Hamedi is a Ph.D. candidate at the Centre for Biomedical Engineering, Faculty of Bioscience and Biomedical Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia. He is an IEEE member, and his research interests are biomedical signal/image processing, human machine interaction, brain computer interaction, and neural engineering.

Sh-Hussain Salleh is a Full Professor at the Faculty of Bioscience and Medical Engineering, Universiti Teknologi Malaysia, director of the Centre for Biomedical Engineering, an IEEE member, an IEM member, and a professional engineer. He has led projects on speech recognition and synthesis, heart sound and ECG signal processing, and motor imagery signal analysis for brain computer interaction. His research interests are biomedical signal processing, pattern recognition, and neural networks. He has published over 100 papers and has supervised many Masters, PhD, and postdoctoral students.

Mehdi Astaraki received his Master of Biomedical Engineering at the Department of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran. His research interests are biomedical signal/image processing and neural networks.

Alias Mohd Noor is the dean of a Research Alliance with four research centers under his administration, one of which is the Centre for Biomedical Engineering. He is a Professor in Mechanical Engineering, a Fellow of the Institute of Engineers Malaysia, a Professional Engineer with the Board of Engineers Malaysia, an ASEAN Eng., Int. PE, Asian Chartered Engineer, and APEC Eng. He has won several awards for his research products and innovations.

Abbreviations

EMG: Electromyogram
VEBFNN: Versatile elliptic basis function neural network
WHO: World Health Organization
HMI: Human machine interaction
IEMG: Integrated EMG
MAV: Mean absolute value
MSD: Maximum scatter difference
RMS: Root mean square
PSD: Power spectrum density
AV: Absolute value
MAD: Mean absolute deviation
WL: Wave length
SD: Standard deviation
ZC: Zero crossing
FMN: Frequency mean
FMD: Frequency median
VAR: Variance
SVM: Support vector machine
MLPNN: Multi-layer perceptron neural network
K-NN: K-nearest neighbors
FCM: Fuzzy C-means
MAVS: Mean absolute value slope
SSI: Simple square integral
SSC: Sign slope changes
MV: Mean value
MPV: Maximum peak value
SFCM: Subtractive fuzzy C-means
GM: Gaussian model
MI: Mutual information
MRMR: Minimum-redundancy-maximum-relevance
ANFIS: Adaptive neuro-fuzzy inference system
RBF: Radial basis function
RA: Recognition accuracy

References

1. World Health Organization Website [http://www.who.int/mediacentre/news/releases/2011/disabilities_20110609/en/]

2. Kawamura K, Iskarous M: Trends in service robots for the disabled and the elderly. In Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (IROS '94): Advanced Robotic Systems and the Real World. IEEE; 1994, 3:1647–1654.

3. Intelligent Robotic Wheelchair with EMG-, Gesture-, and Voice-based Interfaces. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems. Volume 3. Las Vegas, NV: IEEE; 2003:2453–3458.

4. Mak AF, Zhang M, Boone DA: State-of-the-art research in lower-limb prosthetic biomechanics-socket interface. J Rehabil Res Dev 2001, 38(2):161–173.

5. Arjunan SP, Kumar DK: Recognition of facial movements and hand gestures using surface electromyogram (sEMG) for HCI based applications. In Proceedings of the 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications. Glenelg, Australia: IEEE; 2008:1–6.

6. Firoozabadi SMP, Asghari Oskoei MR, Hu H: A human-computer interface based on forehead multi-channel bio-signals to control a virtual wheelchair. In Proceedings of the 14th ICBME; 2008:272–277.

7. Mohammad Rezazadeh I, Wang X, Firoozabadi SMP, Hashemi Golpayegani MR: Using affective human machine interface to increase the operation performance in virtual construction crane training system: a novel approach. Autom Construct J 2010, 20:289–298.

8. Gibert G, Pruzinec M, Schultz T, Stevens K: Enhancement of human computer interaction with facial electromyographic sensors. In Proceedings of the 21st Annual Conference of the Australian Computer-Human Interaction Special Interest Group (OzCHI). Melbourne, Australia: ACM Press; 2009:1–4.

9. Wei L, Hu H: EMG and visual based HMI for hands-free control of an intelligent wheelchair. In 8th World Congress on Intelligent Control and Automation. Jinan: IEEE; 2010:1027–1032.

10. Tamura H, Manabe T, Tanno K, Fuse Y: The electric wheelchair control system using surface electromyogram of facial muscles. In World Automation Congress (WAC): 19–23 September 2010. Kobe: IEEE; 2010:1–6.

11. Reaz MBI, Hussain MS, Mohd-Yasin F: Techniques of EMG signal analysis: detection, processing, classification and applications. Biol Proced Online 2006, 8(1):11–35. 10.1251/bpo115

12. Yücel K, Mehmet K: EMG Signal Classification Using Wavelet Transform and Fuzzy Clustering Algorithms. Istanbul, Turkey: Ayazaga; 2001.

13. Huang CN, Chen CH, Chung HY: The review of applications and measurements in facial electromyography. J Med Biol Eng 2004, 25:15–20.

14. Fukuda O, Tsuji T, Kaneko M, Otsuka A: A human-assisting manipulator teleoperated by EMG signals and arm motions. IEEE Trans Robot Autom 2003, 19:210–222. 10.1109/TRA.2003.808873

15. Liejun W, Xizhong Q, Taiyi Z: Facial expression recognition using improved support vector machine by modifying kernels. Inform Technol J 2009, 8:595–599. 10.3923/itj.2009.595.599

16. Hamedi M, Sheikh HS, Tan TS, Kamarul A: SEMG based facial expression recognition in bipolar configuration. J Comp Sci 2011, 7(9):1407–1415. 10.3844/jcssp.2011.1407.1415

17. Sawarkar KG: Analysis and inference of EMG using FFT. In Proceedings of the SPIT-IEEE Colloquium and International Conference. Mumbai, India; 2007:1.

18. Subasi A, Kiymik MK: Muscle fatigue detection in EMG using time-frequency methods, ICA and neural networks. J Med Syst 2010, 34(4):775–785.

19. Tkach D, Huang H, Kuiken TA: Research study of stability of time-domain features for electromyographic pattern recognition. J Neuroeng Rehabil 2010, 7:21. 10.1186/1743-0003-7-21

20. Ang LBP, Belen EF, Bernardo RA, Boongaling ER, Briones GH, Coronel JB: Facial expression recognition through pattern analysis of facial muscle movements utilizing electromyogram sensors. In Proceedings of the TENCON 2004 IEEE Region 10 Conference: 21–24 November 2004. Volume 3. Chiang Mai, Thailand: IEEE; 2004:600–603.

21. Van den Broek EL, Lisý V, Janssen JH, Westerink JHDM, Schut MH, Tuinenbreijer K: Affective man-machine interface: unveiling human emotions through biosignals. In Biomedical Engineering Systems and Technologies: Communications in Computer and Information Science, Volume 52 (Part 1). Berlin, Germany: Springer Verlag; 2010:21–47.

22. Hamedi M, Rezazadeh IM, Firoozabadi SMP: Facial gesture recognition using two-channel biosensors configuration and fuzzy classifier: a pilot study. In Proceedings of the International Conference on Electrical, Control and Computer Engineering: 21–22 June 2011. Pahang, Malaysia: IEEE; 2011:338–343.

23. Hamedi M, Salleh SH, Tan TS, Ismail K, Ali J, Dee-Uam C, Pavaganun C, Yupapin PP: Human facial neural activities and gesture recognition for machine interfacing applications. Int J Nanomedicine 2011, 6:3461–3472.

24. Mohammad Rezazadeh I, Firoozabadi SM, Hu H, Hashemi Golpayegani SMR: A novel human-machine interface based on recognition of multi-channel facial bioelectric signals. Australas Phys Eng Sci Med 2011, 34(4):497–513. 10.1007/s13246-011-0113-1

25. Mohammad Rezazadeh I, Firoozabadi M, Hu H, Hashemi Golpayegani MR: Determining the surface electrodes locations to capture facial bioelectric signals. Iran J Med Phys 2010, 7:65–79.

26. Englehart K, Hudgins B: A robust, real-time control scheme for multifunction myoelectric control. IEEE Trans Biomed Eng 2003, 50(7):848–854. 10.1109/TBME.2003.813539

27. Battiti R: Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 1994, 5(4):537–550. 10.1109/72.298224

28. Cover TM, Thomas JA: Entropy, relative entropy and mutual information. In Elements of Information Theory. New York: John Wiley & Sons; 1991:12–49.

29. Rechy-Ramirez EJ, Hu H: Stages for Developing Control Systems using EMG and EEG Signals: A Survey. Technical Report CES-513, School of Computer Science and Electronic Engineering. United Kingdom: University of Essex; 2011.

30. Peng HC, Long F, Ding C: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 2005, 27(8):1226–1238.

31. Asghari Oskoei M, Hu H: Myoelectric control systems: a survey. Biomed Signal Proces 2007, 2(4):275–294. 10.1016/j.bspc.2007.07.009

32. Jaiyen S, Lursinsap C, Phimoltares S: A very fast neural learning for classification using only new incoming datum. IEEE Trans Neural Netw 2010, 21(3):381–392.

33. Freivalds A: Biomechanics of the Upper Limbs: Mechanics, Modelling and Musculoskeletal Injuries. 1st edition. London, England: CRC Press; 2004.


Acknowledgments

This research project was supported by the Centre for Biomedical Engineering (CBE), Transport Research Alliance, under a Universiti Teknologi Malaysia research university grant (Q.J130000.2436.00G31), and was funded by the Ministry of Higher Education (MOHE).

Author information


Corresponding author

Correspondence to Mahyar Hamedi.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MH conception and design of the study, working on the algorithm design, analysis and interpretation of data, drafting of manuscript, revision of manuscript. S-HS study supervision, contribution in discussion and suggestions, critical revision of the manuscript for important intellectual content, approval of the final version of the manuscript. MA working on the algorithm design, contribution in discussion and suggestions, approval of the final version of the manuscript. AMN contribution in discussion and suggestions, approval of the final version of the manuscript. All authors read and approved the final manuscript.


Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Hamedi, M., Salleh, SH., Astaraki, M. et al. EMG-based facial gesture recognition through versatile elliptic basis function neural network. BioMed Eng OnLine 12, 73 (2013). https://doi.org/10.1186/1475-925X-12-73
