### Emotional data acquisition

Acquiring good, meaningful data is a prerequisite for any signal processing application. In emotion recognition based on physiological signals, acquiring emotional physiological data is challenging because of the subjective nature of emotions and the cognitive dependence of physiological responses. The six target emotional states must therefore be elicited internally in the subject, unlike modalities such as facial action or speech, where emotions can simply be enacted. The intensity of the induced emotion varies among subjects and depends on psychological factors such as attention, orientation, social interaction and appraisal [2, 4].

Researchers have used different methods to elicit the target emotions. Visual elicitation using images, audio elicitation using music and audio-visual elicitation using short film clips are commonly used [2, 15, 28, 29]. Other paradigms have also been reported, such as the recall paradigm, in which the subject repeatedly recalls emotional instances from their own life, and dyadic interaction, in which a facilitator helps induce the various emotions [30, 31]. Audio-visual elicitation using short film clips has been found to elicit the target emotions better than the other modalities [32, 33]. Hence, in this work emotions were induced using short video clips.

### Pilot study

One of the major tasks in inducing emotions using short audio-visual clips is to identify video clips that elicit the target emotions well. For this, around 20 video clips per emotional state were collected from various sources on the internet, and a pilot study was conducted. Fifteen volunteers with a mean age of 25 years participated in the pilot study, rating the emotions they experienced while watching the clips. The sixty audio-visual clips with the highest ratings (ten per emotion) were chosen for data collection. The emotional state 'anger' was excluded from further study because of its poor rating, which is likely attributable to the local culture.

### Emotion induction protocol

The protocol used for data acquisition is shown in Figure 2. There were two sessions with five trials in each session. Video clips for all six emotional states (neutral, happiness, sadness, fear, surprise and disgust) were played in each trial in a predetermined random order. Care was taken not to play dimensionally opposite emotional clips consecutively. Each emotional clip lasted 15 to 40 seconds and was sandwiched between neutral images shown for 10 seconds each; these neutral images provided a short gap for a smooth transition between emotional states. The entire protocol lasted about an hour, with a break of 15 to 20 minutes between the two sessions during which the participants were allowed to relax and refresh.

### Participants

Sixty healthy volunteers participated in the data collection experiment: thirty undergraduate students from the university (18 to 25 years old), fifteen school children (9 to 16 years old) and fifteen adults (39 to 68 years old). Each group had an equal number of male and female participants. The participants of the pilot study did not take part in the data collection experiment.

The participants signed a consent form after being informed of the purpose and procedure of the experiment. For children under the age of eighteen, consent was obtained from their parents or teacher. The experimental process complied with the recommendations of the WMA Declaration of Helsinki for human studies and with the institutional policies.

### Procedure

ECG and EMG data were acquired simultaneously as the subjects watched the emotional video clips displayed on the screen using a self-guided protocol; however, this work focuses only on the ECG signals. The emotional ECG data were collected with a PowerLab data acquisition system (ADInstruments, Australia). Three electrodes were used: two active electrodes placed on the left and right hands and one reference electrode on the left leg. The sampling frequency was set to 1000 Hz.

Before the experiment started, the subjects were asked to relax, minimize movement and concentrate on the audio-visual clips. The experimental set-up is shown in Figure 3. The subjects watched the video clips on an LCD screen placed seven meters in front of them; the clips were played in the same order for all subjects. After the experiment, each subject filled in a self-assessment questionnaire identifying the emotional states they experienced and rated their intensity on a five-point scale (1, very low, to 5, very high). These ratings were used to gauge the intensity of the experienced emotions; however, regardless of the intensity levels, all the emotional data were retained and randomized during processing.

### Data processing

The raw ECG data were split according to the emotional states, and noise due to power line interference, muscle activity and movement artifacts was removed. Baseline wander, which occurs at low frequency, was removed using the wavelet-based algorithm proposed by Bunluechokchai et al. [34]. High-frequency noise and power line interference were removed with a 6th-order Butterworth low-pass filter with a cut-off frequency of 45 Hz. The reliability of the acquired signals was measured using the NN/RR ratio, where NN is the number of normal-to-normal beat intervals and RR is the total number of RR intervals in the ECG signal; records with a ratio below 90% were excluded from further processing [17]. The QRS complex was derived by applying non-linear transformations to the first derivative of the filtered ECG signal [35]. Figure 4 depicts the stages in obtaining the QRS complex; the QRS peaks are distinctly visible after the second non-linear transformation. The Hurst exponent and the proposed HOS features were computed from the QRS complex using two methods, Rescaled Range Statistics (RRS) and Finite Variance Scaling (FVS). It should be noted that the features were extracted from the QRS complex and not from HRV signals.
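A minimal sketch of the low-pass filtering and NN/RR reliability check described above is given below (the wavelet-based baseline-wander removal of [34] and the QRS derivation of [35] are omitted; `lowpass_ecg` and `is_reliable` are illustrative names, and SciPy is assumed):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000  # sampling frequency used in this work (Hz)

def lowpass_ecg(ecg, fs=FS, cutoff=45.0, order=6):
    """Remove high-frequency noise and power line interference with a
    6th-order Butterworth low-pass filter (45 Hz cut-off), applied zero-phase."""
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, ecg)

def is_reliable(nn_count, rr_count, threshold=0.90):
    """NN/RR reliability check: keep records with at least 90% normal beats."""
    return (nn_count / rr_count) >= threshold
```

Zero-phase filtering (`filtfilt`) is one reasonable choice here, since it avoids distorting the timing of the QRS complex.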

### Rescaled range statistics

This method analyzes the smoothness of a fractal time series based on the asymptotic behaviour of the rescaled range of the process. First, the accumulated deviation of the time series from its mean is computed over time. The rescaled range R/S follows a power-law relationship with the time span T,

R/S\propto T^{H}

(1)

where R is the difference between the maximum and minimum accumulated deviation from the mean and S represents the standard deviation. The Hurst exponent H is then derived as,

H=\log\left(R/S\right)/\log\left(T\right)

(2)

where T is the length of sample data and R/S represents the corresponding value of rescaled range [20].
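The rescaled-range estimate of equation (2) can be sketched as follows (a single-window estimate over the whole series; `hurst_rs` is an illustrative name):

```python
import numpy as np

def hurst_rs(x):
    """Single-window rescaled-range estimate: H = log(R/S) / log(T) (eq. 2)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    z = np.cumsum(x - x.mean())   # accumulated deviation from the mean
    R = z.max() - z.min()         # range of the accumulated deviation
    S = x.std()                   # standard deviation of the series
    return np.log(R / S) / np.log(T)
```

For uncorrelated noise this yields values near 0.5, while strongly persistent (trending) series approach 1.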

### Finite variance scaling

The finite variance scaling method, also known as standard deviation analysis, is based on the standard deviation D(t) of the variable x(t).

For a time series x(t) of length n, the standard deviation is computed as,

D\left(t_{j}\right)=\left[\frac{\sum_{i=1}^{j}x^{2}\left(t_{i}\right)}{j}-\left(\frac{\sum_{i=1}^{j}x\left(t_{i}\right)}{j}\right)^{2}\right]^{1/2}

(3)

for j = 1, 2, …, n.

Eventually,

D\left(t\right)\propto {t}^{H}

(4)

where H is the Hurst exponent, evaluated as the gradient of the best-fit line through the log-log plot of D(t) versus t [22].
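A minimal sketch of the finite variance scaling estimate of equations (3) and (4), assuming the gradient is obtained by a least-squares fit in log-log coordinates (`hurst_fvs` is an illustrative name):

```python
import numpy as np

def hurst_fvs(x):
    """Finite variance scaling: H is the slope of the best-fit line through
    the log-log plot of the running standard deviation D(t) versus t (eqs. 3-4)."""
    x = np.asarray(x, dtype=float)
    j = np.arange(1, len(x) + 1)
    mean_of_sq = np.cumsum(x ** 2) / j       # first term of eq. (3)
    sq_of_mean = (np.cumsum(x) / j) ** 2     # second term of eq. (3)
    D = np.sqrt(np.maximum(mean_of_sq - sq_of_mean, 0.0))
    mask = D > 0                             # logs require positive D
    slope, _ = np.polyfit(np.log(j[mask]), np.log(D[mask]), 1)
    return slope
```

On a Brownian path this slope comes out near 0.5, and on a pure linear trend near 1, matching the expected scaling D(t) ∝ t^H.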

### Proposed higher order statistics (HOS) based Hurst features

HOS descriptors of order greater than two [36] retain finer information from the data and are appropriate for non-Gaussian and non-linear data [37, 38]. Skewness and kurtosis are the normalized third- and fourth-order cumulants, respectively. Skewness measures the asymmetry of a distribution about its mean, and kurtosis measures the relative heaviness of the distribution's tails compared with the normal distribution [24].
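The following sketch illustrates these two descriptors on synthetic data, using SciPy's `skew` and `kurtosis` (the latter returns excess kurtosis by default, i.e. 0 for a normal distribution):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
symmetric = rng.normal(size=100_000)          # skewness ~ 0, excess kurtosis ~ 0
right_tailed = rng.exponential(size=100_000)  # skewness ~ 2, excess kurtosis ~ 6

print(skew(symmetric), kurtosis(symmetric))
print(skew(right_tailed), kurtosis(right_tailed))
```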

Let S(t) be the rescaled or finite-variance-scaled data defined by equations (1) and (4), respectively. Generalizing and approximating the proportionality as an equality, we get,

S\left(t\right)={t}^{H}

(5)

Subtracting the mean μ_{s} of S(t) and dividing by its standard deviation σ_{s} on both sides of the equation, cubing, and normalizing by the length N, we get,

\frac{1}{N}\left[\frac{S\left(t\right)-\mu_{s}}{\sigma_{s}}\right]^{3}=\frac{1}{N}\left[\frac{t^{H}-\mu_{s}}{\sigma_{s}}\right]^{3}

(6)

Recognizing the skewness of S(t) in this equation, it can be rephrased as,

S_{\mathit{skewness}}=\frac{\left(t^{H}-\mu_{s}\right)^{3}}{N\sigma_{s}^{3}}

(7)

Eventually,

\left[\frac{N\sigma_{s}S_{\mathit{skewness}}+\mu_{s}^{3}}{t^{3}-3\mu_{s}t^{2}-3\mu_{s}^{2}}\right]=t^{H}

(8)

Now, Skewness based Hurst,

H_{\mathit{skewness}}=\log\left[\frac{N\sigma_{s}S_{\mathit{skewness}}+\mu_{s}^{3}}{t^{3}-3\mu_{s}t^{2}-3\mu_{s}^{2}}\right]/\log\left(t\right)

(9)

A similar relation for kurtosis can be constructed in the same way, but for order 4, as,

\left[\frac{N\sigma_{s}\left(NS_{\mathit{kurtosis}}+3\right)-\mu_{s}^{4}}{t^{4}+6t^{2}\mu_{s}-4\mu_{s}\left(t^{3}-\mu_{s}^{2}\right)}\right]=t^{H}

(10)

Now, Kurtosis based Hurst,

H_{\mathit{kurtosis}}=\log\left[\frac{N\sigma_{s}\left(NS_{\mathit{kurtosis}}+3\right)-\mu_{s}^{4}}{t^{4}+6t^{2}\mu_{s}-4\mu_{s}\left(t^{3}-\mu_{s}^{2}\right)}\right]/\log\left(t\right)

(11)

The skewness-based Hurst and kurtosis-based Hurst are thus non-linear, higher-order features that can be computed easily.
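A possible implementation of equations (9) and (11), transcribed as printed, is sketched below. The equations leave the evaluation point t unspecified; this sketch evaluates them at t = N, the series length, which is an assumption of this sketch rather than something the text states. SciPy's `kurtosis` returns excess kurtosis, which is taken here to pair with the "+ 3" term:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def hos_hurst_features(s):
    """Skewness- and kurtosis-based Hurst features per equations (9) and (11).

    `s` is the rescaled / finite-variance-scaled data S(t). The evaluation
    point t is taken to be the series length N (an assumption of this sketch).
    """
    s = np.asarray(s, dtype=float)
    N = len(s)
    mu, sigma = s.mean(), s.std()
    sk = skew(s)
    ku = kurtosis(s)   # excess kurtosis, pairing with the "+ 3" term
    t = float(N)

    # Eq. (9): skewness-based Hurst
    h_skew = np.log((N * sigma * sk + mu ** 3)
                    / (t ** 3 - 3 * mu * t ** 2 - 3 * mu ** 2)) / np.log(t)
    # Eq. (11): kurtosis-based Hurst
    h_kurt = np.log((N * sigma * (N * ku + 3) - mu ** 4)
                    / (t ** 4 + 6 * t ** 2 * mu - 4 * mu * (t ** 3 - mu ** 2))) / np.log(t)
    return h_skew, h_kurt
```

Note that the logarithm arguments must be positive for the features to be defined, which restricts the data these formulas can be applied to.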

### Classification of emotional states

Sixty subjects, six emotions and ten trials per emotion yielded a total of 3600 samples. Four trials of one subject were discarded because of loose electrode contact, and the data of four children and three young adults were discarded because the NN/RR ratio indicated unreliable recordings. This left a total of 3300 samples for processing, from which all six features were extracted.

The performance of the different features was analyzed using four classifiers: regression tree, naïve Bayes, K-Nearest Neighbour (KNN) and fuzzy KNN (FKNN). The regression tree classifier builds a decision tree for predicting the classes based on Gini's diversity index, whereas the naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem with strong independence assumptions. KNN and FKNN assign a class based on the predominant class among the k nearest neighbours; k was varied from six to fifteen, since six classes are used for classification here. Euclidean distance was used as the distance metric, and FKNN additionally allocates fuzzy class memberships before making its decision.

In this work, random cross-validation was used to test the performance of the classifiers. The features derived from all the subjects were permuted and then partitioned into 70% and 30% subsets for each of the six emotional states; the 70% subset was used for training the classifier and the 30% subset for testing. The training and testing features belonged to random subjects, varied in each run of the program, and were mutually exclusive. Subject-independent validation (also called leave-one-person-out) was also performed for the RRS- and FVS-based combined analysis [39]: the features derived from 38 subjects were used for training the system and those from the other 16 subjects for testing, adhering to the 70-30 rule. The classification accuracy is computed for the different emotional states as,

\%\mathrm{Accuracy}_{\mathit{Emotion}}=\frac{\text{Number of correctly classified samples}_{\mathit{Emotion}}}{\text{Total number of tested samples}_{\mathit{Emotion}}}\times 100

(12)

where *Emotion* refers to the six emotional states namely happiness, sadness, fear, surprise, disgust and neutral. The average accuracy was computed by taking the mean of the accuracies of all the six emotional states.
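The two validation splits and the per-emotion accuracy of equation (12) might be sketched as follows (`random_split`, `subject_split` and `per_emotion_accuracy` are illustrative names, not from the original work):

```python
import numpy as np

def random_split(features, labels, train_frac=0.7, seed=0):
    """Random (subject-mixed) cross-validation split: permute, then 70/30."""
    idx = np.random.default_rng(seed).permutation(len(features))
    cut = int(train_frac * len(features))
    return (features[idx[:cut]], labels[idx[:cut]],
            features[idx[cut:]], labels[idx[cut:]])

def subject_split(features, labels, subjects, test_subjects):
    """Subject-independent split: all trials of a subject stay on one side."""
    test = np.isin(subjects, test_subjects)
    return features[~test], labels[~test], features[test], labels[test]

def per_emotion_accuracy(y_true, y_pred):
    """Eq. (12): % of correctly classified samples for each emotional state."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {emotion: 100.0 * np.mean(y_pred[y_true == emotion] == emotion)
            for emotion in np.unique(y_true)}
```

The average accuracy is then simply the mean of the per-emotion percentages.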