Assessment of eye fatigue caused by head-mounted displays using eye-tracking

Background: Head-mounted displays (HMDs) and virtual reality (VR) have been widely used in recent years, and mounting eye trackers in them allows user experience and computational efficiency to be assessed. However, in addition to visually induced motion sickness (VIMS), eye fatigue has increasingly been reported during and after viewing, highlighting the need for quantitative assessment of these detrimental effects. As no measurement method for eye fatigue caused by HMDs has been widely accepted, we measured parameters related to the optometry test and proposed a novel computational approach for estimating eye fatigue by providing several verifiable models.

Results: We implemented three classifications and two regressions to investigate different feature sets, which led to two valid assessment models for eye fatigue. The models use blinking features and eye movement features, with indicators from the optometry test as the ground truth. Each model provides three graded results and one continuous result, which makes the overall result repeatable and comparable.

Conclusion: We showed differences between VIMS and eye fatigue, and we presented a new scheme to assess the eye fatigue of HMD users by analyzing eye-tracker parameters.

eyes, headache, and tiredness [7]. Ohno and Ukai [8] found that subjective descriptions of eye fatigue included "trouble focusing," "hazy," "gritty," "near-vision difficulty," and "far-vision difficulty." Hockey et al. [9] defined eye fatigue as a decrease in performance on a particular task. Objective indicators include critical fusion frequency (CFF) [10], binocular vision [11], eye-blink rate (EBR) [12], and pupil constriction rate [13]. Changes in accommodative and vergence functions have been reported after periods of work at a visual display terminal (VDT), and these changes have been proposed as objective indicators of visual fatigue [14]. Bando et al. demonstrated that differences between natural observation of real-world scenes and viewing through display technology may cause visual discomfort and eye fatigue [15]. Eye fatigue in HMDs is mainly caused by the vergence-accommodation conflict [16], which arises from a mismatch between perceived and virtual depth. To obtain clear vision, the eyes converge and accommodate by an amount that depends on the distance to the viewed object. When the eyes converge on an object, accommodation changes the lens to bring the object onto the fovea and keep it there. When an HMD is used in stereoscopic mode, vergence is adjusted according to the virtual distance of the fixated object. Accommodation, however, is fixed in the HMD, and because accommodation and vergence are cross-linked functions, this is a cause of eye fatigue. Fatigue has mainly been defined in association with muscle performance [17]. The same can be assumed for the eye muscles, since eye movement is habitual from birth, eliminating symptoms of muscle fatigue [18]. In the present study, eye fatigue is defined as the functional decline of a number of specific eye muscles.
Since extraocular muscles (EOMs) have been reported to be fatigue-resistant [19,20] and control the rotational movements of the eye [21,22], the specific eye muscles considered here are limited to those that control activity inside the eyeball. We selected the muscles that control accommodation response, pupil size, and lens thickness. The ciliary muscles control accommodation response and lens thickness [23]. Pupil size is controlled by the pupillary muscle plant, which comprises two antagonistic muscles: the sphincter and the dilator [24]. We demonstrated that these muscles show a short-term, stable functional decline when eye fatigue occurs. Although this definition can be used to measure eye fatigue, the measurement process is complex and cannot be performed in real time.
Questionnaires have been developed to subjectively assess visual fatigue caused by various types of visual stimuli. The questionnaires were evaluated using four types of moving images: playing a TV game using an HMD or a TV, viewing images with and without camera shake, viewing a movie with and without color breakup, and viewing either a stereoscopic movie (anaglyph method) or a non-stereoscopic movie [7,25]. Emoto et al. assessed and compared visual fatigue by measuring fusional amplitude [26]. Li et al. found that the delay in transmission of visual information, measured by electroencephalogram (EEG), was an effective indicator of visual fatigue [27]. A number of scholars have also attempted to assess eye fatigue by measuring brain activity; however, this approach is complicated and very sensitive to the individual state of the subjects. Thus, a series of subjective indexes captured by optometry instruments, together with accommodation response, have been explored as indicators of visual fatigue [14,28]. Moreover, eye movement has been employed to assess mental fatigue and visual fatigue [29,30]. EBR was found to decrease in an HMD environment compared with a natural environment [30]. However, due to equipment limitations, EBR could not be measured while the subject was using the HMD, and some scholars measured EBR only in the natural environment for both the experimental group and the control group. To deepen the understanding of eye fatigue and eye activity, Kim et al. proposed a visual fatigue monitoring system based on eye movement and eye-blink detection [31]. They found that saccadic eye movement decreased, while eye-blink frequency increased, as eye fatigue accumulated with increasing fixation time. In addition, a new assessment of eye fatigue related to three-dimensional (3D) displays was proposed based on multimodal measurements [32].
To our knowledge, no research has assessed eye fatigue induced by HMDs using eye-tracking methods, even though eye trackers can be embedded into HMDs [33,34].
To overcome the shortcomings of previous studies, we proposed an objective algorithm to estimate eye fatigue caused by an HMD using eye-tracking data. Based on our previous research, we adopted seven objective indicators as the ground truth of eye fatigue, all of which were obtained from subjective optometry: the binocular crossed cylinder (BCC) test, negative relative accommodation (NRA), positive relative accommodation (PRA), left pupil diameter (PL), right pupil diameter (PR), left lens thickness (LTL), and right lens thickness (LTR) [35][36][37][38][39][40]. Furthermore, we extended the concept of eye fatigue by proposing a new assessment strategy based on measurable eye activities (e.g., accommodation, pupil change, lens change, and eye movement).

SSQ results
The SSQ scores of the 105 subjects are presented in Fig. 1. The subjects on the horizontal axis are arranged in ascending order of the data measured in the fourth SSQ. No consistent change over time can be observed across the subjects, indicating that they did not suffer from obvious VIMS after the experiment. For the total of 420 samples, the linear correlations between the SSQ and the indicators of the optometry test are shown in Table 1. VIMS correlates poorly with the feeling of "eye fatigue." As all the subjects were fully relaxed before the experiment, all 105 subjects scored 0 on this item in the first SSQ, and only 41 of the 105 subjects scored more than 0 on this item in the fourth SSQ.

Optometry results
Table 2 shows the mean values and the changes of the indicators in the fourth measurement of the main and control experiments. In the main experiment, BCC showed an increasing trend, whereas pupil size, lens thickness, and the amplitudes of PRA and NRA showed a decreasing trend from the first measurement to the second, third, and fourth measurements; no obvious trend was found in the control experiment.

Weighted eye fatigue
We proposed a weighted eye fatigue measurement based on the seven indicators of the optometry test. To demonstrate the relevance of every indicator to eye fatigue, we conducted Student's t test on the seven indicators. The results of the statistical analysis are listed in Table 2. The significance level was set to 0.05. All the indicators showed significant differences between the data collected at the beginning and at the end of the experiment. Figure 2 shows the weighted eye fatigue of all 105 subjects in the experiment, computed by Eq. (4). The subjects on the horizontal axis are arranged in ascending order of the data measured at the fourth time. In contrast to the control experiment, the eye fatigue level of all 105 subjects increased over time in the main experiment, which was caused by viewing the HMD display. The mean and standard deviation (SD) of the changes in weighted fatigue are depicted in Fig. 3, where the red crosses are outliers. In addition to the changes in mean values, the SD across subjects gradually changed as the experiment progressed.
Table 3 shows the mean and SD of the changes in eye movement features in the main experiment. The SD and the mean of fixation duration showed an increasing trend, whereas the number of fixations and the total duration of fixation showed a decreasing trend from the 1st measurement to the 2nd, 3rd, and 4th measurements. The four blinking features increased over time for all the participants, and their eye fatigue level increased as well. In general, individuals reduce EBR due to sleepiness or boredom. Compared with the 1st measurement, the values in the 2nd, 3rd, and 4th measurements increased, reflecting the increase in eye fatigue level.
In the present study, the Kolmogorov-Smirnov test [41] was used to assess whether the data were normally distributed. In addition, Student's t test was carried out on all ten features selected in this study, and the results are presented in Table 3. P < 0.05 was considered statistically significant. There were significant differences in eye movement features between the data collected at the beginning and at the end of the experiment.

Eye movements results

Ranking the eye movement features
Using the minimal-redundancy-maximal-relevance (MRMR) criterion [42], we ranked the ten eye movement features. The features, ranked according to the unified weighted eye fatigue level, are listed in Table 4. The computation took 37 s in total. The top five features were consistent across all three kinds of classification.
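The greedy idea behind an MRMR-style ranking can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it substitutes absolute Pearson correlation for the mutual-information terms of the original criterion [42], and the function names and toy feature values are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def mrmr_rank(features: Dict[str, List[float]], target: List[float]) -> List[str]:
    """Greedy ranking: at each step pick the feature with the highest
    relevance to the target minus mean redundancy with already-picked features.
    Assumption: |Pearson r| stands in for mutual information."""
    selected: List[str] = []
    remaining = list(features)
    while remaining:
        def score(name: str) -> float:
            relevance = abs(pearson(features[name], target))
            if not selected:
                return relevance
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feat_b is more relevant than feat_c on its own,
# but it is redundant with feat_a, so it is ranked last.
toy_features = {
    "feat_a": [1.0, 1.0, -1.0, -1.0],
    "feat_b": [2.0, 0.0, -2.0, 0.0],
    "feat_c": [1.0, -1.0, 1.0, -1.0],
}
toy_target = [3.0, 1.0, -1.0, -3.0]
ranking = mrmr_rank(toy_features, toy_target)
```

In this toy case a ranking by relevance alone would place `feat_b` second; penalizing redundancy moves `feat_c` ahead of it, which is the behavior the criterion is designed to produce.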

Assessment results
We established eleven feature sets for analysis with support vector machines (SVMs). The first ten feature sets were named Set 1 to Set 10, and the 11th was named the blink set. Set 1 contained one eye movement feature, Set 2 contained two eye movement features, and so forth. For the epsilon regression and the nonlinear regression, we applied the feature selection of the four-class classification. We established the blink set to investigate whether blink features alone are sufficient to assess an individual's eye fatigue, considering that blinking features are more economical to obtain than general eye movement features. We analyzed four kernels: the linear kernel, polynomial kernel, radial basis function kernel, and sigmoid kernel, referred to as kernel 1 to kernel 4, respectively. All five SVMs (three classifications and two regressions) were trained in each round, which lasted for 48 min. Table 5 shows the results of the three kinds of classifications. Set 10 had the highest classification accuracy among the feature sets. The maximum accuracies of the two-class, three-class, and four-class classifications were 0.9079, 0.7947, and 0.7425, respectively. As displayed in Table 5, the blink set performed slightly worse than feature set 10. Regarding kernel selection, kernel 3, kernel 1, and kernel 2 were the most appropriate kernels for the three types of classifications, respectively. Table 6 presents the correlations between the regression results and the ground truth of eye fatigue. Feature set 10 again showed the best performance among the feature sets. Between the two regressions, the epsilon regression performed better.
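The construction of the eleven feature sets and the kernel grid described above can be sketched as follows. The ranked feature names are placeholders, since Table 4 is not reproduced here; the blink abbreviations (BT, DB, MB, VB) are those used in the assessment function, and the kernel labels are plain strings rather than the identifiers of any particular SVM library.

```python
from itertools import product
from typing import Dict, List

# Placeholder ranking (assumption): the real order comes from Table 4.
RANKED_FEATURES: List[str] = [f"ranked_feature_{i}" for i in range(1, 11)]
# Blink features: times, total duration, mean duration, SD of duration.
BLINK_FEATURES: List[str] = ["BT", "DB", "MB", "VB"]
# Kernel 1 to kernel 4, in the order given in the text.
KERNELS: List[str] = ["linear", "polynomial", "rbf", "sigmoid"]


def build_feature_sets(ranked: List[str], blink: List[str]) -> Dict[str, List[str]]:
    """Set k holds the top-k ranked features; the blink set holds blink features only."""
    sets = {f"Set {k}": ranked[:k] for k in range(1, len(ranked) + 1)}
    sets["blink set"] = list(blink)
    return sets


FEATURE_SETS = build_feature_sets(RANKED_FEATURES, BLINK_FEATURES)
# Every (feature set, kernel) pair that one SVM task would be trained on.
GRID = list(product(FEATURE_SETS, KERNELS))
```

With eleven feature sets and four kernels, each classification or regression task covers 44 configurations in this sketch.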
According to the results of classification and regression, two assessment models were established: an eye tracker model and a blink detector model. The eye tracker model used feature set 10 as input data, and the blink detector model used the blink set. Both models had four applications: three classifications and one regression. The three classifications used the radial basis function kernel, linear kernel, and polynomial kernel, respectively. The regression was the epsilon regression with the polynomial kernel. The performances of the two assessment models are listed in Table 7.

Discussion
In the present research, an objective algorithm was developed to estimate eye fatigue caused by an HMD using eye-tracking data. We used a weighted combination of seven optometry indicators as the ground truth of eye fatigue, a novel definition based on our previous study. Based on the SSQ and optometry data, we found that the relationship between VIMS and the feeling of eye fatigue was insignificant. To study eye fatigue in HMDs further, an unobtrusive eye tracker was mounted in the VR gear, which yielded ten eye movement features. After ranking the features with the MRMR criterion, we carried out three kinds of classifications and two kinds of regressions to evaluate the performance of different feature sets. A series of methods have been proposed to assess eye fatigue; however, eye fatigue caused by HMDs has rarely been investigated. Previous studies of discomfort caused by HMDs concentrated mainly on VIMS [43,44]. In the present study, we first demonstrated that eye fatigue and VIMS are different. We used the SSQ to assess the subjects' VIMS throughout the experiment. Figure 1 shows that the majority of the subjects did not feel any VIMS after the experiment, whereas Figure 2 shows that all 105 subjects suffered from eye fatigue after the experiment. This indicates that eye fatigue is a muscular disorder and that the subjective feeling of eye fatigue may reach awareness only some time after eye fatigue has occurred. A number of scholars considered the subjective feeling as the ground truth of eye fatigue [26], which is not accurate. In the present experiment, the SSQ also contained an "eye fatigue" item, on which only 41 of the 105 subjects reported feeling eye fatigue after the experiment, showing that subjective feelings cannot accurately measure eye fatigue. Li et al. found that the delay in the transmission of visual information measured with EEG could be a helpful indicator of visual fatigue [27]. However, eye fatigue is treated here as a problem of the eye rather than of the brain.
In the current research, we used seven optometry indicators, namely BCC, NRA, PRA, PL, PR, LTL, and LTR, to define eye fatigue. BCC, NRA, PRA, LTL, and LTR reflect the accommodative ability of the ciliary muscles, while PL and PR reflect the accommodative ability of the pupillary muscle plant. As shown in Table 2, all seven indicators are sensitive to eye fatigue. It is noteworthy that LTR and LTL decreased from 3.95 mm at the first measurement to 3.79 and 3.77 mm, respectively, at the fourth measurement. Researchers have found that, prior to presbyopia, lens thickness increases by about 42 to 72 μm per diopter of accommodation [45][46][47]. In every optometry test of our experiment, the order of measurement was BCC, PRA, NRA, pupil size, and lens thickness. Pupil size and lens thickness were measured simultaneously by the optical biometer, so each subject's lens thickness was measured right after the NRA test. During the NRA test, positive diopters were added for the subject, which decreased his/her lens thickness. During the lens thickness measurement, the subject observed a light source at a fixed distance, which required accommodation. The results showed that the subjects' lens thickness decreased over time because their accommodative ability decreased as eye fatigue occurred. Since measurement based on optometry data requires medical equipment and cannot be performed in real time, we proposed an assessment method using eye movement data. Numerous scholars have recently presented assessment methods using eye movement data; however, their experiments were conducted in natural environments or with traditional displays [29,31,32]. When an HMD is used, the user's eyes are much closer to the screen than with traditional displays, and HMDs impose more stress on the eyes because of this close distance and the immersive environment.
No study has investigated eye movement during HMD use, owing to limitations of eye trackers. In the present study, we monitored the subjects' eye movement using an eye tracker embedded in the HMD. We extracted 10 eye movement features based on previous studies [30][31][32]. Jansen et al. [30] defined a fixation as an instant when the eyes are stationary and focused on an area while interpreting data. It has also been reported that EBR is more sensitive to workload than other conventional eye-tracking measures, such as saccade rate and amplitude, in a demanding visual task [31]. Changes in blinking habits indicate that a subject may be suffering from a high level of eye fatigue. The saccade length was defined as the distance (in pixels) between two sequential fixation points in a scanpath [32]. We also ranked the ten eye movement features for assessing eye fatigue by using a feature selection algorithm. Afterward, using the optometry measurements as the ground truth and the eye movement features as input, we established ten eye movement feature sets and one blink feature set with different dimensions based on the ranking, and applied SVMs for assessment. Table 5 shows the results of the three kinds of classification of eye fatigue: two-class, three-class, and four-class. Comparing the eleven input feature sets showed that the more dimensions a feature set had, the better the classification result. The accuracies of the blink set were slightly lower than those of feature set 10, namely 0.8763, 0.7458, and 0.7225 in the two-class, three-class, and four-class classifications, respectively. As presented in Table 6, two kinds of regression of eye fatigue were performed, with the same input features as in the multi-class classifications. The performance of the blink set was again slightly lower than that of feature set 10.
We also presented two assessment models: an eye tracker model and a blink detector model. The general preferences and the accuracies of the two models are shown in Table 7. The eye tracker model uses feature set 10 as input data, and the blink detector model uses the blink set. Both models can provide graded and continuous results for evaluating the eye fatigue of HMD users by analyzing eye-tracker parameters. Further research should be conducted to verify our findings and explore new practical scenarios.

Conclusion
In the present study, an objective algorithm was developed to estimate eye fatigue caused by an HMD using eye-tracking data. The experiment had two sessions: the main experiment and the control. In the main experiment, there were four rounds of SSQ and optometry tests and three segments of HMD use, with a total duration of 35 min per participant. The participants' eyes went from a completely relaxed state to a quite fatigued state during the experiment, and we monitored their eye conditions throughout. The SSQ was used to evaluate VIMS. We used a weighted combination of seven optometry indicators as the ground truth of eye fatigue, which is a novel definition based on our previous study. On the basis of the SSQ and optometry data, we found that the relation between VIMS and the feeling of eye fatigue was weak. We also demonstrated that subjective perception is not an accurate indicator of eye fatigue. An unobtrusive eye tracker installed in the VR gear was used to perform the eye fatigue assessment. Ten eye movement features, including four fixation features, four blink features, and two distance features, were recorded for the assessment. One feature selection algorithm was applied to rank the ten eye movement features based on a particular classifier. We conducted three kinds of classifications and two kinds of regressions to evaluate the performance of different feature sets. We presented two assessment models: an eye tracker model and a blink detector model. Both models provide graded and continuous results to users. The models proposed in the present study could be applied to all users.

Overview of experiment
A total of 105 subjects, aged between 19 and 51 years, participated in the experiment. All subjects had normal or corrected-to-normal vision. To ensure that the initial conditions were the same, all subjects were fully relaxed before the experiment: every subject was asked to look into the distance (more than 5 m) for more than 30 min beforehand. At the start of the experiment, every subject was asked whether his/her eyes were tired, to make sure he/she was fully relaxed. We assumed that every subject's eyes were free of fatigue only before the experiment. Figure 4 shows the experimental procedure. Session 1 was the main experiment, in which we monitored the subjects' eye movement. Session 2 was the control experiment. Every subject participated in the two sessions on different days but at the same time of day. Each session included four questionnaires, four optometry tests, and three segments of HMD use. In session 1, subjects watched a video for 3 min in the HMD. An eye tracker embedded in the HMD recorded the subjects' eye movement data throughout. To be time efficient, we analyzed the data from four periods of 20 s rather than the full 3 min. In session 2, subjects wore the HMD for the same periods as in session 1, but kept their eyes closed during these three segments. Both sessions lasted for 35 min. When filling in the questionnaire, the participants rated their feelings on a four-grade scale. In the optometry test, we assessed seven parameters for each participant. Optometrists carried out the experiment. The total time spent in this section was 6 min. The same optometrist conducted the whole experiment for a given subject to maintain consistency. As shown in Fig. 5a, the subject wore an HTC Vive HMD in the experiment; the system consisted of one headset, two controllers, and four base stations.
The four base stations were positioned with the headset and controllers in a room measuring 6.6 m × 5 m. The screen had a resolution of 1080 × 1200 pixels per eye (2160 × 1200 pixels combined), and its highest refresh rate was 90 Hz. As illustrated in Fig. 5b, the eye-tracking components (aGLASS DKII) were manually placed into the headset. Each component could emit infrared light to read the position of the eyeball. The refresh rate of the eye tracker was 75 Hz. The tracking error was less than 0.5°, and the delay was less than 10 ms [48]. We tested the accuracy of the eye tracker before the experiment. We aimed to design a model that could read the user's eye fatigue value at any given time during HMD use. The system computed eye movement features in real time and passed the data to the assessment model. The lenses shown in Fig. 5b could replace the user's glasses for users who preferred not to wear their own. However, since the lenses in the aGLASS DKII were available in only three diopter levels, which did not suit everyone, participants were asked to wear their own glasses throughout the video-viewing process.

SSQ test
We used a traditional questionnaire, the SSQ [25], to assess VIMS. Table 8 presents the items and the scoring rule. We used a 4-point scale, in which each symptom's score (0, 1, 2, or 3) was multiplied by the appropriate weight, and the weighted values were summed down each column to obtain the weighted totals. The nausea (N), oculomotor (O), and disorientation (D) scores were then calculated from the weighted totals using the conversion formulas given at the bottom of Table 8.
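As a hedged illustration of the weight-and-sum scoring just described, the sketch below follows the commonly cited 16-item form of the SSQ, in which each item has weight 1 in the subscales it belongs to and 0 elsewhere. The item names, subscale membership, and conversion constants are taken from that standard form and are assumptions here; they may differ from the version in Table 8, which is not reproduced in this text.

```python
from typing import Dict

# Subscale membership in the commonly cited 16-item SSQ (assumption).
NAUSEA = {"general discomfort", "increased salivation", "sweating", "nausea",
          "difficulty concentrating", "stomach awareness", "burping"}
OCULOMOTOR = {"general discomfort", "fatigue", "headache", "eyestrain",
              "difficulty focusing", "difficulty concentrating", "blurred vision"}
DISORIENTATION = {"difficulty focusing", "nausea", "fullness of head",
                  "blurred vision", "dizziness (eyes open)",
                  "dizziness (eyes closed)", "vertigo"}


def ssq_scores(ratings: Dict[str, int]) -> Dict[str, float]:
    """Convert 0-3 symptom ratings into N, O, D and total SSQ scores.
    Items missing from `ratings` are treated as 0."""
    for item, value in ratings.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"rating for {item!r} must be 0, 1, 2 or 3")
    raw_n = sum(ratings.get(item, 0) for item in NAUSEA)
    raw_o = sum(ratings.get(item, 0) for item in OCULOMOTOR)
    raw_d = sum(ratings.get(item, 0) for item in DISORIENTATION)
    return {
        "N": raw_n * 9.54,
        "O": raw_o * 7.58,
        "D": raw_d * 13.92,
        "TS": (raw_n + raw_o + raw_d) * 3.74,
    }


# Hypothetical ratings for one questionnaire.
example = ssq_scores({"nausea": 1, "eyestrain": 2})
```

Because some items belong to two subscales, a single rating can contribute to two column totals, which is why the total score is computed from the sum of the three raw totals rather than from the raw item sum.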

Optometry test
We obtained seven indicators from the optometry test: BCC, PRA, NRA, PL, PR, LTL, and LTR. The test was conducted by qualified optometrists. When BCC, PRA, and NRA were tested, the distance between the eye and the target was 40 cm. The detailed measurement process was described in our previous study [35]. As illustrated in Fig. 5c, we used two ophthalmic devices: a phoropter and an optical biometer. The phoropter was a NIDEK RT-600 (NIDEK Co., Ltd, Japan), a comprehensive refractometer that allows optometrists to perform optometry accurately.
Although an automatic phoropter can provide an acceptable starting point in optometry, it can never replace subjective refraction [49]. Through the phoropter, we obtained the three accommodative amplitude indicators, BCC, PRA, and NRA, which were discrete with a minimum step size of 0.25 diopter. The optical biometer was a SUOER SW-9000 (Suowei Electronic Technology Co., Ltd., China). It provided the pupil diameters and lens thicknesses for both eyes.

Definition of eye fatigue
A ground truth function for eye fatigue is given based on our previous study [35] as follows:

F = g(BCC, PRA, NRA, PL, PR, LTL, LTR)    (2)

where BCC, PRA, NRA, PL, PR, LTL, and LTR represent the seven optometry indicators introduced in the Background section, and F is the ground truth function of eye fatigue. The seven indicators were simultaneously obtained. In our previous research [35], the ground truth function for an individual's eye fatigue was defined simply by Eq. (3), in which the minus signs before NRA, PL, PR, LTL, and LTR indicate that these five indicators were attenuated over time. According to the trends presented in Table 2, we redefined eye fatigue by Eqs. (4) and (5), which prevents any single large numerical change caused by fluctuation of the optometry indicators.
In Eq. (4), the weights w_i, i = 1, 2, ..., 7, are given by Eq. (5), where mean_time4 is the mean value of the corresponding optometry indicator in the 4th measurement and mean_time1 is the mean value of the corresponding indicator in the 1st measurement.

Eye movement test
The distance between the subject and the screen was 2 cm. During the video-watching experience, participants were able to blink freely and to move their eyes and heads. In theory, the longer the experiment, the more notable the symptoms of eye fatigue would become. For ethical and humanitarian reasons, the total duration of observation in the HMD was set to 9 min for each subject. We selected ten features related to eye movement: four features related to fixation (the number of fixation points, the total duration of fixation points, the mean duration of fixation points, and the standard deviation of the duration), four features related to blinking (the times of blinking, the total duration of blinking, the mean duration of blinking, and the standard deviation of the duration), and two features related to the scanpath (the length of the scanpath and the mean length of saccades). The output of our eye tracker included the users' gaze locations and their corresponding times. We used the Dispersion-Threshold Identification (I-DT) algorithm to obtain the number and duration of fixations. The minimum duration threshold was set to 200 ms [50], and the dispersion threshold was set to 1° of visual angle. Blinking features were obtained at the same time as the fixation features. If the user blinked, the eye tracker could not detect the eyes, and the output data did not include gaze information. We can therefore assume that a blink occurred at that time, and the duration of a run of consecutive null data is taken as the duration of that blink. Although an extreme gaze could also produce null data, we did not consider this condition. Generally speaking, an extreme gaze happens when peripheral vision picks up a very strong stimulus at the edge of the display. However, when using the HMD, the user can freely move his/her head, and the screen is always in front of his/her eyes.
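The fixation and blink extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's implementation: gaze coordinates are assumed to be already expressed in degrees of visual angle, timestamps are integer milliseconds, dispersion is taken as the sum of the horizontal and vertical ranges (the usual I-DT convention), the population SD is used, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

# One gaze sample: (time_ms, x_deg, y_deg); x and y are None when the eye is not detected.
Sample = Tuple[int, Optional[float], Optional[float]]
Interval = Tuple[int, int]  # (start_ms, end_ms)


def _dispersion(window: List[Sample]) -> float:
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: List[Sample], max_dispersion: float = 1.0,
                     min_duration_ms: int = 200) -> List[Interval]:
    """I-DT: grow a window to the minimum duration, then extend it while
    dispersion stays within the threshold. Null samples end a window."""
    fixations: List[Interval] = []
    i, n = 0, len(samples)
    while i < n:
        if samples[i][1] is None:
            i += 1
            continue
        j = i
        while (j + 1 < n and samples[j + 1][1] is not None
               and samples[j][0] - samples[i][0] < min_duration_ms):
            j += 1
        long_enough = samples[j][0] - samples[i][0] >= min_duration_ms
        if long_enough and _dispersion(samples[i:j + 1]) <= max_dispersion:
            while (j + 1 < n and samples[j + 1][1] is not None
                   and _dispersion(samples[i:j + 2]) <= max_dispersion):
                j += 1
            fixations.append((samples[i][0], samples[j][0]))
            i = j + 1
        else:
            i += 1
    return fixations


def detect_blinks(samples: List[Sample]) -> List[Interval]:
    """Each run of consecutive null samples is treated as one blink,
    ending at the first valid sample after the run."""
    blinks: List[Interval] = []
    start = last = None
    for t, x, y in samples:
        if x is None or y is None:
            if start is None:
                start = t
            last = t
        elif start is not None:
            blinks.append((start, t))
            start = None
    if start is not None:  # recording ended during a blink
        blinks.append((start, last))
    return blinks


def duration_features(intervals: List[Interval]):
    """Return (count, total duration, mean duration, SD of duration) in ms."""
    durations = [end - start for start, end in intervals]
    if not durations:
        return 0, 0, 0.0, 0.0
    return len(durations), sum(durations), mean(durations), pstdev(durations)
```

Applying `duration_features` to the output of `detect_fixations` and `detect_blinks` yields the four fixation features and the four blink features for one analysis window.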
In the present study, scanpath features were defined by a saccade-fixate-saccade sequence on a display [51]. The scanpath length and the mean length of the saccades were analyzed. The length of a saccade was defined as the distance (in pixels) between two sequential fixation points in a scanpath. The scanpath length (in pixels) was taken as the sum of the saccade lengths in a certain period of time. As shown in Fig. 4, we analyzed eye movement in four periods of 20 s each. These four periods are the ones closest to the four optometry measurements. To be time efficient, we analyzed the data obtained during 20 s rather than the full 3 min. It took 0.235 s to compute all the eye movement features obtained during 20 s for one subject, and 0.705 s to compute all the features obtained during 1 min.
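The two scanpath features defined above can be computed from the sequence of fixation points as in the following sketch. The function names are hypothetical, and the fixation points are assumed to be (x, y) pixel coordinates in temporal order.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]  # fixation point in pixels


def saccade_lengths(fixation_points: List[Point]) -> List[float]:
    """Euclidean distance between each pair of sequential fixation points."""
    return [dist(a, b) for a, b in zip(fixation_points, fixation_points[1:])]


def scanpath_features(fixation_points: List[Point]) -> Tuple[float, float]:
    """Return (scanpath length, mean saccade length) in pixels."""
    lengths = saccade_lengths(fixation_points)
    total = sum(lengths)
    mean_length = total / len(lengths) if lengths else 0.0
    return total, mean_length


# Three hypothetical fixation points give two saccades of 5 and 6 pixels.
sl, ml = scanpath_features([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
```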

Eye fatigue assessment
We proposed an objective assessment of eye fatigue through the use of eye-tracking features. Based on the collected eye-tracking features, the assessment function is given by Eq. (6), where NF is the number of fixation points, DF is the total duration of fixation points, MF is the mean duration of fixation points, VF is the standard deviation of the fixation duration, BT is the times of blinking, DB is the total duration of blinking, MB is the mean duration of blinking, VB is the standard deviation of the blinking duration, SL is the scanpath length, ML is the mean length of the saccades, and F is the estimated fatigue value of the user's eyes over a certain period of time. Equation (6) is a real-time assessment function. All ten parameters were accumulated simultaneously over the same period of time, which was set to 20 s in this study.

Input data
The input data of our assessment model included the ground truth of the eye fatigue and the ten eye movement features. Since all the subjects' eyes were fully relaxed before the