Detecting fixation on a target using time-frequency distributions of a retinal birefringence scanning signal

Background The fovea, which is the most sensitive part of the retina, is known to have birefringent properties, i.e. it changes the polarization state of light upon reflection. Existing devices use this property to obtain information on the orientation of the fovea and the direction of gaze. Such devices employ specific frequency components that appear during moments of fixation on a target. To detect them, previous methods have used solely the power spectrum of the Fast Fourier Transform (FFT), which, unfortunately, is an integral method, and does not give information as to where exactly the events of interest occur. With very young patients who are not cooperative enough, this presents a problem, because central fixation may be present only during very short-lasting episodes, and can easily be missed by the FFT. Method This paper presents a method for detecting short-lasting moments of central fixation in existing devices for retinal birefringence scanning, with the goal of a reliable detection of eye alignment. Signal analysis is based on the Continuous Wavelet Transform (CWT), which reliably localizes such events in the time-frequency plane. Even though the characteristic frequencies are not always strongly expressed due to possible artifacts, simple topological analysis of the time-frequency distribution can detect fixation reliably. Results In all six subjects tested, the CWT allowed precise identification of both frequency components. Moreover, in four of these subjects, episodes of intermittent but definitely present central fixation were detectable, similar to those in Figure 4. A simple FFT is likely to treat them as borderline cases, or entirely miss them, depending on the thresholds used. Conclusion Joint time-frequency analysis is a powerful tool in the detection of eye alignment, even in a noisy environment. The method is applicable to similar situations, where short-lasting diagnostic events need to be detected in time series acquired by means of scanning some substrate along a specific path.


Introduction
The fovea, which is the most sensitive part of the retina, is known to have birefringent properties, i.e. it changes the polarization state of light upon reflection. The main cause for this birefringence are the Henle fibers surrounding the fovea, arranged in a radially symmetric pattern. When illuminated or scanned with polarized light, the Henle fiber layer produces the macular polarization cross or "bow tie", which is a windmill shaped pattern centered about the fovea [1][2][3]. The strength and contrast of the polarization cross and the orientation of its bright parts depend on the orientation and degree of polarization of the light striking the retina, which is a function of both the instrumentation and the individual corneal properties [4,5]. In recent years, the birefringent properties of the human eye have been used to identify the position of the fovea and the direction of gaze. This allows for one to check for eye alignment and strabismus (crosssightedness), a risk factor for amblyopia (called also "lazy eye"), which can potentially lead to a loss of sight in the affected eye. Screening techniques for amblyopia have been reported that are based on the birefringence signal derived from foveal circular scanning [6][7][8]. In this approach, a signal s(t) consisting of several frequency components (f 1 = k 1 *f s , f 2 = k 2 *f s , f 3 = k 3 *f s , etc.) is produced, where each frequency is a multiple of the scanning frequency f s . Some frequencies prevail during central fixation, while others appear at para-central fixation. The constants k 1 , k 2 , k 3, etc. depend on the opto-mechanical design. Figure 1 shows the time-frequency distribution of such signals, produced by a system which generates two signals. Let us assume that the eye does not fixate on the target in the time interval [t a … t b ], and fixates during [t c … t d ]. Consequently, in the time-frequency plane, we observe concentration of power at frequency k 1 *f s in the timeframe [t a … t b ], and then shift of the dominating frequency to k 2 *f s during [t c … t d ]. At time t e the eye loses fixation again, and consequently, the emphasis is shifted back to f 1 = k 1 *f s . In the simplest case, f 2 = 2f s is produced during central fixation, while f 1 = f s prevails during off-central fixation, i.e. k 1 = 1 and k 2 = 2 [6]. Existing instruments (i.e. [7]), acquire consecutive epochs of s(t), with gaps between them, during which an FFT is performed. A problem with this approach is that the FFT power spectrum is a global measure, i.e., it provides information on how much of f 1 and f 2 are represented in the whole epoch analyzed, but it does not provide information on exactly where these frequencies appear and for how long. With less-cooperative pediatric patients, important short-lasting moments of central fixation (f 2 ) may easily be hidden behind large lowfrequency (f 1 ) components, and missed. Analyzing short time intervals is desirable, in order to increase temporal resolution, but this is where the FFT becomes prone to noise and loses spectral resolution. There thus remains a need for improved methods for detecting central fixation.

Method
The method proposed here assumes that an opto-mechanical apparatus already exists, which enables the acquisition of a birefringence signal from a circular retinal scan, where the scanning circle is positioned around the expected position of the fovea during fixation on a stationary target, appearing optically in the center of the scanning circle. Such systems have been developed and described elsewhere [6][7][8][9][10]. The general design of such a device is shown on Figure 2. In short, the system uses a low-power laser beam of vertically polarized near-infrared light, to scan the retina around the presumed center of the fovea and the polarization cross. The returning light is diverted to a polarization sensor, consisting of a polarizing beamsplitter and two photodetectorsone for the vertically polarized component s , and one for the horizontally polarized component p. The difference of the two gives a measure of the polarization state of light, called also S 1 and being the second component of the full four-component Stokes vector [11]. The signal is amplified and filtered, to remove frequency components outside the region of interest, and finally digitized. The background, comprising the optical noise, is removed prior to any signal processing. This paper focuses on a design producing f 2 = 2f s during central fixation and f 1 = f s during off-central fixation (k 1 = 1, k 2 = 2) as reported in [7]. Scanning signals were recorded from six subjects after obtaining written informed consent, following a protocol approved by the Institutional Review Board (IRB) and in compliance with the Helsinki Declaration. Personal identification data was not collected, and the signals collected were not used for identification purposes. From each subject, 12 records were acquired, each 400 ms long, for both the right eye (RE) and the left eye (LE). The sampling frequency was 5 kHz, while the scanning frequency was 96 rounds per second.
The problem of detecting short-lasting episodes of central fixation, characterized by the appearance of f 2 = 2f s , can be successfully solved by using time-frequency distributions (TFD) obtained by means of the Continuous Wavelet Transform (CWT). It allows excellent localization in both time-and frequency domains and is computationally efficient. If the signal to be analyzed is s(t) and g(t) is the analyzing wavelet, the CWT is defined as a convolution of the type: where * denotes complex conjugate , a -the scale (dilation), and τ -a time shift. The wavelet g(t) and the distribution W(τ, a) are complex-valued [12][13][14][15][16][17]. The constant 1/ is used for energy normalization. The analyzing wavelet satisfies the following conditions: (a) Belong to L 2 (R), i.e. be square integrable (be of finite energy), (b)Be analytic (G(ω) = 0 for ω < 0 ) and thus be complex-valued, (c) Be admissible. This condition was shown to enable invertibility of the transform: where c g is a constant that depends only on g(t) and a is positive. For an analytic wavelet this constant should be positive and convergent which in turn imposes an admissibility condition on g(t). For a real-valued wavelet, the integrals from both -∞ to 0 and 0 to +∞ should exist and be greater than zero. Admissible wavelets have no zero frequency contribution, or, what amounts to the same, they are of zero mean, or equivalently G(ω) = 0 for ω=0: An appropriate choice for an analyzing wavelet is the admissible complex-valued wavelet of Morlet [12,18] comprising a modulated Gaussian function of optimal timefrequency concentration: As an example, Figure 3 shows a surrogate signal s(t) containing two bursts of sinosoidal signals. The first burst is a mixture of two sine waves (25 Hz and 100 Hz) occurring simultaneously, whereas the second burst, shifted in time, contains a sine wave of 50 Hz. Each of the two bursts was modulated by a three-term Blackman-Harris window: where f 1 = 25 Hz, f 2 =100 Hz, f 3 = 50 Hz and each modulation window of duration T and starting times respectively t 1 =0 ms and t 2 = 600 ms, was calculated according to: The time domain signal is shown in the upper panel of Figure 3, whereas the FFT of the same signal is shown on the middle panel, and the time-frequency distribution calculated with the CWT (equations 1 and 5) is shown on the bottom panel as a contour plot. Notably, while the FFT does detect all three frequencies correctly in the frequency domain, it does not localize the time moments of the three bursts. The CWT, conversely, does localize all three events in both time and frequency.

Results
For every epoch (T = 0.4 s) of the incoming signal, a time-frequency distribution was computed by means of the CWT (equations 1 and 5). Figure 4, top panel, shows the time domain scan signal s(t) (right eye of a 7 year old boy, f s = 96 Hz; k 1 = 1, k 2 = 2). The subject was trying to fixate on a target. The middle panel shows the FFT power of the same signal. Two main frequency components are presentone at 96 Hz (f 1 = k 1* f s = 96 Hz), characteristic of para-central fixation, and one at 192 Hz (f 2 = k 2 *f s = 192 Hz), due to central fixation. Based on the FFT plot, one can easily decide that the prevailing frequency is f 1 and therefore para-central fixation was prevailing during this record. With the CWT (bottom panel), episodes of f 1 and f 2 can easily be localized in time and in frequency dimensions simultaneously. A close look at the appearances of these components gives enough evidence to conclude that there were substantial shortlasting episodes of central fixation, manifested by intermittent appearance of components at f 2 = 192 Hz. They last between 30 ms and 100 ms each, and show up throughout the epoch. In the first 270 ms they exist side-by side with the para-central fixation components (f 1 = 96 Hz), which at times are of higher amplitude. One can easily see why the para-central components mask the central fixation components in the FFT. It should be mentioned here that sometimes remnants of the optical noise, not entirely eliminated by the background subtraction, appear at the scanning frequency f s = f 1 = 96 Hz, and can further mask the central fixation components in the FFT. Likewise, despite the polarizationkilling properties of the human skin in general, some polarized portion of the light gets reflected by the face and returns into the optical system, reaching the sensors and giving rise to additional interference at f s = f 1 = 96 Hz. This emphasizes the importance of analyzing the signals in the time-frequency plane, which gives a better chance of separating signal from noise.
In all six test subjects, the CWT allowed precise identification of both frequency components. Moreover, in four of these subjects, episodes of intermittent but definitely present central fixation were detectable, similar to those in Figure 4. A simple FFT is likely to treat them as borderline cases, or entirely miss them, depending on the discrimination thresholds used. In one subject with stable central fixation, the frequency component f 2 = k 2 *f s = 192 Hz was strong and persistent when the subject was fixating on the target, and therefore both the FFT and CWT revealed frequency doubling. But the vast majority of recordings were a constantly changing mixture of the two spectral components. Depending on the time interval analyzed and the duration of central fixation episodes, the FFT was able to detect many episodes, usually the longer lasting ones (longer than 150 ms) and in the absence of noise, whereas the CWT always detected central fixation, even short instants lasting less than 60 ms. Uncompensated portions of the background noise did sometimes affect the f s = f 1 = 96 Hz signal, but never the central fixation frequency f 2 = k 2 *f s = 192 Hz.

Discussion
This work is related mainly to methods aiming at detection of strabismus (crosssightedness), eye alignment and amblyopia ("lazy eye") in young test subjects, where patient cooperation is problematic and the detection of short-lasting episodes is important and even crucial in some cases. The robust application of the CWT makes it applicable to any type of optical systems acquiring and analyzing signals distinguishable in the joint time-frequency domain, especially in systems with abundant optical noise of origin different from that of the signals measured. A condition would be that the noise does not overlap in time and frequency simultaneously with the frequency and temporal location of the event. In addition to amblyopia, the technique described can be used for the detection of frequent and short-lasting losses of fixation, possibly indicative of nystagmus, attention-deficit-hyperactivity disorder (ADHD), autism, or other neuropsychologic disorders.
The CWT is not the only technique available for time-frequency expansion. The short-time Fourier transform, STFT, (also known as the windowed Fourier transform) localizes the signal in time-and frequency domain by modulating the time signal with a window function before performing the Fourier transform, to obtain the frequency content of the signal in the region of the window. As a rule, it is a compromise between time and frequency resolution; the wider the window, the higher the frequency resolution, at the cost of poorer time resolution, and vice versa [16]. Any attempt to increase the frequency resolution causes a larger window size and therefore a reduction in time resolution, and vice-versa. Also, in order to be able to analyze transients, overlapping windows need to be used, which can slow down analysis considerably, and acts like a low-pass filter in the time domain.
Another alternative, most widely used two or three decades ago, is the Wigner-Ville distribution (WVD). Its definition for time-frequency analysis is: where i ¼ ffiffiffiffiffiffi ffi À1 p is the imaginary unit, and * denotes complex conjugation [19][20][21][22]. In essence, the WVD is the Fourier transform of the input signal's autocorrelation function, i.e. the Fourier spectrum of the product between the signal and its delayed, time reversed copy, as a function of the delay. Unlike the short-time Fourier transform, the Wigner distribution function is not a linear transform. A cross term ("time beats") occurs when there is more than one component in the input signal, analogous in time to frequency beats.
In order to reduce the cross term problem, many other transforms have been proposed, the most prominent one perhaps being the Cohen's class distribution. The best known member of Cohen's class distribution function is the Choi-Williams distribution function [22]. This distribution function adopts an exponential kernel to suppress the cross-term: and the kernel function is However, the kernel gain does not decrease along the η and τ axes in the ambiguity domain, and, consequently, the kernel function of the Choi-Williams distribution function can only filter out the cross-terms resulting from components away from the η and τ axes and away from the origin.
In summary, the CWT appears to be the tool of choice when time-frequency distributions are needed in order to detect different frequency components appearing simultaneously, or in different moments in time. This is particularly true of high frequency events of short duration and low amplitude (small scales), or longer-lasting lowfrequency oscillations of higher amplitude (large scales), as is the case here.
There are some limitations of the optical and opto-mechanical hardware involved in this technology. Mechanical vibrations, optical back-reflections (reflections from reflective surfaces back to the sensors before actually the light reaches the eyes), multiple internal reflections not captured by the light traps, some birefringence caused by the beam splitters, and others, can all cause instrumental noise which translates into parasitic frequency components. Some of them may overlap with the central fixation frequency of interest and thus mask it, making a differentiation in the joint timefrequency domain difficult. Such artifacts can be avoided or moved away from the region of interest, by careful choice of the mechanical parameters (like scanning speed), optical design (i.e. spinning wave plates), or proper design of the analog electronics (i.e. filters).
Other limitations arise from the measurement principle. Because only about 1/1000 of the polarized light used for measuring is returned by the fovea, the distance between the measurement system and the eye cannot be increased much beyond 40 cm. Further, the room illumination should be dimmed, to prevent pupil constriction and reduction of the near-infrared light going through it. This not only decreases the signsl-to-noise ratio, but can cause patient discomfort, and is one more reason to aim for fast and reliable tests.