### Experimental arrangement

Behavioral FDLs were obtained first for all subjects. Then, FFRs to two-tone stimuli were recorded for tone pairs with a frequency difference equal to: 1) the FDL (100% FDL condition); 2) 75% of the FDL (75% FDL condition); and 3) 50% of the FDL (50% FDL condition). Some subjects with relatively large FDLs (whose 50% FDL was similar to the 75% FDL of other subjects) were selected to participate in the 50% FDL condition. The 50% FDLs of these subjects were paired with the 75% FDLs of others. Control FFRs were recorded with the earphone blocked and the subject’s ear occluded. The order of the FFR recordings was randomized. Each test or condition lasted around half an hour. For each subject, the tests were run on separate days within a two-week period.

### Subjects

FDLs were obtained from fifteen college students (3 males, 12 females; ages 19–25). Their FFRs were then recorded in the 100% and 75% FDL conditions. Six of the subjects were additionally tested in the 50% FDL condition. All subjects were native speakers of Mandarin Chinese and had normal hearing sensitivity (better than 15 dB HL for octave frequencies from 250 to 8000 Hz). Participants reported no history of neurological or psychiatric illnesses, and no music instruction. All subjects were paid for their time and gave informed consent in compliance with a protocol approved by the institutional review board at Tsinghua University.

### Psychoacoustic experiment

#### Stimuli

Stimuli were tone pairs. One tone was always set at 140 Hz (the reference frequency or F_ref) and the other (the comparison frequency or F_comp) at a higher frequency. The duration of both tones was 250 ms, including 10-ms rise/fall times shaped with a Blackman window. An insert earphone delivered the stimuli to the right ear. The overall level of each tone including onset and offset ramps was 83 dB SPL. Calibration was performed with a Brüel & Kjær type 3160-A-042 sound analyzer and a type 4157 2-cc coupler.

#### Behavioral procedure

Frequency discrimination was tested using an adaptive three-interval forced-choice procedure with a two-down, one-up rule [22], programmed in MATLAB. On each trial, subjects heard three sequential intervals: two identical intervals containing the reference frequency F_ref, and one containing the comparison frequency F_comp. F_comp was always higher than 140 Hz, and its initial value was 170 Hz. The order of the three tones was randomized on each trial. Subjects were instructed to identify the interval perceived as having the higher pitch by clicking the corresponding button on a computer monitor. The inter-sound interval was 0.8 s, and a 3-s pause separated the subject’s response from the beginning of the next trial. After two consecutive correct responses, F_comp was decreased by one step for the subsequent trial; conversely, F_comp was increased by one step following a single incorrect response. The step size was changed adaptively: starting from 6 Hz, it was reduced to 1/√2 of its previous value after each reversal. Once the step size reached a pre-determined minimum (0.1 Hz), it remained fixed. Each test ended after 14 reversals, and the geometric mean of the frequency differences (F_comp-F_ref) across the last 8 reversals was taken as the FDL (in Hz). Every subject took at least one practice test and two formal tests. The mean of the formal tests was taken as the individual’s behavioral FDL. All tests were performed in an acoustically and electrically shielded booth.
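The adaptive track described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB program used in the study: the simulated listener (a logistic psychometric function with a 1/3 chance level), its slope, the clamp that keeps F_comp above F_ref, and all function and parameter names are assumptions introduced here.

```python
import math
import random


def run_staircase(true_fdl_hz, f_ref=140.0, f_comp_start=170.0,
                  step_start=6.0, step_floor=0.1,
                  n_reversals=14, n_averaged=8, seed=0):
    """Simulate one two-down, one-up track; return the FDL estimate in Hz."""
    rng = random.Random(seed)
    f_comp = f_comp_start
    step = step_start
    correct_in_a_row = 0
    last_direction = 0       # -1: last change was down, +1: up, 0: none yet
    reversal_diffs = []      # F_comp - F_ref at each reversal

    while len(reversal_diffs) < n_reversals:
        diff = f_comp - f_ref
        # Hypothetical listener (assumption): logistic psychometric function
        # rising from chance (1/3 in a three-interval task) to 1.
        slope = 0.25 * true_fdl_hz
        p_correct = 1 / 3 + (2 / 3) / (1 + math.exp(-(diff - true_fdl_hz) / slope))
        correct = rng.random() < p_correct

        direction = 0
        if correct:
            correct_in_a_row += 1
            if correct_in_a_row == 2:      # two consecutive correct: go down
                direction = -1
                correct_in_a_row = 0
        else:                              # one incorrect: go up
            correct_in_a_row = 0
            direction = +1

        if direction != 0:
            if last_direction != 0 and direction != last_direction:
                reversal_diffs.append(diff)
                # Shrink the step by 1/sqrt(2) after each reversal, down to the floor.
                step = max(step_floor, step / math.sqrt(2))
            last_direction = direction
            # Assumption: clamp so that F_comp always stays above F_ref.
            f_comp = max(f_ref + step_floor, f_comp + direction * step)

    # Geometric mean of the frequency differences at the last reversals.
    last = reversal_diffs[-n_averaged:]
    return math.exp(sum(math.log(d) for d in last) / len(last))
```

Calling `run_staircase(2.0)` simulates a listener whose assumed threshold is 2 Hz. Because a two-down, one-up rule converges on the 70.7%-correct point of the psychometric function, the returned estimate depends on the assumed slope as well as on the threshold.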

### FFR experiment

#### Stimuli

Two tones with frequencies F_ref and F_comp were presented alternately. Each tone lasted 144 ms, including 7-ms rise/fall times shaped with a Blackman window, and started at 0° phase. This duration, shorter than that used in the psychoacoustic experiment, was of little consequence, since frequency discrimination improves minimally with increasing duration beyond 100 ms [23, 24]. Stimuli were delivered to the right ear at 83 dB SPL through an insert earphone (Etymotic, ER-2) at a rate of 2.4 per second. The earphone was shielded in a Faraday cage, with the transducers and electric wires wrapped in aluminum foil connected to the common ground [25, 26].

F_ref was always 140 Hz but F_comp varied across the three FFR conditions for each subject. 1) In the 100% FDL condition, F_comp equaled 140 Hz plus FDL. 2) In the 75% FDL condition, F_comp equaled 140 Hz plus 75% of FDL. 3) In the 50% FDL condition, F_comp equaled 140 Hz plus 50% of FDL.
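As a small worked example of the three conditions, the sketch below computes F_comp for a hypothetical subject. The 2-Hz FDL and the function name are illustrative assumptions, not values from the study.

```python
F_REF_HZ = 140.0  # reference frequency used in all conditions


def comparison_frequency(fdl_hz, fraction):
    """F_comp = F_ref + fraction * FDL, with fraction = 1.0, 0.75 or 0.5."""
    return F_REF_HZ + fraction * fdl_hz


# Hypothetical subject with a behavioral FDL of 2 Hz (illustrative value only).
for label, fraction in (("100% FDL", 1.0), ("75% FDL", 0.75), ("50% FDL", 0.5)):
    print(label, comparison_frequency(2.0, fraction), "Hz")
```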

#### FFR recording

Subjects were seated comfortably in an acoustically and electrically shielded booth. They were instructed to relax and to refrain from moving during data recording to minimize myogenic artifacts. FFRs were recorded with a Grass LP511 AC amplifier and digitized by a National Instruments data acquisition card, as described in our previous work [27]. A vertical electrode montage [3, 4] was used, with the non-inverting electrode placed on the midline of the forehead at the hairline (+, Fz), the reference electrode placed on the ipsilateral mastoid (-, M2), and the common ground electrode placed on the mid-forehead (Fpz). FFRs were recorded differentially from Fz to the ipsilateral mastoid, amplified by a factor of 50,000, and band-pass filtered (1–3000 Hz) online. Recording started 77.7 ms before stimulus onset and ended 61.6 ms after stimulus offset. Two thousand sweeps were recorded for each tone at a sampling rate of 10,000 Hz. The ear probe included an ER-2 earphone and a microphone (ER-10B+, Etymotic Research), through which the ear-canal sound pressure was recorded concurrently with the FFR. The sound pressure signal was used to verify appropriate placement of the earphone and to measure the system delay, including the travel time in the 30-cm rubber tubes. In addition, the sound pressure signal was cross-correlated with the FFR recording, and the time lag of the highest peak of the cross-correlation function was taken as the FFR latency. All FFR latencies were in the range of 5–10 ms, much longer than the latency of the cochlear microphonic (CM), which occurs within 1 ms of stimulus onset [28]. Background EEG noise was also recorded with the stimuli rendered inaudible by blocking the earphone. As shown by the dotted line in Figure 1, the background noise contained no periodic components. Thus, our FFR recordings reflected neural activity rather than CM or stimulus artifact [27].
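The latency estimate from cross-correlation can be illustrated with a minimal sketch. It assumes both signals are sampled at 10 kHz and searches only non-negative lags up to a hypothetical 15-ms limit. The function name and the search limit are assumptions made here, and correction for the system delay is not modeled.

```python
import math

FS = 10_000  # sampling rate in Hz, as in the recordings


def xcorr_latency_ms(stimulus, response, max_lag_ms=15.0, fs=FS):
    """Lag (ms) at which the cross-correlation of stimulus and response peaks."""
    max_lag = int(round(max_lag_ms * fs / 1000))
    best_lag, best_val = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Number of stimulus samples that still overlap the response at this lag.
        n = min(len(stimulus), len(response) - lag)
        val = sum(stimulus[i] * response[i + lag] for i in range(n))
        if val > best_val:
            best_lag, best_val = lag, val
    return 1000.0 * best_lag / fs
```

For a response that is simply a delayed copy of a 144-ms, 140-Hz tone burst, the function returns the imposed delay. With real recordings the peak is broader, and because the response is periodic, neighboring peaks one stimulus period away can be nearly as high.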

### FFR data processing and analysis

FFR processing and analysis were done offline, using software programmed in MATLAB, after isolating the 2000-sweep raw data for each of the two tones. Pre-processing was done first, and data were then selected according to the calculated SNR. FFRs were analyzed in both the frequency and time domains (using spectra and autocorrelations, respectively) to extract the frequency encoded in the FFR. To represent the stimulus frequency difference, FFRs to paired tones were discriminated by comparing their spectra and autocorrelations.

#### Pre-processing

Details of the pre-processing are described in our previous work [27]. Monitoring of sound in the ear canal permitted the identification and exclusion of trials that were contaminated by subject motion or slippage of the earphone. Trials in which the FFR signal excursions exceeded 95% of the measuring range of the recording equipment were also excluded from further analysis. The remaining trials were averaged together. Then, a posteriori Wiener filtering and band-pass (70–210 Hz) filtering were used to reduce noise. The analysis time window was 12–145.6 ms, corresponding to the steady-state portion of the stimulus tones.

#### Signal-to-noise ratio (SNR)

After pre-processing, SNRs were calculated taking the interval from 12 to 145.6 ms as the signal and the interval from -77.7 to 0 ms as the noise. The SNR is the ratio of the root mean square (RMS) amplitude of the FFR signal to the RMS amplitude of the noise, expressed in decibels. In cases where the FFR was smaller than the noise (i.e., SNR < 0 dB), the subject’s FFR data were excluded from further statistical analysis. This happened for one subject in the 100% FDL condition and another in the 75% FDL condition. After these exclusions, the SNR values were 8.66 ± 4.1 dB (mean ± standard deviation) for the 100% FDL condition and 8.31 ± 2.78 dB for the 75% FDL condition.
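A minimal sketch of this SNR computation follows. It assumes the averaged sweep is a plain list of samples at 10 kHz whose first sample lies 77.7 ms before stimulus onset, and it uses 20·log10 because the ratio is one of amplitudes. The helper names are hypothetical.

```python
import math

FS = 10_000          # sampling rate in Hz
PRE_STIM_MS = 77.7   # recording starts this long before stimulus onset


def epoch_slice(t0_ms, t1_ms, fs=FS):
    """Sample indices [start, stop) for a time window given relative to onset."""
    start = int(round((t0_ms + PRE_STIM_MS) * fs / 1000))
    stop = int(round((t1_ms + PRE_STIM_MS) * fs / 1000))
    return start, stop


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_db(sweep):
    """SNR in dB: RMS of the 12 to 145.6 ms window over RMS of the baseline."""
    s0, s1 = epoch_slice(12.0, 145.6)
    n0, n1 = epoch_slice(-PRE_STIM_MS, 0.0)
    return 20.0 * math.log10(rms(sweep[s0:s1]) / rms(sweep[n0:n1]))
```

A sweep whose signal window has twice the RMS amplitude of its baseline gives about 6 dB, and any sweep with a negative result would fall under the exclusion rule above.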

#### Spectrum analysis

Data in the interval 12–145.6 ms were subjected to spectrum analysis. Classical spectrum analysis (e.g., the periodogram) was not used because of its poor frequency resolution: in our case, the resolution is about 1/(145.6 ms − 12 ms) = 1/(133.6 ms), or ~7.48 Hz, which is larger than the behavioral FDLs. As an alternative, we chose the autoregressive (AR) spectral estimator, which has high resolution. The approximate resolution of the AR spectral estimator is given in equation (1),

$$\delta f_{\text{AR}} = \frac{1.03}{p\,\left[\eta\,(p+1)\right]^{0.31}} \tag{1}$$

where *η* is the SNR of one sinusoid, *p* is the order of the AR model [29] and *pη* > 10. The AR model parameters were estimated using the modified covariance method and *p* was set to 32. Since the minimum SNR was 4 dB, *η* ≥ 4 and the frequency resolution of the AR spectrum *δf*_AR < 0.007 Hz.

One example of AR-spectrum estimation is shown in Figure 1. The highest spectral peak indicates the frequency at which the FFR has most of its energy, i.e., its “characteristic frequency”, corresponding to the stimulus frequency. The -10 dB frequency points, at which the spectrum amplitude is 10 dB lower than the peak amplitude, were extracted. To separate FFRs to tones with different frequencies, the spectral peaks should be sufficiently narrow to prevent overlap at the -10 dB frequencies. The “-10 dB frequency gap” was then calculated by subtracting the higher -10 dB frequency of F_ref from the lower -10 dB frequency of F_comp, to estimate the separation between spectral peaks [see Figure 2(A)].
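The two quantities in this subsection can be illustrated with a short sketch. It evaluates the resolution formula of equation (1) and measures the -10 dB frequency gap on two spectra that are already available as sampled dB curves. The AR estimation itself (modified covariance method) is not reproduced here. The function names and the linear interpolation between spectral samples are assumptions.

```python
def ar_resolution(p, eta):
    """Approximate AR resolution from equation (1); requires p * eta > 10."""
    if p * eta <= 10:
        raise ValueError("equation (1) assumes p * eta > 10")
    return 1.03 / (p * (eta * (p + 1)) ** 0.31)


def minus_10db_points(freqs, spec_db, drop_db=10.0):
    """Frequencies on either side of the highest peak where the spectrum is drop_db lower."""
    k = max(range(len(spec_db)), key=spec_db.__getitem__)
    level = spec_db[k] - drop_db

    def crossing(i, j):
        # Linear interpolation between sample i (at or above level) and j (below).
        frac = (spec_db[i] - level) / (spec_db[i] - spec_db[j])
        return freqs[i] + frac * (freqs[j] - freqs[i])

    lo = k
    while lo > 0 and spec_db[lo - 1] >= level:
        lo -= 1
    hi = k
    while hi < len(freqs) - 1 and spec_db[hi + 1] >= level:
        hi += 1
    if lo == 0 or hi == len(freqs) - 1:
        raise ValueError("spectrum does not fall far enough below the peak")
    return crossing(lo, lo - 1), crossing(hi, hi + 1)


def minus_10db_gap(freqs, spec_ref_db, spec_comp_db):
    """Lower -10 dB point of the F_comp peak minus upper -10 dB point of the F_ref peak."""
    _, ref_high = minus_10db_points(freqs, spec_ref_db)
    comp_low, _ = minus_10db_points(freqs, spec_comp_db)
    return comp_low - ref_high
```

With *p* = 32 and *η* = 4, `ar_resolution` evaluates to roughly 0.007. A positive gap means the two peaks do not overlap at their -10 dB points, and a negative gap means they do.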

#### Autocorrelation

The autocorrelation function (ACF) can be used to detect periodicity within a signal. After pre-processing, autocorrelation was performed by making a copy of the FFR signal and shifting it forward in time. For the discrete signal *x*(*n*), *n* = 1, …, *N*, where *N* is the total number of sampling points, the autocorrelation function is computed as in equation (2).

$$r(m) = \frac{\sum_{n=1}^{N} x(n)\,x(n-m)}{\sum_{n=1}^{N} x^{2}(n)}, \quad m = 0, 1, \ldots, N-1 \tag{2}$$

The first peak in the normalized autocorrelation function at a non-zero lag reflects the dominant periodicity. Autocorrelation can also be used to estimate the characteristic frequency of the FFR, calculated as 1/*d*, where *d* is the time shift that yields a local maximum, representing the period of the FFR. *d* was estimated as the mean of the first ten inter-peak intervals, thus improving the accuracy of calculation.
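The period estimate from the autocorrelation function can be sketched as follows. The sketch uses 0-based indexing and sums only over samples where the shifted copy overlaps the signal. It also interprets "the first ten inter-peak intervals" as the intervals between successive ACF peaks starting from the zero-lag peak. That interpretation, and the function names, are assumptions.

```python
import math


def normalized_acf(x, max_lag):
    """r(m) of equation (2) for m = 0..max_lag, summing only over overlapping samples."""
    energy = sum(v * v for v in x)
    return [sum(x[n] * x[n - m] for n in range(m, len(x))) / energy
            for m in range(max_lag + 1)]


def acf_frequency(x, fs, n_intervals=10, max_lag=None):
    """Characteristic frequency 1/d, with d the mean of the first inter-peak intervals."""
    if max_lag is None:
        max_lag = len(x) - 1
    r = normalized_acf(x, max_lag)
    # Positive local maxima at non-zero lags.
    peaks = [m for m in range(1, max_lag)
             if r[m] > 0 and r[m] > r[m - 1] and r[m] >= r[m + 1]]
    if len(peaks) < n_intervals:
        raise ValueError("not enough ACF peaks in the searched lag range")
    # Counting from the zero-lag peak, the mean of the first n intervals is
    # the lag of the n-th peak divided by n.
    d_samples = peaks[n_intervals - 1] / n_intervals
    return fs / d_samples
```

Averaging over ten intervals matters because a single peak can only be located to the nearest sample (0.1 ms at 10 kHz), which near 140 Hz corresponds to about 2 Hz. Dividing the lag of the tenth peak by ten reduces that quantization tenfold.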

Only FFRs evoked by F_ref were used to calculate FFR frequency tracking accuracy, since F_ref was the same (140 Hz) across all subjects and all experimental conditions. The FFR signal was windowed into 30-ms bins with a 1-ms step shift of the window. The time lag of the maximum ACF peak in the *i*th bin was *τ*_i. The RMS of the differences between the time lags and the stimulus period across all time bins serves as a measure of FFR frequency tracking accuracy, calculated as:

$$\Delta a = \left[\frac{\sum_{i=1}^{Q}\left(\tau_{i} - \frac{1}{f}\right)^{2}}{Q}\right]^{\frac{1}{2}} \tag{3}$$

where *Q* is the total number of time bins and *f* is the stimulus frequency. Higher Δa values indicate poorer FFR frequency tracking accuracy [27].
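Equation (3) can be sketched as a sliding-window computation. The text does not state which lags were searched for the maximum ACF peak, so this sketch assumes a search range of 0.5 to 1.5 stimulus periods, which excludes the zero-lag peak and later multiples of the period. That range and the function name are assumptions.

```python
import math


def tracking_accuracy(x, fs, f_stim, bin_ms=30.0, step_ms=1.0):
    """Delta-a of equation (3), in seconds: RMS of (tau_i - 1/f) over sliding bins."""
    period = 1.0 / f_stim
    bin_len = int(round(bin_ms * fs / 1000))
    step = int(round(step_ms * fs / 1000))
    # Assumed search range for the maximum ACF peak: 0.5 to 1.5 stimulus periods.
    lag_min = int(round(0.5 * period * fs))
    lag_max = int(round(1.5 * period * fs))

    squared_errors = []
    for start in range(0, len(x) - bin_len + 1, step):
        seg = x[start:start + bin_len]

        def acf_at(m):
            # Unnormalized ACF; normalization does not change the peak position.
            return sum(seg[n] * seg[n - m] for n in range(m, bin_len))

        tau = max(range(lag_min, lag_max + 1), key=acf_at) / fs
        squared_errors.append((tau - period) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

For a clean 140-Hz sinusoid the result is limited mainly by the 0.1-ms sampling interval. A response whose periodicity drifts away from the stimulus period gives a larger value.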

### Statistical analysis

Statistical analysis was conducted with IBM SPSS Statistics 20.0 software. Levene’s test for equality of variances and the Shapiro-Wilk test for normality were applied for each statistical analysis except the non-parametric tests.

Linear regression was used to assess the relationship between FFR measures and the stimulus frequency difference. The characteristic frequencies (FFR_F_ref and FFR_F_comp) of responses evoked respectively by the reference frequency, F_ref, and the comparison frequency, F_comp, were calculated in the frequency domain by AR spectrum estimation. The linear regression analyses were performed between the stimulus frequency difference (FD = F_comp-F_ref), the independent variable, and the FFR frequency difference (AR_FD = FFR_F_comp-FFR_F_ref), the dependent variable. For the autocorrelation method, the FFR period difference (ACF_PD = FFR_d_comp-FFR_d_ref) was taken as the dependent variable. Comparing the regression results of the 100% FDL and 75% FDL conditions made it possible to ascertain the correspondence between changes in the FFR measures and changes in the frequency difference between the tone stimuli.
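The regression of AR_FD on FD amounts to an ordinary least-squares fit with one predictor. The analyses in the study were run in SPSS, so the sketch below is only an illustration of the computation. The function name and the numbers in the example are made up, not data from the study.

```python
import math


def linear_regression(x, y):
    """Ordinary least squares y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical values for illustration only (not data from the study):
fd = [1.0, 1.5, 2.0, 2.5, 3.0]        # stimulus frequency difference, Hz
ar_fd = [0.9, 1.6, 1.9, 2.6, 3.1]     # FFR frequency difference, Hz
slope, intercept, r = linear_regression(fd, ar_fd)
```

A slope near 1 with an intercept near 0 would mean that the FFR frequency difference follows the stimulus frequency difference one-to-one.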

Paired-sample t-tests were carried out to compare FFR frequency tracking accuracy between the 100% and 75% FDL conditions. For the comparison between the 50% FDL and 75% FDL conditions, non-parametric Wilcoxon signed-rank tests were used.