In most previous studies, the step response of oxygen uptake during exercise, shown in Fig. 1, has been described by an exponential function [2]:
$$\begin{aligned} VO_2(t)=VO_{2}^{0}+{\beta } \left[1-e^{-\frac{(t-t_d)}{T_p}}\right], \end{aligned}$$
(1)
where \(t_d\) is the time delay, \(\beta\) is the steady-state gain of the system, \(T_p\) is the time constant and \(VO_2^{0}\) is the baseline value of oxygen uptake. Applying the Laplace transform, the transfer function corresponding to Eq. (1) can be derived as follows:
$$\begin{aligned} VO_2(s)=\frac{\beta e^{-t_d s}}{{T_p} s+1}. \end{aligned}$$
(2)
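As a quick numerical illustration of Eq. (1), the step response can be evaluated directly; the sketch below uses illustrative (not measured) parameter values.

```python
import numpy as np

def vo2_step_response(t, vo2_0, beta, t_d, T_p):
    """Step response of Eq. (1): baseline VO2 plus a delayed
    first-order exponential rise (gain beta, time constant T_p)."""
    t = np.asarray(t, dtype=float)
    rise = np.where(t >= t_d, 1.0 - np.exp(-(t - t_d) / T_p), 0.0)
    return vo2_0 + beta * rise

# Illustrative (not measured) parameter values
t = np.linspace(0.0, 300.0, 301)   # seconds
y = vo2_step_response(t, vo2_0=0.5, beta=1.2, t_d=20.0, T_p=40.0)
print(y[0], y[-1])  # baseline at t = 0; close to vo2_0 + beta at t = 300
```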
However, a first-order system with time delay sometimes cannot provide the best results, owing to the pattern of the measurements (breath by breath) and individual variation in oxygen uptake. It is likely that the \(VO_2\)–speed models of some individuals should be described by higher-order dynamic systems. However, to correctly identify a high-order system, a specific input such as a pseudo-random binary sequence (PRBS) is necessary to stimulate the system sufficiently. During treadmill exercise, it is impractical for users to follow an ideal PRBS speed profile. Therefore, the \(VO_2\) uptake during treadmill exercise has mainly been treated as a first-order system and modelled by a first-order ARX model in previous work. Moreover, a first-order transfer function and a high-order transfer function that can be decomposed into several first-order transfer functions can produce quite similar responses to a step input, so it is generally difficult to identify the correct order of the transfer function describing the input–output relation. To overcome this shortcoming, nonparametric methods have been developed to provide better accuracy in this situation. In order to obtain more acceptable results, we exploit a newly developed nonparametric modelling method which makes use of a finite impulse response (FIR) to describe the system's characteristics.
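The order-ambiguity argument can be checked numerically: with suitably chosen (purely illustrative) time constants, a first-order-plus-dead-time response and a cascade of two first-order lags are nearly indistinguishable in a step test.

```python
import numpy as np

# Unit step responses: a first-order lag with dead time versus a cascade of
# two first-order lags.  Their closeness is why the model order is hard to
# identify from a step test alone.  All constants are illustrative.
t = np.linspace(0.0, 300.0, 3001)

# First-order: time constant 40.5 s, dead time 2.5 s
y1 = np.where(t >= 2.5, 1.0 - np.exp(-(t - 2.5) / 40.5), 0.0)

# Second-order cascade: time constants 38 s and 5 s, no dead time
T1, T2 = 38.0, 5.0
y2 = 1.0 - (T1 * np.exp(-t / T1) - T2 * np.exp(-t / T2)) / (T1 - T2)

print(float(np.max(np.abs(y1 - y2))))  # a few percent of the unit gain
```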
Kernel-based estimation method of finite impulse response
In this section, a new kernel-based nonparametric estimation method is exploited to model the oxygen uptake during treadmill exercise. This nonparametric estimation method does not require the order of the model to be predefined. Furthermore, it will be shown that the proposed method provides stable and smooth estimates compared with other estimation methods, cf. [15].
For the nonparametric model, we select t with sampling time T as the time index. The relationship between the running speed u(t) and the oxygen uptake y(t) can be considered as a single-input single-output (SISO) dynamic system. Therefore, the discrete-time output of this SISO system can be calculated as [17]:
$$\begin{aligned} y(t)=G_0(q)u(t)+\varepsilon (t),\quad t=1,2,3,\ldots ,M, \end{aligned}$$
(3)
where q represents the shift operator, i.e. \(qu(t)=u(t+1)\), \(\varepsilon (t)\) is Gaussian white noise and \(G_0(q)\) is expressed as:
$$\begin{aligned} G_0(q)=\sum _{k=1}^{\infty }g_k^0q^{-k}, \quad k=1,2,3,\ldots ,\infty , \end{aligned}$$
(4)
where \(g_k^0\) represents the kth coefficient of the impulse response of \(G_0(q)\). For a stable \(G_0(q)\), the impulse response decays exponentially. Therefore, an mth-order finite impulse response is normally able to describe the system as:
$$\begin{aligned} G(q,\varvec{\theta })=\sum _{k=1}^{m}g_kq^{-k},\quad \varvec{\theta }=[g_1,g_2,\ldots ,g_m]^T, \end{aligned}$$
(5)
where \(\varvec{\theta }\in \mathbb {R}^m\) is the unknown parameter vector to be identified hereafter. Hence, the model in Eq. (3) can be written as:
$$\begin{aligned} y(t)=\varvec{\varphi }^T(t)\varvec{\theta }+\varepsilon (t), \end{aligned}$$
(6)
where \(\varvec{\varphi }(t){\in \mathbb {R}^m}\) contains the input information of the system:
$$\begin{aligned} \varvec{\varphi }(t)= \big[u(t-1),u(t-2),\ldots ,u(t-m) \big]^T. \end{aligned}$$
(7)
Then, the FIR model can be expressed in matrix form as:
$$\begin{aligned} \varvec{Y}_N=\varvec{\phi }_N\varvec{\theta }+\varvec{\varepsilon }_N, \end{aligned}$$
(8)
where \(N=M-m\), with M being the number of collected data points. The ith rows of \(\varvec{Y}_N\in \mathbb {R}^N\), \(\varvec{\varepsilon }_N\in \mathbb {R}^N\) and \(\varvec{\phi }_N\in \mathbb {R}^{N\times m}\) are \(y(m+i)\), \(\varepsilon (m+i)\) and \([u(m+i-1), u(m+i-2),\ldots , u(i)]\), respectively. The straightforward cost function based on Eq. (8) can be expressed as:
$$\begin{aligned} {\text {CostFunc1:}\quad \quad \hat{\varvec{\theta }}=\arg } \min _{\varvec{\theta }\in \mathbb {R}^m}\Vert \varvec{Y}_N-\varvec{\phi }_N\varvec{\theta }\Vert ^2_2. \end{aligned}$$
(9)
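As a sketch of Eqs. (7)–(9), the regression matrix can be assembled and solved by ordinary least squares. The decaying FIR coefficients and the white-noise input below are illustrative, chosen so that plain LS works well, unlike the square-wave treadmill input discussed next.

```python
import numpy as np

def build_regression(u, y, m):
    """Form Y_N and Phi_N of Eq. (8): the row for output sample y(t)
    holds [u(t-1), ..., u(t-m)], for t = m+1, ..., M."""
    M = len(y)
    Y = y[m:]
    Phi = np.column_stack([u[m - k : M - k] for k in range(1, m + 1)])
    return Y, Phi

rng = np.random.default_rng(0)
theta_true = 0.8 ** np.arange(1, 6)            # decaying 5-tap FIR (illustrative)
u = rng.standard_normal(200)                   # informative input, for this demo only
y = np.convolve(u, np.concatenate(([0.0], theta_true)))[: len(u)]
y = y + 0.01 * rng.standard_normal(len(y))     # mild measurement noise

Y, Phi = build_regression(u, y, m=5)
theta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print(np.round(theta_hat, 2))                  # close to theta_true
```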
By minimising the cost function (9), \(\hat{\varvec{\theta }}\) can easily be obtained by least squares (LS) estimation or maximum likelihood (ML) estimation. However, this is inappropriate for modelling the oxygen uptake impulse response: the input is only a square signal, the measurements are normally extremely noisy [12], and thus \(\varvec{\phi }_N^T\varvec{\phi }_N\) is likely to be ill-conditioned. Hence, to guarantee the validity of the obtained model and avoid ill-conditioned solutions, a regularisation term in the objective function is crucial to reduce the variance of the estimated parameters [16]. The cost function then becomes:
$$\begin{aligned} {\text {CostFunc2:}\quad \quad \hat{\varvec{\theta }}=\arg } \min _{\varvec{\theta }\in \mathbb {R}^m}\Vert \varvec{Y}_N-\varvec{\phi }_N\varvec{\theta }\Vert ^2_2+\gamma \varvec{\theta }^T\varvec{W}\varvec{\theta }, \end{aligned}$$
(10)
where the first term represents the modelling error and \(\gamma\) is a positive coefficient controlling the trade-off between the error term and the regularisation term. \(\varvec{W}{\in \mathbb {R}^{m\times m}}\) is a weighting matrix, which can be used to prioritise between system parameters. For a standard regulariser \(\varvec{\theta }^T\varvec{W}\varvec{\theta }\), regularised least squares (ReLS) estimation is the usual way to obtain the solution of cost function (10). Depending on the selection of the matrix \(\varvec{W}\), this can be seen as ridge regression or weighted ridge regression [18].
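For reference, CostFunc2 has the closed-form minimiser \((\varvec{\phi }_N^T\varvec{\phi }_N+\gamma \varvec{W})^{-1}\varvec{\phi }_N^T\varvec{Y}_N\). A minimal sketch on illustrative near-collinear data shows the stabilising effect of the ridge case \(\varvec{W}=\varvec{I}\):

```python
import numpy as np

def rels(Phi, Y, gamma, W):
    """Regularised LS for CostFunc2 (Eq. 10):
    theta = (Phi^T Phi + gamma W)^{-1} Phi^T Y."""
    return np.linalg.solve(Phi.T @ Phi + gamma * W, Phi.T @ Y)

# Near-collinear columns make plain LS unstable; ridge (W = I) does not blow up.
rng = np.random.default_rng(1)
base = rng.standard_normal(50)
Phi = np.column_stack([base, base + 1e-6 * rng.standard_normal(50)])
Y = Phi @ np.array([1.0, 1.0]) + 0.01 * rng.standard_normal(50)
theta_ridge = rels(Phi, Y, gamma=1e-3, W=np.eye(2))
print(np.round(theta_ridge, 2))                # both coefficients near 1
```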
However, ReLS cannot achieve a desirable solution when the input stimulation is insignificant and the measurement noise level is high. In order to obtain a better FIR model of the oxygen uptake, we introduce a newly developed kernel method based on the work in [14, 15]. Recalling Eq. (5) and assuming the FIR function \(g\in \mathbb {R}^{m}\), the function \(g\) in the regularisation term can be projected into a reproducing kernel Hilbert space (RKHS), i.e., \(g\rightarrow g_{\mathcal {H}}\). The advantage of this transform is the penalisation of the high-frequency components of the function g [15]. Unlike ReLS, which only focuses on solving equalities with ill-conditioned matrices, the regularisation term \(g_{\mathcal {H}}\) performs better for minimising the mean square error of the finite impulse response \(g(\cdot )\) [17]. If we use the kernel matrix \(\varvec{P}\) based on \(g_{\mathcal {H}}\) to replace \(\varvec{W}\) in Eq. (10), the kernel-based estimation method can be expressed as:
$$\begin{aligned} {\text {CostFunc3:}\quad \quad \hat{\varvec{\theta }}=\arg } \min _{\varvec{\theta }\in \mathbb {R}^m}\Vert \varvec{Y}_N-\varvec{\phi }_N\varvec{\theta }\Vert ^2_2+\gamma \varvec{\theta }^T\varvec{P}^{-1}\varvec{\theta }, \end{aligned}$$
(11)
where \(\varvec{P}{\in \mathbb {R}^{m\times m}}\) represents the kernel matrix, whose details are given in the next section. With the prior information in the kernel matrix \(\varvec{P}^{-1}\), the estimate \(\hat{\varvec{\theta }}\) from Eq. (11) provides better and smoother results compared with ReLS or ridge regression [15]. Furthermore, unlike support vector regression (SVR) [19], this nonparametric method does not project the inputs and system parameters of the error term into a higher-dimensional space; system parameters estimated by SVR are normally hard to recover from the projected space. Moreover, based on previous research [12], the \(VO_2\) uptake model tends to have a relatively large time constant. Hence, we expect the last several parameters of the estimated FIR to approach zero. Therefore, we add an extra \(\mathcal {L}_1\) regularisation term to sparsify the identified transfer function, by which the overall cost function can be expressed as:
$$\begin{aligned} {\text {CostFunc4:}\quad \quad \hat{\varvec{\theta }}=\arg } \min _{\varvec{\theta }\in \mathbb {R}^m}\Vert \varvec{Y}_N-\varvec{\phi }_N\varvec{\theta }\Vert ^2_2+\gamma \varvec{\theta }^T\varvec{P}^{-1}\varvec{\theta }+\alpha \Vert \varvec{\theta }\Vert _1, \end{aligned}$$
(12)
where \(\alpha\) is a positive coefficient controlling the trade-off between the \(\mathcal {L}_1\) regulariser, the kernel regulariser \(\gamma \varvec{\theta }^T\varvec{P}^{-1}\varvec{\theta }\) and the error term.
The above equation can be considered a special case of the elastic net [20], in which the \(\mathcal {L}_2\)-norm regularisation is weighted by the kernel matrix \(\varvec{P}^{-1}\). Let us rearrange Eq. (12) and define an augmented regressor matrix \(\varvec{\phi }^*_N\in \mathbb {R}^{(N+m)\times m}\) as:
$$\begin{aligned} \varvec{\phi }^*_N=\frac{1}{\sqrt{1+\gamma }}\begin{bmatrix} \varvec{\phi }_N \\ \sqrt{\gamma }\varvec{R}\end{bmatrix}, \end{aligned}$$
(13)
where \(\varvec{R}\in \mathbb {R}^{m\times m}\) is the upper triangular matrix from the Cholesky factorisation of the kernel matrix \(\varvec{P}^{-1}\), i.e. \(\varvec{R}^T\varvec{R}=\varvec{P}^{-1}\) (\(\varvec{P}\) is symmetric positive definite).
Similarly, \(\varvec{Y}^*_N\in \mathbb {R}^{N+m}\) can be defined as:
$$\begin{aligned} \varvec{Y}^*_N=\begin{bmatrix} \varvec{Y}_N\\ \varvec{0}\end{bmatrix}. \end{aligned}$$
(14)
Then, the cost function (12) can be rewritten as:
$$\begin{aligned} {\text {CostFunc5:}\quad \quad \hat{\varvec{\theta }}^*=\arg } \min _{\varvec{\theta }^*\in \mathbb {R}^m}\Vert \varvec{Y}_N^*-\varvec{\phi }_N^*\varvec{\theta }^*\Vert ^2_2+\frac{\alpha }{\sqrt{1+\gamma }}\Vert \varvec{\theta }^*\Vert _1, \end{aligned}$$
(15)
where \(\varvec{\theta }^*{\in \mathbb {R}^m}\) is defined as:
$$\begin{aligned} \varvec{\theta }^*=\sqrt{1+\gamma }\varvec{\theta }. \end{aligned}$$
(16)
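The rearrangement in Eqs. (13)–(16) can be verified numerically: with an upper-triangular \(\varvec{R}\) satisfying \(\varvec{R}^T\varvec{R}=\varvec{P}^{-1}\), the quadratic part of CostFunc5 reproduces the first two terms of CostFunc4 exactly. A sketch with illustrative data:

```python
import numpy as np

def augment(Phi, Y, P, gamma):
    """Build Phi*_N (Eq. 13) and Y*_N (Eq. 14).  R is upper triangular
    with R^T R = P^{-1}, obtained from a Cholesky factorisation."""
    m = Phi.shape[1]
    L = np.linalg.cholesky(np.linalg.inv(P))   # lower triangular, L L^T = P^{-1}
    R = L.T                                    # upper triangular, R^T R = P^{-1}
    Phi_star = np.vstack([Phi, np.sqrt(gamma) * R]) / np.sqrt(1.0 + gamma)
    Y_star = np.concatenate([Y, np.zeros(m)])
    return Phi_star, Y_star

# Check that the quadratic terms of CostFunc4 are reproduced
rng = np.random.default_rng(3)
m, N, gamma = 6, 20, 0.3
Phi = rng.standard_normal((N, m))
Y = rng.standard_normal(N)
P = np.diag(0.8 ** np.arange(1, m + 1))        # illustrative DI-type kernel
theta = rng.standard_normal(m)

Phi_star, Y_star = augment(Phi, Y, P, gamma)
theta_star = np.sqrt(1.0 + gamma) * theta      # Eq. (16)
lhs = np.sum((Y_star - Phi_star @ theta_star) ** 2)
rhs = np.sum((Y - Phi @ theta) ** 2) + gamma * theta @ np.linalg.inv(P) @ theta
print(np.isclose(lhs, rhs))                    # True
```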
Due to the limitations of the input signal, the matrix \(\varvec{\phi }^T_N\varvec{\phi }_N\) is not orthogonal. Therefore, the closed-form solution of LASSO [21] cannot be applied directly to the optimisation problem of CostFunc5. Here, an interior-point method [22] is adopted to handle the \(\mathcal {L}_1\)-norm regularisation and obtain \(\hat{\varvec{\theta }}^*\). Finally, \(\hat{\varvec{\theta }}\) can be recovered from \(\hat{\varvec{\theta }}^*\) according to Eq. (16) as:
$$\begin{aligned} \hat{\varvec{\theta }}=\frac{1}{\sqrt{1+\gamma }}\hat{\varvec{\theta }}^*. \end{aligned}$$
(17)
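The paper adopts an interior-point solver [22]; as a simple, self-contained stand-in, the \(\mathcal {L}_1\) problem of CostFunc5 can also be solved by proximal gradient descent (ISTA), after which Eq. (17) rescales the solution. The data and the value of \(\gamma\) below are illustrative.

```python
import numpy as np

def lasso_ista(A, b, alpha, iters=5000):
    """min ||b - A x||_2^2 + alpha ||x||_1 via proximal gradient (ISTA);
    a simple stand-in for an interior-point solver."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1/L, L = gradient Lipschitz const
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * 2.0 * A.T @ (A @ x - b)       # gradient step on the LS term
        x = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft threshold
    return x

# Sparse recovery demo on illustrative data, then the Eq. (17) rescaling
rng = np.random.default_rng(4)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -0.5, 0.25]
b = A @ x_true + 0.01 * rng.standard_normal(40)
theta_star_hat = lasso_ista(A, b, alpha=0.5)
theta_hat = theta_star_hat / np.sqrt(1.0 + 0.3)  # Eq. (17), illustrative gamma = 0.3
print(np.round(theta_star_hat, 2))               # trailing entries shrunk towards 0
```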
Kernel selection
Several kernels have been applied or developed for the proposed nonparametric kernel estimation method, such as the polynomial kernel, the radial basis function (RBF) kernel, the stable spline (SS) kernel, the diagonal/correlated (DC) kernel and the diagonal (DI) kernel. Due to the use of Cholesky decomposition in the proposed nonparametric modelling method, the kernel matrix must be symmetric positive definite. As a result, the SS, DI and DC kernels, which were developed in [14, 17], were selected. In addition, as the impulse response of a stable process decays exponentially at a certain rate, these three kernels, which belong to the class of amplitude modulated locally stationary (AMLS) kernels, can often achieve the desired results when identifying an FIR model. They are defined as follows:

DI kernel:
$$\begin{aligned} P(i,j)=\left\{ \begin{aligned}&c\lambda ^{i},\quad i=j\\&0,\quad \quad i\ne j \end{aligned}\right. , \end{aligned}$$
(18)
where \(c>0\), \(1 {>} \lambda {>} 0\).

SS kernel:
$$\begin{aligned} P(i,j)=\left\{ \begin{aligned}&c\frac{\lambda ^{2i}}{2}\left( \lambda ^{j}-\frac{\lambda ^{i}}{3}\right) ,\quad i\ge j\\&c\frac{\lambda ^{2j}}{2}\left( \lambda ^{i}-\frac{\lambda ^{j}}{3}\right) ,\quad j\ge i \end{aligned}\right. , \end{aligned}$$
(19)
where \(c>0\), \(1>\lambda >0\).

DC kernel:
$$\begin{aligned} P(i,j)=c\rho ^{|i-j|}\lambda ^{\frac{i+j}{2}}, \end{aligned}$$
(20)
where \(c> 0\), \(1>\lambda > 0\), \(|\rho |\le 1\) and \(\rho \ne 0\).
More details about the kernels above can be found in [23].
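The three kernels can be assembled directly from Eqs. (18)–(20). The sketch below builds each matrix with illustrative hyperparameters and confirms the symmetric positive definiteness required by the Cholesky step (`np.linalg.cholesky` raises an error otherwise).

```python
import numpy as np

def di_kernel(m, c, lam):
    """DI kernel, Eq. (18): diagonal with entries c * lam^i."""
    return np.diag(c * lam ** np.arange(1, m + 1))

def ss_kernel(m, c, lam):
    """SS (stable spline) kernel, Eq. (19)."""
    i = np.arange(1, m + 1)
    I, J = np.meshgrid(i, i, indexing="ij")
    mx, mn = np.maximum(I, J), np.minimum(I, J)
    return c * lam ** (2 * mx) / 2 * (lam ** mn - lam ** mx / 3)

def dc_kernel(m, c, lam, rho):
    """DC kernel, Eq. (20): c * rho^|i-j| * lam^((i+j)/2)."""
    i = np.arange(1, m + 1)
    I, J = np.meshgrid(i, i, indexing="ij")
    return c * rho ** np.abs(I - J) * lam ** ((I + J) / 2.0)

# Illustrative hyperparameters; all three matrices must be symmetric
# positive definite for the Cholesky step of the method.
for P in (di_kernel(10, 1.0, 0.9),
          ss_kernel(10, 1.0, 0.9),
          dc_kernel(10, 1.0, 0.9, 0.8)):
    np.linalg.cholesky(P)      # succeeds: P is positive definite
print("all kernels positive definite")
```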