Interpretable classification for multivariate gait analysis of cerebral palsy

Background: The Gross Motor Function Classification System (GMFCS) is a widely used tool for assessing the mobility of people with cerebral palsy (CP). It classifies patients into levels based on their gross motor function, and the level is typically determined through visual evaluation by a trained expert. Although gait analysis is commonly used in CP research, the functional aspects of gait patterns have yet to be fully exploited. By using gait patterns to predict GMFCS levels, we can gain a more comprehensive understanding of how CP affects mobility and develop more effective interventions for CP patients.

Result: In this study, we propose a multivariate functional classification method to examine the relationship between kinematic gait measures and GMFCS levels in both normal individuals and CP patients with varying GMFCS levels. A sparse linear functional discrimination framework is used to obtain an interpretable prediction model. The method is generalized to handle multivariate functional data and multi-class classification. Our method offers competitive or improved prediction accuracy compared to state-of-the-art functional classification approaches and provides interpretable discriminant functions that characterize the kinesiological progression of gait corresponding to higher GMFCS levels.

Conclusion: We generalize the sparse functional linear discrimination framework to achieve interpretable classification of GMFCS levels using kinematic gait measures. The findings of this research will aid clinicians in diagnosing CP and assigning appropriate GMFCS levels in a more consistent, systematic, and scientifically supported manner.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12938-023-01168-x.


Classification results with kinematic variables from the left side
In this section, we present classification results with kinematic variables from the left-side measurements. Classification accuracy, false negative rate, and false omission rate are shown in Tables S1, S2, and S3, respectively. We also present the discriminant functions estimated by SFLDA in Figures S1 and S2. They generally show similar results to the analysis using the right-side measurements presented in the paper, owing to the strong correlations between left- and right-side measurements.

2 Details of synthetic data generation
In this section, we describe the details of synthetic data generation for the simulation studies. For simplicity, consider the concatenated version of $\beta_j$, denoted as $\beta(t)$, $t \in \bigcup_j I_j$, which is created by simply linking $\beta_1, \ldots, \beta_p$ without connecting them at the knots, thus allowing discontinuities at the $p - 1$ knots. Define $X(t)$ and $\delta(t)$ in a similar way, and let $\Gamma$ be the covariance operator, where $\gamma(s, t) = \mathrm{Cov}(X(s), X(t))$.

Construction of population covariance structures
In the simulation study, three different trivariate covariance structures (A, B, and C) were used. Let $\gamma_1$, $\gamma_2$, and $\gamma_3$ denote the covariance functions for A, B, and C, respectively. We constructed $\gamma_1$ and $\gamma_2$ using Matérn covariance functions, while the construction of $\gamma_3$ was inspired by Chiou et al. (2014) and uses Gaussian and rational quadratic covariance functions along with the Matérn covariance function.
The Matérn covariance function is defined in terms of the gamma function $G$ and the modified Bessel function of the second kind $K_\nu$, with parameters $\sigma$, $\rho$, and $\nu$ controlling the covariance structure. The Gaussian covariance function has parameters $\sigma$ and $V$. Finally, the rational quadratic covariance function has parameters $\alpha$ and $k$.
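For orientation, commonly used textbook parameterizations of these three covariance functions are sketched below in terms of the distance $d = |s - t|$. They are stated here as an assumption: the exact way $\rho$, $V$, $\alpha$, and $k$ enter (for example, whether the factor $\sqrt{2\nu}$ or a variance term $\sigma^2$ is included) may differ from the forms used in this study.

```latex
\begin{align*}
\gamma_{\text{Mat\'ern}}(s, t) &= \sigma^2 \, \frac{2^{1-\nu}}{G(\nu)}
  \left( \frac{\sqrt{2\nu}\, d}{\rho} \right)^{\nu}
  K_\nu\!\left( \frac{\sqrt{2\nu}\, d}{\rho} \right), \\
\gamma_{\text{Gaussian}}(s, t) &= \sigma^2 \exp\!\left( -\frac{d^2}{2 V^2} \right), \\
\gamma_{\text{RQ}}(s, t) &= \left( 1 + \frac{d^2}{2 \alpha k^2} \right)^{-\alpha}.
\end{align*}
```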
Since the implementation of SFLDA uses a discretization strategy, the covariance matrices $\gamma_1$, $\gamma_2$, and $\gamma_3$ were constructed by discretizing the covariance functions. Consider $T$ equispaced grid points $t_i$, $i = 1, \ldots, T$, on $[0, 1]$. With these grid points, a $T \times T$ covariance matrix $\gamma$ can be constructed whose $(i, j)$ entry equals $\gamma(t_i, t_j)$. Here, $\gamma$ can be a Matérn, Gaussian, or rational quadratic covariance function. A distinctive feature of a multivariate covariance structure is that the covariance matrix is partitioned into blocks, each block representing the covariance between a pair of covariates. We discretize each covariate of our synthetic functional data at 100 equispaced points. Hence, our final trivariate covariance matrices are $300 \times 300$ matrices with 9 blocks.
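The discretization described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes a Gaussian covariance function with hypothetical parameter values, and it builds the trivariate matrix as a Kronecker product of a hypothetical 3 × 3 cross-covariate correlation matrix with a single 100 × 100 block. That is only one simple way to obtain a valid 300 × 300 covariance matrix with 9 blocks.

```python
import numpy as np


def gaussian_cov(s, t, sigma=1.0, v=0.2):
    """Gaussian covariance function (assumed parameterization, hypothetical values)."""
    return sigma**2 * np.exp(-((s - t) ** 2) / (2.0 * v**2))


def discretize(cov_fn, n_grid=100):
    """Evaluate cov_fn on an equispaced grid of [0, 1]; entry (i, j) is cov_fn(t_i, t_j)."""
    t = np.linspace(0.0, 1.0, n_grid)
    return cov_fn(t[:, None], t[None, :])


# One 100 x 100 block obtained by discretizing the covariance function.
block = discretize(gaussian_cov, n_grid=100)

# Hypothetical correlations between the three covariates (illustrative values only).
cross = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])

# Block (j, k) of the result equals cross[j, k] * block, giving 9 blocks in total.
gamma = np.kron(cross, block)
print(gamma.shape)
```

A separable structure like this keeps the matrix positive semi-definite by construction; covariance structures that use a different covariance function in each block need a separate check of positive semi-definiteness.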

Construction of γ 3
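As noted earlier, the construction of $\gamma_3$ is mostly inspired by the simulation settings introduced in Chiou et al. (2014). Let $\gamma_{jk}$ denote the $(j, k)$th block of the matrix $\gamma_3$. That is,

$$\gamma_3 = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} \end{pmatrix},$$

where each $\gamma_{jk}$ is a $100 \times 100$ matrix. We then construct $\gamma_3$ block by block; Table S4 lists the parameters for each block.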
Finally, the samples for each class ($m = 0$, $1$, or $2$ for multi-class) were generated as

$$x_m(t) = \mu_m(t) + \int_{\bigcup_j I_j} \gamma^{1/2}(t, s)\, e_m(s)\, ds,$$

where $e_m(s)$ is random noise drawn from a Gaussian distribution with mean 0 and standard deviation $\tau$, a parameter to be set.
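On a grid, the integral in the sampling formula reduces to a matrix-vector product, so samples for each class can be drawn roughly as sketched below. The mean functions, the covariance matrix, the value of $\tau$, and the sample sizes are hypothetical placeholders, and a single covariate is used for brevity. The square root is taken as the symmetric root from an eigendecomposition, and the quadrature weight of the grid is assumed to be absorbed into the discretized covariance matrix.

```python
import numpy as np


def sqrt_psd(cov):
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def draw_samples(mean, cov, n, tau, rng):
    """Draw n discretized curves x = mean + cov^{1/2} e with e ~ N(0, tau^2 I)."""
    noise = rng.normal(0.0, tau, size=(n, mean.size))
    # The root is symmetric, so each row of noise @ root equals root @ e.
    return mean + noise @ sqrt_psd(cov)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)

# Hypothetical covariance matrix and class-wise mean functions, for illustration only.
cov = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2.0 * 0.2**2))
means = {0: np.zeros_like(t), 1: np.sin(2.0 * np.pi * t)}

samples = {m: draw_samples(mu, cov, n=50, tau=1.0, rng=rng) for m, mu in means.items()}
print(samples[0].shape)
```

Unbalanced classes, as in Setting 2, would correspond to passing a different `n` for each class.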
Table S5 and Table S6 present the parameters used for each setting. Setting 2 uses the same parameters as Setting 1 but with unbalanced numbers of observations across classes.

3 Estimated discriminant function from setting 3
Here, we present the estimated discriminant function from setting 3, where the population discriminant function is non-sparse.

Figure S1: The estimated univariate SFLDA discriminant functions β for each kinematic variable. A Hip. B Knee. C Ankle. Discriminant functions from the binary classification tasks '0 vs. 1', '1 vs. 2', and '2 vs. 3' are shown in the first row as solid green, dotted purple, and dot-dashed orange curves, respectively. The β from the task '0 vs. {1,2,3}' is shown in red in the second row.

Figure S2: The estimated multivariate discriminant function from MV SFLDA (divided into the corresponding kinematic variables). A Hip. B Knee. C Ankle. Descriptions are the same as in the previous figure.

Figure S3: Sample curves from setting 3 with three different covariance structures. A Type A. B Type B. C Type C. Red and blue thick curves are class-wise means.

Figure S4: Sample curves from setting 4 with three different covariance structures. A Type A. B Type B. C Type C. Red, blue, and green thick curves are class-wise means.

Figure S5: Sample curves from setting 5. Red and blue thick curves are class-wise means.

Figure S6: The estimated discriminant functions β from 100 repetitions for setting 3 with three different covariance structures. A Type A. B Type B. C Type C. Discriminant functions estimated by MV SFLDA are presented in the upper panels, and those from univariate FLR are presented in the lower panels. In each panel, their mean is shown as the solid black curve and the true β as the dotted black curve.

Table S2: False negative rates

Table S3: False omission rates

Table S4: Parameters for each block.

Table S5: Parameters for discriminant functions in the simulation study.

Table S6: Parameters for mean functions in the simulation study.

Figures S3-S5 show some sample functions from settings 3, 4, and 5, respectively.