A multivariate relationship between the kinematic and clinical parameters of knee osteoarthritis population

Background: Biomechanical and clinical parameters both contribute to functional evaluations of the knee joint. To better understand joint function in knee osteoarthritis (OA), this study investigates the association between a set of knee biomechanical data and a set of clinical parameters in an OA population.

Methods: The biomechanical data used here are a set of characteristics derived from 3D knee kinematic patterns: flexion/extension, abduction/adduction, and tibial internal/external rotation measurements, all determined during gait recording. The clinical parameters include the KOOS questionnaire and the patient’s demographic characteristics. Canonical correlation analysis (CCA) is used (1) to evaluate the multivariate relationship between the biomechanical data and clinical parameter sets, and (2) to cluster the most correlated parameters. Multivariate models were created within the identified clusters to determine the effect of each subset of parameters on the other. The analyses were performed on a large database containing 166 OA patients.

Results: The CCA results showed meaningful correlations that gave rise to three different clusters. Multivariate linear models were found that explain the subjective clinical parameters from the biomechanical data contained within each cluster.

Conclusion: The results showed that a multivariate analysis of the clinical symptoms and the biomechanical characteristics of knee joint function allowed a better understanding of their relationships.

Bensalma et al. BioMed Eng OnLine (2019) 18:58

[...] have investigated the relationship between 3D knee kinematic parameters and clinical data [7,8]. These studies have been limited to univariate analyses, i.e., the correlation between one kinematic parameter and one specific clinical parameter. Such an analysis is not adapted to the complexity of biomechanical data [9] and can even mask several strong relationships if the parameters are considered independently. The objective of this study is (1) to evaluate the multivariate relationship (compared to the univariate approach) between a set of biomechanical data and a set of clinical parameters of an osteoarthritis population, and (2) to cluster the most correlated parameters. The biomechanical data are a set of characteristics extracted from 3D knee kinematic patterns during gait recording: flexion/extension, abduction/adduction, and tibial internal/external rotation measurements. The clinical parameters were acquired via the Knee Osteoarthritis Outcome Score (KOOS) questionnaire. Through this questionnaire, the patient provides a valid and reliable assessment of his/her health status relative to the pathology [10]. Our hypothesis is that these subjective clinical measures may complement objective biomechanical measures for a better understanding of knee joint function.
This study uses canonical correlation analysis (CCA) to evaluate the relationship between a set of biomechanical data and a set of clinical parameters of an osteoarthritis population. CCA is a method for exploring the relationship between two multivariate sets of variables, all measured on the same individuals. Although CCA has already been applied successfully in image processing [11] and in ecology [12], its use remains limited in the biomedical field. This situation could be due to the difficulty of interpreting its results. To our knowledge, this study is the first to consider such a multivariate analysis combined with multivariate modeling in the biomechanical domain.

Methods
The flowchart of the proposed method is shown in Fig. 1. The first step consists of biomechanical and clinical data acquisition and parameter extraction. Next is a multivariate analysis using a CCA, which aims at clustering the most correlated parameters. Multivariate models are then developed within the identified clusters to determine the correlation and relationships between biomechanical and clinical data.

Biomechanical and clinical data collection
One hundred and sixty-six patients with clinically and radiographically confirmed knee osteoarthritis participated in the study (mean age 62 years, SD = 9.2; mean body mass index (BMI) 32 kg/m², SD = 7.3; 99 women, 59.6%). The experimental data were collected with the KneeKG (Emovi, Canada), a knee marker attachment system (Fig. 2) designed to reduce skin-motion artifacts during motion [13]. The KneeKG was installed on the participants' knees to record 3D kinematics (flexion/extension, abduction/adduction, and internal/external rotation) during gait trials. The kinematic data were recorded over several gait cycles (GCs) and averaged to obtain a mean GC per participant. Each curve was then resampled from 1 to 100% of the GC, giving 100 measurement points for each participant in each plane.
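The resampling and averaging step can be illustrated with a minimal sketch. It assumes linear interpolation and resamples each cycle before averaging so that cycles of unequal length can be combined; the source does not specify the interpolation scheme, and the function names and sample data are hypothetical.

```python
import numpy as np

def resample_cycle(angles, n_points=100):
    """Linearly resample one gait cycle onto n_points evenly spaced samples."""
    angles = np.asarray(angles, dtype=float)
    src = np.linspace(0.0, 1.0, len(angles))   # original normalized time
    dst = np.linspace(0.0, 1.0, n_points)      # target grid over the cycle
    return np.interp(dst, src, angles)

def mean_cycle(cycles, n_points=100):
    """Average several cycles of unequal length after resampling each one."""
    return np.mean([resample_cycle(c, n_points) for c in cycles], axis=0)

# Hypothetical flexion angles (degrees) for two cycles of different length
cycles = [np.linspace(0.0, 60.0, 87), np.linspace(2.0, 62.0, 113)]
avg = mean_cycle(cycles)   # one 100-point mean curve for this participant
```

In practice one such mean curve would be computed per participant and per plane.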
Participants were also asked to answer the KOOS questionnaires. The KOOS is a valid and reliable instrument which assesses the impact of knee OA on five domains: symptoms, pain, activities of daily living (ADL), sports and recreation (Sports/Rec), and quality of life (QoL). Scores on the subscales range from 0 (extreme symptoms) to 100 (no symptoms) [10].

Biomechanical and clinical parameter extraction
For each participant, a set of 69 biomechanical parameters was measured on the kinematic curves from the gait analysis. These parameters were chosen based on variables routinely assessed in clinical biomechanical studies of knee OA populations, such as maximums, minimums, varus and valgus thrust, angles at initial contact, mean values, and range of motion (ROM) throughout the GC or GC sub-phases (i.e., loading, stance, swing) [14]. Thirteen parameters among this set have been identified by Mezghani et al. [3] as having the potential to serve as diagnostic and burden of disease biomarkers of knee OA. The kinematic parameters considered as biomarkers were identified by incremental selection on a regression tree that determined the best set of biomechanical parameters for each biomarker type: knee OA disease diagnosis and severity grading. This was done in accordance with the standard BIPED (burden of disease, investigative, prognostic, efficacy of intervention, and diagnostic) OA biomarker classification scheme [15]. Table 1 describes the clinical meaning of the 13 biomechanical parameters considered in this study.
Participants were selected if OA was the main cause of their knee pain. Patients on a waiting list for total knee replacement were excluded, as were patients who were pregnant or who had rheumatoid arthritis or active cancer. A standardized radiographic examination of both knees was performed after the patient had given written informed consent. Only patients with a Kellgren-Lawrence (KL) grade ≥ 2 on radiographs were considered, and only data from the most painful knee were collected.
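The curve-derived parameters listed above (maximums, minimums, angle at initial contact, mean value, and ROM over the whole cycle or a sub-phase) can be sketched as follows. The phase boundaries and parameter names are assumptions made for illustration only; the source does not state how the sub-phases were delimited.

```python
def curve_parameters(curve, phases):
    """Summarize one 100-point kinematic curve (degrees), overall and per phase."""
    params = {
        "init": curve[0],                      # angle at initial contact
        "max": max(curve),
        "min": min(curve),
        "mean": sum(curve) / len(curve),
        "rom": max(curve) - min(curve),        # range of motion over the cycle
    }
    for name, (start, stop) in phases.items():
        segment = curve[start:stop]
        params[f"max_{name}"] = max(segment)
        params[f"rom_{name}"] = max(segment) - min(segment)
    return params

# Assumed phase boundaries as index ranges over the 100 samples (illustrative).
PHASES = {"loading": (0, 10), "stance": (0, 60), "swing": (60, 100)}

# Hypothetical abduction/adduction curve, not measured data
curve = [abs(i - 50) / 10.0 for i in range(100)]
p = curve_parameters(curve, PHASES)
```

Under this naming, an entry such as `rom_loading` would play the role of a parameter like Abd_RomLo when the function is applied to the frontal-plane curve.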
The set of clinical parameters contains 11 measurements: the patients' demographic characteristics (sex, age, BMI); the degree of osteoarthritis severity (grade); the pain variable, measured on the Pain Numerical Scale (NS) for knees (on which no pain is marked 0 and the worst pain imaginable is marked 10); and 6 scores generated using the KOOS questionnaire [10,16]. These are one score for each of the five dimensions mentioned above, plus an overall KOOS score normalized to give a maximum of 100 points in the absence of pain or other knee dysfunction. Through this questionnaire, the patient provides a valid and reliable assessment of his/her health status relative to the pathology [10,16]. A summary of the clinical parameters and their descriptions is provided in Table 2.
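As an illustration of how a subscale score on the 0 (extreme symptoms) to 100 (no symptoms) range can be obtained, the sketch below assumes the usual KOOS convention in which each item is graded from 0 to 4 and the subscale score is 100 minus the rescaled mean item score. The item scale is not stated in the source, and the answers shown are hypothetical.

```python
def koos_subscale(item_scores):
    """Convert item responses (0 = none ... 4 = extreme) into a 0-100 score."""
    if not item_scores or any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("each item must be scored from 0 to 4")
    mean_item = sum(item_scores) / len(item_scores)
    return 100.0 - mean_item * 100.0 / 4.0     # 100 = no symptoms

pain_items = [1, 2, 0, 1, 3, 2, 1, 0, 2]       # hypothetical answers to nine pain items
score = koos_subscale(pain_items)
```

Higher scores therefore correspond to fewer reported problems, which matters when reading the sign of the correlations reported later.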

Canonical correlation analysis (CCA)
The CCA is a multivariate statistical technique that explores the correlations between two sets of variables observed on the same individual [17]. The theoretical development of CCA can be found in [18][19][20].
Let $Y = [Y_1, Y_2, \ldots, Y_q]$ and $X = [X_1, X_2, \ldots, X_p]$ denote the two data vectors to be analyzed, i.e., the biomechanical parameter vector and the clinical parameter vector. In our case $q = 13$ and $p = 10$. The Y-variables can be thought of as response (or dependent) variables, but in fact, the X and the Y sets can be interchanged without affecting the results. The aim of the CCA is to project the X and Y datasets onto basis vectors A and B, respectively, in such a way that the correlations between the projections of the variables onto these basis vectors are mutually maximized [21]:

$$\rho = \max_{A,B} \operatorname{corr}(U, V) \tag{1}$$

where $\rho$ is the Pearson correlation coefficient and U and V are linear combinations of the original variables X and Y (Eqs. 2, 3), respectively:

$$U = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p = XA \tag{2}$$

$$V = b_1 Y_1 + b_2 Y_2 + \cdots + b_q Y_q = YB \tag{3}$$

U and V are the canonical variate vectors. The coefficient vectors A and B are known as canonical weights, canonical vectors, or canonical coefficients. The procedure is to find the first two canonical variates $U_1$ and $V_1$ that have the largest correlation, as illustrated in Fig. 3. The maximized correlation between these two canonical variates is the first canonical correlation $\rho_1$. The canonical coefficients are normalized such that each canonical variate has a variance of 1. The procedure continues by finding a second pair of canonical variates $U_2$ and $V_2$, uncorrelated with the first pair, that produces the second highest correlation coefficient $\rho_2$. The process continues until the number of pairs of canonical variates reaches $\min(p, q)$.
To evaluate the statistical significance of the canonical correlation model, we use the Wilks' Lambda statistic ($\Lambda$). This is a multivariate statistic that uses approximations based on the Fisher (F) distribution under the null hypothesis that all canonical correlations are zero in the population. A small p value for this test (< 0.05) suggests a rejection of the null hypothesis and that the first canonical correlation is significant. In our study, the analysis was conducted using the R software environment for statistical computing (R version 3.4.3) [22].
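The procedure described above can be sketched numerically. The study itself used R; the following is only an illustrative Python version based on a QR decomposition followed by a singular value decomposition, run on synthetic data with the same dimensions (n = 166, p = 10, q = 13). The function names are hypothetical, the data are random, and the F approximation used for the p value is not reproduced here.

```python
import numpy as np

def cca(X, Y):
    """Canonical correlations and weights A, B for X (n x p) and Y (n x q)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Qx, Rx = np.linalg.qr(Xc)                  # orthonormal basis of each set
    Qy, Ry = np.linalg.qr(Yc)
    W, rho, Zt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Rescale so that every canonical variate has unit sample variance
    A = np.linalg.solve(Rx, W) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Zt.T) * np.sqrt(n - 1)
    return np.clip(rho, 0.0, 1.0), A, B

def wilks_lambda(rho):
    """Wilks' Lambda for the hypothesis that all canonical correlations are zero."""
    return float(np.prod(1.0 - rho ** 2))

rng = np.random.default_rng(0)
n, p, q = 166, 10, 13
X = rng.normal(size=(n, p))                    # stand-in for clinical parameters
Y = rng.normal(size=(n, q))                    # stand-in for biomechanical data
Y[:, 0] += X[:, 0]                             # plant one shared signal

rho, A, B = cca(X, Y)
U = (X - X.mean(axis=0)) @ A                   # canonical variates, Eq. 2
V = (Y - Y.mean(axis=0)) @ B                   # canonical variates, Eq. 3
lam = wilks_lambda(rho)
```

The singular values come out sorted, so `rho[0]` is the first canonical correlation and the number of pairs equals min(p, q). A value of Lambda close to 1 indicates little shared variance between the two sets.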

Comparison between the multivariate analysis (CCA) and a univariate analysis
The results of the CCA analysis were compared to those of a univariate analysis based on the pairwise correlation matrix calculated using the Pearson correlation coefficient. The objective of this comparison is to show that the univariate analysis cannot adapt to the complexity of biomechanical data [9] and can even mask several strong relationships if parameters are considered individually.
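A small synthetic example shows how pairwise correlations can understate a relationship that only appears when variables are combined. The data and names below are hypothetical and are not taken from the study: the response is exactly the difference of two strongly related predictors, yet neither predictor alone correlates strongly with it.

```python
import numpy as np

def cross_correlation(X, Y):
    """Pearson correlation between every column of X (n x p) and of Y (n x q)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Xz.T @ Yz / (X.shape[0] - 1)        # shape (p, q)

rng = np.random.default_rng(1)
n = 166
z = rng.normal(size=n)
e = rng.normal(size=n)
X = np.column_stack([z + 0.3 * e, z])          # two strongly related predictors
y = (X[:, 0] - X[:, 1]).reshape(-1, 1)         # response is their difference

R = cross_correlation(X, y)                    # pairwise (univariate) view
combined = np.corrcoef(X @ np.array([1.0, -1.0]), y[:, 0])[0, 1]
```

Here every entry of `R` is small in magnitude, while the correlation with the linear combination in `combined` is essentially 1, which is the kind of relationship a multivariate method such as CCA is designed to detect.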

Clustering via correlation biplot
The results of a CCA are visualized with a correlation biplot, which represents the between-set correlation matrix $R_{XY}$ in a joint plot. This format allows the visualization of the intra-set correlations between the original variables and their own canonical variates, and of the correlations between the original variables and the opposite canonical variates. The main features of a correlation biplot are the angles between the variables from sets X and Y, which reflect their correlations [12]. The combined angle and direction of the X and Y variables indicate the importance of the positive and negative correlations between the two sets. Strongly correlated variables lie very close to each other. In our case, the correlation biplot is used to cluster biomechanical data and clinical parameters. The identified clusters are then used to explain the relationships between the sets of parameters within the clusters.
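Reading such a biplot can be mimicked with a minimal sketch: variables near the origin are set aside, and the cosine of the angle between a clinical and a biomechanical variable is used as a rough indicator of the sign and strength of their association. The coordinates and the cut-off radius are hypothetical and are not values from this study.

```python
import math

def radius(v):
    """Distance of a variable from the biplot origin."""
    return math.hypot(v[0], v[1])

def cosine(u, v):
    """Cosine of the angle between two variable arrows."""
    return (u[0] * v[0] + u[1] * v[1]) / (radius(u) * radius(v))

# Hypothetical biplot coordinates on the first two canonical axes
coords = {
    "X2": (0.55, 0.10), "X1": (0.03, -0.02),
    "Y6": (0.50, 0.20), "Y8": (-0.45, -0.15),
}
MIN_RADIUS = 0.1                               # assumed cut-off for "near the origin"

kept = {k: v for k, v in coords.items() if radius(v) >= MIN_RADIUS}
pairs = {
    (a, b): cosine(kept[a], kept[b])
    for a in kept if a.startswith("X")
    for b in kept if b.startswith("Y")
}
```

A cosine near +1 suggests a positive association, a cosine near -1 a negative one, and variables dropped by the radius filter contribute little to the first two canonical axes.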

Canonical prediction model and regression within clusters
Once the clusters are identified, we can explain the relationship between the parameters within the clusters using a regression analysis. This analysis aims at estimating the coefficients of the linear equation, involving one or more independent variables (biomechanical data), that best predicts the value of the dependent variable (a clinical parameter). The purpose of the regression is to predict X on the basis of Y within the clusters.
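A within-cluster regression of this kind can be sketched with ordinary least squares, reporting the adjusted R² and the residual standard error (RSE). The data below are synthetic, with arbitrarily chosen coefficients, and the function name is hypothetical; the sketch does not reproduce the models of Table 4.

```python
import numpy as np

def fit_linear_model(Y, x):
    """Least-squares fit of a clinical score x on biomechanical predictors Y (n x k)."""
    n, k = Y.shape
    design = np.column_stack([np.ones(n), Y])  # intercept plus predictors
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((x - x.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    rse = float(np.sqrt(ss_res / (n - k - 1))) # residual standard error
    return coef, adj_r2, rse

rng = np.random.default_rng(2)
n = 166
Y = rng.normal(size=(n, 3))                    # three synthetic kinematic predictors
x = 5.0 + 1.0 * Y[:, 0] - 0.5 * Y[:, 1] - 0.8 * Y[:, 2] + rng.normal(scale=0.5, size=n)

coef, adj_r2, rse = fit_linear_model(Y, x)
```

The sign of each fitted coefficient gives the direction of the association, and the RSE is expressed in the units of the clinical score, so it should be judged against that score's range.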

In order to determine which variables should be considered as dependent and which as independent, we performed a redundancy analysis. This analysis measures the proportion of variance of the original variables of one set explained by the canonical variate of the other set. The higher the redundancy index, the better the original variables of one set are represented by the canonical variate of the other set. A relational model is then proposed to determine which of the variables best explains the other. A redundancy coefficient close to 1 shows that a large share of the variance of the dependent (original) variables is shared with the independent (canonical) variate, whereas a coefficient close to zero means that the shared variance is negligible.
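One common way to compute such an index, assumed here since the source does not give its formula, is the mean squared correlation between the original variables of one set and a canonical variate of the other set. The variate used below is a synthetic stand-in rather than the output of an actual CCA, and the function name is hypothetical.

```python
import numpy as np

def redundancy(X, V):
    """Mean squared correlation between the columns of X and each opposite variate in V."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Vz = (V - V.mean(axis=0)) / V.std(axis=0, ddof=1)
    cross_loadings = Xz.T @ Vz / (X.shape[0] - 1)   # corr(X_j, V_k), shape (p, k)
    return (cross_loadings ** 2).mean(axis=0)       # one index per canonical variate

rng = np.random.default_rng(3)
n = 166
v = rng.normal(size=n)
V = v.reshape(-1, 1)                           # stand-in for one opposite canonical variate
X = np.column_stack([v, rng.normal(size=n)])   # one variable fully explained, one unrelated

rd = redundancy(X, V)
total = float(rd.sum())                        # total redundancy over all variates supplied
```

In this toy case about half of the variance of the set is explained. Summing the index over all canonical variates gives a total redundancy of the kind reported as a percentage.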

Univariate correlation analysis
The univariate correlation matrix is displayed graphically in Fig. 4. The 10 clinical parameters are in rows and the 13 biomechanical parameters are in columns. Positive correlations are displayed in blue and negative correlations in red. Color intensity and circle size are proportional to the correlation coefficients. The correlations between the biomechanical data and the clinical parameters are at most moderate. The largest correlation is between age ($X_4$) and the range of motion of the abduction/adduction angle during the loading phase ($Y_{13}$: Abd_RomLo) ($r = 0.3$). Indeed, the univariate analysis considers only the pairwise correlation between two parameters at a time. This result supports the need for a multivariate investigation.

Canonical correlations and multivariate statistic
The Wilks' Lambda statistic of the canonical correlation model was $\Lambda = 0.32$, $p = 0.04$. This confirms that the canonical correlations are worthy of consideration and that the between-set correlations are significant. The two highest canonical correlations are $\rho_1 = 0.52$ and $\rho_2 = 0.44$.

Correlation clustering via biplot
The correlation biplot of Fig. 5 represents the between-set correlation matrix $R_{XY}$, i.e., the correlation between the 13 biomechanical parameters (in black) and the 10 clinical parameters (in red) via their canonical variates. It identifies three clusters, each grouping biomechanical and clinical parameters. Recall that these 13 biomechanical parameters were identified by Mezghani et al. [3] as having the potential to serve as diagnostic and burden of disease biomarkers of knee OA. Note that the correlations located around the center are negligible. For example, the parameters $X_1$, $X_3$, $Y_4$, and $Y_{10}$ are very close to the origin in the biplot, which shows that they are not important in the CCA. Meanwhile, the parameters $Y_2$, $Y_7$, and $Y_9$ are highly correlated with $Y_8$. Thus, these parameters were removed from the subsequent analysis, and the identified clusters between X and Y are summarized in Table 3.

Canonical correlation model
The canonical model of the first canonical variates is summarized in Fig. 6. This model describes the most strongly correlated variables with their appropriate weights or canonical coefficients. It provides the following two relations with a significant canonical correlation ($\rho_1 = 0.52$, $p = 0.04$):

Redundancy coefficients
The total redundancy corresponds to 8.98% of the variance of X explained by the opposite canonical variate V, and to 8.71% of the variance of Y explained by the opposite canonical variate U. The two indices of shared variance are therefore nearly equal; consequently, both the clinical and the biomechanical parameters may be considered as either dependent or independent.

Regression within the clusters
Following the redundancy analysis, we developed a multivariate regression model within each cluster to investigate the relationships between clinical and biomechanical parameters. Table 4 summarizes the regression model developed for each cluster. All the estimated regression models were significant, with an adjusted $R^2 \geq 0.68$. The residual standard errors (RSEs) of the models ranged from 0.63 to 1.05, indicating a good fit of the estimated models to the data.

Cluster 1 analysis
Cluster $C_1$ groups the biomechanical data corresponding to kinematic parameters in the frontal plane (abduction/adduction) ($Y_6$: Abd_MaxSw, $Y_8$: Abd_Init, and $Y_{13}$: Abd_RomLo) with the level of pain ($X_2$), as described in Tables 1, 2, and 3. The results of the multivariate regression of pain as a function of the three parameters of the abduction/adduction movement (the $X_2$: Pain regression model in Table 4) indicate that the pain felt is negatively correlated with $Y_8$ and $Y_{13}$, and positively correlated with $Y_6$.

Cluster 2 analysis
In the second cluster $C_2$, the flexion angle at the end of the stance phase ($Y_5$: Flex_EndSt), the range of motion of the internal/external rotation ($Y_{12}$: Rot_Rom), and the pain measured by the KOOS score ($X_7$: KOOS_Pain) were closely related. An association between improvement in the KOOS_Pain score and changes in the range of motion (ROM) in the transverse plane was previously identified by Makovey et al. [23]. The subjective value of KOOS_Pain is positively correlated with parameters in the sagittal (flexion/extension) and transverse (internal/external rotation) planes, as shown by the $X_7$: KOOS_Pain regression model in Table 4.

Cluster 3 analysis
In the third cluster $C_3$, only one kinematic parameter, the transverse-plane (internal/external rotation) angle at initial contact ($Y_3$), was correlated with $X_5$ (BMI) and $X_9$ (KOOS_Sport). The improvement in KOOS_Sport score was identified by Makovey et al. [23] as being related to changes in the range of motion (ROM) in the transverse plane. Accordingly, the model explaining the KOOS sport and recreation score as a function of the kinematic parameter in the transverse plane and the BMI showed positive correlations (Table 4).
When comparing the multiple regression models in $C_1$ and $C_2$ (Table 4), we note that both are related to pain scores ($X_2$: Pain Numerical Scale and $X_7$: KOOS_Pain) but are not associated with the same kinematic parameters. Indeed, these two scores are quite different because they are evaluated based on different symptoms: the Pain Numerical Scale variable ($X_2$) was evaluated on a 0-10 pain intensity scale and concerns the general pain felt in the knees, whereas the KOOS_Pain variable ($X_7$) was evaluated from nine questions specifically related to the knee injury [10].

Conclusion
The CCA results showed a moderate correlation that gave rise to three clusters of the most closely related parameters. Within these correlation clusters, multivariate linear models were found that relate the subjective clinical parameters to the biomechanical data.
Only age, BMI, pain measured on the Pain Numerical Scale (NS), and the KOOS_Pain and KOOS_Sport scores were correlated with the kinematic parameters (mechanical biomarkers). Biomechanical data corresponding to kinematic parameters in the frontal plane (abduction/adduction) during the swing phase, in the sagittal plane (flexion/extension) at the end of the stance phase, and in the transverse plane (internal/external rotation) were positively correlated with pain. In other words, pain increased when the kinematic parameters in those planes increased. On the other hand, biomechanical data corresponding to kinematic parameters in the frontal plane (abduction/adduction) at initial contact and during the loading phase were negatively correlated with pain. This means that a decrease in the frontal-plane parameters at those phases is related to an increase in pain level. Kinematic parameters in the transverse plane (internal/external rotation) were positively correlated with the KOOS sport and recreation function. This means that KOOS_Sport increased as the transverse-plane kinematic parameters increased.
Finally, the results show that a multivariate analysis of the clinical symptoms and the biomechanical characteristics of knee joint function allows a better understanding of their relationships, and could help clarify how biomechanical characteristics can be used to guide clinical decision making in OA management.