A perceptual map for gait symmetry quantification and pathology detection

Background The gait movement is an essential process of the human activity and the result of collaborative interactions between the neurological, articular and musculoskeletal systems, working efficiently together. This explains why gait analysis is important and increasingly used nowadays for the diagnosis of many different types (neurological, muscular, orthopedic, etc.) of diseases. This paper introduces a novel method to quickly visualize the different parts of the body related to an asymmetric movement in the human gait of a patient for daily clinical usage. The proposed gait analysis algorithm relies on the fact that the healthy walk has (temporally shift-invariant) symmetry properties in the coronal plane. The goal is to provide an inexpensive and easy-to-use method, exploiting an affordable consumer depth sensor, the Kinect, to measure the gait asymmetry and display results in a perceptual way. Method We propose a multi-dimensional scaling mapping using a temporally shift invariant distance, allowing us to efficiently visualize (in terms of perceptual color difference) the asymmetric body parts of the gait cycle of a subject. We also propose an index computed from this map and which quantifies locally and globally the degree of asymmetry. Results The proposed index is proved to be statistically significant and this new, inexpensive, marker-less, non-invasive, easy to set up, gait analysis system offers a readable and flexible tool for clinicians to analyze gait characteristics and to provide a fast diagnostic. Conclusion This system, which estimates a perceptual color map providing a quick overview of asymmetry existing in the gait cycle of a subject, can be easily exploited for disease progression, recovery cues from post-operative surgery (e.g., to check the healing process or the effect of a treatment or a prosthesis) or might be used for other pathologies where gait asymmetry might be a symptom.

Abnormal or atypical gait can be caused by different factors, either orthopedic (hip injuries [4], bone malformations, etc.), muscular, or neurological (Parkinson's disease, stroke [5], etc.). Consequently, different parts of the body can be involved or affected, which make gait analysis a complex procedure but also a reliable and accurate indicator for early detection (and follow-up) of a wide range of pathologies. It thus makes a 3D gait analysis (3DGA) procedure a powerful early clinical diagnostic tool [6] that is reliable and non-invasive, and which has been used successfully until now for screening test, detection and tracking of disease progression, joint deficiencies, pre-surgery planning, as well as recovery assessment from post-operative surgery or accident (rehabilitation). It is important to note that a gait analysis-based diagnostic tool also allows the reduction of the costs and amount of surgery per patient [7]. Also, a more appropriate medical prescription can be made by performing gait analysis before treating a patient, leading to a better recovery for the patients [6].
But nowadays, with the aging population, clinical diagnostics have to be cheaper, faster and more convenient for clinical [8][9][10] (or home [11]) usage while remaining accurate. However, analyzing a gait video sequence is often difficult, requires time, and subtle anomalies can be omitted by the human eye. Also, videos are not easy to annotate, store and share.
In this work, we focus on the design of both a reliable and accurate imaging system that is also inexpensive and easy to set up for daily clinical usage. This diagnostic tool is relying on the fact that the gait of healthy people is generally symmetrical in the coronal (frontal) plane (with half a period phase shift) and that asymmetrical gait may be a good indicator of pathologies and its progression [1][2][3]5]. More precisely, the goal of our proposed GA-based diagnostic tool is to compute a perceptual color map of asymmetries from a video acquired by a depth sensor (Kinect) of a subject walking on a treadmill. The recording plane is the coronal plane in order to exploit the temporally shift-invariant properties of the movement. A perceptual color map of asymmetries is the compression of a subject's video mapped into a single color image in such manner that asymmetries of the body movements in the human gait cycle may be clearly visible and immediately quantifiable.
This paper is organized as follows. First, "Previous work" section makes a study of existing procedures in the 3DGA literature. In "Proposed model" section, we introduce details about the dataset that will be used in our gait analysis system and we describe our asymmetry map estimation model based on the multidimensional scaling (MDS) mapping procedure and a local search refinement strategy. Finally, we show experimental results in "Results" section, give a discussion in "Discussion" section and conclude in "Conclusion" section.

Previous work
Current 3DGA techniques can be divided in two categories, namely, with or without markers.
Among the state-of-the-art marker-based approaches, the Vicon motion-tracking and capture system [12] offers millimeter resolution of 3D spatial displacements. Due to its accuracy, it is often used as ground-truth for validation in medical application. On the other hand, the high cost of this system inhibits its widespread usage for routine clinical practices. Basically, optical motion capture system consists in tracking infrared (IR) reflective markers with multiple IR cameras [13]. Optical motion capture is efficient, but requires a lot of space, time, and expertise to be installed and used. For instance, placing the markers on the subject is prone to localization errors and requires someone who understands both the subject's anatomy and the acquisition system. Also the subject might have to wear a special suit and change outfits, which is constraining both for the subject and for the physician.
Therefore, marker-less systems are a promising alternative for clinical environments and are often regarded as easy-to-set-up, easy-to-use, and non-invasive. They are either based on stereo-vision [14], structured light [15], or time-of-flight (TOF) [16] technologies. As a stereo-vision application, [17] used two camcorders to extract 3D information of the subjects and to measure the gait parameters. Although low-cost, the setup and calibration procedure of the system remains complex and only the lower parts of the body are measured. Also, stereo vision-based systems will not function properly if the subject's outfit lacks texture. However, the Kinect sensor is based on structured light technology which makes it robust to textureless surfaces. The Kinect remains also compact and affordable. The Kinect has two output modes: depth map or skeleton modes. The former consists of an image sequence where the value of each pixel is proportional to the inverse of the depth, whereas the latter is a set of 3D points and edges that represents 20 joints of the human body.
Recent researches have been conducted to test whether the Kinect is suitable for clinical usage or not. Clark et al. [18] used the skeleton mode to measure spatial-temporal gait variability (such as the stride duration, speed, etc.) and compared it with data acquired by the high-end Vicon MX system. They found encouraging results for the estimation of the length of the steps and strides and the average gait velocity. Nevertheless, due to the inability of the skeleton tracking algorithm to accurately localize important anatomical landmarks on the foot, some spatio-temporal parameters of gait remain poorly estimated such as the assessment of step and the stride time. In addition, the Kinect camera was placed facing the subject, without a treadmill. Therefore the system was based on the analysis of only one gait cycle, because the intrinsic working range of the Kinect depth sensor is between 800 and 4000 mm. This somehow compromised the accuracy and the reliability of their system.
Gabel et al. [11] also used the skeleton mode to perform a 3DGA. They asked people to wear wireless sensors (gyroscopes and pressure sensors) at movement points and to walk back and forth along a straight path for approximately 7 min. They found that the Kinect was capable of providing accurate and robust results, but only a few gait parameters were tested and further research is under investigation. Finally, it is worth mentioning that none of the methods, using the Kinect skeleton mode, provide a visual feedback of the gait of the subject.
In [19], the authors compared the Kinect with depth map output mode versus a Vicon system. They placed two Kinects in a different alignment with the subject (facing and on the side) and measured key gait parameters, such as stride duration and length, and speed. They found excellent results with an average difference of less than 5 % for both Kinect camera setups. They also found that using the depth map data allows to reduce drastically the computation time for background removal.
In [10], the authors proposed to use a treadmill and a Kinect depth sensor to quantify the gait asymmetry with a low-cost gait analysis system. More precisely, the authors computed an index for quantifying possible asymmetries between the two legs by first dividing each gait cycle in two sub-cycles (left and right steps), and by comparing these two sub-cycles, in terms of an asymmetry index (proportional to the difference of depth, over a gait cycle, between the two legs) after a rough spatial and temporal registration procedure. Although the system is able to distinguish whether the subject has a symmetric walk or not, no visualization or information on the location of the asymmetries is provided, unlike our method.
In [9], the Kinect camera was placed at the back of a treadmill and used to record a video sequence of the subject's walk. The authors then simply computed the mean of the obtained depth image sequence (over a gait cycle or a longer period) in order to compress the gait image sequence into one image which was finally called a depth energy image (DEI). Their results were conclusive since they were able to distinguish both visually and quantitatively asymmetries (a symmetric walk generating a DEI exhibiting a symmetric silhouette, in terms of mean depth and conversely). Nevertheless, this latter strategy is inherently inaccurate since taking the average (mean) depth over a gait cycle does not allow to detect all asymmetric body movements; indeed, movement variation of some parts of the body can clearly be different and asymmetric while keeping the same mean (in terms of mean depth).
In our work, the depth image sequence of the gait, containing a certain number of gait cycles (wherein each pixel of the video corresponds to a depth signal as a function of time, as shown in Fig. 1) is reduced to three dimensions with a multi-dimensional scaling (MDS) mapping [20] using a temporally shift invariant Euclidean distance. This allows us to quickly display the gait image cube into an informative color image (with red, green and blue channels) allowing to visualize the asymmetric body parts of the gait cycle of a subject with a color difference, in a perceptual color space, which is linearly related to the asymmetry magnitude.

Data description
The dataset consists of multiple sequences of people walking on a treadmill, facing an inexpensive commercial depth sensor (Kinect). The Kinect sensor outputs 30 depth maps per second (30 fps), with a resolution of 640 per 480 pixels. The dataset contains 51 sequences acquired from 17 (healthy) subjects (17 males, 26.7 ± 3.8 years old, 179.1 ± 11.5 cm height and 75.5 ± 13.6 kg with no reported clinical asymmetry or gait impairment) walking with or without simulated length leg discrepancy (LLD). Every subject had to walk normally (group A), then with a 5 cm sole under the left foot (group B), then with the sole under the right foot (group C).
Sequences are approximately 5 min long and contain around 180 gait cycles. For all sequences, the same relative position between the treadmill and the sensor is kept in order for the subject to be within the same image area. The institutional ethical review board approved the study.
The method can be divided into four steps: a pre-processing for the silhouette extraction ("Silhouette extraction" section), a MDS-based mapping ("Multidimensional scaling-based mapping" section), a local search refinement strategy ("Refined estimation" section) and a color space conversion step ("Color space conversion" section).

Silhouette extraction
Since the scene took place in a non-cluttered room where the treadmill is in the same position relatively to the camera, a 3D bounding box around the subjects can be set. Hence, by retrieving 3D information, we can convert this information back in the 2D image space to segment the subject's silhouette directly from a depth map. To do so, the depth sensor is considered as a pinhole camera model with intrinsic parameters, K, (see [ [21], p. 30]) defined as: where f is the focal length in pixels and (c u , c v ) is the image center in pixels (values given by the manufacturer). From a depth map, a pixel at position (u, v) T with depth value, d is projected in 3D space, (X, Y, Z) T , using: First, the positions of the points around the subject, approximated by a 3D bounding box, are estimated. Second, the minimal and maximal depth, Z min and Z max , of the eight points (of the bounding box) are retrieved. Third, the eight points were projected back in the 2D image space where the minimal and maximal vertical and horizontal 2D position value (u min , u max , v min , and v max ) are finally estimated.
The necessity of working in 3D space is because of the spatial coherence of objects in the scene. For instance, in 3D space the treadmill is always beneath the subject whereas in an image it overlaps the subject, as shown in Fig. 2. Once this step is done, it is no more necessary to project the depth maps in the 3D space as long as the camera and the treadmill stay at the same relative position. In our case, some small adjustments of the enclosing parameters, u min , u max , v min , v max , Z min and Z max , (bounding box) were needed to encompass all sequences. Now, with, the required information, the subject can be segmented in each frame of the original gait depth sequence (of N frames).

Background removal
Background removal is trivial since the subject is in the middle of the image in a noncluttered room. Therefore, every pixel outside the bounding box is clipped to a default value (see subsection "Clipping step" below).

Treadmill removal
After background removal, the only objects remaining in the image are the treadmill and the subject. Because the treadmill is below the subject, it can be removed by selecting pixels with coordinates superior to a threshold T y (Y axis is going from top to bottom). An equation in the 2D-space can be derived from Eq. (2) in order to work directly on the image: Figure 2 visually shows the different steps of the setup and pre-processing stage.

Clipping step
At this stage, it is important to recall that the core of the method, which is MDS-based, aims to preserve the pairwise distances between depth signals as well as possible in a final 3D (perceptual) color space. In order to get also an exploitable, high contrast (asymmetry) color image, that contains a wide dynamic range of color values (well distributed among all possible colors), the clipping value of the non-subject pixels have to be set carefully. Indeed, it is crucial that no big artificial pairwise distances are created, because those distances will induce a squeezing and penalizing effect on the other informative distances. For instance, if the relative value of the background differs a lot from the subject's pixel (depth) values, the asymmetries between the right and left legs might not be easily distinguishable. Therefore the default background value has to be set carefully. In this work, this clipping value is estimated as being the mean of all the depth values belonging to the subject in the whole sequence (see Fig. 3). It is important to notice that the whole point of using a default clipping value is to make the distribution of pixel value of the images uni-modal and continuous. This will ensure a well contrasted map for the human eye

Filtering step
Finally, the whole sequence is filtered with a 3D (3 × 3 × 3) median filter to remove some aberrations on the contours or on the top of the treadmill.

Multidimensional scaling-based mapping
The MDS-based mapping technique [20] aims at visualizing the (temporally shiftinvariant) motion asymmetric body parts with a perceptual color difference which corresponds (perceptually and linearly) to the asymmetry magnitude. This mapping is achieved by considering each pair of pixels (i.e., pair of N-dimensional depth signals) in the original gait video sequence and by quantifying their (temporally shift-invariant) degree of asymmetry with a temporally shift-invariant pairwise Euclidean distance β s 1 ,s 2 between each pair [s 1 (t), s 2 (t)] of depth signals: where the maximal value of τ corresponds approximately to the number of frames in a gait cycle. In our application, τ max = 66 and, in order to decrease the computational load, τ is increased with a step size equals to 6 (≈0.2 s).
In addition, four points are important to consider in this step: • First, the use of the shift-invariant pairwise Euclidean distance is crucial in this MDSbased mapping step. Indeed, two pixels in the gait video cube, i.e., two depth signals (as a function of the time) with a perfect similar movement but in phase opposition (phase difference of half a gait cycle) like the legs and arms will have to be considered as symmetric with the same (perceptual) color in the final asymmetry map. • Second, in order to provide a final perceptual color asymmetry visualization map, the MDS mapping is achieved in a perceptual color space, namely the classical CIE 1976 L * , a * , b * (LAB) color space which is approximately perceptually uniform. In this color space, a color difference shall (perceptually) appear twice as large for a measured (temporally shift-invariant) asymmetry value which is twice bigger. • Third, as already said, MDS is a dimensionality reduction technique that maps objects lying in an original high N dimensional space to a lower dimensional space (3 in our application), but does so in an attempt that the between-signal distances are preserved as well as possible. The originally proposed MDS algorithm is not appropriate in our application and more generally for all large scale applications because it requires an entire N × N distance matrix to be stored in memory [with a O(N 3 ) complexity]. Instead, we have herein adopted a fast alternative, FastMap [22], with a linear complexity O(pN) (with p = 3, the dimensionality of the target space). In FastMap, the axes of the target space are constructed dimension by dimension. More precisely, it implicitly assumes that the objects are points in a p-dimensional Euclidean space and selects a sequence of p ≤ N orthogonal axes defined by distant pairs of points (called pivots) and computes the projection of the points onto the orthogonal axes.
The above-mentioned FastMap-based mapping method, which exploits an algebraic procedure, has the main advantage of being very fast (for large scale applications) but slightly less accurate than a MDS procedure exploiting a (gradient descent or a local search-based) optimization procedure [23]. For this reason, we decided to refine the estimated asymmetry map given by the FastMap as being the initial starting solution of a local search using a local exploration around the current solution. This step is now detailed in the following section.

Linear stretching
The FastMap-based mapping method allows us to preserve the between-depth-signal (shifted Euclidean) distances, as well as possible in a final 3D (perceptual LAB) color space with a scale factor k, which we have now to estimate in order to be able to refine the solution with a local search algorithm.
To this end, let u be the three-dimensional vector (u = (L, A, B) t ) corresponding to the three L, A, B color bands of the final asymmetry image to be estimated and let also β s,t denotes the Euclidean distance between two depth vectors, associated with a pair of sites at spatial (pixel) locations s, t. The scale factor k is the linear stretching factor which minimizes the following cost function: where the summation s,t s� =t is done over all pairs of sites existing in the final image to be estimated. In our application, in order to speed up this estimation procedure, we take the subset of pixel pairs induced by the graph presented in the following section. In this way, a simple local discrete grid search routine, for the parameter k in a suitable range (k ∈ [0 − 1] with a fixed step size set to 0.005) or a least square estimation can be easily achieved.

Local search refinement
At this stage, we are very close to the solution of our optimization problem expressed in Eq. (7). To improve this solution, we use a deterministic local exploration around the current solution and a low radius of exploration (see the detailed Algorithm in "Appendix" and the validation of this local search procedure). For this local refinement, in order to decrease the computational load, we do not consider a complete graph but a graph in which each pixel is connected with its four nearest neighbors and N cnx equally spaced other pixels located within a square neighborhood window of fixed size N s pixels centered around the pixel (see Fig. 4). In addition, since this local search refinement strategy could be sensitive to noise, we add a regularization term allowing us to both incorporate knowledge concerning the types of estimated images a priori defined as acceptable solutions and to regularize the optimization problem. The regularization term used in our model is formulated in the (image) spatial domain and promotes a (regularized) �� � estimated image u with spatial smoothness and edge-preserving properties (see Fig. 5).
To this end, we have considered the generalized Gaussian Markov random field (GGMRF) regularization term initially proposed by Bouman and Sauer in tomographic reconstruction [24]: where 1 ≤ q ≤ 2 is a parameter controlling the smoothness of the image to be estimated and/or the sharpness of the edges to be formed in the final estimated image.    9)]. The slight but annoying ringing noise effect has been corrected by the regularization term allowing us to a priori favor a edge-preserved asymmetry map that is piecewise smoothed (relative to the second order neighborhood system), or binary clique <s, t> is horizontal/vertical or right diagonal/left diagonal. This regularization term has the advantage of including a Gaussian MRF prior for q = 2 and a more interesting edge-preserving absolute-value potential function with q = 1 somewhat similar to the L 1 regularizer proposed by Rudin et al. in [25]. In the regularization framework and under this constraint, an asymmetry map u can be seen as a solution to the following penalized cost function to be optimized: In this model, the set of β s,t , ({β s,t }) represents the observed data. The first term is related to the preservation of between-depth signal distances and can be viewed as a "goodnessof-fit" energy term. The second term corresponds to the regularization encoding some a priori expected properties of smoothness and of edge-preserving of the asymmetry image to be estimated. Let us also note that this model can be also easily viewed as a Bayesian optimization strategy formalizing a trade-off between a likelihood and an image prior expressing, via a prior distribution, that an acceptable estimated image is piecewise smoothed. η is the value controlling the contribution of these two terms.

Color space conversion
It is important to mention that, at this stage, we are not assured that the LAB color values of the 3D asymmetry map are not saturated in the RGB space. In order to fix this problem, we use a simple linear stretching of the L, A, B color values such as L ∈ [0:100], and A, B have a maximal amplitude of 100 with a zero mean in order to ensure that a very small number of pixels are outside the RGB color space [23]. Once this linear stretching is achieved, a RGB conversion is done.
Let us add that the local search refining procedure can be easily computed in parallel. Indeed, the objective function to be minimized (Eq. 9) can be viewed as a Gibbs energy field related to a non-stationary Markov random field (MRF) model defined on a graph with long-range pairwise interactions (or binary cliques <s, t>). Each binary clique of this MRF model is associated to a non-stationary potential since this energy-based model is spatially variant and depends on the distance between the depth vectors associated with each pair of pixels s, t. Consequently, Algorithm 1 can be also viewed as a simple iterative conditional modes (ICM) procedure [26] for a MRF model with non-stationary and long-range pairwise interactions. Consequently, a Jacobi-type version of this Gauss-Seidel based ICM procedure (proposed in Algorithm 1) can be also efficiently implemented by using the parallel abilities of a graphic processor unit (GPU) (embedded on most graphics hardware nowadays available on the market) and can be greatly accelerated (up to a factor of 200) as proposed in [27]. Source code (in C++ language under Linux) of our algorithm with the set of image sequences are publicly available at the following http address: http://www.iro.umontreal. ca/∼mignotte/ResearchMaterial/pamga.html for the scientific community.

Setup
This section presents the asymmetry maps obtained for the subjects with or without (simulated) pathologies. Sequences of 300 frames have been used (longer sequences did not yield significantly better results, see "Performance measures of the proposed model" section). This corresponds approximately to a range of 6-9 gait cycles depending on the subject's speed and step size. On average for all images, the correlation score [23] (see end of "Local search algorithm" in "Appendix") for the mapping of 300 frames to three color channels (according to our shifted Euclidean pairwise depth distance) is 93.5 ± 2 % which shows us that the FastMap-based MDS procedure is able to preserve a large quantity of information of the original image sequence (in terms of pairwise depth distances). We have used an offset of 400 frames (approximately 13 s) relatively to the beginning of the image sequence for all the subjects to allow them to get used to the treadmill. In addition, η, the value controlling the contribution between the likelihood and the regularization terms in Eq. (9) was set to η = 0.025 in all the following experiments.

Initial tests
In order to quantify the influence of the choice of the distance in the reliability of our asymmetry map, we have compared several (possibly shifted) distances. In addition to the shifted L2 norm (or Euclidean distance) between the two depth vectors, we have also considered; namely; the shifted L1 and L inf (infinite) norms, the L2 norm between the amplitude of their Fourier spectrums (Lmod), which is inherently invariant to translation, the L2 norm between their amplitude histograms (Lrad) providing also a distance invariant to translation and finally the L1 norm between the mean of these two depth vectors (Lmoy). Figure 6 allows seeing the different asymmetry maps obtained for the subject #S05 of our database. Here the L2 distance shows clearly, as color differences between the left and right side of the body, the gait asymmetry magnitude. For instance, with right LLD (case C), the asymmetry of arm swing is clearly noticeable and for both right and left LLD (cases B and C) leg color differences (motion asymmetries) are visible.
Qualitatively, we can notice that the asymmetry maps based on the shifted L1 and Lmod distances visually appear as reliable as the asymmetry map given by the shifted L2 distance to detect motion asymmetries appearing as color differences between the left and right side of the body. In addition, we can also see quite clearly that the L inf norm provides a (correlated) noisy asymmetry map (with artifacts and without edge preservation) with which we can however see color differences or the presence of asymmetry cues in the lower legs. The Lrad and Lmoy distances are clearly invariant to translation distances, nevertheless, the maps based on these two distances are inaccurate because they fail to detect all asymmetric body movements. Indeed, movement variation of some parts of the body can be different and asymmetric while keeping the same mean (depth) or keeping the same histogram. This explains why, with the right LLD (case C), the asymmetry of arm swing cannot be detected with the Lrad and Lmoy distances whereas this defect is easily detected and clearly visualized with the L2, L1 and Lmod based distances asymmetry maps.

Performance measures of the proposed model
In order to get a quantitative measure of asymmetry, we propose to first estimate, for each line of the asymmetry map, the biggest mirrored differences. More precisely, for a line k of width m, (m being half the width of the asymmetry map) the set of biggest mirrored differences is: where p i,j is a pixel value at position (i, j). This set of biggest mirrored differences yields a vertical curve whose mean amplitude, allows computing a global asymmetry index (ASI). From each asymmetry color map, this ASI curve is estimated by a two step procedure. Firstly, by estimating, individually for each subject, the position of the (symmetrical) longitudinal axis of his body (head to tail). This axis is determined from the silhouette contour (located in places of strong gradient) and its optimal position is searched on both sides (±10 pixels) around the vertical center line of the image (since this axis is assumed to be not too far from it) by estimating the vertical line whose pixel coordinates are the most symmetric with respect to the subject's silhouette (body contour) in the median sense. Secondly, by seeking and recording the maximal color difference existing on either side (horizontally-oriented) of this preliminary estimated longitudinal axis. The ASI index is then the mean value of the ASI curve elements (see Algorithm "Estimation of the ASI" where the estimation procedure is outlined in pseudo-code).
Asymmetries can be detected visually, as shown by Figs. 6, 7, 8, 9 and 10, but also quantitatively with the ASI curves [in which the Y axis shows the biggest mirrored difference, located on a horizontal line, as a function of the vertical distance from the top Fig. 7 Asymmetry maps for subject #S02 for respectively (from left to right) the normal gait and with the left and right simulated LLD (cases A, B and C with L2 distance). For this case, the ASI is, respectively 21.2/27.8/28.5 head of the subject (X axis)] or with the ASI index mentioned above (see Algorithm in Fig. 11 and the ASI index obtained by the 17 subjects of our experiment in Fig. 12). For instance, in Fig. 6, the ASI curves of the L2 distance display a significant gap between case A and cases B-C in the leg areas as expected (identified with an arrow in Fig. 6). Similarly, the asymmetry of arm swing appears as a gap between case A and C curves. In terms of paired differences t tests and confidence values, we can notice (see Table 1) that the shifted Euclidean distance seems to be the most appropriate distance allowing us to discriminate, on average with the ASI index, an asymmetry difference with a confidence value around 98.5−99 %. In this case, the refining step does not allow to (statistically significantly) increase or decrease this confidence score but this step actually allows us to include a regularization term in our energy-based model [see Eq. (9)] and to estimate a denoised and smooth asymmetry map with edge-preserving properties and consequently to somewhat remove the slight annoying ringing artifact effects occurring for some estimated asymmetry maps (see Fig. 5; Table 2). Table 3 shows the average (µ) and standard deviation (σ) of the ASI for the three groups of subjects (for the shifted L2 distance) and paired difference t test and confidence value for the paired tests A ≠ B and A ≠ C, respectively (1) without refining, (2) with η = 0 (with refining but without prior) and (3) η = 0.025 (with refining and prior). The statistical difference for the paired t test were highly significant (see also "T test table for 16 degree of freedom" in "Appendix") for both left and right legs LLD groups (confidence value around 98.65 %). This demonstrates that this method can efficiently detect gait asymmetry. In practice, three subjects had a higher ASI for their normal gait than with LLD introduced with a sole (Fig. 10). By looking at their videos, the authors have noticed that those subjects already had a visible gait asymmetry (one arm swinging more than the other, tilted shoulders, etc).
We recall that we have used 300 frames in our application. For a sequence of 150 frames, the confidence value is 96.17 % and for a sequence of 600 frames, we obtained a confidence value of 99.52 but at the price of two times more computational load.

Discussion
The preceding experimental results have shown that asymmetries can be detected visually with the proposed asymmetry maps, whether in terms of color differences with respect to the middle of the (standing) vertical axis but also with respect to the difference of length or geometric anatomical shapes or movements (legs and arms) exhibited on either side of the body (along this vertical axis), see for instance the difference of (1) length between the legs (for the L2 distance) in Fig. 6 with left and right LLD or for the subjects shown in Figs. 8, 9 and 10 or (2) length between the two arms in Fig. 7 or (3) mean gait posture (slightly inclined with respect to a vertical axis), for the subject shown in Fig. 10 (LLD only).

Fig. 11 ASI curve and index estimation algorithm
In addition, this asymmetry can be quantified with the proposed ASI index. It is also worth mentioning that the asymmetry map along with the ASI curve allows us to know where are distributed the asymmetric motions along the subject's body. See for instance   Table 2 Correlation ρ before the refining step and after the refining step for η = 0 and η = 0.025 for the first five subjects without the simulated length leg discrepancy (LLD) the circled areas and the gaps identified with arrows in these figures. The ASI curves thus provide quantitative local assessment of asymmetry. This cue could be a good indicator of pathologies and their progression over time for a more appropriate medical prescription leading to a better recovery for the subjects. It is also important to understand that the VICON system is able to give very accurate but sparse (and generally not distributed equidistantly) measures (over the body) with which it is difficult to estimate a reliable dense asymmetry map without subsequent interpolation and extrapolation errors. By this fact, it makes a comparison between the Kinect and the VICON systems, in their ability to estimate an accurate gait asymmetric map, difficult to implement and to analyze. Although, it is also clear, that for a sufficient number of sensors distributed over the body, the VICON could be superior in terms of accuracy. Nevertheless, this late assertion does not detract from the originality of this work since the proposed estimation method of asymmetry map, based on the preservation of all the pairwise temporally shift invariant distances between depth signal as well as possible in a final 3D color space, with a MDS-based penalized likelihood strategy (and even the very concept of gait asymmetry map) has never been proposed to our knowledge, to date, and also remains inherently independent of the depth sensing technology used.
In our application, a paired sample t test is used to determine whether there is a statistically significant difference (increase) in the ASI index between the normal gait versus the left or right LLD (abnormal gait) groups and the p value (associated with this t test) actually quantifies the magnitude of this difference (i.e., a good confidence interval meaning that the difference is quite large). In our case, it just means that the increase in the ASI index, between the (A and B, C) groups, are statistically significant and then the asymmetry differences between these groups, in terms of ASI index, are real and are not due to standard error. Nevertheless, this does not mean that the ASI index can be used for separating normal from abnormal gait since, even if a majority of individuals have an ASI index above 30.00 for an abnormal gait (see Fig. 12 showing the scatter-plots of the ASI values for the different subjects with or without a LLD), there unfortunately are some subjects for which the normal gait remains more asymmetric (visually and in terms of ASI index) than some other subjects with a simulated LLD. In addition, as we have already mentioned in "Performance measures of the proposed model" section, that three subjects have a higher ASI for their normal gait than with LLD. More precisely, among the 17 subjects, three of them (#01, #09, #15) do not show a significant difference with or without LLD, two of them show a slight (but not significant) decrease in the ASI index with either the left or right simulated LLD (#07, #13) and one subject (#10), which has a visible gait asymmetry (one arm swinging more than the other along with tilted shoulders), has a higher ASI for his normal gait than with the right or left LLD introduced with a sole. Because of this fact, the ASI measure should not be used as an absolute measure for separating normal or abnormal gait, but rather as a relative measure, for example to analyze and quantify the gait recovery assessment through time or to check the adequacy of a prosthesis (or an adequate treatment) and to indicate, through an asymmetry map, where are located the strongest asymmetric areas of a subject's gait cycle. η remains the sole and major internal parameter of our model which acts as a regularization term and is fixed once and for all experimentation. Let us recall that the number of frames (N = 300) used in our MDS mapping should not be viewed as a critical internal parameter since doubling or halving the number of frames does not (significantly) change the efficiency of the FastMap mapping (see "Performance measures of the proposed model" section). Similarly, the two parameters of the spatial neighborhood (N s = 13 and N cnx = 11) used in our model are not sensitive parameters since the more connections we use, the better the convergence behavior of the algorithm is but at the cost of more computation time.

Conclusion
In this paper, we have presented a new gait analysis system, based on Kinect depth sensor, which estimates a perceptual color map providing a quick overview of asymmetry existing in the gait cycle of a subject and an index (ASI), that was proved statistically significant with an approximately 98.75 % confidence value. While being inexpensive, marker-less, non-invasive, easy to set up and suitable for small room and fast diagnostic, this new gait analysis system offers a readable and flexible tool for clinicians to analyze gait characteristics which can be easily exploited for disease progression, recovery cues from post-operative surgery or might be used for other pathologies where gait asymmetry might be a symptom.
As future work, it would be necessary to validate the proposed method on real patients with different types of gait impairments. Besides, it would also be interesting to explore what could possibly be the other benefits of an asymmetry map estimation and visualization, which are not considered in this work, over a set of spatio-temporal gait parameters in a gait analysis system. An interesting research perspective would be specifically to analyze the topology and the pattern differences of these asymmetric areas in order to see if they are characteristic of a specific kind of disease (bone, neurodegenerative, muscular, etc.) or to simply determine, from the perceptual maps, that the asymmetry allows

Local search algorithm
There are two ways to validate the efficiency of our local search refinement strategy.
First in terms of the decrease of the cost function to be optimized [see Eq. (9)] with η = 0. To this end, Fig. 13 shows us the evolution of the energy function for our local search procedure (see Algorithm in Fig. 14) compared to a refinement strategy based on a (much more demanding) stochastic local search using the Metropolis criterion (as proposed in [23] or in [28]) with a starting temperature T 0 = 1 (ensuring that, at the beginning of the stochastic search, approximately one-third of the sites change their luminance values between two complete image sweeps) and a final temperature T f = 10 −3 (ensuring that at the end of the stochastic search, very few sites change their channel color values). For this first experiment (and in all the following) experiments, we have considered a radius of exploration r = 7 and a graph in which each pixel is connected with its four nearest neighbors and N cnx = 11 equally spaced other pixels located  Fig. 13 Evolution of the energy function expressed in Eq. (9) (for η = 0, r = 7, N cnx = 11, N s = 13) given by our deterministic local search procedure (see Algorithm 1) compared to a stochastic local search using the metropolis criterion (as proposed in [23]) with T 0 = 1 and T f = 10 −3 (for the subject #05 without simulated length leg discrepancy) within a square neighborhood window of fixed size N s = 13 pixels (see Fig. 4). In addition, we have considered a maximal number of iterations l max = 200. We can notice (see Fig. 14) that our proposed deterministic refinement strategy allows us to noticeably improve the solution of our energy-based estimation model compared to the solution given by the initial FastMap algorithm and to obtain, in our application, similar minimization results given by a stochastic approach (without the need to tune or fit parameters such as initial or final temperatures). This might be due to the (nearly) convex shape of our cost function around the solution given by the FastMap algorithm.
Another way to evaluate the efficiency of the rough initial FastMap mapping and then our proposed mapping refinement strategy consists in computing the correlation metric [29] which is simply the correlation of the temporally shift-invariant Euclidean distance between each pairwise depth vectors in the high (N-)dimensional space (let X be this vector) and their corresponding (pairwise) Euclidean distances in the low (3D) dimensional (LAB color) space (let Y be this vector). The correlation ρ can be estimated by the following equation: where X t , |X|, X and σX, respectively represent the transpose, cardinality, mean, and standard deviation of X. This correlation factor (Pearson) will specifically quantify the degree of linear dependence between the variables X and Y and quantify how the Fast-Map technique is able to give a final 3D LAB mapping in which each pixel (LAB-)color value is set such that the between-color distances are preserved as well as possible [20]. A perfect correlation ρ = 1 indicates a perfect relationship between the initial set of between-depth signal distances and the final between-color distances in the final mapping (and a correlation of ρ = 0 indicates a total loss of information). • If (∆Energy < 0) Replace x s by y s l ← l + 1

Importing Kinect data
Each pixel of a depth image of the Kinect sensor is a 16-bit (little-endian) unsigned integer (uint16) (represented by four digits in hexadecimal base) which actually represents the depth in millimeter estimated at this location. To recover a binary image (of the sequence), three steps are thus required: the first one is to get to the starting byte of the image, the second one is to read and convert the pixels of the image, and the final one is to store each pixel value. To recover the i th image, the starting point is evaluated as follows: For each pixel, four bytes are thus read sequentially and are converted into a decimal base according to the following scheme: where → designates the conversion of the read data to the decimal base. At this point the pixel p i j is stored as a unit16 in a table. This table represents the i th image and once the i th image is fully recovered, it is saved in a 3D array at the i th position. Table 4 lists a few selected critical values (cutoffs) and the associated confidence values (in percentage) for a t distribution with 16 degrees of freedom (as in our case) for twosided critical regions.

T test table for 16 degree of freedom
In order to estimate the confidence values between two known values of cutoffs we have used, in our application, a Lagrange method based interpolation leading to the following polynomial interpolation: