Subjects
A group of 18 stroke patients (8 females and 10 males; aged 50 ± 16 years) and 12 healthy subjects (4 females and 8 males; aged 46 ± 15 years) participated in this research. The tests were conducted at the Red Crescent Society Rehabilitation Center (Tehran, Iran). Eleven of the patients were in the subacute phase (less than 6 months after their unilateral cerebrovascular accident, or CVA), and seven were in the chronic phase (more than 6 months after their unilateral CVA). The inclusion criteria were a single unilateral CVA and the ability to move the impaired shoulder and elbow through more than 15 degrees. Subjects with severe visual impairment, apraxia, or neglect syndrome were excluded from the tests. All subjects were informed about the study and participated voluntarily. The consent procedure was approved by the local scientific and ethics committees.
Data capture program
The subjects’ hand movements were measured with a Microsoft Kinect for Xbox 360 using the Kinect skeleton tracking driver, version 1.7 [7]. A program was developed for this purpose in C# with Microsoft XNA Game Studio [30]. During the test, subjects sat or stood in front of a video screen running the graphical interface program. The Kinect was placed about 2.7 m from each subject so that it could capture the subject’s full body. Because the seated tracking mode of the skeleton tracking program was used and only upper body joints were tracked, there was no difference between tracking subjects in standing and sitting positions.
In the program, subjects were instructed to move their hands to intercept and catch a series of approaching balls. Balls were sent toward the subject following the predefined pattern shown in Fig. 1. All of the targets lie on a plane parallel to the frontal plane, but reaching them requires movements in three-dimensional space. The program also gave patients audio feedback based on their performance. During the tests, the positions of all upper body joints (Hand, Wrist, Elbow, Shoulder, Shoulder Center, Head and Waist) were recorded with the Kinect skeleton tracking driver for further analysis. Fig. 2 shows one of the patients during a test session together with the interface of the data capture program, which guided the patient through the training exercises. In this figure, a red ball is approaching the patient, indicating that the patient should intercept it with his red (right) hand.
The protocol of reliability assessment
Both intrasession and intersession variability were assessed in this study. The former concerns repeated measurements within the same session, and the latter concerns the day-to-day reliability of the system.
Each test session was divided into four 5-min subsessions separated by 2-min rest intervals. During each subsession, subjects were told to catch as many balls as they could. The patients were tested twice a week and were asked to continue this routine for at least 4 weeks, so each patient completed a minimum of 8 sessions, or 32 training subsessions. To reduce the effects of any misunderstanding of the instructions and of adaptation to the virtual environment, the first test session was treated as an orientation session. The intrasession reliability analysis was therefore applied to performance indices from the second session.
The intersession reliability analysis was applied to the second subsession of each patient’s last two sessions. This choice was intended to reduce the systematic error associated with the patients’ progress, because the rate of improvement leveled off over time. During this study, all patients also attended their conventional physical therapy sessions two or three times a week, and these tests took place in parallel with their usual rehabilitation program.
Performance indices measurement
The positions of upper body joint centers were recorded by Kinect at a sampling frequency of about 30 Hz during the tests. The sampling frequency of Kinect fluctuated between 25.61 and 34.72 Hz with a mean frequency of 29.9 Hz and a standard deviation of 2.67 Hz.
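Because the frame rate was not constant, the reported range, mean and standard deviation can be derived from the frame timestamps. The sketch below is a minimal example. It assumes the timestamps are available in seconds as a NumPy-compatible array (an assumed input format) and treats the instantaneous rate as the inverse of each inter-frame interval. The function name `sampling_stats` and the use of the sample standard deviation (`ddof=1`) are illustrative choices, not details taken from the study.

```python
import numpy as np


def sampling_stats(timestamps_s):
    """Summarize the instantaneous sampling rate of one recording.

    timestamps_s: 1-D sequence of frame timestamps in seconds (assumed input).
    Returns the minimum, maximum, mean and standard deviation of the rate in Hz.
    """
    t = np.asarray(timestamps_s, dtype=float)
    dt = np.diff(t)            # inter-frame intervals in seconds
    rate = 1.0 / dt            # instantaneous sampling frequency in Hz
    return {
        "min_hz": float(rate.min()),
        "max_hz": float(rate.max()),
        "mean_hz": float(rate.mean()),
        "std_hz": float(rate.std(ddof=1)),   # sample standard deviation
    }
```

The irregular spacing is also why the derivative computation that follows should use the actual timestamps rather than a fixed 1/30 s step.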
To extract the performance indices, hand velocity, acceleration and jerk had to be calculated, with Kinect’s reference frame used as the main coordinate system. Because numerical differentiation amplifies noise, the data had to be smoothed before any differentiation. For this purpose, a B-spline (a piecewise polynomial function of order k [30]) was fitted to the position data. The B-spline used in this study was of order 6, which yields a smooth function that is differentiable up to 5 times.
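The smoothing-then-differentiation step can be sketched as follows. This is a minimal example that assumes SciPy’s `UnivariateSpline` is an acceptable stand-in for the authors’ B-spline fit. Its `k=5` setting gives degree-5 polynomial pieces, which corresponds to a B-spline of order 6 as described above. The smoothing factor `s`, derived from an assumed 5 mm position noise, and the function name `smooth_derivatives` are hypothetical choices, not values from the study. Fitting against the actual timestamps accommodates the fluctuating frame rate.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def smooth_derivatives(t, xyz, noise_std_m=0.005):
    """Fit a quintic (order-6) smoothing spline per axis and differentiate it.

    t           : (n,) strictly increasing timestamps in seconds
    xyz         : (n, 3) hand positions in the Kinect frame, in metres
    noise_std_m : assumed per-sample noise, used only to set s = n * noise**2
    Returns velocity, acceleration and jerk, each of shape (n, 3).
    """
    t = np.asarray(t, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    s = len(t) * noise_std_m ** 2
    vel = np.empty_like(xyz)
    acc = np.empty_like(xyz)
    jerk = np.empty_like(xyz)
    for axis in range(3):
        # k=5: degree-5 pieces, i.e. an order-6 B-spline
        spl = UnivariateSpline(t, xyz[:, axis], k=5, s=s)
        vel[:, axis] = spl(t, nu=1)    # first derivative
        acc[:, axis] = spl(t, nu=2)    # second derivative
        jerk[:, axis] = spl(t, nu=3)   # third derivative
    return vel, acc, jerk
```

The same fitted splines could also be evaluated on a uniform time grid when a later step, such as a Fourier transform, needs evenly spaced samples.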
Subsequently, the following movement performance indices were extracted for each hand:

1.
Mean velocity (MV): the mean value of the hand velocity [15], defined as
$$MV = \frac{{\mathop \sum \nolimits_{i = 1}^{N} V_{i} }}{N}$$
(1)
where \(V_{i}\) is the hand velocity at the ith sample of data, and N is the number of data samples.

2.
Normalized mean speed (NMS): it is the mean value of the hand velocity divided by its maximum value [15],
$$NMS = \frac{MV}{{V_{max} }}$$
(2)

3.
Normalized speed peaks (NSP): speed peaks are the points where the acceleration trajectory crosses the x-axis. NSP is defined as the number of speed peaks divided by the number of data samples [15],
$$NSP = \frac{\text{Number of speed peaks}}{N}$$
(3)

4.
Logarithm of dimensionless jerk (LJ): the logarithm of the median of the hand’s dimensionless jerk [19],
$$LJ = \text{Log}\left(\text{median}\left(\frac{\left(\int_{t_{1}}^{t_{2}} \left(\dddot{X}^{2} + \dddot{Y}^{2} + \dddot{Z}^{2}\right) dt\right)\left(t_{2} - t_{1}\right)}{V_{mean}^{2}}\right)\right)$$
(4)
where \(t_{1}\) and \(t_{2}\) are the start and end times of the movement, X, Y and Z are the hand positions measured by Kinect, the triple dots denote third time derivatives, and \(V_{mean}\) is the mean velocity during the movement.

5.
Curvature (C): the logarithm of the median of the hand’s path curvature [17],
$$C = \text{Log}\left(\text{median}\left(\sqrt{\frac{\left(\dot{X}^{2} + \dot{Y}^{2} + \dot{Z}^{2}\right)\left(\ddot{X}^{2} + \ddot{Y}^{2} + \ddot{Z}^{2}\right) - \left(\dot{X}\ddot{X} + \dot{Y}\ddot{Y} + \dot{Z}\ddot{Z}\right)^{2}}{\left(\dot{X}^{2} + \dot{Y}^{2} + \dot{Z}^{2}\right)^{3}}}\right)\right)$$
(5)
where X, Y and Z are positions of the hand measured by Kinect.

6.
Spectral arc length (SAL): the negative arc length of the frequency-normalized Fourier magnitude spectrum of the speed profile [18],
$$SAL = -\int_{0}^{\omega_{c}} \sqrt{\left(\frac{1}{\omega_{c}}\right)^{2} + \left(\frac{d\hat{V}(\omega)}{d\omega}\right)^{2}}\, d\omega, \quad \hat{V}(\omega) = \frac{V(\omega)}{V(0)}$$
(6)
where \(V(\omega)\) is the Fourier magnitude spectrum of the speed profile \(V(t)\), and \(\omega_{c}\) is the upper limit of the frequency band occupied by the movement (\(\omega_{c} = 40\pi\ \text{rad/s}\)).

7.
Shoulder angle with body (SA): the mean value of the angle between the arm and the body.

8.
Elbow angle (EA): the mean value of the elbow angle.
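The speed- and derivative-based indices in Eqs. (1) to (6) can be sketched in NumPy as below. The inputs are the spline-derived velocity, acceleration and jerk of one reaching movement. The text does not fix several details, so the following are assumptions:

* the peak-detection rule used for NSP;
* reading "Log" as the natural logarithm;
* taking the median in Eq. (4) over the per-movement values supplied;
* the low-speed cutoff used for curvature;
* the zero-padding and Nyquist clipping used for SAL.

Eq. (4) is implemented exactly as printed. All function names are hypothetical.

```python
import numpy as np


def _trapezoid(y, t):
    """Trapezoidal integral of samples y over (possibly irregular) times t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def mean_velocity(speed):
    """Eq. (1): mean of the hand speed samples."""
    return float(np.mean(speed))


def normalized_mean_speed(speed):
    """Eq. (2): mean speed divided by peak speed."""
    return float(np.mean(speed) / np.max(speed))


def normalized_speed_peaks(speed):
    """Eq. (3): number of speed peaks divided by the number of samples.

    Assumed rule: a peak is counted where the discrete derivative of speed
    (a proxy for tangential acceleration) goes from positive to non-positive.
    """
    d = np.diff(speed)
    n_peaks = int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))
    return n_peaks / len(speed)


def dimensionless_jerk(t, jerk, speed):
    """Term inside the median of Eq. (4) for one movement, as printed."""
    jerk_sq = np.sum(jerk ** 2, axis=1)   # squared jerk magnitude per sample
    duration = t[-1] - t[0]               # t2 - t1
    return _trapezoid(jerk_sq, t) * duration / np.mean(speed) ** 2


def log_dimensionless_jerk(dj_values):
    """Eq. (4): natural log (assumed base) of the median of the values given."""
    return float(np.log(np.median(dj_values)))


def log_curvature(vel, acc, min_speed=1e-3):
    """Eq. (5): natural log of the median pointwise path curvature.

    Samples slower than min_speed (m/s, an assumed threshold) are skipped,
    because curvature is undefined when the hand is at rest.
    """
    v2 = np.sum(vel ** 2, axis=1)
    a2 = np.sum(acc ** 2, axis=1)
    va = np.sum(vel * acc, axis=1)
    keep = v2 > min_speed ** 2
    num = np.clip(v2[keep] * a2[keep] - va[keep] ** 2, 0.0, None)
    kappa = np.sqrt(num / v2[keep] ** 3)
    return float(np.log(np.median(kappa)))


def spectral_arc_length(speed, fs, fc_hz=20.0, pad_level=4):
    """Eq. (6): negative arc length of the normalized magnitude spectrum.

    speed must be uniformly sampled at fs Hz. fc_hz = 20 Hz corresponds to
    omega_c = 40*pi rad/s; it is clipped at Nyquist (an assumption).
    """
    v = np.asarray(speed, dtype=float)
    nfft = int(2 ** (np.ceil(np.log2(len(v))) + pad_level))  # zero padding
    mag = np.abs(np.fft.rfft(v, nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    mag = mag / mag[0]                    # V_hat(w) = V(w) / V(0)
    fc = min(fc_hz, fs / 2.0)
    sel = freqs <= fc
    df = np.diff(freqs[sel]) / fc         # frequency axis scaled by 1/omega_c
    dm = np.diff(mag[sel])
    return float(-np.sum(np.sqrt(df ** 2 + dm ** 2)))
```

In use, the speed profile would be `np.linalg.norm(vel, axis=1)`. For SAL, the spline could be re-evaluated on a uniform grid (for example at 100 Hz), which places the 20 Hz cutoff below the Nyquist frequency instead of clipping it.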
Both patients and healthy subjects performed the exercises following the program’s guidance. All performance indices were computed for every reaching movement and then averaged over each subsession, giving one value per index for each training subsession.
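The sketch below covers the two joint-angle indices and the per-subsession averaging just described. The joint definitions are assumptions:

* EA is taken as the angle at the elbow between the upper arm and the forearm, so 180 degrees means a straight arm.
* SA is taken as the angle between the upper arm (shoulder to elbow) and a trunk axis running from the Shoulder Center joint to the Waist joint, so 0 degrees means the arm hangs alongside the body.

Inputs are (n, 3) arrays of Kinect joint positions. The function names are hypothetical.

```python
import numpy as np


def _angle_deg(u, v):
    """Per-sample angle in degrees between corresponding rows of u and v."""
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def elbow_angle(shoulder, elbow, wrist):
    """EA: mean angle between upper arm and forearm over one movement."""
    return float(np.mean(_angle_deg(shoulder - elbow, wrist - elbow)))


def shoulder_angle(shoulder_center, waist, shoulder, elbow):
    """SA: mean angle between the upper arm and a downward trunk axis."""
    trunk = waist - shoulder_center
    upper_arm = elbow - shoulder
    return float(np.mean(_angle_deg(upper_arm, trunk)))


def subsession_average(per_movement):
    """Average each index over all reaching movements of one subsession.

    per_movement: list of dicts mapping index name -> value for one movement.
    """
    names = per_movement[0].keys()
    return {name: float(np.mean([m[name] for m in per_movement])) for name in names}
```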
Statistical analysis
This study followed the statistical methods of Colombo et al. [31], who assessed the reliability of performance indices measured by robotic systems.
To obtain an overview of the variability of the measured performance indices, a scatter plot was drawn for each index measured over two subsessions of a single session, and repeated-measures ANOVA (analysis of variance) was used to calculate reliability [32, 33].
Tests such as those carried out in this study naturally involve a learning component. Learning-related error is therefore expected and does not indicate any flaw in the tests. Nevertheless, to reduce the effect of this error on intrasession reliability, the first subsession of each session was excluded to let the patients adapt to the test conditions, and the last 3 subsessions were analyzed. For intersession reliability, the last two sessions were analyzed to reduce the influence of learning.
To model the system, consider N subjects, each with M repeated measurements of a continuous variable P. Each measurement of P was modeled as:
$$P_{ij} = T + t_{i} + S_{ij} + R_{ij}$$
(7)
where \(P_{ij}\) is the \(j\)th measurement (\(j = 1, \ldots, M\)) made on the \(i\)th subject (\(i = 1, \ldots, N\)), \(T\) is the true value of the variable, \(t_{i}\) is the effect of subject i on the true value, \(S_{ij}\) is the systematic error and \(R_{ij}\) is the random error. \(t_{i}\), \(S_{ij}\) and \(R_{ij}\) are independent, normally distributed random variables with means of 0 and variances of \(\sigma_{t}^{2}\), \(\sigma_{S}^{2}\) and \(\sigma_{R}^{2}\), respectively [31, 34, 35]. The reliability of parameter P can be calculated using the intraclass correlation coefficient (ICC), defined as [35]:
$${\text{Reliability = }}\frac{\text{Between subjects variability}}{{{\text{Between subjects variability}}\,{ + }\,{\text{error}}}}$$
(8)
$$R_{u} = \frac{{\sigma_{t}^{2} }}{{\sigma_{t}^{2} + \sigma_{S}^{2} + \sigma_{R}^{2} }}$$
(9)
Since systematic error was not considered in this study, Eq. (9) reduces to Eq. (10):
$$R = \frac{{\sigma_{t}^{2} }}{{\sigma_{t}^{2} + \sigma_{R}^{2} }}$$
(10)
Both intersession and intrasession reliabilities were calculated as positive values between 0 and 1. A repeated-measures ANOVA model was used to determine this parameter. In this method, the variance components \(\sigma_{t}^{2}\) and \(\sigma_{R}^{2}\) were estimated as:
$$\sigma_{t}^{2} = \frac{MS_{S} - MS_{E}}{k}$$
(11)
$$\sigma_{R}^{2} = MS_{E}$$
(12)
where \(MS_{S}\) is the between-subjects mean square, calculated from the differences among subjects in the measurements of each trial, \(MS_{E}\) is the error mean square, calculated from the differences between each subject’s trials, and k is the number of trials (k = 2 in this study).
Substituting Eqs. (11) and (12) into Eq. (10) yields McGraw and Wong’s [33] two-way fixed model equation (C, 1):
$$R = \frac{\text{MS}_{S} - \text{MS}_{E}}{\text{MS}_{S} + (k - 1)\,\text{MS}_{E}}$$
(13)
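The algebra behind Eq. (13) can be made explicit. Substitute the variance estimates \(\sigma_{t}^{2} = (\text{MS}_{S} - \text{MS}_{E})/k\) and \(\sigma_{R}^{2} = \text{MS}_{E}\) into Eq. (10), then multiply the numerator and denominator by k:
$$R = \frac{\dfrac{\text{MS}_{S} - \text{MS}_{E}}{k}}{\dfrac{\text{MS}_{S} - \text{MS}_{E}}{k} + \text{MS}_{E}} = \frac{\text{MS}_{S} - \text{MS}_{E}}{\text{MS}_{S} - \text{MS}_{E} + k\,\text{MS}_{E}} = \frac{\text{MS}_{S} - \text{MS}_{E}}{\text{MS}_{S} + (k - 1)\,\text{MS}_{E}}$$
As a purely illustrative example, the hypothetical values k = 2, MS_S = 10.5 and MS_E = 0.5 give R = 10/11 ≈ 0.91.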
Moreover, the standard error of measurement (SEM) was calculated. SEM provides an absolute index of reliability and allows the precision of each measurement to be quantified. SEM was defined as the square root of the error mean square from the ANOVA. SEM has the same units as the corresponding index and encompasses the random and systematic error components of the measurement.
$${\text{SEM}} = \sqrt {{\text{MS}}_{E} }$$
(14)
Furthermore, the coefficient of variation (CV) of the SEM was calculated so that results could be compared across indices with different units and scales [36]. The CV of SEM was defined as the ratio of the SEM to the overall mean of each index, expressed as a percentage. Another parameter calculated in this study was the minimal detectable difference (MDD) [33], which is the minimum difference required to declare a significant change in an index. Equation (15) shows how MDD was calculated [33]. This parameter enables the examiner to determine whether a real change has occurred in a movement quality index.
$${\text{MDD}} = {\text{SEM}} \times 1.96 \times \sqrt 2$$
(15)
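The reliability statistics in Eqs. (13) to (15) can all be obtained from a single two-way ANOVA decomposition of a subjects-by-trials table. The NumPy sketch below is minimal and assumes one index is laid out as an N x k array, with rows as subjects, columns as the k repeated trials, and no missing values. The function name and dictionary keys are hypothetical. The error term is the residual left after removing both the subject and trial means, which is the consistency form ICC(C,1).

```python
import numpy as np


def reliability_stats(table):
    """ICC(C,1), SEM, CV of SEM (%) and MDD for one performance index.

    table: N x k array-like, rows = subjects, columns = repeated trials.
    """
    x = np.asarray(table, dtype=float)
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)
    trial_means = x.mean(axis=0)

    # Between-subjects mean square (MS_S)
    ms_s = k * np.sum((subj_means - grand) ** 2) / (n - 1)

    # Residual after removing subject and trial effects -> error mean square (MS_E)
    resid = x - subj_means[:, None] - trial_means[None, :] + grand
    ms_e = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    icc = (ms_s - ms_e) / (ms_s + (k - 1) * ms_e)   # Eq. (13)
    sem = np.sqrt(ms_e)                              # Eq. (14)
    cv_percent = 100.0 * sem / grand                 # SEM relative to overall mean
    mdd = sem * 1.96 * np.sqrt(2.0)                  # Eq. (15)
    return {"ms_s": float(ms_s), "ms_e": float(ms_e), "icc_c1": float(icc),
            "sem": float(sem), "cv_percent": float(cv_percent), "mdd": float(mdd)}
```

The hypothetical table `[[1, 2], [3, 3], [5, 7]]` gives the following values:

* MS_S = 10.5 and MS_E = 0.5
* ICC(C,1) = 10/11 ≈ 0.91
* SEM ≈ 0.71
* CV ≈ 20.2%
* MDD = 1.96

Computing the residuals explicitly, rather than subtracting sums of squares, keeps MS_E non-negative in floating point.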