### Measuring bone position

We propose to use an *intelligent* skin-mounted US sensor which contains ultrasound (US) transducers that can record images of the internal muscle tissue and bone surface while the patient performs a particular activity. Figure 1(a) illustrates this concept in 2D, with the red arrow indicating the position of the skin-mounted sensor relative to the underlying bone in the knee (the femur in this case). The position of the bone relative to the skin-mounted sensor is determined by registering the bone’s surface in an initial US frame with the bone’s surface in subsequent US frames. Once the position of the skin-mounted sensor and the position of the bone relative to the sensor are known, these two distance vectors can be added together to find the position of the bone in a global (i.e. lab) reference frame. As shown in Figure 1(b) for the 2D case, the movement of the bone relative to the sensor can be used to compensate for the movement of the sensor due to soft-tissue artifacts. Once the soft-tissue artifact has been removed, a much more accurate measurement of the position of the bone in a global reference frame can be obtained. These measurement steps are indicated in Figure 1(b). The global reference frame can be established using an optical tracker (e.g. Polaris optical tracker, Northern Digital Inc., Waterloo, ON, Canada), which tracks the sensor attached to the skin.
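In vector form, this compensation step is simply the addition of the two measured vectors: the sensor pose from the optical tracker and the bone offset from the US-US registration. A minimal sketch (NumPy; all numeric values and variable names are purely illustrative):

```python
import numpy as np

# Illustrative values only: in practice the sensor pose comes from the
# optical tracker and the bone offset from the US-US registration.
sensor_pos_lab = np.array([120.0, 35.0, 80.0])   # sensor position in lab frame (mm)
R_sensor_to_lab = np.eye(3)                      # sensor orientation in lab frame
bone_pos_sensor = np.array([0.0, -2.5, 41.0])    # bone position relative to sensor (mm)

# Global bone position = sensor position + sensor-to-bone vector (in lab frame)
bone_pos_lab = sensor_pos_lab + R_sensor_to_lab @ bone_pos_sensor
```

The rotation matrix is included because, in general, the sensor's orientation in the lab frame must be applied to the sensor-frame offset before the two vectors can be added.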

In our proposed framework, a novel *H*-shaped ultrasound array geometry is used to capture all six motion parameters for 3D rigid body movement estimation between bone and sensor. The three probes are arranged so as to capture three 2D US image *slices* in a novel *H*-shaped orientation. For US image acquisition, the three US arrays are mounted firmly to prevent any relative motion among the image slices and are calibrated to ensure they preserve the *H*-shaped orientation. In this study, US slices are acquired at a frame rate of 30 frames per second (fps) at 24 MHz with a resolution of 0.132 mm. Five of the six motion parameters required to describe the 3D rigid body motion can be extracted from any two orthogonal US slices, but the remaining out-of-plane rotation can only be measured by using two parallel US slices.

Three motion parameters (two translations and a rotation) can be obtained by registering a 2D image slice to the previous one. However, to capture all six degrees of freedom, three slices are required. By registering these slices simultaneously to the slices from an initial US scan, the dynamic movements of the underlying bone relative to the sensor can be measured. Consider the slices *AA’*, *BB’* and *CC’* in Figures 2b and 3b. The corresponding slices from a B-mode US scan are shown in Figure 3c. The slices *AA’* and *BB’* are parallel to each other while slice *CC’* is orthogonal to both, thus forming a shape similar to the letter *H*. Slice *CC’* provides the translations along the *y*- and *z*-axes as well as the rotation about the *x*-axis. The translation along the *x*-axis and the rotation about the *y*-axis are determined using either of the two parallel slices *AA’* or *BB’*, as they are rigidly coupled and produce almost the same result. For the out-of-plane rotation (about the *z*-axis), registering the two parallel slices (*AA’* and *BB’*) with their initial counterparts yields translations along opposite directions, as shown in Figures 2b and 2c. In the case shown in Figure 2b, the *H*-shaped slices are rotated by *θ*° about the *z*-axis, and *ac* and *a’c’* represent the opposite translations introduced by this rotation. As rotation due to the soft-tissue artifacts is small (≈4.4°, [20]), the lengths *ob*≈*oc* and *ob’*≈*oc’*. As a result, the translations found in this case are equal and opposite, and provide a clear indication of the out-of-plane rotation about the *z*-axis. The more general case is shown in Figure 2c. In this situation, *o’c* and *o’c’* are no longer equal; however, the triangles *Δo’ac* and *Δo’a’c’* are similar (by the *law of similar triangles*), so *o’a’*/*aa’*=(*ac*+*a’c’*)/*a’c’*, and the shift of the center of rotation from *o* to *o’* can be determined. In conclusion, if the two translations are equal, the axis of rotation is exactly the *z*-axis; otherwise it is parallel to the *z*-axis, shifted along the slice *CC’*. In either case, the out-of-plane rotation can be estimated from the two parallel slices (*AA’* and *BB’*). Hence the *H* shape is the optimal geometric orientation of the US sensor for measuring all six transform parameters using low-complexity 2D-2D (slice-by-slice) US-US image registration.
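Under the small-angle assumption above, the out-of-plane rotation and the axis offset can be recovered from the two parallel-slice shifts. A sketch of this step (the function name, the tuple of measured shifts and the known slice separation `d` are our own illustrative conventions, not the authors' implementation):

```python
import math

def out_of_plane_rotation(t_a, t_b, d):
    """Estimate the rotation about (an axis parallel to) the z-axis from
    the shifts of the two parallel slices AA' and BB'.

    t_a, t_b -- in-plane shifts (mm) of AA' and BB' relative to their
                initial counterparts (opposite signs when the rotation
                axis lies between the slices)
    d        -- known separation between the parallel slices (mm)

    Small-angle model: t_a ~ theta * r_a and t_b ~ -theta * (d - r_a),
    where r_a is the offset of the rotation axis from slice AA'.
    """
    theta = (t_a - t_b) / d                  # rotation angle (rad)
    r_a = t_a / theta if theta else d / 2.0  # axis offset from slice AA'
    return theta, r_a
```

Equal and opposite shifts give an axis exactly midway between the slices (the *z*-axis case); unequal shifts locate the shifted axis, matching the similar-triangles argument above.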

### The registration algorithm

Image registration is the process of spatially aligning one or more images to another image, generally called the reference image, by determining point-to-point correspondences between images of the same entity. It is a versatile technique used in computer vision, video surveillance, automatic target recognition, medical imaging, satellite imaging and some military applications [22]. The goal of image registration in our case is to find the geometric transformation between the target and reference images in order to determine the relative movement between the US sensor and the underlying bone.

In the registration algorithm, a gradient descent optimization approach proposed by Lucas and Kanade [23] is used with a new similarity metric called the *sum of conditional variance* (SCV) to efficiently determine the rigid body transform parameters required to align the bone surface between two corresponding images; this is the first article to use SCV for US-US image registration. SCV was originally developed by our group for multi-modal medical image registration [24], but it has recently been used for mono-modal registration where non-linear illumination change is an issue. In [25] the authors showed that its performance was superior to the mutual information (MI) and cross-cumulative residual entropy (CCRE) similarity measures. The motivation for using SCV in our case comes from its resistance to view-dependent non-linear intensity variations in US images, as well as its large convergence radius and computational ease compared to other popular similarity measures (e.g. the sum-of-squared difference (SSD) and the sum-of-absolute difference (SAD)).

The SCV can be calculated as follows: suppose the current US image of the bone is *I*(*x*_{i},*y*_{i}), to be registered with the initial US image *R*(*x*_{i}^{′},*y*_{i}^{′}). The coordinates (*x*_{i},*y*_{i}) and (*x*_{i}^{′},*y*_{i}^{′}) denote the locations of pixels in *I* and *R* respectively for *i*=1…*N*, where *N* is the total number of pixels in each image. To construct the joint histogram, the pixel values *I*_{i} and *R*_{i} were quantized from the original 256 possible values to 64 possible values. Hence, the joint histogram has 64×64 bins corresponding to the 64 possible quantized values in *I* and *R*.
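This quantization and joint-histogram step can be sketched as follows (NumPy; the function name and the 8-bit input assumption are ours, not from the original implementation):

```python
import numpy as np

def joint_histogram(I, R, L=64):
    """Joint histogram H(u, v) of two co-located 8-bit images, after
    quantizing the original 256 grey levels down to L = 64 values."""
    Iq = (I.astype(np.uint32) * L) // 256      # quantize to 0..L-1
    Rq = (R.astype(np.uint32) * L) // 256
    H = np.zeros((L, L), dtype=np.int64)
    np.add.at(H, (Iq.ravel(), Rq.ravel()), 1)  # count co-occurring pairs (u, v)
    return H
```

`np.add.at` performs unbuffered accumulation, so repeated (u, v) pairs are counted correctly.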

Any spatial misalignment between *I* and *R* will mean that each pixel value *R*_{i} will correspond to a range of *I*_{i} values. However, typically this range of values is approximately equally distributed above and below the value for the registered version of *I*. The reason for this approximately equal distribution is that any spatial misalignment will cause pixel values to change in opposite directions for regions of the image that exhibit opposite spatial gradients. Hence, if a spatial shift of the image causes a value of *I*_{i} to increase at positions with a positive spatial gradient, then the same shift will cause the same value of *I*_{i} to decrease in areas with a negative spatial gradient, and vice versa. For the approximately equal distribution to occur, it is assumed that there are approximately equal numbers of positive and negative spatial gradients in the images, which is typically true for the application addressed in this paper.

The joint histogram for images *I* and *R* will be denoted by *H*(*u*,*v*), where *u* and *v* take values 0,1,2,…,*L*-1, and *L* is the number of quantized values in *I* and *R*. The values of the joint histogram represent the frequency with which pixels at the same spatial location in the quantized images *I* and *R* have the values *u* and *v* respectively. This joint histogram can be considered to consist of multiple conditional histograms, one for each of the possible quantized values of *R*_{i}. The conditional expectation (conditional mean) of the distributions represented by these histograms can be estimated using the following formula:

E({I}_{i}|{R}_{i}=v)=\frac{\sum _{u=0}^{L-1}uH(u,v)}{\sum _{u=0}^{L-1}H(u,v)}

(1)

Now, given these conditional mean values, an estimate of *R*_{i} can be found which has pixel values that correspond linearly to the pixel values in *I*_{i}. This estimate is found by replacing the pixel values in *R* with the conditional mean for that value of *R* and is given by:

{\widehat{R}}_{i}=\begin{cases}E({I}_{i}|{R}_{i}=v) & \text{when }{R}_{i}=v\\ 0 & \text{otherwise}\end{cases}

(2)

for *v*=0,1,2,…,*L*-1.

The new image \widehat{R} now has pixel values which correspond linearly to those in *I*. However, the spatial misalignment between *I* and \widehat{R} is the same as that between *I* and *R*, so the transformation required to register *I* and *R* is the same as that required to register *I* and \widehat{R}. Since *I* and \widehat{R} are linearly related, the sum-of-squared difference between *I* and \widehat{R} can be minimized. This new similarity measure is given by:

S=\sum _{i=1}^{N}{\left({I}_{i}-{\widehat{R}}_{i}\right)}^{2}

(3)
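Equations (1)-(3) combine into a compact computation. A sketch in NumPy (the function name, the 8-bit input assumption and `L`=64 quantization follow the description above, not any released code):

```python
import numpy as np

def scv(I, R, L=64):
    """SCV between 8-bit images I and R, following Eqs. (1)-(3):
    quantize to L grey levels, build the joint histogram, replace each
    reference value v by E[I | R = v], then take the SSD."""
    Iq = (I.astype(np.uint32) * L) // 256       # quantize to 0..L-1
    Rq = (R.astype(np.uint32) * L) // 256
    H = np.zeros((L, L))
    np.add.at(H, (Iq.ravel(), Rq.ravel()), 1)   # joint histogram H(u, v)
    counts = H.sum(axis=0)                      # number of pixels with R = v
    with np.errstate(invalid="ignore", divide="ignore"):
        cond_mean = (np.arange(L, dtype=float) @ H) / counts  # Eq. (1)
    cond_mean = np.nan_to_num(cond_mean)        # empty bins -> 0 (Eq. 2)
    R_hat = cond_mean[Rq]                       # remapped reference
    return float(np.sum((Iq - R_hat) ** 2))     # Eq. (3)
```

When *I* and *R* are perfectly aligned, every conditional histogram collapses to a single value, so the conditional variance, and hence the SCV, is zero.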

Since this measure is the sum of the squared differences between the values of *I*_{i} and their conditional means, this equation actually represents a sum of the conditional variances (SCV) over all values of *R*_{i}. Conveniently, since the equation for *S* now has the same form as the equation for the sum-of-squared difference, any standard optimization technique (e.g. Gauss-Newton, Levenberg-Marquardt) can be used to find the geometric transform required to register \widehat{R} and *I*. In our experiments we minimized the SCV measure using the gradient descent optimization approach proposed by Lucas and Kanade [23].
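To make the optimization step concrete, the toy sketch below aligns two images by exhaustively searching integer translations for the one minimizing the SCV. This brute-force search is ours, for illustration only; the actual framework uses a Lucas-Kanade-style gradient descent over the full rigid transform [23].

```python
import numpy as np

def scv_metric(I, R, L=64):
    """Sum of conditional variance between 8-bit images I and R (Eqs. 1-3)."""
    Iq = (I.astype(np.uint32) * L) // 256
    Rq = (R.astype(np.uint32) * L) // 256
    H = np.zeros((L, L))
    np.add.at(H, (Iq.ravel(), Rq.ravel()), 1)
    counts = H.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        cond_mean = (np.arange(L, dtype=float) @ H) / counts
    cond_mean = np.nan_to_num(cond_mean)         # empty bins contribute 0
    return float(np.sum((Iq - cond_mean[Rq]) ** 2))

def register_translation(I, R, search=4):
    """Toy 2D-2D alignment: exhaustive integer-shift search minimizing SCV.
    Only meant to show how the metric drives the alignment."""
    best_s, best_shift = np.inf, (0, 0)
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            shifted = np.roll(np.roll(R, dx, axis=0), dy, axis=1)
            s = scv_metric(I, shifted)
            if s < best_s:
                best_s, best_shift = s, (dx, dy)
    return best_shift
```

Because the SCV has the SSD form, swapping this search for a gradient-based optimizer only requires differentiating the remapped reference with respect to the transform parameters.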

### Experimental procedure

Our kinematic analysis technique relies on the fact that the knee joint provides US scans with enough anatomical detail. The anatomical landmarks on the knee bones captured in the US scans are crucial for accurate motion parameter estimation using image registration: rotations and translations in the US scans must correspond to similar displacements of the natural landmarks of the knee bones. Fortunately, knee joints reveal enough anatomical detail to be used as natural landmarks for image-registration-based kinematic analysis. During full extension of the knee joint, the posterior parts of both the distal femur and the proximal tibia can be captured in US scans. At 90° flexion, the anterior surface of the distal femur, the trochlear groove, almost all of the inferior surface of the femoral condyles, the anterior superior surface of the tibia, and the anterior surface of the tibia can be easily imaged using US. Moreover, the entire medial and lateral sides of the femur and tibia are visible at all flexion angles. It is therefore feasible to scan the knee joints *in-vivo* at strategic anatomical positions for the purposes of kinematic analysis. Some *in-vivo* and *in-vitro* US images are shown in Figure 4. From these images, it is apparent that the knee bones have enough detail to be used for non-invasive kinematic analysis.

In this validation framework for determining the movement of the sensor relative to the bones in the knee, three Interson USB ultrasound probes (Interson Corporation, Pleasanton, CA, USA) were attached to the calibration apparatus, as depicted in Figures 3a and 5a for the two simulated environments respectively. The experimental apparatus has three rotation and three translation stages to allow the probes to be accurately positioned. The rotation stages have an angular precision of 1/60° and the translation stages have a precision of 10 *μ*m.

In the first validation experiment, artificial models of a femur and tibia were placed in a water-filled container as shown in Figure 3a. The use of a water tank is common practice for validating ultrasound-related biomedical experiments [26–28]. The water in the tank acted as a coupling medium and simulated the behaviour of muscle tissue. The frequency of the ultrasound signals used to capture the images was 24 MHz and the resolution of the images captured using these probes was 0.132 mm.

The probes were then translated by ±4 mm in steps of 0.5 mm in the *x*, *y* and *z* directions away from the initial positions and rotated by ±3° in steps of 1° around the *x*, *y* and *z* axes. At each position, the US probes were used to capture 2D B-mode scans of the surface of the model bone. These images were then registered to US images captured at the initial position of the probes. The amount of translation or rotation required to register the images was taken as an estimate of the movement of the probe relative to the surface of the bone. The in-plane translations (along the *x*, *y* and *z* axes) and the two in-plane rotations (around the *x* and *y* axes) were determined from the two orthogonal US scans captured at the positions of *AA’* and *CC’*, while the remaining out-of-plane rotation (about the *z* axis) was determined using the parallel image slices at *AA’* and *BB’*, according to Figure 2. The estimates of the in-plane translations and rotations came directly from the output of the 2D-2D image registration. From the known center of rotation, the angular displacement of the two parallel probes was calculated and averaged to produce the estimate of the rotation around the *z* axis.
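Combining the three per-slice registration outputs into the six rigid-body parameters, per the slice roles described above, can be sketched as follows. The tuple layout of each registration result and the slice separation `d` are hypothetical conventions for illustration, not the authors' exact data format:

```python
import math

def fuse_parameters(reg_aa, reg_bb, reg_cc, d):
    """Fuse three 2D-2D registration outputs into six rigid-body parameters.

    reg_aa, reg_bb, reg_cc -- assumed tuples (shift_1, shift_2, in_plane_angle)
                              for slices AA', BB' and CC' respectively
    d                      -- assumed separation of the parallel slices (mm)
    """
    tx = (reg_aa[0] + reg_bb[0]) / 2.0            # x translation (averaged)
    rot_y = (reg_aa[2] + reg_bb[2]) / 2.0         # rotation about y (averaged)
    ty, tz, rot_x = reg_cc                        # from the orthogonal slice CC'
    rot_z = math.atan2(reg_aa[1] - reg_bb[1], d)  # opposite shifts -> z rotation
    return tx, ty, tz, rot_x, rot_y, rot_z
```

Averaging the two rigidly coupled parallel slices reduces noise in the shared parameters, while their differential shift isolates the out-of-plane rotation.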

In the second validation experiment, we deliberately hid the internal bone structure by using an opaque balloon filled with a low-concentration (10%, w/w at 20°C) collagen-water solution to mimic the properties of muscle tissue. This setup is shown in Figure 5: Figure 5a shows the experimental setup, while Figure 5b shows the balloon phantom which covers the bone and is filled with the collagen-water solution. A typical scan from each of the three sensor arrays is shown in Figure 5c. This experimental setup is justified by the fact that, in a real scenario, we are not able to see which part of the femur and tibia is currently being scanned. As a result, no experimental bias is introduced by placing the US probes at the best possible location for the subsequent registration process.

A low-concentration collagen-water solution has been shown to closely resemble the acoustic properties of real muscle tissue [29, 30]: the speed of sound in this medium is approximately 1540 m/s, and collagen, a fibrous protein, is the major constituent of mammalian flesh and connective tissue [31–33]. To ensure successful transmission and reception of ultrasonic pulses through the balloon, ultrasound transmission gel was placed at the transducer-balloon interface. The *H*-shaped US transducer assembly was then moved carefully over the balloon surface to capture translated and rotated images of the bone surface, in a similar manner to the first water-tank experiment. These images were then processed to extract the six 3D rigid body transform parameters. The experiment described in the following section, which was carried out on a human subject using an accelerometer, was approved by the Human Research Ethics Advisory Panel (HREA) of UNSW Canberra (approval number A-14-31).