In this section, we describe the details of the proposed deformable surface model based statistical shape models construction method. It adopts a greedy algorithm which allows simplex meshes to converge on the object of interest under the constraints of both internal energy and VFC external energy. The whole procedure is illustrated in Fig. 2.
Automatic initialization of deformable surface models
In order to get accurate and robust results, we develop an automatic shape initialization method to derive an initial shape that has high overlap with the object of interest, such that the deformable models can then evolve more locally. Given the training samples with ground truth segmentation, we firstly obtain their simplex mesh representations, which are defined as the dual of the triangular mesh derived via Marching Cubes algorithm [26] and mesh smoothing methods [27]. To minimize bias towards the chosen initial shape and make the deformable simplex meshes more robust to local minima, we then choose the training sample similar to an average femur shape as the template mesh for the deformable models.
In order to obtain a good initial shape for the deformable models, we employ the Gaussian mixture model (GMM) based point set registration method [28] by aligning the template mesh with other training meshes via an affine transformation. In GMM based registration method, the main idea is to represent the point sets to be registered as Gaussian mixture models, and then align the two corresponding Gaussian mixtures by minimizing their \(L_2\) distance. Specifically, we represent the point sets of the template mesh as a Gaussian mixture \(f(\mathbf {x}) = \sum _{ i=1 }^{m} {\alpha _{i} \phi (\mathbf {x}  \mu _{i},\Sigma _{i})}\), where \(\alpha _{i}\) is the weight of each component, \(\phi (\mathbf {x}  \mu _{i},\Sigma _{i})\) is a Gaussian distribution with mean \(\mu _{i}\) and covariance variance \(\Sigma _{i}\). Similarly, the point sets of other training meshes are represented by \(g(\mathbf {x}) = \sum _{ j=1 }^{n} {\beta _{j} \phi (\mathbf {x}  \nu _{j},\Gamma _{i})}\). An affine transformation can be represented by a \(3 \times 3\) matrix \(\mathbf {A}\), and a translation vector \(\mathbf {t}\). For convenience, the matrix \(\mathbf {A}\) is factorized as an orthogonal matrix \(\mathbf {Q}\) and a symmetric positive definite matrix \(\mathbf {S}\), i.e., \(\mathbf {A}=\mathbf {Q}\mathbf {S}\) [28]. Then the affine transformation (i.e., \(\mathbf {A}\) and \(\mathbf {t}\)) can be found by minimizing the following \(L_2\) distance between Gaussian mixtures \(f(\mathbf {x})\) and \(g(\mathbf {x})\) [28]:
$$d_{L_2}(f,g,\mathbf {A},\mathbf {t}) = \int {(gf_{\mathbf {A},\mathbf {t}})^2dx} = \int {(f_{\mathbf {A},\mathbf {t}}^22f_{\mathbf {A},\mathbf {t}}g+g^2)dx},$$
(6)
where \(f_{\mathbf {A},\mathbf {t}}(\mathbf {x})= \sum _{ i=1 }^{m} {\alpha _{i} \phi (\mathbf {x}  \mathbf {A}\mu _{i}+\mathbf {t},\mathbf {Q}(\Sigma _{i})\mathbf {Q}^T)}\) is the transformed distribution of the template mesh. Since \(\int {g^2}\) is independent of the affine transformation, we only need to consider \(\int {f_{\mathbf {A},\mathbf {t}}^2}\) and \(\int {f_{\mathbf {A},\mathbf {t}}g}\). Fortunately, both terms has closedform expressions, for the latter term:
$$\int {f_{\mathbf {A},\mathbf {t}}g \, dx} = \sum _{ i=1 }^{m} {\sum _{ j=1 }^{n} {\alpha _{i}\beta _{j} \phi (\mathbf {x}  \mu _{i}\nu _{j},\mathbf {Q}(\Sigma _{i})\mathbf {Q}^T+\Gamma _{i})}}.$$
(7)
Considering that the gradients associated with affine transformation have analytical solutions [28], fast gradientbased numerical optimization techniques (e.g., QuasiNewton methods) can be deployed to minimize the objective function.
After deriving the affine transformation (i.e., \(\mathbf {A}\) and \(\mathbf {t}\)), we used it to transform the template mesh to the space of other training meshes, resulting in the initial shape for the deformable models.
Evolution method
We adopt the greedy algorithm [16] as evolution method to guide deformable 2simplex meshes to the object of interest (e.g., edges) [21]. Let \(V_i\) be the voxel in the volumetric image containing vertex \(P_i\). During each iteration, a \(w\times w\times w\) cubic window around \(V_i\) is searched, and the energy is computed at each voxel within the window (see Fig. 3a). The energy at vertex \(P_i\) is defined as a combination of both internal and external energy normalized within the window:
$$\begin{aligned} {\varvec{E}(P_i) }&= {\alpha \varvec{E}_{int}(P_i) + \beta \varvec{E}_{ext}(P_i)} \\&= {\alpha \left( \varvec{E}_{Tangent}(P_i) + \varvec{E}_{Normal}(P_i) \right) + \beta \varvec{E}_{VFC}(P_i)}, \end{aligned}$$
(8)
where \(\varvec{E}_{Tangent}\) and \(\varvec{E}_{Normal}\) are internal energy, \(\varvec{E}_{VFC}\) is the VFC external energy, and \(\alpha > 0\), \(\beta > 0\) are weighting parameters that control the relative influence of internal and external energy. The internal energy regularizes the rigidity and elasticity of the deformable models, while VFC external energy attracts the deformable models to the object of interest (e.g., edges).
We define \({N_i}^{t}\) as the \(w\times w\times w\) cubic window around voxel \(V_i\). The position \({Q_i}^{t}\) with minimum energy within the window \({N_i}^{t}\) is chosen as:
$${Q_i}^{t}={\mathop{\text{arg min}}\limits_{{P_j}\in {N_i}^{t}}} \,\varvec{E}(P_j).$$
(9)
In order to guarantee a stable and smooth deformation of the 2simplex meshes, the vertex \(P_i\) is moved only along its normal direction \(\mathbf {n}_i\), rather than directly moved to \({Q_i}^{t}\) as in the classical greedy algorithm (see Fig. 3b):
$${P_i}^{t+1} = {P_i}^{t}+ \left( ({Q_i}^{t}{P_i}^{t})\cdot \mathbf {n}_i \right) \mathbf {n}_i.$$
(10)
The overall pseudocode of the greedy algorithm is summarized in Algorithm 1 [21].
Internal energy computation
The internal energy maintains the geometrical regularity of the deformable 2simplex meshes. We adopt the original internal force proposed in [15] for greedy algorithm as internal energy, which is decomposed into tangential energy and normal energy [21]:
$$\begin{aligned} {\varvec{E}_{int}(P_i) }&= {\varvec{E}_{Tangent}(P_i) + \varvec{E}_{Normal}(P_i)} \\&= {{\left\ \frac{1}{3}( P_{N_1(i)}+P_{N_2(i)}+P_{N_3(i)} )F_i \right\ }^2 + \;{\left\ L(r_i,d_i,\tilde{\phi }_i)L(r_i,d_i,\phi _i) \right\ }^2,} \end{aligned}$$
(11)
where \(\tilde{\phi }_i\) is the reference simplex angle defined as the mean of neighboring vertices’ simplex angles. The tangential energy measures the distance of the orthogonal projection \(F_i\) of \(P_i\) from the center of gravity of its three neighbors in the tangent plane. When the energy achieves the minimum, it ensures that the vertices exhibit uniformly on the surface. While the normal energy keeps a local smoothing simplex mesh through a \(C^2\) smoothness constraint.
VFC external energy computation
Vector field convolution (VFC) field [18] is a widely used static external force for physicsbased deformable models. It largely solves the problems associated with traditional external force and can guide the active contour into long and thin boundary. Furthermore, in comparison with the classical GVF external force [19–21], VFC force shows superior robustness to noise and initialization, and much less computational cost [18]. The threedimensional VFC field \(\mathbf {v}({x, y, z})=[u({x, y, z}), v({x, y, z}), w({x, y, z})]\) is defined as the convolution of a vector field kernel \(\mathbf {k}({x, y, z})\) with the edge map f(x, y, z) generated from the input image I(x, y, z):
$$\mathbf {v}({x, y, z}) = f({x, y, z}) \otimes \mathbf {k}({x, y, z}),$$
(12)
where \(\otimes\) denotes linear convolution. The basic principle behind this formula is to propagate the gradient vectors of input image I(x, y, z) into homogeneous image regions via image convolution, such that the deformable models have more chances to move towards the object of interest (e.g., edges).
The edge map f(x, y, z) for graylevel input images is usually defined as the gradient magnitude of the blurred image:
$$f({x, y, z}) =  \nabla [G_{\sigma }({x, y, z})] \otimes I({x, y, z}),$$
(13)
where \(G_{\sigma }({x, y, z})\) is a 3D Gaussian function with standard deviation \(\sigma\), and \(\nabla\) is the gradient operator. The vector field kernel \(\mathbf {k}({x, y, z})=[u_k({x, y, z}), v_k({x, y, z}), w_k({x, y, z})]\) can be calculated as follows:
$$\mathbf {k}({x, y, z}) = m({x, y, z}) \mathbf {n}({x, y, z}),$$
(14)
where m(x, y, z) is the magnitude of the vector at (x, y, z) and \(\mathbf {n}({x, y, z})\) is the unit vector from (x, y, z) to the kernel origin (0, 0, 0) given as:
$$\mathbf{{n}}(x,y,z) = \left\{ {\begin{array}{lll} {[0,0,0],}&\quad{\mathrm{{if }}(x,y,z) = (0,0,0),}\\ {[  \frac{x}{r},  \frac{y}{r},  \frac{z}{r}],}&\quad{\mathrm{{otherwise}},} \end{array}} \right.$$
where \(r=\sqrt{x^2 + y^2 + z^2}\) is the distance from the kernel origin (0, 0, 0).
The magnitude of the vector field kernel m(x, y, z) plays a major role in generating the VFC field. Considering that the influence of the object of interest should be diminished as the deformable models become further away, then two types of magnitude functions can be defined as decreasing functions of the distance from the kernel origin [18]:
$$m_1({x, y, z})=( r + \varepsilon )^{\gamma },$$
(15)
$$m_2({x, y, z})= {\rm exp} \left( \frac{r^2}{\zeta ^2}\right),$$
(16)
where \(\gamma > 0\) and \(\zeta > 0\) are parameters that control the decrease of the magnitude, and \(\varepsilon > 0\) is a small constant for the prevention of division by zero at the kernel origin. The parameters \(\gamma\) and \(\zeta\) are set by considering the signaltonoise ratio (SNR) of the input image I(x, y, z) [18].
In the practical implementation, the 3D VFC field \(\mathbf {v}({x, y, z})=[u({x, y, z}), v({x, y, z}), w({x, y, z})]\) can be calculated by convolving the edge map f(x, y, z) with each component of the vector field kernel \(\mathbf {k}({x, y, z})=[u_k({x, y, z}), v_k({x, y, z}), w_k({x, y, z})]\) [18]:
$$u({x, y, z})= f({x, y, z}) \otimes u_k({x, y, z}),$$
(17)
$$v({x, y, z}) = f({x, y, z}) \otimes v_k({x, y, z}),$$
(18)
$$w({x, y, z})= f({x, y, z}) \otimes w_k({x, y, z}).$$
(19)
The continuous vector field kernel \(\mathbf {k}({x, y, z})\) is approximated by a discrete matrix defined on a 3D grid [18]:
$$\{\mathbf {k}(x,y,z)~x,y,z=R,\ldots ,1,0,1,\ldots ,R\},$$
(20)
where R is the chosen kernel radius.
To the best of our knowledge, all current work on deformable models employs VFC as force fields rather than potential energy. To tackle the main issues of the traditional external energy, we extend it for greedy algorithm as external potential energy. Specifically, we make the following enhancements to the original VFC method:

1.
The magnitude of the VFC field \(\left\ \mathbf {v}({x, y, z})\right\\) is used as external energy instead of the force field itself.

2.
One major drawback of the original VFC method is that the generated VFC fields are dominated by the strong edges, this will make the weak yet important edges smooth out along with the noise and cause leakage problem [18]. The GVF fields presented in [19] have successfully avoided this leakage problem and can retain the weak edges, as can be seen from Fig. 3(b) in [18]. It is mainly because in GVF, all the edges generated from different features are treated equally, which is achieved by normalizing the used edge maps to the range [0, 1] during GVF computations. After this normalization step, the weak edges will thus have similar magnitude to that of the strong edges, resulting in equal importance between them. Considering that no noise is presented in our binary input images, we propose also to normalize the edge map f(x, y, z) to [0, 1] before the computation of VFC energy, in order to alleviate the leakage problem and preserve the weak yet important edges.
Therefore, the VFC energy at vertex \(P_i\) can be defined as:
$$\varvec{E}_{VFC}(P_i) = {\left\ \mathbf {v}(P_i)\right\ }.$$
(21)
When vertex \(P_i\) is within the volumetric image, its external energy can be derived through a trilinear interpolation of the external energy at its neighboring image grid points; otherwise, to keep the deformable simplex meshes from moving out of the volumetric image, its VFC energy is set to be the maximal value of the external energy in the volumetric image [21]. When VFC is used as potential energy, we can precompute it and store its field magnitude as graylevel images, this can help greatly reduce the computation time of the VFC, while this does not hold when VFC is used as force fields.
Statistical shape model construction
Using our proposed greedy algorithmbased deformable simplex meshes, we obtain a set of K corresponding femur training shapes \(\{M_i~~i=1,2,\ldots ,K\}\), and each shape \(M_i\) is represented by a shape vector \(\mathbf {x}_i\) with N landmark points. Firstly, all training shapes are spatially aligned into a common coordinate frame using the generalized Procrustes analysis (GPA) [29]. We define the corresponding covariance matrix as:
$$\mathbf {S} = \frac{1}{K1} \sum _{i=1}^{K} (\mathbf {x}_i  \bar{\mathbf {x}})(\mathbf {x}_i \bar{ \mathbf{x}})^T,$$
(22)
where \(\bar{\mathbf {x}}\) is the mean shape vector of all subjects: \(\bar{\mathbf {x}} = \frac{1}{K} { \sum _{i=1}^{K} { \mathbf {x}_i } }\). Then the principal component analysis (PCA)based statistical shape model (SSM) can be built by an eigendecomposition on the covariance matrix \(\mathbf {S}\):
$$\mathbf {S} = \mathbf {U}\mathbf {D}\mathbf {U}^T,$$
(23)
where columns of matrix of \(\mathbf {U}\) form the principal modes of variation \(\phi _m\), and diagonal entries of \(\mathbf {D}\) are their respective variances \(\lambda _m\). Then any valid shapes of femur structure can be approximated by a linear combination of the first c modes of variation:
$$\mathbf { x } = \bar{\mathbf {x}} + \sum _{m=1}^{c} { b_m\phi _m },$$
(24)
where \(c = \min \{ t ~  ~ { \sum _{i=1}^{t} \lambda _t }/{ \sum _{i=1}^{K1} \lambda _t } > 0.98 \}\), and \(b_m\) is the shape parameter constrained to the interval \(b_m \in \left[ 3\sqrt{\lambda _m}, 3\sqrt{\lambda _m}\right]\).