Deformable part models for object detection in medical images
© Toennies et al.; licensee BioMed Central Ltd. 2014
Published: 28 February 2014
Object detection in 3-D medical images is often necessary for constraining a segmentation or registration task. It may be a task in its own right as well, when instances of a structure, e.g. the lymph nodes, are searched. Problems from occlusion, illumination and projection do not arise, making the problem simpler than object detection in photographies. However, objects of interest are often not well contrasted against the background. Influence from noise and other artifacts is much stronger and shape and appearance may vary substantially within a class.
Deformable models capture the characteristic shape of an anatomic object and use constrained deformation for hypothesing object boundaries in image regions of low or non-existing contrast. Learning these constraints requires a large sample data base. We show that training may be replaced by readily available user knowledge defining a prototypical deformable part model. If structures have a strong part-relationship, or if they may be found based on spatially related guiding structures, or if the deformation is rather restricted, the supporting data information suffices for solving the detection task. We use a finite element model to represent anatomic variation by elastic deformation. Complex shape variation may be represented by a hierarchical model with simpler part variation. The hierarchy may be represented explicitly as a hierarchy of sub-shapes, or implicitly by a single integrated model. Data support and model deformation of the complete model can be represented by an energy term, serving as quality-of-fit function for object detection.
The model was applied to detection and segmentation tasks in various medical applications in 2- and 3-D scenes. It has been shown that model fitting and object detection can be carried out efficiently by a combination of a local and global search strategy using models that are parameterized for the different tasks.
A part-based elastic model represents complex within-class object variation without training. The hierarchy of parts may specify relationship to neighboring anatomical objects in object detection or a part-decomposition of a complex anatomic structure. The intuitive way to incorporate domain knowledge has a high potential to serve as easily adaptable method to a wide range of different detection tasks in medical image analysis.
Although medical images provide insight into patient-specific human anatomy otherwise not accessible, their interpretation still requires extensive expert knowledge to fill the semantic gap between image data and depicted anatomy and/or function. Image characteristics of the object-of-interest may be known but are often not unique. Furthermore, noise, imaging or reconstruction artifacts alter the depicted content to a much greater extent than for pictures such as photos. Anatomical objects are not always contrasted well against each other and neighboring structures have similar appearance. However, most of the data is 3-D. Change of shape and appearance due to projection has not to be dealt with.
Still, detection and segmentation is possible if a suitable model completes missing information and corrects measurement errors in the data. The premier effect of such deficient information would be an incorrect delineation of the object's boundary. Hence, shape is the most important supplementary model information. Context information about adjacent structures, as well as information about object and background appearance may be added for discriminating among similar objects.
Developing a different method for each new detection or segmentation task is inefficient. Thus, a major challenge is to find a parameterizable representation that can be adapted efficiently for finding different objects.
Object shape can be represented by sampling points on its boundary. If such model is fitted to the data, boundary points of a model instance are registered with likely boundary locations in the image (e.g., high gradient strength locations). If artifacts or low contrast cause some model points not to have counter parts in the image, the model instance predicts the local course of the boundary there. Visible shape parts have to be sufficiently characteristic so that this prediction does not result in inacceptable errors.
The model has to include within-class variation of the structure of interest, while inhibiting influences from between-class variation.
A weight for combining data and model needs to be set appropriately.
The search for object instances in the data has to deal with many local minima of the corresponding optimization criterion.
This expression is found in many formulations of segmentation and detection tasks. It may be interpreted as an energy balance where external influences from a potential field generated by the data are counteracted by internal forces that represent domain knowledge such as the shape of an object or just the smoothness of its boundary. The parameter λ weighs the reliability of these two kinds of information. The equation can also be interpreted as logarithm of the conditional probability given by a normal-distributed likelihood function based on and a normal-distributed a priori probability depending on . Here, λ represents the ratio of variances of the two distributions.
Equation (1) may be optimized by gradient descent. It requires that the model instance is initialized sufficiently close to the optimum searched for. For global optimization, stochastic search techniques, e.g., the method by  may be used, which draws randomly distributed initializations and selects the best fitting candidates based on (1).
With this paper, we present an elastically deformable model as representation for shape and appearance. We argue that this model may be used efficiently for object detection as it does represent shape variation and expected appearance for an object class without requiring training (although training may improve performance). The main reason for this lies in the fact that model information is represented by integrating simple deformable shapes in a part-based model that, depending on the task, either represents object parts or guiding structures. We show that this model can be applied to solve different detection and segmentation tasks in medical image analysis.
The remainder of the paper is structured as follows. We first present previous work on models in object detection for medical images. We then present our elastic model, its application to object detection and its extension to a part-based model. We conclude with a number of examples that show the different capabilities of applying the model to detection and segmentation tasks.
Extracting an object-of-interest from the image contradicts the usual assumption for segmentation that segment characteristics are part of the data information. Extracting, e.g., a liver from CT images may require separating it from gall bladder, stomach, and feeding vessel. Depending on imaging modality and protocol the appearance of these structures may be very similar and insufficient for separation.
Extraction may be simple for a human observer with necessary knowledge about shape, appearance and location of the organ. In a computer-guided solution, a detection task has to be solved first, which supports the subsequent boundary delineation task. The two tasks may be solved in parallel or sequentially. In the latter case, the detection result constrains the subsequent segmentation. In the former case, a model instance is expected to deform into the object that is searched for.
The information that is needed for detection and segmentation can be quite complex. Necessary domain knowledge may be introduced at several stages of the process and the "intelligence" of the solution lies in the construction of the process from its sub-tasks (e.g., the kidney detection and segmentation scheme of ). Alternatively, a model can be built that contains information of all aspects about the object necessary for identification in the data including possible dependencies between different types of information. Detection is then fitting a model to measured data (e.g., the application of a shape and appearance model for detecting the left ventricle of the heart in MRI ). We prefer the latter approach as it separates model description from the search for model instances and may be adapted easier to a new detection task.
If detection precedes delineation, it localizes the object-of-interest. In this case, the result locally constrains boundary delineation if the object is not sufficiently contrasted against adjacent structures. Localization may be interactive (e.g., seed points for region growing , closed active contours , or starting points for live wire delineation ). In this case, the detection result often includes little information with respect to object boundary, since interaction costs are usually kept to a minimum. Hence, interactive localization is most suitable if the boundary itself is well defined by the data and interaction is used to find the object among different structures with similar appearance. Small artifacts such as noise can be accommodated by requiring smooth and closed boundaries. If substantial parts of the boundary cannot be derived from data information, they need to be added by interactive delineation of boundary parts.
Localization by matched filter methods such as vesselness filters , blobness , template matching  and Hough transform  does not require interaction, except possibly for specification of a relevancy threshold for a successful detection. These methods are capable of finding multiple instances of an object. The filters predict the (average) object shape and appearance which may then be used as prior for data-driven segmentation, e.g., by graph cuts [11–13] or level sets [14, 15]. The methods operate with few parameters which usually can be found easily. Of course, it restricts shape representation either to variable, but simple shapes (e.g., the vesselness filter) for which intra-class variation can be described by few parameters, or to a representation by a single average shape (e.g., the generalized Hough transform).
Representing acceptable shape variation of complex object shape is more difficult. Unconstrained shape variation will cause the model to fit almost everywhere in the image while an overly constrained shape will not fit well to the data. Active Shape Models (ASM) and its many variants have found widespread use since their introduction in . Shape is represented by coordinates of a set of labeled points. If appearance variation shall be represented as well, it is given by intensity or texture variation at these points. Variation is trained from correspondingly labeled test data. Influence from rotation, translation and sometimes scaling has to be removed by Procrustes analysis . Variation is assumed to be normal-distributed and highly correlated. Hence, principal component analysis is used for de-correlation and information is reduced to a few eigenmodes (modes of variation). The model enables a potentially arbitrarily exact fitting of the missing boundary, as long as the visible part of the boundary sufficiently constrains the variation of the missing part.
Training of such a point distribution model (PDM) can serve two purposes. It estimates shape (and appearance) variation of the class of objects to be delineated and it restricts shape variation. The success of PDMs for model-guided segmentation has been shown in numerous publications (e.g., [5, 18, 19]). Often only few training samples - considering the degrees of freedom of the untrained model - are needed for segmentation. The reason is probably twofold. Firstly, within-class shape variation has much fewer degrees of freedom than the model so that the true co-variance can be estimated from few samples. Secondly, boundary delineation does not require exact estimates of intra-class variation anyway as long as incorrectly estimated shape variation is corrected from boundary information in the data.
Besides PDM, the Hough transform may be extended into a probabilistic shape model as well . However, compared to a PDM, its versatility is restricted since appearance information is not part of the model and it is unclear how the trained information may be used for subsequent boundary delineation.
Since a shape model has not to be very accurate for boundary delineation, prototypical variation has also been used for shape-supported boundary delineation. If detection is not necessary (e.g. when tracking a heart shape in a 3d+time sequence), the necessary shape information are properties such as closedness, smoothness and similarity of the boundary between time frames. This has been used, e.g., in tracking the heart beat from CT using an active surface based on a finite element model (FEM) to restrict boundary variation between time steps . This concept has been extended to an elastically deforming shape model (e.g., the mass-spring model  or an FEM ). Such model restricts organ variation to elastic deformation of a prototypical shape. This is particularly simple for a homogeneous, linear-elastic FEM since only few parameters regulate its stiffness. Defining such a prototype solves part of the optimization problem in detection as well, since - similar to active contour models - rather basic image information can be embedded by forces acting on a model instance, locally attracting it to the object-of-interest.
However, an elastic model provides just an approximation of intra-class shape variation. To some extent, either the delineation goal or the detection goal has to be sacrificed if the shape of the object itself or its variability is complex. If stiffness overestimates true shape variation, the model instance may adapt to the object boundary given sufficiently reliable image information, but shape deformation may not be used for object detection. If the shape variation is overconstrained it may be used for object detection but will result in a poor segmentation of the object.
Instead of training complex shape variation, some of this variation can be efficiently represented using part-based models. A shape is decomposed into simpler parts. Variation is then that of the parts and that of part-relationships. In other applications, part-based models have been successfully used for articulated object detection (e.g., detection of faces and people , or pedestrian recognition ). Although part-relationships may be learnt , an advantage is that the qualitative knowledge of a decomposition of an object into parts is often readily available (e.g. the decomposition of the spine into a sequence of vertebrae separated by disks). We successfully applied a combination of deformable models with a part-based representation for several object detection and boundary delineation tasks in medical image analysis.
With this paper we describe how to use our part-based deformable model for object detection and segmentation and illustrate the applicability to different tasks in medical image analysis.
As stated above, variation represented by deformable models often does not have to be exact. However, if it is to be specified by the user instead of being trained from samples, parameterization of the model should be intuitive. Representing the object as an elastically deformable material meets this condition since most users have an understanding of how elasticity influences shape variability. Elastic models have been used in medical image applications for quite some time (e.g.,  for elastic registration), although not often to restrict shape variation for detection tasks. We use this concept to replace trained shape variation. It should be noted that we do not want to simulate deformation behavior of an object (e.g., the deformation of the heart ventricles over time). Elastic deformation is solely used to replace distribution estimates to restrict shape variation among objects of the same class. Since describing complex shape variation by few elasticity parameters may be too simple, we use deformable objects as components of a part-based model. Relationships between parts are modeled by an elastic superstructure.
Object detection consists of a local and a global part. Localized delineation lets a model instance be attracted and deformed by image features. The global search is realized by repeatedly generating random initializations of model instances and selecting the best fitting candidates. Fitting is defined according to equation (1) by a data term that is regularized by shape deformation.
Finite Element Models (FEM)
We use FEMs to represent the deformable shape. A finite element model represents linear-elastic deformation of a 2-D or 3-D object based on a discretization of the object domain into finite elements e. Elements are generated from node locations that sample the object domain. The number of nodes depends on the accuracy with which an object is described. For a given object, the FEM is generated from a 2-D or 3-D representation of the object's shape, either based on a priori knowledge, e.g., about the shape of a vertebra, or from an example segmentation. If segmented data is used, it should reflect a representative object example. It may also be helpful to smooth the segmentation in order to remove irrelevant detail from the model instance before generating the FEM. The FEM may be generated by triangulating the segment, e.g., by computing a Delaunay triangulation.
The displacement depends on applied forces, the elasticity and geometry of the element, as well as on the interpolation functions used to compute continuous material deformation within the element. When linear interpolation is sufficient - which is true for our applications - element shape and elasticity parameters are the only influences that define the deformation behavior.
The elastic modulus, given the isotropic material assumptions above, may be described by Young's modulus. It tells how an isotropic material resists deformation due to opposing forces (see Figure 2b). Changing Young's modulus controls the amount of deformation due to external forces, e.g., image-derived forces.
Given element geometry, interpolation functions, Poisson's ratio and Young's modulus, the stiffness matrix can be computed for each element and assembled into global stiffness matrix K. Hence, the FEM, created from an exemplary segmentation or from a priori knowledge, represents a deformable shape by very few parameters that are intuitive in a sense that a user does not have to understand the underlying numerical concepts of the FEM method.
Dynamic optimization and object detection
Specifying masses in M is necessary if a moving model instance should resist force changes from the image in order to let it move over spurious image details from noise or artifacts. If this is not necessary, mass may be omitted, leading to a damped gradient descent.
If the system is mass-free, we may set D = I + β K, where I is the identity matrix.
Boundary nodes are attracted by boundary features in the image
Inner nodes are attracted by image features according to the specific appearance of the object to be detected
Different kinds of boundary or appearance nodes can be defined if the expected edge strength or appearance varies in an object-specific fashion.
External forces can be computed based on the current location and displacement of nodes with respect to image features in the model's vicinity. This may become necessary if the data is unreliable and current node locations and displacement are used to select image regions evaluated for force computation. An example will be demonstrated in Application 2 described below.
A single or several appearance nodes may help to determine whether the boundary nodes - which are only sampling the boundary - sit on the correct object boundary.
Appearance nodes that sample the appearance inside the object and in the background in the object's vicinity may serve as a complex boundary model in cases where the boundary is difficult to detect in the data using a local operation such as edge detection.
Since local features are not unique (otherwise, detection would be trivial), the influence radius, i.e., kernel width should be as large as possible in order to attract FEM instances from far away, but small enough to avoid overlapping influences, i.e., blurring from different local features.
Given particular potential fields, an FEM instance can be placed anywhere in the image and deforms under the field-derived forces. The potential field is constant over time and may be pre-computed. Forces f change with the displacement in the potential field and, therefore, with time. An instance has found its final destination when internal forces from deformation and external forces from the image balance each other. Computing adaptation of the deforming FEM instance to the image requires computation of the dynamic system described by equation (4).
Computation of this system of dependent differential equations can be done in several ways (see, e.g., ). We prefer a computation via decorrelation into independent modes. It does not only produce a stable solution to the problem but also allows selecting relevant deformation modes (similar to the selection of variation modes in ASM). This has been used for object classification , mapping between similar objects , and to provide a base for training an ASM, if training is desired and possible .
where e i is the i-th eigenvector of E and λ i is the i-th eigenvalue. An analytical solution for such equations can be computed stably and fast. The approach is applicable for a mass-free system as well.
Eigenvectors and eigenvalues carry semantics similar to ASM. An eigenvector is called a mode of vibration and represents a generalized symmetry axis of shape deformation. The overall deformation is a weighted sum of the generalized deformations due to the transformed forces ET f(t). It is possible to reduce the number of modes in order to remove small information details and reduce computational costs. Eigenvalues characterize the amount of force necessary to cause displacement. Assuming ascending order of eigenvalues, the first n! eigenvalues of an n-dimensional FEM will be zero and represent rigid transformation (rotation and translation). The following, intermediate eigenvalues are the most relevant modes representing shape deformation.
The basic idea of warping is to include the current rotation into the representation. Therefore, forces are applied to the un-rotated model and the result is rotated back. If warping is applied on the node level, the current rotation R i is computed for all nodes i. They are concatenated into an orthogonal matrix R of which its inverse is applied to the current displacement u(t) as well as to its derivatives. Deformation is then applied to these vectors and the result is rotated back (the impact of this correction can be seen in Figure 4). It can be shown that this can be applied in the spectral basis as well . This essentially means that the current vibration modes are "rotated" versions of the original modes.
for each node can be computed fast using a method proposed by .
Dynamic optimization of an FEM instance as described above produces a locally optimal fit. For global optimization we use a stochastic initialization technique similar to the method that was suggested by . FEM instances are initialized at different locations in the image and serve as detection candidates for the object-of-interest. After performing local optimization as described above the best candidates spawn new instances in their vicinity. The process ends when no further improvement can be reached.
Rating candidates requires definition of a QoF function. It is defined according to equation (1). Data quality measures the value of the potential field at final node locations and the regularization term measures the deviation of the model from its initial shape based on the weights of the vibration modes.
Computational complexity of the process is O(N) where N is the number of nodes of the model instance. However, computation times depend on the convergence of the various iterative optimization schemes, namely the iterative deformation, the iterative computation of the singular value decomposition in  for computing the warping, and on the number of iterations in the stochastic search.
Hierarchical part-based model
The few parameters of the deformable model described in the previous section are sufficient for object detection as long as the object in question has a rather characteristic mean shape and appearance. If this is not the case, training of shape variation such as in ASMs would help. However, to reduce training effort (ideally up to a point where all parameters are user-specified based on domain knowledge), information has to be included in a qualitative rather than a quantitative way. Hence, we adopted the principle of part-based models for augmenting the descriptive power of our model.
The parts may be sub-objects of the object-of-interest (such as the vertebrae of a spine model).
The parts may relate the object-of-interest to surrounding objects that help to localize the object.
Additionally to the elastic deformation of the parts, division into parts, their relationship to each other, and potential mutual deformation are relevant for model specification.
A part-based model can be realized conveniently within a hierarchical FEM (HFEM) framework [36–38]. The shape is decomposed into parts based on user's specification. Each part is represented by an FEM. This constitutes the morphological layer of the complex object. Parts may have different material properties and different potential fields to accommodate their different semantics. Spatial relationships between parts are represented by a second-level FEM which constitutes the structural layer. Forces on the structural layer FEM are exerted from deformation and displacement on the morphological layer of the HFEM. They, in turn, impose constraints on the shape of its structural layer.
The HFEM may be realized explicitly by generating a set of morphological and structural FEM or implicitly by generating a single, heterogeneous FEM from the morphological and the structural FEM.
A single-connected shape may rotate independently but is restricted in distance to other sub-shapes via the structural layer.
A sub-shape that is connected via two nodes to the structural layer is further restricted in its rotation. Properties such as approximate orthogonality or parallelism of parts can be ensured by this kind of connection.
A sub-shape connected by more than two nodes shares some of its non-rigid deformations via the structural layer.
The influence of external forces between layers is then realized by a message passing algorithm . Displacements due to deformation on the morphological layer act as external forces on the structural layer, while deformations on the structural layer cause forces on the morphological layer. Computation of the dynamic behavior, hence, requires solving the optimization problem presented in the previous section for each sub-shape on the morphological layer, passing the resulting forces to the structural layer, computing the resulting deformation and passing displacement back to the morphological layer.
In the hierarchical representation, each FEM is diagonalized separately resulting in its own vibration modes. It allows selecting vibration modes independently for each FEM. Hence, different requirements on precision, e.g., between structural and morphological layer, or for different substructures on the morphological layer can be accommodated.
The weights w j may be used to account for a different importance of parts. If, for instance, some parts are just guiding structures to find the actual object-of-interest, their fit has not to be very precise.
Since the part relationship may be decisive for detecting the object, deformation and image support is then computed for the structural layer FEM as well. Deformation is measured in the same way as for the morphological layer. The external energy is now given by the state of the morphological sub-shapes. Hence, it is simply defined by the QoF from the sub-shapes of the layer below.
Morphological and structural layer FEM are defined as above. However, instead of treating them as single FEM, which communicate through message passing, the FEM are assembled in the same way as elements were assembled to the FEM. Assemblage is guided by node connections between structural and morphological layer as before. Through the assemblage these nodes are treated as shared nodes between elements of the morphological and the structural layer.
Computation of vibration modes applies to the complete part-based object and optimization does not require message passing between deformable part models.
The quality-of-fit is computable in the same way than for a single FEM based on deformation and input from the data.
Different potential fields and/or different material properties of the part-FEM replace the weights in equation 10.
If warping is used on the node level this applies to the complete FEM. Hence, single-connected FEM do not allow independent rotation.
In general, the second solution is simpler regarding computation of dynamic deformation and stochastic optimization, but it also integrates parts tighter into the representation. Therefore, it depends on the application which of the two methods should be preferred. This will be discussed further in the next section where we will present different applications using the two approaches for different analysis goals in medical image analysis.
Results and discussion
In the following, we present a number of applications to illustrate how to parameterize and use a part-based deformable model for various tasks in medical image analysis. We will summarize performance and results for each application. Since we use the application to demonstrate the use of the model, we refer to the original publications [38–40] for details.
Indentifying Heschl's gyrus in flat maps of the human cortex
Experiments with functional magnetic resonance imaging (fMRI) attempt to localize and delineate particular brain regions, such as the human primary auditory cortex (pAC) and neighboring higher-order areas, in vivo. The pAC is known to be located on the first transverse temporal gyrus (i.e., Heschl's gyrus, HG). Since the region covered by the pAC is very small with respect to the spatial resolution of fMRI and the signal to noise ratio is rather poor, fMRI data of a population of subjects are combined to arrive at a representative functional map. The combination of individual data to a group map requires the mapping of corresponding regions across subjects. We solved this registration task by localizing macro-anatomical landmarks (i.e., the lateral "Sylvian" fissure and superior temporal sulcus) that delineate the superior temporal lobe and enclose the highly variable Heschl's gyrus. The detection is performed using a deformable model of HG that is fitted to the cortical surface (i.e., a surface mesh that represents the grey-white matter interface, gwI) [36, 40]).
Identifying HG in a given flat map requires finding a specific U-shaped border at those zero crossings. This is difficult since the size, location, orientation and individual shape of HG vary substantially between subjects, while there are several gyri of similar shape in each cortical flat map. Experts identify the correct gyrus by means of its relation with two anatomical landmarks: the Sylvian fissure (SF) and sulcus temporalis superior (STS) are approximately parallel to each other and orthogonal to HG.
A part-based deformable model is well suited to represent anatomical descriptions such as U-shapedness and spatial relations between cortical gyri and sulci.
HG, STS and SF were modeled on the morphological layer of a two-level model and combined on a structural layer that models the structural configurations of the macroanatomical landmarks. Since the gyri and sulci were simple 2-D shapes, they were manually drawn based on example images. For HG, two different variants were created since some subjects may have a HG with a sulcus intermedius (SI).
Boundary nodes of the HG and the HG+SI models had access to smooth potential fields, whose minima represent zero crossing locations of the flat map. In practice, the filtering operations were approximated as follows. For a given surface mesh we computed a discrete, difference-of-Gaussian filtered version of the curvature mapping: We first applied an operator that separated convex regions (gyri) and concave regions (sulci), and then we used a discrete approximation of a heat diffusion kernel to smooth the resulting binary map. The low pass filter kernel widths of σ 1 = 2 mm and σ 2 = 2σ 1 for the difference-of-Gaussian provided a good tradeoff between a large region of influence and blurring of adjacent zero crossings. During the model fit, we estimated for each model node the vector to the nearest salient vertex on the cortical surface mesh (with a local maximum filter response and within a given sampling distance) and used the weighted radial component of this vector as external force.
Internal nodes of the HG model responded to positive curvature (gyri) only, whereas the HG+SI model contained nodes responding to negative curvature (sulci) at the SI location as well. Again, the potential fields were in practice defined based on heat kernel smoothed versions of the binary curvature maps.
Since SF and STS primarily served as limits for restricting the cortex region to be searched for HG, it was important to robustly and correctly localize these landmarks, while only roughly matching their main branches. The STS and SF models were constructed as simple line-shaped structures, undersampled with FE nodes that responded only to appearance, i.e. curvature information. This sparse sampling allowed bridging gaps in the curvature maps, while sufficient similarity of the structure to a line was still ensured. (example images showing variability of SF, STS)
The model at the top-layer arranged HG (and HG+SI, resp.) as the central part of the AC folding pattern nearly orthogonal to the two surrounding parallel sulci. To set up a sparse top layer model, "link" nodes were identified for related sub-shapes and duplicated such that the resulting structural model consisted of these shared link nodes (see Figure 9). Four nodes represented the parallel arrangement of SF and STS in the 2-D flat maps, and the internal node was linked with the HG model to position it "above STS" and "below SF". In the structural prior model HG+SI an additional node was linked with a simple SI model in a "contained in HG"-relation.
The FE nodes are then subject to boundary conditions, such that any nodal displacement in the morphological coordinate frames (e.g., due to image forces) enforced displacements in the global model coordinate frame, while the resulting nodal displacements of the top-layer model acted as across-level spring forces on the link nodes of the morphological shape models.
The search for the independent optimum pose parameters (of the structural model as well as HG and HG+SI, resp.) was based on a quasi stochastic sampling of an a priori constrained search space. To define the constraints, we used the Talairach-transformed flat map coordinate system [40, 42] and asked an expert to annotate an example and specify possible variations in the location, size and orientation of the anatomical structures. This information was encoded in terms of pose parameter distribution functions.
Young's modulus was 2.0, Poisson ratio was 0.4 and material density was set to 1.0 for all models.
At each iteration of the deformable model search, a population of 100 model instances was initialized with random affine parameterizations and fitted to the data. Randomness was introduced into this evolutionary process by employing a rank selection of fitted instances. Reproductive success varied with the relative "fitness", i.e. quality, of instances. Poor fitting results were deleted, and Gaussian noise was added to the pose parameters of the best fitting instances to simulate the "mutation" in the reproduction step and initialize new deformable model instances. This evolution process ends if no improvement in overall fitting quality was observed (typically after 3 steps).
Using just the HG model without guiding structures and constraints on possible poses resulted in a 5% success. By constraining the search space, the correct gyrus was identified in about 50% of all cases.
Using the HG model together with SF and STS resulted in 80% success which can be improved to more than 90% if model parameters have been trained.
This part-based model could be directly used (1) to compute precise segmentations of HG with less than 3 mm error compared with manual segmentations of HG and (2) to classify the given cortical surfaces correctly with respect to the presence of a sulcus intermedius.
Using a single-layer model that comprises HG, STS and SF instead of the multiple layer model led to a significant decrease in result quality which did not improve by training. This means that the relevant anatomical information was better captured by the structural decomposition and deformation parameters.
The method is very robust to changes in the parameterization.
This example shows that domain knowledge, such as the structural arrangement of landmark structures that predict the location of a highly variable object of interest within data that carry poor semantics, can be directly formulated in the form of efficient constraints of a hierarchical, deformable model. This is very interesting in that one could expect that such a model provides a better symbolic representation of the "true" object anatomy and anatomical variability than a model that is learnt from annotated training data. Moreover, the expert knowledge can be more efficiently improved by training (e.g., of correct poses) and expressed with less effort (e.g., by annotating a single example and "painting" connecting relations such as "parallel to" and "contained in" between the different structures, by accepting good solutions, applying interactive forces during the model fit or by correcting the deformed shape of poorly fitted model instances).
Segmenting the Substantia Nigra in transcranial sonography
Transcranial sonography (TCS) produces ultrasound images of the brain that are acquired by imaging through the temporal bone window. The mesencephalic brain stem, or midbrain, containing the substantia nigra (SN) is visible in TCS images, although image quality compared to regular ultrasound images is poor.
Appearance nodes of all parts have a similar potential function: SN nodes reacted on high echodensity, since the SN typically produces more reflections than surrounding midbrain tissue. The internal nodes of the SN models sample a Gaussian low pass filtered version I* = G σ ∗ I of the image I (where σ = 2.0), and the intensity forces are f = κ∇I*, with κ > 0. For the echopoor midbrain, we expect low intensities in the interior and let κ < 0.
The boundary FE nodes did not rely on intensity gradients since these were extremely unreliable. Instead, we computed at each iteration of the model fit a balloon-force f b = κ b n in the model instances' current contour normal direction that pulls the nodes towards more robustly estimated inter-tissue boundaries. The magnitude and sign κ b of the balloon force was defined using regional texture information. We dynamically computed an optimal discriminant between "object" and "background" based on statistics over image intensity samples taken from the inside of the model and from the background. The sampling regions were defined in the image in inward and outward normal directions at the boundary nodes of the model instance.
Young's modulus was 0.9, Poisson ratio was 0.25 and material density was set to 0.9 for all models
As in the previous section, a deformable shape search computed the best fitting shape by simultaneous optimization of multiple two-layer model instances. In this application, however, the search was performed sequentially on the global (midbrain localization) and local context (detection and segmentation of the SN). After finding the best fitting instance of the midbrain model, multiple instances with different parameterizations of the SN models were aligned to it and matched to the data to detect the hyperintense SN regions. This sequential process was necessary because the SN appears as a stripe-like structure on both branches of midbrain, but regularly exhibits the same echotexture as the adjacent brain tissue. That is, the midbrain serves as "guiding structure" for the contained SN, but for a given TCS image it is neither known in advance whether the SN is clearly visible, nor how much an echodense pattern of the SN extends. The fit of SN models should not influence the deformation of the midbrain model instance.
Successful detection means that the true boundary of the SN will be in the vicinity of the boundary estimate by the SN sub-shapes. A final segmentation should make use of this information (e.g., shape-driven level sets  or graph cuts ). In the last step of our algorithm, the boundaries of the two deformed SN templates were taken as initial placement of Active Contour Models, which are then locally deformed to precisely adapt to the SN boundaries in the TCS image. The template contours were re-sampled to increase the sampling density and flexibility of the contour models. Internal forces and external balloon forces were set up as described in  and above (see  for details).
Detection of human vertebrae in MRI
Investigation of normal variation of the anatomy of the spine and its vertebrae is one of the research questions within SHIP (Study of Health in Pommerania, ). We use MRI from SHIP to detect the course of vertebrae of the lower back. More than 40 different MRI image sequences have been acquired within SHIP from several thousands of subjects in Pommerania. Two of the sequences - a T1-weighted and a T2-weighted sequence - showing spine and vertebrae were used for the detection task using our model-based approach. Although vertebrae detection and segmentation focuses on radiographs and CT images, MRI-based analysis has been the subject of research in medical image analysis as well . The domain knowledge in existing methods is often represented by a specific combination of processing modules where model information is inserted at several stages. Our goal was to investigate whether this can be replaced by our deformable part model. The advantage would be that adapting the detection to some other application would solely relate to model generation without having to change modules of the search process.
Global optimization was not used since initialization is simple for the given data. The user places the model instance in a sagittal view on the middle slice of the image sequence. Optimization then takes place by model deformation based on local image attributes.
For each of the two shape models, the vertebra and the spine, a weighted combination of the T1-weighted and the T2-weighted image was computed as appearance input (Figure 3 shows computation of the vertebra potential function). Weights for each of the two models were determined a priori and produced a clearly recognizable local minimum for vertebra and spine appearance, respectively.
We used a single heterogeneous model, since the sub-shapes formed a common unit (the spine model) for which, e.g., vibration modes are computed and selected. The restriction that sub-shapes cannot rotate independently around a single connection was not critical. It was even a desired attribute since it very well reflected anatomical relations between sub-shapes. Young's modulus was 1.0, Poisson ratio was 0.0 and material density was set to 0.1 for all models.
Model generation from the sample shape was by Delauney triangulation from evenly distributed sample nodes. It produces a set of "well-shaped" tetrahedrons. Computing the rotation using node warping was solved by minimizing equation 9. Total computation time until convergence on a standard quad-core CPU was between 1.1 and 2.6 seconds per case with an average computation time of 1.5 seconds.
Intra- and inter-cluster distances for shape clustering of the adapted spine model.
We presented a part-based deformable model that can be used for directly and efficiently representing domain knowledge necessary for detection of objects in medical images. Different examples demonstrate its application to context-based detection, model-based segmentation, and to shape analysis.
Structural decomposition of the model depends on part-relationships that can be derived from knowledge about the anatomy or arrangement objects to be detected. In medical applications, parts are typically anatomic substructures or a group of neighboring landmark structures that are necessary to determine an object's position.
Elasticity parameters describe the variability between shape instances and - on the structural level - variability of spatial relations between sub-shapes.
External forces are set depending on the expected appearance of the object in the image. While gradient-based potential forces often allow for exact determination of the object boundary, they may be supplemented by intensity or texture-based forces if gradient information is unreliable.
In future we will investigate the potential for parameter training of the model. Training of the few parameters in the data and the model terms should be much simpler requiring less training data than training of a PDM. An alternative to adapt parameters would be to take the detection result as an initialization for the search of an optimal potential field given the converged model instance instead of trying to improve the overall performance of the model. This may also enhance the potential of the model to guide data-driven segmentation schemes such as level set segmentation or graph cuts. It will also be worthwhile to take a second look at the analysis of shape parameters for the detection of shape classes using adapted model parameters.
We thank the Leibniz Institute for Neurobiology Magdeburg, Ludwig Niehaus, formely of the Clinic for Neurology of the University Hospital Magdeburg, and the Study of Health in Pomerania (SHIP) for providing the image data used in the experiments.
Funding for this article came from the DFG priority program 1335: Scalable Visual Analytics (grant TO 166/13-2).
This article has been published as part of BioMedical Engineering OnLine Volume 13 Supplement 1, 2014: Selected articles from the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society: Workshop on Current Challenging Image Analysis and Information Processing in Life Sciences. The full contents of the supplement are available online at http://www.biomedical-engineering-online.com/supplements/13/S1
- Hill A, Taylor CJ: Model-based image interpretation using genetic algorithms. Image and Vision Computing 1992,10(5):295–300. 10.1016/0262-8856(92)90045-5View ArticleGoogle Scholar
- Gloger O, Toennies KD, Liebscher V, Kugelmann B, Laqua R, Völzke H: Prior shape level set segmentation on multistep generated probability maps of MR datasets for fully automatic kidney parenchyma volumetry. IEEE Trans Medical Imaging 2012,31(2):312–325.View ArticleGoogle Scholar
- Stegmann MB, Ersbøll BK, Larsen R: FAME--a flexible appearance modeling environment. IEEE Trans Medical Imaging 2003,22(10):1319–1330. 10.1109/TMI.2003.817780View ArticleGoogle Scholar
- Hojjatoleslami SA, Kittler J: Region growing: a new approach. IEEE Trans Image Processing 1998,7(7):1079–1084. 10.1109/83.701170View ArticleGoogle Scholar
- McInerney T, Terzopoulos D: Deformable models in medical image analysis: a survey. Medical Image Analysis 1996,1(2):91–108. 10.1016/S1361-8415(96)80007-7View ArticleGoogle Scholar
- Hamarneh G, Yang J, McIntosh C, Langille M: 3D live-wire-based semi-automatic segmentation of medical images. In Proc. SPIE 5747, Medical Imaging 2005: 12–17 February 2005; San Diego. Edited by: Fitzpatrick JM, Reinhardt JM. International Society for Optics and Photonics; 2005:1597–1603.Google Scholar
- Manniesing R, Viergever MA, Niessen WJ: Vessel enhancing diffusion: A scale space representation of vessel structures. Medical Image Analysis 2006,10(6):815–825. 10.1016/j.media.2006.06.003View ArticleGoogle Scholar
- Tao Y, Lu L, Dewan M, Chen AY, Corso J, Xuan J, Salganicoff M: Multi-level ground glass nodule detection and segmentation in CT lung images. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2009: 20–24 September 2009; London; LNCS. Volume 5762. Edited by: Yang GZ, Hawkes D, Rueckert D, Noble A, Taylor C. London: Springer; 2009:715–723.View ArticleGoogle Scholar
- Lalonde M, Beaulieu M, Gagnon L: Fast and robust optic disc detection using pyramidal decomposition and Hausdorff-based template matching. IEEE Trans Medical Imaging 2001,20(11):1193–1200. 10.1109/42.963823View ArticleGoogle Scholar
- Golemati S, Stoitsis J, Sifakis EG, Balkizas T, Nikita KS: Using the Hough transform to segment ultrasound images of longitudinal and transverse sections of the carotid artery. Ultrasound in Medicine & Biology 2007,33(12):1918–1932. 10.1016/j.ultrasmedbio.2007.05.021View ArticleGoogle Scholar
- Friedman D, Zhang T: Interactive graph cut based segmentation with shape priors. In IEEE Conf Computer Vision and Pattern Recognition CVPR 2005: 20–25 June 2005; San Diego. Edited by: Schmid C, Soatto S, Tomasi C. Los Alamitos: IEEE Computer Society Press; 2005:755–762.Google Scholar
- El-Zehiry N, Elmaghraby A: Graph cut based deformable model with statistical shape priors. In 19th IEEE Intl Conf Pattern Recognition ICPR 2008: 8.-11. December 2008. Tampa. IEEE; 2008:1146–1149.Google Scholar
- Veksler O: Star shape prior for graph-cut image segmentation. In European Conference on Computer Vision ECCV 2008: 12–18 October 2008; Marseille; LNCS. Volume 2351. Edited by: Forsyth D, Torr P, Zisserman A. London: Springer; 2008:78–92.Google Scholar
- Rousson M, Paragios N: Shape priors for level set representations. In European Conference on Computer Vision ECCV 2002: 7 May - 2 June 2002; Copenhagen. LNCS. Volume 2351. Edited by: Heyden A, Sparr G, Nielsen M, Johansen P. London: Springer; 2002:78–92.View ArticleGoogle Scholar
- Cremers D, Osher SJ, Soatto S: Kernel density estimation and intrinsic alignment for shape priors in level set segmentation. Intl J Computer Vision 2006,69(3):335–351. 10.1007/s11263-006-7533-5View ArticleGoogle Scholar
- Cootes TF, Taylor CJ: Active shape models -- 'smart snakes'. In British Machine Vision Conference BMVC92: 22–24 September 1992. Edited by: Hogg D, Boyle R. London: Springer; 1992:266–275.Google Scholar
- Cootes TF, Taylor CJ, Cooper DH, Graham J: Active shape models - their training and application. Computer Vision and Image Understanding 1995,61(1):38–59. 10.1006/cviu.1995.1004View ArticleGoogle Scholar
- Cootes TF, Hill A, Taylor CJ, Haslam J: Use of active shape models for locating structures in medical images. Image and Vision Computing 1994,12(6):355–365. 10.1016/0262-8856(94)90060-4View ArticleGoogle Scholar
- van Ginneken B, Frangi AF, Staal JJ, ter Haar Romeny BM, Viergever MA: Active shape model segmentation with optimal features. IEEE Trans Medical Imaging 2002,21(8):924–933. 10.1109/TMI.2002.803121View ArticleGoogle Scholar
- Ruppertshofen H, Lorenz C, Schmidt S, Beyerlein P, Salah Z, Rose G, Schramm H: Discriminative Generalized Hough transform for localization of joints in the lower extremities. Computer Science - Research and Development 2011,26(1–2):97–105. 10.1007/s00450-010-0137-xView ArticleGoogle Scholar
- Cohen LD, Cohen I: Finite-element methods for active contour models and balloons for 2-D and 3-D images. IEEE Trans Pattern Recognition and Machine Intelligence 1993,15(11):1131–1147. 10.1109/34.244675View ArticleGoogle Scholar
- Dornheim L, Toennies KD, Dixon K: Automatic segmentation of the left ventricle in 3d SPECT data by registration with a dynamic anatomic model. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005: 26–29 October 2005; Palm Springs; LNCS. Volume 3749. Edited by: Duncan J, Gerig G. London: Springer; 2005:335–342.View ArticleGoogle Scholar
- Shen T, Li H, Huang X: Active volume models for medical image segmentation. IEEE Trans Medical Imaging 2011,30(3):774–791.View ArticleGoogle Scholar
- Felzenszwalb PF, Huttenlocher DP: Pictorial structures for object recognition. Intl J Computer Vision 2005,61(1):55–79.View ArticleGoogle Scholar
- Wu B, Nevatia R: Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In Intl Conf Computer Vision - ICCV 2005: 17–21 October 2005. Beijing. IEEE Computer Society; 2005:90–97.Google Scholar
- Crandall DJ, Huttenlocher DP: Weakly supervised learning of part-based spatial models for visual object recognition. In Eur. Conf. Computer Vision ECCV 2006: 7–13 May 2006; Graz; LNCS. Volume 3951. Edited by: Leonardis A, Bischof H, Pinz A. Berlin Heidelberg: Springer; 2006:16–29.View ArticleGoogle Scholar
- Bajcsy R, Lieberson R, Reivich M: A computerized system for the elastic matching of deformed radiographic images to idealized atlas images. J Comp Ass Tomography 1983,7(4):618–625. 10.1097/00004728-198308000-00008View ArticleGoogle Scholar
- Petyt M: Introduction to finite element vibration analysis. Cambridge University Press; 1998.Google Scholar
- Oh SI, Altan T: Metal forming and the finite-element method. Oxford University Press; 1989.Google Scholar
- Sclaroff S: Deformable prototypes for encoding shape categories in image databases. Pattern Recognition 1997,30(4):627–641. 10.1016/S0031-3203(96)00108-2View ArticleGoogle Scholar
- Sclaroff S, Pentland AP: Modal matching for correspondence and recognition. IEEE Trans Pattern Analysis and Machine Intelligence 1995,17(6):545–561. 10.1109/34.387502View ArticleGoogle Scholar
- Cootes TF, Taylor CJ: Combining point distribution models with shape models based on finite element analysis. Image and Vision Computing 1995,13(5):403–409. 10.1016/0262-8856(95)99727-IView ArticleGoogle Scholar
- Choi MG, Hyeong-Seok K: Modal warping: Real-time simulation of large rotational deformation and manipulation. IEEE Trans Visualization and Computer Graphics 2005,11(1):91–101. 10.1109/TVCG.2005.13View ArticleGoogle Scholar
- Müller M, Gross M: Interactive virtual materials. Proc Graphics Interface GI'04 2004, 239–246.Google Scholar
- Byers R, Xu H: A new scaling for Newton's iteration for the polar decomposition and its backward stability. SIAM J Matrix Analysis and Applications 2008,30(2):822–843. 10.1137/070699895MathSciNetView ArticleGoogle Scholar
- Engel K, Brechmann A, Toennies KD: A two-level dynamic model for the representation and recognition of cortical folding patterns. In 19th IEEE Intl Conf Image Processing ICIP2005: 11–14 September 2005. Genoa. IEEE Press; 2005:297–300.Google Scholar
- Engel K, Toennies KD: Hierarchical vibrations for part-based recognition of complex objects. Pattern Recognition 2010,43(8):2681–2691. 10.1016/j.patcog.2010.02.009View ArticleGoogle Scholar
- Rak M, Engel K, Toennies KD: Closed-form hierarchical finite element models for part-based object detection. In Vision, Modeling, and Visualization 2013: 11–13 September 2013; Lugano. Edited by: Bronstein M, Favre J, Hormann K. Eurographics Association; 2013.Google Scholar
- Engel K, Toennies KD: Segmentation of the midbrain in transcranial sonographies using a two-component deformable model. Annals of the BMVA 2009,2009(4):1–13.Google Scholar
- Engel K, Toennies KD, Brechmann A: Part-based localisation and segmentation of landmark-related auditory cortical regions. Pattern Recognition 2011, 44: 2017–2033. 10.1016/j.patcog.2010.09.004View ArticleGoogle Scholar
- Fischl B, Sereno MI, Dale AM: Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. Neuroimage 1999,9(2):195–207. 10.1006/nimg.1998.0396View ArticleGoogle Scholar
- Tailarach J, Tournoux P: A Coplanar Stereotaxic Atlas of the Human Brain: An Approach to Medical Cerebral Imaging. Thieme, New York; 1988.Google Scholar
- Berg D, Jabs B, Merschdorf U, Beckmann H, Becker G: Echogenicity of substantia nigra determined by transcranial ultrasound correlates with severity of parkinsonian symptoms induced by neuroleptic therapy. Biol Psychiatry 2001,50(6):463–467. 10.1016/S0006-3223(01)01190-8View ArticleGoogle Scholar
- Hishida H, Suzuki H, Michikawa T, Ohtake Y, Oota S: CT image segmentation using FEM with optimized boundary condition. PLoS One 2012,7(2):e31116. 10.1371/journal.pone.0031116View ArticleGoogle Scholar
- Lobregt S, Viergever MA: A discrete dynamic contour model. IEEE Trans Medical Imaging 1995,14(1):12–24. 10.1109/42.370398View ArticleGoogle Scholar
- Völzke H, Alte D, Schmidt CO, Radke D, Lorbeer R, Friedrich N, Aumann N, Lau K, Piontek M, Born G, Havemann C, Ittermann T, Schipf S, Haring R, Baumeister SE, Wallaschofski H, Nauck M, Frick S, Arnold A, Jünger M, Mayerle J, Kraft M, Lerch MM, Dörr M, Reffelmann T, Empen K, Felix SB, Obst A, Koch B, Gläser S, et al.: Cohort profile: The study of health in Pomerania. Intl J Epidemiology 2011,40(2):294–307. 10.1093/ije/dyp394View ArticleGoogle Scholar
- Huang SH, Chu YH, Lai SH, Novak CL: Learning-based vertebra detection and iterative normalized-cut segmentation for spinal MRI. IEEE Trans Medical Imaging 2009,28(10):1595–1605.View ArticleGoogle Scholar
- Klemm P, Lawonn K, Rak M, Preim B, Tönnies KD, Hegenscheid K, Völzke H, Oeltze S: Visualization and analysis of lumbar spine canal variability in cohort study data. In Vision, Modeling, and Visualization 2013: 11–13 September 2013; Lugano. Edited by: Bronstein M, Favre J, Hormann K. Eurographics Association; 2013.Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.