This prototype system is designed to take a cancer patient's head and neck CT images as input and use image registration techniques to produce projections of lymph node regions as output, which can be used to produce a CTV for the radiation treatment plan. The system can be divided into the following major components:
Figure 2 is the flow chart which shows how these components are linked. The offline process on the left creates a database DB of CT scans from prototypical reference patients on which experts have drawn contours that denote the lymph regions. These reference images {d
i
} are segmented offline to extract 3D volumes of landmark anatomical structures such as the mandible and hyoid [5]. 3D meshes and geometric properties of these 3D volumes are also stored in the database.
Image registration
Given a reference model and a target patient's CT data set, we can use image registration methods to align the sets of CT images. Image registration is commonly used in medical imaging applications. It is essentially a process of finding a geometric transformation g between two sets of images, which maps a point x in one image-based coordinate system to g(x) in the other. By assuming the head and neck anatomy has similar characteristics between a specific target patient and a reference subject, we can use image registration methods to transform a region from the reference image set to the target image set.
Mattes and Haynor [36] implemented a multi-resolution non-rigid (deformable) image registration method using B-splines and mutual information. The transformation of a point x = [x, y, z]T in the reference image coordinate system to the test image coordinate system is defined by a 3 × 3- homogeneous rotation matrix R, a 3-element transformation vector T and a deformation term D(x|δ):
where x
C
is the center of the reference volume. A rigid body transformation defined by R and T was first calculated and used as the initial transformation for the deformation process. The deformation term D(x|δ) gives an x-, y-, and z- offset for each given x. Hence the transformation parameter vector μ becomes
The first three parameters γ, θ, ϕ are the roll-pitch-yaw Euler angles of R. The translation vector T is defined by [t
x
, t
y
, t
z
]T. T and R together define the rigid body transformation.
The parameter δ
j
is the set of the deformation coefficients. The deformation was modeled on cubic B-splines [37] because of their computational efficiency (via separability in multidimensional expression), smoothness, and local control. The deformation is defined on a sparse, regular grid of control points λ
j
, each having associated x-, y-, and z-components of the deformation. The resolution
of the deformation determines the spacing of the grid and can be anisotropic. Mattes uses control points on a uniform grid with spacing
where q
x
, q
y
, and q
z
are the dimensions of the reference image.
The deformation at any point x = [x, y, z]Tin the reference image is interpolated using a cubic B-spline convolution:
By displacing the control points, intermediate deformation values are computed by cubic spline interpolation between them.
Because contract enhancement is often used in head and neck CT scans, image intensity range and distribution may vary in different data sets. A mutual information based registration method such as Mattes' can work with images in different range. However, the high anatomical variability in cancer patients' head and neck CT images particularly contributed by the surgical resections, pure voxel intensity based registration such as Mattes' method described in previous section does not always produce satisfactory results. A new method [6] is developed which integrates landmark based information with intensity scheme; hence CT data set need to be preprocessed to extract landmark information.
Segmentation and landmark correspondence
Fully automatic segmentation in the neck region is particularly difficult, because many soft tissue anatomic entities are small in size and similar in density. Furthermore, they can be directly adjacent to each other or only divided by fascial layers that are not visible in CT images. The relative locations between anatomic entities can vary in different axial locations. Little work has been done specifically for the neck images; few exceptions include the work of Krugar et al. [10] who implemented a semi-automatic system to segment neck CT images for pre-operative planning of neck dissections; and the work of Cordes et al. [38] who developed NeckVision system for neck dissection planning. We implemented an automatic segmentation method [5] designed to locate anatomic structures in the neck that are relevant to the lymph node region boundaries including cervical spine, mandible, hyoid, jugular veins, and carotid arteries. This method is motivated by a knowledge-based technique [39] using consists of constraint-based dynamic thresholding, negative shape constraints to rule out infeasible segmentation, and progressive landmarking that takes advantage of the different degrees of certainty of successful identification of each structure.
The drawback of this 2D thresholding approach is the difficulty in determining the optimal threshold, especially in the head and neck region where many structures of similar density are crowded in a tight space. Our proposed method eliminates the need of finding the optimal threshold by combining the 2D thresholding with a 3D active contour procedure (by ITK, www.itk.org). A sub-optimal threshold that only produces partial structure on some axial slices can grow into a 3D volume through 3D active contouring. The 2D regions produced by the knowledge-based dynamic thresholding method can be used as the initial region or "seed" to eliminate the need of user input. In the context of finding relevant landmarks, it is not necessary to fully segment certain structures such as the blood vessels and their branches as we are only interested in sections lateral to the hyoid. This lax requirement circumvents the need of perfect segmentation and optimal active contouring parameters. These selected structures usually have clear contour allowing successful segmentation within the section of axial slices of interests.
The anatomical structures of interests are segmented in the order according to the reliability of successful detection. In addition to domain knowledge of gray tone range, size and shape, each structure is also associated with location relative to other structures that can be found prior to itself. For each structure, we first run the dynamic thresholding process to find 2D regions in axial slices according to domain knowledge. Note that some axial slices may yield successful 2D segmentation results and others do not, we then the 3D active contouring to build a 3D model using the partial 2D segmentation result as the initial seed. Also note that the active contouring may not grow the structure perfectly to its full extent, but sufficient to provide landmark information. This two step process is then repeated for the next structure. 3D surface meshes of those structures are constructed for both the reference and target subjects. The meshes can be used as landmarks to improve the alignment and to measure similarity between the subjects. Figure 3 shows examples of segmentation results on selected subjects.
We choose to use these 3D anatomic structure surfaces as landmarks because virtually no 1D (points) or 2D (lines) anatomic features that can be used as landmarks are defined in the neck. By using Shelton's method [40] of finding surface mesh correspondence, we can estimate correspondence between the surfaces of the landmark anatomic structures of the reference and target subjects. The following energy function for which smaller values indicate better correspondences is defined to evaluate possible correspondence relations:
where C is the function that maps points on surface A to matching points on surface B, α and β are weight parameters, E
sim is the similarity term which measures how closely C matches points on A to points on B, E
str is the structural term that minimize the distortion of surface A, and E
pri is the "prior information" term which ensures C represent a plausible deformation. Initial values of α and β are set to 0.001 and 0.0001 respectively.
Let υ
k
be the set of landmark points sampled from the mandible, hyoid and other surface meshes of the reference image set or surface A, and θ
k be their corresponding locations on surface C(A) or the transformed surface which matches the test image set and minimize the energy function. The deformation ζ at those landmark points is simply
Image registration with landmark correspondence
Given the anatomic structure surface correspondence between the reference and target data, we developed a method to incorporate these landmarks into the image registration process. Instead of initializing the deformations at the control points to zeros or random numbers as in the Mattes method, we can use the landmark correspondence to initialize or adjust the deformations at the control points at each of the multi-resolution stages. The deformation control points are set to a uniform grid
where 0 ≤ l ≤ ρ
x
, 0 ≤ m ≤ ρ
y
, 0 ≤ n ≤ ρ
z
, and the corresponding deformation values D(λ
j
) are either initially set to zero or calculated from the deformation coefficients δ
j
of the previous iteration at a lower resolution of control points as in equation (5).
Given ν
k
and ϖ
k
as sets of corresponding landmarks in the reference and target images described in equation (7), the deformation of each control point that has landmark points in close proximity is modified to the deformation of the closest landmark point as follows
where υ
k
is the closest landmark point to λ
j
in the reference image set, and ζ
k
is the deformation at υ
k
obtained from the surface correspondence in equation (7). A new set of deformation coefficients δ is then set to the spline coefficients of the new grid of deformation values D'(λ). While the new D' might not be initially smooth, the following mutual information registration will mitigate the side effect. Finally the transformation parameter vector μ is input to the optimizer for alignment.
Similarity measurement based on anatomical structure geometry
As we need to compare similarity between anatomic structures from different patients, we do so by measuring the errors between structure surfaces using the 3D Hausdorff distance [41]. Given two surface meshes, S
R
and S
T
, the distance between a point p
R
belonging to S
R
and the mesh S
T
can be defined as follows:
We first align meshes S
R
and S
T
with the Iterative Closest Point (ICP) rigid body registration [42] so they are roughly in the same 3D coordinate. Given the 3D point sets P
R
= {p
i
} containing the n vertices of S
R
, the registration process will produce a transformation matrix T which minimizes the function
The transformed reference mesh T S
R
consists of vertices {T
pi
}, and the Hausdorff distance between T S
R
and S
T
is given by
A database DB of CT scans is created from prototypical reference subjects on which experts have drawn "ground truth" contours that denote the lymph node regions. These reference data sets {d
i
} are segmented offline to extract 3D volumes of identifiable structures, including the mandible, hyoid, jugular veins and the outer body contour that are relevant to the boundaries for the lymph node regions [5]. We use these landmark anatomic structures to rapidly produce a distance metric between a target CT scan as query Q and each data set d in DB. The feature vectors that we use to compare two CT scans include three kinds of features: 1) simple numeric 3D regional properties of these structures, such as volume and extents, 2) vector properties or the relative location between structures and 3) shape properties or the surface meshes of these structures. The feature vector consists of the following properties,
-
volume and extents of the overall head and neck region,
-
surface meshes of the mandible and outer body contour,
-
3D centroid difference vector between mandible and hyoid,
-
2D centroid difference vectors between hyoid and jugular veins, and between hyoid and spinal cord on the axial slice at the centroid of the hyoid,
-
normalized centroid locations of the hyoid and the mandible within the region.
Given feature vectors F
d
and F
Q
for model d and query Q in the feature vector space RN, the following weighed Euclidean distance is used as the distance measure:
where w
i
is the weight parameter, d
i
is the distance function for feature i,
d
h
is the Hausdorff distance defined in equation (12), and T is the ICP registration transformation matrix. The weight parameters range from 10 to 0.1 from heavy to light in the order of: 1) hyoid locations (normalized and relative to other structures), 2) mandible distance (between two models), 3) vertical distances between skull base, mandible and, hyoid, 4) head and neck outer contour and volume, and so on.
The distance between mandible meshes of two subjects is one major discriminating feature of the proposed distance measure. Figure 4 shows the measurement of point to surface distance d as in Equation (10), from which the directional Hausdorff distance d
h
between the reference mandible surface mesh to the target mesh is derived. The mesh on the left with shading indicates the distance from a given point on the surface to the mesh on the right.