Radiotherapy dose distribution prediction for breast cancer using deformable image registration

Background Radiotherapy treatment planning dose prediction can be used to ensure plan quality and guide automatic plan. One of the dose prediction methods is incorporating historical treatment planning data into algorithms to estimate the dose–volume histogram (DVH) of organ for new patients. Although DVH is used extensively in treatment plan quality and radiotherapy prognosis evaluation, three-dimensional dose distribution can describe the radiation effects more explicitly. The purpose of this retrospective study was to predict the dose distribution of breast cancer radiotherapy by means of deformable registration into atlas images with historical treatment planning data that were considered to achieve expert level. The atlas cohort comprised 20 patients with left-sided breast cancer, previously treated by volumetric-modulated arc radiotherapy. The registration-based prediction technique was applied to 20 patients outside the atlas cohort. This study evaluated and compared three different approaches: registration to the most similar image from a dataset of individual atlas images (SIM), registration to all images from a database of individual atlas images with the average method (WEI_A), and the weighted method (WEI_F). The dose prediction performance of each strategy was quantified using nine metrics, including the region of interest dose error, 80% and 100% prescription area dice similarity coefficients (DSCs), and γ metrics. A Friedman test and a nonparametric exact Wilcoxon signed rank test were performed to compare the differences among groups. The clinical doses of all cases served as the gold standard. Results The WEI_F method could achieve superior dose prediction results to those of WEI_A. WEI_F outperformed SIM in the organ-at-risk mean absolute difference (MAD). When using the WEI_F method, the MAD values for the ipsilateral lung, heart, and whole lung were 197.9 ± 42.9, 166 ± 55.1, 122.3 ± 25.5, and 55.3 ± 42.2 cGy, respectively. Moreover, SIM exhibited superior prediction in the DSC and γ metrics. When using the SIM method, the means of the 80% and 100% prescription area DSCs, 33γ metric, and 55γ metric were 0.85 ± 0.05, 0.84 ± 0.05, 0.64 ± 0.13, and 0.84 ± 0.10, respectively. The plan target volume and spinal cord MAD when using SIM and WEI were 235.6 ± 158.4 cGy versus 227.4 ± 144.0 cGy (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p > 0.05$$\end{document}p>0.05) and 61.4 ± 44.9 cGy versus 55.3 ± 42.2 cGy (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p > 0.05$$\end{document}p>0.05), respectively. Conclusions A predicted dose distribution that approximated the clinical plan could be generated using the methods presented in this study.


Background
Breast cancer is the most common malignancy in women, ranking first in female morbidity and mortality [1]. The main treatment method of early-stage breast cancer is multidisciplinary comprehensive treatment consisting of surgery, chemotherapy and radiotherapy [2,3]. Regarded as a standard treatment method for early-stage breast cancer, total breast irradiation after breast-conserving surgery can effectively improve local control rate and long-term survival rate [4].
Although two-dimensional tangent irradiation and three-dimensional conformal radiotherapy have been confirmed appropriate therapy for breast cancer, new techniques are developed to achieve better target coverage and normal tissues spare. New irradiation methods, intensity-modulated radiotherapy (IMRT) and volumetric-modulated arc therapy (VMAT) have been introduced. Compared to former techniques, the IMRT and VMAT can deliver a more uniform and conformity dose distribution inside the target and a superior protection of organs of interest (OARs) [5,6]. The quality of the treatment plan depends on the algorithms of the treatment planning system (TPS), the delivery techniques, and in particular, the experience and skill of the planner [7].
Traditionally, the criteria of the dose limitation and target coverage of normal tissues have followed the Radiation Therapy Oncology Group guidelines, which use a set of uniform standards for patients with same-sited tumors. However, owing to the large anatomy variations among individuals, while uniform standards may be suitable for several patients, they are always either easily exceeded or unachievable for others. Planners generally require sufficient trail-and-error time to avoid the risk of degraded plan quality and to generate an optimal plan. Moreover, peer review is recommended to ensure the consistency of the plan quality among different planners and institutions [8]. Although such procedures can effectively improve the quality of radiotherapy, with the development of new technology, such as remote radiotherapy, a rapid and an automatic individualized evaluation tool is urgently required.
To improve the current situation of patient-specific achievable criteria deficiencies, several studies have focused on incorporating historical treatment planning data that were considered to have achieved expert level into algorithms to estimate the organ dose for new patients [9][10][11][12][13][14][15]. These researches explain that predicted dose values could be used to estimate the plan quality and guide-automated treatment planning. However, most of those studies were devoted to predicting the dose-volume histogram (DVH), which is used extensively in treatment plan quality and radiotherapy prognosis evaluation, but remains limited. At present, oncologists and physicists are attempting to describe the radiation effects more explicitly, using models related to three-dimensional (3D) dose distribution geometry, such as normal tissue complication probability [16] and 3D cluster formation [17].
In recent years, several studies have been concerned with the prediction of dose distribution from patient computed tomography (CT) images and regions of interest (ROIs) [18][19][20][21][22][23][24]. Shiraishi and Moore used several voxel parameters that indicate the volume and location of ROIs to train artificial neural networks for prostate and stereotactic radiosurgery/radiotherapy cases, and predicted dose distributions at the area nearby plan target volume (PTV) boundary [18]. With the fully convolutional network, contoured voxel matrix and historical dose matrix were input into the network to train the prediction models, the features were selected and extracted automatically by networks. The deep learning-based methods are commonly used to predict 3D dose distribution for prostate, head and lung cancer radiotherapy [19,20,22].
With a well-trained model, deep learning-based methods could predict dose prediction in seconds, but the training process takes a lot of time and needs amount of data. Moreover, the interpretation deficiency of end-to-end deep learning network obstructs its application in clinic. Compared with deep learning-based methods, atlas-based methods could be implemented with less prepared cases, which was more flexible for institutions to build their own atlas dataset for each treatment site. The atlas-based approach has been successfully used in automatic medical image segmentation [25]. McIntosh and Purdie used an atlas-based method to transfer dose distribution from historical plans to a new patient for three sites containing whole breast IMRT, but such methods have not been extensively investigated [19]. Yoganathan and Zhang further elaborate the 3D dose prediction model based on atlas and deformable image registration (DIR) for VMAT breast and prostate cancer patients [21]. They arbitrarily chosen a reference patient from dataset and registered other patients' CT to the chosen one. A test patient was registered to the reference one and the deform vector filed was used to determine the similarity weight between new patient and atlas patients. Thereafter, the predicted dose was obtained using weighted summation of deformed atlas doses.
The purpose of this work was to further investigate the atlas-based dose prediction method for breast cancer. We used three variation atlas-selection models to predict the 3D dose distribution for breast patients, and the predicted results were compared with the clinical plans. The 3D moment invariants (3DMI), which are often used in visual pattern recognition, were first used to select atlas images in this study.

Feature selection
There were a total of 190 CT volume image pairs for all 20 atlas cases. In each pair, one image was defined as the atlas image in turn, and the other was defined as the new image, so that 380 groups could be used for statistical analysis. The predicted dose distribution for the new image was calculated from the atlas image and compared to the clinical plan dose. The results of the Spearman's rank correlation between the geometric features and dosimetric errors are listed in Table 1.
As all 11 features were significantly correlated with most of the dose prediction result measures, none of these was excluded for further analysis. Thereafter, the features were tested using the KMO metric and Bartlett's sphericity test, which yielded results of 0.734 and p = 0.000, respectively, demonstrating that these 11 features exhibited correlation, and factor analysis could be used to decrease the feature number. The number of variables used to characterize the patient geometry was reduced from 11 to 4 by applying factor analysis. Using the Kaiser eigenvalues criterion, four factors were extracted, with eigenvalues of 4.124, 1.843, 1.422, and 1.022. These factors collectively explained 76.47%   19:39 of the variance in all 11 original features. The results of the feature factor analysis are listed in Table 2.
Considering that the features with component values larger than 0.7 were the main components for each factor, F 1 was mainly composed of J 1_diff , J 2_diff , and B 4_diff . As the 3DMIs of the three features represented the PTV shape, F 1 could be interpreted as a PTV shape factor. F 2 was mainly composed of the ipsilateral lung OVH MSD and B 3_diff , and the factor represented the spatial relationship between the PTV and ipsilateral lung. As the main components of F 3 were the PTV length difference and P ovz , this factor could be explained as a PTV consistency factor on the z-axis. The main components of F 4 were the image similarity metric and PTV DSC, in that this factor could be explained as a * Represents significant correlation  global consistency factor. The four factors were then combined into a comprehensive score using Eq. (9).

Dose prediction validation
Three dose prediction algorithms were evaluated. The atlas-selection method was examined using similarity selection (SIM) and all images with a weighted combination (WEI). For the similarity selection, the comprehensive score F was used as the selection gauge. For all images with weighted combinations, the weight was examined using the average weight (WEI_A) and metric weight by means of Eq. (12) (WEI_F). The quantitative MAD results are presented in Table 3. All of the dose error metrics exhibited significant differences among the groups of prediction methods, except for the MAD of the spinal cord. The results of the post hoc analysis of the Wilcoxon signed rank test for the other dose error metrics are presented in Table 3. The results demonstrated that using the WEI_F method could achieve superior dose prediction results to those of WEI_A. Compared to WEI_F, the SIM method exhibited inferior performance in the ipsilateral lung, heart, and lung MADs, but superior performance in the 0.8 DSC, 33γ, and 55γ.

Discussion
Atlas selection and image registration have been successfully applied in automatic image segmentation, providing highly accurate results [26,27]. The new segmentation can be transformed from atlas segmentation by means of the transformation generated from the registration between the new and atlas images. This method is dependent on the strong correlation between medical images and organ segmentation. In radiotherapy, two patients with identical medical images should intuitively be treated with the same plan, and the dose distribution for a special patient depends on the relationship of the target-normal tissue position. However, unlike in image segmentation, many influencing factors exist from the image to planning the dose distribution; for example, the delivery technique, beam orientation, dose computation algorithm, optimized algorithm, and skill of the planner. To limit these influencing factors, patients enrolled in this study were treated with the same delivery technique (VMAT), dose computation algorithm, and optimized algorithm in identical TPSs. Under these circumstances, the most uncertain factors affecting the dose distribution were the planner experience and skill, containing the beam angles, and optimizing the parameter selection. Although the correlation between the medical image and dose distribution was more complex and less direct, the feasibility of using DIR to generate patient-specific dose distributions for radiotherapy was validated in this study.
In this study, three atlas-selection methods were tested for 3D dose prediction. A total of 11 objective similarity metrics for different patients constituted a comprehensive score F , which was used in WEI_F to reduce the probabilistic atlas to a deterministic atlas. The results of WEI_F were superior to those of WEI_A, demonstrating that the F calculation method used in this study was reasonable, and the atlas image that is more similar to the new image should receive a larger weight. WEI_F outperformed SIM in the OAR MADs, with the exception of the spinal cord, while SIM could provide superior prediction in the DSC and γ metrics. The OAR MAD evaluated the area outside the PTV, while the DSC and γ metrics focused more on the high-dose area, which is near and inside the PTV boundary. As the SIM method only used the information of the most similar single atlas images, and WEI_F considered all of the cases in the dataset, the results implied that the dose distribution in the area of the dose higher than 80% of the prescription dose was more strongly related to the patient-specific geometry than the lower dose area. This is reasonable because, in a VMAT plan, dose drops rapidly in the region near the PTV margin, which causes the dose distribution to vary substantially among individual patients. Moreover, our dose prediction methods registered images with PTV constraints, causing the algorithm to focus more on the high-dose area. The dose distribution usually exhibits a steep gradient in the area near the PTV boundary, and becomes flat with the voxels far from the PTV. The MAD of the spinal cord indicated no significant difference among the three methods. Because the spinal cord has a greater distance to the PTV than the other organs investigated in this study, the dose distribution in the spinal cord is more flat and less individual for different patients, and the atlas-selection methods have little impact on the spinal cord dose prediction. Figure 1 presents an example of the comparison of the clinical plan and predicted dose distributions with the corresponding DVH. As WEI_A exhibited inferior performance to WEI_F for all of the evaluation metrics, the WEI_A results were excluded from further discussion. In this case, the SIM and WEI_F methods could provide a dose distribution that approximated that of the clinical plan. The number and quality of cases in atlas database were the crucial factors for predicted results. Constructing an atlas database containing variation images would be helpful for obtaining improved prediction results. To enhance the prediction model capability and obtain more reliable validation results, our future work will focus on collecting more patient cases and investigating the optimal number for the atlas database. Expanding the database could increase the accuracy of the framework with the increased patient diversity.
The general findings of this study are consistent with previous literatures. The summary of breast cancer radiotherapy dose prediction results is shown in Table 4. Although the different patient datasets and treatment protocols that exist prevented direct comparisons with our study, we demonstrated the feasibility of accurately predicting the dose distributions for breast cancer treatment with purposed methods.
To the best of the authors' knowledge, this is the first study on DIR-based dose prediction using 3DMI. Compared with structural similarity index used by Yoganathan and Zhang [21], which was calculated over a 2D slice and was averaged to obtain a 3D metric, the 3DMI could quantify the shape of 3D structure directly and accurately. Beside of the multi-atlas method used in previous studies, our study evaluates the performance of single-atlas method in dose prediction. The results demonstrated the single-atlas performed superior in high-dose area than multi-atlas method.
The 40 involved plans were made by different dosimetrists. Since all plans were satisfying the institutional critical and reviewed by two senior dosimetrists and one oncologist, the variation between planers was supposed few and ignored in the prediction result evaluation. However, for a specific case, subjective fluctuations remained owing to the preferences of each planner. Several different plans could be all considered as achieving expert level. These plans were a series of Pareto optimized plans, which could be composed into the Pareto front [28][29][30]. Therefore, in certain cases, although the predicted dose distribution exhibited errors compared to the clinical plan, it could be acceptable in clinical applications. The feasibility of generating a series of dose distributions on the Pareto front with DIR will be tested and validated in our future work.
There are several limitations to this study. First, the methods were applied on leftbreast cancer only and the validation dataset was limited. The next phase of our study will involve more validation cases, other sites and multiform delivery techniques. Second, although the predicted results were close to the clinical accepted plans, it could not be guaranteed as the best achievable. Further validations were needed to determine whether the predicted results could be used for plan quality control. Third, the

Table 4 Summary of breast cancer radiotherapy dose prediction results
The better results were printed in italics. Notice that the results were based on different patient datasets and treatment protocols

McIntosh and Purdie [19] Yoganathan and Zhang [21]
Our's present study has not tested whether the predicted dose distribution could be transformed to treatment parameters and be used to generate clinical plan. This work is a foundation for further study, and related automatic plan research would be conducted in our future work.

Conclusions
This study investigates the atlas-based dose prediction method for breast cancer treated using VMAT. The 3DMI were first used to select atlas images for dose prediction in proposed method. The method has presented an achievable dose distribution prediction framework based on DIR. With different atlas-selection approaches, SIM exhibited superior prediction in target region while WEI_F outperformed other methods in OARs. Constructing an atlas database containing variation images is helpful for obtaining improved prediction results. The achievable dose distributions that were obtained from proposed prediction framework provided an effective foundation for treatment plan quality assurance, and for guiding automatic planning to reduce the planning time.

Methods
The purpose of this retrospective study was to predict the achievable radiotherapy dose distribution for early-stage left-sided breast cancer patients. The proposed dose prediction framework included three major steps: database building, atlas selection, and deformable registration. In the first step, a dataset was constructed with historical cases containing patient CT images and expert plans. Thereafter, an initial alignment of the new images and all images in the dataset was achieved using a rigid registration method. Finally, the atlas image was selected from the dataset and registered to a new image, and the registration displacement could be used to generate the predicted dose distribution for the new patient.

Database
A total of 40 randomly selected patients with left-sided breast cancer (stage T1M0N0), previously treated using VMAT at Zhejiang Cancer Hospital (Zhejiang, China) from 2016 to 2018, were enrolled in this study. Clinical treatment plans were generated for all patients using the RayStation (RaySearch Laboratories, Stockholm, Sweden) TPS, with 6 MV X-rays on a Trilogy Linear Accelerator (Varian Medical Systems, Palo Alto, USA). The prescription dose was 5000 cGy (200 cGy/fraction); the oncologist delineated the clinical target volume (CTV), PTV, heart, left lung, whole lung, and spinal cord. The CTV encompassed the visible breast tissue and tumor bed. The PTV was constructed by adding a 10-mm margin to the CTV for all plans. All PTVs were clipped 5 mm from the skin surface. The VMAT plans were delivered using double arcs with a gantry spacing of 4° between control points, and the beam range was adapted to the patient-specific situation. The basic characteristics of patients, contours, and plans are listed in Table 5. All plans were optimized further using a trial-and-error process to achieve optimal sparing of normal tissues. Two experienced dosimetrists and one senior oncologist at Zhejiang Cancer Hospital reviewed the plans. Normal tissues in all plans conformed the as low as achievable principle. In this study, 20 cases were chosen randomly and comprised the library of previously planned cases (atlases). Other 20 cases were used as a test set for validation. The data involved in this study were categorized into three 3D matrices: CT image, ROI labels, and dose distribution. It was achieved using functions in the Sim-pleITK of python [31,32].

Image registration
In a given image registration frame, a fixed image was defined as the object that was assumed to be static, while a floating image was defined as the object that would be transformed to be superimposed onto the target image. The matrices of the fixed and floating images were denoted by ⇀ I fixed and ⇀ I floating in a particular registration. The registration algorithm used in this study consisted of three major steps. First, rigid registration was used to bring two images into global correspondence. Thereafter, based on the linear transformation result of the rigid registration, the reference image was deformed with nonlinear transformation to achieve local correspondence to the target image. Finally, as the PTV received the greatest optimization weight during the treatment plan design, the floating image would be further deformed using PTV as a task-specific constraint.
In the first step, initial alignment of the images was achieved using a similarity 3D transform rigid registration method, whereby a rotation (with three angles), a translation (with three vectors), and isotropic scaling (with one factor) were applied to the space. The mean squared deviation (MSD) criterion based on image gray value was used as an optimization metric [33]. MSD is zero in case of the floating image exactly coinciding with the fixed image. The rigid transformation was denoted by f R , and f R ⇀ I floating was used as the initial condition for the deformable registration in the following step.
Based on the rigid registration, local image deformation was achieved using the Demons deformable registration, which is a widely used image registration method in radiotherapy [34]. The deformable transformation was denoted by f D . In addition to the image intensity, the PTV region was considered as a constraint area by means of further optimization. The delineated PTVs of the floating and fixed images were extracted as region masks. In the mask images, voxels belonging to PTV have value 1 while other voxels have value 0. A further transformation f P was generated through could be obtained by f P f D f R ⇀ I floating . All the above methods were implemented using the library from the SimpleITK system [31,32].

Geometry features
Similarity metrics between two patients were required to select the atlas image. In total, 11 features were extracted to characterize the different scales of the patient geometry, which are described in the following text. (1) The MSD was used as the image similarity metric, which described the global difference between two images. (2) The dice similarity coefficient (DSC) of two PTVs was used to describe the PTV similarity. The DSC was defined as where V PTV 1 is the PTV area in the atlas image and V PTV 2 is the PTV area in the new image. The DSC was calculated after rigid alignment between two images; DSC = 1 when PTV1 and PTV2 coincided with one another, and DSC = 0 when the two areas did not intersect. (3) The overlap volume histogram (OVH) is a general, sophisticated, and robust shape relationship descriptor associated with an OAR, measuring its proximity to a target [9]. The OVH value represents the percentage of the OAR volume that overlaps with a uniformly expended target. As the dose constraints of the ipsilateral lung and heart are the most important factors that affect the results of left-breast tangent VMAT plans, the MSD of the left lung-PTV OVH (MSD OVH ) and (4) MSD OVH of the heart-PTV were considered as features to describe the geometric relationship of the OARs and PTVs. The MSD OVH was calculated using where O1 i% and O2 i% are the overlap values at i % volume of the atlas case and the new cases.
In coplanar treatment, the jaws in the head-foot direction can block most of the radiation, so the dose distribution gradient is usually greater in the head-foot direction than in the other directions. Therefore, the relative positions between PTVs in the new image and atlas image in the head-foot direction would have a substantial impact on the dose prediction. A similar concept was also used in the DVH prediction [13][14][15]. Two features were used to describe the PTV relative positions in the head-foot direction: (5) the length difference between two images and (6) the z-axis (head-foot direction) overlap metric. The z-axis overlap metric was introduced as follows: where Z PTV1 and PTV2 is the z-axis overlap length of two PTVs, and Z PTV1 and Z PTV2 are the z-axis lengths of two PTVs. Similarly to feature (2), feature (6) was calculated following rigid alignment between two images, and PTV2 in Eq. (3) was affine transformed from the original image during the rigid alignment. The overlap in the z-axis is illustrated in Fig. 2. Equations (7) to (11): 3DMIs were used to quantify the spatial characteristics in the shape of PTVs. 3DMIs are mathematical shape descriptors that are designed to be invariant to scaling, translation, and rotation [35,36]. 3DMIs are combinations of terms describing the variance, skewness, and kurtosis of a distribution, and are often used in visual pattern recognition. For a 3D distribution, in this study a PTV mask image, the moments of order n = p + q + r are given by where x, y, z are the spatial coordinates of each voxel. f x, y, z = 1 when x, y, z inside the target area and f x, y, z = 0 when x, y, z outside the target area. The first-order moments can be used to find the centroid coordinates of the object in each direction as follows: To obtain invariance to position in the image, central moments are used and are defined as follows:  To eliminate the influence of PTV volume, the moments are commonly normalized by Finally, the normalized central moments need to be combined specifically to obtain invariance to rotation. Here, the definitions of Sadjadi and Hall [37] for three secondorder moments and of Ng et al. [38] for third-and fourth-order moments invariant to rotation were used as follows: The differences of five 3DMIs between two PTVs were considered as features for describing the geometric similarities of target areas.
With the exception of features (2) and (6), all of the values of the features decreased when the two patients were more similar. To maintain the consistency of the numerical change direction, all values of features (2) and (6) were replaced with the reciprocals of the original values.

Feature selection
To maintain simplicity in the atlas-selection method, features without significant correlations with the prediction results could be excluded to select the atlas image. The Spearman's rank correlation was used to evaluate the correlations among the 11 features and dose prediction result measures introduced in Validation study. The statistical analysis was performed within the 20 atlas cases. The correlation significances were assessed by the p value. The correlations with p < 0.05 were considered as significant. Furthermore, factor and correlation analyses were applied to characterize the correlated features with less factors. All the correlated features were tested with Bartlett's sphericity test and the Kaiser-Meyer-Olkin (KMO index). If the KMO measure > 0.5 and the Bartlett's sphericity significance < 0.05, factor analysis would be used to decrease the feature number. Factor analysis is a statistical method that is used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables known as factors. The factor analysis used principal components to extract the maximum variance from the features. To minimize the number of features with high loadings on any given factor, a varimax rotation was utilized, whereby the factors with eigenvalues less than 1 were excluded. A comprehensive score as a linear combination (6)  where i is the number of remaining factors and i represents the eigenvalues of F i . Finally, the quantitative index was used as a gauge for the atlas image selection. All of the mathematical manipulations were performed with SPSS v19 (IBM, Armonk, NY, USA).

Atlas-selection models
For the cases that were already planned, each voxel in the image had a corresponding dose value. These images were referred to as atlas images, while the image to be predicted was referred to as the new image. The atlas image coordinates were mapped by deformable registration onto new images, thereby providing the dose distribution of the latter. The dose distribution matrix of the atlas image was denoted by ⇀ D atlas . The transform function following image registration was f P (f D (f R ())) , while the predicted dose distribution matrix of the new image could be calculated by . During the image registration procedure, the new image acted as the fixed image, while the atlas image acted as the floating image. Two strategies were introduced to generate the predicted dose distribution, as follows:

Most similar image from database (SIM)
The new image was rigidly registered to all the images in the atlas database, and the patient similarity metric factors between the new image and each database image were calculated individually. The image with the maximum factors was considered as the most similar atlas image. Thereafter, the most similar atlas image was used for dose prediction. Figure 3a presents a flowchart of the major steps in the process.

All images from database with weighted dose distribution (WEI)
The new image was deformably registered to the database images, and each registration produced a dose distribution. The set of all registrations was combined into a probabilistic dose that assigned a probability distribution of the dose value to each voxel in the new image. This probabilistic atlas was reduced to a deterministic atlas by assigning the dose that received the weighted average among all database images to each voxel. Figure 3b presents a flowchart of the major steps in the process. The dose value in voxel i was defined as where n is the total number of images in the database, d j is the dose transformed from floating image j, and ω j is the weight factor. Two types of definition methods for ω j were used in this study. In the first, ω j was defined as 1/n , which means that all n atlas images contributed equally to the new dose. In the second, ω j was calculated from the comprehensive score F j , described in Sect. "Feature selection". First, F j was scaled to the range of (0, 1) using the following function: where F max and F min are the maximum and minimum of F j between all atlas images and new images. Because F j decreased as the similarity between two images increased, F min was the comprehensive score for evaluating the similarity between the new image and most similar atlas image. This transformation caused F ′ j to be in the range of (0, 1). For a (11)  given database of atlas images, F ′ j = 1 when j was the most similar atlas image to the new image, and F ′ j = 0 when j was the least similar atlas image. Thereafter, ω j was defined as where n is the total number of atlas images. This definition caused the atlas image that was more similar to the new image to contribute a greater weight when the final result was calculated.

Validation study
The predicted dose distribution was compared to the clinical plan dose for every registration. As one dose measure predicted the quality, the mean absolute difference (MAD) was measured for the PTV, heart, whole lung, ipsilateral lung, and spinal cord. The MAD for a given ROI was defined as where n is the number of voxels in the ROI, and Dp i and D i are the predicted and clinical plan doses for voxel i , respectively.
As a second measure of segmentation quality, the DSC was computed for 80% and 100% of the prescription dose area. For a given dose, the DSC was defined as where V p is the area of the predicted dose that is larger than the given dose, and V m is the area of the clinical plan dose that is larger than the given dose. For perfect prediction, the DSC had a value of 1. A lesser overlap resulted in smaller DSC values.
The 3D gamma analysis metric was also used to measure the dose distribution similarity quantitatively [39,40]. The gamma metric between a predicted dose-to-voxel d p and a clinical plan dose is defined as d m in point r m defined as where r p is a search over a neighborhood of voxels in the predicted dose space d p , r M the spatial distance threshold criterion, and d M the dose difference threshold criterion. The gamma factor between two distributions is the percentage of voxels with γ (r m ) ≤ 1 , which is the percentage of voxels with dose difference less than d M to at least one voxel in a spatial no larger than r M in the predicted dose image. To concentrate on the most crucial area of dose clinically including PTV coverage and dose falloff at the periphery of PTV, the gamma at 80% of prescription dose area was assessed at tolerance levels of r M = 3 mm , d p = 3% (33γ) and r M = 5 mm , d p = 5% (55γ), respectively.