Skip to main content

Sharp loss: a new loss function for radiotherapy dose prediction based on fully convolutional networks



Neural-network methods have been widely used for the prediction of dose distributions in radiotherapy. However, the prediction accuracy of existing methods may be degraded by the problem of dose imbalance. In this work, a new loss function is proposed to alleviate the dose imbalance and achieve more accurate prediction results. The U-Net architecture was employed to build a prediction model. Our study involved a total of 110 patients with left-breast cancer, who were previously treated by volumetric-modulated arc radiotherapy. The patient dataset was divided into training and test subsets of 100 and 10 cases, respectively. We proposed a novel ‘sharp loss’ function, and a parameter γ was used to adjust the loss properties. The mean square error (MSE) loss and the sharp loss with different γ values were tested and compared using the Wilcoxon signed-rank test.


The sharp loss achieved superior dose prediction results compared to those of the MSE loss. The best performance with the MSE loss and the sharp loss was obtained when the parameter γ was set to 100. Specifically, the mean absolute difference values for the planning target volume were 318.87 ± 30.23 for the MSE loss versus 144.15 ± 16.27 for the sharp loss with γ = 100 (p < 0.05). The corresponding values for the ipsilateral lung, the heart, the contralateral lung, and the spinal cord were 278.99 ± 51.68 versus 198.75 ± 61.38 (p < 0.05), 216.99 ± 44.13 versus 144.86 ± 43.98 (p < 0.05), 125.96 ± 66.76 versus 111.86 ± 47.19 (p > 0.05), and 194.30 ± 14.51 versus 168.58 ± 25.97 (p < 0.05), respectively.


The sharp loss function could significantly improve the accuracy of radiotherapy dose prediction.


One of the goals of radiotherapy planning is to protect normal tissue as much as possible while delivering sufficient doses to the tumor. Because different organs have different sensitivity to radiation, their dose constraints are also different. A key problem in external radiotherapy planning is to judge whether the selected plan achieves the optimal dose distribution while minimizing the adverse effects on the organs at risk (OAR). In precision radiotherapy, personalized schemes should be established for evaluating the planning quality. Automated radiotherapy planning and quality control have been commonly based on integrating and summarizing historical data of expert-level treatment plans as well as building models to predict reasonable and achievable dosimetric indicators for new cases [1, 2]. Numerous models have been developed for predicting achievable OAR constraints [2,3,4,5,6,7,8,9] and dose–volume histograms (DVH) [10,11,12].

Because a dose distribution is generally a better descriptor of radio-therapeutic prognosis than a single DVH index, more recently, dose distribution prediction from computed tomography (CT) images and regions of interest (ROI) has been the focus of several studies [13,14,15,16,17,18,19,20,21,22,23,24]. Also, deep learning methods have been recently proposed for predicting 3D dose distributions. Prominent examples of these methods utilize enhanced variants of the traditional convolutional neural networks (CNN), namely, the fully convolutional networks (FCN). A FCN is obtained through replacing the last fully connected layer in a CNN by a convolutional layer. The FCN architecture has been already used for medical image segmentation [25,26,27,28], and it has demonstrated outstanding performance in radiotherapy dose prediction for a variety of cancer types, including nasopharyngeal cancer, lung cancer, prostate cancer and breast cancer [16, 19,20,21,22,23].

In radiotherapy, the region of clinical concern usually accounts for only a small part of the whole imaged region. This data imbalance can lead to a decrease in the accuracy of a dose prediction model, because the model would then tend to make dose predictions closely matching the dosage values for the clinically irrelevant regions. The problem of data imbalance in deep learning has been previously addressed and successfully alleviated through special loss functions such as the focal loss designed for dense object detection and the dice loss employed in natural language processing tasks [29, 30]. Further studies have been conducted to boost the dose prediction accuracy in radiotherapy by network structure adjustment, network parameter optimization and adding beam influence factors to network inputs [16, 19,20,21]. However, loss function design for dose prediction in radiotherapy has been generally overlooked. Inspired by earlier performance improvements based on the focal and dice loss functions, we propose a novel ‘sharp loss’ function for better handling of data imbalance problems, and incorporate this loss in an FCN framework for dose prediction in radiotherapy. The sharp loss is a dynamically scaled variant of the classical mean-squared-error (MSE) loss, where the scaling factor decreases in low-dose regions. The effectiveness of this proposed loss function is demonstrated using a U-Net architecture for dose distribution prediction in radiotherapy of left-sided breast cancer.


Figure 1 shows the training and validation loss curves for the MSE loss and the sharp loss with different values of \(\gamma\). The U-Net architecture equipped with the sharp loss achieved smaller loss than the same architecture with the MSE loss. In the first 30 epochs, the validation loss with the sharp loss function and factors of \(\gamma = 50\) and \(\gamma = 100\) show more variations between the cross-validation folds than those of the other loss function variants. However, all the training and validation loss curves tended to be flat after 32 epochs. This indicates that the U-Net model has good generalization for dose prediction.

Fig. 1

Dose prediction performance with different loss functions: the training loss (dotted) and validation loss (solid) curves for the MSE loss and the sharp loss with different values of γ

Table 1 compares the mean absolute difference (MAD) values of different dose regions in the test data for the MSE loss and the sharp loss with different values of γ. Clearly, dose prediction results based on the MSE loss achieve small MAD values in all regions, but the low-error voxels are concentrated outside the body. Inside the body, the MAD values are large for high ground-truth dose values. The standard deviation (std) explain the performance of the prediction model in different dose area. The smaller the std, the better the generalization performance of the model in different dose regions. The std of the prediction error inside the body was 312.57 cGy with the MSE loss. This standard deviation becomes smaller when the sharp loss is used with γ = 1, 25, 50 or 100. In particular, the standard deviation has the smallest value for γ = 100. For γ < 100, the prediction precision inside the body increases as the value of γ becomes larger, and the sharp loss prediction results were better than those of the MSE loss. However, when γ > 100, the precision deteriorates again as the value of γ increased.

Table 1 The MAD values (cGy) of different dose regions in the test data for the MSE and sharp loss functions

Table 2 shows the MAD values of different ROI types for the MSE and sharp loss functions. The sharp loss function with γ = 100 shows the best performance for the planning target volumes (PTV), the ipsilateral lung, the heart and the contralateral lung. For the spinal cord, similar prediction errors are obtained using the sharp loss function with γ = 1, 25, 50 and 100. These errors are lower than those resulting from other loss variants.

Table 2 The MAD values (cGy) for each ROI type in the test data for the MSE and sharp loss functions

Table 3 shows the MAD values for different ROI types of the test data, along with statistical significance measures of the differences between the results obtained with the MSE loss and the sharp loss with γ = 100. Significant statistical differences were found for all ROI types, except for the case of the contralateral lung. The results demonstrated that using the sharp loss (γ = 100) leads to superior dose prediction results in comparison to the MSE loss.

Table 3 The MAD values (cGy) for different ROI types and the p-values of the statistical differences between the testing results obtained with the MSE and sharp loss functions

Figure 2 shows boxplots of the MAD values obtained for each ROI type of the test data. This figure indicates that the prediction results obtained with the sharp loss at γ = 100 are more accurate and stable than the results obtained by other loss variants.

Fig. 2

Boxplots of the MAD values obtained for each ROI type of the test data


This study demonstrated the effectiveness of the proposed sharp loss function for dose prediction in radiotherapy. As Fig. 3 shows, the sharp loss function can be divided into two stages. A ground-truth value \(D_{0}\) was defined as the demarcation point of these two stages, i.e.,

$$\begin{gathered} {\text{when}}\;D_{i} = D_{0} , \hfill \\ {\text{sharp loss}}\;(D_{i} ) = 0.99 \times {\text{MSE loss}}{.} \hfill \\ \end{gathered}$$
Fig. 3

The MSE loss function and the sharp loss function for γ = 1, 25, 50, 100, 250, and 500. For a fixed error of 0.1 in the predicted dose value, the loss value changed for different variants of the sharp loss function in the low-dose area. A voxel value 1.0 corresponds to 6000 cGy

When \(D_{i} \left[ {0,D_{0} } \right)\), the value of the sharp loss decreases with the decrease in dose. When \(D_{i} \left[ {D_{0} ,6000} \right]\), the value of the sharp loss is approximately equal to the MSE loss. The γ factor affects the loss function in two ways. First, this factor influences the value of \(D_{0}\). Second, the γ factor affects the slope in the range \(\left[ {0,D_{0} } \right)\). As the γ value increases, the loss of prediction error decreases in low-dose regions.

The γ factor was experimentally fine-tuned and its best value was found to be 100. When γ = 100, \(D_{0}\) = 455.71 cGy, which means that the sharp loss function would enable the U-Net model to pay more attention to the regions where the dose is larger than 455.71 cGy. With γ = 100 and a ground-truth dose \(Dp_{i} = 0\), the sharp loss = 0.047 the MSE loss. When γ < 100, the \(D_{0}\) value becomes larger than 455.71 cGy, and hence imprecise prediction results may be obtained in the region where the ground-truth dose values fall between 455.71 cGy and \(D_{0}\). When γ > 100, loss values close to 0 would decrease quickly, and this can lead to imprecise prediction results in the region where the ground-truth dose values are between 0 and 455.71 cGy. This phenomenon can be observed in Table 1. The increase of γ reduces the dose prediction accuracy outside the body, but improves the model performance inside the body, especially in the region where the dose is larger than 3000 cGy. However, this effect is not obvious when γ is larger than 100. Since the whole region inside the body is of clinical concern when making a treatment plan and deserves the same level of attention for dose prediction, the standard deviation of the prediction error inside the body was calculated for each loss. The standard deviation has the lowest value when γ = 100. This means that paying sufficient attention to various dose regions inside the body is warranted to ensure balance in the U-Net optimization process.

Table 2 further illustrates the advantages of the proposed sharp loss with γ = 100. For most of the considered organs, this loss achieved the best dose prediction performance. Even though the sharp loss function with γ = 100 shows the lowest standard deviation of the MAD prediction error, the prediction results still vary across different regions. The largest MAD values occur in the regions where the dose values are between 2000 and 3000 cGy, while the smallest MAD values are observed in the regions with dose values smaller than 500 cGy or larger than 5000 cGy. These results can be ascribed to two reasons. First, the predicted results are prone to higher errors in regions with high dose gradients. Those regions usually have doses between 2000 and 5000 cGy. The average dose gradient is shown in the last column of Table 2. Second, a patient case can be associated with different expertise dose distributions. Therefore, even when the predicted dose values satisfy the optimal solution, there could still be some differences between the predicted and ground-truth dose values.

Figure 4 presents a sample comparison of the clinical ground-truth dose distribution and the predicted dose distributions for the MSE and sharp loss functions, with the corresponding DVH. In this case, the prediction result using MSE loss shows a large error in the PTV region. There is a dose deficiency at the margin of PTV, and the predicted doses of some PTV voxels adjacent to the air are close to zero. Obviously, the sharp loss improves the dose distribution prediction in the ROI of the PTV. As discussed above, the sharp loss has a better correction in the areas with large dose gradients, while tumors near the body surface has a great dose drop at the air boundary. The sharp loss may bring greater improvement to the cases that the PTV near the body surface.

Fig. 4

Comparison of the ground-truth and predicted dose distributions, with corresponding DVH, for the MSE loss and the sharp loss with γ = 100

To provide a more realistic and intuitive estimate of the dosimetry, clinically relevant metrics are shown in Table 4. The homogeneity index (HI) is defined as \(\left( {D_{1} - D_{99} } \right)/5000cGy\), and the conformation index (CI) is defined as \(\left( {V_{{{\text{PTV}}}} \cap V_{{5000{\text{cGy}}}} } \right)^{2} /\left( {V_{{{\text{PTV}}}} \times V_{{5000{\text{cGy}}}} } \right)\). The clinical metrics of prediction results obtained with the sharp loss at γ = 100 are more accurate than the traditional MSE loss.

Table 4 The clinical metrics of ground-truth and predicted dose distribution using MSE loss and sharp loss (γ = 100)

The FCN is widely used in radiotherapy dose prediction at present. Kajikawa used CNN-based methods to predict radiotherapy dose of prostate cancer [22]. Chen found that ResNet101 performed better than VGGNet16 in nasopharyngeal cancer [19]. Several studies used the U-Net, which is characterized by U-shaped architecture and up-sampling and down-sampling operations, to train small size medical dataset efficiently [20, 23]. Babier developed a 3D generative adversarial network for dose prediction and demonstrated superior performance on oropharyngeal cancer cases than previous approaches [31]. The above studies enhance the model performance by improving the network structure. Besides, Barragán-Montero improved the prediction results by adding beam configuration into the input data [21]. However, the above methods are not targeted to solve the problems caused by data imbalance.

Data imbalance is a key obstacle that deteriorates performance in tasks of dense object detection or semantic segmentation. Small organ segmentation in medical imaging is particularly quite challenging due to the large imbalance between the object and background classes [32, 33]. Such imbalance also emerges in dose prediction tasks in radiotherapy. Among all voxels of the 10 test cases of this study, about 74.52% of the voxels were outside the body and the corresponding dose values were just 0. Table 1 shows that the MAD value outside the body was much lower than that inside the body (32.93 versus 186.55 cGy) when the MSE loss function was used. This indicates that a large number of the voxels outside the body overwhelms the region inside the body. Even inside the patient body, the voxels whose dose values are lower than 500 cGy accounted for 77.30% of the total voxels. As a result, the MAD value for the region with the dose range 0–500 cGy is the lowest among all regions inside the body, and the high-dose region was ignored. Data augmentation is a common method to alleviate the effects of data imbalance in deep learning. An augmentation algorithm applies multiple types of transformations to the training data in order to increase the proportion of the samples for the weakly represented class. However, data augmentation is only suitable for cases where the input images and the associated ground-truth data are scale-independent, but dose distributions in radiotherapy are dependent on the X-ray penetration depth. Loss function redesign is another way to deal with the problem of data imbalance, and this has been validated by previous studies [29, 30]. In this study, the sharp loss was proposed to solve the data imbalance problem, and this loss has indeed demonstrated improved prediction performance. To the best of the authors’ knowledge, this is the first study that uses such novel loss function for improving the 3D dose prediction performance.

Nevertheless, this study has several limitations. First, the proposed sharp loss function was applied in conjunction with the U-Net architecture only and the data were limited to breast cancer cases. The next phase of this work will involve more types of deep networks as well as cancer types associated with other sites. Second, the recommended sharp loss factor, γ = 100, was selected based only on a limited validation dataset. With different networks or datasets, alternate suitable values of γ could be sought. Because of the different characteristics of radiotherapy planning for different human organs, the universality of the proposed sharp loss function will be investigated and scrutinized in future work. Third, the model was tested on 10 cases, which limited the evaluation of trained model generalization. The dataset will be expanded in the future. Furthermore, since one of the important applications of dose prediction is automatic planning, the related automatic plan research would be conducted in future work.


This study proposes a novel loss function, named sharp loss, for FCN-based dose distribution prediction in radiotherapy. This loss function was first used to improve the outcomes of dose prediction modeling. The sharp loss function has achieved better prediction accuracy than that of the traditional MSE loss. With different values of the non-negative parameter γ greater than or equal to one, the value γ = 100 exhibited prediction results that are superior to those obtained with other values. In addition, the same γ value reduced the imbalance-induced prediction errors in regions of different doses. The sharp loss function could enable the prediction model to focus on the regions of more clinical significance. The improved predictions of dose distributions provide an effective foundation for quality assurance, automation, and efficiency in planning radiotherapy.


Dataset description

In this study, 110 patients in the early stages (Stage I and Stage II) of left-sided breast cancer were involved retrospectively. These patients received radiotherapy in the Zhejiang Cancer Hospital from January 2017 to December 2020. Breast scanning was performed using a GE LightSpeed-RT CT simulator or a Philips large-aperture CT simulator. The CT layer thickness was 5 mm and the pixel matrix had a size of 512 × 512. After scanning, the PTV, heart, ipsilateral lung, contralateral lung, and spinal cord were delineated by senior physicians with 10 years of expertise. No regions of interest overlapped with each other. Each patient received breast-conserving radiotherapy on the left side, where the internal breast region and cervical lymph nodes were not subjected to radiation. Actual treatment was carried out using volumetric-modulated arc therapy (VMAT) techniques with 6-MV double-arc X-ray. Also, the start and stop angles of the X-ray gantry were tangent to the irradiated breast. The prescribed dose was 5000 cGy spread over 25 fractions, and scaled to cover 95% of the PTV for all cases. The ROI delineation and treatment planning were completed using the RayStation radiotherapy planning system (Version 4.5, RaySearch Laboratories AB, Sweden). All plans were optimized further using a trial-and-error process to conform to the as-low-as-reasonably achievable principle for normal tissues. Moreover, all plans were reviewed by two experienced dosimetrists and one senior oncologist.

Data preprocessing

For each patient in the training set, the ROI structures and the corresponding dose distribution were converted from the Digital Imaging and Communications in Medicine (DICOM) format to matrices in Python. In this conversion process, each voxel was assigned a specific value based on the ROI type. Specifically, the voxels in the ROIs of the PTV, the ipsilateral lung, the heart, the contralateral lung, the spinal cord, and other tissues were assigned values of 1.000, 0.833, 0.667, 0.500, 0.333, and 0.167, respectively, whereas voxels outside of the body contour were assigned a value of 0. For each case, every ROI matrix retained 64 slices in the head–foot direction, and all PTV was contained within the 64 slices. In the transverse section, each slice matrix was bilinearly interpolated into a size of 256 × 256 to reduce the computational cost. The dose distribution matrices were derived from the DICOM data, and each dose value was scaled to be in the range [0, 1] through division by 6000 cGy. For each case, the dose matrix was bilinearly interpolated into a size of 64 × 256 × 256, which corresponds to the ROI image matrix size.

Fully convolutional networks

The U-Net FCN architecture was used to predict dose distributions in breast cancer. This choice is motivated by earlier successful applications of this architecture in medical image segmentation as well as dose prediction for head and neck cancer, prostate cancer and breast cancer [20, 22, 23]. The U-Net model used in this study has a total of 10 layers, with a network input of 3-dimensional 64 × 256 × 256 ROI matrices with 1 channel. Each of the first 5 layers is a down-sampling layer that contains two 3 × 3 × 3 convolution modules with valid padding, followed by a rectified linear unit (ReLU), one batch normalization module, and one 2 × 2 × 2 maximum pooling operation. The 6th, 7th, 8th, and 9th layers are up-sampling layers. Each of these layers has one concatenation module, two 3 × 3 × 3 convolution modules with valid padding, followed by a ReLU module, and one batch normalization module. Finally, the 10th layer has a 1 × 1 × 1 convolution module followed by a linear activation function for output, and the output of this layer is the dose distribution matrix. The overall architecture of the proposed network is shown in Fig. 5.

Fig. 5

The architecture of the proposed deep network architecture for radiotherapy dose prediction

Sharp loss

The sharp loss function is introduced in this work to better address the problem of data imbalance for dose distribution prediction in radiotherapy. Indeed, training data for this problem typically show an extreme imbalance between the high-dose volume (> 500 cGy) and the low-dose volume (< 500 cGy). The sharp loss function is derived from the MSE loss:

$${\text{MSE}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {Dp_{i} - D_{i} } \right)^{2} ,$$

where \(n\) was the total number of voxels, \(Dp_{i}\) is the predicted dose for voxel i, and \(D_{i}\) is the ground-truth dose for voxel i.

Figure 3 shows a plot of the MSE loss (in blue) as a function of the ground-truth dose value. One remarkable property of this loss is that even a voxel dose that is close to 0 incurs a loss with a non-trivial magnitude. When small loss values are summed over a large number of low-dose voxels, the clinically more relevant high-dose voxels will be overwhelmed.

The following experiments stress that remark: the large number of low-dose voxels encountered during training of the U-Net architecture dominates the MSE loss. In fact, the low-dose voxels, especially those outside the body, comprise the majority of the MSE loss and hence dominate the loss function gradient. In order to redirect the network optimization process to focus on the more relevant higher-dose voxels, the sigmoid function is employed to reduce the weight of the low-dose voxels in the network loss. The sigmoid function is mathematically defined as:

$$S\left( x \right) = \frac{1}{{1 + e^{ - \gamma x} }}, \gamma = 1.$$

Figure 6 shows a plot of this function (in blue). Because the sigmoid function is smooth and easily differentiable, it is widely used in deep learning to map real numbers into the unit interval [0, 1]. In this work, the definition of the sigmoid function was modified in terms of two aspects. First, since the predicted dose values were scaled to be in the range [0, 1], the γ factor was allowed to exceed the unit value in order to increase the loss gradient in the dose value range, especially for low-dose values. A modified sigmoid function with \(\gamma = 25\) is shown as the orange curve in Fig. 6. A larger \(\gamma\) value resulted in a steeper function behavior in the low-dose region. Second, the loss curve was moved toward the positive direction of the x axis by subtracting 0.03 from x to keep the ground-truth dose values positive. This function modification can lead to more efficient exploitation of the gradient regions of the sigmoid function. This modified function is shown in Fig. 6 as the green curve.

Fig. 6

The original sigmoid function (in blue) and its two variants (in orange and green, respectively)

As an alternative to the MSE loss, we propose in this study to reshape the loss function to assign lower weights to low-dose voxels, and thus focus the network training on the high-dose voxels. The modified sigmoid function introduced above was proposed as a modulating factor for the MSE loss. We name the modulated loss function the ‘sharp loss’, which is defined as:

$${\text{SL}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \frac{1}{{1 + e^{{ - \left( {D_{i} - 0.03} \right) \times \gamma }} }}\left( {Dp_{i} - D_{i} } \right)^{2} ,$$

where \(\gamma \ge 0\) is a tunable parameter. The factor \(1/\left( {1 + e^{{ - \left( {D_{i} - 0.03} \right) \times \gamma }} } \right)\) was derived from the sigmoid function as explained above. Figure 3 shows plots of the sharp loss function for several \(\gamma\) values. For this proposed loss function, when a ground-truth dose value is close to 0, the loss for this voxel is assigned a low weight and approaches 0. Since most of the zero-dose voxels are outside the body, the prediction precision of the low-dose voxels will not cause large errors inside the body. In the following experiments, we analyzed the dose prediction performance with the sharp loss function for \(\gamma = 1, 25, 50, 100, 250,\) and \(500\).

Network model training

Ten breast cancer cases were randomly selected from the 110 available cases to form the test set, while the remaining 100 cases comprised the training set. The training data included a total of 100 three-dimensional ROI arrays and their corresponding dose matrices. The ROI arrays were used as the input data while the corresponding dose matrices were employed as the output data for prediction model training. A tenfold cross-validation scheme was employed. The network construction was implemented in the Python-based application programming interface of Keras ( The Adam algorithm was used for loss function minimization. The batch size was set to 2, and the initial learning rate was set to 0.001. Also, when the validation loss did not decrease in 5 iterations, the learning rate was attenuated at a rate of 0.1 until it reached \({10}^{-7}\). The training was carried out using two Intel Xeon Gold 5118 CPUs @ 2.30 GHz, and one NVIDIA TITAN RTX PC with a 192-GB RAM. After network training, the ROI arrays of the test set were fed into the trained model, and the corresponding dose prediction matrices were obtained.

Performance validation

The performance of the proposed system was validated on the 10 selected test cases. MAD between the ground-truth and predicted dose distributions was used to evaluate the prediction performance. The MAD performance metric is defined as:

$${\text{MAD}} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {\left| {Dp_{i} - D_{i} } \right|} \right),$$

where n is the total number of voxels inside the evaluated area, while \(Dp_{i}\) and \(D_{i}\) are the ground-truth and predicted dose values for voxel i.

To assess the statistical significance of the prediction results obtained by different loss functions, a Wilcoxon signed-rank test was performed with a significance level of p = 0.05 [34]. Statistical analyses were conducted using SPSS v19 (IBM, Armonk, NY, USA).

Availability of data and material

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



Organs at risk


Dose–volume histograms


Computed tomography


Regions of interest


Convolutional neural networks


Fully convolutional networks


Mean squared error


Mean absolute difference


Planning target volumes


Homogeneity index


Conformation index


Volumetric-modulated arc therapy


Digital Imaging and Communications in Medicine


Rectified linear unit


  1. 1.

    Fan J, Wang J, Chen Z, et al. Automatic treatment planning based on three-dimensional dose distribution predicted from deep learning technique. Med Phys. 2019;46(1):370–81.

    Article  Google Scholar 

  2. 2.

    Bai X, Shan G, Chen M, et al. Approach and assessment of automated stereotactic radiotherapy planning for early stage non-small-cell lung cancer. Biomed Eng Online. 2019;18(1):101.

    Article  Google Scholar 

  3. 3.

    Hunt MA, Jackson A, Narayana A, et al. Geometric factors influencing dosimetric sparing of the parotid glands using IMRT. Int J Radiat Oncol Biol Phys. 2006;66(1):296–304.

    Article  Google Scholar 

  4. 4.

    Wu BB, Ricchetti F, Sanguineti G, et al. Patient geometry-driven information retrieval for IMRT treatment plan quality control. Med Phys. 2009;36(12):5497–505.

    Article  Google Scholar 

  5. 5.

    Reddy NMS, Chang NH, Lange CS, et al. Prostate and seminal vesicle volume based consideration of prostate cancer patients for treatment with 3D-conformal or intensity-modulated radiation therapy. Med Phys. 2010;37(7):3791–801.

    Article  Google Scholar 

  6. 6.

    Moore KL, Brame RS, Low DA, et al. Experience-based quality control of clinical intensity-modulated radiotherapy planning. Int J Radiat Oncol Biol Phys. 2011;81(2):545–51.

    Article  Google Scholar 

  7. 7.

    Petit SF, Wu B, Kazhdan M, et al. Increased organ sparing using shape-based treatment plan optimization for intensity modulated radiation therapy of pancreatic adenocarcinoma. Radiother Oncol. 2012;102(1):38–44.

    Article  Google Scholar 

  8. 8.

    Wang Y, Zolnay A, Incrocci L, et al. A quality control model that uses PTV-rectal distances to predict the lowest achievable rectum dose, improves IMRT planning for patients with prostate cancer. Radiother Oncol. 2013;107(3):352–7.

    Article  Google Scholar 

  9. 9.

    Malone C, Fennell L, Folliard T, et al. Using a neural network to predict deviations in mean heart dose during the treatment of left-sided deep inspiration breath hold patients. Phys Med. 2019;65:137–42.

    Article  Google Scholar 

  10. 10.

    Zhu X, Ge Y, Li T, et al. A planning quality evaluation tool for prostate adaptive IMRT based on machine learning. Med Phys. 2011;38(2):719–26.

    Article  Google Scholar 

  11. 11.

    Appenzoller LM, Michalski JM, Thorstad WL, et al. Predicting dose-volume histograms for organs-at-risk in IMRT planning. Med Phys. 2012;39(12):7446–61.

    Article  Google Scholar 

  12. 12.

    Bai X, Wang B, Shao K, Yang Y, Shan G, Chen M. A study of prediction model of lung dose in early stage non-small cell lung cancer with stereotactic body radiotherapy. Chin J Radiat Oncol. 2020;29(2):106–10.

    Article  Google Scholar 

  13. 13.

    Shiraishi S, Moore KL. Knowledge-based prediction of three-dimensional dose distributions for external beam radiotherapy. Med Phys. 2016;43(1):378.

    Article  Google Scholar 

  14. 14.

    Mcintosh C, Purdie TG. Contextual atlas regression forests: multiple-atlas-based automated dose prediction in radiation therapy. IEEE Trans Med Imaging. 2016;35(4):1000–12.

    Article  Google Scholar 

  15. 15.

    Dan N, Long T, Jia X, et al. Dose Prediction with U-net: A Feasibility Study for Predicting Dose Distributions from Contours using Deep Learning on Prostate IMRT Patients. arXiv preprint, 2017. 2017. arXiv:1709.09233, 2017

  16. 16.

    Kearney V, Chan JW, Haaf S, et al. DoseNet: a volumetric dose prediction algorithm using 3D fully-convolutional neural networks. Phys Med Biol. 2018;63(23):235022.

    Article  Google Scholar 

  17. 17.

    Yoganathan SA, Zhang R. An atlas-based method to predict three-dimensional dose distributions for cancer patients who receive radiotherapy. Phys Med Biol. 2019;64(8):085016.

    Article  Google Scholar 

  18. 18.

    Barragán‐Montero A M, Nguyen D, Lu W, et al. Three‐dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations. 2019

  19. 19.

    Chen X, Men K, Li Y, et al. A feasibility study on an automated method to generate patient-specific dose distributions for radiotherapy using deep learning. Med Phys. 2019;46(1):56–64.

    Article  Google Scholar 

  20. 20.

    Nguyen D, Jia X, Sher D, et al. 3D radiotherapy dose prediction on head and neck cancer patients with a hierarchically densely connected U-net deep learning architecture. Phys Med Biol. 2019;64(6):065020.

    Article  Google Scholar 

  21. 21.

    Barragán-Montero AM, Nguyen D, Lu W, et al. Three-dimensional dose prediction for lung IMRT patients with deep neural networks: robust learning from heterogeneous beam configurations. Med phys. 2019;46(8):3679–91.

    Article  Google Scholar 

  22. 22.

    Kajikawa T, Kadoya N, Ito K, et al. A convolutional neural network approach for IMRT dose distribution prediction in prostate cancer patients. J Radiat Res. 2019;60(5):685–93.

    Article  Google Scholar 

  23. 23.

    Bai X, Wang S, Wang B, Zhang J, Shao K, Yang Y, Shan G, Chen M. Study of three-dimensional dose distribution prediction model in radiotherapy planning based on full convolution network. Chin J Radiat Oncol. 2020;29(8):666–70.

    Article  Google Scholar 

  24. 24.

    Bai X, Wang B, Wang S, et al. Radiotherapy dose distribution prediction for breast cancer using deformable image registration. Biomed Eng Online. 2020;19(1):39.

    Article  Google Scholar 

  25. 25.

    Thong W, Kadoury S, Piché N, et al. Convolutional networks for kidney segmentation in contrast-enhanced CT scans. Comput Methods Biomech Biomed Eng Imaging Vis. 2016;6(3):277–82.

    Article  Google Scholar 

  26. 26.

    Christ P F, Elshaer M E A, Ettlinger F, et al. Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields. In: Proceedings of the international conference on medical image computing & computer-assisted intervention. 2016.

  27. 27.

    Ma Z, Zhou S, Wu X, et al. Nasopharyngeal carcinoma segmentation based on enhanced convolutional neural networks using multi-modal metric learning. Phys Med Biol. 2019;64(2):025005.

    Article  Google Scholar 

  28. 28.

    Men K, Dai J, Li Y. Automatic segmentation of the clinical target volume and organs at risk in the planning CT for rectal cancer using deep dilated convolutional neural networks. Med Phys. 2017;44(12):6377–89.

    Article  Google Scholar 

  29. 29.

    Lin T-Y, Goyal P, Girshick R, et al. Focal loss for dense object detection. 2017. arXiv:1708.02002.

  30. 30.

    Li X, Sun X, Meng Y, et al. Dice loss for data-imbalanced NLP tasks. 2019. arXiv:1911.02855.

  31. 31.

    Babier A, Mahmood R, Mcniven AL, et al. Knowledge-based automated planning with three-dimensional generative adversarial networks. Med Phys. 2020;47(2):297–306.

    Article  Google Scholar 

  32. 32.

    Valindria VV, Lavdas I, Cerrolaza J, et al. Small organ segmentation in whole-body MRI using a two-stage FCN and weighting schemes. Cham: Springer International Publishing; 2018.

    Book  Google Scholar 

  33. 33.

    Gao Y, Huang R, Chen M, et al. FocusNet: imbalanced large and small organ segmentation with an end-to-end deep neural network for head and neck CT images. Cham: Springer International Publishing; 2019.

    Google Scholar 

  34. 34.

    Woolson RF. Wilcoxon signed-rank test. Wiley; 2005.

    Book  Google Scholar 

Download references


The authors sincerely thank all study participants.


This work was supported in part by the National Natural Science Foundation of China (12005190), the Zhejiang Medical and Health Discipline Platform Project (2018ZD014), and the Zhejiang Basic Public Welfare Research Program (LSY19H180002).

Author information




BX, ZJ and HQ contributed to the study concept, design and data interpretation. WB and WS worked on the acquisition of data. BX and XYD worked on the data analysis. BX and HQ worked on the preparation of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xue Bai or Qing Hou.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee of Zhejiang Cancer Hospital with number IRB-2020–262.

Consent for publication

Not applicable.

Competing interests

The authors declared no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Bai, X., Zhang, J., Wang, B. et al. Sharp loss: a new loss function for radiotherapy dose prediction based on fully convolutional networks. BioMed Eng OnLine 20, 101 (2021).

Download citation


  • Radiotherapy
  • Dose prediction
  • Loss function
  • Breast cancer