A comparison of calibration data from full field digital mammography units for breast density measurements
BioMedical Engineering OnLine volume 12, Article number: 114 (2013)
Breast density is a significant breast cancer risk factor measured from mammograms. The most appropriate method for measuring breast density for risk applications is still under investigation. Calibration standardizes mammograms to account for acquisition technique differences prior to making breast density measurements. We evaluated whether a calibration methodology developed for an indirect x-ray conversion full field digital mammography (FFDM) technology applies to direct x-ray conversion FFDM systems.
Breast tissue equivalent (BTE) phantom images were used to establish calibration datasets for three similar direct x-ray conversion FFDM systems. The calibration dataset for each unit is a function of the target/filter combination, x-ray tube voltage, current × time (mAs), phantom height, and two detector fields of view (FOVs). Methods were investigated to reduce the amount of calibration data by restricting the height, mAs, and FOV sampling. Calibration accuracy was evaluated with mixture phantoms. We also compared both intra- and inter-system calibration characteristics and accuracy.
Calibration methods developed previously apply to direct x-ray conversion systems with modification. Calibration accuracy was largely within the acceptable range of ± 4 standardized units from the ideal value over the entire acquisition parameter space for the direct conversion units. Acceptable calibration accuracy was maintained with a cubic-spline height interpolation, representing a modification to previous work. Calibration data is unit specific, can be acquired with the large FOV, and requires a minimum of one reference mAs sample. The mAs sampling, calibration accuracy, and the necessity for machine specific calibration data are common characteristics and in agreement with our previous work.
The generality of our calibration approach was established under ideal conditions. Evaluation with patient data using breast cancer status as the endpoint is required to demonstrate that the approach produces a breast density measure associated with breast cancer.
Mammographic breast density is a significant breast cancer risk factor [1–3]. Although used extensively in research, breast density is not generally used in the clinical environment for breast cancer risk applications, due in large part to the lack of an automated measurement. There are various methods under evaluation for estimating breast density from either raw or calibrated mammograms. A large portion of breast density research was derived without calibration [1, 2], as calibration is a more recent development for mammography.
Ideally, calibration adjusts for inter-patient x-ray image acquisition technique differences to produce some form of standardized data representation [6–9]. Calibration research is still in its early stage of development and there are few published reports evaluating its potential application relative to the volume of published breast density research using raw mammograms. The findings from calibration research have been mixed in identifying a measure that strengthens the associations with breast cancer in comparison with the operator-assisted percentage of breast density measure [10–15]. Due to its stage of development, it may be premature to conclude whether calibration is generally a useful technique for risk assessments. However, one benefit of establishing a calibration method is that it permits automated breast density measurements. We have posited that calibration may be an important step for automation.
Full field digital mammography (FFDM) detector technologies can be broadly categorized as either indirect or direct x-ray conversion systems. Although these designs have many characteristics that vary, until recently both technologies produced an energy weighted integrated signal at the pixel level. More recently, another type of direct x-ray conversion technology was approved for clinical use in the US that uses photon counting detection technology, which, in contrast to the established FFDM designs, does not produce an integrated weighted signal. Currently, it is not known if calibration will produce equivalent findings across these varying FFDM platforms.
We applied a calibration methodology developed previously for a General Electric Senographe 2000D FFDM system [19–22], which is an indirect x-ray conversion technology. Our findings based on images taken from this technology [12–14] suggest that calibrated breast density measurements are strong indicators of risk, providing justification to investigate the merits of calibration in more detail. As many characteristics vary between the direct and indirect x-ray conversion systems, the applicability of our calibration methodology has yet to be established for direct x-ray conversion FFDM systems.
In this report, we expand our understanding of calibration gained previously [21, 22] and establish a calibration system for a direct x-ray conversion FFDM design using phantom images acquired from three Hologic Selenia FFDM units, as the primary analysis. We considered several design objectives. One objective is to minimize the amount of calibration data collection while maintaining acceptable calibration accuracy, representing an important compromise. Although sampling all acquisition technique combinations would be optimal for constructing the calibration curves, it is nearly impossible in practice. Therefore, some form of sampling scheme and interpolation methodology must be established to minimize effort while maintaining acceptable accuracy. It is reasonable to assume that if calibration requires excessive phantom imaging effort or is difficult to apply across imaging platforms without considerable modification, it may not be used beyond research. Another objective is to evaluate whether calibration data collected from one FFDM unit can be applied to another similarly manufactured unit, with or without modification, as inter-unit generalization for a given technology is an important step for universal application. As a secondary objective, we also compared calibration and detector response data obtained from the Hologic units investigated in this report with those previously acquired from the General Electric FFDM unit, when applicable, to assess inter-technology similarities.
We acquired calibration and exposure response data from three Hologic Selenia FFDM units to evaluate the generality of our approach. Calibration curves were generated by imaging standard breast tissue equivalent (BTE) phantoms (CIRS, Norfolk, VA) described previously. Our BTE phantom set includes 100% fibroglandular (glandular) and 100% adipose BTE materials that are of 1 mm, 2 mm, 1 cm, and 2 cm thicknesses (i.e. precise heights) and 18 cm × 24 cm in area dimension. These phantoms were combined (stacked) to produce desired composite proportions at a given total thickness (height). For example, combining a 2 cm thickness glandular phantom with a 2 cm thickness adipose phantom gives a 50% glandular composition with a total height of 4 cm. Calibration curves are functions of the compressed breast thickness above the breast support surface, referenced as height, and several other acquisition technique parameters, including target/filter combination, x-ray tube voltage (kV), current × time (mAs), and detector field of view (FOV), representing a five dimensional parameter space. As previously, we refer to the initial data collection as the baseline (BL) calibration dataset. A BL dataset was established for each unit.
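The stacking arithmetic in the example above can be sketched in a few lines. This is only an illustration: the function name `composite_phantom` is hypothetical, and composition is assumed to be the glandular fraction by slab thickness, as the 2 cm + 2 cm example implies.

```python
def composite_phantom(glandular_cm, adipose_cm):
    """Return (total height in cm, percent glandular by thickness) for a stack.

    Assumption: all slabs share the same area, so composition is simply the
    glandular share of the total stacked thickness.
    """
    glandular_total = sum(glandular_cm)
    total = glandular_total + sum(adipose_cm)
    if total <= 0:
        raise ValueError("stack must contain at least one slab")
    return total, 100.0 * glandular_total / total


# One 2 cm glandular slab stacked with one 2 cm adipose slab.
height, percent_glandular = composite_phantom([2.0], [2.0])
print(height, percent_glandular)  # 4.0 50.0
```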
The three Selenia systems evaluated in this report are located within the breast clinics at the Moffitt Cancer Center and are used for both screening and diagnostic purposes. Two of these systems, referred to as the H1 and H2, have a tungsten (W) target with rhodium (Rh) and silver (Ag) filter options. The third unit has a molybdenum (Mo) target with Mo and Rh filter options and is referred to as H3. The Selenia detector has 70 micron pitch (pixel spatial resolution), and the raw data used for this work has 14 bit per pixel dynamic range. Two detector FOVs are used for screening mammograms on these units depending upon the choice of compression paddle: 24 cm × 29 cm (large) and 18 cm × 24 cm (small). The General Electric Senographe 2000D FFDM unit is referred to as GE in the report. This unit has a Mo target with Mo and Rh filter options, and a Rh target with a Rh filter. The GE detector has 100 micron spatial resolution, a 19.2 cm × 23 cm detector FOV (i.e. 1914 × 2294 pixels) and 14 bit dynamic range per pixel for the raw data used in this work. As a standard convention, we acquired all phantom images as left cranial caudal (LCC) views. In the LCC view, the detector left border in the vertical direction is parallel with the chest wall position as observed in a displayed image.
The aims of this study were to assess the pixel value – detector exposure (detector response) relationship without attenuation, generate and assess the calibration curves for linearity, and evaluate the calibration accuracy. To minimize the BL data collection, we evaluated the calibration accuracy under these conditions: (a) when applying interpolation for the height variable; (b) when applying a data reduction step to reduce mAs sampling; and (c) as a function of FOV. To evaluate the FOV impact, we acquired the calibration datasets with the large detector FOV only. The validity of collecting calibration data with the large FOV only was evaluated by examining calibration accuracy for images acquired with the small FOV. We made direct comparisons between H1 and H2 because of their target/filter and manufacturing similarity, and evaluated whether calibration data collected with one unit is valid when applied to another similarly manufactured unit. Likewise, we made direct comparisons between the H3 and GE units for the Mo/Mo and Mo/Rh combinations, when applicable.
The analysis was restricted to specific regions depending on the FFDM design and specific analysis endpoint. For the H1, H2, and H3 units, unless stated otherwise, the analysis was constrained to a large region of interest (ROI) specific to the large FOV. This ROI is defined as 2000 × 2500 pixels (14 cm × 17.5 cm), centered in the vertical direction with a horizontal offset of 75 pixels (not included) from the outside of the detector (i.e. parallel to the chest wall) or left border (LCC view). This restriction is to avoid stacked-phantom edge effects near the detector outer edge and possible flat field non-uniformity interference at regions far (interior) from the central detector area. For the FOV analysis and for images taken with the GE unit, the analysis was constrained to a 1000 × 1250 pixel ROI with a 75 pixel offset (as above). The ROIs relative to the Hologic detector and the BTE phantom area are shown in Figure 1.
Exposure response evaluation
We assessed the detector exposure and pixel value (pv) response relationships for the H1, H2, and H3 units for select kV settings for each target/filter combination, using the large FOV. The raw image pixel value (pvraw) response was modeled as a linear function of mAs by acquiring images without attenuation (i.e. open exposures of the detector). The mAs variable was sampled up to the point of detector saturation. The sample sets for each kV setting were analyzed with regression analysis and fitted to this form: <pvraw> = m × x + b, where x is the system readout mAs quantity for each acquisition. The slope (m), intercept (b), coefficient of determination (R²), and standard error (SE) in the slope were used for evaluation purposes. The brackets indicate the mean pvraw within the large ROI. We make the approximation that the system readout mAs value is a surrogate for (i.e. proportional to) the x-ray exposure at the detector, which is common practice. We made both intra-technology comparisons and comparisons with the GE exposure response, where applicable. Because H1 and H2 have the same target/filter combinations and H3 and GE have common combinations, the respective pairwise comparisons were included in the analysis. When making pairwise inter-unit slope comparisons for a given kV, an important difference is defined as the central value of mi falling outside of the tolerance range mj ± 2 × SEj, or vice versa, where the index 0 is reserved for the GE unit. Where appropriate, we compared the entire set of mj across units with a t-test. Demonstrating that the response is linear has important implications for the BL calibration data collection requirements. When the linear approximation holds, the mAs sampling may be reduced to one sample in the BL dataset.
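The exposure response fit and the slope tolerance gauge described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the report: the mAs and mean pixel values are made up, the function names are hypothetical, and "or vice versa" is read here as flagging a difference when either slope falls outside the other's ± 2 × SE range.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = m*x + b.

    Returns (m, b, r_squared, se_m), where se_m is the standard error of the slope.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples to estimate the slope SE")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    m = sxy / sxx
    b = mean_y - m * mean_x
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - sse / syy
    se_m = math.sqrt(sse / (n - 2) / sxx)
    return m, b, r_squared, se_m


def slopes_differ(m_i, se_i, m_j, se_j):
    """Assumed reading of the gauge: either slope lies outside the other's +/- 2*SE."""
    gap = abs(m_i - m_j)
    return gap > 2.0 * se_j or gap > 2.0 * se_i


# Hypothetical open-detector samples: readout mAs and mean raw pixel value in the ROI.
mas = [20.0, 40.0, 80.0, 160.0, 320.0]
mean_pv = [301.0, 549.0, 1052.0, 2049.0, 4051.0]
m, b, r_squared, se_m = fit_line(mas, mean_pv)
print(f"m={m:.4f} b={b:.2f} R2={r_squared:.6f} SE(m)={se_m:.5f}")
```

A t-test across the full set of slopes, as mentioned in the text, is a separate step and is not shown here.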
Calibration dataset and characterization
The phantom imaging techniques and methods for constructing the BL calibration datasets (i.e. calibration curves) were described previously [21, 22]. The same approach was applied in this report with some modification. Briefly, to construct the calibration curves for a given acquisition technique, two series of BTE phantoms were imaged to generate the respective glandular and adipose calibration curves for BL sampled heights defined as tk. Reference points derived from these curves are used in the calibration application (discussed below). The phantom heights (total stacked heights) for a given calibration curve range from 2–7 cm depending on the acquisition technique, and were taken at 1 cm increments for convenience. To estimate the kV range, we selected the automated exposure control (auto-kV mode) and adjusted the compression paddle over a range of heights for fixed target/filter combinations. We estimated that the W/Rh range is 26–30 kV and the W/Ag range is 27–32 kV for the H1 and H2 systems. The same procedure was followed for Mo/Mo and Mo/Rh techniques for the H3 system, giving 25–31 kV and 27–34 kV ranges, respectively. BL calibration datasets (H1, H2, H3 and GE units) were acquired with the same reference mAs setting defined as: xr = 160 mAs. We selected a reference mAs value that does not cause detector saturation when imaging phantom configurations with smaller heights, in particular adipose phantoms, while providing sufficient signal when imaging phantoms with larger heights, in particular glandular phantoms, over the entire acquisition technique range considered, as discussed previously.
For both comparison and presentation purposes, we evaluated the calibration curves using linear regression methods without regard to calibration accuracy. We subdivided the large ROI (2000 × 2500 pixel region shown in Figure 1) into a grid consisting of 25 × 25 pixel smaller non-overlapping sub-regions defined as rs. This gives 80 × 96 = 7680 rs sub-regions (for the large FOV). As above, tk is the BL phantom height in cm with the index k designating a sampled height. For a given phantom configuration (fixed height and BTE type), we average the pixel values (i.e. <pvraw>) within rs giving the mean exposure, El(rs), at rs and tk. For this report, the index, l, is reserved for the BTE type designation: l = a for adipose; and l = g for glandular. We divide El(rs) by the reference mAs giving the relative mean exposure, REl(rs) = El(rs) / xr (i.e. the reference xr = 160 mAs) at each subdivision. We evaluate the natural logarithm of the relative mean exposure, LREl(rs) = ln[REl(rs)], as a function of increasing tk giving a regional calibration curve; for reference, this defines logarithm of the relative exposure (LRE) domain, which holds at the pixel level as well. For inter-unit comparisons, we applied linear regression at each rs for each BTE type resulting in a distribution for the slopes (μl), logarithmic intercepts (LIl), and R2 values estimated by fitting the ordered pairs [tk, LREl (rs)] to this model
LREl(rs) = − μl × tk + LIl    (1)
When fitted to this form (tk+1 > tk), the magnitude of the slope can be interpreted as the effective x-ray attenuation coefficient (i.e. μg for glandular and μa for adipose tissue, cited as positive quantities in the tables and expressions) measured in cm⁻¹ for a given kV and target/filter combination. The LIl quantities are the respective intercepts, which are unit-less. We summarized these regression parameter distributions with the mean and mean standard error (SE). As above, we use the μl ± 2 × SEl tolerance gauge for the inter-system pairwise comparisons. Where appropriate, we compared the entire set of effective x-ray attenuation coefficients across systems with a t-test for each BTE material. This sub-region analysis also gives a method for assessing the spatial uniformity of the calibration data.
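For a single sub-region, the regional calibration curve and its regression can be sketched as below. The sketch assumes the linear model LRE = − μ × t + LI in the LRE domain, consistent with the expression pvg ≈ − μg × t0 + LIg used later in the text. The mean pixel values are synthetic and the function names are hypothetical.

```python
import math

X_REF = 160.0  # reference mAs used for the BL datasets


def log_relative_exposure(mean_pv, mas=X_REF):
    """LRE = ln(<pvraw> / mAs) for one sub-region mean pixel value."""
    return math.log(mean_pv / mas)


def fit_calibration_curve(heights_cm, mean_pvs, mas=X_REF):
    """Fit LRE = -mu * t + LI by least squares.

    Returns (mu, LI); mu is reported as a positive attenuation coefficient in cm^-1.
    """
    y = [log_relative_exposure(pv, mas) for pv in mean_pvs]
    n = len(heights_cm)
    mean_t = sum(heights_cm) / n
    mean_y = sum(y) / n
    stt = sum((t - mean_t) ** 2 for t in heights_cm)
    sty = sum((t - mean_t) * (yi - mean_y) for t, yi in zip(heights_cm, y))
    slope = sty / stt
    intercept = mean_y - slope * mean_t
    return -slope, intercept


# Synthetic glandular series at the BL heights 2..7 cm, generated from made-up
# parameters (mu = 0.8 cm^-1, LI = 5.0) so the fit can be checked against them.
heights = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
mean_pvs = [X_REF * math.exp(5.0 - 0.8 * t) for t in heights]
mu_g, li_g = fit_calibration_curve(heights, mean_pvs)
print(round(mu_g, 6), round(li_g, 6))
```

Repeating this fit over every sub-region rs gives the distributions of μl and LIl that the text summarizes with a mean and SE.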
When calibrating an arbitrary image, the operation takes place in the LRE domain. In contrast to the calibration curve normalization that uses the reference mAs, the LRE for an arbitrary image (i.e. a prospective calibration application) is formed by normalizing either pvraw or <pvraw> by the acquisition system readout mAs, defined as x, before applying the natural logarithm: LRE = ln(pvraw/x). This normalization holds under certain conditions when the exposure response is linear. Similarly, when the response is linear, two calibration points are required to calibrate an arbitrary image. These calibration points are derived from the BL curves and correspond to the theoretical pixel values in the LRE domain that would result when imaging materials that are (a) 100% glandular tissue = pvg, and (b) 100% adipose tissue = pva for a specific acquisition technique and height. For consistency with our past convention, we refer to the calibration domain as the percent glandular (PG) representation with values theoretically ranging from 0–100 PG units. This representation is analogous to a normalized x-ray attenuation coefficient representation, which is easily converted to total volume or average volumetric glandular metric by incorporating the compressed breast thickness (height) into the analysis. The calibration mapping takes this form: PGcal = M × LRE + B, where M and B are specific to a given kV, target/filter combination and height above the breast support surface; capitals are used to distinguish these parameters from the open detector exposure relationships. The LRE can be determined at the pixel level or sub-region level by using either the respective pixel value with the corresponding height or sub-region mean pixel value with corresponding mean height above the support surface.
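The condition under which the mAs normalization holds can be illustrated with a toy detector model. The model pv = gain × mAs × transmission + offset, the parameter values, and the function names below are all assumptions made for illustration. With zero offset (a proportional response) the LRE does not depend on mAs; with a non-zero offset it does.

```python
import math


def lre(pv_raw, mas):
    """LRE = ln(pvraw / x), with x the system readout mAs."""
    return math.log(pv_raw / mas)


def simulated_pv(mas, gain, transmission, offset=0.0):
    """Hypothetical linear detector response, not a measured one."""
    return gain * mas * transmission + offset


def lre_spread(offset, mas_values=(120.0, 160.0, 200.0)):
    """Range of LRE values across mAs settings for a fixed phantom."""
    transmission = math.exp(-0.65 * 4.0)  # made-up attenuation for a 4 cm phantom
    values = [lre(simulated_pv(x, 12.5, transmission, offset), x) for x in mas_values]
    return max(values) - min(values)


spread_proportional = lre_spread(offset=0.0)
spread_with_offset = lre_spread(offset=50.0)
print(spread_proportional, spread_with_offset)
```

In this toy model the spread is zero up to rounding only when the offset vanishes, which is why the size of the fitted intercept relative to the signal matters for the normalization.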
For efficient prospective calibration applications, the BL calibration data must be stored. Therefore, we investigated two storage methods. The stored BL calibration data is then used in the specification of M and B. Both M and B are determined (fixed kV and target/filter) by considering the endpoints for a specific height t = t0. In the LRE domain, we set PGcal = 100 when LRE = pvg, PGcal = 0 when LRE = pva and solve for M and B: M = 100 × (pvg − pva)⁻¹ and B = 50 − ½ M × (pvg + pva), giving one method for specifying M and B. In this specification approach, when t0 does not correspond exactly with a specific sample height from the BL, a cubic-spline interpolation was used to determine pvg and pva at t0. The second method for specifying M and B expresses pva and pvg as functions of the regression parameters (μg, μa, LIg and LIa) and t0 using Equation (1) by substituting tk with t0: for example, pvg ≈ − μg × t0 + LIg. In this case, the M and B specification and height interpolation are performed simultaneously; the validity of this approach relies on the agreement with Equation (1) and was the method developed previously for the GE unit [21, 22]. With either specification method, the B relationship can be expressed in a simpler form to include only the pva term or the pvg term, or the regression parameters from one of the calibration curves. We have included both measured terms (or all four regression parameters) to reduce variation in the event the curves or parameters carry dissimilar accuracy. We note that the 0–100 (PG units) calibration range is imposed by the development; it is not unique but follows intuition.
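The endpoint algebra for M and B, together with the regression-parameter specification of the calibration points, can be sketched as follows. The regression parameters are made-up values, the function names are hypothetical, and the cubic-spline alternative for obtaining pvg and pva at t0 is not reproduced here.

```python
def calibration_points_from_regression(mu_g, li_g, mu_a, li_a, t0):
    """Calibration points in the LRE domain from the regression parameters.

    Uses pv_l = -mu_l * t0 + LI_l for each BTE type (second specification method).
    """
    pv_g = -mu_g * t0 + li_g
    pv_a = -mu_a * t0 + li_a
    return pv_g, pv_a


def calibration_coefficients(pv_g, pv_a):
    """Solve PGcal = M * LRE + B with PGcal = 100 at pv_g and PGcal = 0 at pv_a."""
    if pv_g == pv_a:
        raise ValueError("glandular and adipose calibration points coincide")
    m_cal = 100.0 / (pv_g - pv_a)
    b_cal = 50.0 - 0.5 * m_cal * (pv_g + pv_a)
    return m_cal, b_cal


# Made-up regression parameters for one kV and target/filter setting, at t0 = 4 cm.
pv_g, pv_a = calibration_points_from_regression(0.8, 5.0, 0.5, 5.0, 4.0)
M, B = calibration_coefficients(pv_g, pv_a)

# The endpoints map to 100 and 0 PG units; their midpoint maps to 50.
pg_at_glandular = M * pv_g + B
pg_at_adipose = M * pv_a + B
pg_at_midpoint = M * 0.5 * (pv_g + pv_a) + B
print(pg_at_glandular, pg_at_adipose, pg_at_midpoint)
```

Because glandular material attenuates more than adipose, pvg is smaller than pva in the LRE domain, so M is negative in this example.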
When applying the calibration, the large ROI within a given image is divided into 25 × 25 pixel sub-regions as above and the average of each sub-region is used in the calibration equation giving PGcal = M × <LRE(rs,<t0>)> + B, where < t0 > is the mean height above the breast support surface about rs, resulting in a spatial distribution of calibrated values. The methods described in the Calibration dataset and characterization Section indicate the calibration curves, in the most general terms, are functions of position. For this report, we used the mean values of the calibration BL data taken over all rs in the specification of M and B (both methods), removing the spatial dependency.
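Applying the calibration over 25 × 25 pixel sub-regions can be sketched as below. This is an illustration only: it assumes a single height (and therefore one M and B) for the whole ROI, takes the LRE of each sub-region mean pixel value, and uses a synthetic uniform image. The function names are hypothetical.

```python
import math


def block_means(image, block=25):
    """Mean of each non-overlapping block x block tile; partial edge tiles are dropped."""
    n_block_rows = len(image) // block
    n_block_cols = len(image[0]) // block
    means = []
    for bi in range(n_block_rows):
        row_means = []
        for bj in range(n_block_cols):
            total = 0.0
            for r in range(bi * block, (bi + 1) * block):
                total += sum(image[r][bj * block:(bj + 1) * block])
            row_means.append(total / (block * block))
        means.append(row_means)
    return means


def calibrate_subregions(image, mas, m_cal, b_cal, block=25):
    """PGcal = M * ln(<pvraw> / mAs) + B for every sub-region."""
    return [
        [m_cal * math.log(mean_pv / mas) + b_cal for mean_pv in row]
        for row in block_means(image, block)
    ]


# Synthetic 50 x 50 pixel image whose LRE is 2.4 everywhere at 160 mAs, with
# made-up calibration points pv_g = 1.8 and pv_a = 3.0 (midpoint 2.4).
pixel_value = 160.0 * math.exp(2.4)
image = [[pixel_value] * 50 for _ in range(50)]
m_cal = 100.0 / (1.8 - 3.0)
b_cal = 50.0 - 0.5 * m_cal * (1.8 + 3.0)
pg_map = calibrate_subregions(image, 160.0, m_cal, b_cal)
print(pg_map)
```

The result is a 2 × 2 grid of calibrated values, each close to 50 PG units because the synthetic LRE sits midway between the two calibration points.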
Calibration accuracy evaluation
To evaluate the intra-machine calibration accuracy near the BL acquisition date (for the H1, H2, and H3 units), we imaged 4 cm composite phantoms comprised of a 2 cm adipose phantom stacked upon a 2 cm glandular phantom for the majority of kV settings and target/filter combinations. For a few of the larger kV acquisitions we used the same adipose and glandular ratio to construct 6 cm phantoms to avoid detector saturation. We refer to these composite phantoms as 50/50 mixtures. We also acquired 50/50 mixture images with three mAs settings to evaluate the impact of reference mAs normalization on the calibration accuracy: 120 mAs, 160 mAs (the reference) and 200 mAs (i.e. two additional samples for comparison purposes).
For the accuracy evaluation, we used the two methods outlined above for specifying M and B to select the optimal technique and make comparisons with our previous work. This evaluation was performed in four related steps. In step 1, we used the pva and pvg determined with the BL dataset to calibrate 50/50 mixtures acquired with heights included in the BL; this should provide the best accuracy because no interpolation is required. In step 2, we calibrated the same mixtures used in step 1 with the regression parameter specification method; this does not permit a fair comparison with the first step because it includes interpolation but is required for the comparisons in the next two steps. To fully evaluate both interpolation methods, we also included additional 50/50 mixture acquisitions using the reference mAs (xr = 160 mAs) with heights set at 4.2 cm, 4.4 cm and 6.4 cm, which were not included in the BL datasets (i.e. non-BL mixtures). In step 3, we used pva and pvg derived from spline interpolation in the calibration of the non-BL mixtures, and in step 4 we used the regression parameters to calibrate the same non-BL mixtures. The comparison of step 1 with step 3 and comparison of step 2 with step 4 provides an intra-specification method evaluation by considering BL and non-BL height samples. The comparison of step 1 and 3 with step 2 and 4 provides a means for selecting the optimal interpolation method. From previous experience, we used an empirically derived tolerance of approximately ± 4 PG unit deviation from the ideal PGcal = 50 for comparing calibration accuracy. For these comparisons, we acquired additional 50/50 mixtures using both BL heights (4 and 6 cm phantom heights) and non-BL heights. To minimize serial drift influences within the BL and non-BL comparison, we acquired both phantom series on the same day.
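The tolerance gauge used in these comparisons amounts to a simple check against the ideal value. The sketch below uses made-up calibrated values and a hypothetical function name, and it treats a deviation of exactly 4 PG units as acceptable, which is an assumption given that the tolerance is described as approximate.

```python
def outside_tolerance(pg_by_technique, ideal=50.0, tolerance=4.0):
    """Return the techniques whose calibrated 50/50 mixture value deviates
    from the ideal by more than the tolerance (in PG units)."""
    return {
        name: pg
        for name, pg in pg_by_technique.items()
        if abs(pg - ideal) > tolerance
    }


# Hypothetical calibrated values for three acquisition techniques.
example = {"W/Rh 28 kV": 51.2, "W/Ag 30 kV": 55.1, "W/Rh 26 kV": 46.0}
flagged = outside_tolerance(example)
print(f"{len(flagged)} of {len(example)} techniques outside tolerance: {sorted(flagged)}")
```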
We performed two additional experiments to assess the calibration generality and accuracy. First, to evaluate whether calibration data acquired from one FFDM unit is applicable to another similar unit, we switched the BL calibration data and used BL1 (i.e. from H1) to calibrate 50/50 mixtures (with 160 mAs) acquired from H2 and vice versa, referred to as the cross-unit calibration analysis (findings discussed with those resulting from step 1). Secondly to evaluate FOV influences, we acquired 50/50 mixtures using the small FOV and performed calibration with the BL calibration data acquired with the large FOV for the H1, H2, and H3 units. To perform the small FOV analysis, a reduced ROI was used comprised of 1000 × 1250 pixels, outlined in Figure 1.
The open detector exposure relationships (pv and exposure response) for all systems are summarized in Table 1. Example plots are shown in Figure 2 for the similar H1 and H2 units. Plots for the H3 and GE units for common filter/target combinations are shown in Figure 3. The plots in both figures are representative of the linear response relationship for the four units. The R2 estimates (Table 1) are close to unity for all of the acquisition techniques considered, indicating the relationships are well approximated as linear for all units. Despite their design similarities, the response varies beyond our tolerance (i.e. mj ± 2 × SEj) between the H1 and H2 units within kV settings. Although beyond the tolerance, the percent difference between m1 and m2 is within 3.3%-5.5%, whereas the intercepts show much larger variation. Comparing the set of m1 estimates with the set of m2 estimates (t-test) gave P > 0.96, indicating the exposure response does not differ significantly across similar systems. The pairwise responses also vary beyond the tolerance across the H3 and GE systems as expected for all observations. Although the exposure response quantities vary across all systems, the response linearity is a common characteristic across all units (H1, H2, H3, and GE). This common trait suggests the mAs sampling can be reduced to one sample for a given target/filter combination and kV setting (as evaluated below).
The effective attenuation coefficients (μl) and logarithmic intercepts (LIl) for the H1 and H2 units are shown in Table 2 separated by the BTE type and FFDM unit. We have provided the absolute value of the slope from the regression analysis, which is cited as μl, and the corresponding SEl. Example calibration curve plots for these units fitted with regression analysis are shown in Figure 4. The R2 findings indicate the linear model fits well. The agreement of respective μl pair and SE vary. For example, the μa pairwise comparison for W/Rh combinations indicates there is close agreement for the 26–29 kV as gauged by the preset tolerance (μl ± 2 × SEl) with little variation at 26 kV and a maximum 2.3% variation at 30 kV, which is beyond the tolerance. The corresponding variations across the μg pairs show greater variation for the W/Rh combinations but are within the tolerance. The W/Ag glandular and adipose coefficients follow a similar trend and are within the similarity tolerance. Comparing the set of μa estimates for H1 with the corresponding set from H2 (t-test) gave P > 0.70. Similarly, comparing the μg set between H1 and H2 gave P > 0.45. These comparisons indicate the set of effective x-ray attenuation coefficients for a given BTE material does not differ significantly across similar systems. Because of the target/filter difference, no comparisons of the H3 and GE units with the H1 and H2 units are provided. The μl, associated SEl, and LIl for the H3 and GE units are shown in Table 3 for the Mo/Mo and Mo/Rh combinations, and example calibration curve plots fitted with regression analysis are shown in Figure 5. The R2 quantities indicate linearity is a common trait across these two different units. The pairwise attenuation coefficients are within magnitude agreement as are the LIl quantities for these units but are not interchangeable or within the tolerance range when comparing the H3 and GE units. 
As above, comparing the μa set for H3 with the corresponding set for GE (t-test) gave P > 0.14, indicating the set of adipose x-ray attenuation coefficients is similar across systems that use different detector technologies. In contrast, the corresponding μg set comparison gave P < 0.0001, suggesting the attenuation coefficients for the glandular BTE material differ across these systems.
For the BL calibration accuracy evaluation, the spline specification method findings (step 1) are presented in this section because the M and B are specified by the calibration points at tk, which are special cases. For the most part as shown in Table 4, the within-unit accuracy for the H1 and H2 units is within ± 4 PG units of the ideal value (i.e. PGcal = 50). However, there is greater variation for W/Ag acquisitions in the larger kV settings. This may be because the H1 calibration data for these samples was acquired on a different date than the rest of the respective BL dataset. The within-unit W/Ag accuracy for the most part is similar to the intra-system accuracy, whereas the accuracy for the W/Rh shows greater variation from the ideal value. The accuracy for the examples taken with non-reference mAs settings is similar to that obtained with the 160 mAs reference, showing the validity of the LRE normalization. The cross-unit calibration findings, provided in the right side of Table 4 for H1 and H2 units, show a trend beyond our tolerance gauge of ± 4 PG. These findings suggest that the calibration data in general is specific to the unit, even though the units are of the same design. In addition to the x-ray attenuation coefficient differences, another source of variation stems from the LIs, which may vary due to the inter-system exposure response differences (Table 1). The accuracy evaluation for H3 is shown in Table 5 using the same format (without cross-unit calibration). The Mo/Mo and Mo/Rh accuracies marginally exceed the tolerance gauge but are similar across the mAs range. Because we do not have similar experiments performed with the GE unit, direct comparisons are not possible. However, in general, the accuracies obtained with H3, as well as the H1 and H2 units, are similar to those obtained with GE previously. The accuracies shown in Tables 4 and 5 with the respective standard deviations (SDs) indicate that spatial non-uniformity has a minimal influence.
Table 6 shows the calibration generated with linear regression parameter specification method (i.e. step 2) for the H1 and H2 units. For the 160 mAs reference examples, the accuracy for 5 of the 11 acquisition techniques was outside of the ± 4 PG tolerance for the H1 unit. Similarly, the calibration was beyond the tolerance for 6 of the 11 acquisition techniques for H2. For the H3 unit, the accuracy was beyond the tolerance for all 15 acquisition techniques and exceeded +7 PG for 9 of these techniques (data not shown to limit the presentation). The accuracy for non-reference mAs examples follows a similar accuracy trend. The accuracies in Table 6 should be compared with respective findings in Table 4 (left side).
The cubic-spline height interpolation findings for the H1, H2, and H3 systems are shown in Table 7 for the non-BL evaluation (step 3). When comparing either within or across the H1 and H2 systems, the findings show that non-BL height accuracy is within the ± 4 PG tolerance for all but one acquisition technique indicating similarity across systems and the validity of the spline interpolation. The right portion of Table 7 shows the H3 evaluation for the Mo/Mo and Mo/Rh examples. Although the calibration accuracies are marginally above the tolerance for both the BL and non-BL heights, the accuracies are similar to those shown in Table 5, again demonstrating the validity of the spline interpolation technique. The regression parameter interpolation findings for the non-BL evaluation are shown in Table 8 (step 4). The accuracies for the non-BL from H1 are within the tolerance, whereas the majority of the H2 accuracies are beyond the tolerance. Although the H3 accuracy is in agreement with its related findings (Table 5), the BL accuracies are beyond the tolerance, and the non-BL calibration quantities deviate beyond the BL quantities. In summary, interpolation with the regression parameter method is inferior to the spline method when considering the H1, H2, and H3 units in combination. We note, the H3 findings for both BL and non-BL examples are consistently beyond the tolerance in contrast with H1 and H2 findings. At this time, we cannot account for this discrepancy.
In the final analysis, we assessed the potential influence of the system FOV for the H1, H2, and H3 units (cubic-spline approach). Table 9 shows the findings when applying the calibration data acquired with the large FOV to 50/50 mixtures taken with the small FOV. For comparison, 50/50 mixtures acquired with the large FOV were also calibrated; both sets of images were acquired on the same day to minimize serial drift influences. Taking the large FOV findings as the standard, the respective small FOV calibration accuracy is well within the ± 4 PG tolerance, demonstrating that the FOV change has little influence.
A calibration system for Hologic Selenia FFDM units was established by building upon our previous work [21, 22], which used a different FFDM technology. The findings demonstrate the generality of our approach. There are both important similarities and differences when comparing the calibration requirements across FFDM technologies. The mAs normalization was similar across the two technologies and depends in part upon the linearity of the pixel value and exposure relationship and the validity of ignoring the intercept term (i.e. assuming the relationship is proportional in addition to linear). In agreement with our previous findings, the results suggest that a single reference mAs sample may be sufficient for generating calibration curves. We showed that the calibration data could be acquired with the large FOV only without impacting the calibration accuracy for images acquired with the small FOV. The ability to use a single reference mAs and a single FOV results in a substantial reduction in the data collection required to establish the BL calibration datasets. Although the calibration curves were well approximated as linear for the systems evaluated in this report, we required a cubic-spline height interpolation for the H1, H2, and H3 units. This spline interpolation requirement is in contrast with our previous work, where the effective x-ray attenuation coefficients and logarithmic intercepts (i.e. regression parameters) were stored and then used for generating both the height interpolation and calibration points. Consistent with our findings from similar GE systems, each similarly-manufactured Hologic system (i.e. H1 and H2) requires its own BL calibration dataset to maintain acceptable calibration accuracy.
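The role of the intercept in the mAs normalization can be illustrated with a short sketch. It assumes that the detector response is linear in mAs with a given slope and intercept, and that normalization to the 160 mAs reference is performed by simple proportional scaling. The function names and the slope and intercept values are hypothetical.

```python
REFERENCE_MAS = 160.0  # reference mAs used in this report


def normalize_to_reference(signal, mas, reference_mas=REFERENCE_MAS):
    """Scale a signal acquired at `mas` to the reference mAs.

    Assumption: the signal is proportional to mAs (zero intercept).
    """
    return signal * (reference_mas / mas)


def proportionality_error(slope, intercept, mas, reference_mas=REFERENCE_MAS):
    """Relative error introduced by ignoring the intercept when scaling."""
    true_reference = slope * reference_mas + intercept
    measured = slope * mas + intercept
    estimated_reference = normalize_to_reference(measured, mas, reference_mas)
    return (estimated_reference - true_reference) / true_reference


# Hypothetical linear response: a purely proportional case and one with a
# small positive intercept, both sampled at 80 mAs.
print(proportionality_error(slope=10.0, intercept=0.0, mas=80.0))
print(proportionality_error(slope=10.0, intercept=5.0, mas=80.0))
```

When the intercept is zero the scaled value matches the reference exactly; a nonzero intercept leaves a residual error that grows as the acquisition mAs moves away from the reference, which is why the validity of ignoring the intercept matters for using a single reference mAs sample.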
There are several limitations with this work. The data was collected over a period of approximately 35 days, and the phantom heights were precisely known. In previous work, we showed that the GE unit exhibited serial drift with respect to the BL dataset and that drift should be accounted for to maintain prospective calibration accuracy. Because the data in this report was collected over a relatively short time interval, serial drift influences are likely minimal. Similarly, the calibration accuracy was evaluated without height uncertainty. Therefore, the accuracies obtained in this report may be considered ideal.
Our original objective was to develop a continuous calibrated breast density measurement applicable across imaging platforms. Additionally, calibration may be useful for applications other than risk, such as estimating the BI-RADS breast composition descriptors. The BI-RADS breast composition descriptors were developed for standardized reporting purposes and synchronized with situations where mammographic sensitivity may be lower due to composition. Calibrated tissue composition measurements may be useful both for breast cancer risk applications and for providing a quantitative sensitivity measure.
This initial evaluation, in combination with our previous calibration findings, indicates that the same calibration approach may apply to both indirect and direct x-ray conversion technologies. Because the BL dataset requires a considerable amount of phantom imaging, it is not cost-effective to acquire serial replications of the BL dataset on a regular basis for calibration purposes. Therefore, it is imperative to evaluate the forward serial applicability or stability of the BL datasets. In addition, alternative methods of updating the BL dataset with a minimal amount of serial phantom imaging will be explored in future work. Previously, we adapted the Cumulative Sum approach to monitor the forward stability of the BL dataset. However, the serial updating remains an open-ended problem. For this report, the compressed breast thickness was not a source of uncertainty. The calibration accuracies in this work were obtained under relatively ideal conditions by design. The compression paddle on the Hologic systems in this report is spring tensioned and therefore somewhat different from the technology we evaluated previously. During actual breast imaging, the compression paddle tilts and warps, and the system compressed breast thickness readout is often nominal; these traits are common across the FFDM designs. Additional work is required to assess the influence of uncertainty in paddle height (relative to the breast support surface) using deformable phantoms and to generate a compressed breast thickness correction before applying calibration to actual mammograms. Although the calibration accuracies were within our preset tolerances for the most part, the viability of our technique with this particular FFDM technology will require evaluation with patient images to show that a calibrated measure of breast density is associated with breast cancer.
Index reserved for adipose breast tissue equivalent material
Intercept of the open detector exposure relationships
Calibration application additive parameter
Breast tissue equivalent
Mean exposure at given sub-region rs and baseline phantom height in cm
Full field digital mammography
Field of view
Index reserved for fibroglandular breast tissue equivalent material
General Electric Senographe 2000D FFDM unit
Hologic Selenia unit 1
Hologic Selenia unit 2
Hologic Selenia unit 3
Index designating a sampled height
Subscript index reserved for breast tissue equivalent material
Left cranial caudal
Natural logarithm of the relative exposure
- LREl (rs):
Natural logarithm of the relative exposure at rs as a function of increasing baseline phantom height in cm
Slope of the open detector exposure relationships
Calibration application multiplier factor
Milliampere × second
A calibrated quantity
Adipose pixel value in the LRE domain
Glandular pixel value in the LRE domain
Raw image pixel value
Coefficient of determination
Relative mean exposure at a given sub-region rs and baseline phantom height in cm
Region of interest
Baseline phantom sample height in cm
Effective x-ray attenuation coefficient in cm⁻¹
Arbitrary mAs quantity
The reference, 160 mAs.
Boyd NF, Martin LJ, Yaffe M, Minkin S: Mammographic density. Breast Cancer Res 2009, 11(Suppl 3):S4. 10.1186/bcr2423
McCormack VA, dos Santos Silva I: Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomarkers Prev 2006, 15(6):1159–1169. 10.1158/1055-9965.EPI-06-0034
Boyd NF, Martin LJ, Bronskill M, Yaffe MJ, Duric N, Minkin S: Breast tissue composition and susceptibility to breast cancer. J Natl Cancer Inst 2010, 102(16):1224–1237. 10.1093/jnci/djq239
Brower V: Breast density gains acceptance as breast cancer risk factor. J Natl Cancer Inst 2010, 102(6):374–375. 10.1093/jnci/djq080
Yaffe MJ: Mammographic density. Measurement of mammographic density. Breast Cancer Res 2008, 10(3):209. 10.1186/bcr2102
Highnam R, Brady M: Mammographic Image Analysis. Boston, MA: Kluwer Academic Publishers; 1999.
Kaufhold J, Thomas JA, Eberhard JW, Galbo CE, Trotter DE: A calibration approach to glandular tissue composition estimation in digital mammography. Med Phys 2002, 29(8):1867–1880. 10.1118/1.1493215
Malkov S, Wang J, Kerlikowske K, Cummings SR, Shepherd JA: Single x-ray absorptiometry method for the quantitative mammographic measure of fibroglandular tissue volume. Med Phys 2009, 36(12):5525–5536. 10.1118/1.3253972
Pawluczyk O, Augustine BJ, Yaffe MJ, Rico D, Yang J, Mawdsley GE, Boyd NF: A volumetric method for estimation of breast density on digitized screen-film mammograms. Med Phys 2003, 30(3):352–364. 10.1118/1.1539038
Boyd N, Martin L, Gunasekara A, Melnichouk O, Maudsley G, Peressotti C, Yaffe M, Minkin S: Mammographic density and breast cancer risk: evaluation of a novel method of measuring breast tissue volumes. Cancer Epidemiol Biomarkers Prev 2009, 18(6):1754–1762. 10.1158/1055-9965.EPI-09-0107
Ding J, Warren R, Warsi I, Day N, Thompson D, Brady M, Tromans C, Highnam R, Easton D: Evaluating the effectiveness of using standard mammogram form to predict breast cancer risk: case–control study. Cancer Epidemiol Biomarkers Prev 2008, 17(5):1074–1081. 10.1158/1055-9965.EPI-07-2634
Heine JJ, Cao K, Rollison DE: Calibrated measures for breast density estimation. Acad Radiol 2011, 18(5):547–555. 10.1016/j.acra.2010.12.007
Heine JJ, Cao K, Rollison DE, Tiffenberg G, Thomas JA: A quantitative description of the percentage of breast density measurement using full-field digital mammography. Acad Radiol 2011, 18(5):556–564. 10.1016/j.acra.2010.12.015
Heine JJ, Fowler EEE, Flowers CI: Full field digital mammography and breast density: comparison of calibrated and noncalibrated measurements. Acad Radiol 2011, 18(11):1430–1436. 10.1016/j.acra.2011.07.011
Shepherd JA, Kerlikowske K, Ma L, Duewer F, Fan B, Wang J, Malkov S, Vittinghoff E, Cummings SR: Volume of mammographic density and risk of breast cancer. Cancer Epidemiol Biomarkers Prev 2011, 20(7):1473–1482. 10.1158/1055-9965.EPI-10-1150
Mahesh M: AAPM/RSNA physics tutorial for residents: digital mammography: an overview. Radiographics 2004, 24(6):1747–1760. 10.1148/rg.246045102
Bick U, Diekmann F (Eds): Medical Radiology Diagnostic Imaging and Radiation Oncology. Berlin: Springer; 2010.
Aslund M, Cederstrom B, Lundqvist M, Danielsson M: Physical characterization of a scanning photon counting digital mammography system based on Si-strip detectors. Med Phys 2007, 34(6):1918–1925. 10.1118/1.2731032
Heine JJ, Behera M: Effective x-ray attenuation measurements with full field digital mammography. Med Phys 2006, 33(11):4350–4366. 10.1118/1.2356648
Heine JJ, Cao K, Beam C: Cumulative Sum quality control for calibrated breast density measurements. Med Phys 2009, 36(12):5380–5390. 10.1118/1.3250842
Heine JJ, Cao K, Thomas JA: Effective radiation attenuation calibration for breast density: compression thickness influences and correction. BioMed Eng OnLine 2010, 9: 73. 10.1186/1475-925X-9-73
Heine JJ, Thomas JA: Effective x-ray attenuation coefficient measurements from two full field digital mammography systems for data calibration applications. Biomed Eng Online 2008, 7: 13. 10.1186/1475-925X-7-13
D’Orsi CJ, Bassett LW, Berg WA, et al.: Breast Imaging Reporting and Data System: ACR BI-RADS. 4th edition. Reston, VA: American College of Radiology; 2003.
This work was supported by the Bankhead-Coley Cancer Research Program Grant #3BB04-51005, and the National Institutes of Health grants #R01CA166269 and #R01CA114491.
The authors declare that they have no competing interests.
The database was constructed by EF under the supervision of JH. JH, EF and BL developed the manuscript content. EF, JH and BL performed the data analysis. All authors contributed to manuscript composition. All authors read and approved the final manuscript.
Fowler, E.E., Lu, B. & Heine, J.J. A comparison of calibration data from full field digital mammography units for breast density measurements. BioMed Eng OnLine 12, 114 (2013). https://doi.org/10.1186/1475-925X-12-114
- Breast density
- Phantom imaging
- Direct x-ray conversion
- Full field digital mammography