A comparison of calibration data from full field digital mammography units for breast density measurements

Fowler, Erin EE; Lu, Beibei; Heine, John J

doi:10.1186/1475-925X-12-114

Research
Open access
Published: 09 November 2013

BioMedical Engineering OnLine volume 12, Article number: 114 (2013)

Abstract

Background

Breast density is a significant breast cancer risk factor measured from mammograms. The most appropriate method for measuring breast density for risk applications is still under investigation. Calibration standardizes mammograms to account for acquisition technique differences prior to making breast density measurements. We evaluated whether a calibration methodology developed for an indirect x-ray conversion full field digital mammography (FFDM) technology applies to direct x-ray conversion FFDM systems.

Methods

Breast tissue equivalent (BTE) phantom images were used to establish calibration datasets for three similar direct x-ray conversion FFDM systems. The calibration dataset for each unit is a function of the target/filter combination, x-ray tube voltage, current × time (mAs), phantom height, and two detector fields of view (FOVs). Methods were investigated to reduce the amount of calibration data by restricting the height, mAs, and FOV sampling. Calibration accuracy was evaluated with mixture phantoms. We also compared both intra- and inter-system calibration characteristics and accuracy.

Results

Calibration methods developed previously apply to direct x-ray conversion systems with modification. Calibration accuracy was largely within the acceptable range of ± 4 standardized units from the ideal value over the entire acquisition parameter space for the direct conversion units. Acceptable calibration accuracy was maintained with a cubic-spline height interpolation, representing a modification to previous work. Calibration data is unit specific, can be acquired with the large FOV, and requires a minimum of one reference mAs sample. The mAs sampling, calibration accuracy, and the necessity for machine specific calibration data are common characteristics and in agreement with our previous work.

Conclusion

The generality of our calibration approach was established under ideal conditions. Evaluation with patient data using breast cancer status as the endpoint is required to demonstrate that the approach produces a breast density measure associated with breast cancer.

Introduction

Mammographic breast density is a significant breast cancer risk factor [1–3]. Although used extensively in research, breast density is not generally used in the clinical environment for breast cancer risk applications [4] due in large part to the lack of an automated measurement. There are various methods under evaluation for estimating breast density from either raw or calibrated mammograms [5]. A large portion of breast density research was derived without calibration [1, 2], as calibration is a more recent development for mammography.

Ideally, calibration adjusts for inter-patient x-ray image acquisition technique differences to produce some form of standardized data representation [6–9]. Calibration research is still in its early stage of development and there are few published reports evaluating its potential application relative to the volume of published breast density research using raw mammograms. The findings from calibration research have been mixed in identifying a measure that strengthens the associations with breast cancer in comparison with the operator-assisted percentage of breast density measure [10–15]. Due to its stage of development, it may be premature to conclude whether calibration is generally a useful technique for risk assessments. However, one benefit of establishing a calibration method is that it permits automated breast density measurements. We have posited that calibration may be an important step for automation.

Full field digital mammography (FFDM) detector technologies can be broadly categorized as either indirect or direct x-ray conversion systems [16]. Although these designs have many characteristics that vary, until recently both technologies produced an energy weighted integrated signal at the pixel level [17]. More recently, another type of direct x-ray conversion technology was approved for clinical use in the US that uses photon counting detection technology [18], which, in contrast to the established FFDM designs, does not produce an integrated weighted signal. Currently, it is not known if calibration will produce equivalent findings across these varying FFDM platforms.

We applied a calibration methodology developed previously for a General Electric Senographe 2000D FFDM system [19–22], which is an indirect x-ray conversion technology. Our findings based on images taken from this technology [12–14] suggest that calibrated breast density measurements are strong indicators of risk, providing justification to investigate the merits of calibration in more detail. As many characteristics vary between the direct and indirect x-ray conversion systems, the applicability of our calibration methodology has yet to be established for direct x-ray conversion FFDM systems.

In this report, we expand our understanding of calibration gained previously [21, 22] and establish a calibration system for a direct x-ray conversion FFDM design using phantom images acquired from three Hologic Selenia FFDM units, as the primary analysis. We considered several design objectives. One objective is to minimize the amount of calibration data collection while maintaining acceptable calibration accuracy, representing an important compromise. Although it would be optimal, it is nearly impossible to sample all acquisition technique combinations to construct the calibration curves. Therefore, some form of sampling scheme and interpolation methodology must be established to minimize effort while maintaining acceptable accuracy. It is reasonable to assume that if calibration requires excessive phantom imaging effort or is difficult to apply across imaging platforms without considerable modification, it may not be used beyond research. Another objective is to evaluate whether calibration data collected from one FFDM unit can be applied to another similarly manufactured unit, with or without modification, as inter-unit generalization for a given technology is an important step for universal application. As a secondary objective, we also compared calibration and detector response data obtained from the Hologic units investigated in this report with those previously acquired from the General Electric FFDM unit when applicable to assess inter-technology similarities.

Methods

We acquired calibration and exposure response data from three Hologic Selenia FFDM units to evaluate the generality of our approach. Calibration curves were generated by imaging standard breast tissue equivalent (BTE) phantoms (CIRS, Norfolk, VA) described previously [22]. Our BTE phantom set includes 100% fibroglandular (glandular) and 100% adipose BTE materials that are of 1 mm, 2 mm, 1 cm, and 2 cm thicknesses (i.e. precise heights) and 18 cm × 24 cm in area dimension. These phantoms were combined (stacked) to produce desired composite proportions at a given total thickness (height). For example, combining a 2 cm thickness glandular phantom with a 2 cm thickness adipose phantom gives a 50% glandular composition with a total height of 4 cm. Calibration curves are functions of the compressed breast thickness above the breast support surface, referenced as height, and several other acquisition technique parameters, including target/filter combination, x-ray tube voltage (kV), current × time (mAs), and detector field of view (FOV), representing a five dimensional parameter space. As previously, we refer to the initial data collection as the baseline (BL) calibration dataset. A BL dataset was established for each unit.

The three Selenia systems evaluated in this report are located within the breast clinics at the Moffitt Cancer Center and are used for both screening and diagnostic purposes. Two of these systems, referred to as the H₁ and H₂, have a tungsten (W) target with rhodium (Rh) and silver (Ag) filter options. The third unit has a molybdenum (Mo) target with Mo and Rh filter options and is referred to as H₃. The Selenia detector has 70 micron pitch (pixel spatial resolution), and the raw data used for this work has 14 bit per pixel dynamic range. Two detector FOVs are used for screening mammograms on these units depending upon the choice of compression paddle: 24 cm × 29 cm (large) and 18 cm × 24 cm (small). The General Electric Senographe 2000D FFDM unit is referred to as GE in the report. This unit has a Mo target with Mo and Rh filter options, and a Rh target with a Rh filter. The GE detector has 100 micron spatial resolution, a 19.2 cm × 23 cm detector FOV (i.e. 1914 × 2294 pixels) and 14 bit dynamic range per pixel for the raw data used in this work. As a standard convention, we acquired all phantom images as left cranial caudal (LCC) views. In the LCC view, the detector left border in the vertical direction is parallel with the chest wall position as observed in a displayed image.

The aims of this study were to assess the pixel value – detector exposure (detector response) relationship without attenuation, generate and assess the calibration curves for linearity, and evaluate the calibration accuracy. To minimize the BL data collection, we evaluated the calibration accuracy under these conditions: (a) when applying interpolation for the height variable; (b) when applying a data reduction step to reduce mAs sampling; and (c) as a function of FOV. To evaluate the FOV impact, we acquired the calibration datasets with the large detector FOV only. The validity of collecting calibration data with the large FOV only was evaluated by examining calibration accuracy for images acquired with the small FOV. We made direct comparisons between H₁ and H₂ because of their target/filter and manufacturing similarity, and evaluated whether calibration data collected with one unit is valid when applied to another similarly manufactured unit. Likewise, we made direct comparisons between the H₃ and GE units for the Mo/Mo and Mo/Rh combinations, when applicable.

The analysis was restricted to specific regions depending on the FFDM design and specific analysis endpoint. For the H₁, H₂, and H₃ units, unless stated otherwise, the analysis was constrained to a large region of interest (ROI) specific to the large FOV. This ROI is defined as 2000 × 2500 pixels (14 cm × 17.5 cm), centered in the vertical direction with a horizontal offset of 75 pixels (not included) from the outside of the detector (i.e. parallel to the chest wall) or left border (LCC view). This restriction is to avoid stacked-phantom edge effects near the detector outer edge and possible flat field non-uniformity interference at regions far (interior) from the central detector area. For the FOV analysis and for images taken with the GE unit, the analysis was constrained to a 1000 × 1250 pixel ROI with a 75 pixel offset (as above). The ROIs relative to the Hologic detector and the BTE phantom area are shown in Figure 1.

Exposure response evaluation

We assessed the detector exposure and pixel value (pv) response relationships for the H₁, H₂, and H₃ units for select kV settings for each target/filter combination, using the large FOV. The raw image pixel value (pv_raw) response was modeled as a linear function of mAs by acquiring images without attenuation (i.e. open exposures of the detector). The mAs variable was sampled up to the point of detector saturation. The sample sets for each kV setting were analyzed with regression analysis and fitted to this form: <pv_raw> = m × x + b, where x is the system readout mAs quantity for each acquisition. The slope (m), intercept (b), coefficient of determination (R²), and standard error (SE) in the slope were used for evaluation purposes. The brackets indicate the mean pv_raw within the large ROI. We make the approximation that the system readout mAs value is a surrogate for (i.e. proportional to) the x-ray exposure at the detector, which is common practice. We made both intra-technology comparisons and comparisons with the GE exposure response, where applicable. Because H₁ and H₂ have the same target/filter combinations and H₃ and GE have common combinations, the respective pairwise comparisons were included in the analysis. When making pairwise inter-unit slope comparisons for a given kV, an important difference is defined as when the central value of m_i falls outside of this tolerance range: m_j ± 2 × SE_j or vice versa, where the index 0 is reserved for the GE unit. Where appropriate, we compared the entire set of m_j across units with a t-test. Demonstrating that the response is linear has important implications for the BL calibration data collection requirements. When the linear approximation holds, the mAs sampling may be reduced to one sample in the BL dataset.
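The regression and tolerance comparison described above can be illustrated with a minimal sketch. This is not the analysis code used in the study; the function names (fit_line, slopes_differ) and the numerical series are hypothetical and chosen only to show the form of the calculation.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = m*x + b.

    Returns the slope m, intercept b, coefficient of determination R^2,
    and the standard error (SE) of the slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    se_m = math.sqrt(ss_res / (n - 2) / sxx)
    return m, b, r2, se_m


def slopes_differ(m_i, se_i, m_j, se_j):
    """True if m_i is outside m_j +/- 2*SE_j, or m_j is outside m_i +/- 2*SE_i."""
    gap = abs(m_i - m_j)
    return gap > 2.0 * se_j or gap > 2.0 * se_i


# Hypothetical open-exposure series for one kV setting (not measured data):
# readout mAs values and the mean raw pixel value within the ROI.
mas = [20.0, 40.0, 80.0, 160.0, 320.0]
mean_pv_raw = [251.0, 449.0, 851.0, 1649.0, 3251.0]

m, b, r2, se_m = fit_line(mas, mean_pv_raw)
print(f"m={m:.4f}, b={b:.2f}, R^2={r2:.6f}, SE(m)={se_m:.5f}")
```

In this sketch, two units would be flagged as differing when either slope falls outside the other's ± 2 × SE range, which is one reading of the "or vice versa" condition.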

Calibration dataset and characterization

The phantom imaging techniques and methods for constructing the BL calibration datasets (i.e. calibration curves) were described previously [21, 22]. The same approach was applied in this report with some modification. Briefly, to construct the calibration curves for a given acquisition technique, two series of BTE phantoms were imaged to generate the respective glandular and adipose calibration curves for BL sampled heights defined as t_k. Reference points derived from these curves are used in the calibration application (discussed below). The phantom heights (total stacked heights) for a given calibration curve range from 2–7 cm depending on the acquisition technique, and were taken at 1 cm increments for convenience. To estimate the kV range, we selected the automated exposure control (auto-kV mode) and adjusted the compression paddle over a range of heights for fixed target/filter combinations. We estimated that the W/Rh range is 26–30 kV and the W/Ag range is 27–32 kV for the H₁ and H₂ systems. The same procedure was followed for Mo/Mo and Mo/Rh techniques for the H₃ system giving 25–31 kV and 27–34 kV ranges, respectively. BL calibration datasets (H₁, H₂, H₃ and GE units) were acquired with the same reference mAs setting defined as: x_r = 160 mAs. We selected a reference mAs value that does not cause detector saturation when imaging phantom configurations with smaller heights, in particular adipose phantoms, while providing sufficient signal when imaging phantoms with larger heights, in particular glandular phantoms, over the entire acquisition technique range considered, as discussed previously [22].

For both comparison and presentation purposes, we evaluated the calibration curves using linear regression methods without regard to calibration accuracy. We subdivided the large ROI (2000 × 2500 pixel region shown in Figure 1) into a grid of smaller, non-overlapping 25 × 25 pixel sub-regions defined as r_s. This gives 80 × 96 = 7680 r_s sub-regions (for the large FOV). As above, t_k is the BL phantom height in cm with the index k designating a sampled height. For a given phantom configuration (fixed height and BTE type), we average the pixel values (i.e. <pv_raw>) within r_s giving the mean exposure, E_l(r_s), at r_s and t_k. For this report, the index, l, is reserved for the BTE type designation: l = a for adipose; and l = g for glandular. We divide E_l(r_s) by the reference mAs giving the relative mean exposure, RE_l(r_s) = E_l(r_s) / x_r (i.e. the reference x_r = 160 mAs) at each subdivision. We evaluate the natural logarithm of the relative mean exposure, LRE_l(r_s) = ln[RE_l(r_s)], as a function of increasing t_k giving a regional calibration curve; for reference, this defines the logarithm of the relative exposure (LRE) domain, which holds at the pixel level as well. For inter-unit comparisons, we applied linear regression at each r_s for each BTE type resulting in a distribution for the slopes (μ_l), logarithmic intercepts (LI_l), and R² values estimated by fitting the ordered pairs [t_k, LRE_l(r_s)] to this model

$$LRE_{l} = \mu_{l} \times t_{k} + LI_{l} \tag{1}$$

When fitted to this form (t_{k+1} > t_k), the magnitude of the slope can be interpreted as the effective x-ray attenuation coefficient (i.e. μ_g for glandular and μ_a for adipose tissue, cited as positive quantities in the tables and expressions) measured in cm^-1 for a given kV and target/filter combination. The LI_l quantities are the respective intercepts, which are unit-less. We summarized these regression parameter distributions with the mean and mean standard error (SE). As above, we use the μ_l ± 2 × SE_l tolerance gauge for the inter-system pairwise comparisons. Where appropriate, we compared the entire set of effective x-ray attenuation coefficients across systems with a t-test for each BTE material. This sub-region analysis also gives a method for assessing the spatial uniformity of the calibration data.
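The sub-region averaging, LRE transform, and regional curve fit can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the helper names (block_means, fit_curve) are hypothetical, and the synthetic images are built from assumed values (μ = 0.8 cm^-1, LI = 5.0) that are not measured data.

```python
import math


def block_means(image, block=25):
    """Mean pixel value of each non-overlapping block x block sub-region."""
    n_rows = len(image) // block
    n_cols = len(image[0]) // block
    means = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            total = 0.0
            for r in range(i * block, (i + 1) * block):
                total += sum(image[r][j * block:(j + 1) * block])
            row.append(total / (block * block))
        means.append(row)
    return means


def fit_curve(heights_cm, lre_values):
    """Fit LRE = slope * t + LI and return (mu, LI), with mu = |slope|."""
    n = len(heights_cm)
    mean_t = sum(heights_cm) / n
    mean_y = sum(lre_values) / n
    stt = sum((t - mean_t) ** 2 for t in heights_cm)
    sty = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(heights_cm, lre_values))
    slope = sty / stt
    return abs(slope), mean_y - slope * mean_t


X_R = 160.0  # reference mAs used for the BL datasets
heights = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # BL heights t_k in cm

# Hypothetical uniform 50 x 50 pixel phantom images, one per height, built
# from assumed values mu = 0.8 cm^-1 and LI = 5.0 (not measured data).
images = [[[X_R * math.exp(5.0 - 0.8 * t)] * 50 for _ in range(50)]
          for t in heights]

# LRE at sub-region (0, 0) for each height, then the regional curve fit.
lre_00 = [math.log(block_means(img)[0][0] / X_R) for img in images]
mu, li = fit_curve(heights, lre_00)
print(f"mu = {mu:.3f} cm^-1, LI = {li:.3f}")
```

Repeating the fit at every sub-region would give the distributions of μ_l and LI_l that are summarized by their mean and SE.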

Calibration procedure

When calibrating an arbitrary image, the operation takes place in the LRE domain. In contrast to the calibration curve normalization that uses the reference mAs, the LRE for an arbitrary image (i.e. a prospective calibration application) is formed by normalizing either pv_raw or <pv_raw> by the acquisition system readout mAs, defined as x, before applying the natural logarithm: LRE = ln(pv_raw/x). This normalization holds under certain conditions when the exposure response is linear. Similarly, when the response is linear, two calibration points are required to calibrate an arbitrary image. These calibration points are derived from the BL curves and correspond to the theoretical pixel values in the LRE domain that would result when imaging materials that are (a) 100% glandular tissue = pv_g, and (b) 100% adipose tissue = pv_a for a specific acquisition technique and height. For consistency with our past convention, we refer to the calibration domain as the percent glandular (PG) representation with values theoretically ranging from 0–100 PG units. This representation is analogous to a normalized x-ray attenuation coefficient representation, which is easily converted to total volume or average volumetric glandular metric by incorporating the compressed breast thickness (height) into the analysis [21]. The calibration mapping takes this form: PG_cal = M × LRE + B, where M and B are specific to a given kV, target/filter combination and height above the breast support surface; capitals are used to distinguish these parameters from the open detector exposure relationships. The LRE can be determined at the pixel level or sub-region level by using either the respective pixel value with the corresponding height or sub-region mean pixel value with corresponding mean height above the support surface.

For efficient prospective calibration applications, the BL calibration data must be stored. Therefore, we investigated two storage methods. The stored BL calibration data is then used in the specification of M and B. Both M and B are determined (fixed kV and target/filter) by considering the endpoints for a specific height t = t₀. In the LRE domain, we set PG_cal = 100 when LRE = pv_g, PG_cal = 0 when LRE = pv_a and solve for M and B: M = 100 × (pv_g – pv_a)^-1 and B = 50 – ½ M × (pv_g + pv_a), giving one method for specifying M and B. In this specification approach, when t₀ does not correspond exactly with a specific sample height from the BL, a cubic-spline interpolation was used to determine pv_g and pv_a at t₀. The second method for specifying M and B expresses pv_a and pv_g as functions of the regression parameters (μ_g, μ_a, LI_g and LI_a) and t₀ using Equation (1) by substituting t_k with t₀: for example, pv_g ≈ − μ_g × t₀ + LI_g. In this case, the M and B specification and height interpolation are performed simultaneously; the validity of this approach relies on the agreement with Equation (1) and was the method developed previously for the GE unit [21, 22]. With either specification method, the B relationship can be expressed in a simpler form to include only the pv_a term or the pv_g term, or the regression parameters from one of the calibration curves. We have included both measured terms (or all four regression parameters) to reduce variation in the event the curves or parameters carry dissimilar accuracy. We note that the 0–100 (PG units) calibration range is imposed by the development; it is not unique but follows intuition.
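The M and B expressions above can be checked numerically with a short sketch. The function names and parameter values below are hypothetical, not taken from the BL datasets. Only the regression parameter route to pv_g and pv_a is shown; the cubic-spline route would replace reference_points_from_regression with an interpolation through the stored BL points.

```python
def reference_points_from_regression(mu_g, li_g, mu_a, li_a, t0):
    """Second method: evaluate Equation (1) at height t0.

    mu_g and mu_a are passed as positive magnitudes, hence the minus signs.
    """
    pv_g = -mu_g * t0 + li_g
    pv_a = -mu_a * t0 + li_a
    return pv_g, pv_a


def calibration_coefficients(pv_g, pv_a):
    """M and B with PG_cal = 100 at LRE = pv_g and PG_cal = 0 at LRE = pv_a."""
    m_cal = 100.0 / (pv_g - pv_a)
    b_cal = 50.0 - 0.5 * m_cal * (pv_g + pv_a)
    return m_cal, b_cal


# Hypothetical regression parameters and a non-BL height (not measured data).
pv_g, pv_a = reference_points_from_regression(0.8, 5.0, 0.5, 5.2, t0=4.2)
M, B = calibration_coefficients(pv_g, pv_a)

print(M * pv_g + B)                 # glandular endpoint
print(M * pv_a + B)                 # adipose endpoint
print(M * 0.5 * (pv_g + pv_a) + B)  # midpoint between the two endpoints
```

Because B is written symmetrically in pv_g and pv_a, the midpoint of the two reference points maps to the center of the PG range by construction.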

When applying the calibration, the large ROI within a given image is divided into 25 × 25 pixel sub-regions as above and the average of each sub-region is used in the calibration equation giving PG_cal = M × <LRE(r_s,<t₀>)> + B, where < t₀ > is the mean height above the breast support surface about r_s, resulting in a spatial distribution of calibrated values. The methods described in the Calibration dataset and characterization Section indicate the calibration curves, in the most general terms, are functions of position. For this report, we used the mean values of the calibration BL data taken over all r_s in the specification of M and B (both methods), removing the spatial dependency.
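As a rough illustration of this application step, the sketch below maps a small grid of sub-region means to PG units. It assumes the LRE of a sub-region is the logarithm of its mean raw pixel value divided by the readout mAs, and that a single (M, B) pair applies to all sub-regions, as when the phantom height is uniform. All names and numbers are hypothetical.

```python
import math


def calibrate_subregions(mean_pv, mas, m_cal, b_cal):
    """Map sub-region means <pv_raw> to PG_cal = M * ln(<pv_raw>/x) + B."""
    return [[m_cal * math.log(pv / mas) + b_cal for pv in row]
            for row in mean_pv]


# Hypothetical reference points in the LRE domain for one technique and height.
PV_G, PV_A = 1.64, 3.10
M = 100.0 / (PV_G - PV_A)
B = 50.0 - 0.5 * M * (PV_G + PV_A)

# Hypothetical 2 x 2 grid of sub-region means acquired at x = 120 mAs,
# i.e. not at the 160 mAs reference used for the BL curves.
x = 120.0
mean_pv = [
    [x * math.exp(2.37), x * math.exp(PV_G)],
    [x * math.exp(PV_A), x * math.exp(2.50)],
]

pg_map = calibrate_subregions(mean_pv, x, M, B)
for row in pg_map:
    print([round(v, 2) for v in row])
```

Dividing by the acquisition mAs before taking the logarithm is what allows BL curves acquired at the reference mAs to be reused, provided the exposure response is linear.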

Calibration accuracy evaluation

To evaluate the intra-machine calibration accuracy near the BL acquisition date (for the H₁, H₂, and H₃ units), we imaged 4 cm composite phantoms comprised of a 2 cm adipose phantom stacked upon a 2 cm glandular phantom for the majority of kV settings and target/filter combinations. For a few of the larger kV acquisitions we used the same adipose and glandular ratio to construct 6 cm phantoms to avoid detector saturation. We refer to these composite phantoms as 50/50 mixtures. We also acquired 50/50 mixture images with three mAs settings to evaluate the impact of reference mAs normalization on the calibration accuracy: 120 mAs, 160 mAs (the reference) and 200 mAs (i.e. two additional samples for comparison purposes).

For the accuracy evaluation, we used the two methods outlined above for specifying M and B to select the optimal technique and make comparisons with our previous work. This evaluation was performed in four related steps. In step 1, we used the pv_a and pv_g determined with the BL dataset to calibrate 50/50 mixtures acquired with heights included in the BL; this should provide the best accuracy because no interpolation is required. In step 2, we calibrated the same mixtures used in step 1 with the regression parameter specification method; this does not permit a fair comparison with the first step because it includes interpolation but is required for the comparisons in the next two steps. To fully evaluate both interpolation methods, we also included additional 50/50 mixture acquisitions using the reference mAs (x_r = 160 mAs) with heights set at 4.2 cm, 4.4 cm and 6.4 cm, which were not included in the BL datasets (i.e. non-BL mixtures). In step 3, we used pv_a and pv_g derived from spline interpolation in the calibration of the non-BL mixtures, and in step 4 we used the regression parameters to calibrate the same non-BL mixtures. The comparison of step 1 with step 3 and comparison of step 2 with step 4 provides an intra-specification method evaluation by considering BL and non-BL height samples. The comparison of step 1 and 3 with step 2 and 4 provides a means for selecting the optimal interpolation method. From previous experience, we used an empirically derived tolerance of approximately ± 4 PG unit deviation from the ideal PG_cal = 50 for comparing calibration accuracy. For these comparisons, we acquired additional 50/50 mixtures using both BL heights (4 and 6 cm phantom heights) and non-BL heights. To minimize serial drift influences within the BL and non-BL comparison, we acquired both phantom series on the same day.
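The tolerance check applied to a calibrated 50/50 mixture can be summarized in a few lines. This is a sketch only: the function name, the use of the sample standard deviation, and the example values are assumptions, not details reported in the study.

```python
import statistics


def accuracy_summary(pg_values, ideal=50.0, tol=4.0):
    """Return the mean PG_cal, its SD, and whether the mean is within tolerance."""
    mean_pg = statistics.mean(pg_values)
    sd_pg = statistics.stdev(pg_values)  # sample SD, an assumption
    return mean_pg, sd_pg, abs(mean_pg - ideal) <= tol


# Hypothetical calibrated sub-region values for two 50/50 mixture images.
within = accuracy_summary([49.0, 51.0, 50.5, 52.0])
outside = accuracy_summary([55.0, 56.0, 57.0])
print(within)
print(outside)
```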

We performed two additional experiments to assess the calibration generality and accuracy. First, to evaluate whether calibration data acquired from one FFDM unit is applicable to another similar unit, we switched the BL calibration data and used BL₁ (i.e. from H₁) to calibrate 50/50 mixtures (with 160 mAs) acquired from H₂ and vice versa, referred to as the cross-unit calibration analysis (findings discussed with those resulting from step 1). Second, to evaluate FOV influences, we acquired 50/50 mixtures using the small FOV and performed calibration with the BL calibration data acquired with the large FOV for the H₁, H₂, and H₃ units. To perform the small FOV analysis, a reduced ROI of 1000 × 1250 pixels was used, as outlined in Figure 1.

Results

Exposure response

The open detector exposure relationships (pv and exposure response) for all systems are summarized in Table 1. Example plots are shown in Figure 2 for the similar H₁ and H₂ units. Plots for the H₃ and GE units for common target/filter combinations are shown in Figure 3. The plots in both figures are representative of the linear response relationship for the four units. The R² estimates (Table 1) are close to unity for all of the acquisition techniques considered, indicating the relationships are well approximated as linear for all units. Despite their design similarities, the response varies beyond our tolerance (i.e. m_j ± 2 × SE_j) between the H₁ and H₂ units within kV settings. Although beyond the tolerance, the percent difference between m₁ and m₂ is within 3.3%–5.5%, whereas the intercepts show much larger variation. Comparing the set of m₁ estimates with the set of m₂ estimates (t-test) gave P > 0.96, indicating the exposure response does not differ significantly across similar systems. The pairwise responses also vary beyond the tolerance across the H₃ and GE systems as expected for all observations. Although the exposure response quantities vary across all systems, the response linearity is a common characteristic across all units (H₁, H₂, H₃, and GE). This common trait suggests the mAs sampling can be reduced to one sample for a given target/filter combination and kV setting (as evaluated below).

Table 1 Exposure and pixel value response analysis by target/filter and select kV combination

Calibration datasets

The effective attenuation coefficients (μ_l) and logarithmic intercepts (LI_l) for the H₁ and H₂ units are shown in Table 2 separated by the BTE type and FFDM unit. We have provided the absolute value of the slope from the regression analysis, which is cited as μ_l, and the corresponding SE_l. Example calibration curve plots for these units fitted with regression analysis are shown in Figure 4. The R² findings indicate the linear model fits well. The level of agreement within each μ_l pair varies, as does the SE. For example, the μ_a pairwise comparison for W/Rh combinations indicates there is close agreement for 26–29 kV as gauged by the preset tolerance (μ_l ± 2 × SE_l) with little variation at 26 kV and a maximum 2.3% variation at 30 kV, which is beyond the tolerance. The corresponding μ_g pairs show greater variation for the W/Rh combinations but are within the tolerance. The W/Ag glandular and adipose coefficients follow a similar trend and are within the similarity tolerance. Comparing the set of μ_a estimates for H₁ with the corresponding set from H₂ (t-test) gave P > 0.70. Similarly, comparing the μ_g set between H₁ and H₂ gave P > 0.45. These comparisons indicate the set of effective x-ray attenuation coefficients for a given BTE material does not differ significantly across similar systems.

Because of the target/filter difference, no comparisons of the H₃ and GE units with the H₁ and H₂ units are provided. The μ_l, associated SE_l, and LI_l for the H₃ and GE units are shown in Table 3 for the Mo/Mo and Mo/Rh combinations, and example calibration curve plots fitted with regression analysis are shown in Figure 5. The R² quantities indicate linearity is a common trait across these two different units. The pairwise attenuation coefficients and the LI_l quantities agree in magnitude for these units but are neither interchangeable nor within the tolerance range when comparing the H₃ and GE units. As above, comparing the μ_a set for H₃ with the corresponding set for GE (t-test) gave P > 0.14, indicating the set of adipose x-ray attenuation coefficients is similar across systems that use different detector technologies. In contrast, the corresponding μ_g set comparison gave P < 0.0001, suggesting the attenuation coefficients for the glandular BTE material differ across these systems.

Table 2 Baseline (BL) calibration dataset summary for the Hologic Selenia (H₁ and H₂) units

Table 3 Baseline (BL) calibration data summary for two different FFDM technologies

Calibration accuracy

For the BL calibration accuracy evaluation, the spline specification method findings (step 1) are presented in this section because the M and B are specified by the calibration points at t_k, which are special cases. For the most part as shown in Table 4, the within-unit accuracy for the H₁ and H₂ units is within ± 4 PG units of the ideal value (i.e. PG_cal = 50). However, there is greater variation for W/Ag acquisitions in the larger kV settings. This may be because the H₁ calibration data for these samples was acquired on a different date than the rest of the respective BL dataset. The within-unit W/Ag accuracy for the most part is similar to the intra-system accuracy, whereas the accuracy for the W/Rh shows greater variation from the ideal value. The accuracy for the examples taken with non-reference mAs settings are similar to those obtained with the 160 mAs reference, showing the validity of the LRE normalization. The cross-unit calibration findings, provided in the right side of Table 4 for H₁ and H₂ units, show a trend beyond our tolerance gauge of ± 4 PG. These findings suggest that the calibration data in general is specific to the unit, even though they are identical. In addition to the x-ray attenuation coefficient differences, another source of variation stems from the LIs, which may vary due to the inter-system exposure response differences (Table 1). The accuracy evaluation for H₃ is shown in Table 5 using the same format (without cross-unit calibration). The Mo/Mo and Mo/Rh accuracies marginally exceed the tolerance gauge but are similar across the mAs range. Because we do not have similar experiments performed with the GE unit, direct comparisons are not possible. However, in general, the accuracies obtained with H₃, as well as the H₁ and H₂ units, are similar to those obtained with GE previously [22]. The accuracies shown in Tables 4 and 5 with the respective standard deviations (SDs) indicate that spatial non-uniformity has a minimal influence.

Table 4 Calibration accuracy for the H₁ and H₂ units

Table 5 Within-unit calibration accuracy for the H₃ unit

Table 6 shows the calibration accuracy obtained with the linear regression parameter specification method (i.e. step 2) for the H₁ and H₂ units. For the 160 mAs reference examples, the accuracy for 5 of the 11 acquisition techniques was outside of the ± 4 PG tolerance for the H₁ unit. Similarly, the calibration was beyond the tolerance for 6 of the 11 acquisition techniques for H₂. For the H₃ unit, the accuracy was beyond the tolerance for all 15 acquisition techniques and exceeded +7 PG for 9 of these techniques (data not shown to limit the presentation). The accuracy for non-reference mAs examples follows a similar trend. The accuracies in Table 6 should be compared with the respective findings in Table 4 (left side).

Table 6 Calibration accuracy for the H₁ and H₂ units using the regression parameters

The cubic-spline height interpolation findings for the H₁, H₂, and H₃ systems are shown in Table 7 for the non-BL evaluation (step 3). When comparing either within or across the H₁ and H₂ systems, the findings show that the non-BL height accuracy is within the ± 4 PG tolerance for all but one acquisition technique, indicating similarity across systems and the validity of the spline interpolation. The right portion of Table 7 shows the H₃ evaluation for the Mo/Mo and Mo/Rh examples. Although the calibration accuracies are marginally above the tolerance for both the BL and non-BL heights, the accuracies are similar to those shown in Table 5, again demonstrating the validity of the spline interpolation technique. The regression parameter interpolation findings for the non-BL evaluation are shown in Table 8 (step 4). The accuracies for the non-BL heights from H₁ are within the tolerance, whereas the majority of the H₂ accuracies are beyond the tolerance. Although the H₃ accuracy is in agreement with its related findings (Table 5), the BL accuracies are beyond the tolerance, and the non-BL calibration quantities deviate beyond the BL quantities. In summary, interpolation with the regression parameter method is inferior to the spline method when considering the H₁, H₂, and H₃ units in combination. We note that the H₃ findings for both BL and non-BL examples are consistently beyond the tolerance, in contrast with the H₁ and H₂ findings. At this time, we cannot account for this discrepancy.
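The height interpolation step can be sketched as follows. This is a minimal natural cubic spline written in plain Python, assuming a calibration quantity sampled at the BL heights t_k is to be evaluated at an intermediate (non-BL) height. The natural boundary condition, the function name, and the sample values are assumptions; the source does not state which spline end conditions were used.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    if n < 2 or len(ys) != len(xs):
        raise ValueError("need at least three matched samples")
    if any(xs[i + 1] <= xs[i] for i in range(n)):
        raise ValueError("xs must be strictly increasing")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] = c[n] = 0 is the natural end condition.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("height outside the sampled range")
        j = min(bisect_right(xs, x) - 1, n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical calibration quantity sampled at BL heights (cm).
heights_cm = [2.0, 4.0, 6.0, 8.0]
values = [-0.80, -1.75, -2.85, -3.80]
spline = natural_cubic_spline(heights_cm, values)
print(spline(5.0))  # interpolated value at a non-BL height
```

Unlike a single regression line, the spline passes through every sampled calibration point, so any mild curvature across height is retained at the BL heights and between them.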

Table 7 Calibration accuracy using the cubic-spline height interpolation

Table 8 Calibration accuracy using the regression parameters

In the final analysis, we assessed the potential influence of the system FOV for the H₁, H₂ and H₃ units (cubic-spline approach). Table 9 shows the findings when applying the calibration data acquired with the large FOV to 50/50 mixtures taken with the small FOV. For comparison, 50/50 mixtures acquired with the large FOV were also calibrated; both sets of images were acquired on the same day to minimize serial drift influences. Considering the large FOV findings as the standards, the respective small FOV calibration accuracy is well within ± 4 PG tolerance, demonstrating the FOV change has little influence.

Table 9 Calibration accuracy for images acquired with the small FOV calibrated with data acquired with the large FOV

Discussion

A calibration system for Hologic Selenia FFDM units was built upon our previous work [21, 22], which used a different FFDM technology. The findings demonstrate the generality of our approach. There are both important similarities and differences when comparing the inter-FFDM technology calibration requirements. The mAs normalization was similar across the two technologies and depends in part upon the linearity of the pixel value and exposure relationship and the validity of ignoring the intercept term (i.e. assuming the relationship is proportional in addition to linear). The findings suggest that, at a minimum, one reference mAs sample may be sufficient for generating calibration curves, in agreement with our previous findings. We showed that the calibration data could be acquired with the large FOV only, without impacting the calibration accuracy for images acquired with the small FOV. The ability to use a single reference mAs and FOV results in a substantial reduction in the data collection required to establish the BL calibration datasets. Although the calibration curves were well approximated as linear for the systems evaluated in this report, we required a cubic-spline height interpolation for the H₁, H₂, and H₃ units. This spline interpolation requirement is in contrast with our previous work, where the effective x-ray attenuation coefficients and logarithmic intercepts (i.e. regression parameters) were stored and then used for generating both the height interpolation and calibration points. Consistent with our findings from similar GE systems [22], each similarly-manufactured Hologic system (i.e. H₁ and H₂) requires its own BL calibration dataset to maintain acceptable calibration accuracy.
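The proportionality assumption behind the mAs normalization can be made concrete with a short sketch. It assumes the open detector response is pv = m · x with the intercept b ignored, so a value acquired at an arbitrary mAs x is rescaled to the 160 mAs reference x_r by the ratio x_r / x. The function names are hypothetical and the code is not the authors' implementation.

```python
import math

REFERENCE_MAS = 160.0  # x_r, the reference mAs used in this report


def normalize_to_reference(pv, mas, reference_mas=REFERENCE_MAS):
    """Rescale a pixel value acquired at `mas` to the reference mAs.

    Valid only if the pixel value is proportional to exposure (pv = m * x).
    """
    if mas <= 0:
        raise ValueError("mAs must be positive")
    return pv * reference_mas / mas


def normalize_log(log_pv, mas, reference_mas=REFERENCE_MAS):
    """Same normalization in the logarithmic domain: an additive shift."""
    if mas <= 0:
        raise ValueError("mAs must be positive")
    return log_pv + math.log(reference_mas / mas)


def intercept_error(b, mas, reference_mas=REFERENCE_MAS):
    """Error left by the normalization when the true response is m * x + b."""
    return b * (reference_mas / mas - 1.0)


print(normalize_to_reference(100.0, 80.0))
print(intercept_error(b=2.0, mas=80.0))
```

The last function shows why the intercept matters: if the true response is m · x + b, the rescaled value differs from the value that would be measured at the reference mAs by b · (x_r / x − 1), which vanishes only when b is negligible or x equals x_r.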

There are several limitations with this work. The data was collected over a period of approximately 35 days and the phantom heights were precise. In previous work [20], we showed that the GE unit exhibited serial drift with respect to the BL dataset and drift should be accounted for to maintain prospective calibration accuracy. Because the data in this report was collected over a relatively short time interval, serial drift influences are likely minimal. Similarly, the calibration accuracy was evaluated without height uncertainty. Therefore, the accuracies obtained in this report may be considered ideal.

Our original objective was to develop a continuous calibrated breast density measurement applicable across imaging platforms. Additionally, calibration may be useful for applications other than risk, such as estimating the BI-RADS breast composition descriptors [23]. The BI-RADS breast composition descriptors were developed for standardized reporting purposes and synchronized with situations where mammographic sensitivity may be lower due to composition. Calibrated tissue composition measurements may be useful both for breast cancer risk applications and for providing a quantitative sensitivity measure.

Conclusion

This initial evaluation, in combination with our previous calibration findings, indicates that the same calibration approach may apply to both indirect and direct x-ray conversion technologies. Because the BL dataset requires a considerable amount of phantom imaging, it is not cost-effective to acquire serial replications of the BL dataset on a regular basis for calibration purposes. Therefore, it is imperative to evaluate the forward serial applicability or stability of the BL datasets [20]. In addition, alternative methods of updating the BL dataset with a minimal amount of serial phantom imaging will be explored in future work. Previously, we adapted the Cumulative Sum approach to monitor the forward stability of the BL dataset [20]. However, the serial updating remains an open-ended problem. For this report, the compressed breast thickness was not a source of uncertainty. The calibration accuracies in this work were obtained under relatively ideal conditions by design. The compression paddle on the Hologic systems in this report is spring tensioned and therefore somewhat different from the technology we evaluated previously. During actual breast imaging, the compression paddle tilts and warps, and the system compressed breast thickness readout is often nominal [21]; these traits are common across the FFDM designs. Additional work is required to assess the influence of uncertainty in paddle height (relative to the breast support surface) using deformable phantoms and to generate a compressed breast thickness correction before applying calibration to actual mammograms. Although the calibration accuracies were within our preset tolerances for the most part, the viability of our technique with this particular FFDM technology will require evaluation with patient images to show that a calibrated measure of breast density is associated with breast cancer.

Abbreviations

a:: Index reserved for adipose breast tissue equivalent material
Ag:: Silver
b:: Intercept of the open detector exposure relationships
B:: Calibration application additive parameter
BL:: Baseline
BTE:: Breast tissue equivalent
E_l(r_s):: Mean exposure at given sub-region r_s and baseline phantom height in cm
FFDM:: Full field digital mammography
FOV:: Field of view
g:: Index reserved for fibroglandular breast tissue equivalent material
GE:: General Electric Senographe 2000D FFDM unit
Glandular:: Fibroglandular
H₁:: Hologic Selenia unit 1
H₂:: Hologic Selenia unit 2
H₃:: Hologic Selenia unit 3
k:: Index designating a sampled height
l:: Subscript index reserved for breast tissue equivalent material
LCC:: Left cranial caudal
LI_l:: Logarithmic intercept
LRE:: Natural logarithm of the relative exposure
LRE_l(r_s):: Natural logarithm of the relative exposure at r_s as a function of increasing baseline phantom height in cm
m:: Slope of the open detector exposure relationships
M:: Calibration application multiplier factor
mAs:: Milliampere × second
Mo:: Molybdenum
PG:: Percent glandular
PG_cal:: A calibrated quantity
pv_a:: Adipose pixel value in the LRE domain
pv_g:: Glandular pixel value in the LRE domain
pv_raw:: Raw image pixel value
r_s:: Sub-regions
R²:: Coefficient of determination
RE_l(r_s):: Relative mean exposure at a given sub-region r_s and baseline phantom height in cm
Rh:: Rhodium
ROI:: Region of interest
SE:: Standard error
SD:: Standard deviation
t_k:: Baseline phantom sample height in cm
μ_l:: Effective x-ray attenuation coefficient in cm^-1
W:: Tungsten
x:: Arbitrary mAs quantity
x_r:: The reference mAs (160 mAs)

References

1. Boyd NF, Martin LJ, Yaffe M, Minkin S: Mammographic density. Breast Cancer Res 2009, 11(Suppl 3):S4. 10.1186/bcr2423
2. McCormack VA, dos Santos Silva I: Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomarkers Prev 2006, 15(6):1159–1169. 10.1158/1055-9965.EPI-06-0034
3. Boyd NF, Martin LJ, Bronskill M, Yaffe MJ, Duric N, Minkin S: Breast tissue composition and susceptibility to breast cancer. J Natl Cancer Inst 2010, 102(16):1224–1237. 10.1093/jnci/djq239
4. Brower V: Breast density gains acceptance as breast cancer risk factor. J Natl Cancer Inst 2010, 102(6):374–375. 10.1093/jnci/djq080
5. Yaffe MJ: Mammographic density. Measurement of mammographic density. Breast Cancer Res 2008, 10(3):209. 10.1186/bcr2102
6. Highnam R, Brady M: Mammographic Image Analysis. Boston, MA: Kluwer Academic Publishers; 1999.
7. Kaufhold J, Thomas JA, Eberhard JW, Galbo CE, Trotter DE: A calibration approach to glandular tissue composition estimation in digital mammography. Med Phys 2002, 29(8):1867–1880. 10.1118/1.1493215
8. Malkov S, Wang J, Kerlikowske K, Cummings SR, Shepherd JA: Single x-ray absorptiometry method for the quantitative mammographic measure of fibroglandular tissue volume. Med Phys 2009, 36(12):5525–5536. 10.1118/1.3253972
9. Pawluczyk O, Augustine BJ, Yaffe MJ, Rico D, Yang J, Mawdsley GE, Boyd NF: A volumetric method for estimation of breast density on digitized screen-film mammograms. Med Phys 2003, 30(3):352–364. 10.1118/1.1539038
10. Boyd N, Martin L, Gunasekara A, Melnichouk O, Maudsley G, Peressotti C, Yaffe M, Minkin S: Mammographic density and breast cancer risk: evaluation of a novel method of measuring breast tissue volumes. Cancer Epidemiol Biomarkers Prev 2009, 18(6):1754–1762. 10.1158/1055-9965.EPI-09-0107
11. Ding J, Warren R, Warsi I, Day N, Thompson D, Brady M, Tromans C, Highnam R, Easton D: Evaluating the effectiveness of using standard mammogram form to predict breast cancer risk: case–control study. Cancer Epidemiol Biomarkers Prev 2008, 17(5):1074–1081. 10.1158/1055-9965.EPI-07-2634
12. Heine JJ, Cao K, Rollison DE: Calibrated measures for breast density estimation. Acad Radiol 2011, 18(5):547–555. 10.1016/j.acra.2010.12.007
13. Heine JJ, Cao K, Rollison DE, Tiffenberg G, Thomas JA: A quantitative description of the percentage of breast density measurement using full-field digital mammography. Acad Radiol 2011, 18(5):556–564. 10.1016/j.acra.2010.12.015
14. Heine JJ, Fowler EEE, Flowers CI: Full field digital mammography and breast density: comparison of calibrated and noncalibrated measurements. Acad Radiol 2011, 18(11):1430–1436. 10.1016/j.acra.2011.07.011
15. Shepherd JA, Kerlikowske K, Ma L, Duewer F, Fan B, Wang J, Malkov S, Vittinghoff E, Cummings SR: Volume of mammographic density and risk of breast cancer. Cancer Epidemiol Biomarkers Prev 2011, 20(7):1473–1482. 10.1158/1055-9965.EPI-10-1150
16. Mahesh M: AAPM/RSNA physics tutorial for residents: digital mammography: an overview. Radiographics 2004, 24(6):1747–1760. 10.1148/rg.246045102
17. Bick U, Diekmann F (Eds): Medical Radiology Diagnostic Imaging and Radiation Oncology. Berlin: Springer; 2010.
18. Aslund M, Cederstrom B, Lundqvist M, Danielsson M: Physical characterization of a scanning photon counting digital mammography system based on Si-strip detectors. Med Phys 2007, 34(6):1918–1925. 10.1118/1.2731032
19. Heine JJ, Behera M: Effective x-ray attenuation measurements with full field digital mammography. Med Phys 2006, 33(11):4350–4366. 10.1118/1.2356648
20. Heine JJ, Cao K, Beam C: Cumulative Sum quality control for calibrated breast density measurements. Med Phys 2009, 36(12):5380–5390. 10.1118/1.3250842
21. Heine JJ, Cao K, Thomas JA: Effective radiation attenuation calibration for breast density: compression thickness influences and correction. BioMed Eng OnLine 2010, 9:73. 10.1186/1475-925X-9-73
22. Heine JJ, Thomas JA: Effective x-ray attenuation coefficient measurements from two full field digital mammography systems for data calibration applications. BioMed Eng OnLine 2008, 7:13. 10.1186/1475-925X-7-13
23. D’Orsi CJ, Bassett LW, Berg WA, et al.: Breast Imaging Reporting and Data System: ACR BI-RADS. 4th edition. Reston, VA: American College of Radiology; 2003.

Acknowledgements

This work was supported by the Bankhead-Coley Cancer Research Program Grant #3BB04-51005, and the National Institutes of Health grants #R01CA166269 and #R01CA114491.

Author information

Authors and Affiliations

Division of Population Science, H. Lee Moffitt Cancer Center & Research Institute, 12902 Magnolia Drive, Tampa, FL, 33612, USA
Erin EE Fowler, Beibei Lu & John J Heine

Corresponding author

Correspondence to John J Heine.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

The database was constructed by EF under the supervision of JH. JH, EF and BL developed the manuscript content. EF, JH and BL performed the data analysis. All authors contributed to manuscript composition. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Fowler, E.E., Lu, B. & Heine, J.J. A comparison of calibration data from full field digital mammography units for breast density measurements. BioMed Eng OnLine 12, 114 (2013). https://doi.org/10.1186/1475-925X-12-114

Received: 30 August 2013
Accepted: 23 October 2013
Published: 09 November 2013
DOI: https://doi.org/10.1186/1475-925X-12-114