A comparison of calibration data from full field digital mammography units for breast density measurements

Background Breast density is a significant breast cancer risk factor measured from mammograms. The most appropriate method for measuring breast density for risk applications is still under investigation. Calibration standardizes mammograms to account for acquisition technique differences prior to making breast density measurements. We evaluated whether a calibration methodology developed for an indirect x-ray conversion full field digital mammography (FFDM) technology applies to direct x-ray conversion FFDM systems. Methods Breast tissue equivalent (BTE) phantom images were used to establish calibration datasets for three similar direct x-ray conversion FFDM systems. The calibration dataset for each unit is a function of the target/filter combination, x-ray tube voltage, current × time (mAs), phantom height, and two detector fields of view (FOVs). Methods were investigated to reduce the amount of calibration data by restricting the height, mAs, and FOV sampling. Calibration accuracy was evaluated with mixture phantoms. We also compared both intra- and inter-system calibration characteristics and accuracy. Results Calibration methods developed previously apply to direct x-ray conversion systems with modification. Calibration accuracy was largely within the acceptable range of ± 4 standardized units from the ideal value over the entire acquisition parameter space for the direct conversion units. Acceptable calibration accuracy was maintained with a cubic-spline height interpolation, representing a modification to previous work. Calibration data is unit specific, can be acquired with the large FOV, and requires a minimum of one reference mAs sample. The mAs sampling, calibration accuracy, and the necessity for machine specific calibration data are common characteristics and in agreement with our previous work. Conclusion The generality of our calibration approach was established under ideal conditions. 
Evaluation with patient data using breast cancer status as the endpoint is required to demonstrate that the approach produces a breast density measure associated with breast cancer.


Introduction
Mammographic breast density is a significant breast cancer risk factor [1][2][3]. Although used extensively in research, breast density is not generally used in the clinical environment for breast cancer risk applications [4] due in large part to the lack of an automated measurement. There are various methods under evaluation for estimating breast density from either raw or calibrated mammograms [5]. A large portion of breast density research was derived without calibration [1,2], as calibration is a more recent development for mammography.
Ideally, calibration adjusts for inter-patient x-ray image acquisition technique differences to produce some form of standardized data representation [6][7][8][9]. Calibration research is still in its early stage of development and there are few published reports evaluating its potential application relative to the volume of published breast density research using raw mammograms. The findings from calibration research have been mixed in identifying a measure that strengthens the associations with breast cancer in comparison with the operator-assisted percentage of breast density measure [10][11][12][13][14][15]. Due to its stage of development, it may be premature to conclude whether calibration is generally a useful technique for risk assessments. However, one benefit of establishing a calibration method is that it permits automated breast density measurements. We have posited that calibration may be an important step for automation.
Full field digital mammography (FFDM) detector technologies can be broadly categorized as either indirect or direct x-ray conversion systems [16]. Although these designs have many characteristics that vary, until recently both technologies produced an energy weighted integrated signal at the pixel level [17]. More recently, another type of direct x-ray conversion technology was approved for clinical use in the US that uses photon counting detection technology [18], which, in contrast to the established FFDM designs, does not produce an integrated weighted signal. Currently, it is not known if calibration will produce equivalent findings across these varying FFDM platforms.
We applied a calibration methodology developed previously for a General Electric Senographe 2000D FFDM system [19][20][21][22], which is an indirect x-ray conversion technology. Our findings based on images taken from this technology [12][13][14] suggest that calibrated breast density measurements are strong indicators of risk, providing justification to investigate the merits of calibration in more detail. As many characteristics vary between the direct and indirect x-ray conversion systems, the applicability of our calibration methodology has yet to be established for direct x-ray conversion FFDM systems.
In this report, we expand our understanding of calibration gained previously [21,22] and, as the primary analysis, establish a calibration system for a direct x-ray conversion FFDM design using phantom images acquired from three Hologic Selenia FFDM units. We considered several design objectives. One objective is to minimize the amount of calibration data collection while maintaining acceptable calibration accuracy, representing an important compromise. Although it would be ideal, it is nearly impossible to sample all acquisition technique combinations to construct the calibration curves. Therefore, some form of sampling scheme and interpolation methodology must be established to minimize effort while maintaining acceptable accuracy. It is reasonable to assume that if calibration requires excessive phantom imaging effort or is difficult to apply across imaging platforms without considerable modification, it may not be used beyond research. Another objective is to evaluate whether calibration data collected from one FFDM unit can be applied to another similarly manufactured unit, with or without modification, as inter-unit generalization for a given technology is an important step for universal application. As a secondary objective, we also compared calibration and detector response data obtained from the Hologic units investigated in this report with those previously acquired from the General Electric FFDM unit, when applicable, to assess inter-technology similarities.

Methods
We acquired calibration and exposure response data from three Hologic Selenia FFDM units to evaluate the generality of our approach. Calibration curves were generated by imaging standard breast tissue equivalent (BTE) phantoms (CIRS, Norfolk, VA) described previously [22]. Our BTE phantom set includes 100% fibroglandular (glandular) and 100% adipose BTE materials that are of 1 mm, 2 mm, 1 cm, and 2 cm thicknesses (i.e. precise heights) and 18 cm × 24 cm in area. These phantoms were combined (stacked) to produce desired composite proportions at a given total thickness (height). For example, combining a 2 cm thick glandular phantom with a 2 cm thick adipose phantom gives a 50% glandular composition with a total height of 4 cm. Calibration curves are functions of the compressed breast thickness above the breast support surface, referred to as height, and several other acquisition technique parameters, including target/filter combination, x-ray tube voltage (kV), current × time (mAs), and detector field of view (FOV), representing a five-dimensional parameter space. As in our previous work, we refer to the initial data collection as the baseline (BL) calibration dataset. A BL dataset was established for each unit.
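The stacking arithmetic in the example above can be sketched in a few lines of Python. This is a minimal illustration, assuming each slab is either 100% glandular or 100% adipose BTE material and that slab thicknesses add directly to give the total height; the function name `stack_composition` and the material codes `"g"` and `"a"` are hypothetical.

```python
def stack_composition(slabs):
    """Return (total_height_cm, percent_glandular) for a stack of BTE slabs.

    slabs: list of (material, thickness_cm) pairs, where material is
    "g" (100% glandular BTE) or "a" (100% adipose BTE). These codes are
    hypothetical labels used only for this sketch.
    """
    total = 0.0
    glandular = 0.0
    for material, thickness in slabs:
        if material not in ("g", "a"):
            raise ValueError(f"unknown BTE material code: {material!r}")
        if thickness <= 0:
            raise ValueError("slab thickness must be positive")
        total += thickness
        if material == "g":
            glandular += thickness
    if total == 0:
        raise ValueError("empty stack")
    return total, 100.0 * glandular / total


if __name__ == "__main__":
    # Example from the text: 2 cm glandular stacked with 2 cm adipose.
    print(stack_composition([("g", 2.0), ("a", 2.0)]))  # (4.0, 50.0)
    # A 6 cm stack with the same glandular/adipose ratio.
    print(stack_composition([("a", 2.0), ("a", 1.0), ("g", 2.0), ("g", 1.0)]))
```

Because only the thickness ratio matters, the same 50/50 composition can be built at several heights from the available 1 mm to 2 cm slabs.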
The three Selenia systems evaluated in this report are located within the breast clinics at the Moffitt Cancer Center and are used for both screening and diagnostic purposes. Two of these systems, referred to as H 1 and H 2 , have a tungsten (W) target with rhodium (Rh) and silver (Ag) filter options. The third unit has a molybdenum (Mo) target with Mo and Rh filter options and is referred to as H 3 . The Selenia detector has a 70 micron pitch (pixel spatial resolution), and the raw data used for this work have a 14 bit per pixel dynamic range. Two detector FOVs are used for screening mammograms on these units depending upon the choice of compression paddle: 24 cm × 29 cm (large) and 18 cm × 24 cm (small). The General Electric Senographe 2000D FFDM unit is referred to as GE in the report. This unit has a Mo target with Mo and Rh filter options, and a Rh target with a Rh filter. The GE detector has 100 micron spatial resolution, a 19.2 cm × 23 cm detector FOV (i.e. 1914 × 2294 pixels) and 14 bit dynamic range per pixel for the raw data used in this work. As a standard convention, we acquired all phantom images as left craniocaudal (LCC) views. In the LCC view, the left detector border (vertical direction) is parallel to the chest wall as observed in a displayed image.
The aims of this study were to assess the pixel value versus detector exposure (detector response) relationship without attenuation, generate and assess the calibration curves for linearity, and evaluate the calibration accuracy. To minimize the BL data collection, we evaluated the calibration accuracy under these conditions: (a) when applying interpolation for the height variable; (b) when applying a data reduction step to reduce mAs sampling; and (c) as a function of FOV. To evaluate the FOV impact, we acquired the calibration datasets with the large detector FOV only. The validity of collecting calibration data with the large FOV only was evaluated by examining calibration accuracy for images acquired with the small FOV. We made direct comparisons between H 1 and H 2 because of their target/filter and manufacturing similarity, and evaluated whether calibration data collected with one unit are valid when applied to another similarly manufactured unit. Likewise, we made direct comparisons between the H 3 and GE units for the Mo/Mo and Mo/Rh combinations, when applicable.
The analysis was restricted to specific regions depending on the FFDM design and specific analysis endpoint. For the H 1 , H 2 and H 3 units, unless stated otherwise, the analysis was constrained to a large region of interest (ROI) specific to the large FOV. This ROI is defined as 2000 × 2500 pixels (14 cm × 17.5 cm), centered in the vertical direction with a horizontal offset of 75 pixels (not included) from the outside of the detector (i.e. parallel to the chest wall) or left border (LCC view). This restriction avoids stacked-phantom edge effects near the detector outer edge and possible flat field non-uniformity interference at regions far (interior) from the central detector area. For the FOV analysis and for images taken with the GE unit, the analysis was constrained to a 1000 × 1250 pixel ROI with a 75 pixel offset (as above). The ROIs relative to the Hologic detector and the BTE phantom area are shown in Figure 1.

Figure 1 Breast tissue equivalent phantom positioning, detector field of view, and regions of interest. This shows the large field of view (largest rectangle) for the Hologic Selenia unit, the phantom (gray rectangle) placement on the detector (18 cm × 24 cm or approximately 2500 × 3400 pixels), and the regions of interest (ROIs) used for the analysis outlined with narrow light borders. The size of the large ROI is 2000 × 2500 pixels, and the size of the small ROI is 1000 × 1250 pixels.
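The ROI placement described above reduces to simple index arithmetic. The sketch below assumes that image rows run along the vertical (chest wall) direction, that the 2500-pixel ROI dimension is vertical and the 2000-pixel dimension horizontal, and that the 75 excluded pixels are counted from the left border of an LCC image. The function names and the example image height of 4096 rows are hypothetical and are not taken from the Selenia specification.

```python
PIXEL_PITCH_CM = 0.007  # 70 micron Selenia pixel pitch, as stated in the text


def roi_bounds(image_rows, roi_rows=2500, roi_cols=2000, col_offset=75):
    """Half-open (row_start, row_stop, col_start, col_stop) for an ROI that is
    centred vertically and begins col_offset pixels from the left border."""
    if roi_rows > image_rows:
        raise ValueError("ROI is taller than the image")
    row_start = (image_rows - roi_rows) // 2
    return row_start, row_start + roi_rows, col_offset, col_offset + roi_cols


def roi_size_cm(roi_rows, roi_cols, pitch_cm=PIXEL_PITCH_CM):
    """Physical (width, height) of an ROI in cm."""
    return roi_cols * pitch_cm, roi_rows * pitch_cm


if __name__ == "__main__":
    # Large ROI in a hypothetical image that is 4096 rows tall.
    print(roi_bounds(4096))  # (798, 3298, 75, 2075)
    # Small ROI used for the FOV analysis.
    print(roi_bounds(4096, roi_rows=1250, roi_cols=1000))
    # Physical size of the large ROI, approximately 14 cm x 17.5 cm.
    print(roi_size_cm(2500, 2000))
```

The pitch conversion reproduces the 14 cm × 17.5 cm size quoted for the large ROI.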

Exposure response evaluation
We assessed the detector exposure and pixel value (pv) response relationships for the H 1 , H 2 , and H 3 units for select kV settings for each target/filter combination, using the large FOV. The raw image pixel value (pv raw ) response was modeled as a linear function of mAs by acquiring images without attenuation (i.e. open exposures of the detector). The mAs variable was sampled up to the point of detector saturation. The sample sets for each kV setting were analyzed with regression analysis and fitted to this form: <pv raw > = m × x + b, where x is the system readout mAs quantity for each acquisition. The slope (m), intercept (b), coefficient of determination (R 2 ), and standard error (SE) of the slope were used for evaluation purposes. The brackets indicate the mean pv raw within the large ROI. We make the approximation that the system readout mAs value is a surrogate for (proportional to) the x-ray exposure at the detector, which is common practice. We made both intra-technology comparisons and comparisons with the GE exposure response, where applicable. Because H 1 and H 2 have the same target/filter combinations and H 3 and GE have common combinations, the respective pairwise comparisons were included in the analysis. When making pairwise inter-unit slope comparisons for a given kV, an important difference is defined as when the central value of m i falls outside of the tolerance range m j ± 2 × SE j , or vice versa, where the unit index 0 is reserved for the GE unit. Where appropriate, we compared the entire set of m j across units with a t-test. Demonstrating that the response is linear has important implications for the BL calibration data collection requirements. When the linear approximation holds, the mAs sampling may be reduced to one sample in the BL dataset.
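The regression and the ± 2 × SE tolerance check described above can be sketched with ordinary least squares in plain Python. This is an illustration only: the function names are hypothetical, the sample data are invented, and the reading of "or vice versa" (flag the pair if either slope falls outside the other's band) is an assumption.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = m*x + b.

    Returns (m, b, r2, se_m), where se_m is the standard error of the slope.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = my - m * mx
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    se_m = math.sqrt(ss_res / (n - 2) / sxx)
    return m, b, r2, se_m


def outside_tolerance(m_i, m_j, se_j, k=2.0):
    """True when m_i falls outside the band m_j +/- k*SE_j."""
    return abs(m_i - m_j) > k * se_j


def important_difference(m_i, se_i, m_j, se_j):
    """Assumed reading of 'or vice versa': flag the pair if either slope
    falls outside the other's +/- 2 SE band."""
    return outside_tolerance(m_i, m_j, se_j) or outside_tolerance(m_j, m_i, se_i)


if __name__ == "__main__":
    # Hypothetical open-field data: readout mAs vs mean raw pixel value.
    mas = [20, 40, 80, 120, 160, 200]
    pv = [252.0, 449.0, 851.0, 1248.0, 1652.0, 2049.0]
    m, b, r2, se = fit_line(mas, pv)
    print(f"m={m:.3f}  b={b:.1f}  R2={r2:.5f}  SE={se:.4f}")
```

An R 2 close to unity supports the linear model; the intercept b is what must also be negligible for the later mAs normalization to hold.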

Calibration dataset and characterization
The phantom imaging techniques and methods for constructing the BL calibration datasets (i.e. calibration curves) were described previously [21,22]. The same approach was applied in this report with some modification. Briefly, to construct the calibration curves for a given acquisition technique, two series of BTE phantoms were imaged to generate the respective glandular and adipose calibration curves for BL sampled heights defined as t k . Reference points derived from these curves are used in the calibration application (discussed below). The phantom heights (total stacked heights) for a given calibration curve range from 2-7 cm depending on the acquisition technique, and were taken at 1 cm increments for convenience. To estimate the kV range, we selected the automated exposure control (auto-kV mode) and adjusted the compression paddle over a range of heights for fixed target/filter combinations. We estimated that the W/Rh range is 26-30 kV and the W/Ag range is 27-32 kV for the H 1 and H 2 systems. The same procedure was followed for the Mo/Mo and Mo/Rh techniques for the H 3 system, giving 25-31 kV and 27-34 kV ranges, respectively. BL calibration datasets (H 1 , H 2 , H 3 and GE units) were acquired with the same reference mAs setting defined as: x r = 160 mAs. We selected a reference mAs value that does not cause detector saturation when imaging phantom configurations with smaller heights, in particular adipose phantoms, while providing sufficient signal when imaging phantoms with larger heights, in particular glandular phantoms, over the entire acquisition technique range considered, as discussed previously [22].
For both comparison and presentation purposes, we evaluated the calibration curves using linear regression methods without regard to calibration accuracy. We subdivided the large ROI (2000 × 2500 pixel region shown in Figure 1) into a grid of non-overlapping 25 × 25 pixel sub-regions defined as r s . This gives 80 × 100 = 8000 r s sub-regions (for the large FOV). As above, t k is the BL phantom height in cm, with the index k designating a sampled height. For a given phantom configuration (fixed height and BTE type), we average the pixel values (i.e. <pv raw >) within r s , giving the mean exposure, E l (r s ), at r s and t k . For this report, the index l is reserved for the BTE type designation: l = a for adipose; and l = g for glandular. We divide E l (r s ) by the reference mAs, giving the relative mean exposure, RE l (r s ) = E l (r s ) / x r (i.e. the reference x r = 160 mAs), at each sub-region. We evaluate the natural logarithm of the relative mean exposure, LRE l (r s ) = ln[RE l (r s )], as a function of increasing t k , giving a regional calibration curve; this defines the logarithm of relative exposure (LRE) domain, which holds at the pixel level as well. For inter-unit comparisons, we applied linear regression at each r s for each BTE type, resulting in distributions of the effective attenuation coefficients (μ l , the slope magnitudes), logarithmic intercepts (LI l ), and R 2 values, estimated by fitting the ordered pairs [t k , LRE l (r s )] to this model:

$$\mathrm{LRE}_l(r_s) = -\mu_l\, t_k + \mathrm{LI}_l \qquad (1)$$

When fitted to this form (t k+1 > t k ), the magnitude of the slope can be interpreted as the effective x-ray attenuation coefficient (i.e. μ g for glandular and μ a for adipose tissue, cited as positive quantities in the tables and expressions) measured in cm -1 for a given kV and target/filter combination. The LI l quantities are the respective intercepts, which are unitless. We summarized these regression parameter distributions with the mean and mean standard error (SE).
As above, we use the μ l ± 2 × SE l tolerance gauge for the inter-system pairwise comparisons. Where appropriate, we compared the entire set of effective x-ray attenuation coefficients across systems with a t-test for each BTE material. This sub-region analysis also gives a method for assessing the spatial uniformity of the calibration data.

Calibration procedure
When calibrating an arbitrary image, the operation takes place in the LRE domain. In contrast to the calibration curve normalization that uses the reference mAs, the LRE for an arbitrary image (i.e. a prospective calibration application) is formed by normalizing either pv raw or <pv raw > by the acquisition system readout mAs, defined as x, before applying the natural logarithm: LRE = ln(pv raw /x). This normalization is valid when the exposure response is linear with a negligible intercept (i.e. proportional). Similarly, when the response is linear, two calibration points are required to calibrate an arbitrary image. These calibration points are derived from the BL curves and correspond to the theoretical pixel values in the LRE domain that would result when imaging materials that are (a) 100% glandular tissue = pv g , and (b) 100% adipose tissue = pv a , for a specific acquisition technique and height. For consistency with our past convention, we refer to the calibration domain as the percent glandular (PG) representation, with values theoretically ranging from 0-100 PG units. This representation is analogous to a normalized x-ray attenuation coefficient representation, which is easily converted to a total volume or average volumetric glandular metric by incorporating the compressed breast thickness (height) into the analysis [21]. The calibration mapping takes this form: PG cal = M × LRE + B, where M and B are specific to a given kV, target/filter combination and height above the breast support surface; capitals are used to distinguish these parameters from the open detector exposure relationships. The LRE can be determined at the pixel level or sub-region level by using either the respective pixel value with the corresponding height or the sub-region mean pixel value with the corresponding mean height above the support surface.
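The following sketch illustrates why the LRE = ln(pv raw /x) normalization removes the mAs dependence when the detector response is proportional, and why a non-zero intercept breaks that invariance. The response constants and function names are hypothetical; the 120, 160, and 200 mAs settings mirror those used in the accuracy evaluation.

```python
import math


def lre(pv_raw, mas):
    """LRE = ln(pv_raw / x), where x is the system readout mAs."""
    if pv_raw <= 0 or mas <= 0:
        raise ValueError("pixel value and mAs must be positive")
    return math.log(pv_raw / mas)


def pg_cal(lre_value, M, B):
    """Calibration mapping PG_cal = M * LRE + B."""
    return M * lre_value + B


if __name__ == "__main__":
    k = 12.5  # hypothetical proportional response: pv = k * mAs
    for mas in (120, 160, 200):
        print(mas, lre(k * mas, mas))  # same LRE at every mAs
    b = 300.0  # hypothetical non-zero intercept: pv = k * mAs + b
    for mas in (120, 160, 200):
        print(mas, lre(k * mas + b, mas))  # LRE now drifts with mAs
```

This is the practical reason a single reference mAs sample can suffice in the BL dataset: once pv raw scales with mAs, every acquisition maps onto the same LRE scale.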
For efficient prospective calibration applications, the BL calibration data must be stored. Therefore, we investigated two storage methods. The stored BL calibration data are then used in the specification of M and B. Both M and B are determined (for a fixed kV and target/filter combination) by considering the endpoints for a specific height t = t 0 . In the LRE domain, we set PG cal = 100 when LRE = pv g and PG cal = 0 when LRE = pv a , and solve for M and B:

$$M = \frac{100}{pv_g - pv_a}, \qquad B = 50 - \tfrac{1}{2}\, M \,(pv_g + pv_a),$$

giving one method for specifying M and B. In this specification approach, when t 0 does not correspond exactly with a specific sample height from the BL, a cubic-spline interpolation was used to determine pv g and pv a at t 0 . The second method for specifying M and B expresses pv a and pv g as functions of the regression parameters (μ g , μ a , LI g and LI a ) and t 0 using Equation (1) by substituting t k with t 0 : for example, $pv_g \approx -\mu_g t_0 + \mathrm{LI}_g$. In this case, the M and B specification and height interpolation are performed simultaneously; the validity of this approach relies on the agreement with Equation (1) and was the method developed previously for the GE unit [21,22]. With either specification method, the B relationship can be expressed in a simpler form to include only the pv a term or the pv g term, or the regression parameters from one of the calibration curves. We have included both measured terms (or all four regression parameters) to reduce variation in the event the curves or parameters carry dissimilar accuracy. We note that the 0-100 PG unit calibration range is a convention of the method; it is not unique but follows intuition.
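Both specification methods can be sketched in plain Python. The text does not state the spline's end conditions, so a natural cubic spline (zero second derivative at the first and last BL heights) is used here as an assumption; all function names and the BL values in the usage block are hypothetical.

```python
def mb_from_endpoints(pv_g, pv_a):
    """Solve PG_cal = 100 at LRE = pv_g and PG_cal = 0 at LRE = pv_a."""
    if pv_g == pv_a:
        raise ValueError("calibration points must differ")
    M = 100.0 / (pv_g - pv_a)
    B = 50.0 - 0.5 * M * (pv_g + pv_a)
    return M, B


def pv_from_regression(mu, li, t0):
    """Regression-parameter method: pv ~= -mu * t0 + LI."""
    return -mu * t0 + li


def natural_cubic_spline(xs, ys):
    """Interpolant through (xs, ys) with zero second derivative at both ends.

    xs must be strictly increasing (e.g. BL heights t_k in cm).
    """
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two matching points")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(hi <= 0 for hi in h):
        raise ValueError("xs must be strictly increasing")
    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    u = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        denom = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * u[i - 1]
        u[i] = h[i] / denom
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / denom
    # Back substitution; c[n] = 0 is the natural end condition.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - u[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the interval containing x; the end intervals extrapolate.
        i = n - 1
        for k in range(n):
            if x <= xs[k + 1]:
                i = k
                break
        dx = x - xs[i]
        return ys[i] + b[i] * dx + c[i] * dx ** 2 + d[i] * dx ** 3

    return evaluate


if __name__ == "__main__":
    # Hypothetical BL calibration points (LRE domain) at heights 2-7 cm.
    heights = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    pv_g_bl = [2.10, 1.32, 0.55, -0.20, -0.94, -1.67]
    pv_a_bl = [2.60, 2.06, 1.53, 1.01, 0.50, 0.00]
    t0 = 4.2  # a non-BL height
    pv_g = natural_cubic_spline(heights, pv_g_bl)(t0)
    pv_a = natural_cubic_spline(heights, pv_a_bl)(t0)
    print(mb_from_endpoints(pv_g, pv_a))
```

Only the stored BL points are needed for the spline route, whereas the regression route stores four parameters per technique and relies on Equation (1) holding across the full height range.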
When applying the calibration, the large ROI within a given image is divided into 25 × 25 pixel sub-regions as above and the average of each sub-region is used in the calibration equation giving PG cal = M × <LRE(r s, <t 0 >)> + B, where < t 0 > is the mean height above the breast support surface about r s , resulting in a spatial distribution of calibrated values. The methods described in the Calibration dataset and characterization Section indicate the calibration curves, in the most general terms, are functions of position. For this report, we used the mean values of the calibration BL data taken over all r s in the specification of M and B (both methods), removing the spatial dependency.
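The sub-region application described above can be sketched as follows. This is a hypothetical illustration: `points_at_height` stands in for either specification method and is assumed to return calibration points already averaged over all r s ; the grid values, parameters, and function names are invented for the example.

```python
import math
import statistics


def spatial_mean(per_region_values):
    """Average a BL calibration quantity over all sub-regions r_s,
    removing its spatial dependency."""
    return statistics.fmean(per_region_values)


def calibrate_grid(mean_pv_grid, height_grid, mas, points_at_height):
    """Return PG_cal = M * <LRE(r_s, <t0>)> + B for every sub-region.

    mean_pv_grid: sub-region mean raw pixel values <pv_raw>
    height_grid: mean height <t0> (cm) above the support about each sub-region
    mas: system readout mAs of the image being calibrated
    points_at_height: callable t0 -> (pv_g, pv_a) in the LRE domain
    """
    pg = []
    for pv_row, t_row in zip(mean_pv_grid, height_grid):
        row = []
        for pv, t0 in zip(pv_row, t_row):
            pv_g, pv_a = points_at_height(t0)
            M = 100.0 / (pv_g - pv_a)
            B = 50.0 - 0.5 * M * (pv_g + pv_a)
            row.append(M * math.log(pv / mas) + B)
        pg.append(row)
    return pg


if __name__ == "__main__":
    # Hypothetical regression parameters, each averaged over r_s beforehand.
    mu_g = spatial_mean([0.80, 0.82, 0.81])
    li_g = spatial_mean([4.00, 4.02, 3.98])
    mu_a = spatial_mean([0.50, 0.51, 0.49])
    li_a = spatial_mean([3.50, 3.49, 3.51])

    def points(t0):
        return -mu_g * t0 + li_g, -mu_a * t0 + li_a

    # A 1 x 2 grid of sub-regions imaged at 160 mAs, both 4 cm thick.
    mas = 160.0
    g4, a4 = points(4.0)
    mid = mas * math.exp(0.5 * (g4 + a4))  # LRE halfway between the endpoints
    print(calibrate_grid([[mid, mas * math.exp(a4)]], [[4.0, 4.0]], mas, points))
```

Because M and B are recomputed from each sub-region's mean height, thickness variation across the image is handled locally even though the BL data themselves have been spatially averaged.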

Calibration accuracy evaluation
To evaluate the intra-machine calibration accuracy near the BL acquisition date (for the H 1 , H 2 , and H 3 units), we imaged 4 cm composite phantoms comprised of a 2 cm adipose phantom stacked upon a 2 cm glandular phantom for the majority of kV settings and target/filter combinations. For a few of the larger kV acquisitions we used the same adipose and glandular ratio to construct 6 cm phantoms to avoid detector saturation. We refer to these composite phantoms as 50/50 mixtures. We also acquired 50/50 mixture images with three mAs settings to evaluate the impact of reference mAs normalization on the calibration accuracy: 120 mAs, 160 mAs (the reference) and 200 mAs (i.e. two additional samples for comparison purposes).
For the accuracy evaluation, we used the two methods outlined above for specifying M and B to select the optimal technique and make comparisons with our previous work. This evaluation was performed in four related steps. In step 1, we used the pv a and pv g determined with the BL dataset to calibrate 50/50 mixtures acquired with heights included in the BL; this should provide the best accuracy because no interpolation is required. In step 2, we calibrated the same mixtures used in step 1 with the regression parameter specification method; this does not permit a fair comparison with the first step because it includes interpolation but is required for the comparisons in the next two steps. To fully evaluate both interpolation methods, we also included additional 50/50 mixture acquisitions using the reference mAs (x r = 160 mAs) with heights set at 4.2 cm, 4.4 cm and 6.4 cm, which were not included in the BL datasets (i.e. non-BL mixtures). In step 3, we used pv a and pv g derived from spline interpolation in the calibration of the non-BL mixtures, and in step 4 we used the regression parameters to calibrate the same non-BL mixtures. The comparison of step 1 with step 3 and comparison of step 2 with step 4 provides an intra-specification method evaluation by considering BL and non-BL height samples. The comparison of step 1 and 3 with step 2 and 4 provides a means for selecting the optimal interpolation method. From previous experience, we used an empirically derived tolerance of approximately ± 4 PG unit deviation from the ideal PG cal = 50 for comparing calibration accuracy. For these comparisons, we acquired additional 50/50 mixtures using both BL heights (4 and 6 cm phantom heights) and non-BL heights. To minimize serial drift influences within the BL and non-BL comparison, we acquired both phantom series on the same day. We performed two additional experiments to assess the calibration generality and accuracy. 
First, to evaluate whether calibration data acquired from one FFDM unit are applicable to another similar unit, we switched the BL calibration data and used BL 1 (i.e. from H 1 ) to calibrate 50/50 mixtures (with 160 mAs) acquired from H 2 and vice versa, referred to as the cross-unit calibration analysis (findings discussed with those resulting from step 1). Second, to evaluate FOV influences, we acquired 50/50 mixtures using the small FOV and performed calibration with the BL calibration data acquired with the large FOV for the H 1 , H 2 , and H 3 units. To perform the small FOV analysis, a reduced ROI of 1000 × 1250 pixels was used, as outlined in Figure 1.
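The acceptance rule used throughout the accuracy evaluation can be sketched as below. Assumptions: the ± 4 PG boundary is treated as inclusive, the SD is taken over the calibrated sub-region values of one image, and the function names and example values are hypothetical.

```python
import statistics

IDEAL_PG = 50.0      # ideal PG_cal for a 50/50 mixture
TOLERANCE_PG = 4.0   # empirically derived tolerance from the text


def accuracy_summary(pg_values, ideal=IDEAL_PG, tol=TOLERANCE_PG):
    """Mean, SD, and deviation from the ideal of calibrated sub-region values.
    'within' treats the +/- tol boundary as inclusive (an assumption)."""
    mean = statistics.fmean(pg_values)
    sd = statistics.stdev(pg_values) if len(pg_values) > 1 else 0.0
    deviation = mean - ideal
    return {"mean": mean, "sd": sd, "deviation": deviation,
            "within": abs(deviation) <= tol}


def count_outside(mean_pg_by_technique, ideal=IDEAL_PG, tol=TOLERANCE_PG):
    """Number of acquisition techniques whose mean PG_cal misses the tolerance."""
    return sum(abs(v - ideal) > tol for v in mean_pg_by_technique.values())


if __name__ == "__main__":
    # Hypothetical sub-region PG_cal values for one 50/50 mixture image.
    print(accuracy_summary([49.1, 50.8, 51.2, 48.7, 50.3]))
    # Hypothetical per-technique means, keyed by target/filter and kV.
    print(count_outside({"W/Rh 26 kV": 50.9, "W/Rh 30 kV": 55.2, "W/Ag 32 kV": 45.5}))
```

The same two helpers apply unchanged to each of the four steps, to the cross-unit swap, and to the small FOV images, since every case reduces to comparing calibrated 50/50 mixtures against PG cal = 50.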

Exposure response
The open detector exposure relationships (pv and exposure response) for all systems are summarized in Table 1. Example plots are shown in Figure 2 for the similar H 1 and H 2 units. Plots for the H 3 and GE units for common filter/target combinations are shown in Figure 3. The plots in both figures are representative of the linear response relationship for the four units. The R 2 estimates (Table 1) are close to unity for all of the acquisition techniques considered, indicating the relationships are well approximated as linear for all units. Despite their design similarities, the response varies beyond our tolerance (i.e. m j ± 2 × SE j ) between the H 1 and H 2 units within kV settings. Although beyond the tolerance, the percent difference between m 1 and m 2 is within 3.3%-5.5%, whereas the intercepts show much larger variation. Comparing the set of m 1 estimates with the set of m 2 estimates (t-test) gave P > 0.96, indicating the exposure response does not differ significantly across similar systems. The pairwise responses also vary beyond the tolerance across the H 3 and GE systems as expected for all observations. Although the exposure response quantities vary across all systems, the response linearity is a common characteristic across all units (H 1 , H 2 , H 3 , and GE). This common trait suggests the mAs sampling can be reduced to one sample for a given target/filter combination and kV setting (as evaluated below).

Calibration datasets
The effective attenuation coefficients (μ l ) and logarithmic intercepts (LI l ) for the H 1 and H 2 units are shown in Table 2, separated by BTE type and FFDM unit. We have provided the absolute value of the slope from the regression analysis, which is cited as μ l , and the corresponding SE l . Example calibration curve plots for these units fitted with regression analysis are shown in Figure 4. The R 2 findings indicate the linear model fits well. The agreement between the paired μ l values, gauged against their SEs, varies. For example, the μ a pairwise comparison for the W/Rh combinations indicates close agreement for 26-29 kV as gauged by the preset tolerance (μ l ± 2 × SE l ), with little variation at 26 kV and a maximum 2.3% variation at 30 kV, which is beyond the tolerance. The corresponding μ g pairs show greater variation for the W/Rh combinations but are within the tolerance. The W/Ag glandular and adipose coefficients follow a similar trend and are within the similarity tolerance. Comparing the set of μ a estimates for H 1 with the corresponding set from H 2 (t-test) gave P > 0.70. Similarly, comparing the μ g set between H 1 and H 2 gave P > 0.45. These comparisons indicate the set of effective x-ray attenuation coefficients for a given BTE material does not differ significantly across similar systems. Because the H 3 and GE units share target/filter combinations, their regression parameters are provided in Table 3 for the Mo/Mo and Mo/Rh combinations, and example calibration curve plots fitted with regression analysis are shown in Figure 5. The R 2 quantities indicate linearity is a common trait across these two different units. The pairwise attenuation coefficients and LI l quantities agree in magnitude for these units but are neither interchangeable nor within the tolerance range when comparing the H 3 and GE units.
Table 2 The effective x-ray attenuation coefficients, μ a and μ g , logarithmic intercepts, LI a and LI g , and coefficients of determination (R 2 ), derived with regression analysis for the similar Hologic (H 1 and H 2 ) units for the W/Rh and W/Ag target/filter combinations, are provided as mean values. The mean μ l and the associated mean standard error (SE l ) are cited. The x-ray tube voltage (kV) and phantom height ranges are provided. For each kV setting, the images were acquired by incrementing the phantom heights from the lower range to the upper range in 1 cm increments. The absolute value of the slope is equivalent to the effective x-ray attenuation coefficient for each acquisition technique and BTE material.
As above, comparing the μ a set for H 3 with the corresponding set for GE (t-test) gave P > 0.14, indicating the set of adipose x-ray attenuation coefficients is similar across systems that use different detector technologies. In contrast, the corresponding μ g set comparison gave P < 0.0001, suggesting the attenuation coefficients for the glandular BTE material differ across these systems.

Calibration accuracy
For the BL calibration accuracy evaluation, the spline specification method findings (step 1) are presented in this section because M and B are specified by the calibration points at t k , which are special cases. For the most part, as shown in Table 4, the within-unit accuracy for the H 1 and H 2 units is within ± 4 PG units of the ideal value (i.e. PG cal = 50). However, there is greater variation for W/Ag acquisitions at the larger kV settings. This may be because the H 1 calibration data for these samples were acquired on a different date than the rest of the respective BL dataset. The within-unit W/Ag accuracy for the most part is similar to the intra-system accuracy, whereas the accuracy for the W/Rh shows greater variation from the ideal value. The accuracy for the examples taken with non-reference mAs settings is similar to that obtained with the 160 mAs reference, showing the validity of the LRE normalization. The cross-unit calibration findings, provided in the right side of Table 4 for the H 1 and H 2 units, show deviations beyond our tolerance gauge of ± 4 PG. These findings suggest that the calibration data are, in general, specific to the unit, even though the units are nominally identical.
In addition to the x-ray attenuation coefficient differences, another source of variation stems from the LIs, which may vary due to the inter-system exposure response differences (Table 1). The accuracy evaluation for H 3 is shown in Table 5 using the same […] H 2 units, are similar to those obtained with GE previously [22]. The accuracies shown in Tables 4 and 5 with the respective standard deviations (SDs) indicate that spatial non-uniformity has a minimal influence. Table 6 shows the calibration accuracy obtained with the linear regression parameter specification method (i.e. step 2) for the H 1 and H 2 units. For the 160 mAs reference examples, the accuracy for 5 of the 11 acquisition techniques was outside of the ± 4 PG tolerance for the H 1 unit. Similarly, the calibration was beyond the tolerance for 6 of the 11 acquisition techniques for H 2 . For the H 3 unit, the accuracy was beyond the tolerance for all 15 acquisition techniques and exceeded +7 PG for 9 of these techniques (data not shown to limit the presentation). The accuracy for non-reference mAs examples follows a similar trend. The accuracies in Table 6 should be compared with the respective findings in Table 4 (left side).

Table 3 The effective x-ray attenuation coefficients, μ a and μ g , logarithmic intercepts, LI a and LI g , and coefficients of determination (R 2 ), derived with regression analysis for the Hologic (H 3 ) and the General Electric Senographe 2000D (GE) FFDM systems for the Mo/Mo and Mo/Rh target/filter combinations, are provided as mean values. The mean μ l and the associated mean standard error (SE l ) are cited. The x-ray tube voltage (kV) and phantom height (thickness) ranges are provided. For each kV sample, images were acquired by incrementing the phantom heights from the lower range limit to the upper range limit in 1 cm increments. The absolute value of the slope, as cited, is equivalent to the effective attenuation coefficient for a given BTE material.
The cubic-spline height interpolation findings for the H1, H2, and H3 systems are shown in Table 7 for the non-BL evaluation (step 3). Whether compared within or across the H1 and H2 systems, the non-BL height accuracy is within the ±4 PG tolerance for all but one acquisition technique, indicating similarity across systems and supporting the validity of the spline interpolation. The right portion of Table 7 shows the H3 evaluation for the Mo/Mo and Mo/Rh examples. Although the calibration accuracies are marginally above the tolerance for both the BL and non-BL heights, they are similar to those shown in Table 5, again supporting the validity of the spline interpolation technique. The regression parameter interpolation findings for the non-BL evaluation are shown in Table 8 (step 4). The non-BL accuracies for H1 are within the tolerance, whereas the majority of the H2 accuracies are beyond it. Although the H3 accuracy is in agreement with its related findings (Table 5), the BL accuracies are beyond the tolerance, and the non-BL calibration quantities deviate further from the ideal value than the BL quantities. In summary, interpolation with the regression […]

In the final analysis, we assessed the potential influence of the system FOV for the H1, H2 and H3 units (cubic-spline approach). Table 9 shows the findings when applying the calibration data acquired with the large FOV to 50/50 mixtures acquired with the small FOV. For comparison, 50/50 mixtures acquired with the large FOV were also calibrated; both sets of images were acquired on the same day to minimize serial drift influences. Taking the large-FOV findings as the standard, the small-FOV calibration accuracy is well within the ±4 PG tolerance, demonstrating that the FOV change has little influence.
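The cubic-spline height interpolation (step 3) estimates calibration quantities at non-BL heights from those measured at BL heights. The section does not state the spline's boundary conditions, so the sketch below assumes a natural cubic spline (zero second derivative at the end knots), written out directly so that it runs with the standard library only. The interpolated quantity (for example, a log pixel value of one BTE material per height) and all names are hypothetical.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).
    xs must be strictly increasing with at least two knots."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system; natural ends force c[0] = c[n] = 0.
    l = [1.0] + [0.0] * n
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-interval polynomial coefficients.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the interval containing x; clamping makes the end pieces extrapolate.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Usage: hypothetical BL heights (cm) and a calibration quantity at each height.
bl_heights = [2.0, 3.0, 4.0, 5.0, 6.0]
quantity = [7.1, 6.65, 6.2, 5.78, 5.35]
spline = natural_cubic_spline(bl_heights, quantity)
estimate_at_non_bl = spline(4.5)
```

A spline passes exactly through every BL calibration point, whereas the regression parameter approach (step 4) forces the height dependence onto a single straight line in log space; the Table 8 findings suggest that constraint is too restrictive for these units.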

Discussion
A calibration system for Hologic Selenia FFDM units was built upon our previous work [21,22] with a different FFDM technology. The findings demonstrate the generality of our approach. There are both important similarities and differences when comparing the calibration requirements of the two FFDM technologies. The mAs normalization was similar across the two technologies and depends in part upon the linearity of the relationship between pixel value and exposure and on the validity of ignoring the intercept term (i.e. assuming the relationship is proportional in addition to linear). The findings suggest that, at a minimum, one reference mAs sample may be sufficient for generating calibration curves, in agreement with our previous findings. We showed that the calibration data could be acquired with the large FOV only, without affecting the calibration accuracy for images acquired with the small FOV. The ability to use a single reference mAs and FOV substantially reduces the data collection required to establish the BL calibration datasets. Although the calibration curves were well approximated as linear for the systems evaluated in this report, we required a cubic-spline height interpolation for the H1, H2, and H3 units. This spline interpolation requirement contrasts with our previous work, where the effective x-ray attenuation coefficients and logarithmic intercepts (i.e. regression parameters) were stored and then used to generate both the height interpolation and the calibration points. Consistent with our findings from similar GE systems [22], each similarly manufactured Hologic system (i.e. H1 and H2) requires its own BL calibration dataset to maintain acceptable calibration accuracy.
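The mAs normalization discussed above relies on the pixel value being proportional to exposure, so that a value acquired at any mAs can be rescaled to the reference mAs (160 mAs in this report). The sketch below illustrates that assumption with a simple proportional rescaling, plus a least-squares check of how large the ignored intercept term actually is. It is a generic illustration, not the authors' LRE normalization; the function names, the relative-intercept criterion, and the data are hypothetical.

```python
def normalize_to_reference(pixel_value, mas, mas_ref=160.0):
    """Rescale a pixel value to the reference mAs, assuming PV is proportional to mAs."""
    return pixel_value * (mas_ref / mas)


def linear_fit(mas_values, pixel_values):
    """OLS fit of PV = slope * mAs + intercept. Returns (slope, intercept)."""
    n = len(mas_values)
    mx = sum(mas_values) / n
    my = sum(pixel_values) / n
    sxx = sum((x - mx) ** 2 for x in mas_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(mas_values, pixel_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_intercept(mas_values, pixel_values):
    """|intercept| as a fraction of the mean pixel value. A small value supports
    treating the PV-exposure relationship as proportional, not merely linear."""
    _, intercept = linear_fit(mas_values, pixel_values)
    mean_pv = sum(pixel_values) / len(pixel_values)
    return abs(intercept) / mean_pv


# Usage with synthetic mean pixel values from an mAs series at fixed kV and height.
mas_series = [40.0, 80.0, 160.0, 320.0]
pv_series = [2.5 * m for m in mas_series]
ratio = relative_intercept(mas_series, pv_series)
pv_at_reference = normalize_to_reference(pv_series[0], mas_series[0])
```

If the intercept were not negligible, the proportional rescaling would bias values acquired far from the reference mAs, which is why a single reference mAs sample is only sufficient when this assumption holds.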
There are several limitations to this work. The data were collected over a period of approximately 35 days, and the phantom heights were known precisely. In previous work [20], we showed that the GE unit exhibited serial drift with respect to the BL dataset and that drift should be accounted for to maintain prospective calibration accuracy. Because the data in this report were collected over a relatively short time interval, serial drift influences are likely minimal. Similarly, the calibration accuracy was evaluated without height uncertainty. Therefore, the accuracies obtained in this report may be considered ideal.
Our original objective was to develop a continuous calibrated breast density measurement applicable across imaging platforms. Calibration may also be useful for applications other than risk prediction, such as estimating the BI-RADS breast composition descriptors [23]. The BI-RADS breast composition descriptors were developed for standardized reporting and are aligned with situations where mammographic sensitivity may be lower because of breast composition. Calibrated tissue composition measurements may therefore be useful both for breast cancer risk applications and for providing a quantitative measure of mammographic sensitivity.

Conclusion
This initial evaluation, in combination with our previous calibration findings, indicates that the same calibration approach may apply to both indirect and direct x-ray conversion technologies. Because the BL dataset requires a considerable amount of phantom imaging, it is not cost-effective to acquire serial replications of the BL dataset on a regular basis for calibration purposes. Therefore, it is imperative to evaluate the forward serial applicability, or stability, of the BL datasets [20]. In addition, alternative methods of updating the BL dataset with a minimal amount of serial phantom imaging will be explored in future work. Previously, we adapted the Cumulative Sum (CUSUM) approach to monitor the forward stability of the BL dataset [20]. However, serial updating remains an open problem. For this report, the compressed breast thickness was not a source of uncertainty; the calibration accuracies in this work were obtained under relatively ideal conditions by design. The compression paddle on the Hologic systems in this report is spring tensioned and therefore somewhat different from the technology we evaluated previously. During actual breast imaging, the compression paddle tilts and warps, and the system's compressed breast thickness readout is often nominal [21]; these traits are common across FFDM designs. Additional work is required to assess the influence of uncertainty in paddle height (relative to the breast support surface) using deformable phantoms, and to generate a compressed breast thickness correction before applying calibration to actual mammograms. Although the calibration accuracies were for the most part within our preset tolerances, the viability of our technique with this particular FFDM technology will require evaluation with patient images to show that a calibrated measure of breast density is associated with breast cancer.
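The Cumulative Sum approach mentioned above monitors whether serial phantom checks drift away from a target. A minimal sketch of a standard two-sided tabular CUSUM is below, applied (as an assumption) to the calibrated value of a 50/50 mixture phantom with a target of 50 PG. The allowance and threshold values, and the choice of monitored quantity, are hypothetical; the adaptation described in [20] may differ.

```python
def tabular_cusum(values, target=50.0, allowance=1.0, threshold=5.0):
    """Two-sided tabular CUSUM over serial measurements.

    allowance: slack subtracted from each deviation before accumulating.
    threshold: decision limit on either cumulative sum.
    Returns the index of the first out-of-control measurement, or None.
    """
    s_high = 0.0  # accumulates upward drift
    s_low = 0.0   # accumulates downward drift
    for i, x in enumerate(values):
        deviation = x - target
        s_high = max(0.0, s_high + deviation - allowance)
        s_low = max(0.0, s_low - deviation - allowance)
        if s_high > threshold or s_low > threshold:
            return i
    return None


# Usage: hypothetical serial calibrated PG values for a 50/50 phantom.
serial_checks = [50.4, 49.7, 50.9, 52.8, 53.1, 53.4]
first_alarm = tabular_cusum(serial_checks)
```

Accumulating small deviations lets the chart flag a sustained drift of a few PG well before any single check leaves the ±4 PG tolerance, which is the property that makes CUSUM attractive when full BL replications are too costly to repeat.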