Mitotic counts in breast cancer should be standardized with a uniform sample area
Michael Bonert^{1} and
Angela J. Tate^{2}
DOI: 10.1186/s12938-016-0301-z
© The Author(s) 2017
Received: 1 August 2016
Accepted: 18 December 2016
Published: 16 February 2017
Abstract
Background
Mitotic rate is routinely assessed in breast cancer cases and is based on a count over 10 high power fields (HPF), a non-standard sample area, as per the College of American Pathologists cancer checklist. The effect of sample area variation has not been assessed.
Methods
A computer model making use of the binomial distribution was developed to calculate the misclassification rate in 1,000,000 simulated breast specimens for the extremes of field diameter (FD), using the mitotic density cutoffs (3 and 8 mitoses/mm^{2}), and for a sample area of 5 mm^{2}. Mitotic counting was treated as a random sampling problem using a mitotic rate distribution derived from an experimental study (range 0–16.4 mitoses/mm^{2}). The cellular density was 2500 cells/mm^{2}.
Results
For the smallest microscopes (FD = 0.40 mm, area 1.26 mm^{2}), 16% of cases were misclassified, compared to 9% for the largest (FD = 0.69 mm, area 3.74 mm^{2}) and 8% for 5 mm^{2}. For the smaller FD, an additional 27% of score 2 cases were misclassified as score 1 or 3.
Conclusion
Mitotic scores based on ten HPFs of a small field area microscope are less reliable measures of the mitotic density than those from a larger field area microscope; therefore, the sample area should be standardized. When mitotic counts are close to the cutoffs, the score is less reproducible; these cases could benefit from larger sample areas. A measure of mitotic density variation due to sampling may assist in the interpretation of the mitotic score.
Keywords
HPF, Mitotic counting, Breast cancer, Sampling, Cancer grading, Reproducibility, Simulation
Background
Tumour growth rate is a prognostic marker and can be evaluated by its correlate at the cellular level: mitoses. Thus, mitotic counts are used in a wide number of neoplasms to predict prognosis, and highly proliferative neoplasms (with many mitoses) usually have a worse prognosis. Mitotic counts are performed by a pathologist, counting mitotic figures at a high magnification. As mitotic figures are rare in relation to the number of cells, 10 high power fields (HPF) of view are typically examined. As cellularity is time consuming to quantify, mitoses/area is often used instead of mitoses/cell.
Considered as a sampling problem, mitotic counting is typically biased in a number of ways: (1) many pathologists do not start the count until they have found one mitosis, and (2) pathologists count mitoses in the area of the tumour they consider to be the most mitotically active (usually the most poorly differentiated portion). The former introduces a systematic bias that consistently skews results toward a higher mitotic score. The latter is not a significant factor if one frames the problem as an assessment of the most poorly differentiated region of the tumour (as opposed to the tumour as a whole).
In breast pathology, mitotic counts are a part of the Nottingham score and have been demonstrated to be a histomorphologic predictor of outcome. The procedure for counting mitoses was laid out in the original paper that described the scoring system [1], and has subsequently been clarified in the CAP protocol, which states the count should be done on the “most mitotically active area”. The system recognizes that mitotic rate per area is a strong predictor and essentially standardizes the cutpoints (3 and 8 mitoses/mm^{2}) to two separate mitotic rates (mitoses/area), creating a three-tier system. The system pseudo-standardizes the sample area to 10 HPF, where one HPF is the field area seen with the 40× objective and depends on the field diameter of the microscope.
Mitotic counting in breast pathology has been considered a sampling problem, and it has been studied experimentally and modelled mathematically [3]. Rigorously assessing the reproducibility of mitotic counts experimentally is an onerous proposition, and it would be prohibitively expensive to do a large study from which the misclassification errors due to sampling could be accurately assessed. The reproducibility of mitotic counts in the context of breast pathology is “low” [3]; thus, small sample sizes (<50 cases) are not sufficient. Unselected breast cancer cases are usually mitotic score 1, and this makes a study of misclassification more onerous, as the effect size is smaller than it would be if the cases were equally distributed among the different scores.
A computer simulation of this problem is an elegant solution, as it avoids the onerous labour and controls for many confounders. Populations of simulated specimens can be randomly sampled and compared to the true mitotic rates; such a gold standard comparison is not possible using glass slides and pathologists. This can be done millions of times to determine the distribution of correctly and incorrectly classified simulated specimens, and is a powerful tool for demonstrating the effect of sampling 10 HPFs on microscopes with different field areas.
The area of a HPF measured in mm^{2} may vary considerably from microscope to microscope. Using the smallest and largest field diameters from the table in the College of American Pathologists (CAP) checklist for Invasive Breast Cancer [2], a microscope with a field diameter of 0.40 mm has a HPF area of 0.13 mm^{2}, while one with a field diameter of 0.69 mm has a HPF area of 0.37 mm^{2}. This is almost a threefold difference in the area sampled in one HPF.
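The HPF-to-area conversion above is a simple circle-area calculation. A short Python sketch (illustrative only; the original model was written in GNU Octave):

```python
import math

def hpf_area(field_diameter_mm: float) -> float:
    """Area of one high power field (a circle) in mm^2."""
    return math.pi * (field_diameter_mm / 2) ** 2

# Extremes of the field diameter range in the CAP checklist:
small = hpf_area(0.40)  # ~0.126 mm^2 per HPF
large = hpf_area(0.69)  # ~0.374 mm^2 per HPF

print(f"10 HPF at FD 0.40 mm: {10 * small:.2f} mm^2")  # 1.26 mm^2
print(f"10 HPF at FD 0.69 mm: {10 * large:.2f} mm^2")  # 3.74 mm^2
print(f"ratio: {large / small:.2f}")                   # 2.98, almost threefold
```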
HPF is a tenuous pseudo-standard and, unfortunately, this measure is widely used throughout pathology, from eosinophilic esophagitis [4] to gastrointestinal stromal tumours [5] to breast cancer [6], to mention a few. It has been recognized that the HPF, an often used measure of area, has no uniform definition and is a significant, under-recognized source of variability that can lead to misclassification and poor reproducibility. As a result, there has been a gradual move toward standardized sample areas, e.g. 5 mm^{2} for gastrointestinal stromal tumour [5]; however, the HPF stubbornly persists in many areas of pathology.
Sampling problems similar to mitotic counting are all around us, and differences in sample size significantly affect predictions. A public opinion survey with 1000 individuals is more representative of the population than one with 500 individuals. This paper will show the same applies for the sample area, and will demonstrate how large the effect of sample size is in the context of breast cancer mitotic scoring.
This paper will rigorously assess the impact of the sample area on the mitotic score in the three-tier system used in breast pathology, and it will demonstrate that a standardized sample area is essential.
Methods
An in silico model was developed based on the binomial distribution [7], using the software GNU Octave (www.gnu.org/software/octave/). The model assumes mitotic counting is a sampling problem. It assessed the classification and misclassification rates of 1,000,000 simulated breast specimens, using the sample areas for the extremes of the field diameter range (FD = 0.40 mm, 10 HPF = 1.26 mm^{2}; FD = 0.69 mm, 10 HPF = 3.74 mm^{2}) in the College of American Pathologists (CAP) checklist [6], as well as an area of 5.00 mm^{2}, which is equivalent to 40 HPF at FD = 0.40 mm, or almost 14 HPF at FD = 0.69 mm.
Each simulated breast specimen was assigned a true mitotic density, based on an experimentally determined distribution from Meyer et al. [3]. The mitotic rate in the population ranged from 0 to 16.4 mitoses/mm^{2} and is similar to the distribution seen in an unselected population; however, it has more cases in the higher mitotic score categories.
The sampling processes, i.e. the simulated mitotic counts, were each represented by a random number between 0 and 1, which was treated as the percentile of all possible sampling results. Each random number could thus be substituted for the cumulative probability in the (sampled) mitotic count probability distribution (i.e. Fig. 1) and converted to a sampled mitotic count. The sampled mitotic counts were subsequently converted into (sampled) mitotic scores. The true and sampled mitotic scores were then determined from the true and sampled mitotic densities, using the cutoffs of 3 and 8 mitoses/mm^{2} as in the 2013 version of the CAP checklist [2], and the misclassification or agreement was tabulated. For each of the 1,000,000 cases the cellular density (2500 cells/mm^{2}) was held constant.
For example, if the randomly generated number is 0.42644, the sampled mitotic count is determined to be 7, as the random number is less than the cumulative probability for 7 mitoses (0.45285) and greater than the cumulative probability for 6 mitoses (0.31318).
The number of specimens with a particular mitotic rate in the sample population is shown in Fig. 1, which was interpolated to generate a table of mitotic rates and their relative frequencies in the population of specimens.
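The sampling model can be sketched in Python (the original implementation used GNU Octave; this is a minimal re-implementation, not the authors' code). It makes two simplifying assumptions, labelled in the comments: an illustrative uniform true-density distribution stands in for the empirical Meyer et al. distribution, a Poisson approximation replaces the binomial (reasonable since the per-cell mitosis probability is small), and 10,000 rather than 1,000,000 specimens are simulated:

```python
import math
import random

random.seed(1)  # deterministic for reproducibility

CUTOFFS = (3.0, 8.0)  # mitoses/mm^2, as in the CAP checklist

def score(density):
    """Three-tier mitotic score from a mitotic density (mitoses/mm^2)."""
    if density < CUTOFFS[0]:
        return 1
    if density < CUTOFFS[1]:
        return 2
    return 3

def sample_count(true_density, area_mm2):
    """Sampled mitotic count over the given area; Poisson approximation
    to the binomial (the per-cell mitosis probability is small)."""
    lam = true_density * area_mm2
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:  # Knuth's Poisson sampling algorithm
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def misclassification_rate(area_mm2, n=10_000):
    """Fraction of simulated specimens whose sampled score differs
    from their true score."""
    wrong = 0
    for _ in range(n):
        # Assumption: illustrative uniform density distribution,
        # not the empirical Meyer et al. distribution
        true_density = random.uniform(0.0, 16.4)
        sampled_density = sample_count(true_density, area_mm2) / area_mm2
        if score(sampled_density) != score(true_density):
            wrong += 1
    return wrong / n

mis_small = misclassification_rate(1.26)  # 10 HPF at FD 0.40 mm
mis_large = misclassification_rate(3.74)  # 10 HPF at FD 0.69 mm
print(f"1.26 mm^2: {mis_small:.1%} misclassified")
print(f"3.74 mm^2: {mis_large:.1%} misclassified")
```

Even under these simplified assumptions, the smaller sample area yields a noticeably higher misclassification rate, mirroring the trend in the full model.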
Results
Table 1 Percentage classification for sample area 1.26 mm^{2}

|                | True score 1, n = 655,196 (%) | True score 2, n = 245,122 (%) | True score 3, n = 99,682 (%) | Totals, n = 1,000,000 (%) |
|----------------|-------------------------------|-------------------------------|------------------------------|---------------------------|
| Sample score 1 | 96                            | 37                            | <1                           | 72                        |
| Sample score 2 | 4                             | 55                            | 25                           | 18                        |
| Sample score 3 | <1                            | 8                             | 75                           | 9                         |
Table 2 Percentage classification for sample area 3.74 mm^{2}

|                | True score 1, n = 682,735 (%) | True score 2, n = 209,385 (%) | True score 3, n = 107,880 (%) | Totals, n = 1,000,000 (%) |
|----------------|-------------------------------|-------------------------------|-------------------------------|---------------------------|
| Sample score 1 | 95                            | 14                            | 0                             | 68                        |
| Sample score 2 | 5                             | 82                            | 18                            | 22                        |
| Sample score 3 | 0                             | 4                             | 82                            | 10                        |
Table 3 Percentage classification for sample area 5.00 mm^{2}

|                | True score 1, n = 684,813 (%) | True score 2, n = 212,349 (%) | True score 3, n = 102,838 (%) | Totals, n = 1,000,000 (%) |
|----------------|-------------------------------|-------------------------------|-------------------------------|---------------------------|
| Sample score 1 | 96                            | 12                            | 0                             | 68                        |
| Sample score 2 | 4                             | 83                            | 13                            | 22                        |
| Sample score 3 | 0                             | 5                             | 87                            | 10                        |
Table 4 Accuracy by area of all classifications of simulated breast specimens

| Area (mm^{2}) | Percent correct | Percent misclassified |
|---------------|-----------------|-----------------------|
| 1.26          | 84              | 16                    |
| 3.74          | 91              | 9                     |
| 5.00          | 92              | 8                     |
| 10.0          | 95              | 5                     |
| 15.0          | 96              | 4                     |
| 20.0          | 96              | 4                     |
Discussion
The three-tier system reproducibly separates score 1 from score 3; even the lowest sample area (1.26 mm^{2}) has less than 1% of truly score 3 cases misclassified as score 1.
Intuitively, the middle group should have a higher misclassification rate than the other two, as it directly interfaces with the two other groups. The model reproduces this expected pattern; the middle group has the highest misclassification rate.
Due to sampling error, the smallest field diameter microscopes (0.40 mm) incorrectly categorize an additional 7% of all tumours when compared to the largest field diameter microscopes (0.69 mm), and an additional 27% of cases in the mitotic score 2 group.
The results reproduce the findings of Meyer et al. [3]; the misclassification rate is quite high (9–16% of cases). There is a clear trend toward less misclassification with greater sample areas, and the relationship between misclassification rate and sample area is nonlinear: incremental increases in area yield successively smaller reductions in the misclassification rate, as shown by the flattening of the misclassification rate from 5 to 20 mm^{2} (Table 4).
Limitations
The study did not consider the common practice of beginning the mitotic count with a mitotic figure. The effect of this practice could be calculated; however, it adds another level of complexity and likely does not change the overall conclusions. As well, the study did not systematically assess the impact of cellularity in the simulated breast specimens; however, some smaller calculations suggest it is not a significant factor (data not shown).
The findings are not corroborated by a large data set with patient outcomes and sample areas. This is a true shortcoming; however, we are not in possession of such a data set, though we hope this study will spur some data mining by others. These findings regarding sampling theory, as it applies to simulated breast specimens, are practically self-evident, particularly when examined in the context of the vast experience with similar problems in opinion research (public polling) and manufacturing (statistical process control).
Conclusions
Ten HPF is not a good standard sample area, as the misclassification rate is dependent on the microscope. The reproducibility of the mitotic score is poor, especially when close to the CAP Protocol cutpoints of 3 and 8 mitoses/mm^{2} (see Fig. 2; Additional file 1: Appendix S1).
The mitotic count cutpoints and the sample area should both be standardized; the latter could be accomplished by varying the number of HPFs counted, and may be less complicated than the table in the CAP checklist (see Additional file 2: Appendix S2).
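Varying the number of HPFs to reach a fixed sample area amounts to one division and a round-up per microscope. A hypothetical Python sketch (the function name and the intermediate field diameters are illustrative, not from the CAP table):

```python
import math

def hpfs_for_area(field_diameter_mm: float, target_mm2: float = 5.0) -> int:
    """Number of whole HPFs needed to cover at least the target sample area."""
    hpf = math.pi * (field_diameter_mm / 2) ** 2  # one HPF is a circle
    return math.ceil(target_mm2 / hpf)

for fd in (0.40, 0.50, 0.55, 0.60, 0.65, 0.69):
    print(f"FD {fd:.2f} mm -> {hpfs_for_area(fd)} HPF for 5 mm^2")
```

For the extremes of the checklist's field diameter range this reproduces the figures in Methods: 40 HPF at FD = 0.40 mm and 14 HPF at FD = 0.69 mm.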
Reducing misclassification
Generally, reducing the misclassification error requires a larger sample area, as noted by Meyer et al. [3]. However, we believe advocating larger sample areas (>5 mm^{2}) would be impractical and needlessly tedious, as many cases can be assessed with a relatively small area. Also, as we have shown, cases close to a cutpoint will frequently be misclassified unless one samples the whole tumour, or at least the entirety of its most poorly differentiated component.
We believe a more rational approach would be to triage cases into (a) “needs a larger sample area” and (b) “confident it is correctly classified”. The triage decision would be guided by a count on a (small) standardized sample area and a confidence interval around the cutpoints. Cases deemed to need a larger sample area would be classified based on the larger sample area. We believe it is reasonable to draw the line after limited additional sampling, accompanied by a statement about the confidence interval, so the clinician is aware that a number of cases will be misclassified by chance and can take this into account. It is possible that some pathologists already do this informally, by performing repeated counts in several areas (Additional file 3).
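As a sketch of how such a triage rule might look: the paper does not specify a confidence interval formula, so this Python example assumes a normal approximation to a Poisson count (count ± 1.96√count); the function names and the example counts are illustrative only.

```python
import math

def density_ci(count: int, area_mm2: float, z: float = 1.96):
    """Approximate 95% CI for the mitotic density (mitoses/mm^2),
    treating the count as Poisson: (count +/- z*sqrt(count)) / area."""
    half = z * math.sqrt(max(count, 1))
    return ((count - half) / area_mm2, (count + half) / area_mm2)

def needs_larger_sample(count: int, area_mm2: float,
                        cutoffs=(3.0, 8.0)) -> bool:
    """Triage rule: flag the case if the CI straddles either cutpoint."""
    lo, hi = density_ci(count, area_mm2)
    return any(lo < c < hi for c in cutoffs)

# 4 mitoses in 1.26 mm^2: density ~3.2/mm^2, CI straddles the
# 3 mitoses/mm^2 cutpoint, so a larger sample area is warranted
print(needs_larger_sample(4, 1.26))   # True
# 50 mitoses in 3.74 mm^2: density ~13.4/mm^2, clearly score 3
print(needs_larger_sample(50, 3.74))  # False
```

A case flagged by the rule would get an extended count over a larger area before the score is reported; an unflagged case can be signed out from the small standardized area.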
We strongly believe that the term “high power field” and its cousins (“intermediate power field” and “low power field”) should be completely abandoned as measures of area in pathology. Their continued use is offensive to any person who has given thought to why measures (such as the foot, millimeter and kilogram) are standardized, or who has some understanding of sampling theory.
Accurately quantifying proliferation will likely remain important for predicting cancer outcomes in the near term. Proliferative activity is used to subclassify breast cancer, has been quantified with Ki67 labeling, and is central to commercial ancillary tests for breast cancer, e.g. Oncotype Dx [8].
The issue identified in this paper may explain, in part, why alternatives to the mitotic count have been sought; mitotic counts done by humans have limitations and have been done without much attention to sampling theory.
Abbreviations
CAP: College of American Pathologists
FD: Field diameter
HPF: High power field
Declarations
Authors’ contributions
MB did the calculations and wrote the first draft of the manuscript. AJT audited the calculations and revised the manuscript. Both authors read and approved the final manuscript.
Acknowledgements
Dr. Beverley A. Carter was consulted on questions in breast pathology.
Competing interests
Both authors declare that they have no competing interests.
Ethics approval
Not applicable—study does not involve the use of animal tissue or human tissue.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
1. Simpson JG. Prognostic value of histologic grade and proliferative activity in axillary node-positive breast cancer: results from the Eastern Cooperative Oncology Group Companion Study, EST 4189. J Clin Oncol. 2000;18:2059–69.
2. Lester SC, et al. Protocol for the Examination of Specimens From Patients With Invasive Carcinoma of the Breast. College of American Pathologists. 18 Dec 2013. http://www.cap.org/apps/docs/committees/cancer/cancer_protocols/2013/BreastInvasive_13protocol_3200.pdf. Accessed 18 March 2014.
3. Meyer JS, Cosatto E, Graf HP. Mitotic index of invasive breast carcinoma. Achieving clinically meaningful precision and evaluating tertial cutoffs. Arch Pathol Lab Med. 2009;133(11):1826–33.
4. Dellon ES, Aderoju A, Woosley JT, Sandler RS, Shaheen NJ. Variability in diagnostic criteria for eosinophilic esophagitis: a systematic review. Am J Gastroenterol. 2007;102(10):2300–13.
5. Rubin BP, et al. Protocol for the Examination of Specimens From Patients With Gastrointestinal Stromal Tumor (GIST). College of American Pathologists. Oct 2013. http://www.cap.org/apps/docs/committees/cancer/cancer_protocols/2013/GIST_13protocol_3022.pdf. Accessed 18 March 2014.
6. Lester SC, et al. Protocol for the Examination of Specimens From Patients With Invasive Carcinoma of the Breast. College of American Pathologists. 28 Jan 2016. http://www.cap.org/ShowProperty?nodePath=/UCMCon/Contribution%20Folders/WebContent/pdf/cpbreastinvasive16protocol3300.pdf. Accessed 31 March 2016.
7. Walpole RM. Probability and Statistics for Engineers and Scientists. 5th ed. Macmillan Coll Div; 1993.
8. Inwald EC, Klinkhammer-Schalke M, Hofstaedter F, Zeman F, Koller M, Gerstenhauer M, Ortmann O. Ki-67 is a prognostic parameter in breast cancer patients: results of a large population-based cohort of a cancer registry. Breast Cancer Res Treat. 2013;139(2):539–52.