Statistical colour models: an automated digital image analysis method for quantification of histological biomarkers
© Shu et al. 2016
Received: 27 October 2015
Accepted: 18 April 2016
Published: 27 April 2016
Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy.
Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to detection of stain colour in digital IHC images. The model was first trained on a large number of colour pixels collected semi-automatically. To speed up the training and detection processes, we removed the luminance channel (the Y channel of the YCbCr colour space) and chose 128 histogram bins, which we found to be the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides automatically as positively or negatively stained. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry.
The purpose of the evaluation was to compare the computer model with human evaluation. Several large datasets with different colour stains were prepared from human oesophageal cancer, colon cancer and liver cirrhosis specimens. Experimental results demonstrated that the model-based tool achieves more accurate results than colour deconvolution and the CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demonstrated that the proposed model shows little inter-dataset variation.
A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and freely available at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html. Testing of the tool by different users showed only minor inter-observer variation in results.
Keywords: Colour detection; Statistical model; Colour deconvolution; Digital pathology; Histological image processing; Biomarker quantification; Software
Histopathological assessment is a crucial clinical diagnostic technique. A wide range of immunohistochemical and histochemical stains are available to assist histological assessment by providing contrast between a protein (or cell type) of interest and the background tissue. These stains colour the target antigens or proteins, called biomarkers, with different chromogens, making them visible for microscopic analysis.
Many methods have been used to quantify the stain colour in IHC images [5–7]. However, misdetection is a common problem when two or more chromogens with overlapping absorption spectra are used on one slide [7, 8]. For example, brown colour pixels were missed in a dark DAB-stained area when the single Y channel of the CMYK model was used for classification (see Fig. 1d). Colour deconvolution (CD), one of the most popular methods, falsely recognized the brown colour pixels in the dark DAB-stained area as blue colour pixels (see Fig. 1b, c). Colour deconvolution exploits differences in the light absorption spectra of different colour stains, but because it is based on a linear light absorption model, detection accuracy may be reduced if the light is not linearly absorbed by the stain, as is the case with DAB.
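For reference, the linear model that colour deconvolution relies on can be written as below. This is the standard Beer–Lambert formulation following Ruifrok and Johnston, restated here for illustration; the symbols are introduced for this sketch and are not taken from the "Methods" section. Each RGB intensity is converted to optical density (OD), and a pixel's OD vector is modelled as a linear mix of the stains' unit OD vectors:

```latex
\mathrm{OD}_c = -\log_{10}\!\left(\frac{I_c}{I_{0,c}}\right), \qquad c \in \{R, G, B\}

\mathbf{y} = \mathbf{C}\,M
\quad\Longrightarrow\quad
\mathbf{C} = \mathbf{y}\,M^{-1}
```

Here \(I_{0,c}\) is the incident (background) intensity, \(\mathbf{y} = (\mathrm{OD}_R, \mathrm{OD}_G, \mathrm{OD}_B)\), the rows of \(M\) are the normalised OD vectors of the three stains and \(\mathbf{C}\) holds the stain amounts. The recovered amounts are only meaningful if optical density grows linearly with stain concentration, which is the assumption that DAB violates.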
In this study we treated the detection of stained pixels as a colour detection problem in computer vision. Pixels stained a specified colour (positive colour pixels) are considered as a group of pixels that can be extracted from the background (negative colour pixels). The method of stain colour detection in digital IHC images proposed here is a statistical colour detection model. The model is created from a large collection of colour pixels containing both the positive and negative colour pixels in the image. A maximum likelihood classifier based on statistical models of the positive and negative pixels is used to classify pixels in digital slides automatically as positively or negatively stained.
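A minimal sketch of such a decision rule follows. It assumes, in the spirit of the histogram-based skin-detection models of Jones and Rehg, that each class is represented by a normalised colour histogram and that a pixel is labelled positive when the likelihood ratio P(c | positive) / P(c | negative) reaches a threshold \(\theta\). The exact equations are those of the "Methods" section and are not reproduced here. The function names and the toy (Cb, Cr) bin counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable


def normalise(counts: Counter) -> Dict[Hashable, float]:
    """Convert raw histogram counts into P(bin | class)."""
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


def is_positive(colour_bin: Hashable,
                p_pos: Dict[Hashable, float],
                p_neg: Dict[Hashable, float],
                theta: float = 1.0) -> bool:
    """Likelihood-ratio test: P(c|pos) / P(c|neg) >= theta.

    Written as a product so that bins never seen among negative pixels
    (P(c|neg) == 0) cause no division by zero; bins never seen among
    positive pixels are always rejected.
    """
    likelihood_pos = p_pos.get(colour_bin, 0.0)
    likelihood_neg = p_neg.get(colour_bin, 0.0)
    if likelihood_pos == 0.0:
        return False
    return likelihood_pos >= theta * likelihood_neg


# Toy (Cb, Cr) bin counts, for illustration only (hypothetical data).
p_pos = normalise(Counter({(40, 90): 8, (41, 90): 2}))
p_neg = normalise(Counter({(40, 90): 1, (70, 60): 9}))
```

Raising \(\theta\) trades true positives for fewer false positives; sweeping it over a range of values is what traces out an ROC curve.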
Interactive tool in ImageJ
We implemented this colour detection method as a semi-automatic ImageJ plugin to assist with IHC image analysis. The colour detection function is based on the statistical model presented in the "Methods" section; this allows rapid colour detection in arbitrary IHC-stained slides. This tool was first published in our earlier work, and here we improved its performance and added built-in models for the detection of stain colour in DAB- and PSR-stained specimens.
Overview of application software
Users begin training by selecting an interested colour region (ICR) with the rectangle tool in ImageJ. There are two further components to this visual selection process: selection of the colour of interest, and placement of a sliding bar within the scrolling panel, shown in Fig. 3. Background pixels can be filtered out using the sliding bar and appear as 255 in the resulting image. A statistical model is constructed from the histogram of the remaining colour pixels, which are quantized and collected. The training phase involves re-selecting an ICR in multiple training samples to obtain a wide range of shades of the target colour. When a new training sample is added, the model is re-calculated automatically from the accumulated histograms.
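The accumulate-and-recompute behaviour of the training phase can be sketched as follows. This is a hypothetical illustration, not the plugin's source: it assumes that pixels removed with the sliding bar are pure white (255, 255, 255), and it takes the colour-space conversion and quantisation as a caller-supplied function `to_bin`. The class name `TrainingHistogram` is illustrative.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Pixel = Tuple[int, int, int]
# Assumption: pixels filtered out by the sliding bar are set to white.
BACKGROUND: Pixel = (255, 255, 255)


class TrainingHistogram:
    """Hypothetical accumulator mirroring the training phase described above.

    Each new training sample adds its non-background pixels to one running
    histogram, and the probability model is recomputed from the accumulated
    counts.
    """

    def __init__(self, to_bin: Callable[[Pixel], tuple]):
        self.to_bin = to_bin            # colour-space conversion + quantisation
        self.counts: Counter = Counter()
        self.model: dict = {}

    def add_sample(self, pixels: Iterable[Pixel]) -> None:
        for p in pixels:
            if tuple(p) == BACKGROUND:
                continue                # removed by the sliding bar
            self.counts[self.to_bin(p)] += 1
        total = sum(self.counts.values())
        self.model = (
            {b: n / total for b, n in self.counts.items()} if total else {}
        )


# Usage with a toy binning function (hypothetical): halve two channels.
h = TrainingHistogram(lambda p: (p[0] // 2, p[1] // 2))
h.add_sample([(120, 60, 30), (255, 255, 255), (121, 61, 30)])
h.add_sample([(200, 100, 50)])
```

Because raw counts, not normalised probabilities, are accumulated, each training sample contributes in proportion to the number of pixels selected from it rather than being averaged as a whole image.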
When sufficient training samples have been collected, the resulting statistical model can be saved for reuse in subsequent detection phases. In the detection phase, the tool allows the user to apply either the default DAB detection model obtained in our experiments or a saved user-generated model. The selected model can then be used to detect similarly coloured stain in IHC images automatically.
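A saved model needs only the bin layout and the per-bin probabilities. The format below is a hypothetical JSON layout chosen for illustration; the plugin's actual file format is not described here. Storing the bin count alongside the entries lets a loader reject a model that was quantised differently from the images it is applied to.

```python
import json
from typing import Dict, Tuple

Bin = Tuple[int, int]


def model_to_json(model: Dict[Bin, float], bins: int = 128) -> str:
    """Serialise a histogram model (hypothetical format).

    JSON object keys must be strings, so each (Cb, Cr) bin is stored as a
    two-element list next to its probability instead.
    """
    entries = [[list(b), p] for b, p in sorted(model.items())]
    return json.dumps({"bins": bins, "entries": entries})


def model_from_json(text: str) -> Tuple[Dict[Bin, float], int]:
    """Inverse of model_to_json: returns the model and its bin count."""
    data = json.loads(text)
    model = {tuple(b): p for b, p in data["entries"]}
    return model, data["bins"]


saved = model_to_json({(40, 90): 0.75, (41, 90): 0.25})
restored, bins = model_from_json(saved)
```

Writing the string returned by `model_to_json` to a file and reading it back with `model_from_json` reproduces the save-and-reuse workflow described above.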
Experiments and discussion
Data and three-step test
Number of patients in each dataset and number of images captured and used for training and testing
Rows: WSI of oesophageal cancer; TMA of colorectal cancer
The datasets were obtained from three sources: human oesophageal biopsies, human colorectal biopsies and human liver cirrhosis biopsies. The slides came from hundreds of specimens, were prepared at different times and were scanned using a Hamamatsu scanner. All slides were prepared by Nottingham University Hospitals NHS Trust. Each whole slide (WS) came from an individual patient, and the tissue microarray (TMA) slides came from a total of 700 patients. Table 1 shows the number of images used for training and testing; these images were used in steps two and three. WS images and TMA images were randomly captured from the oesophageal cancer and colorectal cancer datasets, respectively. For the PSR-stained dataset, we divided 15 slides into 60 images with little overlap between regions and randomly selected 25 images to form the experimental dataset. For the DAB-stained liver cirrhosis dataset, we divided 100 slides into 189 images with little overlap and randomly selected 48 images to form the experimental dataset. The experimental workflow is shown in Fig. 4.
Images used in the first step were captured from whole slides with DAB staining. We used this dataset to select the colour space and to determine parameters such as the number of colour bins and \(\theta\). This was done in our previous work; here we specify the dataset and briefly describe the experimental results. In step two, slides were prepared with two approaches, WS and TMA, both stained with DAB. In this step, we assessed user-independence and detection accuracy in comparison with two previous methods. The datasets used in step three were stained with PSR and with DAB. This step assessed the proposed model in detecting a different stain colour, and the same stain used on another disease. In this paper, we added tests in steps two and three to evaluate the variation in detection results among CD vectors and statistical models. We also added tests to evaluate the statistical model in histochemical stain detection and compared it with the CD method. Each dataset was prepared at a different magnification and resolution; we describe them in turn in the following three steps.
Step one: testing the tool using different colour models
The data for building the statistical colour models included 20 images with a resolution of 6720 × 4200. The models were then tested on another set of 75 images with the same resolution. Both the training and testing images were captured under 20× magnification. They were randomly captured from 14 whole slides.
The semi-automatic tool was first used to label colour-positive pixels manually in the 20 training images, as described in the "Softwares" section. The labelled pixels were collected and quantized into histogram bins to construct the statistical model using Eqs. 1, 7 and 8. The ground truth of the test dataset was also prepared manually, by using this tool to eliminate all negative colour pixels.
The tested colour spaces are listed in the "Methods" section. Two spaces that use only chromaticity channels are included: the opponent colour space and the CbCr space. Notably, the CbCr chromaticity space has the smallest number of overlapping bins, and the experimental results confirmed that this space gave the best performance.
This indicates that chromaticity is sufficient for accurate colour representation and that luminance is a distraction when building the model. As mentioned before, relying on 2D chromaticity signals makes the model simpler, faster to compute and less demanding of memory. The optimal number of histogram bins was 128; this number produced better results than 256-bin histograms at a smaller computational cost. Further details are given in [6, 7].
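The sketch below shows one way to map an RGB pixel to a 2D chromaticity bin. It assumes the full-range ITU-R BT.601 (JPEG) YCbCr conversion, since the text does not state which YCbCr variant was used. The Y channel is simply never computed, and each chromaticity channel is quantised into 128 bins. The function names are illustrative.

```python
from typing import Tuple


def _clamp_byte(v: float) -> int:
    """Round to the nearest integer and clip to the 8-bit range 0..255."""
    return min(255, max(0, round(v)))


def rgb_to_cbcr(r: int, g: int, b: int) -> Tuple[int, int]:
    """Full-range BT.601 chroma (as in JPEG); luminance Y is deliberately dropped."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp_byte(cb), _clamp_byte(cr)


def cbcr_bin(pixel: Tuple[int, int, int], bins: int = 128) -> Tuple[int, int]:
    """Quantise the 8-bit Cb and Cr values into `bins` levels each."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb * bins // 256, cr * bins // 256


grey_bin = cbcr_bin((100, 100, 100))
```

Dark grey and white fall into the same bin, which illustrates what discarding Y means: shades that differ only in brightness share one bin. It is also why a 128 × 128 chromaticity histogram (16,384 bins) is far smaller than a 3D RGB histogram at the same resolution (128³, about 2.1 million bins).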
Step two: user-independence of the model
Images were randomly captured from 74 whole slides and 14 TMA slides; each TMA slide contained 16 × 7 cores. We randomly captured 60 images from each kind of slide. The training dataset contained 10 WS images with a resolution of 6720 × 4200 and 10 TMA images with a resolution of 5120 × 4096. These images were re-sorted into three sets of training samples, each consisting of 10 images: 10 WS images, 10 TMA images, or a mixed set of 5 WS images and 5 TMA images. There were two test datasets, comprising DAB-stained WS images and DAB-stained TMA images respectively; each consisted of 50 images captured under 40× magnification with a resolution of 1680 × 1050.
Since the construction of the statistical model is based on collecting colour pixels using an interactive tool, models constructed by different users may produce different detection results when applied to a given set of images. It was therefore important to evaluate the robustness of the tool-generated statistical colour detection models. The robustness of statistical colour models created with the interactive tool was evaluated by measuring detection accuracy and variations in detection.
Four users participated in an experiment investigating detection of the brown colour in DAB-stained IHC images. All four users used the same training datasets to create models with the interactive tool. These models were then tested on the same test datasets, which were different from the training sets. As users may differ in which colours they classify as 'brown', we calculated the true-positive ratio and false-positive ratio for each user separately.
Each user was required to build three statistical models to detect brown colour. The colour pixels used were collected separately from each set of DAB-stained training samples. In this way the four users created 12 models that were automatically generated from the collections of colour pixels they selected using the interactive tool.
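The per-user true-positive and false-positive ratios can be computed from a model's predicted mask and the manually prepared ground-truth mask, both flattened to per-pixel booleans. The following is a minimal sketch; the function name `tpr_fpr` is illustrative.

```python
from typing import Sequence, Tuple


def tpr_fpr(predicted: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Per-pixel true-positive and false-positive ratios.

    predicted: pixels a model labelled as stain-positive.
    truth:     pixels labelled positive in the manual ground truth.
    """
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


tpr, fpr = tpr_fpr([True, True, False, True, False],
                   [True, False, False, True, True])
```

Reporting the two ratios separately, rather than a single accuracy figure, keeps a user who labels a broader range of shades as 'brown' (higher TPR, higher FPR) distinguishable from one who labels a narrower range.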
Step two: comparison of the statistical colour detection method with other methods
The dataset used in the comparative study was the same as that used in the robustness evaluation reported in the previous experiment.
In this study we compared the statistical method with two previously developed colour detection methods in widespread use [9, 10]. All the methods were trained and tested on the same datasets, which were prepared from different types of IHC-stained images. We compared the methods in terms of detection accuracy, separation of stain colours, and variation between user-trained models and CD vectors.
Accuracy of colour detection in DAB-stained samples
10 CD vectors obtained from 5 WS training images and 5 TMA training images for brown colour detection
The AUC values of the ROC curves of the 12 models at a false-positive ratio of 10 %
Columns (in two groups): Model (WS) (%), Model (TMA) (%), Model (mix) (%)
To summarise these results, we calculated the area under the ROC curve (AUROC). Table 3 shows that the statistical colour models produced the best results, and that CD produced much better results than CMYK on both WS and TMA test images. Table 2 also shows that the user-generated models varied only slightly in detection accuracy. For example, for brown colour detection, the lowest AUC was 94.6 % and the highest was 97.2 %.
These results indicate that models generated by different users with this tool are all highly accurate, and therefore that the method is robust and fairly user-independent. In contrast, the CD method with trained vectors showed obvious variation in detection results, especially for DAB-stained TMA images (see Fig. 5). The results also show that the mixed models, and models constructed from non-corresponding training images, generate results similar to those of models trained only on the corresponding training images. This demonstrates that a model constructed from the whole range of colour shades can be applied to different datasets, obtained from different diseases, for detection of the same stain colour.
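An ROC curve and its area can be obtained by sweeping the decision threshold over per-pixel scores, such as the likelihood ratio, and integrating with the trapezoidal rule. The sketch below is a generic version of that calculation under the assumption that every distinct score is used as a threshold. It does not attempt to reproduce the 10 % false-positive operating point used in the tables, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               truth: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score.

    A pixel is called positive when its score is >= the threshold. The loop
    is quadratic in the number of pixels: fine for a sketch, not for slides.
    """
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both positive and negative ground-truth pixels")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truth) if s >= threshold and t)
        fp = sum(1 for s, t in zip(scores, truth) if s >= threshold and not t)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


perfect = auc(roc_points([0.9, 0.8, 0.7, 0.6], [True, True, False, False]))
mixed = auc(roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))
```

Pixels with tied scores move together in one threshold step, and the trapezoid interpolates linearly across that step.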
Dark stain colour detection
Normal brown colour was detected easily and separated from the background by all three methods. Detection of brown coloration in a dark-stained slide is more challenging, however: the CMYK method undercounted dark brown pixels (Fig. 1d, e), whereas CD falsely detected dark brown as blue (Fig. 1b, c). This evaluation of CMYK demonstrated that a colour space-based method performed less accurately in stain colour detection. Classification of multiple stain colours in a colour space may be affected by overlap between them. The CD method also suffers from this problem, as well as from the non-linear light absorption of DAB stain. The statistically based interactive tool detected dark brown and blue correctly (see Fig. 7).
Step three: use of statistical colour models in assessment of human histopathology
We prepared two datasets for this experiment: one contained 25 images randomly selected from 60 images stained with PSR, and the other contained 48 images randomly selected from 189 images stained with DAB. For the former, we created a statistical colour model using a training set of 5 PSR WS images with a resolution of 3360 × 2100. The test dataset consisted of 20 PSR-stained images captured under 5× magnification with a resolution of 3360 × 2100. For the latter, the training samples used to create the model were 10 images with a resolution of 5600 × 4200. The model was then tested on a dataset of 38 images captured under 5× magnification with a resolution of 5600 × 4200.
The statistical colour model can be adapted to detect pixels of any colour, not just the brown target pixels typical of IHC. The PSR stain is used to assess fibrosis in liver tissue; it stains the connective tissue matrix pink and background liver tissue pale yellow.
Elastin accumulates in the liver as fibrosis progresses and can be specifically detected using IHC. The target pixels are stained brown, with a blue counterstain. We applied the statistical detection method to the detection of brown colour in liver cirrhosis biopsies stained for elastin fibres.
The AUC values of the ROC curves for pink colour detection in PSR-stained images and brown colour detection in DAB-stained elastin images, at a false-positive ratio of 10 %
Columns (PSR): ST model (%), CD best vector (%)
Columns (elastin): Model (ST trans) (%), Model (ST elastin) (%), Model (CD builtin) (%), Model (CD best vector) (%)
The ROC curves for pink colour detection and brown colour detection are shown in Fig. 8, and detailed results are given in Table 4. For pink colour detection, the statistical model and the CD method obtained similar results. In particular, the statistical model had a lower true-positive rate and AUC than CD, but a false-positive rate of 0.6 % compared with 9.1 % for the CD method.
For the detection of brown colour, we used the two statistical models mentioned above (ST trans and ST elastin). Although both models and the CD method achieved highly accurate results, the statistical models performed much better in both accuracy and AUC. The results of the transferred statistical model were similar to those of the model trained on the corresponding training images. This again demonstrates that a model constructed from the whole range of colour shades can reduce inter-dataset variation.
We also compared the methods by calculating the percentage of pink- or brown-coloured pixels in each slide, and the correlation between the detected and manually calculated percentages. The calculation process was similar to that described previously. The manually calculated results were obtained from the manually labelled stain-coloured pixels relative to the tissue in the slide. The statistical model achieved a much higher \(R^2\) than the CD method for DAB stain, and a comparable, slightly higher \(R^2\) for PSR stain (PSR: ST 0.9994 vs CD 0.9823; DAB: ST 0.8658 vs CD 0.5183).
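A sketch of the two quantities follows. It assumes that the per-image percentage is taken over tissue pixels and that \(R^2\) is that of a least-squares straight-line fit of detected against manual percentages; the paper's exact calculation is not reproduced, and the function names are illustrative.

```python
from typing import Sequence


def stained_percentage(stain_mask: Sequence[bool],
                       tissue_mask: Sequence[bool]) -> float:
    """Percentage of tissue pixels that are labelled as stain-coloured."""
    tissue = sum(1 for t in tissue_mask if t)
    stained = sum(1 for s, t in zip(stain_mask, tissue_mask) if s and t)
    return 100.0 * stained / tissue if tissue else 0.0


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of a least-squares line fitted to (x, y).

    For a straight-line fit with an intercept this equals the squared
    Pearson correlation coefficient.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return sxy * sxy / (sxx * syy)


pct = stained_percentage([True, True, False, False], [True, True, True, False])
```

Passing one (manual, detected) percentage pair per test image to `r_squared` yields a figure of the kind quoted above.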
These results suggest that stain colour detection is similar to general colour detection in computer vision, and that statistical models can also produce good results in medical image analysis. The statistical model combined with an interactive human training process yielded better results than the CD or CMYK methods on the DAB-stained tissue samples. This study has demonstrated that colour detection with our statistical-model-based tool is in concordance with human evaluation.
The accuracy of the model may be affected by the colour space selected and by the colour pixels collected to train the model. Four colour spaces were compared in the DAB colour detection study, comparing detection accuracy across different colour channel domains. The RGB colour space and the absolute luminance channel can be discarded to reduce computational cost and memory requirements.
The model is generated from colour pixels collected from a set of training images using an interactive tool. Although the tool makes colour pixel collection easier, individual human differences might affect detection accuracy, so we evaluated the robustness of the method by testing 12 models, generated by four different users from three sets of training images, on the same test dataset. Detection accuracy varied only slightly between users, and there were no obvious inter-observer differences.
Slides prepared with different approaches or from different diseases are commonly stained with the same colour, but models constructed from one dataset might not transfer to another dataset with the same accuracy. We therefore cross-tested trained models among different datasets. The results showed only slight variation between these models and no obvious inter-dataset performance degradation.
The model detects IHC and histochemical stain colours stably and efficiently, and the tool makes model customization very easy. Users should aim to collect pixels representing the whole range of colour shades; collecting a sufficiently large sample of colour pixels may reduce both inter-observer and inter-dataset differences.
JS conceived of the study, participated in its design, programmed the software, carried out the experiments, analyzed the data and drafted the manuscript. DGE prepared the datasets and helped to draft the manuscript. JD helped to draft the manuscript. GQ conceived of the study, participated in its design, analyzed the data and helped to draft the manuscript. MI conceived of the study, participated in its design and prepared the datasets. All authors read and approved the final manuscript.
This work is partially supported by UK EPSRC Grant EP/J020257/1 and by the International Doctoral Innovation Center (IDIC) program of the University of Nottingham Ningbo China sponsored by Ningbo Municipal Bureaus of Education and Science & Technology and by National Natural Science Foundation of China (No.61371143) and by subject construction and cultivation of superiority subject program of the North China University of Technology (No.XN078).
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Ruifrok A, Walker RA. Quantification of immunohistochemistry-issues concerning methods, utility and semiquantitative assessment I. Histopathology. 2006;49:406–10.
- Rodrigues NR, Rowan A, Smith ME, Kerr IB, Bodmer WF, Gannon JV, Lane DP. p53 mutations in colorectal cancer. Proc Natl Acad Sci USA. 1990;87:7555–9.
- Pellicoro A, Aucott RL, Ramachandran P, Robson AJ, Fallowfield JA, Snowdon VK, Hartland SN, Vernon M, Duffield JS, Benyon RC, et al. Elastin accumulation is regulated at the level of degradation by macrophage metalloelastase (mmp-12) during experimental liver fibrosis. Hepatology. 2011;55(6):1965–75.
- Ishak K, Baptista A, Bianchi L, Callea F, De Groote J, Gudat F, Denk H. Histological grading and staging of chronic hepatitis. J Hepatol. 1995;22:696–9.
- Jie S, Guoping Q, Mohammad I, Dolman DG. In: 2013 Seventh International Conference on Image and Graphics (ICIG); 26–28 July 2013; Qingdao. IEEE; 2013.
- Jie S, Guoping Q, Mohammad I, Philip K. Biomarker detection in whole slide imaging based on statistical color models. MIDAS J Comput Imaging Biomark Tumors (CIBT). 2010.
- Jie S. Immunohistochemistry image analysis: protein, nuclei and gland. PhD thesis, University of Nottingham, Computer Science Department; 2015.
- Brey EM, Lalani Z, Johnston C, Wong M, McIntire LV, Duke PJ, Patrick CWJ. Automated selection of DAB-labeled tissue for immunohistochemical quantification. J Histochem Cytochem. 2003;51:575–84.
- Nhu-An P, Andrew M, Joerg S, Sarit A, Vladimir I, Ming-Sound T, James H, David WH. Quantitative image analysis of immunohistochemical stains using a CMYK color model. Diagn Pathol. 2007.
- Ruifrok A, Johnston DA. Quantification of histochemical staining by color deconvolution. Anal Quant Cytol Histol. 2001;23:291–9.
- van der Loos CM. Multiple immunoenzyme staining: methods and visualizations for the observation with spectral imaging. J Histochem Cytochem. 2008;56:313–28.
- Jones MJ, Rehg JM. Statistical color models with application to skin detection. Int J Comp Vision. 2002;46(1):81–96.
- Megan JM, Wakkas F, Rose W, Yoko Y, Michael P, Alison P, Chris W, Susan D, Rachel SK, David JK, Elain CJ, Mohammad I. Aberrant p53 expression lacks prognostic or predictive significance in colorectal cancer: results from the VICTOR trial. Anticancer Res. 2015;35(3):1641–5.
- Plugin of colour deconvolution. http://www.dentistry.bham.ac.uk/landinig/software/cdeconv/cdeconv.html
- Plugin of CMYK. http://imagej.nih.gov/ij/plugins/cmyk/index.html
- Ruifrok AC, Katz RL, Johnston DA. Comparison of quantification of histochemical staining by hue-saturation-intensity (HSI) transformation and color-deconvolution. Appl Immunohistochem Mol Morphol. 2003;11:85–91.
- Liban E, Ungar H. Elastosis in fibrotic and cirrhotic processes of the liver. Arch Pathol. 1959;68(3):331–41.