Histological image segmentation using fast mean shift clustering method
- Geming Wu†^{1},
- Xinyan Zhao†^{2},
- Shuqian Luo^{1} and
- Hongli Shi^{1}Email author
https://doi.org/10.1186/s12938-015-0020-x
© Wu et al.; licensee BioMed Central. 2015
Received: 15 November 2014
Accepted: 2 March 2015
Published: 20 March 2015
Abstract
Background
Colour image segmentation is fundamental and critical for quantitative histological image analysis. The complexity of the microstructure and the approach to make histological images results in variable staining and illumination variations. And ultra-high resolution of histological images makes it is hard for image segmentation methods to achieve high-quality segmentation results and low computation cost at the same time.
Methods
Mean Shift clustering approach is employed for histological image segmentation. Colour histological image is transformed from RGB to CIE L*a*b* colour space, and then a* and b* components are extracted as features. To speed up Mean Shift algorithm, the probability density distribution is estimated in feature space in advance and then the Mean Shift scheme is used to separate the feature space into different regions by finding the density peaks quickly. And an integral scheme is employed to reduce the computation cost of mean shift vector significantly. Finally image pixels are classified into clusters according to which region their features fall into in feature space.
Results
Numerical experiments are carried on liver fibrosis histological images. Experimental results demonstrate that Mean Shift clustering achieves more accurate results than k-means but is computational expensive, and the speed of the improved Mean Shift method is comparable to that of k-means while the accuracy of segmentation results is the same as that achieved using standard Mean Shift method.
Conclusions
An effective and reliable histological image segmentation approach is proposed in this paper. It employs improved Mean Shift clustering, which is speed up by using probability density distribution estimation and the integral scheme.
Keywords
Background
In recent years, with the increasing demands of quantitative analysis, digital image processing techniques attract more and more attention in histopathology [1,2]. They are considered to be more reliable than traditional manual assessment which heavily depends on the operator’s experience and usually can not be reproduced. Image segmentation serves as a fundamental and key technique and is typically the first step in digital image analysis. In histological image analysis, it partitions a digitized histological image into multiple homogeneous or similar regions which corresponding to different tissues or cellular components. As the result of image segmentation is taken as the input of the successive processing steps, it is essential and critical to the quality of final result of qualitative image analysis.
Histological image segmentation is much difficult due to the complex approach to make histological images. Histological specimens are fixed, processed, embedded, sectioned and then stained into different colours in different tissues. Coloured histological sections provide anatomical details for the diagnosis and histopathology, and bring out special features of different tissues at the microscopic level. To identify different tissues or cellular components, histological sections are segmented according to colour, shape or texture features after acquired with high-resolution digital camera, and then classified by commonly employing supervised methods [3-6]. Several methods based on digital image processing and pattern recognition techniques have been proposed to deal with histological image segmentation problem in the past years [7-12]. When no training data set is available in histological image analysis, feature vectors or points representing pixels in the histological image are usually extracted from their local properties. Then unsupervised techniques are used to label pixels into different clusters by separating feature vectors in the feature space. There are many existing approaches in literature which can be employed for segmenting histological image. These include clustering based methods (k-Means [13] and Mean Shift [14]), mixture models based schemes (i.e. Gaussian mixture models based Expectation-Maximization clustering, GMM-EM [15]) and state-of-the-art energy minimization based approaches (graph-cuts methods [16-18] and Markov Random Field based method [19]). However, histological image segmentation still faces several technique challenges. One of them is how to achieve high-quality segmentation results with low computation cost, especially for sequential histological image analysis. The size of acquired histological image is usually very large for better investigation of micro-structures. It makes most of existing image segmentation algorithms, such as Mean Shift, very time-consuming and hard to be used in practice. Mean Shift is a non-parametric clustering approach which has no assumptions on the shape of the distribution and the number of clusters. So Mean Shift may achieve better segmentation results than model-based clustering schemes when it is used as a histological image segmentation method.
In this paper, we focus on pixel-level segmentation by colours in histological image with unsupervised method. A fast Mean Shift clustering approach is proposed and applied to segment histological images properly. The new method estimates the probability density distribution in advance and then separates the feature space into different regions by employing Mean Shift to find the density peaks. Feature points are divided into clusters according to the region that they fall into. And an integral scheme is employed to speed up the computation of mean shift vector. To apply the proposed method to histological image segmentation, CIE L*a*b* colour space is used as feature space.
The rest parts of this article are organized as follows: Section 2 describes the main ideas and schemes employed to speed up Mean Shift clustering approach in details. The framework of histological image segmentation based on the proposed method is also given in this section. Section 3 shows numerical experimental results from live fibrosis histological images, and followed by the study conclusion in Section 4.
Methods
Feature extraction in L*a*b* color space
To obviously identify different tissues in histological specimens, they are stained into different colours. Digitized histological images are usually acquired and stored in RGB. CIE L*a*b* colour space is a non-linear transformation of RGB, and models all visible colours approximate to human vision. So the simple Euclidean distance in L*a*b* space could differentiate among colours perceptually. We convert histological images from RGB space to L*a*b* space and only extract a* and b* components as features for clustering to ignore variations in brightness. Thus histological image segmentation becomes a 2D clustering problem. Denote V _{ ab } the feature space consisting of a* and b* components.
Fast mean shift clustering
where S _{ h } is a sphere with center x and radius h, and K _{ s } is the number of points located in S _{ h }. h is termed the window size.
Standard Mean Shift clustering method employs an iterative gradient ascent procedure to estimate local density. For a given point x in the feature space, it sets x ← x + m _{ h }(x) and repeats this step until convergence. The stationary points of this procedure represent the modes of the underlying distribution. Points associated with the same stationary point are considered as members of the same cluster.
Mean shift vector estimation
To find out the mean shift vector at x, one has to compute the distances from all points to the specified centre x and selects out those located in S _{ h }. Thus the computational cost of mean shift vector is very high and results in standard Mean Shift a very slow clustering approach.
If w _{ i } and c _{ i } are computed in advance and stored in the frequency matrix w and the centre matrix c respectively, R _{ h } corresponds to a sub-region of these matrices. Thus for a given x, it is not needed to find out the points located in S _{ h } by computing the distances from all points to the x. So the computation cost of \( {\widehat{m}}_h \) is much lower than that of m _{ h }.
Using of integral image scheme
where (m,n) satisfies ‖c(m, n) − x‖_{∞} ≤ e.
The above formula shows that the computation cost of mean shift vector is no longer related to the number of points and can be performed in constant time.
Histological image segmentation framework
Based on the above discussion, the framework of the proposed fast mean shift method (FMShift) for histological image segmentation is summarized as follows:
Algorithm: Fast mean shift clustering for histological image segmentation
Initialization:
Input color histological image I, e and h (≫e).
Main:
Step 1: Transform I from RGB to L*a*b* color space, and extract a* and b* color components to build a feature space V _{ ab }. Then separate the feature space into tiny squares with side length of 2e, and compute the frequency matrix w of feature points and center matrix c.
Step 2: Compute W and M by using Eq. (6) and (7). Employ Mean Shift approach to separate the feature space into different regions by finding the density peaks.
Step 3: Divide pixels of histological image into clusters according to which region their corresponding feature point falls into.
Output:
Segmentation labels
Experiments and discussion
k-Means++ [23], GMM-EM, Mean Shift, Kernel Graph-cuts (KGC) [18], hidden Markov Random Field with Expectation-Minimization algorithm (HMRF-EM) [19] and the proposed FMShift method had been implemented using MATLAB (The MathWorks, Inc., Natick, MA, USA), and were employed for histological image segmentation. a* and b* colour components of histological images were used as input features for previous three methods and FMShift while RGB for KGC and HMRF-EM. For the convenience of comparisons, the number of clusters k was given in prior. The clustering results of FMShift and Mean Shift were sorted by the number of assigned points in descent order. Then only the top k clusters were kept and others were reassigned into the nearest cluster. Considering that the objective of the segmentation is to identify fibrosis and vessels from other liver tissues, we set k = 3 in all experiments. In FMShift method, we set the parameter e to one thousandth of maximum range of a* and b* colour components, and h to 40 times e. Radial basis function with σ = 0.5 was used as the kernel function in KGC method. And the number of iterations of EM algorithm in HMRF-EM and GMM-EM is limited up to 100.
where H(X) is the entropy of X, and I(X,Y) is the mutual information between X and Y. Larger values of DI and RI and a smaller value of VoI mean higher segmentation accuracy.
Comparison of segmentation accuracies
Method | Dice index | Rand index | Variation of information | |
---|---|---|---|---|
Fibrosis | Vessels | |||
k-Means | 0.67 ± 0.24 | 0.52 ± 0.24 | 0.69 ± 0.08 | 1.25 ± 0.20 |
GMM-EM | 0.60 ± 0.13 | 0.68 ± 0.22 | 0.76 ± 0.10 | 1.03 ± 0.29 |
HMRF-EM | 0.28 ± 0.14 | 0.76 ± 0.22 | 0.62 ± 0.10 | 1.35 ± 0.24 |
KGC | 0.23 ± 0.10 | 0.69 ± 0.19 | 0.55 ± 0.08 | 1.62 ± 0.16 |
Mean Shift | 0.87 ± 0.05 | 0.84 ± 0.09 | 0.91 ± 0.03 | 0.53 ± 0.12 |
FMShift | 0.86 ± 0.05 | 0.84 ± 0.07 | 0.91 ± 0.02 | 0.54 ± 0.12 |
Comparison of average computation times
Method | k-Means++ | GMM-EM | HMRF-EM | KGC | Mean Shift | FMShift |
---|---|---|---|---|---|---|
Avg. times (sec.) | 4.2 | 117.0 | 954.2 | 14.8 | 732.3 | 6.6 |
It is shown in Figure 4 that the tissues adjacent to collagen tend to be wrong segmented as fibrosis using HMRF-EM and KGC. The possible reason is that the shape of collagen is thin and elongated. Graph cut based image segmentation method has the problems to segment them due to shrinkage bias, and HMRF-EM uses spatial information through the mutual influences of neighbours. So it results in small values of Dice Indexes for fibrosis segmentation by using HMRF-EM and KGC methods and also decreases the Rand Indexes as listed in Table 1. At the same time, k-Means method can successfully detect spherical clusters with similar size but is not a good scheme for others. This limitation makes it easy to assign points to wrong clusters. Thus the tissues adjacent to sinusoids are segmented wrong by k-Means++ as shown in Figure 4. GMM-EM is based Gaussian mixture models and is sensitive to the distribution of feature points. However, this requirement is usually hard to meet in the case of histological image segmentation. So GMM-EM does not achieve high Dice and Rand indexes as shown in Table 1.
The quantitative results listed in Tables 1 and 2 show that the speed of FMShift method is comparable to that of k-Means++ and much faster than other schemes including standard Mean Shift clustering approach while the segmentation accuracies obtained by FMShift and Mean Shift are almost the same but much better than that of other methods. By estimating the probability density distribution in advance and employing the integral image scheme, FMShift reduces the computation cost of standard Mean Shift clustering significantly. And because histological images are stained manually, the range of colour components is restricted. This makes the frequency matrix w sparse and also speeds it up to find the density peaks in FMShift. The above reasons make FMShift a very fast approach even in handling large-scale histological image segmentation problems. At the same time, FMShift is Mean Shift scheme based. So it has no demands on the shape of underlying distribution and thus achieves high accuracies in liver histological image segmentation.
Conclusions
We have developed a histological image segmentation approach by employing improved Mean Shift clustering. To eliminate illumination variations, colour histological image is transformed into CIE L*a*b* colour space, and then a* and b* components are extracted as features for clustering. The clustering approach consists of three steps. In the first step, the probability density distribution is estimated by splitting the effective feature space located with observations into tiny squares and computing the frequencies of observations occurring in each square. The second step is to separate the feature space into different regions by employing Mean Shift scheme to finding density peaks. Then all observations are assigned into different clusters according to which square they fall into in the last step. And an integral image scheme is used to speed up the computation of mean shift vector at the same time. By employing the probability density estimation and integral scheme, the computation cost of standard Mean Shift clustering method is significantly reduced while keeping the accuracy the same. From the results of numerical experiments on liver fibrosis histological images, we have the conclusion that the proposed method is a fast and reliable approach for color image segmentation, especially for large-scale histological image segmentation.
In this paper, we only use two dimensional color features of histological images and discuss how to accelerate Mean Shift method in the two-dimension case. Our future work includes extending fast Mean Shift scheme to high dimensional case and taking more features, such as shape and texture, into consideration for better performance.
Notes
Declarations
Acknowledgements
This study was supported by Basic-clinical Cooperation Research Foundation of Capital Medical University of China (No. 13JL65), the National Natural Science Foundation of China, Grant No. 61227802, 60532090 and 30770593, and by the 7th Framework Programme of the European Community, Grant Agreement Number PIRSES-GA-2009-269124.
Authors’ Affiliations
References
- Madabhushi A. Digital pathology image analysis: Opportunities and challenges. Imaging Med. 2009;1:7–10.View ArticleGoogle Scholar
- He L, Long LR, Antani S, Thoma GR. Histology image analysis for carcinoma detection and grading. Comput Methods Programs Biomed. 2012;107(3):538–56.View ArticleGoogle Scholar
- Tabesh A, Teverovskiy M, Pang HY, et al. Multifeature prostate cancer diagnosis and Gleason grading of histological images. IEEE Trans Med Imaging. 2007;26(10):1366–78.View ArticleGoogle Scholar
- Al-Kofahi Y, Lassoued W, Lee W, Roysam B. Improved automatic detection and segmentation of cell nuclei in histopathology images. IEEE Trans Biomed Eng. 2010;57(4):841–52.View ArticleGoogle Scholar
- Janssens T, Antanas L, Derde S, Vanhorebeek I, Berghe GV, Grandas FG. CHARISMA: An integrated approach to automatic H&E-stained skeletal muscle cell segmentation using supervised learning and novel robust clump splitting. Med Image Anal. 2013;17:1206–19.View ArticleGoogle Scholar
- Xu Y, Zhu JY, Chang EIC, Lai MD, Tu ZW. Weakly supervised histopathology cancer image segmentation and classification. Med Image Anal. 2014;18:591–604.View ArticleGoogle Scholar
- Saraswat M, Arya KV. Automated microscopic image analysis for leukocytes identification: A survey. Micron. 2014;65:20–33.View ArticleGoogle Scholar
- Yang L, Meer P, Foran DJ. Unsupervised segmentation based on robust estimation and colour active contour models. IEEE Trans Inform Technol Biomed. 2005;9:475–86.View ArticleGoogle Scholar
- Lezoray O, Cardot H. Cooperation of colour pixel classification schemes and colour watershed: a study for microscopic images. IEEE Trans Image Process. 2002;11(7):783–9.View ArticleGoogle Scholar
- Mohapatra S, Patra D, Satpathy S. Automated leukemia detection in blood microscopic images using statistical texture analysis. In: Proceedings of International Conference on Communication, Computing & Security. New York, USA: ACM; 2011. p. 184–7.Google Scholar
- Ko BC, Gim J, Nam J. Automatic white blood cell segmentation using step-wise merging rules and gradient vector flow snake. Micron. 2011;42:695–705.View ArticleGoogle Scholar
- He L, Long LR, Antani S, Thoma GR. Distribution Fitting-based Pixel Labeling For Histology Image Segmentation, Proc. SPIE 7963, Medical Imaging. Florida, USA: Computer-Aided Diagnosis, SPIE; 2011. p. 79633D.Google Scholar
- Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm, Journal of the Royal Statistical Society. Series C. 1979;28(1):100–8.MATHGoogle Scholar
- Comaniciu D, Meer P. Mean Shift: A Robust Approach Toward Feature Space Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence. 2002;24(5):603–19.View ArticleGoogle Scholar
- Xu L, Jordan MI. On Convergence Properties of the EM Algorithm for Gaussian Mixtures. Neural Comput. 1996;8(1):129–51.View ArticleGoogle Scholar
- Shi J, Malik J. Normalized Cut Method for Image Segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence. 2000;22(8):888–905.View ArticleGoogle Scholar
- Boykov Y, Veksler O, Zabih R. Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell. 2001;23(11):1222–39.View ArticleGoogle Scholar
- Ben Salah M, Mitiche A, Ben Ayed I. Multiregion Image segmentation by Parametric Kernel Graph Cuts. IEEE Trans Image Process. 2011;20(2):545–57.View ArticleMathSciNetGoogle Scholar
- Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov Random Field model and the expectation–maximization algorithm. IEEE Trans Med Imaging. 2001;20(1):45–57.View ArticleGoogle Scholar
- Franklin C. Summed-area tables for texture mapping, SIGGRAPH ’84: Proceedings of the 11th annual conference on Computer graphics and interactive techniques. ACM, New York, USA. 1984;18(3):207–12.Google Scholar
- Viola P, Michael J. Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2001;1:511–8.Google Scholar
- Wang J, Guo YW, Ying YT, Liu YL, Peng QS. Fast non-local algorithm for image denoising, Proceeding of IEEE International Conference on Image Processing. 2006. p. 1429–32.Google Scholar
- Arthur D, Vassilvitskii S. k-means++: The advantages of of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. New York, USA: ACM; 2007. p. 1027–35.Google Scholar
- Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66(336):846–50.View ArticleGoogle Scholar
- Meilă M. Comparing clusterings – an information based distance. J Multivar Anal. 2007;98:873–95.View ArticleMATHGoogle Scholar
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.