Data cluster analysis-based classification of overlapping nuclei in Pap smear samples
© Guven and Cengizler; licensee BioMed Central Ltd. 2014
Received: 16 October 2014
Accepted: 1 December 2014
Published: 9 December 2014
The extraction of overlapping cell nuclei is a critical issue in automated diagnosis systems. Due to the similarities between overlapping and malignant nuclei, misclassification of the overlapped regions can affect the automated systems’ final decision. In this paper, we present a method for detecting overlapping cell nuclei in Pap smear samples.
Judgement about the presence of overlapping nuclei is performed in three steps using an unsupervised clustering approach: candidate nuclei regions are located and refined with morphological operations; key features are extracted; and candidate nuclei regions are clustered into two groups, overlapping or non-overlapping. A new combination of features, containing two local minima-based and three shape-dependent features, is extracted to determine the presence or absence of overlapping. F1 score, precision, and recall values are used to evaluate the method’s classification performance.
To evaluate the method, we compared the segmentation results of the proposed system with empirical contours. Experimental results indicate that the applied morphological operations can locate most of the nuclei and produce accurate boundaries. The independent features significance test indicates that our feature combination is significant for overlapping nuclei. Comparisons of the classification results of a fuzzy clustering algorithm and a non-fuzzy clustering algorithm show that the fuzzy approach would be a more convenient mechanism for classification of overlapping.
The main contribution of this study is the development of a decision mechanism for identifying overlapping nuclei to further improve the extraction process with respect to the segmentation of interregional borders, nuclei area, and radius. Experimental results showed that our unsupervised approach with proposed feature combination yields acceptable performance for detection of overlapping nuclei.
Keywords: Pap smear, Nuclei, Overlapped, Clustering
Although cervical cancer is one of the deadliest cancers in women, it is highly curable if it is diagnosed at an early stage. The Pap smear test is a popular gynecological screening test for diagnosing cervical cancer. It is based on the interpretation of cervical cells under microscopic examination. During manual screening of cervical cytology samples, the observer searches for morphometric changes and visual abnormalities in cells. The rate of false results may increase in this screening due to the subjective variability of different observers. Moreover, manual screening is an unreasonably time-consuming and costly process due to several types of distortions in samples, such as uneven dyeing, optical errors, artifacts, overlapping cells, mucus, and blood. Thus, there has been great motivation to automate the Pap test in order to reduce human error and time consumption. An automated Pap smear screening system should be able to delineate cells within samples in order to classify cervical cells.
In malignant cells, nuclei may be disproportionately enlarged and irregular both in form and outline. Thus, one of the most common features that guide the detection of an existing malignancy is an increased nucleus-to-cytoplasm ratio. Hence, one of the highest priority tasks for an automated Pap smear monitoring system is the segmentation of cell nuclei. Moreover, the correct interpretation of nuclei abnormality depends on the accuracy of the nuclei detection mechanism in automated systems.
In most Pap smear samples, some nuclei overlapping occurs, which is a factor that makes automated Pap smear monitoring systems error prone [4, 5]. Overlapping cell nuclei often appear as adjacent darker regions within Pap smear samples. The appearance of these darker regions most likely causes automated systems to interpret the whole area as a single nucleus. Overlapping nuclei in the segmented region may cause the misclassification of a nucleus as abnormal. Thus, overlapping and adjacent nuclei must be distinguished prior to any further processing [3, 6].
Many studies have sought to develop methods for accurately determining the borders of overlapping cell nuclei. For instance, Jung et al. reported an unsupervised Bayesian classification scheme for separating overlapping regions. In another study, Li et al. utilized a modified gradient vector flow, as well as radiating gradient vector flow (RGVF) snake and k-means unsupervised clustering methods, for the accurate extraction of overlapping cytoplasm and nuclei. Other methods, including watershed, were also proposed in the literature. These previous studies show that there has been great interest in accurately determining cell nuclei borders inside adjacent regions. However, it is critical that, before any further separation process takes place, each nucleus is judged as to whether it is overlapping or not. Our study objective was to develop a fully automated elimination mechanism specializing in the classification of overlapping nuclei. Our proposed model is not a segmentation approach for determining interregional borders. Furthermore, this model may judge the region even if there are no apparent interregional nuclei borders.
We used morphological operations to determine cell nuclei borders and a clustering-based decision mechanism to examine detected objects to assess the presence of single or multiple nuclei inside a region. Using this approach, several new features are extracted to optimize the success of the clustering algorithm. We prefer to use a fuzzy c-means algorithm as a clustering method in this study, as it provides an unsupervised decision mechanism capable of distinguishing different classes of cell nuclei from their previously extracted features. One of the reasons we prefer a clustering-based algorithm is that no training or learning stages are needed in clustering-based approaches. This results in flexibility in the developed system and increases the success rates in cases where multiple samples are examined due to practical requirements.
A. Data set
All methods introduced here were applied to the study test set, which consisted of a total of 290 nuclei within 10 cervical images, where 8% of the detected nuclei overlapped. All images were taken from different subjects. A NIKON microscope equipped with 100× magnification was used to take images of samples processed with Papanicolaou staining. The study was performed in accordance with the Declaration of Helsinki and approved by the institutional ethics committee. Ground truthing of the segmentation and classification processes was performed by two observers. While all images originally had 2560×1920 pixels, the samples were down-sized to 1280×960 pixels. All original sample images were stored in RGB color space in JPEG format. In addition to the test set, we used a sample set of 16 public cervical cytology images from the International Symposium on Biomedical Imaging (ISBI, http://cs.adelaide.edu.au/~carneiro/isbi14_challenge/dataset.html) 14 Challenge for tuning and evaluation purposes. The set contains 690 nuclei, of which 14% were determined to be overlapped. It should be noted that images from ISBI were not previously ranked for abnormality. Instead, we used 140 normal and 140 abnormal nuclei images from the Herlev data set (HDS) to observe the significance of the proposed feature set for abnormal and normal cell nuclei. The HDS consists of segmented single cells collected and ranked by cytotechnicians at the Department of Pathology at Herlev University Hospital and the Department of Automation at the Technical University of Denmark for classification experiments. In total, 1240 nuclei, including samples from the Herlev data set, were examined in the study.
B. Determination of nuclei boundaries
Cell nuclei appear as one of the darkest regions in most cervical samples. Other darker regions include those attributed to artifacts, mucus, blood, etc. According to this global information, it is reasonable to presume that cell nuclei are located in the intensity valleys. Nuclei boundaries cause the formation of high gradients on images as a result of the density difference between the cytoplasm and nuclei regions. Using this global information on the appearance of nuclei as a guide, we divided the nuclei extraction process into four consecutive steps in our study: 1) extraction of the gradient magnitude of the images; 2) filtering of the images with an edge detection filter; 3) cleaning of any remaining artifacts from some of the resulting images via object size-based filtering; and 4) final morphological operations for touching pixels and remaining artifacts. Sample outputs of these steps are presented in Figure 3 as a block diagram. A similar approach was used by Plissiti et al., in which the authors extracted and filtered the gradient magnitude of samples to assess the initial nuclei contours.
We determined the corresponding gradient value of a sample image at a particular coordinate by combining the partial derivative of the image in the x and y directions. We converted all sample images to grayscale before beginning the work flow, and applied a Sobel operator as a discrete differentiation operator to determine the partial derivative in both directions.
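The gradient computation described above can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: it assumes the standard 3×3 Sobel kernels, assumes the two partial derivatives are combined as the Euclidean norm (the text does not state how they are combined), and leaves border pixels at zero. The function name `gradient_magnitude` is hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for the x and y partial derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(gray):
    """Return the Sobel gradient magnitude of a 2-D grayscale image.

    `gray` is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 because the 3x3 kernel does not fit there (an assumption).
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    p = gray[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            # Combine both directions as the Euclidean norm (assumed).
            out[r][c] = math.hypot(gx, gy)
    return out
```

On a flat region the result is zero, while a sharp intensity step, such as a nucleus boundary, produces a large value. For full-size images a vectorized library routine would be far faster than these explicit loops.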
In the final form of the output image, most of the nuclei boundaries were detected and filtered. However, nuclei belonging to blood cells and nuclei outside the cytoplasmic regions were still present within the samples. All undesired objects outside the previously detected cell clusters were removed as the final step of the nuclei segmentation stage. Most of the blood cells and nuclei outside the cell clusters were removed in this stage. An HSV filter eliminated most of the blood cells as a result of color and contrast differences. The final form of a sample region is given in Figure 3.
C. Feature extraction
Table 1 Extracted features from detected nuclei regions
- Major axis to minor axis ratio*
- Equivalent diameter to actual diameter ratio*
- Number of local minima**
- Max distance between local minima**
The shape-based features we utilized depended on the regularity of the nuclei perimeter. We extracted regularity information by evaluating both axes, as shown in Figure 4e. The major and minor axes are basically two lines through the center of an ellipse-shaped object. The difference between the lengths of these two lines is smaller in the single regions than in the overlapping regions in most of the sample images. Overlapping results in flattened regions, which may be a characteristic appearance. Thus, one of the shape-based features we used for nuclei discrimination was the highly distinctive ratio of these axes. We also extracted the eccentricity of the candidate region in our evaluation, to determine how close the shape of each object was to an ideal circle, as formulated in Table 1.
The final shape-based feature we extracted was the ratio of the object’s equivalent diameter to the actual diameter, which may significantly change if the boundary of the object is wavy and irregular. Most single nuclei tend to appear as circular smooth objects. High irregularity and/or a wavy regional boundary structure may indicate the presence of overlapping. Formulations of these shape-based features are given in Table 1.
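As an illustration of the three shape-based features, the sketch below uses common textbook definitions, which are assumptions here because the exact formulations of Table 1 are not reproduced in this text: eccentricity is taken as that of an ellipse with the given axes, the equivalent diameter is the diameter of a circle with the same area as the region, and the "actual diameter" is passed in by the caller (for example, the longest extent of the region). All function names are hypothetical.

```python
import math


def axis_ratio(major, minor):
    """Major-to-minor axis length ratio; 1.0 for a circle, larger when flattened."""
    return major / minor


def eccentricity(major, minor):
    """Eccentricity of an ellipse with the given axis lengths (0.0 for a circle)."""
    return math.sqrt(1.0 - (minor / major) ** 2)


def equivalent_diameter(area):
    """Diameter of a circle that has the same area as the region."""
    return math.sqrt(4.0 * area / math.pi)


def diameter_ratio(area, actual_diameter):
    """Equivalent diameter divided by the measured diameter of the region.

    Which measurement counts as the 'actual diameter' is an assumption
    left to the caller.
    """
    return equivalent_diameter(area) / actual_diameter
```

For a perfectly circular region all three values indicate regularity: the axis ratio and the diameter ratio are both 1 and the eccentricity is 0. Elongated or irregular regions move away from these values.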
Local minima points located inside regions most often indicate higher matter density. We may presume that more than one nucleus inside a candidate region will change the regular density distribution inside the region boundary. The existence of multiple nuclei inside a region causes fluctuations in the bottom points, which then increases the number of local minima [12, 13]. According to this information, increases in the number of local minima points are most likely inside overlapped regions. Increased numbers of local minima points are shown in shades of blue in Figure 5b.
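The two local minima-based features can be sketched as below. The text does not specify how local minima are detected, so this example assumes a simple rule: a pixel inside the region is a local minimum if it is strictly darker than all of its 8 neighbours. The names `local_minima` and `max_minima_distance` are hypothetical.

```python
import math
from itertools import combinations


def local_minima(gray, mask):
    """Return (row, col) of pixels inside `mask` that are strictly darker
    than every one of their 8 neighbours (assumed detection rule)."""
    h, w = len(gray), len(gray[0])
    minima = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            neighbours = [
                gray[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ]
            if all(gray[r][c] < n for n in neighbours):
                minima.append((r, c))
    return minima


def max_minima_distance(minima):
    """Largest Euclidean distance between any two minima (0.0 if fewer than two)."""
    if len(minima) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in combinations(minima, 2))
```

The feature values for a candidate region are then `len(local_minima(gray, mask))` and `max_minima_distance(...)`. With this strict rule, flat-bottomed minima (plateaus) are not counted, and on noisy images some smoothing beforehand would likely be needed.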
The fuzzy c-means (FCM) clustering method was first introduced by Dunn in 1973 and then improved by Bezdek et al. in 1981. FCM is simply the optimization of the basic c-means objective function using a fuzzy approach. In contrast to k-means clustering, every observation has a degree of association with all clusters, according to its distance from each of them, and observations do not belong to just one cluster. Every point has a set of coefficients, each of these coefficients represents a degree of association with one of the clusters, and the centroids of the clusters are the weighted means of the sets.
Clustering is determined by an iterative algorithm. Centers of the clusters and coefficients are updated upon each iteration until the change in coefficients is less than a given threshold. A block diagram of the algorithm we used is shown in Figure 6b. Ultimately, all observations are divided into two main clusters at the end of the iteration process.
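The iteration described above can be sketched as a small, dependency-free implementation of the standard FCM update rules. Several details are assumptions, since the text does not state them: the fuzzifier `m = 2`, the stopping threshold `tol`, the iteration cap, and the deterministic initialization that takes starting centers from the first and last input points. The function name `fuzzy_c_means` is hypothetical.

```python
import math


def fuzzy_c_means(points, c=2, m=2.0, tol=1e-5, max_iter=100):
    """Cluster `points` (sequences of floats) into `c` fuzzy clusters.

    Returns (centres, u), where u[i][k] is the membership coefficient of
    point i in cluster k and every row of u sums to 1.
    """
    n, dim = len(points), len(points[0])
    # Deterministic start: centres spread evenly over the input order (assumed).
    centres = [list(points[(k * (n - 1)) // max(c - 1, 1)]) for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        new_u = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centres]
            if any(x == 0.0 for x in d):
                # Point coincides with a centre: give it full membership there.
                zeros = [1.0 if x == 0.0 else 0.0 for x in d]
                row = [z / sum(zeros) for z in zeros]
            else:
                inv = [x ** (-2.0 / (m - 1.0)) for x in d]
                row = [v / sum(inv) for v in inv]
            new_u.append(row)
        # Largest change in any coefficient since the previous iteration.
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        # Each centre becomes the mean of all points weighted by u^m.
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            centres[k] = [
                sum(wi * p[j] for wi, p in zip(w, points)) / total
                for j in range(dim)
            ]
        if change < tol:
            break
    return centres, u
```

A hard two-class decision (overlapping or non-overlapping) can be obtained by assigning each region to the cluster with its largest coefficient, for example `labels = [row.index(max(row)) for row in u]`. Which cluster index corresponds to the overlapping class depends on the initialization and has to be determined from the resulting centroids.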
Table 2 Segmentation success of the method for 290 nuclei, where the Tanimoto coefficient is utilized as the success criterion
Table 3 Significance comparison of the proposed and test feature sets for 290 nuclei, where the independent features significance test is utilized for objective comparison (columns: proposed feature set, test feature set)
Table 4 Cluster centroids of different classes from independent data sets (Herlev data set and test data set, each with the number of nuclei)
Table 5 Classification performance comparison of k-means and fuzzy c-means (including time elapsed in ms)
Overlapping occurs in most Pap smear samples to different degrees. Overlapped and adjacent nuclei regions appear mostly as larger, irregular objects in the samples. That excessive growth in size occurs in malignant cases is a matter of a priori knowledge about nuclei in Pap smear samples. Therefore, a fully automated classification system for histological abnormalities should be able to differentiate and also separate overlapping/aggregating candidate objects. In this study, we proposed a prerequisite approach for a fully automated separation system, which involves a pre-classification system for advanced abnormality detection and interregional border extraction of nuclei. Most of the separation studies in the literature do not have any particular detection mechanism for locating and differentiating overlapping/aggregating nuclei. In this study, our goal was to propose an approach that could be used with previously introduced separation methods. Accordingly, we propose a new combination of features for clustering and for detecting overlapping regions even if there are abnormal nuclei inside the regions.
The gradient magnitude of the samples is initially processed with an edge filter to extract the borders of the nuclei. The actual walls are then filtered from the remaining pixel groups using morphology-based filtering. We evaluated the capability of the proposed basic automated segmentation method by determining the Tanimoto coefficient (also known as the Jaccard index), which is a frequently used measure for evaluating the similarity between two binary images [16, 18]. According to our Tanimoto similarity criteria results in Table 2, the examined methods are capable of segmenting most of the nuclei regions. There are many studies that prefer similar morphological operations for pre-segmenting or preprocessing cervical cell nuclei [11, 19], and the proposed differentiating mechanism may also be integrated with other automated segmentation methods such as the watershed, active contours, and machine learning-based segmentation approaches [4, 5, 10]. The success of the segmentation stage directly affects the overall classification ability of the proposed approach. The proposed combination of features was also evaluated with a semi-automated setup in which nuclei were segmented by an observer. Results of the semi-automated experiments, presented in Figure 10, showed that some of the undetected nuclei within the fully automated test samples are probably an effect of the non-adaptive nature of the preferred segmentation methods. It should be noted that our approach may be highly compatible with semi-automated systems or with better adaptive segmentation mechanisms. An adaptive segmentation approach, perhaps based on a non-linear decision mechanism, could be adopted in future work to increase the detection capability.
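For reference, the Tanimoto coefficient (Jaccard index) of two binary masks is the number of pixels set in both masks divided by the number of pixels set in at least one of them. The sketch below is a minimal illustration with a hypothetical function name; returning 1.0 when both masks are empty is a convention assumed here.

```python
def tanimoto(mask_a, mask_b):
    """Tanimoto (Jaccard) similarity of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += bool(a) and bool(b)   # pixel set in both masks
            union += bool(a) or bool(b)    # pixel set in at least one mask
    # Two empty masks are treated as identical (assumed convention).
    return inter / union if union else 1.0
```

A value of 1.0 means the automated contour and the ground-truth contour cover exactly the same pixels, and 0.0 means they do not overlap at all.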
We combine size and textural features in this study to achieve optimum results. We also evaluated the significance of the proposed feature combination by comparing it with an alternative feature set formed from features that are frequently used for classification of nuclei. Both feature sets were then compared using samples from the test data set of 290 nuclei. The results of this comparison, given in Table 3, showed that the proposed feature set achieves a higher level of significance for nuclei overlapping. In previous studies, similar feature sets were used for segmentation and separation of overlapped nuclei [5, 11, 12]. The experiments in previous studies also showed that conducting a clustering analysis on size-dependent features only may not be sufficient for recognition of overlapping. For this reason, many studies have combined textural and shape-based features [5, 11, 12]. However, the combination of features we introduce in this study is unique for use in the discrimination of overlapping. It should be noted that most studies did not have any particular mechanism for classifying overlapped regions before the segmentation process. Usually, morphological operations or alternative preprocessing stages were carried out prior to any further analyses [1, 3]. The methods introduced in this study should be seen as a supporting approach to potentially increase the separation capabilities of existing overlapping nuclei segmentation methods.
In the present work, we classified the features extracted from nuclei using clustering-based methods. Since there is no need for a training set or stage with data clustering approaches, this system may be promising for the varying conditions of different samples. We also examined and compared two well-known clustering approaches, one fuzzy (FCM) and one non-fuzzy (k-means) [14, 15]. According to Table 5, both of these methods are capable of discriminating overlapping. However, the fuzzy c-means is faster and has a higher F-score, so it is computationally more effective and a better choice for our work. In addition, some consideration should be given to the idea that an optimizing fuzzy clustering approach may increase the classification capability.
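The precision, recall, and F1 score used for this comparison follow their standard definitions, sketched below for binary labels where a true value marks a region as overlapping. The function name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, f1) for two equally long sequences of
    binary labels, where a truthy label means 'overlapping'."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly detected overlaps
    fp = sum(1 for p, a in pairs if p and not a)    # single nuclei flagged as overlaps
    fn = sum(1 for p, a in pairs if not p and a)    # missed overlaps
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Because overlapping regions are a minority class in the data sets described here, these measures are more informative than plain accuracy, which could be high even if every overlapping region were missed.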
The proposed features were also examined with samples from the Herlev data set, a well-known benchmark data set frequently used for performance testing. Samples from the HDS were pre-classified and segmented. These samples are preferred for determining the centroids of clusters, since the data include both malignant and normal samples. We expected that the developed system would cluster overlapped and non-overlapped nuclei even in data containing abnormal cells.
Table 4 shows that both of the textural features tend to increase due to the expanded area of nuclei in abnormal cases. However, it should be noted that most of the nuclei preserve their circular or ellipsoid structure in abnormal cases, which is also indicated in the table. In Table 4, all shape-based feature centroids of the abnormal class are closer to the centroids of normal single nuclei. Moreover, textural features tend to change more significantly in the overlapped class. In the presented data, the centroids of malignant and normal cell features are closer in value, which may indicate that abnormal and normal nuclei are most likely being classified in the same cluster.
Previous studies show that features extracted from both the cytoplasmic region and the nuclei are essential for detection of abnormality in an automated Pap smear screening system [2, 5, 7, 22]. We proposed methods for discrimination of overlapped nuclei, which are suggested as an elimination mechanism before feature extraction for abnormality detection. As a result of removing overlapped regions from the samples, the classification abilities of automated systems are expected to improve. It should be noted that eliminated overlapped regions can be separated in further stages in order to search for abnormality inside the region.
The developed and proposed methods in this study may be considered as a supporting approach for studies of the segmentation of interregional borders of nuclei where overlapping occurs. Our method does not depend on a certain quantity of nuclei inside the region. In fact, greater numbers of nuclei inside a region may be an advantage for classifying local minima-based features. In a practical sense, the main contribution of our method is as a pre-classification approach which includes specialized features for effective discrimination despite the varying overlapping conditions. We hope this study may serve as a new basis for further studies in automated Pap smear screening.
The authors would like to thank Assoc. Prof. Dr. Mutlu Avci, Prof. Dr. Aysun Uguz and Prof. Dr. Seyda Erdogan of Cukurova University for their valuable help.
- Fadasdeng LJ, Yu YJ, Min TZ, Kun YY, Dong W: Design of a separating algorithm for overlapping cell images. J Comput Res Dev 2000, 2: 228–232.
- Lakshmi GK, Krishnaveni K: Automated extraction of cytoplasm and nuclei from cervical cytology images by fuzzy thresholding and active contours. Int J Comput Appl 2013, 73: 26–30.
- Jung C, Kim C, Chae SW, Oh S: Unsupervised segmentation of overlapped nuclei using Bayesian classification. Biomed Eng IEEE Trans 2010, 57: 2825–2832.
- Sulaiman SN, Isa NAM, Yusoff IA, Othman NH: Overlapping cells separation method for cervical cell images. Intelligent Systems Design and Applications (ISDA) 2010 10th International Conference on. Nov 2010, 1218–1222. doi:10.1109/ISDA.2010.5687020
- Lu Z, Carneiro G, Bradley AP: Automated nucleus and cytoplasm segmentation of overlapping cervical cells. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2013. Volume 8149. Edited by: Mori K, Sakuma I, Sato Y, Barillot C, Navab N. Springer Berlin, Heidelberg; 2013:452–460. doi:10.1007/978-3-642-40811-3_57
- Li K, Lu Z, Liu W, Yin J: Cytoplasm and nucleus segmentation in cervical smear images using radiating GVF snake. Pattern Recognit 2012,45(4):1255–1264. doi:10.1016/j.patcog.2011.09.018
- Li K, Lu Z, Liu W, Yin J: Cytoplasm and nucleus segmentation in cervical smear images using radiating GVF snake. Pattern Recognit 2012,45(4):1255–1264. doi:10.1016/j.patcog.2011.09.018
- Malpica N, de Solórzano CO, Vaquero JJ, Santos A, Vallcorba I, García-Sagredo JM, del Pozo F: Applying watershed algorithms to the segmentation of clustered nuclei. Cytometry 1997,28(4):289–297.
- Chen C, Wang W, Ozolek JA, Rohde GK: A flexible and robust approach for segmenting cell nuclei from 2D microscopy images using supervised learning and template matching. Cytometry Part A 2013,83(5):495–507.
- Plissiti ME, Nikou C, Charchanti A: Automated detection of cell nuclei in Pap smear images using morphological reconstruction and clustering. Inf Technol Biomed IEEE Trans 2011,15(2):233–241.
- Plissiti ME, Nikou C, Charchanti A: Combining shape, texture and intensity features for cell nuclei extraction in Pap smear images. Pattern Recognit Lett 2011,32(6):838–853. doi:10.1016/j.patrec.2011.01.008
- Plissiti ME, Nikou C: Overlapping cell nuclei segmentation using a spatially adaptive active physical model. Image Process IEEE Trans 2012,21(11):4568–4580.
- Cengizler C: A fluid dynamics based image segmentation approach and pap-smear image data classification. PhD thesis. Cukurova University, Institute of Natural and Applied Sciences; 2013.
- MacQueen J: Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability Volume 1. California, USA: University of California Press; 1967:281–297.
- Bezdek JC, Ehrlich R, Full W: FCM: The fuzzy c-means clustering algorithm. Comput Geosci 1984,10(2):191–203.
- Lipkus AH: A proof of the triangle inequality for the Tanimoto distance. J Math Chem 1999,26(1–3):263–265.
- Weiss SM, Indurkhya N, Zhang T, Damerau F: Text mining: predictive methods for analyzing unstructured information. New York: Springer-Verlag New York Inc; 2010.
- Tan P-N, Steinbach M, Kumar V: Introduction to Data Mining. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc; 2005.
- Walker RF, Jackway P, Lovell B, Longstaff ID: Classification of cervical cell nuclei using morphological segmentation and textural feature extraction. In Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on. IEEE; 1994:297–301. doi:10.1109/ANZIIS.1994.396977
- Wang XY, Garibaldi JM: A comparison of fuzzy and non-fuzzy clustering techniques in cancer diagnosis. In Proceedings of the 2nd International Conference on Computational Intelligence in Medicine and Healthcare (CIMED2005). BIOPATTERN Network of Excellence; 2005:250–256.
- Byriel J: Neuro-fuzzy classification of cells in cervical smears. Master’s Thesis. Technical University of Denmark: Oersted-DTU, Automation; 1999.
- Cengizler C, Guven M, Avci M: A fluid dynamics-based deformable model for segmentation of cervical cell images. Signal Image Video Process 2014,8(1):21–32. Springer London. doi:10.1007/s11760-014-0719-3
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.