Feature extraction for the analysis of colon status from the endoscopic images

Background: Extracting features from colonoscopic images is essential for obtaining quantitative parameters that characterize the properties of the colon. These features are employed in the computer-assisted diagnosis of colonoscopic images to assist the physician in detecting the colon status.

Methods: Endoscopic images contain rich texture and color information. Novel schemes are developed to extract new texture features from the texture spectra in the chromatic and achromatic domains, and color features for a selected region of interest from each color component histogram of the colonoscopic images. These features are reduced in size using Principal Component Analysis (PCA) and are evaluated using a Backpropagation Neural Network (BPNN).

Results: Features extracted from endoscopic images were tested to classify the colon status as either normal or abnormal. The classification results obtained show the features' capability for classifying the colon's status. The average classification accuracy using a hybrid of the texture and color features with PCA (τ = 1%) is 97.72%. This is higher than the average classification accuracy using only texture (96.96%, τ = 1%) or only color (90.52%, τ = 1%) features.

Conclusion: Novel methods for extracting new texture- and color-based features from colonoscopic images to classify the colon status have been proposed. A new approach using PCA in conjunction with BPNN for evaluating the features has also been proposed. The preliminary test results support the feasibility of the proposed method.


Background
In the case of colorectal cancer, abnormal cell growth takes place in the large intestine resulting in the formation of tumors. The detection of any abnormal growth in the colon at an early stage will increase the patient's chance of survival. A few methods, such as sigmoidoscopy, barium x-ray, etc., are available for examination of the colon, but colonoscopy is considered to be the best procedure at present for the detection of abnormalities in the colon [1]. Despite the usefulness of colonoscopic methods, an expert endoscopist is needed to detect colorectal cancer. The endoscopist uses a colonoscope to detect the presence of abnormalities in the colon. The analysis of the endoscopic images is usually performed visually and qualitatively. Consequently, there are constraints such as time-consuming procedures, subjective diagnosis by the expert, interpretational variation, and non-suitability for comparative evaluation. A computer-assisted scheme will help considerably in the quantitative characterization of abnormalities and image analysis, thereby improving overall efficiency in managing the patient.
Computer-assisted diagnosis in colonoscopy consists of colonoscopic image acquisition, image processing, parametric feature extraction, and classification. A number of schemes have been proposed to develop methods for computer-assisted diagnosis for the detection of colonic cancer images. Some researchers use microscopic images [2-4] and others use endoscopic images [5][6][7][8].
Esgiar, et al. [2,4] and Todman, et al. [3] have used microscopic images to analyze and identify features of normal and cancerous colonic mucosa. A number of quantitative techniques for the analysis of images used in the diagnosis of colonic cancer have been investigated. Features based on texture analysis were derived using the co-occurrence matrix, viz., angular second moment, entropy, contrast, inverse difference moment, dissimilarity, and correlation [2]. Orientational coherence metrics have been derived from neurophysiological foundations and applied to the classification of colonic cancer images [3]. Fractal analysis has also been investigated for separating normal and cancerous images [4].
Krishnan, et al. [5][6][7][8] have used endoscopic images to define features of the normal and the abnormal colon. New approaches for the characterization of the colon based on a set of quantitative parameters, extracted by the fuzzy processing of colon images, have been used to assist the colonoscopist in assessing the status of patients. The parameters served as inputs to a rule-based decision strategy that determines whether the colon's lumen belongs to the abnormal or the normal category. The quantitative characteristics of the colon are: hue component, mean and standard deviation of RGB, perimeter, enclosed boundary area, form factor, and center of mass [5]. The analysis of the extracted quantitative parameters was performed using three different neural networks selected for classification of the colon: a two-layer perceptron trained with the delta rule, a multilayer perceptron with backpropagation learning, and a self-organizing network. A comparative study of the three methods found that the self-organizing network is more appropriate for the classification of colon status [6]. A method of detecting the possible presence of abnormalities during the endoscopy of the lower gastro-intestinal system using curvature measures has also been developed. In this method, image contours corresponding to haustra creases in the colon are extracted and the curvature of each contour is computed after nonparametric smoothing. Zero-crossings of the curvature along the contour are then detected. An abnormality is flagged when a contour segment between two zero-crossings has a curvature sign opposite to that of its two neighboring contour segments. The method can detect the possible presence of abnormalities such as polyps and tumors [7]. Fuzzy rule-based approaches to the labeling of colonoscopic images to assist the clinician have also been proposed. The color images are segmented using a scale-space filter. Several features are selected and fuzzified. The knowledge-based fuzzy rule-based system labels the segmented regions as background, lumen, and abnormalities (polyps, bleeding lesions) [8].
Endoscopic images contain rich texture and color information. This additional information can therefore provide better image analysis results than approaches that use intensity information alone. In this research work, the definition and extraction of quantitative parameters from endoscopic images based on texture and color information are proposed. The test results obtained so far with the proposed approach have been encouraging.

Methodology
Endoscopic imaging methods have been used to examine the condition of the colon. The development of an intelligent method for the identification of the colon status is being explored as a computer-aided tool for the early detection of colorectal cancer. It is important to extract the quantitative parameters representing the characteristic properties of the colon from an endoscopic image. These quantitative features are used for detecting the normal or abnormal condition of the colon from the endoscopic images. Endoscopic images contain texture and color information, and features from both are extracted from the endoscopic image to distinguish a normal colon from an abnormal colon. Krishnan, et al. have extracted features from the histogram of the image in the chromatic domain and from the shape of the lumen in the spatial domain, and fed them into a feed-forward neural network [6]. Four statistical measures derived from the co-occurrence matrix at four different angles, namely angular second moment, correlation, inverse difference moment, and entropy, have been extracted by Karkanis [9].
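For readers unfamiliar with the co-occurrence measures mentioned above, the following Python sketch computes the four named statistics for a single displacement. It is an illustration only and not the implementation used in [9]: the function name `cooccurrence_features`, the symmetric normalization of the matrix, and the use of the natural logarithm for entropy are assumptions made here.

```python
import numpy as np

def cooccurrence_features(image, levels, dr=0, dc=1):
    """Four co-occurrence measures for one pixel displacement (dr, dc).

    `image` is a 2-D array of integer gray levels in [0, levels).
    The image must contain at least one valid pixel pair for (dr, dc).
    """
    img = np.asarray(image, dtype=np.int64)
    rows, cols = img.shape
    counts = np.zeros((levels, levels), dtype=float)
    # Count how often gray level a is followed by gray level b at the offset.
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r, c], img[r2, c2]] += 1.0
    # Assumption: make the matrix symmetric, then normalize to probabilities.
    counts = counts + counts.T
    p = counts / counts.sum()

    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nonzero = p[p > 0]
    # Correlation is undefined for a constant image; 0.0 is returned then.
    if sd_i * sd_j > 0:
        correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        correlation = 0.0
    return {
        "angular_second_moment": (p ** 2).sum(),
        "correlation": correlation,
        "inverse_difference_moment": (p / (1.0 + (i - j) ** 2)).sum(),
        "entropy": -(nonzero * np.log(nonzero)).sum(),
    }
```

Under the usual convention, the four angles 0°, 45°, 90°, and 135° correspond to the displacements (0, 1), (-1, 1), (-1, 0), and (-1, -1), so calling the function once per displacement would give sixteen values per image.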

Texture-based Feature Extraction
Texture is one of the most important features used in image processing and pattern recognition. It can give information about the arrangement and spatial properties of fundamental image elements. Many methods have been proposed to extract texture features, e.g. the co-occurrence matrix [10] and the texture spectrum in the achromatic component of the image [11]. In this research, a new approach for obtaining quantitative parameters from the texture spectra is proposed in both the chromatic and achromatic domains of the image. These features are evaluated to demonstrate their usefulness and potential for classifying the colon status.
The definition of the texture spectrum employs the determination of the texture unit (TU) and texture unit number ($N_{TU}$) values. Texture units characterize the local texture information for a given pixel and its neighborhood, and the statistics of all the texture units over the whole image reveal the global texture aspects.
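Given a neighborhood of δ × δ pixels, denoted by a set of δ × δ elements $P = \{P_0, P_1, \ldots, P_{(\delta \times \delta)-1}\}$, where $P_0$ represents the chromatic or achromatic value of the central pixel and $P_i$ ($i = 1, 2, \ldots, (\delta \times \delta)-1$) is the chromatic or achromatic value of the neighboring pixel $i$, the corresponding texture unit is the set $TU = \{E_1, E_2, \ldots, E_{(\delta \times \delta)-1}\}$, where each element is determined by comparing the neighbor with the central pixel:

$$E_i = \begin{cases} 0 & \text{if } P_i < P_0 \\ 1 & \text{if } P_i = P_0 \\ 2 & \text{if } P_i > P_0 \end{cases}$$

The element $E_i$ occupies the same position as the $i$-th pixel.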
Each element of the TU has one of three possible values; therefore, for δ = 3, the combination of all eight elements results in $3^8 = 6561$ possible TUs in total. The texture unit number ($N_{TU}$) is the label of the texture unit and is defined using the following equation:

$$N_{TU} = \sum_{i=1}^{(\delta \times \delta)-1} E_i \cdot 3^{\,i-1}$$

The texture spectrum histogram (Hist(i)) is obtained as the frequency distribution of all the texture units, with the abscissa showing the $N_{TU}$ and the ordinate representing its occurrence frequency.
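The texture spectrum can be illustrated with a short Python sketch. It assumes δ = 3, the usual three-valued comparison of each neighbor with the central pixel (0 if smaller, 1 if equal, 2 if larger), and a base-3 label, which is consistent with the 6561 = 3^8 possible texture units stated above. The function name `texture_spectrum` and the clockwise ordering of the neighbors are assumptions made for illustration.

```python
import numpy as np

def texture_spectrum(channel):
    """Return the 6561-bin texture spectrum of a 2-D array (3 x 3 neighborhoods).

    `channel` holds the chromatic or achromatic values of one image component.
    Border pixels are skipped because they lack a full neighborhood.
    """
    img = np.asarray(channel, dtype=np.int64)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    # Assumed neighbor ordering: clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    n_tu = np.zeros_like(center)
    for i, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr: rows - 1 + dr, 1 + dc: cols - 1 + dc]
        # E_i is 0, 1, or 2 when the neighbor is below, equal to, or above
        # the central pixel.
        e_i = np.sign(neighbor - center) + 1
        # Base-3 label: the first neighbor gets weight 3**0, the last 3**7.
        n_tu += e_i * 3 ** i
    # Frequency of every texture unit number from 0 to 6560.
    return np.bincount(n_tu.ravel(), minlength=6561)
```

A different neighbor ordering permutes the labels but leaves the number of distinct texture units unchanged, so the ordering must simply be kept fixed across all images that are compared.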

Color-based Feature Extraction
Color colonoscopic images exhibit the same color features for the same colon status [1]. Malignant tumors are usually inflated and inflamed. The inflammation is usually reddish and more severe in color than the surrounding tissues. Benign tumors exhibit less intense hues. Redness may indicate bleeding, and black may be attributed to deposits due to laxatives. Green may indicate the presence of faecal material that was not cleared during the pre-operative preparation, and yellow relates to pus formation. Based on these properties, some features are extracted from the chromatic and achromatic histograms of the image.

Principal component analysis is a feature reduction technique. It is an orthogonal decomposition that projects data onto the eigenvectors of the covariance matrix of the data. By sorting the eigenvectors by eigenvalue, the projected dimensions can be ranked by variance (which is proportional to the eigenvalue). The eigenvectors with the highest eigenvalues span the dimensions of highest variance, which are taken to carry most of the class separability. By eliminating the components with very small eigenvalues, the dimensionality of the projected space can be reduced without losing much information.
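The reduction step can be sketched in Python as follows. The sketch assumes that a component is kept when its share of the total variance is at least a threshold `tau`, matching the meaning of τ given in the Results section; the function name `pca_reduce` and its return values are hypothetical and chosen for illustration.

```python
import numpy as np

def pca_reduce(features, tau=0.01):
    """Project feature vectors onto the principal components whose
    contribution to the total variance is at least `tau`.

    `features` has one row per image and one column per feature
    (at least two columns are assumed).
    """
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # Covariance matrix of the features (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by decreasing eigenvalue (variance).
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()
    keep = contribution >= tau
    return centered @ eigvecs[:, keep], contribution
```

With `tau=0.05` the call corresponds to τ = 5%, and a very small threshold keeps every component, so the output has the same dimension as the input.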

Results and Discussion
The proposed approaches were evaluated using 66 clinically obtained colonoscopic images (54 abnormal images and 12 normal images). Figure 1 shows a selection of these images.

Figure 1
The selected colonoscopic images of normal (N01, N02, N03, N06, N07) and abnormal (A02, A03, A04, A05, A06) cases.

Figure 2
The graphs of training time comparison for various training algorithms with different numbers of hidden neurons.

Table 6 shows the number of features after applying PCA with different threshold values, τ, and without applying PCA. For example, τ = 5% means that those principal components whose contribution to the total variance in the feature set is less than 5% were eliminated. Figure 3 shows the graphs of the Marquardt training time comparison using PCA with different τ (5%, 2%, 1%, 0.1%) and various numbers of hidden neurons. Figure 4 shows the graphs of the Marquardt training (using PCA features, τ = 1%) time comparison for various numbers of hidden neurons. It shows that by using PCA, the overall training time is decreased, thereby increasing the performance of BPNN. The success of the classification of colon status is measured by the accuracy, which is shown in Table 7. The {H, S, I} data set performs better on average than the other data sets {C}, both with PCA (96.96%, τ = 1%) and without PCA (93.02%). The classification rate increased by applying PCA, and τ = 1% gave better accuracy rates than other values of τ.
Figure 4
The graphs of Marquardt training time comparison on the usage of PCA with various numbers of hidden neurons (τ = 1%)

Figure 5
Histograms of red component of Selected Colonoscopic Images

Figure 6
Histograms of green component of Selected Colonoscopic Images

Figure 7
Histograms of blue component of Selected Colonoscopic Images

Figure 8
Histograms of hue component of Selected Colonoscopic Images

Figure 9
Histograms of saturation component of Selected Colonoscopic Images

Figure 10
Histograms of intensity component of Selected Colonoscopic Images

Color features extracted from the histograms of the endoscopic images were also defined. Lower and upper threshold values of the region of interest (ROI) were selected for the histograms of each image. Where possible, patterns that could distinguish between normal and abnormal colon conditions were identified in the R, G, B, H, S, and I histograms. The histograms are plotted in Figures 5, 6, 7, 8, 9, and 10. The selection of the ROI is rather subjective. Based on the histograms of the images, the ROI for each type of histogram is selected as an input for the NN classification. The ROI is chosen such that patterns are seen in the ROI for both normal and abnormal cases. Figure 5 shows the red-component histograms for both normal and abnormal endoscopic images. For both normal and abnormal conditions, the redness is concentrated around the 200 to 255 levels (where 0 is the lowest intensity value and 255 is the greatest intensity value). Figure 6 shows the green-component histograms for both normal and abnormal endoscopic images. The histogram has a bell-shaped pattern for the normal condition, and very little pattern is observed for the abnormal condition. The green parameter can be extracted in the ROI from 100 to 255 of the green level. In the blue-component histograms in Figure 7, the abnormal condition exhibits certain patterns that are not observed in the normal condition. This does not pose a problem for the feature extraction module, as the main objective behind the parameter extraction was to differentiate the normal condition from the abnormal condition. As long as some pattern is observed in one of the conditions, the parameter extracted from this color space can be considered useful in classification. The selected ROI in the blue level histogram is from 0 to 50.
For the hue component shown in Figure 8, very similar patterns are observed for the normal and abnormal conditions. Its ROI is set to the range of 10 to 50 for both normal and abnormal cases. In the saturation domain shown in Figure 9, the shapes of the histograms for the normal and abnormal cases are also very similar; in general, however, the tail for the normal case has a higher value. The ROI for the extraction of the saturation parameter is therefore set to the range of 100 to 150. In the intensity histogram shown in Figure 10, the pattern observed is very similar to that of the green histograms. In this case, the range is 150 to 255 of the intensity level. Although the highest value of 255 may be included, the ROI begins at 150, so the extracted intensity parameter is not expected to show the effect of the light source. It may include some reflection, but the contribution will be minimal.

Table 8 shows the classification results for detecting the colon status using color features (β_C). A BPNN is used to perform the classification. The procedure used for classification with texture features was also applied to classification with color features. The training was done with the Marquardt algorithm. The average accuracy using PCA with τ = 0.1% was found to be higher than with PCA at τ = 5%, 2%, or 1%, or without PCA. At τ = 0.1%, PCA does not change the dimension of the features: the number of features is the same as without PCA. The fewer the features used, the worse the average classification rate becomes. This shows that all of the extracted color features (β_C) are significant.
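The ROI ranges quoted in the text can be collected into a small Python sketch of a color feature extractor. Several points are assumptions made for illustration: the text does not state exactly which quantities are taken from each ROI, so the sketch simply uses the normalized histogram counts inside the ROI; the red ROI is taken to be the 200 to 255 range where redness is reported to be concentrated; all six components are assumed to be scaled to integer levels 0 to 255; and the names `ROI` and `roi_histogram_features` are hypothetical.

```python
import numpy as np

# Inclusive ROI bounds per component, as quoted in the text.
# Assumption: the red ROI is the 200-255 range where redness concentrates;
# the text does not state the red ROI explicitly.
ROI = {
    "R": (200, 255),
    "G": (100, 255),
    "B": (0, 50),
    "H": (10, 50),
    "S": (100, 150),
    "I": (150, 255),
}

def roi_histogram_features(channels, roi=ROI):
    """Concatenate the normalized histogram values inside each ROI.

    `channels` maps a component name ("R", "G", ...) to a 2-D array of
    integer levels in 0..255. Conversion from RGB to H, S, I is left to
    the caller.
    """
    parts = []
    for name, (low, high) in roi.items():
        values = np.asarray(channels[name]).astype(np.int64).ravel()
        # Relative frequency of every level from 0 to 255.
        hist = np.bincount(values, minlength=256)[:256] / values.size
        parts.append(hist[low:high + 1])
    return np.concatenate(parts)
```

With these bounds the feature vector has 461 entries (56 + 156 + 51 + 41 + 51 + 106), which would then be passed to the PCA reduction and the classifier.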
The images are rich in texture and color information, so it is important to use both the extracted texture and color features. Experiments were carried out that combine the texture and color features into a hybrid feature set for colon status classification. Table 10 shows that the average classification accuracy with a hybrid of texture and color features using PCA with τ = 1% is higher than with PCA at τ = 5%, 2%, or 0.1%, or without PCA.
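A minimal Python sketch of the hybrid classification step is given below. It is not the authors' implementation: the hybrid is assumed to be a plain concatenation of standardized texture and color feature vectors, the network has a single hidden layer with sigmoid units and a cross-entropy loss, and training uses ordinary batch gradient descent in place of the Marquardt algorithm used in the experiments. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clipping avoids overflow warnings for large magnitudes.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def standardize(features):
    """Zero-mean, unit-variance scaling per feature (an assumed step)."""
    x = np.asarray(features, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - x.mean(axis=0)) / std

def hybrid_features(texture, color):
    """Assumed hybrid: concatenate texture and color features per image."""
    return np.hstack([standardize(texture), standardize(color)])

def train_bpnn(x, labels, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with backpropagation.

    Plain gradient descent is used here instead of Marquardt training.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    target = np.asarray(labels, dtype=float).reshape(-1, 1)
    for _ in range(epochs):
        # Forward pass.
        h = sigmoid(x @ w1 + b1)
        out = sigmoid(h @ w2 + b2)
        # Backward pass: gradient of the mean cross-entropy loss.
        d_out = (out - target) / len(x)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return w1, b1, w2, b2

def predict(params, x):
    """Return 1 (abnormal) or 0 (normal) for each row of x."""
    w1, b1, w2, b2 = params
    out = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
    return (out.ravel() >= 0.5).astype(int)
```

The classification accuracy is then the fraction of images whose predicted label matches the true label, for example `(predict(params, x) == labels).mean()`.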

Conclusion
In conclusion, new approaches for extracting new texture- and color-based features from colonoscopic images for the analysis of the colon status have been developed. Texture and color are both important, and some of the extracted features are able to distinguish the normal from the abnormal colon status. However, classification using only texture or only color features is less complete than it could be, since an endoscopic image contains both texture and color information. Therefore, a hybrid of texture and color features is proposed as a better approach to classifying the colon status.