Breast masses in mammography classification with local contour features

Background: Mammography is one of the most widely used tools for early detection of breast cancer. The contour of a breast mass in a mammogram carries important information for distinguishing benign from malignant masses: a benign mass has a smooth, round or oval contour, whereas a malignant mass has an irregular shape and a spiculated contour. Several studies have shown that a 1D signature derived from the 2D contour describes the contour features well. Methods: In this paper, we propose a new method for translating the 2D contour of a breast mass in mammography into a 1D signature. The method describes not only the contour features but also the regularity of the breast mass. We then segment the whole 1D signature into subsections and extract four local features from them, including a new contour descriptor, the root mean square (RMS) slope, which describes the roughness of the contour. KNN, SVM and ANN classifiers are used to classify benign and malignant breast masses. Results: The proposed method is tested on a set of 323 contours, including 143 benign masses and 180 malignant ones, from the digital database for screening mammography (DDSM). The best classification accuracy is 99.66%, obtained using the RMS slope feature with the SVM classifier. Conclusion: The proposed method performs better than the traditional method. In addition, RMS slope is an effective feature comparable to most of the existing features. Electronic supplementary material: The online version of this article (doi:10.1186/s12938-017-0332-0) contains supplementary material, which is available to authorized users.

suggestions to decrease the false detection rate and false negative rate [6]. In screening mammography, if a doctor sees a clearly defined mass whose contour is microlobulated or spiculated, he need not ask the patient to undergo a pathological puncture, because he is quite sure that the mass is malignant (Fig. 1b). If the contour of a breast mass is regular and the shape is nearly round, the mass is probably benign (Fig. 1a). Computer-aided diagnosis (CAD) can distinguish these two classes of breast mass and can thereby spare patients the pain of a pathological puncture.
Researchers have proposed many methods to describe shape and texture in CAD systems. Shape descriptors include compactness, eccentricity, moments, Fourier descriptors and statistical marginal characteristics [7-11]. Texture descriptors include the gray level co-occurrence matrix, the fractal dimension and so on [5, 12-14]. Pohlman et al. [15] proposed a method to transform the 2D contour of a breast mass into a 1D signature. The signature of a contour is obtained as a function of the radial distance from the centroid to the contour versus the angle of the radial line over the range 0°-360°. In this way, a signature with small fluctuations is obtained if the contour belongs to a benign mass, and a signature with large fluctuations is obtained if it belongs to a malignant mass. Fractal characteristics can describe this fluctuation, so in [16] the breast mass is classified by fractal analysis, with a classification accuracy greater than 80%. However, the function of radial distance versus angle can become multi-valued for an irregular or spiculated margin [17]; a signature computed in this manner also has ranges of undefined values when the centroid falls outside the region enclosed by the contour. Rangaraj et al. [16] improved the method by transforming the 2D contour of a breast mass into a 1D signature through polygonal modeling of the contour using the turning angle function. Rangayyan and Nguyen [2] demonstrated the usefulness of fractal analysis for the classification of breast masses, using the box-counting and ruler methods to derive the fractal dimension (FD) of the 2D contours of masses as well as of their 1D signatures. Several studies [2, 9, 18-20] revealed that the degree of regularity is also very important for distinguishing benign from malignant breast masses.
If the shape of a mass is circular or oval, it is more likely to be benign than malignant. We therefore propose a new method in this paper to express the regularity of the contour of a breast mass. First, the abnormal area in the mammography image is labelled by experienced doctors. Second, we translate the 2D contour into a 1D signature using the Euclidean distance from the edge of the breast mass to the periphery of a circle or ellipse centered at the centroid. This method describes not only the roughness of the contour but also its degree of regularity. Third, we segment the whole 1D signature into subsections. Fourth, we extract several local contour features. Finally, the feature vectors, re-organized according to the local feature value of each subsection, are fed to different classifiers. The flowchart of the proposed method is shown in Fig. 2.
The remainder of this paper is organized as follows. The new method for translating the 2D contour into a 1D signature is proposed in "Methods". In "Features", we extract four features describing the contour characteristics: the fractal dimension FD, the root mean square roughness $w$, the ratio $\mu_R/\sigma_R$ (where $\mu_R$ is the mean radial distance of the tumor boundary and $\sigma_R$ is its standard deviation), and the root mean square slope. Experimental results and analysis are presented in the following section. The last section summarizes our work and outlines future work.

Methods
In this part, the database is introduced first. Second, the method for converting the 2D contour into a 1D signature is described in some detail. Finally, we explain how to segment the 1D signature into subsections and how to re-organize these subsections.

Database
In this paper, the digital database for screening mammography (DDSM) is used to provide the mammography images. This database is provided by the Massachusetts General Hospital, the University of South Florida, and Sandia National Laboratories [21,22]. It includes about 2620 cases. Each case has 4 mammography images, composed of two views of each breast, along with some associated patient information. Images containing suspicious areas have associated pixel-level ground truth information about the locations and types of suspicious regions. This information is saved as an overlay file. Each overlay file may specify multiple abnormalities. Each abnormality has information on the lesion type, the assessment, the subtlety, the pathology and at least one outline. Each boundary is specified as a chain code. Details about the DDSM database can be found in [23] or under "Availability of data and materials" at the end of this article. The database includes normal, benign and cancer volumes. The research object in this article is the contour of benign and malignant masses, so we choose 323 contours from mammography images in the DDSM database, including 143 contours from benign images and 180 contours from malignant images. For simplicity and convenience of experiment, we choose mammography images that contain a single abnormality. The numbers of the images we used are listed in Additional file 1: Appendix S1. Most of the 143 benign contours are approximately elliptical; such benign masses are prone to misclassification by existing methods. All images come from different patients.

2D contour to 1D signature
The benign mass has a smooth shape that results in a simple signature, whereas the malignant tumor has a jagged contour that leads to a rough signature. The contour of every abnormality is extracted by connecting the points given by the chain code in the overlay files (see Fig. 3a). For a contour with points $p_i(x_i, y_i)$, $i = 1, \ldots, N$, the center $p_c(x_0, y_0)$ of the 2D contour is expressed as
$$x_0 = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad y_0 = \frac{1}{N}\sum_{i=1}^{N} y_i.$$
The first point we choose on the contour lies to the right of the center point: it is the intersection of the horizontal line through the center point with the contour of the breast mass. The radius is the distance between a point $p_i$ on the contour and the center $p_c(x_0, y_0)$. The diameter along the $x$ axis is $D_x = \max_{i,j \in \{1,\ldots,N\}} |x_i - x_j|$, and the diameter along the $y$ axis is $D_y = \max_{i,j \in \{1,\ldots,N\}} |y_i - y_j|$. The equation of the ellipse centered at $p_c(x_0, y_0)$ with these diameters is therefore
$$\frac{(x - x_0)^2}{(D_x/2)^2} + \frac{(y - y_0)^2}{(D_y/2)^2} = 1.$$
If $D_x = D_y$, the ellipse becomes a circle centered at $p_c(x_0, y_0)$ with diameter $D_x$ ($D_y$). This ellipse or circle is the standard for the breast mass contour. If the points on the contour of the breast mass all lie near the ellipse, we can declare the contour regular, and the probability that the mass is benign is high. Otherwise, the mass is judged malignant. We define $h_p(i)$ as the distance function measured from $p_c(x_0, y_0)$; $h$ is also a function of the pixel index $i$ on the contour. Figure 4a and b show the 1D signatures of the benign and malignant breast masses in Fig. 3a and b.
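The contour-to-signature step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the signature value h(i) is the absolute radial gap between contour point i and the reference ellipse, measured along the ray from the centroid; the function name `contour_signature` is hypothetical, and the re-ordering of the contour to start from the point to the right of the center is omitted.

```python
import math


def contour_signature(points):
    """Map a closed contour [(x, y), ...] to a 1D signature h(i).

    Assumption (not stated explicitly in the text): h(i) is the absolute
    radial gap between contour point i and the reference ellipse, measured
    along the ray from the centroid through that point.
    """
    if not points:
        raise ValueError("contour must contain at least one point")
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    # Centroid of the contour points.
    x0 = sum(xs) / n
    y0 = sum(ys) / n

    # Semi-axes of the reference ellipse: half of D_x and half of D_y.
    a = (max(xs) - min(xs)) / 2.0
    b = (max(ys) - min(ys)) / 2.0
    if a == 0.0 or b == 0.0:
        raise ValueError("degenerate contour: zero extent along one axis")

    signature = []
    for x, y in points:
        dx, dy = x - x0, y - y0
        r_contour = math.hypot(dx, dy)
        theta = math.atan2(dy, dx)
        # Polar form of the ellipse radius at angle theta.
        r_ellipse = (a * b) / math.hypot(b * math.cos(theta), a * math.sin(theta))
        signature.append(abs(r_contour - r_ellipse))
    return signature
```

Under this assumption, a perfectly circular or elliptical contour yields a signature that is close to zero everywhere, while a lobed or spiculated contour yields large excursions.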

Subsection and integration
Transforming the 2D contour into a 1D signature describes the features of the whole contour. Sometimes, however, local features are also very important for classifying benign and malignant breast masses. In Fig. 1b, for example, the left two-thirds of the contour is smooth and regular, but the section on the right is microlobulated. Extracting features from the whole contour is therefore not precise. So we propose a method that a whole signa-

Features
In this part, four features are introduced. Among them, the RMS slope $s$ is first proposed by us. It describes the variation of the 1D signature in the vertical direction well.

Root mean square roughness w
Root mean square roughness describes the degree of irregularity of the 1D signature. It is defined as $w = \sqrt{\langle h^2 \rangle - \langle h \rangle^2}$, where $\langle \cdot \rangle$ denotes the statistical average; $w$ expresses the degree of fluctuation of $h$ in the vertical direction. The smaller the value of $w$, the more regular the shape, that is, the closer the margin is to a circle or an ellipse, and the more likely the mass is benign rather than malignant. Root mean square roughness can therefore be used as a feature for classifying benign and malignant breast masses.
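As a small illustration of the roughness feature, the sketch below computes w from a signature given as a list of numbers. It assumes the usual reading of the definition, w = sqrt(<h^2> - <h>^2), with the statistical average taken as the plain mean over all signature points; the function name `rms_roughness` is hypothetical.

```python
import math


def rms_roughness(h):
    """Root mean square roughness w = sqrt(<h^2> - <h>^2) of a signature."""
    if not h:
        raise ValueError("signature must not be empty")
    n = len(h)
    mean_h = sum(h) / n
    mean_h2 = sum(v * v for v in h) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_h2 - mean_h * mean_h, 0.0))
```

For example, a flat signature such as `[1, 1, 1, 1]` gives w = 0, while the alternating signature `[0, 2, 0, 2]` gives w = 1.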

The µR/σR ratio
The $\mu_R/\sigma_R$ ratio (where $\mu_R$ is the mean radial distance of the tumor boundary and $\sigma_R$ is its standard deviation) describes the circularity of the breast mass contour. Malignant masses should have smaller values of circularity than benign masses. Haralick [24] showed that the $\mu_R/\sigma_R$ ratio is a good feature for classifying malignant and benign masses. Pohlman et al. [15] applied this feature to their 1D signature and obtained good results.
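The circularity ratio can be sketched in a few lines of Python. This is an illustrative reading only: it assumes radial distances are measured from the centroid of the contour points and that the population standard deviation is used; the function name `radial_ratio` is hypothetical.

```python
import math


def radial_ratio(points):
    """Circularity mu_R / sigma_R of a contour given as [(x, y), ...]."""
    if not points:
        raise ValueError("contour must contain at least one point")
    n = len(points)
    x0 = sum(x for x, _ in points) / n
    y0 = sum(y for _, y in points) / n

    # Radial distance of every boundary point from the centroid.
    radii = [math.hypot(x - x0, y - y0) for x, y in points]
    mu = sum(radii) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in radii) / n)

    # A perfect circle has sigma = 0; report that as infinite circularity.
    return math.inf if sigma == 0.0 else mu / sigma
```

A nearly circular contour produces a very large ratio, and the ratio drops as the boundary becomes more lobed, which matches the expectation that malignant masses have smaller circularity values.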

Fractal dimension
According to Mandelbrot's fractal geometry, the fractal dimension describes the property of self-similarity. Many fractal models have been proposed to analyze fractal phenomena in nature. A popular one is the differential box-counting method, which studies have shown to be appropriate for self-similar fractal models. For medical images, the fractional Brownian motion (fBm) model has been shown to be suitable. The fBm model belongs to the class of statistically self-affine fractals and regards naturally occurring rough surfaces as the end result of random walks. Since the roughness of the intensity surface of a medical image can also be viewed as the end result of a random walk, the fBm model is suitable for the analysis of medical images. For the self-affine fractal random rough model, the autocorrelation function and the height-height correlation function can be expressed as in [23], where $\alpha$ is the fractal exponent, related to the fractal dimension $D$ by $\alpha = d - D$, with $d$ the space dimension and $0 \le \alpha \le 1$; $w$ is the root mean square roughness, expressing the degree of fluctuation of $h$ in the vertical direction; and $\xi$ is the correlation length, expressing the degree of fluctuation in the horizontal direction. The autocorrelation function $R$ of $h(i)$ is defined over pairs of points $i_1$ and $i_2$ on the signature, where $\rho = |i_2 - i_1|$ is the interval between the two points. If the signal is a smooth, stationary random process, $R(i + \rho)$ is independent of $i$ and depends only on $\rho$, i.e. $R(i + \rho) = R(\rho)$. As the correlation interval $\rho$ increases, $R(\rho)$ decreases gradually and tends to zero. The rate of decrease is determined by the distance beyond which two points are no longer correlated.
The correlation length is defined as the value of the correlation interval at which the autocorrelation function $R(\rho)$ has decreased to $e^{-1}$ of its maximum. The correlation length $\xi$ expresses how quickly $R(\rho)$ decreases with $\rho$. If the interval between two points is less than $\xi$, the two points are correlated; otherwise, they are independent. The fluctuation in the horizontal direction is expressed by $\xi$ and the fluctuation in the vertical direction by $w$.
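The correlation length described above can be estimated numerically. The sketch below is one possible reading, not the authors' procedure: it assumes the autocorrelation is computed on the mean-subtracted signature, averaged over all available pairs at each lag, and it takes the correlation length as the first integer lag at which R falls to 1/e of R(0). The function names and the `max_lag` parameter are hypothetical.

```python
import math


def autocorrelation(h, max_lag):
    """Return [R(0), R(1), ..., R(max_lag)] for the mean-subtracted signature."""
    n = len(h)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(h)")
    mean_h = sum(h) / n
    d = [v - mean_h for v in h]
    return [
        sum(d[i] * d[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]


def correlation_length(h, max_lag):
    """First lag at which R(lag) <= R(0) / e, or None if never reached."""
    r = autocorrelation(h, max_lag)
    threshold = r[0] / math.e
    for lag, value in enumerate(r):
        if lag > 0 and value <= threshold:
            return lag
    return None
```

With this convention R(0) equals the squared RMS roughness, so the same routine gives both the vertical scale w and the horizontal scale of the signature.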
Under the condition $\rho \ll \xi$, the self-affine fractal surface $h$ satisfies the following self-affine transform:
$$h(x_0, y_0) = \varepsilon^{2\alpha}\, h(\varepsilon x_0, \varepsilon y_0) \qquad (4)$$
If the scale is reduced to $1/\varepsilon$, the average variation of the height difference is $\varepsilon^{2\alpha}$. This variation corresponds to the power-law variation of the height-height correlation function $H(\rho)$ over short distances. The relationship is
$$H(\rho) \propto \rho^{2\alpha}, \qquad \rho \ll \xi \qquad (5)$$
The power-law variation of the height-height correlation function describes the statistical self-similarity and the local fluctuation. The smaller $\alpha$ is, the more violent the local fluctuation and the larger the fractal dimension. From Eq. (5), we can conclude that in a log-log coordinate system $\log H(\rho)$ varies linearly with $\log \rho$ at small $\rho$. The value of $2\alpha$ can be estimated from the slope of the line obtained by linear least squares fitting of $\log H(\rho)$ versus $\log \rho$ over a range of small scales $\rho$. Figure 5 shows the curve of $\log H(\rho)$ versus $\log \rho$ and the linear fit for benign and malignant masses. In this paper, we regard the 1D signature of the contour as the height distribution of a self-affine fractal random surface. The fractal dimension indicates the self-similarity of the signature and also expresses its local non-smooth fluctuation: the larger the fractal dimension $D$, the more drastic the local fluctuation of the signature. Here we use the fractal exponent $\alpha$ of the 1D signature of the contour as the third feature to distinguish benign masses from malignant ones.
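The log-log fitting procedure for the fractal exponent can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: it assumes the height-height correlation function is H(rho) = <[h(i + rho) - h(i)]^2>, which is the standard definition, and it fits over lags 1 to `max_lag`, a hypothetical parameter whose value the text does not specify.

```python
import math


def height_height_correlation(h, lag):
    """H(lag) = mean of [h(i + lag) - h(i)]^2 over all available pairs."""
    n = len(h)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(h)")
    return sum((h[i + lag] - h[i]) ** 2 for i in range(n - lag)) / (n - lag)


def fractal_exponent(h, max_lag=8):
    """Estimate alpha from the slope 2*alpha of log H(rho) versus log rho."""
    xs, ys = [], []
    for lag in range(1, max_lag + 1):
        value = height_height_correlation(h, lag)
        if value > 0.0:  # log is undefined for a perfectly flat signature
            xs.append(math.log(lag))
            ys.append(math.log(value))
    if len(xs) < 2:
        raise ValueError("need at least two lags with H(rho) > 0")

    # Ordinary least squares slope of ys against xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy / sxx) / 2.0
```

As a sanity check, a straight ramp has H(rho) = rho^2 and therefore alpha = 1 (a smooth signature), whereas a random walk gives alpha close to 0.5.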

RMS slope s
Each point on the contour has a different slope, and the variation of the slope describes the shape of the contour. If the contour is smooth, the slope varies slowly and regularly; otherwise, it varies drastically. When we transform the 2D contour into a 1D signature, the value on the Y-axis expresses the circularity, and the absolute value of the slope shows how fast the contour varies. We therefore take the slope distribution of the points on the contour as one of the features for discriminating malignant masses from benign masses. The slope is obtained over a linear interval. The root mean square slope is defined in Eq. (6). We can see from Fig. 4 and Eq. (6) that the slope of a benign mass has small values and fluctuates gently, whereas the slope of a malignant mass has large values and fluctuates violently. The range of variation of the RMS slope $s$ is wider for malignant masses than for benign masses.
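A minimal sketch of the RMS slope feature is given below. It is an assumption-laden illustration, not the paper's Eq. (6): it assumes the local slope is a finite difference of the signature over a fixed interval and that the RMS slope is the square root of the mean squared slope. The function name and the `step` parameter are hypothetical.

```python
import math


def rms_slope(h, step=1):
    """Root mean square of finite-difference slopes (h[i+step] - h[i]) / step."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    if len(h) <= step:
        raise ValueError("signature must be longer than step")
    slopes = [(h[i + step] - h[i]) / step for i in range(len(h) - step)]
    return math.sqrt(sum(s * s for s in slopes) / len(slopes))
```

A flat signature gives a slope of zero, a gently varying signature gives a small value, and a jagged signature such as `[0, 2, 0, 2, 0]` gives a large one (2.0 in this case).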

Classification
K-nearest-neighbor (KNN), support vector machine (SVM) and artificial neural network (ANN) classifiers are used in this paper to differentiate benign from malignant breast masses. We choose K = 1 for the KNN classifier and use a linear support vector machine classifier. The ANN classifier is configured with 10 nodes in the hidden layer, and its internal weights are initialized with randomly chosen values. The 323 contours are divided into two subsets: 300 contours for training and 23 for testing. The software we use is Matlab R2015b on a Windows 10 operating system.
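To make the K = 1 nearest-neighbor setting concrete, here is a small pure-Python sketch of a 1-NN classifier evaluated on a held-out set. It is a stand-in for illustration only, not the Matlab implementation used in the paper, and it assumes Euclidean distance between feature vectors. The function names are hypothetical, and any feature values passed to it in an example are made up.

```python
import math


def predict_1nn(train_features, train_labels, query):
    """Return the label of the training vector closest to query (K = 1)."""
    if not train_features:
        raise ValueError("training set must not be empty")
    best = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    return train_labels[best]


def holdout_accuracy(train_features, train_labels, test_features, test_labels):
    """Fraction of held-out samples whose 1-NN prediction matches the label."""
    if not test_labels:
        raise ValueError("test set must not be empty")
    correct = sum(
        predict_1nn(train_features, train_labels, q) == y
        for q, y in zip(test_features, test_labels)
    )
    return correct / len(test_labels)
```

With a 300/23 split as described above, `train_features` would hold the 300 training feature vectors and `test_features` the 23 held-out ones.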

Experimental results and analysis
In this part, the performance of the proposed method is reported first. Second, the performance of the four features is compared. Third, the effect of subsections is analyzed. Finally, classifier performance is shown.

Performance evaluation for 2D contour to 1D signature
Table 1 shows the comparison between our proposed method and the existing method. The accuracy obtained with our method is higher than that obtained with the existing method. The most obvious improvement is in the accuracy of $\alpha$, which rises by 14.90%, whereas the accuracy of the RMS slope $s$ barely changes, because it is already close to 100% and has little room to rise. For the approximately elliptical breast masses in the selected database, our proposed method describes not only the circularity of the contour but also the degree of margin fluctuation, whereas the traditional method uses only the standard deviation of the median-filtered and original boundaries to quantify the degree of margin fluctuation. From Fig. 6 we can see that accuracy, sensitivity and specificity are all improved with our method. In particular, the specificity of $w$ with SVM rises by 10.90%. Figure 7 and Table 1 show the performance of the four features with three different classifiers. Whichever classifier is used, the results show that our proposed feature is better than the existing ones. For the features $w$ and $s$, the SVM classifier performs almost the same as ANN and better than KNN. For the other features, SVM is the best of the three classifiers; SVM is robust for small sample data. The accuracy of the fractal feature $\alpha$ is 99.33%, which is better than that of $w$ and $\mu_R/\sigma_R$, because the 1D signature of the breast mass contour accords with fractal characteristics. The highest accuracy is 99.66%, obtained using the root mean square slope feature with the SVM classifier. The reason is that the RMS slope describes the variation of the 1D signature in the vertical direction, which is very important for distinguishing benign masses from malignant ones. Figure 8 shows the performance of the four features computed on subsections using the $h(i)$ proposed in this paper. Performance is improved because our method considers local features.
Experiments show that using subsections effectively improves the performance of all four features. Because the slope feature already performs well, its improvement is not obvious. For the fractal dimension feature, accuracy increases quickly with the number of subsections N at first and then stabilizes for larger N. This is because when N is larger, each segment is shorter and contains fewer contour points, and having fewer points per subsection affects the accuracy. Among the three classifiers, SVM achieves the best performance using the RMS slope feature. With the ANN classifier, the performance across subsections is stable for all four features.
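The role of the number of subsections N can be illustrated with a short sketch that splits a signature into N contiguous parts and evaluates one local feature per part. It assumes equal-length contiguous subsections (lengths differing by at most one point); the re-organization of subsections mentioned in the paper is not reproduced here, and the function names are hypothetical.

```python
import statistics


def split_signature(h, n_sections):
    """Split h into n_sections contiguous parts of nearly equal length."""
    if not 1 <= n_sections <= len(h):
        raise ValueError("n_sections must be between 1 and len(h)")
    base, extra = divmod(len(h), n_sections)
    sections, start = [], 0
    for k in range(n_sections):
        # The first `extra` sections receive one additional point.
        stop = start + base + (1 if k < extra else 0)
        sections.append(h[start:stop])
        start = stop
    return sections


def local_feature_vector(h, n_sections, feature):
    """Apply `feature` to each subsection and collect the local values."""
    return [feature(section) for section in split_signature(h, n_sections)]
```

For instance, `local_feature_vector(h, 4, statistics.pstdev)` yields four local roughness values, since the population standard deviation equals sqrt(<h^2> - <h>^2). The sketch also makes the trade-off visible: as N grows, each subsection holds fewer points, so each local estimate rests on less data.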

Conclusion and future work
The contour is very important for distinguishing benign breast masses from malignant ones. In this paper, we propose three shape features of the broken-line contour to classify benign and malignant breast masses. The accuracy reaches 99.66% with the RMS slope feature. In addition, we compute the fractal dimension by another method, using the height-height correlation function in log-log coordinates. The accuracy reaches 99.33%, which is higher than that of $\mu_R/\sigma_R$ and $w$. In further research, the selection of N and the use of texture features could be studied to improve classification performance. More cases can be included so that our study has a wider range of application. In addition, more advanced classification methods, such as deep neural networks, could be used to improve classification accuracy.