This study uses machine learning based on directional local contrast to detect MAs on preprocessed and segmented image patches.

First, preprocessing was applied to enhance image quality. Then, blood vessels (BVs) were segmented using an improved enhancement function based on Jerman’s work [33]. With BVs eliminated, candidate MA regions were extracted. Next, the processed images were divided into patches according to the locations of the MA candidate regions. Finally, based on the extracted features, candidate patches were classified as MA or non-MA. The process is illustrated in Fig. 2. To comprehensively evaluate the algorithm on different patient cohorts, the proposed method was validated on two independent databases (e-ophtha MA and DIARETDB1) at the lesion level.

### Preprocessing

#### Shade correction

BVs, HMs, and MAs all appear dark red in color fundus images (as shown in Fig. 1). To make these dark red regions more conspicuous, preprocessing was performed within the Field of View (FOV).

Compared to the other two channels, the green channel has higher contrast between the target area and the background, and therefore contains the most information. Thus, the first step of preprocessing was to extract the green channel image \(I_\text {g}\). Then, a median filter was applied to \(I_\text {g}\) to obtain the background \(I_\text {bg}\). The filter size must be larger than the maximal BV width in the fundus image; in this study, a filter size of \(50\times 50\) was selected. Finally, \(I_\text {bg}\) was subtracted from \(I_\text {g}\) to eliminate the background, yielding the shade-corrected image \(I_\text {eq}\). The main BV structure in \(I_\text {eq}\) was clear, and other dark regions in the original image, including HMs and MAs, were also highlighted, but the overall image was dark with low contrast. To brighten the image, the mean gray value of the green channel image \(I_\text {g}\) within the FOV was added to \(I_\text {eq}\) so that the result remains in the same gray range as the green channel:

$$\begin{aligned} I_\text {sc}=I_\text {eq}+\text {mean}(I_\text {g}) \end{aligned}$$

(1)

The result of shade correction (\(I_\text {sc}\)) is shown in Fig. 9b, where the BVs can be clearly observed.
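The shade-correction step described above can be sketched as follows; function and parameter names are illustrative, not from the paper:

```python
import numpy as np
from scipy.ndimage import median_filter

def shade_correct(green, fov_mask, filter_size=50):
    """Shade correction per Eq. (1): subtract a median-filtered
    background I_bg from the green channel I_g, then add back the
    mean green intensity inside the FOV."""
    green = green.astype(np.float64)
    background = median_filter(green, size=filter_size)  # I_bg
    eq = green - background                              # I_eq
    return eq + green[fov_mask].mean()                   # I_sc, Eq. (1)

# Sanity check: a flat image has no shading, so I_eq is ~0 and
# I_sc is simply the mean grey level.
img = np.full((60, 60), 120.0)
out = shade_correct(img, np.ones((60, 60), dtype=bool), filter_size=15)
```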

#### Segmentation of blood vessels

For accurate MA detection, it is important to separate BVs and MAs, which are similar in color and brightness. Thus, BVs were segmented and eliminated to minimize their influence on MA detection. The most widely used BV segmentation methods are based on machine learning [35] and deep learning [36, 37], which achieve high accuracy but are complicated and time-consuming. Inspired by the work of Jerman et al. [33], which improved on the widely used BV enhancement method of Frangi et al. [34], this study constructed a response function that enhances BVs by analyzing the eigenvalues of the Hessian matrix, achieving simple, fast, and accurate BV segmentation.

For the shade-corrected image \(I_\text {sc}(x,y)\), a Gaussian filter (Eq. 2) was first applied to reduce noise. Different \(\sigma\) values were used to obtain different enhancement responses, and the largest response was selected to derive the BV-enhanced image. In this study, \(\sigma\) ranged from 3 to 6 with a step of 1.

$$\begin{aligned} G(x,y)=\frac{1}{2\pi \sigma ^2}e^{-\frac{x^2+y^2}{2\sigma ^2}} \end{aligned}$$

(2)

The second-order Gaussian derivatives with respect to *x* and *y* were convolved with \(I_\text {sc}(x,y)\), giving \(L_\text {xx}\), \(L_\text {xy}\), and \(L_\text {yy}\), respectively. The Hessian matrix *H* was composed as follows:

$$\begin{aligned} H=\left[ \begin{array}{cc} L_\text {xx} & L_\text {xy} \\ L_\text {xy} & L_\text {yy} \end{array} \right] . \end{aligned}$$

(3)

Then, the two eigenvalues \(\lambda _1\) and \(\lambda _2\) were obtained from the Hessian matrix *H* as follows, where tmp represents an intermediate variable.

$$\begin{aligned} \text{tmp}=\sqrt{(L_\text {xx}-L_\text {yy})^2+4*L_\text {xy}^2} \end{aligned}$$

(4)

$$\begin{aligned} \mu _1=0.5*(L_\text {xx}+L_\text {yy}+\text{tmp}) \end{aligned}$$

(5)

$$\begin{aligned} \mu _2=0.5*(L_\text {xx}+L_\text {yy}-\text{tmp}) \end{aligned}$$

(6)

If \(|\mu _1|\le |\mu _2|\), then \(\lambda _1=\mu _1,\lambda _2=\mu _2\). Otherwise, \(\lambda _1=\mu _2,\lambda _2=\mu _1\) (that is, to ensure \(|\lambda _2|\ge |\lambda _1|\)). \(\lambda _2\) was normalized as follows:

$$\begin{aligned} \lambda _{\rho }=\left\{ \begin{array}{ll} \text {max}(\lambda _2) & \lambda _2>0\\ 0 & \text {else} \end{array} \right. . \end{aligned}$$

(7)

Enhancement function was calculated as follows:

$$\begin{aligned} \text {Enhancement}=\left\{ \begin{array}{ll} 0 & \lambda _2\le 0 \;\text {or}\; \lambda _{\rho }\le 0\\ 1 & \lambda _2\ge \frac{\lambda _{\rho }}{2}>0\\ \lambda _2^2(\lambda _{\rho }-\lambda _2)\left( \frac{3}{\lambda _2+\lambda _{\rho }}\right) ^3 & \text {else} \end{array} \right. . \end{aligned}$$

(8)

The BV enhancement result is shown in Fig. 9c. Because the enhancement process might also enhance other dark red regions in the original image, MAs might be enhanced as well. The green circles in Fig. 9c indicate the MAs contained in the BV-enhanced image.
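The multiscale eigenvalue analysis of Eqs. (2)–(8) can be sketched as below. This is a minimal illustration assuming vessels appear dark in \(I_\text {sc}\) (which gives positive \(\lambda _2\) under the convention used here); variable names are illustrative, not from the paper's implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def enhance_vessels(img, sigmas=(3, 4, 5, 6)):
    """Hessian-eigenvalue vessel enhancement following Eqs. (2)-(8)."""
    img = img.astype(np.float64)
    best = np.zeros_like(img)
    for s in sigmas:
        # Gaussian second derivatives give the Hessian entries (Eqs. 2-3)
        Lxx = gaussian_filter(img, s, order=(0, 2))
        Lyy = gaussian_filter(img, s, order=(2, 0))
        Lxy = gaussian_filter(img, s, order=(1, 1))
        tmp = np.sqrt((Lxx - Lyy) ** 2 + 4 * Lxy ** 2)   # Eq. (4)
        mu1 = 0.5 * (Lxx + Lyy + tmp)                    # Eq. (5)
        mu2 = 0.5 * (Lxx + Lyy - tmp)                    # Eq. (6)
        # Order eigenvalues so that |lambda2| >= |lambda1|
        lam2 = np.where(np.abs(mu1) > np.abs(mu2), mu1, mu2)
        pos = lam2 > 0
        lam_rho = lam2[pos].max() if pos.any() else 0.0  # Eq. (7)
        resp = np.zeros_like(lam2)
        if lam_rho > 0:
            sat = pos & (lam2 >= lam_rho / 2)            # Eq. (8), case 2
            mid = pos & ~sat
            resp[sat] = 1.0
            resp[mid] = (lam2[mid] ** 2 * (lam_rho - lam2[mid])
                         * (3.0 / (lam2[mid] + lam_rho)) ** 3)
        best = np.maximum(best, resp)  # largest response over scales
    return best
```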

First, Otsu’s adaptive threshold method was used to binarize the BV-enhanced image to obtain the preliminary BV segmentation result, shown in Fig. 9d. MAs might still be contained in this result, as indicated by the green circles in Fig. 9d. To remove them, small regions were excluded, giving the final segmentation of the main BV structure shown in Fig. 9e. Compared with Fig. 9d, the MAs have been removed.
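A minimal sketch of this binarization and small-region removal, assuming scikit-image is available; the `min_size` cut-off is an illustrative value, since the paper does not report the exact area threshold used here:

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import remove_small_objects

def segment_main_vessels(enhanced, min_size=150):
    """Otsu binarization of the vessel-enhanced image, followed by
    removal of small connected components (which may be MAs)."""
    binary = enhanced > threshold_otsu(enhanced)
    return remove_small_objects(binary, min_size=min_size)
```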

#### Contrast enhancement and noise reduction

Due to the tiny structure of MA, the accuracy of MA detection is sensitive to image quality. Therefore, further preprocessing was performed to improve the image quality.

To enhance the contrast between background and highlighted dark regions (including BVs, HMs, and MAs) in the shade-corrected image \(I_\text {sc}\), the image was processed with contrast-limited adaptive histogram equalization (CLAHE) method. A \(7\times 7\) Gaussian filter was then applied to reduce the influence of noise. The result (\(I_\text {gauss}\)) is shown in Fig. 10b.
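This step can be sketched with scikit-image's CLAHE implementation; the `clip_limit` value is an illustrative default, not one reported in the paper, and sigma = 1 is used as a stand-in for the 7×7 kernel support:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import exposure

def contrast_and_denoise(i_sc, clip_limit=0.02):
    """CLAHE followed by Gaussian smoothing (the I_gauss step)."""
    rng = np.ptp(i_sc)
    norm = (i_sc - i_sc.min()) / (rng + 1e-12)       # scale to [0, 1]
    enhanced = exposure.equalize_adapthist(norm, clip_limit=clip_limit)
    return gaussian_filter(enhanced, sigma=1.0)      # I_gauss
```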

#### Blood vessels removal

To eliminate the effect of BVs on MA detection, the gray values of the filtered image (\(I_\text {gauss}\)) at the positions of the segmented BVs were set directly to zero, obtaining \(I_\text {gsBV0}\) as shown in Fig. 10c.

### Extraction of MA candidate regions

In the previous step of BV removal, the gray values of BV pixels were set directly to zero. To further make all parts other than the MA candidates black, the mean gray value of the original green channel image in the FOV (see Eq. 1) was first subtracted from \(I_\text {gsBV0}\). Then contrast stretching was applied, yielding \(I_\text {CS}\) as shown in Fig. 10d. Finally, Otsu’s adaptive threshold method was used to binarize the image and obtain the preliminary MA candidate regions \(I_\text {can1}\), shown in Fig. 10e.

In BV segmentation, the small regions were excluded to ensure that candidate regions of MA were not included in the segmented BV. As a result, the preliminary MA candidate regions contained some small BV segments which appear as slender structures in Fig. 10e.

Then, connected component analysis and shape characteristics were used to refine the MA candidates. Since MAs appear as dots, the remaining slender structures could be excluded using the ratio *R* of the major axis length to the minor axis length of each candidate region (connected component). If *R* for a candidate region exceeds the set threshold, the region is discarded.

The threshold was set based on the mean ratio *R* over all connected components in the preliminary MA candidate regions. Considering the proportion of remaining slender structures, and after several experiments, a multiplier of 1.2 on the mean value was finally selected, formulated as follows:

$$\begin{aligned} \text {Threshold}=1.2*\frac{1}{n}\sum _{i=1}^{n}{R(i)}, \quad i=1,2,\ldots ,n, \end{aligned}$$

(9)

where *n* is the total number of connected components in the preliminary MA candidate regions.

Finally, among the remaining candidate regions, those with an area larger than 500 pixels were discarded, since MAs are much smaller (maximum MA sizes of \(100\sim 300\) pixels in existing works [22, 23, 28, 38], and 361 pixels among the labels of the e-ophtha MA database). The final MA candidate regions \(I_\text {can}\) are shown in Fig. 10f, where green circles indicate the ground truth of MAs.
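The refinement steps above (the ratio rule of Eq. (9) plus the 500-pixel area cut-off) can be sketched as:

```python
import numpy as np
from skimage.measure import label, regionprops

def refine_candidates(binary, max_area=500):
    """Refine preliminary candidates: drop slender fragments whose
    major/minor axis ratio R exceeds the Eq. (9) threshold, and drop
    regions larger than max_area pixels. Names are illustrative."""
    lab = label(binary)
    props = regionprops(lab)
    ratios = [p.major_axis_length / max(p.minor_axis_length, 1e-6)
              for p in props]
    threshold = 1.2 * float(np.mean(ratios))          # Eq. (9)
    keep = np.zeros_like(binary, dtype=bool)
    for p, r in zip(props, ratios):
        if r <= threshold and p.area <= max_area:
            keep[lab == p.label] = True
    return keep
```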

### Image patches

MA is difficult to observe and detect at the whole image level due to its tiny structure. Therefore, the detection of MA was based on image patches.

All images in the training and test sets were divided into \(25\times 25\) patches according to the locations of the candidate regions, and a suitable classification method was selected to determine whether each test patch contains an MA, thereby realizing MA detection. Examples of MA and non-MA patches are shown in Fig. 11.
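Patch extraction around the candidate centroids can be sketched as follows; handling candidates near the image border by reflection padding is an assumption, since the paper does not state its border rule:

```python
import numpy as np
from skimage.measure import label, regionprops

def extract_patches(image, candidates, patch_size=25):
    """Cut one patch_size x patch_size patch centred on the centroid
    of each candidate region."""
    half = patch_size // 2
    padded = np.pad(image, half, mode='reflect')
    patches = []
    for p in regionprops(label(candidates)):
        r, c = (int(round(v)) for v in p.centroid)
        # In padded coordinates the centroid sits at (r+half, c+half),
        # so the patch spans rows r..r+patch_size-1 of the padded array.
        patches.append(padded[r:r + patch_size, c:c + patch_size])
    return patches
```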

### Features extraction and classification

Although the interference of BVs on MA detection was eliminated, the obtained candidate regions might still contain image noise as well as HMs. To further separate true MAs from the candidate regions, it is necessary to analyze the difference between MA and HM. As shown in Figs. 1 and 12, MAs appear as isolated dark red dots with clear borders, while HMs appear as dark red regions of various sizes and shapes with blurred borders. Thus, shape features, possibly combined with gradient features, can distinguish MA from HM.

Moreover, for each MA candidate patch, the directional local contrast (DLC) of its center point *p* was calculated to distinguish MA and other structures. Assuming that the gray level of the center point *p* is \(I_p\), the *DLC* of the point *p* along the direction angle \(\theta\) is defined as

$$\begin{aligned} DLC_{p(\theta )}=\frac{I_p-{\overline{I}}_{p(\theta )}}{{\overline{I}}_{p(\theta )}}, \end{aligned}$$

(10)

$$\begin{aligned} {\overline{I}}_{p(\theta )}=\frac{1}{r}\sum _{q\in {N_{p(\theta ,r)}}}{I_q}, \end{aligned}$$

(11)

where \({\overline{I}}_{p(\theta )}\) is the mean gray value of neighboring pixels of point *p* along the direction angle \(\theta\), *r* the radius of neighboring region, and \(N_{p(\theta ,r)}\) the set of neighboring pixels of point *p* along the direction angle \(\theta\) within the radius *r*:

$$\begin{aligned} N_{p(\theta ,r)}=\left\{ (x_q,y_q)|x_q=x_p+k\cos \theta ,y_q=y_p+k\sin \theta , \quad k=1,2,\ldots ,r\right\} . \end{aligned}$$

(12)

Finally, the obtained *DLC* vector of point *p* is

$$\begin{aligned} DLC_p=(DLC_{p(\theta _1)},DLC_{p(\theta _2)},\ldots ,DLC_{p(\theta _n)}), \end{aligned}$$

(13)

where *n* is the number of angles; here *n* = 12 was selected, that is, \(360^{\circ }\) was divided into 12 directions with a step of \(30^{\circ }\). The radius *r* was set to 12, half the patch size. DLC indicates the local contrast characteristics within a neighborhood: a negative DLC value means the gray value of the current pixel is lower than that of its neighborhood. Therefore, DLC can be used to differentiate structures with different local contrast. Figure 13 shows the DLC distribution of the center point of each region of MA, HM, BV, and background (BG) shown in Fig. 12.

As shown in Fig. 13, the difference between MA and other regions (HM, BV, BG) in DLC distribution is obvious. The analysis of variance (ANOVA) results showed that the DLC of MA is significantly different from those of HM, BV, and BG (p\(<0.05\) for all).
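A sketch of the DLC computation of Eqs. (10)–(13); rounding the sampled coordinates along each direction to the nearest pixel is an assumption, since the paper does not specify its sampling scheme:

```python
import numpy as np

def dlc_vector(patch, r=12, n_angles=12):
    """Directional local contrast of the patch centre (Eqs. 10-13):
    along each of n_angles directions, compare the centre grey level
    with the mean of the r pixels sampled along that direction."""
    cy, cx = patch.shape[0] // 2, patch.shape[1] // 2
    i_p = float(patch[cy, cx])
    ks = np.arange(1, r + 1)
    dlc = []
    for theta in np.arange(n_angles) * (2.0 * np.pi / n_angles):
        xs = np.clip(np.round(cx + ks * np.cos(theta)).astype(int),
                     0, patch.shape[1] - 1)
        ys = np.clip(np.round(cy + ks * np.sin(theta)).astype(int),
                     0, patch.shape[0] - 1)
        mean_nb = float(patch[ys, xs].mean())            # Eq. (11)
        dlc.append((i_p - mean_nb) / (mean_nb + 1e-12))  # Eq. (10)
    return np.array(dlc)                                 # Eq. (13)
```

For a dark dot on a brighter background, as an MA is, every component of the vector is negative.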

With the aim of distinguishing true MAs from non-MAs, seven types of features in total, including color, grayscale, DLC, shape, texture, Gaussian filter-based, and gradient features, were extracted for each candidate patch. The details are shown in Table 6, where \(I_\text {CLAHE}\) indicates the result of CLAHE on the green channel.

**Area** is the total number of pixels in candidate region, that is the number of white pixels of each connected component in Fig. 10f.

**Perimeter** is the total number of boundary pixels that surround candidate region.

**Circularity** of each candidate region is defined as \(\text{Circularity}=\frac{\text{Perimeter}^2}{4*\pi *\text{Area}}\).

**Aspect ratio** is the ratio of the length of the major axis to that of the minor axis of the candidate region.

**Eccentricity** is the ratio of the distance between the two focal points of the circumscribed ellipse of the candidate region (focal length 2*c*) to its major axis length (2*a*); it equals 0 for a circular region: \(\text{Eccentricity}=\frac{c}{a}\).

**Solidity** is the ratio of the area of the candidate region to its convex hull area (ConA): \(\text{Solidity} = \frac{{\text{Area}}}{{\text{ConA}}}\).

**Entropy** indicates image randomness or texture complexity, representing the clustering characteristics of the gray distribution. With *n* different gray values in the image and proportion *p*(*i*) of each gray value *i*, it is defined as \(\text{Entropy}=-\sum _{i=1}^{n}{p(i)*\text {log}_2{p(i)}}\). Entropy reaches its maximum when the gray values are uniformly distributed.

**Energy** reflects the uniformity of the image gray distribution and texture coarseness; it is the sum of the squared elements of the gray-level co-occurrence matrix (GLCM). With *P* denoting the GLCM, it is defined as \(\text{Energy}=\sum _{i,j}{P(i,j)^2}\).

**Homogeneity** reflects how tightly the elements of the GLCM are distributed about its diagonal, defined as \(\text{Homogeneity}=\sum _{i,j}\frac{P(i,j)}{1+|i-j|}\).

**Skewness** reflects the asymmetry of the gray value distribution of the image. For input gray values *x*, with \(\mu\) and \(\sigma\) representing the mean and standard deviation of *x*, respectively, and *E* the expectation, it is defined as \(\text{Skewness}=\frac{E[(x-\mu )^3]}{\sigma ^3}\).

As for the gradient features, d*x* and d*y* indicate the gradients in the horizontal and vertical directions, respectively. The gradient changes abruptly at the MA boundary. As shown in Fig. 12a, the gradient vectors on the boundary of an MA diverge from its center; the gradient magnitude on the boundary is large, whereas that at the center of the MA is small. As shown in Fig. 12b, the distribution of gradient vectors in an HM patch is comparatively disordered. It can be seen from Fig. 12 that the difference in gradient-vector distribution between MAs and other structures is obvious. Therefore, gradient-based features can help distinguish MAs in the candidate-region patches.

### Machine learning algorithms for classification

After the 44 features were extracted from each MA candidate patch, machine learning techniques were used to classify each patch as MA or non-MA.

As aforementioned, a variety of classification methods have been used in automatic DR detection, including MA detection with Naive Bayesian (NB) [16], random forest [17], support vector machine (SVM) [18, 26, 39], and K-nearest neighbor (KNN) [23, 24] classifiers. Three classifiers were selected for MA detection and their results compared in this work: NB, KNN, and SVM.

#### Naive Bayesian

The NB classifier is a supervised learning technique based on Bayesian theory. It assumes that features are independent (naive) of each other. NB calculates the prior probability of each class and the conditional probability of each feature given each class from the training samples. For a test sample, Bayesian theory is used to calculate the posterior probability of the sample belonging to each class, and the class with the maximum posterior probability is chosen. NB is simple yet powerful and has therefore been widely used in spam filtering, document classification, disease diagnosis, and other applications.

#### K-nearest neighbor

KNN is a special kind of machine learning algorithm because it has no learning process in the usual sense, an approach known as lazy learning. Lazy learning is also called mechanical learning due to its high dependence on the training samples. Since no model is built, KNN is a non-parametric supervised learning method. For a test sample, KNN calculates the distance between the test sample and all training samples, selects the K training samples closest to it, and, by the majority-voting rule, assigns the test sample to the class to which most of the K nearest neighbors belong.

#### Support vector machine

SVM is a classic supervised learning algorithm mainly used for data classification. The SVM algorithm aims to find a maximum-margin hyperplane that correctly divides the training samples into their categories. For linearly inseparable problems, kernel functions map the input features into a higher-dimensional feature space where linear separation is possible. Kernel functions commonly used in SVM classifiers include the linear, sigmoid, polynomial, and radial basis function (RBF) kernels.

### Experiment materials

The fundus images of two available public retinal image databases (e-ophtha MA [40] and DIARETDB1 [41]) were used for experiments.

The e-ophtha MA database contains images at four resolutions, ranging from \(1440\times 960\) to \(2544\times 1696\) pixels, with a \(45^{\circ }\) FOV. It comprises 233 images without MAs and 148 images with MAs, manually annotated by one ophthalmologist and confirmed by another. The 148 images, containing 1306 MAs in total, were used for the experiments in this study.

The DIARETDB1 database contains 89 color fundus images with a resolution of \(1500\times 1152\) pixels and a \(50^{\circ }\) FOV. Four medical experts marked MAs in the database independently. Considering the disagreement among the four experts' annotations, annotations with confidence \(\ge {75\%}\) (at least three of the four experts agree that there is an MA) were used as the standard for MA labeling; there are 182 MAs with confidence \(\ge {75\%}\).

74 images were selected from the e-ophtha MA database and segmented to build the training set: patches at MA locations served as positive samples and non-MA candidate patches as negative samples.

The remaining 74 images in e-ophtha MA database and the whole DIARETDB1 database were used as test set to verify the performance of the proposed MA detection method.

### Experiment procedure

From each patch of the training set, the aforementioned 44 features were extracted to train the classifier. The trained classifier was then applied to each test patch to determine, based on the same 44 features, whether the candidate was an MA.

NB, KNN, and SVM with the RBF kernel were used in the experiments and compared to choose the best-performing classifier, and the MA detection results were then evaluated at the lesion level.

### Evaluation

For class-imbalanced problems, the receiver operating characteristic (ROC) curve better evaluates classifier performance. In this study, the number of MAs among the candidate regions is much smaller than the number of non-MAs. Therefore, the **ROC** curve, its corresponding Area Under Curve (**AUC**), and the free-response ROC (**FROC**) curve were used to evaluate the proposed MA detection method. The closer the ROC curve is to the upper-left corner (i.e., the more convex it is and the larger the AUC), the better the algorithm performs.

The abscissa of the ROC curve is the false-positive rate (FPR), the proportion of negative samples classified as positive among all negative samples. The ordinate is Sensitivity, also called the true-positive rate (TPR), the proportion of positive samples classified as positive among all positive samples. The two variables were calculated as follows:

$$\begin{aligned} \text{FPR}=1-\text{Specificity}=1-\frac{\text{TN}}{\text{TN}+\text{FP}} \end{aligned}$$

(14)

$$\begin{aligned} \text{TPR}=\text{Sensitivity}=\frac{\text{TP}}{\text{TP}+\text{FN}} \end{aligned}$$

(15)

TP indicates that MA is correctly predicted. TN indicates that non-MA is correctly predicted. FP indicates that non-MA is erroneously predicted as MA. And FN indicates that MA is erroneously predicted as non-MA.
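Eqs. (14)–(15) and the AUC can be illustrated on hypothetical candidate scores and lesion labels (these values are purely illustrative, not results from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical per-candidate MA scores and ground-truth labels
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.3, 0.2, 0.4, 0.8, 0.7, 0.6, 0.9])

fpr, tpr, _ = roc_curve(y_true, y_score)  # ROC points: FPR vs. TPR
auc = roc_auc_score(y_true, y_score)

# Confusion counts at one example operating point (threshold 0.5)
pred = (y_score >= 0.5).astype(int)
tp = int(((pred == 1) & (y_true == 1)).sum())
fp = int(((pred == 1) & (y_true == 0)).sum())
tn = int(((pred == 0) & (y_true == 0)).sum())
fn = int(((pred == 0) & (y_true == 1)).sum())
fpr_at_05 = 1 - tn / (tn + fp)  # Eq. (14)
tpr_at_05 = tp / (tp + fn)      # Eq. (15)
```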

Compared with the ROC curve, the FROC curve has the same Sensitivity on the ordinate but plots false positives per image (FPI) on the abscissa.

As both databases provide the annotation of MA on lesion level, evaluation of the results was performed on lesion level. As aforementioned, the ground truth of DIARETDB1 database with confidence \(\ge {75\%}\) was used as the standard for evaluation of MA detection results.

First of all, ROC curve and AUC were used to evaluate the performance of three classifiers on MA detection, so as to select the best performed classifier, and further analyze its results of MA detection.

Next, FROC curve and FROC **score** were used to evaluate the performance of the proposed MA detection method and be compared with existing algorithms.

Then, average computation time per image of the whole experiment and of each processing step was recorded and compared with existing algorithms.

Last, typical examples of MA detection results evaluated on lesion level were analyzed.