Microaneurysm detection in fundus images using a two-step convolutional neural network

Background and objectives Diabetic retinopathy (DR) is the leading cause of blindness worldwide, and therefore its early detection is important in order to reduce disease-related eye injuries. DR is diagnosed by inspecting fundus images. Since microaneurysms (MAs) are one of the main symptoms of the disease, distinguishing this complication within the fundus images facilitates early DR detection. In this paper, an automatic analysis of retinal images using a convolutional neural network (CNN) is presented. Methods Our method incorporates a novel technique utilizing a two-stage process with two online datasets, which results in accurate detection while solving the imbalanced-data problem and decreasing training time in comparison with previous studies. We have implemented our proposed CNNs using the Keras library. Results In order to evaluate our proposed method, an experiment was conducted on two standard publicly available datasets, i.e., the Retinopathy Online Challenge dataset and the E-Ophtha-MA dataset. Our results demonstrate a promising sensitivity value of about 0.8 for an average of >6 false positives per image, which is competitive with state-of-the-art approaches. Conclusion Our method indicates significant improvement in MA detection using retinal fundus images for monitoring diabetic retinopathy.


Introduction
Diabetic retinopathy (DR) is a diabetic disorder caused by changes in the blood vessels of the retina. The damage to retinal blood vessels may end in blindness. DR occurs in most people with diabetes, and its treatment depends on the patient's age and the duration of DR. In 2000, the WHO estimated that 171 million people had diabetes and projected that 366 million cases will occur by 2030 [11]. DR can be treated effectively with laser therapy if detected early. The important symptoms of diabetes are swelling of the blood vessels, fluid leakage in the eyes and, in some cases, the growth of new blood vessels on the retina. 80% of patients who have had diabetes for more than 10 years develop DR [25]. The microaneurysm (MA) is the first symptom of DR and causes blood leakage into the retina. This lesion usually appears as a small red circular spot with a diameter of less than 125 micrometers [11].
As mentioned in [3], methods for automatic MA detection proceed in three stages (preprocessing, MA candidate extraction and classification). The preprocessing stage reduces noise and enhances the contrast of input images by applying image preprocessing techniques. These techniques are performed on the green colour plane of the RGB images, because in this plane microaneurysms have the highest contrast with the background. In the second stage (MA candidate extraction), candidate regions for MAs are detected; because many of the blood vessels may result in false positives, the blood vessels are removed from the candidate image using blood vessel segmentation algorithms. In the third stage (classification), after applying feature extraction and selection, a classification algorithm is used to categorize candidates as MA (abnormal) or non-MA (normal), while a probability is estimated for each candidate using a classifier and a large set of specifically designed features that represent an MA.
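The three-stage pipeline described above can be sketched end to end. The following is a purely illustrative NumPy/SciPy rendering: every function name, filter size and threshold here is our own choice for demonstration, not taken from any of the cited methods.

```python
import numpy as np
from scipy import ndimage

def detect_ma_candidates(rgb, opening_size=5, thresh=10.0):
    """Illustrative three-stage MA pipeline (all parameters are ours):
    preprocessing on the green plane, morphological candidate
    extraction, and a trivial rule standing in for the classifier."""
    # Stage 1: preprocessing -- work on the green plane, where MAs have
    # the highest contrast, and remove slow background variation.
    green = rgb[..., 1].astype(float)
    background = ndimage.median_filter(green, size=15)
    enhanced = background - green          # MAs are dark -> now bright

    # Stage 2: candidate extraction -- a grey-level top-hat keeps small
    # bright blobs while suppressing larger structures.
    opened = ndimage.grey_opening(enhanced, size=opening_size)
    tophat = enhanced - opened
    candidates = tophat > thresh

    # Stage 3: "classification" -- here just a size rule standing in
    # for the feature-based classifier described in the text.
    labels, n = ndimage.label(candidates)
    sizes = ndimage.sum(candidates.astype(int), labels, range(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if 2 <= s <= 50]
    return np.isin(labels, keep)
```

A small dark spot on a uniform background survives all three stages, while isolated noise pixels are rejected by the size rule.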
Over the past decade, there has been a dramatic increase in automatic microaneurysm (MA) detection. The first MA detectors were based on mathematical morphology applied to fluorescein angiograms. In these methods, to distinguish MAs from vessels, a morphological top-hat transformation with a linear structuring element at different orientations is used. Subsequently, several papers, such as those of Spencer, Cree, Frame, and coworkers [7,19], tried to improve these approaches. They added two more steps to the basic top-hat-transform-based detection technique: a matched-filtering postprocessing step and a shade-correction preprocessing step. After detection and segmentation of candidate MAs, various shape- and intensity-based features were extracted, and finally a classifier was used to separate the real MAs from spurious responses. Later, a modified version of the top-hat-based algorithm was proposed and applied to high-resolution, red-free fundus photographs by Hipwell et al. [10]. Fleming et al. [6] proposed an improvement of this method by locally normalizing the contrast around candidate lesions and eliminating candidates detected on vessels through a local vessel segmentation step. Niemeijer et al. [15] detected MA candidates in colour fundus images through a hybrid scheme that used both the top-hat-based method and a supervised pixel-classification-based method.
In addition to the mathematical morphology approach, other techniques have been used for detection of red lesions in fundus images. Sinthanayothin et al. [18] assumed that image content can be categorized into three classes: vessels, red lesions and MAs. After applying neural networks as a recursive region-growing procedure to segment both the vessels and red lesions in a fundus image, any remaining objects were identified as microaneurysms. Kamel et al. [12] also utilized a neural network (NN) approach for automatic detection of MAs in retinal angiograms; the NN provides the ability to detect regions with MAs and reject other regions. Usher et al. [20] used a neural network to detect microaneurysms: after preprocessing, microaneurysms are extracted using recursive region growing and adaptive intensity thresholding with a moat operator and an edge enhancement operator. Moreover, Quellec et al. [16] described a supervised MA detection method based on template matching in wavelet subbands.
However, literature reviews indicate that there are still some problems in this area that have not been considered before. One of the main problems in MA detection is the poor quality of the JPEG-format images in the publicly available datasets, such that MAs are too blurred or too small to be detected. Moreover, given that the scale of the Gaussian kernel is fixed, different sizes of MA cannot be covered properly: larger MAs will not be extracted if only a small scale is used, and if a large scale is used, then small MAs that lie close to each other are considered as one MA. This produces a lower correlation coefficient. Furthermore, a few MAs that are located close to blood vessels are missed in the preprocessing stage, because they are recognized as part of the vascular map, which should be removed in this stage. Another challenge of MA detection using neural networks is the imbalanced dataset, which means the number of non-MA samples is usually much higher than the number of MA ones. This can lead to improper and imbalanced training of networks, and also increases complexity and decreases classification accuracy because a large amount of uninformative data must be processed.
In this paper, a new method for MA detection in fundus images based on deep-learning neural networks is developed to address the problems with the current automatic detection algorithms. Deep learning algorithms, in particular convolutional networks, have rapidly become a methodology of choice for analyzing medical images. Deep learning is an improvement of artificial neural networks, consisting of more layers that permit higher levels of abstraction and improved predictions from data [8]. In our proposed method, by using the characteristics of convolutional neural networks, the MA candidates are selected from the informative part of the image where their structure is similar to an MA, and then a CNN detects the MA and non-MA spots. According to our results, the proposed method can decrease the false-positive rate and can be considered a powerful solution for automatic MA detection.
This paper starts with a brief introduction to deep learning in Part 2. Part 3 is dedicated to our proposed method. Part 4 then shows our experimental results, and Part 5 is dedicated to discussion. Finally, in Part 6, we conclude our work.

Deep Learning in Medical Image Analysis
Artificial neural networks and deep learning, conceptually and structurally inspired by neural systems, have rapidly become an interesting and promising methodology for researchers in various fields, including medical image analysis. Deep learning refers to learning representations of data with multiple levels of abstraction, used in computational models that are composed of multiple processing layers. These methods are gaining acceptance for numerous practical applications in engineering. Deep learning has performed especially well in classifiers for image-processing applications and in function estimators for both linear and non-linear applications. Deep learning discovers complicated structure in big data sets by utilizing the backpropagation algorithm to indicate how the internal parameters of a network should be changed to compute the representation in each layer from the representation in the previous layer [13].
In particular, convolutional neural networks (CNNs) have proven able to automatically learn mid-level and high-level abstractions from raw data (e.g., images), and so have been considered powerful tools for a broad range of computer vision tasks. Recent results indicate that the generic descriptors extracted from CNNs are extremely effective in object recognition and localization in natural images. The medical image analysis community is quickly entering the field and applying CNNs and other deep learning methodologies to a wide variety of applications. In medical imaging, the accurate diagnosis of a disease depends on both image acquisition and image interpretation. Thanks to the emergence of modern devices that acquire images quickly and at high resolution, image acquisition has improved substantially over recent years. The image interpretation process, however, has only recently begun to benefit from machine learning.

Convolutional Neural Networks (CNNs)
The most successful type of model for image analysis to date is the convolutional neural network (CNN). A CNN consists of a set of layers called convolutional layers, each of which contains one or more planes as feature maps. Each unit in a plane receives input from a small neighborhood in the planes of the previous layer. Each plane has a fixed feature detector that is convolved with a local window scanned over the planes of the previous layer, detecting increasingly relevant image features: first, for example, lines or circles that may represent straight edges, and then higher-order features such as local and global shape and texture. To detect multiple features, multiple planes are usually used in each layer. The output of the CNN is typically one or more probabilities or class labels [21].
Fig. 1 shows one of the CNN architectures we used for MA detection. As can be seen, the network is designed as a series of stages. The first three stages are composed of convolutional layers (blue) and pooling layers (green), and the output stage (brown) consists of three fully-connected layers, with a Softmax classifier as the last layer.

The proposed method
To address the usual problems of previous works mentioned in the Introduction (poor quality of images, the fixed scale of the Gaussian kernel, MAs located close to blood vessels, and imbalanced datasets), we propose a two-stage training strategy in which informative normal samples are selected from a probability map that is the output of the first CNN, called the basic CNN. The final CNN classifies each pixel in the test images as MA or non-MA. This CNN takes the probability map from the previous stage as a weighting matrix for the input test images, and produces a final probability map for each test image giving the probability of each pixel being MA or non-MA. Fig. 2 shows the different steps of the proposed method.

Preprocessing Step
Because retinal images usually have non-uniform illumination, a preprocessing step is needed to apply colour normalization and eliminate the retina background. For this purpose, the background image is first estimated using a median filter of size 30 × 30 pixels and then subtracted from the original image. Afterwards, input patches are produced for the basic CNN based on their central pixel: those with an MA pixel at the center are considered MA samples, and those with a non-MA central pixel are considered non-MA samples for training.
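The shade-correction step above can be written in a few lines. The 30 × 30 median filter is taken from the text; the patch size of 21 pixels and the function names are our own assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter

def preprocess(green_channel, kernel=30):
    """Shade correction as described in the text: estimate the
    background with a 30x30 median filter and subtract it."""
    img = green_channel.astype(float)
    background = median_filter(img, size=kernel)
    return img - background

def extract_patch(image, center, size=21):
    """Cut a square patch around `center`; the patch label is the
    label of this central pixel (size=21 is our assumption)."""
    r, c = center
    h = size // 2
    return image[r - h:r + h + 1, c - h:c + h + 1]
```

On a uniformly lit region the corrected image is zero, so only local deviations (lesions, vessels) survive into the training patches.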

Candidate Selection by basic CNN
Fig. 1 shows the architecture of the basic CNN. The training procedure in a CNN is a sequential process that requires multiple iterations to optimize the parameters and extract distinguishing characteristics from images. In each iteration, a subset of samples is chosen randomly and used to optimize the parameters; this is achieved by backpropagation (BP) and minimization of the cost function [13]. In this stage, the basic CNN is trained with small patches whose labels are determined by the label of their central pixel. The basic CNN is used to address the imbalanced-data problem: the numbers of MA and non-MA patches in retinal images are not balanced, which causes network complexity and improper convergence. To avoid this problem, in each epoch an equal number of MA and non-MA patches is selected to train the network. However, because not all non-MA samples are used in the learning process, selecting an equal number of MA and non-MA patches causes a high false-positive rate in the initial results. The basic CNN returns an initial probability map indicating, for each input pixel, the initial probability of belonging to an MA. This map is needed to prepare the input dataset for the final CNN.
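The per-epoch balancing described above can be sketched as follows; this is a minimal rendering under our own naming, not the paper's code: each epoch keeps all MA patch indices and randomly draws an equal number of non-MA indices.

```python
import numpy as np

def balanced_epoch(ma_idx, non_ma_idx, rng=None):
    """Per-epoch balancing used for the basic CNN: draw as many
    non-MA patches (randomly, without replacement) as there are MA
    patches, so each epoch trains on a 50/50 mixture."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(ma_idx)
    sampled = rng.choice(non_ma_idx, size=n, replace=False)
    epoch = np.concatenate([ma_idx, sampled])
    rng.shuffle(epoch)
    return epoch
```

Because a fresh non-MA subset is drawn each epoch, the network eventually sees many different negatives even though any single epoch is balanced.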

Classification by final CNN
The final CNN works as the main classifier to extract the MA candidate regions. This CNN has more layers, and therefore more abstraction levels, than the basic CNN, which leads to a more discriminative MA model. Unlike the basic CNN, which uses random samples from the input dataset pool, the final CNN applies the probability map from the previous stage as a weighting matrix for the input images. In other words, the CNN inputs for classification are selected from the informative patches whose structure is similar to an MA. This decreases the computational complexity. The output of this CNN is a map for each test image showing the MA probability of each pixel. However, this map is noisy, and a post-processing step is needed.
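One simple way to realize "probability map as weighting matrix" is to sample training pixels for the final CNN with probability proportional to the basic CNN's output; the sketch below takes that interpretation (the function name and sampling scheme are our assumptions).

```python
import numpy as np

def select_informative_pixels(prob_map, n, rng=None):
    """Use the basic CNN's probability map as a sampling weight:
    pixels that look more MA-like are more likely to be selected as
    inputs for the final CNN. Returns n (row, col) coordinates."""
    if rng is None:
        rng = np.random.default_rng(0)
    flat = prob_map.ravel().astype(float)
    weights = flat / flat.sum()
    chosen = rng.choice(flat.size, size=n, replace=False, p=weights)
    return np.column_stack(np.unravel_index(chosen, prob_map.shape))
```

Pixels with zero probability are never selected, which is exactly how the uninformative background is skipped.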

Post-processing
In practice, the probability map obtained from the final CNN is extremely noisy; for example, when there are two close candidates, they may be merged and considered as one. Therefore, to obtain a smoothed probability map, it is convolved with a 5-pixel-radius disk kernel. The local maxima of the new map are expected to lie at the disk centres in the noisy map, i.e., at the centroid of each MA, yielding a set of candidates for each image.
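The disk-kernel smoothing and local-maximum extraction can be sketched directly; the 5-pixel radius is from the text, while the `min_score` cut-off (to drop flat background) is our own addition.

```python
import numpy as np
from scipy import ndimage

def disk_kernel(radius=5):
    """Normalized disk of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    k = (x * x + y * y <= radius * radius).astype(float)
    return k / k.sum()

def candidates_from_map(prob_map, radius=5, min_score=0.05):
    """Post-processing as described: smooth the noisy map with a
    5-pixel-radius disk kernel, then take local maxima as MA
    centres (min_score is our own cut-off, not the paper's)."""
    smooth = ndimage.convolve(prob_map, disk_kernel(radius))
    local_max = ndimage.maximum_filter(smooth, size=2 * radius + 1)
    peaks = (smooth == local_max) & (smooth > min_score)
    return np.argwhere(peaks)
```

Two well-separated blobs in the map yield exactly two candidate centres, while low-level noise is averaged away by the disk.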

The architectures of CNNs
In this work, two different structures are used for the basic and final CNNs. As can be seen from Fig. 1, the basic CNN includes three convolution layers, each followed by a pooling layer, then three fully-connected layers and finally a Softmax layer at the output. The final CNN has more layers than the basic CNN: five convolution and pooling layers, then two fully-connected layers and one Softmax classification layer, which is fully connected with two neurons for MA and non-MA (Tables 1 and 2). In this work, to increase accuracy, dropout training with a maxout activation function is used. Dropout reduces over-fitting by randomly omitting the output of each hidden neuron with a probability of 0.25.
Training proceeds as in a standard neural network, using stochastic gradient descent. We have incorporated the dropout training algorithm for the three convolutional layers and one fully-connected hidden layer. Sixteen filters of size 6 × 6 are applied in the first convolution layer, 16 filters of size 5 × 5 in the second layer, and 16 filters of size 3 × 3 in the third convolution layer; the maxout activation function is used for all layers in the network except the Softmax layer. The filter size in the max-pooling layers is 2 × 2 with stride 2. After each pair of convolution and pooling layers, a LeakyReLU activation layer is applied, an improved version of ReLU (rectified linear unit). Unlike ReLU, in which negative values become zero and the corresponding neurons are deactivated, in LeakyReLU negative values are not zeroed but are instead scaled by a small constant a, as in Equation 1:

f(x) = x if x > 0, and f(x) = a x otherwise, (1)

where a is a small constant value. The final layers of the network consist of a fully-connected layer and a final Softmax classification layer. This function produces a score between 0 and 1, indicating the probability that a pixel belongs to the MA class. To train the network, a binary cross-entropy loss function is used. Cross entropy calculates the difference between the predicted values (p) and the labels (t) using the following equation:

L(p, t) = -(t log(p) + (1 - t) log(1 - p)). (2)
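Equations 1 and 2 can be rendered directly in NumPy as a check on the notation; the default a = 0.01 and the numerical clipping are our own choices.

```python
import numpy as np

def leaky_relu(x, a=0.01):
    """Equation 1: pass positive values through unchanged and scale
    negative values by a small constant a, instead of zeroing them
    as plain ReLU does."""
    return np.where(x > 0, x, a * x)

def binary_cross_entropy(p, t, eps=1e-12):
    """Equation 2: mean cross-entropy loss between predicted
    probabilities p and binary labels t (clipped for stability)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
```

For a correct confident prediction the loss is near zero, and for p = 0.5 it equals log 2, as expected from Equation 2.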

Experimental Results
To verify our proposed method, we implemented the CNNs using the Keras deep learning library, on a Linux Mint operating system with 32 GB RAM, an Intel (R) Core (TM) i7-6700K CPU and an NVIDIA GeForce GTX 1070 graphics card. In this experiment, we used two standard publicly available datasets, the ROC [2] and E-Ophtha-MA [1] databases, to train and test the proposed method for the detection of MAs in retinal images. ROC includes 100 colour images of the retina obtained from Topcon NW 100, Topcon NW 200 and Canon CR5-45NM cameras in JPEG format. These images are divided into a training subset and a test subset of 50 images each. The image dimensions are 768 × 576, 1058 × 1061 and 1389 × 1383 [14]. The E-Ophtha-MA database contains 148 colour images in JPEG format, with a size of 2544 × 1696.
To validate the results, a cross-validation algorithm is utilized: the data are divided into 75% training and 25% testing sets, and the training and testing sets are then exchanged in successive rounds such that all data have a chance of being trained and tested. For accuracy evaluation, we computed true positives (TP) as the number of MA pixels correctly detected, false positives (FP) as the number of non-MA pixels wrongly detected as MA pixels, false negatives (FN) as the number of MA pixels that were not detected, and true negatives (TN) as the number of non-MA pixels correctly identified as non-MA. For better representation of accuracy, sensitivity is defined as Sensitivity = TP / (TP + FN).
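The pixel-wise counts and sensitivity defined above translate directly into code; this is a minimal sketch with function names of our own choosing.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Pixel-wise TP/FP/FN/TN between a binary prediction mask and a
    binary ground-truth mask, as defined in the text."""
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn

def sensitivity(tp, fn):
    """Sensitivity (recall): fraction of true MA pixels detected."""
    return tp / (tp + fn)
```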
In this experiment, to verify the accuracy of the proposed method, we compared our sensitivity values with those of current works (Latim [16], OkMedical [24], Waikato [5], B Wu's method [23], Fujita Lab [9], Valladolid [17]); see Tables 3 and 4. From these tables, our proposed method has the lowest sensitivity (0.04) when the average number of FPs per image (FPs/Img) is 1/8, but this value increases quickly, reaching a maximum of 0.76 at FPs/Img = 8. Valladolid assumes all pixels in the image belong to one of three classes: class 1 (background elements), class 2 (foreground elements, such as vessels, optic disk and lesions), and class 3 (outliers). A three-class Gaussian mixture model is fit to the image intensities, and a group of MA candidates is segmented by thresholding the fitted model. The sensitivity of this method is 0.19 at FPs/Img = 1/8 and gradually increases to 0.52 at FPs/Img = 8. The Waikato Microaneurysm Detector performs a top-hat transform by morphological reconstruction, using an elongated structuring element at different orientations to detect the vasculature. After removal of the vasculature and a microaneurysm matched-filtering step, the candidate positions are found by thresholding. In comparison with the other methods, Waikato has the lowest sensitivity, ranging from 0.06 to 0.33. Latim assumes that microaneurysms at a particular scale can be modelled with 2-D, rotation-symmetric generalized Gaussian functions; it then uses template matching in the wavelet domain to find the MA candidates. The Latim method can be considered to have the second-highest sensitivity after our proposed method: 0.17 at FPs/Img = 1/8 and 0.60 at FPs/Img = 8. In OkMedical, responses from a Gaussian filter bank are used to construct probabilistic models of an object and its surroundings; by matching the filter-bank outputs in a new image with the constructed (trained) models, a correlation measure is obtained. In the Fujita Lab work, a double-ring filter was designed to detect areas of the image in which the average pixel value is lower than the average pixel value in the surrounding area; a modified filter instead detects areas where the average pixel value in the surrounding area is lower by a certain fraction of the number of pixels under the filter, in order to reduce false-positive detections on small capillaries. The sensitivities of OkMedical and Fujita Lab range from 0.18 to 0.50. Fig. 3 confirms the results in Tables 3 and 4. This figure shows the free-response receiver operating characteristic (FROC), a graphical plot illustrating the diagnostic ability of classifiers, created by plotting sensitivity (TP rate) against the average number of FPs per image at various threshold settings; it compares the sensitivity of the proposed method with the methods of [5,9,16,17,24] on the ROC and E-Ophtha-MA databases. From Fig. 3a we can see that the sensitivity of the proposed method on the ROC dataset is about 0.3 higher than that of the other methods: it is about 0.6 for FPs/Img greater than 1 and reaches a maximum of 0.8, while this number does not exceed 0.6 for the other methods. Fig. 3b also shows that the sensitivity of the proposed method on the E-Ophtha-MA database is about 0.2 greater than that of B Wu's method [23]. Table 5 reports the competition performance measure (CPM) used to evaluate the results. This table also confirms that our proposed method has the highest CPM value on both the ROC and E-Ophtha-MA datasets: 0.45 and 0.42, respectively, while the maximum corresponding values for the other methods are 0.38 and 0.35.
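The CPM reported in Table 5 is conventionally the mean FROC sensitivity over the seven reference points FPs/Img = 1/8, 1/4, 1/2, 1, 2, 4, 8; a minimal sketch under that assumption is below. Applied to the Latim sensitivities listed in Table 3, it reproduces the 0.38 reported for Latim in Table 5.

```python
def cpm(froc_sensitivities):
    """Competition performance measure: mean sensitivity at the seven
    reference FPs/Img points (1/8 ... 8), the convention assumed here."""
    assert len(froc_sensitivities) == 7
    return sum(froc_sensitivities) / 7.0
```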

Conclusion
In this paper, an approach for automatic MA detection in retinal images based on a deep-learning CNN is developed to address problems of previous works, such as imbalanced datasets and inaccurate MA detection. In this method, because a two-stage CNN is used, the MA candidates for the classification process are selected from a balanced dataset and from the informative part of the image where their structure is similar to an MA, which results in decreased computational complexity. According to our experimental results on two standard publicly available datasets, the proposed method has a higher sensitivity value and can decrease the false-positive rate compared to previous methods; it can therefore be considered a powerful improvement over previous MA-detection approaches based on retinal images. For future work, we plan to improve the training phase by combining the basic and final CNNs, and also to apply this method to other medical applications where imbalanced data is an issue.

Fig. 4: Pixel probability maps obtained from the final CNN for different numbers of epochs. In the initial epochs, the probability map includes low probabilities of MA (depicted as green spots); in subsequent epochs, the medium and high probabilities appear in blue and purple, respectively.

Fig. 1 :
Fig. 1: The architecture of applied CNN in this project.

Fig. 3 :
Fig. 3: The comparison of FROC curves of the proposed and previous methods

Table 1 :
Architectures of Final CNN

Table 2 :
Architectures of Basic CNN

Table 3 :
Sensitivities of the different methods at the various measurement points. All microaneurysms from the test set are included

Table 4 :
Sensitivities of the different methods at the various measurement points. All microaneurysms from the test set are included. FROC results on the Retinopathy Online Challenge dataset at various average numbers of false positives per image

Compliance with Ethical Standards - Conflict of interest: The authors (Noushin Eftekheri, Dr. Hamidreza Pourreza and Dr. Ehsan Saeedi) declare that they have no conflict of interest. - Ethical standards: This article does not contain any studies with human or animal subjects performed by any of the authors; hence, formal consent is not applicable. In this study, two standard publicly available databases, the ROC [2] and E-Ophtha-MA [1] databases, are used.