Automatic Segmentation of Coronary Lumen and External Elastic Membrane in IVUS Images Using 8-layer U-Net

Background: Intravascular ultrasound (IVUS) is the gold standard in assessing coronary lesions, stenosis, and atherosclerotic plaques. In this paper, a fully automatic approach using an 8-layer U-Net is developed to segment the coronary artery lumen and the area bounded by the external elastic membrane, i.e., the EEM cross-sectional area (EEM-CSA). The database comprises single-vendor and single-frequency IVUS data. In particular, the proposed data augmentation of MeshGrid combined with flip and rotation operations is implemented, improving the model performance without pre- or post-processing of the raw IVUS images. Results: A mean intersection over union (MIoU) of 0.941 and 0.750 for the lumen and EEM-CSA, respectively, was achieved, which exceeded the manual labeling accuracy of the clinician. Conclusion: The accuracy shown by the proposed method is sufficient for the subsequent reconstruction of 3D IVUS images, which is essential for doctors' diagnosis in the tissue characterization of coronary artery walls and plaque compositions, qualitatively and quantitatively.


Background
Coronary heart disease has been the leading cause of death worldwide [1], and coronary atherosclerosis is the dominant cause of coronary heart disease. In early atherosclerosis, coronary artery remodeling slows the progression of vascular stenosis as coronary plaques accumulate. Intravascular ultrasound (IVUS) is one of the most effective real-time medical imaging techniques and plays a critical role in the diagnosis and treatment of coronary heart disease.
2D IVUS images are acquired serially by an IVUS catheter pulled back through the coronary artery and can evaluate arterial distensibility affected by atherosclerotic plaque. Accurate segmentation of the lumen and the external elastic membrane cross-sectional area (EEM-CSA) from 2D coronary IVUS images has crucial clinical significance, as it contributes to assessing atherosclerotic plaque and its vulnerability by measuring lumen diameter, plaque eccentricity, plaque burden, etc. However, it is time-consuming and experience-dependent for doctors to manually delineate the lumen and EEM contours on 2D IVUS images. A typical IVUS pullback contains more than 3000 images, so an accurate, fast, and fully automatic segmentation of the lumen and EEM-CSA is highly desirable but remains a challenging task due to the relative complexity of IVUS images.
Several segmentation techniques and methods from image processing and computer vision have been applied to coronary IVUS images [3][4][5][6][7][8]. Traditional image processing methods, including graph search, active surfaces, and active contours, were applied to segment IVUS images based on local image properties or global gray-level properties. The 3D fast marching method [7,8], incorporating the texture gradient and the gray-level gradient, was applied to segment the walls of the coronary artery with an interactive initialization of the EEM borders. In recent years, deep learning has been widely applied in medical image analysis and has achieved remarkable results [13][14][15]. It has also been used to detect the lumen and media-adventitia borders in IVUS owing to its capabilities in automatic feature extraction [16][17][18].
In this paper, we describe the development and evaluation of a U-Net [19] based pipeline that automatically segments the lumen and EEM-CSA of 2D IVUS images. The pipeline has two major steps: first, the data augmentation of MeshGrid combined with flip and rotation operations (MeshGrid + Flip + Rotate) is performed on raw IVUS images; second, an 8-layer deep U-Net is used for pixel-level prediction.

Results
Experiments were carried out for segmenting the lumen and EEM-CSA with four augmentation strategies. A particularly obvious difference is shown in the 2nd column of Fig. 3.

Discussion
The IVUS images varied significantly, from the intensity gradient of the lumen edge to the contour curvature of plaques. The current dataset was limited to a single vendor and a single brand. However, the proposed method provided acceptable segmentation results for both the lumen and EEM-CSA on the frames in the dataset. On visual comparison, the EEM-CSA segmentation performance was lower for complex frames while comparably good for simple frames; excellent results were seen for lumen segmentation in all cases. The segmentation of bifurcation images might be difficult due to the ambiguous vessel definition. Shadows behind calcified plaques might also cause trouble in training the model. The dataset needs to be enriched in the future to test the robustness of the proposed method.
The data augmentation of MeshGrid + Flip + Rotate helped improve the segmentation performance. It generalized well enough to eliminate the outliers (2nd column of Fig. 4). Neither pre-processing nor post-processing steps were necessary. The model trained well on the current dataset, providing an MIoU of 0.941 for lumen predictions. However, when the testing set deviated far from the training set, e.g., with serious artifacts, mixed plaques, and branch vessels, the accuracy for EEM-CSA became relatively low (MIoU of 0.750). This can be improved substantially when more coronary IVUS data of different categories are collected for training in the future.
From a clinical perspective, a clinical threshold to assess the quality of the method should be provided by expert physicians to interpret the segmentation results on complex or simple frames. For example, a fast pullback through a calcified lesion may cause a loss of image features, an increase of catheter artifacts, and calcified shadows from the echogenicity of the lumen and plaque textures, which are tougher to annotate even for experienced physicians. Cardiac-cycle motion and coronary vessel pulsation due to heart-rate variability or arrhythmia might push the catheter against the vessel and plaque boundaries, which increases the artifacts and motion jitter in the IVUS images.
Our future research is to extend the current dataset to enhance the robustness and generalization of the method presented in this paper. A heterogeneous dataset of IVUS images shall cover different medical centers and different probe frequencies from different vendors. More IVUS image categories from different artery pullback sections and with different characteristics should be considered: plaques, bifurcations, branches, shadow artifacts, stents, catheter artifacts, etc. Each frame shall be cross-labeled by three expert physicians according to the respective categories to assess the method, which will make it more convincing.

Conclusion
In this paper, an 8-layer U-Net is proposed with the data augmentation of MeshGrid + Flip + Rotate, which specifically fits the coronary IVUS lumen and EEM-CSA segmentation task. The experimental results show its superiority in segmentation accuracy and efficiency. Furthermore, it provides a good start for image-based gating to implement 3D IVUS reconstruction when fused with X-ray projections, which allows fluid and dynamic analysis of the plaques and vascular walls of coronary arteries.

Method
In this section, we first introduce the coronary IVUS dataset used for training and testing. Then, the 8-layer deep U-Net architecture that predicts the masks for the lumen and the EEM-CSA of IVUS images is presented. The training details are described, and the metric for evaluating the proposed method is illustrated.

Dataset and Augmentation
We use the coronary IVUS dataset from The Second Affiliated Hospital of Zhejiang University School of Medicine. It consists of in vivo pullbacks of coronary arteries acquired by the iLab IVUS system from Boston Scientific Corporation equipped with the 40-MHz OptiCross catheter. It contains IVUS frames from 30 patients, chosen at the end-diastolic cardiac phase in DICOM format, with a resolution of 512×512 for each frame. The dataset is split into two parts: 567 frames for training and 108 frames for testing. The training set is used for building the deep learning model and the testing set is used to evaluate the model performance.
IVUS images contain the catheter, lumen, endothelium, intima, media, external elastic membrane, adventitia, and atherosclerotic plaque. The external elastic membrane is usually treated as the border between the media and adventitia. The media appears gray or dark as it contains dense smooth muscle. The adventitia is similar to the external tissues surrounding the vascular walls. The endothelium and intima are thinner than the lumen and media. Thus, the lumen and EEM-CSA can be manually annotated by experienced physicians as the ground truth for metric evaluation. Each IVUS frame has been manually annotated for the lumen and EEM-CSA in the short-axis view by three clinical experts from the Cardiology Department who work daily with the specific IVUS brand, as shown in Figure 1. Each expert is blinded to the other two experts' annotations, and every frame is labeled independently by each of the three experts to ensure the correctness and blindness of the annotations.
The training set comprises 567 frames, which is not large enough for training a CNN model from scratch. Data augmentation is therefore essential for better performance. The augmentation is twofold and performed online. First, the coronary IVUS raw images and the corresponding ground truth are randomly flipped and rotated; second, the MeshGrid operation is combined with these flip and rotation operations (MeshGrid + Flip + Rotate).
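The paper does not spell out the exact MeshGrid formulation, so the following is only a minimal sketch of one plausible reading: random flips and 90° rotations applied identically to image and label, with normalized x/y coordinate grids stacked as extra input channels. The function name `augment` and the coordinate-channel layout are our own assumptions, not the authors' implementation.

```python
import numpy as np

def augment(image, mask, rng=np.random.default_rng()):
    """Hypothetical MeshGrid + Flip + Rotate augmentation for one frame.

    image: (512, 512) IVUS frame; mask: matching (512, 512) label map.
    """
    # Random flips, applied identically to image and ground truth.
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    # Random rotation by a multiple of 90 degrees.
    k = int(rng.integers(0, 4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # MeshGrid (assumed): append normalized coordinate channels
    # after the geometric operations, CoordConv-style.
    h, w = image.shape
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w),
                         indexing="ij")
    stacked = np.stack([image, xs, ys], axis=-1)  # (512, 512, 3)
    return stacked, mask

img = np.zeros((512, 512), dtype=np.float32)
msk = np.zeros((512, 512), dtype=np.int32)
x, y = augment(img, msk)
```

Because the augmentation is performed online, each epoch sees a different random combination, effectively multiplying the 567 training frames.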

Model Architectures
The U-Net is one type of fully convolutional network (FCN) [20] and is the most common convolutional network architecture for biomedical image segmentation. It consists of encoder and decoder parts and predicts a segmentation mask at the pixel level instead of performing image-level classification. The encoder part performs down-sampling and extracts higher-level features. The decoder part up-samples the output from the encoder part and concatenates the feature maps of the corresponding layer via skip-connections. The skip-connections relieve the gradient-diffusion problem caused by deep layers. The final decoder layer is activated by softmax to produce the class probability map that recovers the segmentation predictions.
The encoder part has 9 blocks, each incorporating two repeated operations of 3×3 convolution, batch normalization (BN), and LeakyReLU activation. The down-sampling operation of 3×3 convolution with stride 2×2 reduces the feature maps by half. The size of the 8th block is 2×2 to capture the deeper abstract information. The decoder part has 8 blocks to restore the image dimensions. Each up-sampling operation contains a 5×5 deconvolution with stride 2×2. The skip-connection concatenates the corresponding feature maps. The last 1×1 convolution outputs the probability map of mask-class predictions via softmax activation. The entire architecture is shown in Figure 2.

Implementation Details
The model was trained and evaluated on a Dell PowerEdge T640 server with a Xeon Silver 4114 processor, 128 GB of RAM, and four Nvidia GTX 1080Ti graphics cards. Training took less than 90 minutes, and inference took 10 ms per image.
The deep learning framework used in this study was TensorFlow 1.13. The optimizer was Adam [21], which was fast and robust. The weights were initialized randomly and the batch size was set to 16. The initial learning rate was 0.001 with a decay of 0.1 every 2000 iterations. A total of 8000~10000 iterations were done for training. The lumen and EEM-CSA were trained and predicted in one shot with the softmax function as the output activation, which gave each pixel its class probability. The loss function was the sparse softmax cross entropy:

L = -Σ_{j=1}^{K} y_j log(p_j)    (1)

with K being the number of classes, p_j being the predicted probability of belonging to class j, and y_j being the true (one-hot) probability.
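The per-pixel loss can be sketched in plain numpy; with one-hot true probabilities, the sum over classes reduces to the negative log-probability of the labeled class, which is what TensorFlow's sparse variant computes from integer labels. This is an illustrative re-implementation, not the paper's training code.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Mean sparse softmax cross entropy over pixels.

    logits: (N, K) raw class scores; labels: (N,) integer class ids.
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # With one-hot y, -sum_j y_j * log(p_j) reduces to -log(p_label).
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([0, 1])
loss = sparse_softmax_cross_entropy(logits, labels)
```

For uniform logits the loss equals log(K), a useful sanity check that a K-class model has started learning once its loss drops below that value.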

Evaluation Criteria
In semantic segmentation, the mean intersection over union (MIoU), alternatively called the Jaccard measure (JM), is the standard metric to evaluate the model. We compute the MIoU score between the ground truth and the predicted masks:

MIoU = (1 / (k + 1)) Σ_{i=0}^{k} p_ii / (Σ_{j=0}^{k} p_ij + Σ_{j=0}^{k} p_ji − p_ii)

with k being the number of classes excluding background and p_ij being the number of pixels of class i predicted as class j.
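Using the p_ij counts defined in the text, the metric follows directly from a confusion matrix; a minimal numpy sketch (the function name `mean_iou` is ours, not from the paper):

```python
import numpy as np

def mean_iou(conf):
    """MIoU from a (k+1)x(k+1) confusion matrix (background = class 0).

    conf[i, j] is the number of pixels of true class i predicted as
    class j. IoU per class i = p_ii / (row_sum_i + col_sum_i - p_ii).
    """
    conf = np.asarray(conf, dtype=np.float64)
    true_pos = np.diag(conf)
    denom = conf.sum(axis=1) + conf.sum(axis=0) - true_pos
    iou = true_pos / np.maximum(denom, 1e-12)  # guard empty classes
    return iou.mean(), iou

# Toy 2-class example: rows are ground truth, columns are predictions.
conf = np.array([[50, 2],
                 [3, 45]])
miou, per_class = mean_iou(conf)
```

Reporting the per-class IoU alongside the mean, as done for the lumen (0.941) and EEM-CSA (0.750), avoids a strong class masking a weak one.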

Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.