To fuse the deep and radiomic features of MRI images and establish an automatic prenatal prediction and typing model for placental invasion, this research comprises data collection, ROI extraction, deep and radiomic feature extraction, and classification network training. The process flow diagram is shown in Fig. 3.
In the figure, the U-net is first trained on the ROI data annotated by the radiologists so that it can segment placental tissue from the original MRI images. The trained U-net is then used to extract placental tissue automatically, and the segmented region is expanded to determine a more suitable ROI from which the deep and radiomic features can be extracted. Finally, a multilayer perceptron network combining the radiomic and deep features is established to realize the prenatal diagnosis of placental invasion.
Data collection
Collecting the necessary MR images and related clinical materials is fundamental to the prenatal evaluation of placental invasion. The MR images and clinical materials were collected from the Affiliated Hospital of Medical College of Ningbo University and Ningbo Women's and Children's Hospital, covering the period from January 2017 to November 2020. All included cases were suspected of placental invasion by ultrasonography or clinical examination. The surgical (delivery) and postoperative pathological data of the relevant patients were also obtained.
All MRI examinations were performed by radiologists with more than 5 years of work experience using 1.5 Tesla units with 8- or 16-channel array sensitivity-encoded abdominal coils. The imaging equipment of the Affiliated Hospital of Medical College of Ningbo University was a GE Signa TwinSpeed 1.5T superconducting dual-gradient magnetic resonance scanner with an 8-channel body phased-array coil. The imaging equipment of Ningbo Women's and Children's Hospital was a Philips Achieva Noval Dual 1.5T superconducting dual-gradient magnetic resonance scanner with a 16-channel body phased-array coil. Before the MR examination, patients were asked to moderately fill the bladder and received respiratory training. During scanning, patients lay supine, head first. All sequences were scanned in three orientations: transverse, sagittal, and coronal. The scanned images were stored in the hospital's Picture Archiving and Communication System (PACS) in the Digital Imaging and Communications in Medicine (DICOM) format.
The inclusion criteria were as follows:
(1) Patients who underwent MRI examination after 30 weeks of gestation with a T2-weighted imaging (T2WI) sequence; (2) Patients with a definite diagnosis of placental invasion or pathological records after cesarean section; (3) Patients with good image quality meeting the diagnostic requirements. The exclusion criteria were as follows: (1) Patients without T2WI MRI data; (2) Patients without clinical or surgical pathology confirmation; (3) Patients with severe image artifacts due to fetal movement or poor cooperation of the pregnant woman.
According to the above criteria, we collected data from 352 patients at the Affiliated Hospital of Medical College of Ningbo University and Ningbo Women's and Children's Hospital: 147 cases without placental invasion and 205 cases with placental invasion. Among the 205 cases of placental invasion, 66 were placenta accreta, 117 were placenta increta, and 22 were placenta percreta. The data were divided into a train set of 189 cases and a test set of 163 cases. The train set contained 84 cases without placental invasion, 34 cases with placenta accreta, 60 cases with placenta increta, and 11 cases with placenta percreta. The test set contained 63 cases without placental invasion, 32 cases with placenta accreta, 57 cases with placenta increta, and 11 cases with placenta percreta.
The main signs of placental invasion in MR imaging are as follows: uneven placental signal on T2WI (low, slightly high, or mixed high signal shadows within the placenta); local irregular thinning or disappearance of the moderately or slightly hyperintense myometrium on T2WI; localized abnormal protrusion of the placenta and/or uterus on T2WI; and low-signal strip shadows within the placenta on T2WI [32]. T2WI is therefore the main reference imaging sequence for the clinical diagnosis of placenta accreta. In this study, transverse, coronal, and sagittal T2WI sequences were selected to study the auxiliary diagnosis of placental invasion.
ROI extraction
Extraction of the ROI is the basis of computer-aided diagnosis of placental invasion. Based on U-net, we established a placental tissue segmentation model to extract the ROI automatically. First, a set of MR images was selected, and two radiologists with more than 5 years of working experience annotated the placental tissue region and outlined the placental boundary. The annotation software was ITK-SNAP (version 3.6.0, download website: http://www.itksnap.org/). To ensure the accuracy of the placental region annotation, the two radiologists annotated each image separately, and the intersection of the labeled areas was taken. If the regions annotated by the two radiologists diverged significantly, another radiologist with more than 10 years of working experience was invited to evaluate the labeling results, and the final result was given after negotiation among them. Figure 4 shows the placental tissue boundary on T2WI labeled by the radiologists; panels (a), (b), and (c) show the transverse, sagittal, and coronal planes, respectively.
In this study, 490 T2WI images were selected, and their annotated placental regions served as the ground truth for training the U-net. U-net has recently been applied successfully to image segmentation, especially medical image segmentation. Through end-to-end training on very few images, U-net can locate target boundaries accurately. The U-net consists of two paths: down-sampling and up-sampling. The down-sampling path encodes image semantic information through level-by-level convolution and pooling, while the up-sampling path decodes the spatial and multi-scale information by step-by-step de-convolution, acquiring multi-level features of the image while performing target segmentation. To compensate for the loss of spatial and boundary information in the encoding stage, the corresponding feature maps of the encoder and decoder are fused by concatenation through skip connections. By fusing low-level spatial information with high-level semantic information, the decoder obtains more high-resolution information during up-sampling, recovers the details of the original image more faithfully, and thus improves segmentation accuracy.
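To make the skip-connection fusion concrete, the following is a minimal PyTorch sketch of a two-level U-net; the channel sizes and depth are illustrative, not the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One U-net stage: two 3x3 convolutions with ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class TinyUNet(nn.Module):
    """Two-level U-net showing the encoder-decoder skip connection."""
    def __init__(self):
        super().__init__()
        self.enc1 = ConvBlock(1, 32)                 # T2WI slices are single-channel
        self.pool = nn.MaxPool2d(2)
        self.enc2 = ConvBlock(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = ConvBlock(64, 32)                # 64 = 32 upsampled + 32 skipped
        self.head = nn.Conv2d(32, 1, 1)              # placenta vs. background logit

    def forward(self, x):
        e1 = self.enc1(x)                            # high-resolution spatial features
        e2 = self.enc2(self.pool(e1))                # deeper semantic features
        d1 = self.up(e2)
        d1 = self.dec1(torch.cat([d1, e1], dim=1))   # skip connection fuses encoder map
        return self.head(d1)
```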
Although U-net can segment the placental area accurately, for subsequent placental invasion typing the placental tissue alone cannot fully characterize the relationship between the placenta and neighboring tissues and organs. Placental invasion is related not only to the characteristics within the placental region but also to the characteristics of the boundary between the placenta and the uterine myometrium [33]. In addition, according to relevant reports, placental infiltration of the bladder and other tissues and organs adjacent to the placenta is a specific sign for the diagnosis of penetrating placental invasion [35]. To evaluate the discriminative power of peri-placental pixels for placental invasion, the placental region segmented by U-Net was extended by 10, 20, 40, and 60 pixels to form candidate ROIs for the subsequent diagnosis of placental invasion. The most suitable extension was determined through the follow-up placental invasion diagnosis experiments. Based on the ROI, the deep and radiomic features were extracted to construct the evaluation model of placental invasion typing. Figure 5 shows an example of the placental tissue segmented by U-Net and the ROIs formed by extending the boundary by different sizes.
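The boundary extension can be implemented as an iterative binary dilation of the U-Net mask. Below is a brief sketch using SciPy, under the assumption that the 10-, 20-, 40-, and 60-pixel margins from the text correspond to the number of one-pixel dilation iterations.

```python
import numpy as np
from scipy import ndimage

def extend_roi(placenta_mask: np.ndarray, margin_px: int) -> np.ndarray:
    """Grow a binary placenta mask outward by `margin_px` pixels.

    Each iteration of binary_dilation with a 3x3 cross expands the mask
    boundary by one pixel, so `margin_px` iterations give the candidate
    10-, 20-, 40-, or 60-pixel ROI extensions.
    """
    structure = ndimage.generate_binary_structure(2, 1)  # 4-connected cross
    return ndimage.binary_dilation(placenta_mask, structure=structure,
                                   iterations=margin_px)

# Example: build the four candidate ROIs from one segmented slice.
mask = np.zeros((256, 256), dtype=bool)
mask[100:150, 110:160] = True  # stand-in for a U-net segmentation result
rois = {m: extend_roi(mask, m) for m in (10, 20, 40, 60)}
```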
Extraction of radiomics features
After the ROI is obtained, we use PyRadiomics (version 3.0, download address: https://github.com/Radiomics/pyradiomics) [36] to extract radiomic features for training the auxiliary diagnosis model of placental invasion. The extracted features fall into three categories: (1) intensity-based features; (2) shape-based features; (3) texture-based features. Intensity-based features summarize the ROI as a single histogram describing the distribution of pixel intensities and derive basic statistics from it (such as energy, entropy, kurtosis, and skewness). Shape-based features describe the geometric structure of the ROI, which is useful because the shape of placental tissue is highly correlated with placental invasion [32, 37]. Texture-based features are the most informative, especially for assessing tissue heterogeneity, because they capture the spatial relationships between adjacent pixels [8, 9, 34]. In this paper, we use the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), and gray-level size zone matrix (GLSZM), among others, to calculate the texture features. In total, we extracted 100 image features: 18 intensity-based, 9 shape-based, and 73 texture-based.
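As an illustration, a minimal PyRadiomics extraction script follows; the enabled feature classes mirror the three categories above, but the paper does not give its exact extraction settings or file names, so those are assumptions here.

```python
import SimpleITK as sitk
from radiomics import featureextractor

# Illustrative settings; the exact configuration used in the study is not given.
extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)
extractor.disableAllFeatures()
extractor.enableFeatureClassByName('firstorder')   # intensity (histogram) features
extractor.enableFeatureClassByName('shape2D')      # geometric descriptors of the ROI
for texture_class in ('glcm', 'glrlm', 'glszm'):   # texture matrices named in the text
    extractor.enableFeatureClassByName(texture_class)

image = sitk.ReadImage('t2wi_slice.nii.gz')        # hypothetical file names
mask = sitk.ReadImage('roi_mask.nii.gz')           # extended ROI from the previous step
features = {k: v for k, v in extractor.execute(image, mask).items()
            if not k.startswith('diagnostics_')}   # drop metadata entries
print(len(features), 'radiomic features extracted')
```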
Extraction of deep features
Radiomic features can describe the gray-level distribution, shape, texture, and other characteristics of placental tissue in MR images, but they can hardly describe the overall structural relationship between the lesion and the surrounding tissues, which is of great significance for diagnosing placental invasion and evaluating its degree. In recent years, image features extracted by deep convolutional neural networks (DCNN) have proved effective in improving the accuracy of image classification, segmentation, and retrieval [38]. We cast the prenatal prediction and typing of placental invasion as a classification problem. Accordingly, a deep dynamic convolutional neural network (DDCNN) was designed to extract the deep features based on the characteristics of placental invasion in MR images. The structure of the DDCNN is shown in Fig. 6. As can be seen from Fig. 6, the backbone of the DDCNN is a multi-layer auto-encoding network composed of an encoder and a decoder with a symmetrical structure [39]. During network training, we crop the original MR image to the smallest bounding rectangle of the extracted ROI and use this crop as the input of the DDCNN. In the encoding stage, the input image passes through five stages of Group Model A convolution and pooling and is mapped into a feature vector representing the semantic information of the placenta. In the decoding stage, the feature vector output by the encoder passes through five stages of Group Model B up-sampling and de-convolution, with the reconstruction of the original input image as the training target. After DDCNN training is completed, we remove the decoder, fix the encoder parameters, and feed in the smallest bounding rectangle containing the ROI of an MR image; the encoder then projects it into a low-order feature space, realizing the extraction of the deep features. With this structure, the DDCNN extracts the deep features of MR images through unsupervised training, avoiding the problems of traditional CNNs in this setting, namely the difficulty of obtaining labeled samples and the tendency to overfit.
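The following is a minimal sketch of such an auto-encoding feature extractor in PyTorch. Plain convolutions stand in for the dynamic convolutions of Group Model A/B, and the channel sizes are illustrative; the 100-dimensional code is chosen to match the deep-feature dimension used by the classifier below.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Simplified stand-in for the DDCNN: 5-stage conv encoder, mirrored decoder."""
    def __init__(self, feat_dim=100):
        super().__init__()
        chans = [1, 16, 32, 64, 128, 256]
        enc = []
        for cin, cout in zip(chans[:-1], chans[1:]):        # 5 conv + pool stages
            enc += [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout),
                    nn.ReLU(inplace=True), nn.MaxPool2d(2)]
        self.encoder = nn.Sequential(*enc, nn.Flatten(),
                                     nn.LazyLinear(feat_dim))  # deep feature vector
        dec = [nn.Linear(feat_dim, 256 * 4 * 4), nn.Unflatten(1, (256, 4, 4))]
        for cin, cout in zip(chans[:0:-1], chans[-2::-1]):  # 5 upsample + conv stages
            dec += [nn.Upsample(scale_factor=2),
                    nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True)]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):                       # reconstruction target = the input itself
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.randn(8, 1, 128, 128)                 # batch of ROI bounding-box crops
loss = nn.functional.mse_loss(model(x), x)      # unsupervised reconstruction loss
deep_features = model.encoder(x)                # after training: drop decoder, keep encoder
```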
The Group Model A and Group Model B of the DDCNN are shown in Fig. 7. Each level of the Group Model mainly consists of a dynamic convolutional layer [40], a ReLU activation layer [41], a BN layer [42], and a max-pooling or up-sampling layer.
It can be seen from Fig. 7 that in the Group Model, the traditional convolution operation is replaced by dynamic convolution. Dynamic convolution replaces the traditional fixed convolution kernel with a kernel that adapts to the input image. Specifically, it uses multiple convolution kernels and introduces a lightweight squeeze-and-excitation module [30] to build an attention model. Through model training, the respective weights of the multiple convolution kernels are obtained, and the dynamic convolution kernel formed by their weighted superposition participates in the convolution operation. Suppose the multiple convolution kernels introduced in a certain convolutional layer are \({\text{conv}}_1,{\text{conv}}_2,\ldots ,{\text{conv}}_N\), with respective weights \(w_1,w_2,\ldots ,w_N\). The squeeze-and-excitation structure introduced in the model is shown in the attention module of Fig. 7: the input is processed by average pooling, fully connected layers, and ReLU, and is finally mapped by softmax to the weights \(w_1,w_2,\ldots ,w_N\) [40], where:
$$\begin{aligned} w_1+w_2+\ldots +w_N=1 \end{aligned}$$
(8)
where N denotes the number of convolution kernels.
After the weights of each convolution kernel are obtained, the dynamic convolution kernel involved in feature extraction can be constructed through weighted superposition [40]:
$$\begin{aligned} {\text{conv}}=\sum _{i=1}^{N}{w_i*{\text{conv}}_i} \end{aligned}$$
(9)
where \({\text{conv}}_i\) denotes the ith convolutional kernel, \(w_i\) represents the weight of the corresponding ith convolution generated by the attention module, N indicates the number of convolutional kernels, and conv is the synthesized convolutional kernel, which means the final convolutional kernel involved in the operation in Group Model A.
With this structure, the convolution kernel will be adjusted adaptively with the input image during convolution operation, which can better adapt to the structural heterogeneity of placenta tissue of different patients and different types of placental invasion, so that the extracted deep features can effectively describe the pathological information contained in placental tissue.
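As a concrete illustration of Eqs. (8) and (9), the following PyTorch sketch implements a dynamic convolution layer: a lightweight squeeze-and-excitation-style attention branch produces softmax weights over N candidate kernels, and their weighted sum serves as the convolution kernel for each input. The layer sizes are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Dynamic convolution: per-input weighted combination of N kernels (Eqs. 8-9)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, reduction=4):
        super().__init__()
        # N candidate kernels conv_1 ... conv_N
        self.kernels = nn.Parameter(
            0.02 * torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size))
        hidden = max(in_ch // reduction, 4)
        # Squeeze-and-excitation-style attention: pool -> FC -> ReLU -> FC
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_kernels))
        self.padding = kernel_size // 2

    def forward(self, x):
        b, c, h, w = x.shape
        attn = F.softmax(self.attention(x), dim=1)   # Eq. (8): weights sum to 1
        # Eq. (9): synthesize one kernel per sample as the weighted sum of N kernels
        kernel = torch.einsum('bn,noikl->boikl', attn, self.kernels)
        out_ch = kernel.shape[1]
        # Grouped-convolution trick: apply each sample's own kernel to its own image
        out = F.conv2d(x.reshape(1, b * c, h, w),
                       kernel.reshape(b * out_ch, c, *kernel.shape[-2:]),
                       padding=self.padding, groups=b)
        return out.reshape(b, out_ch, h, w)

layer = DynamicConv2d(in_ch=16, out_ch=32)
y = layer(torch.randn(2, 16, 64, 64))                # -> shape (2, 32, 64, 64)
```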
Prenatal prediction and typing of placental invasion
Based on the above features, we train a multi-layer perceptron classifier to divide patients into four types according to the T2WI images: no placental invasion, placenta accreta, placenta increta, and placenta percreta, thereby realizing the prenatal prediction and typing of placental invasion. The constructed classifier is shown in Fig. 8. The input of the classifier is the radiomic features extracted from the ROI and the deep features extracted by the DDCNN encoder; to maintain balance between the two types of features, each is set to 100 dimensions. When training the classifier, the results confirmed by clinical or surgical pathology are used as the supervision information. The classifier consists of four layers with 200, 100, 20, and 4 neurons, respectively. The activation function of the middle layers is ReLU, and the output of the last layer is a 4-dimensional vector activated by softmax, which is commonly used for multi-class classification.
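A minimal PyTorch sketch of this classifier follows, assuming the fused 200-dimensional input (100 radiomic plus 100 deep features) and the layer widths of Fig. 8; during training, a cross-entropy loss on the logits would apply the softmax implicitly.

```python
import torch
import torch.nn as nn

class InvasionClassifier(nn.Module):
    """Four-layer perceptron from Fig. 8: 200 -> 100 -> 20 -> 4 neurons."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(200, 100), nn.ReLU(inplace=True),
            nn.Linear(100, 20), nn.ReLU(inplace=True),
            nn.Linear(20, 4))                  # 4 logits, softmax applied afterwards

    def forward(self, radiomic_feats, deep_feats):
        x = torch.cat([radiomic_feats, deep_feats], dim=1)  # feature-level fusion
        return self.net(x)

model = InvasionClassifier()
logits = model(torch.randn(8, 100), torch.randn(8, 100))
probs = torch.softmax(logits, dim=1)   # Eq. (10): probabilities of the four types
pred = probs.argmax(dim=1)             # type with the highest probability
```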
The softmax [40] first amplifies the differences between the input values through a nonlinear exponential operation with base e, then maps the outputs of the multiple neurons to values in the (0,1) interval and normalizes them into a probability distribution for multi-class classification, as shown in formula (10).
$$\begin{aligned} S_i=\frac{e^{y_{i}}}{\sum _{j=0}^{3}{e^{y_{j}}}},\quad 0\le i\le 3 \end{aligned}$$
(10)
where \(y_i\) is the ith output of the classifier and \(S_i\) is the probability that the patient belongs to the corresponding one of the four types: no placental invasion, placenta accreta, placenta increta, and placenta percreta. The type with the highest probability is taken as the final prediction result.