Non-negative discriminative brain functional connectivity for identifying schizophrenia on resting-state fMRI

Background Schizophrenia is a clinical syndrome, and its causes have not been well determined. The objective of this study was to investigate the alteration of brain functional connectivity between schizophrenia and healthy control, and present a practical solution for accurately identifying schizophrenia at single-subject level. Methods 24 schizophrenia patients and 21 matched healthy subjects were recruited to undergo the resting-state functional magnetic resonance imaging (rs-fMRI) scanning. First, we constructed the brain network by calculating the Pearson correlation coefficient between each pair of the brain regions. Then, this study proposed a novel non-negative discriminant functional connectivity selection method, i.e. non-negative elastic-net based method (N2EN), to extract the alteration of brain functional connectivity between schizophrenia and healthy control. It ranks the significance of the connectivity with a uniform criterion by introducing the non-negative constraint. Finally, kernel discriminant analysis (KDA) is exploited to classify the subjects with the selected discriminant brain connectivity features. Results The proposed method is applied into schizophrenia classification, and achieves the sensitivity, specificity and accuracy of 100, 90.48 and 95.56%, respectively. Our findings also indicate the alteration of functional network can be used as the biomarks for guiding the schizophrenia diagnosis. The regions of cuneus, superior frontal gyrus, medial, paracentral lobule, calcarine fissure, surrounding cortex, etc. are highly relevant to schizophrenia. Conclusions This study provides a method for accurately identifying schizophrenia, which outperforms several state-of-the-art methods, including conventional brain network classification, multi-threshold brain network based classification, frequent sub-graph based brain network classification and support vector machine. Our investigation suggested that the selected discriminant resting-state functional connectivities are meaningful features for classifying schizophrenia and healthy control.


Background
Schizophrenia is a severely debilitating psychiatric disease, and its symptoms include obstacles in thinking, perception, emotion, behavior, etc. [1]. According to estimation of the world health organization, the global lifetime risk of schizophrenia is about 3.8-8.4‰. Schizophrenia poses a huge burden on patients, families and society. So it is very important to investigate the accurate diagnosis method for schizophrenia.
Schizophrenia is often considered to be a disorder of connectivity between brain regions. However, among the large-scale brain network, the alteration of the connectivity between schizophrenia and healthy control has not been well investigated. With advance in machine learning [2,3] and modern imaging technologies, e.g., functional magnetic resonance imaging (fMRI) providing the functional interaction of the brain regions, more and more attention has been focused on machine learning based disease diagnosis using fMRI [4][5][6][7][8].
In general, the machine learning based diagnosis methods using fMRI can be summarized as two categories, i.e. the voxel based methods and the brain network [9][10][11] based methods. The voxel based methods need to construct the correlation model for each pair of voxels. For example, Demirci et al. used independent component analysis (ICA) to obtain independent component (IC) spatial maps from the voxels of fMRI data, and exploited projection pursuit algorithm for classification [8]. A three-phase feature selection method was proposed for diagnosis of schizophrenia using fMRI [12]. Cao et al. exploited sparse representation classification for voxel based schizophrenia diagnosis and biomarker selection [13]. It is usually hard to obtain enough schizophrenia participants to construct robust voxel network model, and the high computational cost also constrains its application. Besides above voxel based methods, brain network based methods also have been applied into neurodegenerative diseases diagnosis including schizophrenia and Alzheimer's disease [14]. Most of these methods first constructed the brain functional network by measuring the correlation between each pair of brain regions, and then they selected the significant regions with different metrics, e.g. clustering coefficient [14] and Bayesian information criterion (BIC) [15]. Compared to the voxel based methods, the brain network based methods has higher classification efficiency and can provide more intuitive biomarkers for diagnosis.
The mental activity occurring during rest is relevant to the phenomenology of schizophrenia [16]. The resting-state functional network analysis is potentially useful in revealing the pathophysiology of schizophrenia. Compared to task-driven analysis, resting-state functional analysis can provide more complete and accurate brain network features for identifying schizophrenia [17]. Many machine learning algorithms were developed to extract discriminative features from resting-state functional network for schizophrenia [18][19][20]. Shen et al. [21] proposed to use a correlation coefficient method to select the highly discriminative functional connectivity features from resting-state functional network, and then they mapped these features into low-dimensional space by using locally linear embedding (LLE). They found the data in mapping space can reveal the distinct differences of functional patterns between schizophrenia and healthy control. Lynall et al. [22] investigated the some topological properties of the functional connectivity network, and found schizophrenia tends to have more diverse profile of brain functional network. Castro et al. [23] proposed to use multiple kernel learning to fuse Zhu et al. BioMed Eng OnLine (2018) 17:32 the magnitude and phase data in fMRI, and achieved promising result in schizophrenia classification. By using multivariate statistics, Pergola et al. [24] found accounting for the alternation of thalamic would improve the detection of subjects at familial risk for schizophrenia.
In most of the schizophrenia classification problems, the size of the subjects is limited and the number of the features, such as brain connectivities and voxels, is very large. Schizophrenia classification with fMRI data is a typical small sample size learning problem, which leads curse of dimensionality, overfitting and poor generalization ability for classification algorithms. It is well known that feature selection stage plays an important role in improving the classification performance. Therefore, it is very necessary to design an effective feature selection method for schizophrenia classification problem.
Lasso [25,26] is one of the most famous feature extraction methods. It aims to identify the most significant features for a compact data representation by constructing l 1 -norm constraint based sparse representation model between feature and label. In Lasso, most of the sparse representation coefficients are equal or close to zero, and sparse representation coefficients with large amplitude are preserved, which makes it easy to explain the significances of the features in nature. Lasso and its variants have been widely used for removing redundant features in brain disease diagnosis. Rashid et al. exploited Lasso for estimating the dynamic connectivity from resting fMRI and identify differences among schizophrenia and other neurodegenerative diseases [14]. Zhang et al. introduced the group structure into Lasso and applied it into identifying Alzheimer's disease and its early stage. Fused Lasso exploits the ordering of data by explicitly regularizing the differences between neighboring samples [27]. Watanabe et al. used the fused Lasso regularized support vector machine to account for the high dimensional correlation maps of brain [28].
Suppose we have p subjects and each subject has m brain connectivity or voxel features (p ≪ m). Lasso can select at most p features from the data because of the convex optimization problem. On one hand, the limited features selected by Lasso are often difficult to descript the difference between healthy control and schizophrenia in complex brain structure and function. On the other hand, if there are some pairwise correlations among a group of connectivity or voxel features, Lasso selects only one feature and does not consider which one to select. By combining l 1 -norm and l 2 -norm regularizations in regression model, Zou et al. proposed elastic-net feature selection algorithm [29]. Elastic-net takes both automatic feature selection and continuous shrinkage into account, and it can select groups of correlated variables. In elastic-net, the weights of two regular terms can be dynamically adjusted, which provides a flexible way to extract the discriminative connectivities for identifying schizophrenia. In addition, the conventional feature extraction methods often cannot rank the significances of the brain connectivities with a uniform criterion. For example, in Lasso, the significance value of each connectivity could be positive or negative. Both the sign and absolute value of the significance have effect on the rank of the connectivities.
In this paper, we propose a novel feature extraction method, i.e. non-negative elasticnet based method (N2EN), for robust classification of healthy schizophrenia patients and healthy controls. Figure 1 shows the flowchart of the proposed method. We first use the N2EN to select the most significant connectivities from the whole brain network, and obtain the discriminative sub-network for schizophrenia. The N2EN is a multivariate representation model based on the connectivities, which can reflect the global structure of brain network. In N2EN, the significances of the brain connectivities are all positive, so we can rank the brain connectivities according to the absolute values of the significances. Then, we project the features of discriminative sub-network into reproducing kernel Hilbert space by kernel discriminant analysis (KDA) [30], in which the projection is obtained by maximizing the between-class covariance and minimizing the within-class covariance. Finally, the nearest neighbor classifier is used for classification. We applied the proposed method into schizophrenia identification. The results show that our method outperforms conventional brain network based methods and voxel based methods in identifying schizophrenia.

Participants
In this study, all participants were recruited from the department of psychiatry, affiliated Nanjing brain hospital of Nanjing medical university. 24 schizophrenia patients were recruited. All the recruited subjects were diagnosed by expert consensus panels. We used the positive and negative syndrome scale (PANSS) to obtain the score of severity of schizophrenia. PANSS is a widely used medical scale for measuring symptom severity of schizophrenia patients [31]. For each patient, an approximately 45-min clinical interview is conducted. The patient is rated from 1 to 7 on 30 different symptoms based on the interview as well as reports of family members or primary care hospital workers. The higher the PANSS score is, the more serious the symptoms are. Patients were not included if they had a prior history of organic brain diseases, infectious diseases, alcohol or drug abuse. 21 healthy subjects with matched age, gender, and average education were recruited as a control group. The inclusion criteria are as follows: (1)   organic brain disease, infectious diseases or other chronic somatic diseases. Table 1 shows demographic characteristics and clinical variables of the participants.

Image acquisition
Data acquisition was performed using a 3-T Siemens Tim-Trio scanner with a 12-channel head coil. The resting-state fMRI (R-fMRI) images of each participant were acquired with the following parameters: flip angle = 90°, TR/TE = 2000/30 ms, imaging matrix = 64 × 64, FOV = 256 × 256 mm, 36 slices, 180 volumes, and voxel thickness = 3 mm. During scanning, all subjects were instructed to keep their eyes open and stare at a fixation cross in the middle of the screen for 5 min.

Pre-processing
For each participant, the image prepossessing step is as follows: the first 10 volumes of functional time series were discarded because of the instability of MRI signal. Then, the leaving volumes were slice acquisition corrected, head-motion corrected, normalized to the SPM5 Montreal Neurological Institute template, and re-sampled to 3-mm 3 voxels. After linear detrending, data was filtered using typical temporal bandpass (0.01-0.08 Hz), slow-5 bandpass (0.01-0.027 Hz), and slow-4 bandpass (0.027-0.073 Hz), respectively. Next, the motion parameters, the global mean signal, WM, CSF as nuisance covariates were used to reduce the effects of head motion and non-neuronal BOLD fluctuations.

Identifying the alteration of connectivity by N2NE
Suppose we have the time series from m brain regions with a system of fMRI volumes. We can construct the brain functional network by the commonly used Pearson correlation coefficient [2]. Among a large number of brain connectivities, it is necessary to extract the alteration or discriminative connectivities for schizophrenia. Let x i, j denote the ith connectivity feature of the brain network from jth subject, and y j denotes the class label of jth subject. For estimating the significance of the connectivity, we construct the following model: where . , a m ] T , and e denotes the model error. Using l 2 -norm or Tikhonov regularization to prevent over-fitting, the vector of regression coefficients can be obtained by solving the following problem: where γ is the regular parameter. The coefficient vector a obtained by above model is not sparse, which implies it is difficult to select the discriminative feature by comparing the values of these coefficients. Furthermore, there are positive and negative values in representation coefficients, which lead to the significance of the connectivity hard to rank with a uniform criterion. In this paper, we proposed non-negative sparse representation and ridge regression to guarantee an optimum. The objective function is as follows: where ||.|| 1 denotes the l 1 -norm of the vector, e.g. ||a|| 1 = |a i | . γ 1 and γ 2 are the nonnegative parameter controlling the combination of the two regular terms. Figure 2 shows the comparison between the mixed norms used in elastic-net and the other norms. We define the following two augmented sets: The size of X * and y * are (n + m) × m and n + m , respectively. Let α = γ 1 (1 + γ 2 ) −1/2 and a * = (1 + γ 2 ) −1/2 a . Then the objective function in Eq. (3) becomes So computing the solution of the proposed feature selection problem is equivalent to solving a non-negative sparse representation problem on augmented sets.
where v is the vector whose elements are all 1, Eq. (5) is equivalent to: Because X T X is nonnegative, symmetrical and semi-definite, the above problem is convex and has globally optimal solution. The non-negative elastic-net feature selection not only provides interpretation to the importance of the connectivity between two brain regions, but also tolerates data noise existing in brain network to some extent. We perform the non-negative elastic-net on brain network data, and the selected connectivities form a brain sub-network, which is referred to as discriminative sub-network for identifying schizophrenia (DSNIS). The ratio of the number of removed connectivities and the number of all the connectivities in original brain network are defined as the sparsity of DSNIS. The DSNIS can be expected has strong ability to simultaneously identify schizophrenia and evaluate the importance of the connectivity between two brain regions for schizophrenia. Elastic-net is useful when multiple features are associated with another feature. Because, in above case, Lasso tends to pick one of them at random, while the elastic-net prefers to pick two, which alleviates the small sample size problem in schizophrenia classification. The non-negative elastic-net is a unified model to non-negative Lasso and ridge regression. When γ 1 = 0 , the non-negative elastic-net degrades into non-negative ridge regression. When and γ 2 = 0 , it degrades into non-negative Lasso. The proposed non-negative elastic-net is a well-defined convex optimization problem, which ensures the existence and uniqueness of solution.

Classification with significant alteration connectivity
Considering most of the brain network data are linearly non-separable, for achieving good classification performance, we exploit KDA, i.e. a nonlinear method, for classification. Suppose we have a set of DSNIS samples, let Z = {z i }, i = 1, 2, . . . , n . By the nonlinear mapping φ induced by the kernel function f, each sample is transformed into the feature space and becomes φ(z i ), c = 1, 2, . . . , n . In the feature space, KDA seeks the projection directions on which the samples from different classes are far from each other and samples from same class are close to each other simultaneously. The objective function of KDA is as follows: where c is the class number, n i is the sample number of ith class, and z j i represents the ith sample from jth class; u φ and u i φ are the global centroid and the centroid of ith class in the feature space, respectively; S φ b and S φ w are the between-class scatter matrix and within-class scatter matrix in the feature space, respectively. We also define the total scatter matrix in feature space.
, and W is defined as: If KK is singular, we perform Eigen-decomposition on K for obtaining a stable solution of the problem in Eq. (12). For a sample z, its projection in feature space along direction w is: where K (:, i) is the ith column of K. Finally, nearest neighbor classifier is used for disease classification.

Classification for schizophrenia
We applied the proposed method to schizophrenia classification, and compared our method with conventional brain network classification (CNC) [32], multi-threshold brain network based classification (MTNC) [33], frequent sub-graph based brain network classification (FSGNC) [34] and support machine vector (SVM) [20]. Accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are measured to evaluate the performance of these methods, and they can be calculated as follows.
1/n c , if z i and z j are from the same class where TP , TN , FP and FN are the number of the patients correctly predicted, healthy controls correctly predicted, healthy controls predicted as patients, and patients predicted as healthy controls, respectively.
The leave-one-out cross validation is carried out to assess classification performance. In experiment, we run our algorithm and the other brain network based algorithms including CNC, MTNC, FSGNC and SVM on our dataset. According to classification results of each brain network based algorithm, we counted the number of TP , TN , FP and FN , respectively. Then we calculated the above 5 measures for each brain network based algorithm, and the results are shown in Table 2. The results demonstrate that our method outperforms CNC, MTNC, FSGNC and SVM. The accuracies of these brain network based methods verses the variations of network threshold are shown in Fig. 3a. It demonstrates our method has higher accuracy than the other methods with the same network threshold. We also show the performance of our method versus the variation of threshold and sparsity in Fig. 3b.
Several voxel based methods are also compared with our method. Table 3 shows the comparison between our method and several voxel based methods. In Table 3, the  accuracies of the these voxel based methods were the results reported in the relevant references [12,13,36,37].
To evaluate the performance of non-negative sparse representation used in our method, several feature selection methods including t test, Lasso, Tikhonov regularization, Laplacian and sparsity score are also compared. The accuracies of these methods are shown in Fig. 4.
The functional brain network is a kind of topologically complex network. In experiment, the differences in topological properties of functional network are measured by connectivity strength [22], connectivity diversity [22], clustering coefficient [9], overlap score [35] and weighted overlap score [35]. These topological metrics are calculated in group-level, and the results are shown in Table 5.

Alternation of the brain connectivity
We also indicate the biomarks for identifying schizophrenia by extracting its alteration of the brain connectivity. In our method, the discriminative ability of each connectivity for identifying schizophrenia is learned by non-negative elastic-net. We show all the discriminative abilities of the connectivities of the brain network in Fig. 5a. In Fig. 5a, both the horizontal axis and the vertical axis stand for the brain region index used in Automated Anatomical Labeling (AAL) temple [38]. For example, the position in ith row and jth column in Fig. 5a denotes the connectivity between ith brain region and jth brain region, and its weight can be judged by the Table 3 The

accuracies of our method and several voxel based methods
The highest accuracy is in italics

Discussion
For the classification performance, our method achieves the accuracy of 95.56% with the sensitivity of 100%. The experimental results imply that schizophrenia patients may have the altered graph structure of many brain region connectivities and our method is more appropriate in extracting the altered pattern than other methods. Figure 3 demonstrates that our method is robust to the variations of network threshold and sparsity.
As a brain network based method, our method achieves higher accuracy than theses voxel based methods shown in Table 3. More importantly, although our method is performed on resting state-fMRI data, which is usually considered more difficult than classification on fMRI during special task, it still achieves higher accuracy than those methods performed on fMRI during sensorimotor task or auditory oddball (AOD) task. In addition, our method is much more efficient than those voxel based methods in computational cost. Because the number of the brain region connectivity used in our method is much smaller than the voxel number or voxel connectivity number used in voxel based methods.
As can be seen from Fig. 4, our method outperforms t-test, Lasso, Tikhonov regularization, Laplacian and sparsity score. It is noteworthy that although fully connected brain network (brain network threshold is set to 0) are directly used for feature selection and classification, our method still achieves better performance. It implies there exist some weak connectivities having discriminative ability for classification in brain network. As can be seen from Fig. 5a, the discriminative abilities of the connectivities are very sparse, which is a good property for determining biomarks in clinical trials.
Among the related brain regions of the top 10 SAC, cuneus, superior frontal gyrus, medial, middle temporal gyrus, superior temporal gyrus, median cingulate and paracingulate gyri, inferior frontal gyrus, triangular part and precuneus belong to default mode Network (DMN) [39,40], which is a group of areas in the human brain  17:32 characterized, collectively, by functions of a self-referential nature. Buuren et al. also indicates that the alteration of the brain network of schizophrenia and Alzheimer are closely related to DMN [41]. Furthermore, Precentral_L and Paracentral_Lobule_L related to the movement are also selected by our method. Recent studies show that these movement related regions are very important for identifying schizophrenia patients with psychedelic covet condition [42]. As we know, there are positive and negative values produced by the previous feature extraction method, such as Lasso. Because of introducing the non-negative constraint term, all the weight score of the brain connectivity by using the proposed method are positive values, which enable us to rank all the brain connectivities with a uniform criterion.
The results shown in Table 5 demonstrate that the difference between schizophrenia and healthy control in connectivity strength is generally small and not significant. The connectivity diversity of schizophrenia patients group is higher than that of healthy controls group. In other words, there is a variety of differences in brain connectivities between patients, which might be viewed as the functional network evidence for why schizophrenia is a highly heterogeneous disease. The clustering coefficient of the functional network of schizophrenia patients group is lower than that of healthy controls group. It demonstrates that a part of the connectivities may diminish or completely disappear in the functional network of patient. The overlap score and weighted overlap score are designed to capture the influence of network difference from each group [35]. Compared to the other three topological metrics, both of the overlap score and weighted overlap show more obvious difference between the schizophrenia patients and healthy controls. These two metrics are proper in distinguishing schizophrenias and healthy controls in group-level.

Conclusion
In this paper, we proposed a simple and effective feature selection method for identifying schizophrenia. On a real dataset, it achieved the sensitivity, specificity and accuracy of 100, 90.48 and 95.56%, respectively. Compared to conventional brain network significant feature extraction methods, the proposed method can simultaneously alleviate the small sample problem and rank the significance of the connectivity with a uniform criterion. The proposed method not only obtained good performance in classification accuracy, but also provided biomarkers to guide the schizophrenia diagnosis.