Data compilation
A subset of the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) II database [2] was analyzed in this study. For details about the publicly available MIMIC II database, please refer to its documentation [11]. The time series data in the MIMIC II database are organized into records, each of which corresponds to a particular ICU stay. The records that met the following inclusion criteria comprised the subset:
- The record was from an adult patient.
- Minute-by-minute time series of heart rate (HR) and systolic (SBP), diastolic (DBP), and mean arterial blood pressure (MAP) computed by bedside monitors were present.
- Age and medication information, which is part of the clinical data in the MIMIC II database, was present.
- Hemodynamic time series and clinical data had been matched (i.e., it was certain that the time series and clinical data had come from the same ICU stay of the same patient).
From each selected record, one or more examples were compiled, depending on the compilation mode (see below for more details). Each example consisted of three time intervals: (1) an observation window of either 30 or 60 minutes; (2) a target window of 30 minutes; and (3) a gap of either 1 or 2 hours separating the observation and target windows. The information in the observation window served as the basis of the inputs to the pattern classifiers, and a prediction was generated at the end of the observation window. This setup is graphically illustrated in Figure 1. Different observation window sizes were analyzed to see if observing the time series for a longer period of time would improve prediction, whereas different gap sizes were used to investigate whether predicting further into the future poses a more challenging problem.
Each target window was labeled either "control" or "hypotensive". Although there are a number of possible definitions of HE, we arbitrarily defined an HE as a 30 minute period in which MAP was less than 60 mmHg and greater than 10 mmHg for at least 90% of the period. Any 30 minute window that did not meet the HE definition and contained MAP values between 10 and 200 mmHg for more than 90% of the window was regarded as a control (normotensive) example. The lower and upper bounds of 10 and 200 mmHg served as a crude filter that eliminated physiologically unlikely outliers. Examples with target windows that satisfied neither definition were excluded from the study. For each target window, the corresponding observation window was also checked for physiologic validity: all 4 time series (HR and the 3 arterial blood pressure (ABP) series) in the observation window must have contained values between 10 and 200 (in bpm for HR and mmHg for ABP) for more than 95% of the window. All examples with observation windows that failed this validity check were excluded from analysis.
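As a concrete illustration of this labeling rule, the following sketch (an assumed implementation, not code from the original study) labels one minute-by-minute MAP target window using the 60, 10, and 200 mmHg thresholds and the 90% coverage criteria stated above:

```python
import numpy as np

def label_target_window(map_window_mmhg):
    """Label a 30-sample (minute-by-minute) MAP target window.

    Returns "hypotensive", "control", or None (excluded), following the
    definitions above: HE if 10 < MAP < 60 mmHg for at least 90% of the
    window; control otherwise if 10 < MAP < 200 mmHg for more than 90%.
    """
    x = np.asarray(map_window_mmhg, dtype=float)
    frac_hypotensive = np.mean((x > 10) & (x < 60))   # fraction of samples in the HE range
    frac_plausible = np.mean((x > 10) & (x < 200))    # fraction passing the crude outlier filter
    if frac_hypotensive >= 0.9:
        return "hypotensive"
    if frac_plausible > 0.9:
        return "control"
    return None  # satisfies neither definition; excluded from the study
```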
In addition, there were two example compilation modes: single and multiple. In single compilation, only one example was obtained from each record. Because there were far fewer hypotensive than control examples, each record was checked for a hypotensive example first. Control examples were compiled from the records that contained no hypotensive episode. Since each record almost always contained multiple control or hypotensive examples, one example was randomly selected, with equal probabilities, to be included in the data. In multiple compilation, on the other hand, a 30 minute sliding window with no overlap traversed each record, and as many examples as possible were compiled, provided that the validity checks for the observation and target windows were passed.
Feature extraction
For feature extraction, two additional time series were derived as follows:
- Pulse pressure (PP) was derived using PP = SBP - DBP.
- Relative cardiac output (CO) was estimated using CO = HR × PP according to the Windkessel model [12]. This estimate differs from the actual cardiac output by a scalar multiple, equal to the arterial compliance, but relative CO was sufficient for the purpose of pattern recognition. Sun et al. [13] evaluated this Windkessel estimator against thermodilution CO measurements and reported that, despite its crude nature, its estimation error was lower than that of several more advanced CO estimation methods. Both derivations are sketched below.
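A minimal sketch of the two derived series, assuming the monitor outputs are aligned minute-by-minute NumPy arrays (the variable names are illustrative):

```python
import numpy as np

# Illustrative minute-by-minute monitor outputs of equal length.
sbp = np.array([120.0, 118.0, 121.0])   # systolic BP, mmHg
dbp = np.array([70.0, 69.0, 72.0])      # diastolic BP, mmHg
hr = np.array([80.0, 82.0, 79.0])       # heart rate, bpm

pp = sbp - dbp       # pulse pressure, mmHg
rel_co = hr * pp     # relative cardiac output (Windkessel estimate; differs from
                     # true CO by an arterial-compliance scale factor)
```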
Based on all 6 time series (HR, SBP, DBP, MAP, PP, CO) and clinical data, a total of 102 features were extracted from the observation window of each example. The strategy was to rely on pattern recognition tools to find discriminatory information in the time series without any a priori knowledge. Thus, it was crucial to include a wide selection of features that capture various signal aspects, thereby maximizing the likelihood of finding discriminatory information.
The features were loosely classified into 4 groups: statistical, wavelet, cross-correlation, and clinical features. These feature groups are described in the ensuing sections.
Statistical features
Mean, median, standard deviation, variance, interquartile range, skewness, kurtosis, and linear regression slope were computed for each of the 6 time series. Mean and median quantitatively represent the magnitude of each physiologic variable, whereas standard deviation, variance, and interquartile range describe its variability. Variability may be indicative of the effort of the cardiovascular system to control blood pressure and of the related hemodynamic instability [14]. Skewness and kurtosis are the third and fourth standardized moments of the amplitude distribution, respectively, and reflect its shape. Lastly, the linear regression slope was calculated using the least-squares criterion and measures the average rate of change over the observation window.
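The eight per-series statistics can be computed as in the sketch below, which assumes each observation window is a 1-D array sampled once per minute and takes the least-squares slope against the sample index (minutes):

```python
import numpy as np
from scipy import stats

def statistical_features(x):
    """Return the 8 per-series statistics described above for one window."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))                      # minutes within the window
    fit = stats.linregress(t, x)               # least-squares linear trend
    q75, q25 = np.percentile(x, [75, 25])
    return {
        "mean": np.mean(x),
        "median": np.median(x),
        "std": np.std(x, ddof=1),
        "var": np.var(x, ddof=1),
        "iqr": q75 - q25,
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),         # excess kurtosis by SciPy's default
        "slope": fit.slope,                    # average rate of change per minute
    }
```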
Cross-correlation features
The cross-correlation at zero lag, which is a measure of coupling between two time series, was computed for all possible time series pairs. If two distinct time series are denoted as X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n}, the cross-correlation between them was computed as follows:
(1)
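A sketch of this feature computation is given below; it assumes the standard Pearson-type normalized cross-correlation at zero lag, which may differ in normalization from the exact form of equation (1):

```python
import numpy as np
from itertools import combinations

def zero_lag_xcorr(x, y):
    """Normalized cross-correlation at zero lag (assumed Pearson-type form)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def xcorr_features(series):
    """series: dict mapping a name (e.g. 'HR', 'MAP') to its window array.

    Returns one feature per distinct pair; for 6 series this yields 15 features.
    """
    return {f"xcorr_{a}_{b}": zero_lag_xcorr(series[a], series[b])
            for a, b in combinations(sorted(series), 2)}
```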
Wavelet features
To capture relative energies in different spectral bands, a 5-level discrete wavelet decomposition of each of the 6 time series was conducted with the Meyer wavelet. Such a wavelet decomposition breaks down the variability of a given time series into different spectral bands. If the decomposition of a given time series X is denoted as W_X = [a_5 d_5 d_4 ... d_1], where a_5 is the approximation signal and d_k is the kth-level detail signal, the energy in a_5 was computed as follows:

E_{a_5} = ||a_5||^2     (2)

where ||•|| is the Euclidean norm. Similarly, the energy in the kth-level detail signal was:

E_{d_k} = ||d_k||^2     (3)

for k = 1, 2, ..., 5. Finally, the relative energy contribution from each decomposition level was calculated as follows:

RE_{a_5} = E_{a_5} / E_{total}     (4)

RE_{d_k} = E_{d_k} / E_{total}     (5)

for k = 1, 2, ..., 5, and where

E_{total} = E_{a_5} + \sum_{k=1}^{5} E_{d_k}     (6)
These wavelet energy features have been frequently applied in a variety of signal processing and pattern recognition studies (e.g., [15–17]).
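A sketch of these relative wavelet energy features, assuming PyWavelets' discrete Meyer wavelet ('dmey'); note that for windows as short as 30 to 60 samples, boundary effects dominate and PyWavelets may warn that the requested level is high for the signal length:

```python
import numpy as np
import pywt

def wavelet_energy_features(x, wavelet="dmey", level=5):
    """Relative energies of a5 and d1..d5 for one time-series window (Eqs. 2-6)."""
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), wavelet, level=level)
    # coeffs is ordered [a5, d5, d4, d3, d2, d1]
    names = ["a5", "d5", "d4", "d3", "d2", "d1"]
    energies = [float(np.sum(c ** 2)) for c in coeffs]   # squared Euclidean norms
    total = sum(energies)
    return {f"rel_energy_{n}": e / total for n, e in zip(names, energies)}
```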
Clinical features
Patient age was utilized as a feature to examine if certain age groups are more likely to develop HEs. Since the same age was recorded for all examples from the same patient, it only provided an a priori bias, rather than temporally localized information.
Also, the amounts of hemodynamically active medications, in mcg/kg, delivered to the patient during the observation window were computed. Such medications influence blood pressure, and logically, their administration should impact short-term blood pressure level. Two types of medications were considered as separate features: 1) vasoconstrictors and positive inotropic drugs which tend to raise blood pressure, and 2) vasodilators, diuretics, and sedatives which tend to decrease blood pressure. The first group included the following medications: neosynephrine, norepinephrine, vasopressin, dopamine, dobutamine, epinephrine, milrinone, and isuprel. The following medications comprised the second group: nitroglycerine, nitroprusside, diltiazem, esmolol, labetalol, and lasix.
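As an illustration, the two medication features could be accumulated as below, assuming each dose event in the observation window is available as a (drug name, dose in mcg/kg) pair (this data structure is hypothetical):

```python
# Drug groupings taken from the text; names lower-cased for matching.
PRESSORS = {"neosynephrine", "norepinephrine", "vasopressin", "dopamine",
            "dobutamine", "epinephrine", "milrinone", "isuprel"}
DEPRESSORS = {"nitroglycerine", "nitroprusside", "diltiazem",
              "esmolol", "labetalol", "lasix"}

def medication_features(dose_events):
    """dose_events: iterable of (drug_name, dose_mcg_per_kg) within the window."""
    raising = sum(d for name, d in dose_events if name.lower() in PRESSORS)
    lowering = sum(d for name, d in dose_events if name.lower() in DEPRESSORS)
    return {"bp_raising_dose": raising, "bp_lowering_dose": lowering}
```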
Dimensionality reduction
Each extracted feature was first normalized to be zero-mean and unit-variance. Subsequently, the raw feature dimensionality of 102 was reduced via principal component analysis (PCA). PCA is a widely used dimensionality reduction algorithm that has been employed in previous pattern recognition applications in biomedical engineering, including prosthetic control based on the myoelectric signal [18] and classification of gene expression microarray data [19]. In this study, PCA was fitted to the training data, and the principal components with the largest eigenvalues that together captured approximately 90% of the total variance were retained. Both training and test data were projected onto the same feature space defined by the selected principal components. Across the different training data sets in cross-validation (the details of the cross-validation are discussed in the next section), the reduced dimensionality ranged from 15 to 19. This substantial reduction in dimensionality implies that PCA removed considerable redundancy among the features. The transformed features in the reduced dimensionality served as the inputs to the classification and regression models, which are described next.
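A sketch of the normalization and PCA step with scikit-learn, fitting both on the training examples only and keeping enough components to explain roughly 90% of the variance, as described above (function and variable names are illustrative):

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_dimensionality(X_train, X_test, variance=0.90):
    """Zero-mean/unit-variance scaling and PCA, both fit on training data only."""
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
    pca = PCA(n_components=variance).fit(X_train_s)   # keep ~90% of total variance
    return pca.transform(X_train_s), pca.transform(X_test_s), pca
```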
Classification
According to the label assigned to each example (control or hypotensive), artificial neural networks were trained to perform a binary classification. Independent neural networks were trained for different combinations of gap and observation window sizes, as well as for different cross-validation folds and compilation modes. In both compilation modes, a 5-fold cross-validation was conducted to evaluate classification performance. However, in order to balance the two groups in the training data and thus prevent the classifier from favoring the majority group, a subset of the majority group (which was always the control group) was randomly sampled without replacement. This randomized sub-sampling was repeated 10 times. Test data, on the other hand, were left unbalanced. Moreover, the partition between training and test data was conducted with respect to records rather than individual examples. In other words, examples from the same record belonged exclusively to either the training or the test data. The rationale was that examples from the same record, which could be immediately adjacent to one another, are likely to contain similar patterns; once a classifier had been trained on some examples from a record, other examples from the same record might be easier to classify, which would lead to over-estimated classification performance.
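The record-wise partitioning and majority-class sub-sampling could be implemented as below; this is an assumed realization using scikit-learn's GroupKFold with record identifiers as the grouping variable:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

def balanced_record_wise_folds(X, y, record_ids, n_splits=5, n_repeats=10, seed=0):
    """Yield (balanced training indices, untouched test indices).

    y is a 0/1 NumPy array (1 = hypotensive); record_ids holds one record
    identifier per example, so all examples from a record stay on one side
    of the train/test split.
    """
    rng = np.random.default_rng(seed)
    for train_idx, test_idx in GroupKFold(n_splits).split(X, y, groups=record_ids):
        pos = train_idx[y[train_idx] == 1]     # hypotensive (minority) examples
        neg = train_idx[y[train_idx] == 0]     # control (majority) examples
        for _ in range(n_repeats):             # 10 random control sub-samples
            neg_sub = rng.choice(neg, size=len(pos), replace=False)
            yield np.concatenate([pos, neg_sub]), test_idx
```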
Feed-forward, 3-layer neural networks with one hidden layer of 20 hidden units were trained. The neural networks utilized the log-sigmoid activation function in both the hidden and output layers. Because the log-sigmoid activation function is bounded between 0 and 1, it is an appropriate choice for outputs that represent probabilities, as is the case in this binary classification. Also, it has been proven that this 3-layer architecture with sigmoid activation functions can approximately realize any continuous input-output mapping [20]. The threshold on the posterior probability was determined from the classifier's receiver operating characteristic (ROC) curve based on the training data. The selection criterion for the optimal threshold was the following:
(7)
where Th_s is the selected threshold and Th is the threshold variable ranging from 0 to 1. A random 20% partition of the training data was utilized for validation. During training, early stopping based on the validation set was employed for regularization.
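Under the assumption that scikit-learn's MLPClassifier is an acceptable stand-in for the original network (one hidden layer of 20 logistic units, a logistic output, and early stopping on a random 20% validation split), a minimal sketch follows; Youden's J statistic on the training ROC is used here only as a placeholder for the selection criterion of equation (7):

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.neural_network import MLPClassifier

def train_classifier(X_train, y_train):
    """Train the network and pick an operating threshold from the training ROC."""
    clf = MLPClassifier(hidden_layer_sizes=(20,), activation="logistic",
                        early_stopping=True, validation_fraction=0.2,
                        max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    fpr, tpr, thresholds = roc_curve(y_train, clf.predict_proba(X_train)[:, 1])
    threshold = thresholds[np.argmax(tpr - fpr)]   # Youden's J: placeholder for Eq. (7)
    return clf, threshold
```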
For performance evaluation, the area under the ROC curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated.
Regression
To investigate the feasibility of directly estimating the MAP in the target window, a regression approach was also pursued. The same neural network architecture as in classification was employed, except that the activation functions were the hyperbolic tangent sigmoid and the linear function in the hidden and output layers, respectively. This combination of activation functions provides the unbounded output range necessary for regression. Since the binary labels are irrelevant to regression, the distinction between the control and hypotensive groups was eliminated. Instead, the blood pressure in each target window was represented by the median MAP in the window. A 5-fold cross-validation was employed again; however, due to the large number of examples in multiple compilation (over 141,000), a training subset of 2,500 examples was randomly sampled without replacement. This randomized sub-sampling was repeated 10 times. As in classification, the inclusion of examples from the same record in both training and test data was prohibited.
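A corresponding sketch of the regression variant with scikit-learn's MLPRegressor, which pairs a tanh hidden layer with an identity (linear) output; carrying over the early-stopping setup from the classification networks is an assumption:

```python
from sklearn.neural_network import MLPRegressor

# 20 tanh hidden units, linear output; early stopping on a 20% validation split.
reg = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                   early_stopping=True, validation_fraction=0.2,
                   max_iter=2000, random_state=0)
# reg.fit(X_train, median_map_train)   # targets: median MAP of each target window
# map_pred = reg.predict(X_test)
```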
Regression performance was evaluated by computing the mean absolute error (MAE), defined as follows:

MAE = (1/N) \sum_{i=1}^{N} |ŷ_i - y_i|     (8)

where N is the total number of test cases, ŷ_i is the ith predicted MAP, and y_i is the corresponding true MAP. Furthermore, a linear regression line was fit to the scatter plot of true versus predicted MAP with the least-squares criterion. The least squares were iteratively reweighted with a bisquare weighting function. The slope and intercept of the regression line were then reported as performance metrics. In perfect regression, the slope and intercept should be 1 and 0, respectively. Lastly, the Pearson correlation coefficient between the predicted and true blood pressure was computed.
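The evaluation described above could be computed as in the sketch below, with the bisquare-reweighted line fit done via statsmodels' robust linear model (Tukey's biweight); this is an assumed implementation of the stated procedure:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import pearsonr

def regression_metrics(y_true, y_pred):
    """MAE, bisquare-reweighted line fit of true vs. predicted MAP, and Pearson r."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(y_pred - y_true)))                  # Eq. (8)
    rlm = sm.RLM(y_true, sm.add_constant(y_pred),
                 M=sm.robust.norms.TukeyBiweight()).fit()          # robust line fit
    intercept, slope = rlm.params                                  # ideal: 0 and 1
    r, _ = pearsonr(y_pred, y_true)
    return {"MAE": mae, "slope": float(slope),
            "intercept": float(intercept), "pearson_r": float(r)}
```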