Machine learning approaches for predicting high cost high need patient expenditures in health care
 Chengliang Yang^{1},
 Chris Delcher^{2},
 Elizabeth Shenkman^{2} and
 Sanjay Ranka^{1}
https://doi.org/10.1186/s1293801805683
© The Author(s) 2018
 Published: 20 November 2018
Abstract
Background
This paper studies the temporal consistency of health care expenditures in a large state Medicaid program. Predictive machine learning models were used to forecast the expenditures, especially for the high-cost, high-need (HCHN) patients.
Results
We systematically tested the temporal correlation of patient-level health care expenditures in both the short and long terms. The results suggest that medical expenditures are significantly correlated over multiple periods. Our work demonstrates a prevalent and strong temporal correlation and shows promise for predicting future health care expenditures using machine learning. Temporal correlation is stronger in HCHN patients, and their expenditures can be better predicted. Including more past periods improves predictive performance.
Conclusions
This study shows that there is significant temporal correlation in health care expenditures. Machine learning models can help to accurately forecast the expenditures. These results could advance the field toward precise preventive care to lower overall health care costs and deliver care more efficiently.
Keywords
 High-cost
 High need patients
 Machine learning
 Predictive modeling
Background
Health care is one of the largest components of the global economy. According to the World Bank, in 2014, health care expenditures accounted for 9.95% of the world’s total gross domestic product (GDP). Additionally, per capita health expenditures have increased during the last decade. In the United States, the Centers for Medicare & Medicaid Services (CMS) reported that in 2014, health care accounted for 17.5% of the national GDP [1]. This amount is expected to increase over the next several years because of the expansion of insurance coverage under the Affordable Care Act. In addition, resources, in terms of expenditures, are disproportionately consumed by a relatively small proportion of the health care utilizing population [2]. As a result, this group of health care utilizers has been termed high-cost, high-need (HCHN) patients due to its disproportionate spending concentration [3] and highly prevalent comorbid chronic condition profile [4, 5]. Stakeholders have argued the need for higher efficiency health care, especially for HCHN patients [4]. For example, managed care organizations and health plans (MCOs) have been deployed and funded under capitation payment systems to incentivize health care providers to deliver more cost-effective services [6–8].
Information technology provides a new, promising way to approach a wide range of health care problems, especially in the “Big Data” era [9]. Health care utilization routinely generates vast amounts of data from sources including electronic medical records, insurance claims, vital signs, and patient-reported outcomes. Moreover, predicting health outcomes using data modeling approaches is an emerging field that can reveal important insights into factors associated with disproportionate spending patterns. Specifically, if we are able to forecast expenditures at the patient level with good accuracy, we could improve targeted care by anticipating health care needs of HCHN patients. Moreover, predictive modeling can improve our understanding of the causal pathways leading to expensive events, which informs system-level strategies for prevention. Indeed, prevention is one of the most effective ways to lower health care expenditures while delivering better quality of care [10–12].
HCHN patients have shown a high degree of persistence in medical expenditures reflected in administrative claims data [2, 13], although some recent studies have concluded that the high utilization may be temporary and not persistent [14]. The purpose of this study is to better understand these temporal patterns and to apply machine learning-based models to predict expenditures. Additionally, we would like to determine if practical timeframes (e.g. 1 month, 6 months) are feasible choices for predicting expenditures.
Approaches to predictive modeling include risk adjustment models that are linear in design and form the basis of many capitation payment systems [6–8]. These models suffer from a number of limitations including use of (1) variables with limited predictive accuracy, (2) specific patient populations or type of care, and (3) population-level models that offer limited information at the patient level [15–17]. The authors in [18] analyzed the temporal utilization pattern of high utilizers in a large public state insurance program [19]. Studies that attempt to predict HCHN status using patient-level expenditures are lacking. This study aims to address this gap in the literature.

We used a large longitudinal administrative claims dataset from a public insurance program with millions of enrollees to examine the correlation of patient-level health expenditures across time periods of varying length.

We applied machine learning models to predict future health expenditures, especially for the HCHN patients, defined here as being in the top 10% of expenditures. The methods scale to input variables of thousands of dimensions and millions of patients. The findings indicated that health care expenditures can be effectively predicted (overall R-squared \(>\,0.7\)). The prediction error for the HCHN patients is lower than for the general population, suggesting better model performance for HCHN patients.

Contributions of input variables to explaining model variance were quantified for each single prediction so that model users can identify potentially modifiable risk factors for possible intervention.
Methods
Data and preprocessing
Data
Study population This study examines administrative insurance claims from the Medicaid program of the state of Texas, which has the third largest Medicaid population in the United States (annual enrollment of approximately 4.7 million) [20]. During the study period (2011–2014), there were 1,734,896 adults (ages 18–65) enrolled in the Texas Medicaid program for at least one month. To be included in any analysis, enrolled status needed to be maintained for more than two-thirds of the time for any period-of-interest (either observed or forecasted). Minimum enrollment criteria are applied to avoid including patients enrolled for very short periods of time with highly variable health care profiles relative to the general Medicaid population. For this population, total medical expenditure was defined as the sum of professional, institutional, and dental claims. Pharmacy expenditures were not included in this study. During the study period, the Texas Medicaid program was structured as both fee-for-service (FFS) and MCO-based payment models while FFS was being phased out. For both models, we used the final paid amounts to represent expenditures.
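The inclusion rule and the expenditure definition above can be sketched in a few lines. This is only an illustration: the function names and arguments are hypothetical and are not taken from the study's code.

```python
def is_eligible(months_enrolled: int, period_months: int) -> bool:
    """True when enrollment covers MORE than two-thirds of the period."""
    # Integer form of months_enrolled / period_months > 2/3, which avoids
    # floating-point comparisons at the boundary.
    return 3 * months_enrolled > 2 * period_months


def total_medical_expenditure(professional: float,
                              institutional: float,
                              dental: float) -> float:
    """Sum of final paid amounts; pharmacy claims are deliberately excluded."""
    return professional + institutional + dental
```

Under this reading, a patient enrolled for 8 of 12 months sits exactly at two-thirds and is excluded, while 9 of 12 months qualifies.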
Variables available included diagnosis codes (International Classification of Diseases, Ninth Revision, Clinical Modification, ICD-9-CM), procedure codes (ICD-9-CM procedure codes, Current Procedural Terminology [CPT] and Healthcare Common Procedure Coding System [HCPCS]), and medication codes (National Drug Codes [NDC]). During the study period, 3233 unique ICD-9-CM procedure codes, 21,374 unique ICD-9-CM diagnosis codes, 21,603 unique CPT and HCPCS codes, and 28,366 NDC codes were identified. This study was approved by the IRB of the University of Florida.
Chronic condition cohorts We examined the temporal correlation of health expenditures among the entire study population as well as four chronic disease cohorts (diabetes, chronic obstructive pulmonary disease [COPD], asthma and hypertension). The difference in correlation strength among chronic disease cohorts as compared to the general population was assessed. We identified these clinical cohorts using the Clinical Classifications Software (CCS) [21], using all diagnostic codes, as follows: diabetes (CCS categories 49 and 50), COPD (CCS category 127), asthma (CCS category 128) and hypertension (CCS categories 98 and 99).
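As a rough illustration of the cohort definitions above, the CCS categories can be held in a lookup table and matched against the categories observed for a patient. The helper name and the input format (an iterable of CCS category numbers) are assumptions made for this sketch.

```python
# CCS categories per cohort, as listed in the text.
COHORT_CCS = {
    "diabetes": {49, 50},
    "COPD": {127},
    "asthma": {128},
    "hypertension": {98, 99},
}


def cohorts_for_patient(ccs_categories):
    """Return the sorted names of all cohorts matched by a patient's CCS categories."""
    observed = set(ccs_categories)
    return sorted(name for name, cats in COHORT_CCS.items() if cats & observed)
```

A patient can belong to several cohorts at once, which is why the helper returns a list rather than a single label.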
Objectives

Per member per month dollar amount (PMPM, total medical expenditure divided by number of months enrolled in Medicaid). This measure is commonly used for expenditure analyses in Medicaid programs [22].

Per member per month dollar amount with a log base 10 transformation (logPMPM).

Rank percentiles of the per member per month dollar amount (pctlPMPM). This is a continuous measure obtained by dividing the descending ordered rank of PMPM by the number of enrollees in the dataset. Values range from 0 to 1.
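The three objectives can be sketched as follows. Several details are assumptions rather than statements from the paper: the `offset` added before the logarithm (so that zero expenditures stay finite), the average-rank treatment of ties, and the orientation of the percentile (here the highest spender receives 1.0, which matches the high percentiles reported for the top 10% in the results).

```python
import math


def pmpm(total_expenditure, months_enrolled):
    """Per member per month dollar amount."""
    return total_expenditure / months_enrolled


def log_pmpm(value, offset=1.0):
    """Log base 10 of PMPM; `offset` is an assumed guard against log10(0)."""
    return math.log10(value + offset)


def pctl_pmpm(values):
    """Rank percentile in (0, 1] for each PMPM value; ties share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    pct = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for j in range(start, end + 1):
            pct[order[j]] = avg_rank / n
        start = end + 1
    return pct
```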
Predictors

Diagnostic codes (ICD-9-CM) grouped into CCS categories (283 categories) [21].

Procedure codes (CPT and HCPCS) grouped into CCS [21] categories (231 categories).

Medication information represented by National Drug Codes (NDC). These are grouped by pharmacy classes (893 classes) provided by the U.S. Food and Drug Administration (FDA)’s NDC Directory (Updated Oct. 20, 2015).

Demographic variables such as age, sex, race/ethnicity (White, Black, Hispanic, American Indian or Alaskan, Asian, Unknown/Other), and disabled status.
Experiment setup
To examine the temporal correlation of patients' total medical expenditures between consecutive time periods (1 month, 3 months, 6 months, 12 months), we used the Pearson product-moment correlation coefficient [23]. We also tested the temporal correlation for four clinical cohorts of chronically ill patients (diabetes, COPD, asthma and hypertension) prevalent in Medicaid populations of the United States [24]. Then we constructed predictive models to forecast expenditures based on previous expenditures, diagnoses, medical procedures and medications. Details are described as follows:
Correlation test
Step 1: Rank order all the patients in period 1 and period 2 based on PMPM expenditures.

Step 2: Compute the Pearson product-moment correlation coefficient between the rank percentiles in the two periods.
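The two steps can be sketched with the standard library alone. The function names are illustrative, and for brevity tied PMPM values are ranked in input order here, which is a simplification and not necessarily how ties were handled in the study.

```python
import math


def rank_percentiles(values):
    """Step 1: rank patients by PMPM and scale ranks to (0, 1]."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    pct = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        pct[idx] = rank / n
    return pct


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def temporal_correlation(pmpm_period1, pmpm_period2):
    """Step 2: correlate the rank percentiles of the same patients in two periods."""
    return pearson(rank_percentiles(pmpm_period1), rank_percentiles(pmpm_period2))
```

Both input lists are assumed to be aligned, so that position i refers to the same patient in each period.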
Predictive models
Four predictive models are applied to forecast the patients’ expenditures based on the previous time periods, including ordinary least squares linear regression (LR), regularized regression (LASSO), gradient boosting machine (GBM), and recurrent neural networks (RNN, a deep learning approach). Futoma et al. [25] compared these models in depth for predictive tasks in medicine. The following section describes the details for these models.
Ordinary least squares linear regression (LR) Regression is the most widely used method in predictive modeling. It serves as the base risk-adjustment model [6, 7] for modeling risk-based payment systems in health care. Using the input variables as described above, we fit an LR model using least squares to predict future expenditures.
One advantageous property of GBM is that the information gain of the nodes in the decision trees can be aggregated as a measure of input variable importance, which is similar to the coefficients in LASSO. This enables interpretability of tree methods in applications. In practice, we use the implementation of GBM provided by [28]. We trained 1000 decision trees for each GBM. We performed a grid search with five-fold cross-validation to optimize the choices of other hyperparameters such as learning rate and tree depth.
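The selection procedure (grid search scored by five-fold cross-validation) can be sketched independently of the model. In the sketch below, `fit` and `predict` are hypothetical callables that would wrap the GBM library in practice; the one-parameter shrinkage regression at the end is only a stand-in so that the example runs with the standard library, and the contiguous, unshuffled folds are a simplification.

```python
def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mse(fit, predict, xs, ys, params, k=5):
    """Mean squared error over all held-out points across k folds."""
    total, count = 0.0, 0
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train], **params)
        for i in held_out:
            total += (predict(model, xs[i]) - ys[i]) ** 2
            count += 1
    return total / count


def grid_search(fit, predict, xs, ys, grid, k=5):
    """Return (best_params, best_mse) over a list of parameter dictionaries."""
    best = None
    for params in grid:
        mse = cross_val_mse(fit, predict, xs, ys, params, k)
        if best is None or mse < best[1]:
            best = (params, mse)
    return best


# Stand-in model (NOT a GBM): one-feature regression through the origin
# with a shrinkage penalty on the slope.
def fit_shrunk_slope(xs, ys, shrinkage):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + shrinkage)


def predict_shrunk_slope(slope, x):
    return slope * x
```

For a GBM, the grid would instead enumerate combinations such as learning rate and tree depth, with the number of trees held at 1000 as described above.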
Recurrent neural networks (RNN) Recurrent neural networks are a set of deep learning models designed to process sequential data. These models have proven to be very effective in dealing with a variety of sequence tasks, such as speech recognition [29], machine translation [30], sunspot number prediction [31] and video understanding [32]. In health care, RNN models have been used for early detection of heart failure onset from electronic health records [33]. The health claims dataset used in our case to predict medical expenditures could be organized as sequential events (e.g. date of diagnosis, date of procedure, and date of medication use). Thus, we apply RNN to model these events as time series to take advantage of the chronological order, rather than including them in the models as unordered events.

Step 1: To reduce the dimensionality of the input, \(\{\mathbf{x}_{i}^{1},\mathbf{x}_{i}^{2},...,\mathbf{x}_{i}^{T}\}\) and \(\mathbf{x}_{i}^{NT}\) are mapped to E-dimensional embedding vectors \(\{e^{1},e^{2},...,e^{T}\}\) and \(e^{NT}\) using embedding matrices \(W_{T}\in \mathbb{R}^{E \times K}\) and \(W_{NT}\in \mathbb{R}^{E \times L}\) respectively, i.e.
$$e^{t} = W_{T}\mathbf{x}_{i}^{t} \tag{3}$$
$$e^{NT} = W_{NT}\mathbf{x}_{i}^{NT} \tag{4}$$

Step 2: An RNN with a single gated recurrent unit (GRU) layer [35] is used to generate attention weights from the sequential embeddings \(\{e^{1},e^{2},...,e^{T}\}\). Attention is a mechanism in deep learning introduced in machine translation [36] and visual recognition [37] tasks that can dynamically decide which part of the sequence should be assigned additional weights. Our model contains two kinds of attention:

\(\alpha ^{t}\) is a scalar that determines the weight of period t.

\(\beta ^{t}\) is an E-dimensional vector that determines the importance of elements in each embedding \(e^{t}\).

In the GRU layer, the recurrent hidden states \(g^{t}\) and \(h^{t}\) are used to generate \(\alpha ^{t}\) and \(\beta ^{t}\) respectively. The right panel of Fig. 1 describes the process used to generate \(\beta ^{t}\). The same process is applied to generate \(\alpha ^{t}\). The intermediate memory unit \(\hat{h}^{t}\) takes input from \(e^{t}\) and \(h^{t-1}\) to update \(h^{t}\). The reset gate \(r^{t}\) determines which portion of \(h^{t-1}\) is absorbed into \(\hat{h}^{t}\). The update gate \(z^{t}\) determines the weights of \(\hat{h}^{t}\) and \(h^{t-1}\) when generating \(h^{t}\). Formally, the updating rules for \(r^{t}\), \(\hat{h}^{t}\), \(z^{t}\), \(h^{t}\) and \(\beta ^{t}\) are described as follows.
$$r^{t} = \sigma (W_{r}e^{t}+U_{r}h^{t-1}+b_{r}) \tag{5}$$
$$\hat{h}^{t} = \tanh (W_{h}e^{t}+r^{t}\otimes U_{h}h^{t-1}+b_{h}) \tag{6}$$
$$z^{t} = \sigma (W_{z}e^{t}+U_{z}h^{t-1}+b_{z}) \tag{7}$$
$$h^{t} = (z^{t}\otimes h^{t-1}) \oplus ((1-z^{t})\otimes \hat{h}^{t}) \tag{8}$$
where \(\sigma ()\) is the sigmoid function, and \(\otimes\) and \(\oplus\) are elementwise multiplication and elementwise addition, respectively.
$$\beta ^{t} = \tanh (W_{\beta }h^{t}+b_{\beta }) \tag{9}$$
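The update rules in Eqs. (5) to (9) can be written out directly. The sketch below is a plain-Python illustration, not the authors' implementation: the parameter dictionary `p` and its key names are hypothetical, matrices are lists of rows, and only the \(\beta ^{t}\) branch is shown because the formula for \(\alpha ^{t}\) is not given in the text.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product with the matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(e_t, h_prev, p):
    """One GRU update: returns h^t from e^t and h^{t-1} (Eqs. 5-8)."""
    # Reset gate, Eq. (5)
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_r"], e_t), matvec(p["U_r"], h_prev), p["b_r"])]
    # Intermediate memory unit, Eq. (6): r^t gates U_h h^{t-1} elementwise
    uh = matvec(p["U_h"], h_prev)
    h_hat = [math.tanh(a + ri * b + c) for a, ri, b, c in
             zip(matvec(p["W_h"], e_t), r, uh, p["b_h"])]
    # Update gate, Eq. (7)
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_z"], e_t), matvec(p["U_z"], h_prev), p["b_z"])]
    # New hidden state, Eq. (8)
    return [zi * hp + (1.0 - zi) * hh for zi, hp, hh in zip(z, h_prev, h_hat)]


def beta_attention(h_t, W_beta, b_beta):
    """Variable-level attention beta^t, Eq. (9)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_beta, h_t), b_beta)]
```

With all weights and biases set to zero, both gates equal 0.5 and the memory unit is zero, so the step simply halves the previous hidden state; this gives a quick sanity check of the implementation.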

Step 3: After obtaining the attention values \(\alpha ^{t}\) and \(\beta ^{t}\), the context vector can be generated as follows:
$$c^{t}=\alpha ^{t}\beta ^{t}\otimes e^{t} \tag{10}$$
The term “context” came from the field of natural language processing, indicating that the underlying representation contains information from the preceding and succeeding sequences. The context vectors are aggregated with the embedding vector of nontemporal variables \(e^{NT}\) and multiplied by the output coefficients to make predictions:
$$c = \sum _{t=1}^{T}c^{t}+e^{NT} \tag{11}$$
$$\hat{y}_{i}= wc+b \tag{12}$$
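Equations (10) to (12) amount to a weighted sum of embeddings followed by a linear output layer. A minimal sketch, with illustrative function names and plain lists standing in for vectors:

```python
def context_vector(alphas, betas, embeddings, e_nt):
    """c = sum_t alpha^t * (beta^t elementwise e^t) + e^NT  (Eqs. 10 and 11)."""
    c = list(e_nt)
    for alpha_t, beta_t, e_t in zip(alphas, betas, embeddings):
        for j in range(len(c)):
            c[j] += alpha_t * beta_t[j] * e_t[j]
    return c


def predict(w, c, b):
    """y_hat = w c + b  (Eq. 12), with w treated as a row vector."""
    return sum(w_j * c_j for w_j, c_j in zip(w, c)) + b
```

For example, with two periods and two embedding dimensions:

```python
c = context_vector([0.5, 1.0], [[1.0, 2.0], [0.0, 1.0]],
                   [[2.0, 1.0], [3.0, 4.0]], [1.0, 1.0])
# c is [2.0, 6.0]; predict([1.0, 0.5], c, 0.25) is 5.25
```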
Interpreting predictions
For predictive models in health care, in addition to high accuracy, interpretability is also crucial [34, 40, 41]. Many machine learning models are regarded as black boxes that focus on pure prediction rather than on understanding the degree of variance explained by each component of the model or the medical cause-effect pathways. To improve interpretability, we applied additional strategies to quantify the contribution from each single input variable so that model users can interpret and diagnose the predictions. Below are the approaches that we took for each model.
Ordinary least squares linear regression (LR) and regularized regression (LASSO) For the two linear models, pulling the contributions of each input variable is straightforward. The product of linear coefficients and variable value are readily converted into the contribution of the corresponding variable.
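For the linear models, the decomposition is just coefficient times value for each variable. The sketch below uses dictionaries keyed by variable name; the names and numbers in the usage note are made up for illustration.

```python
def linear_contributions(coefficients, values):
    """Contribution of each variable: coefficient multiplied by the observed value."""
    return {name: coef * values.get(name, 0.0) for name, coef in coefficients.items()}


def predict_linear(intercept, coefficients, values):
    """The prediction is the intercept plus the sum of all contributions."""
    return intercept + sum(linear_contributions(coefficients, values).values())
```

With hypothetical coefficients `{"prior_pctlPMPM": 0.6, "ccs_49": 0.05}` and a patient with `prior_pctlPMPM = 0.9` and `ccs_49 = 1`, the prior expenditure percentile contributes about 0.54 and the diagnosis indicator about 0.05. In a LASSO model, variables whose coefficients were shrunk to zero contribute nothing.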
Gradient boosting machine (GBM) As in [42, 43], for each single decision tree learned by the GBM, each test instance is assigned to a leaf following a decision path. The decision path consists of splitting nodes described by input variables. The weights of the leaves are assigned back to the splitting nodes on the decision path and weighted by the gain in each node. As a result, the predictors in the splitting nodes receive a portion of the weights. The contributions are the sums of these portions of weights by input variables across all trees.
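Recurrent neural networks (RNN) Substituting Eqs. (3), (4), (10) and (11) into Eq. (12) expresses the prediction directly in terms of the input variables:
$$\hat{y}_{i}=w\left( \sum _{t=1}^{T}\alpha ^{t}\beta ^{t}\otimes W_{T}\mathbf{x}_{i}^{t}+W_{NT}\mathbf{x}_{i}^{NT}\right) +b$$
From the above equation, the contributions of \(x_{tk}\) and \(x_{l}\) are \(\alpha ^{t}w(\beta ^{t}\otimes W_{T}[:,k])\) and \(wW_{NT}[:,l]\) respectively. It is worth noting that a regular RNN is a noninterpretable black-box model because of the recurrent hidden states. However, in our model, the recurrences are used to generate attention weights rather than to make predictions directly. Thus, the model is partially interpretable in terms of input variables.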
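The two contribution terms can be computed per input variable as below. This is a sketch with illustrative names. One assumption is made explicit: the expressions in the text are treated as per-unit coefficients, and each is multiplied by the input value (1 for a present binary code) when checking that the contributions plus the bias b reproduce the prediction of Eq. (12).

```python
def temporal_coefficient(alpha_t, beta_t, w, W_T, k):
    """alpha^t * w (beta^t elementwise W_T[:, k]) for code k in period t."""
    return alpha_t * sum(w_j * b_j * row[k] for w_j, b_j, row in zip(w, beta_t, W_T))


def nontemporal_coefficient(w, W_NT, l):
    """w W_NT[:, l] for nontemporal variable l."""
    return sum(w_j * row[l] for w_j, row in zip(w, W_NT))


def total_from_contributions(alphas, betas, w, W_T, W_NT, xs, x_nt, b):
    """Sum of (coefficient * input value) over all inputs, plus the bias b."""
    total = b
    for alpha_t, beta_t, x_t in zip(alphas, betas, xs):
        for k, x_tk in enumerate(x_t):
            total += temporal_coefficient(alpha_t, beta_t, w, W_T, k) * x_tk
    for l, x_l in enumerate(x_nt):
        total += nontemporal_coefficient(w, W_NT, l) * x_l
    return total
```

Because the attention weights are held fixed for a given patient, the prediction is linear in the inputs, which is what makes this additive decomposition possible.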
Model selection and validation
For models with hyperparameters (LASSO, GBM and RNN), we selected the best-fitting models using cross-validation as described above. To validate these models on a test dataset, we trained the models to predict period t and tested the models to predict period \(t+1\). During training, the information of period \(t+1\) was never accessed. We reported the R-squared and root mean squared error (RMSE) of rank percentiles as the performance measures. Given the policy interests associated with HCHN patients [2, 3], throughout the study, we used the threshold of the top 10% of PMPM expenditure to identify HCHN patients. To obtain robust results, we used at least three different values of t, giving multiple rounds of training and testing, and we reported the averages over these experiments. The only exception is for time periods of 12 months. In this case, we did not have sufficient data, giving us only one training set and one testing set.
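The performance measures named above can be sketched as follows. The function names are illustrative, and one detail is an assumption: the top 10% group is selected here by the observed PMPM supplied in `pmpm`, since the text does not state which period's PMPM defines the group at evaluation time.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_top_decile(y_true, y_pred, pmpm, top_fraction=0.10):
    """RMSE restricted to patients in the top `top_fraction` of PMPM."""
    n_top = max(1, round(len(pmpm) * top_fraction))
    top = sorted(range(len(pmpm)), key=lambda i: pmpm[i], reverse=True)[:n_top]
    return rmse([y_true[i] for i in top], [y_pred[i] for i in top])
```

When these are applied to rank percentiles, `y_true` and `y_pred` both lie in [0, 1], so an RMSE of 0.2 means predictions are off by roughly 20 percentile points on average.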
Results
We first present the temporal correlation of expenditures for the adult population, HCHN, and patients with chronic conditions. We then present the predictive accuracy for each of the modeling methods. We also describe how these models can be used for potential cause-effect analysis for each patient.
Temporal correlation
We first present our results for the entire adult population, followed by correlation in expenditures of HCHN and cohorts with specific chronic diseases.
Average correlation for the entire adult population
Period length (months)  One period later  Two periods later  Three periods later 

1  0.564  0.526  0.521 
3  0.651  0.584  0.516 
6  0.653  0.566  0.515 
12  0.676  0.594  – 
Percentage of top 10% patients that stayed in top 10%
Period length (months)  One period later  Two periods later  Three periods later 

1  45.61  43.10  47.89 
3  53.76  50.16  47.57 
6  58.38  53.76  51.00 
12  61.13  55.25  – 
Average percentile ± standard deviation in the following periods of the top 10%
Period length (months)  One period later  Two periods later  Three periods later 

1  72.19 ± 33.36  68.98 ± 35.88  72.70 ± 33.76 
3  80.84 ± 24.42  78.12 ± 27.09  76.58 ± 28.09 
6  83.13 ± 21.11  80.60 ± 23.37  79.33 ± 24.12 
12  85.39 ± 18.01  82.91 ± 20.19  – 
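The persistence percentages reported in the second table above (share of top 10% patients who remain in the top 10%) could be computed along the following lines. The sketch assumes each period is a dictionary mapping a patient identifier to PMPM and that both periods cover the same patients; neither detail is specified in the text.

```python
def top_ids(pmpm_by_patient, top_fraction=0.10):
    """Identifiers of patients in the top `top_fraction` of PMPM."""
    n_top = max(1, round(len(pmpm_by_patient) * top_fraction))
    ranked = sorted(pmpm_by_patient, key=pmpm_by_patient.get, reverse=True)
    return set(ranked[:n_top])


def percent_staying_in_top(period1, period2, top_fraction=0.10):
    """Percentage of period-1 top patients who are also top patients in period 2."""
    first = top_ids(period1, top_fraction)
    later = top_ids(period2, top_fraction)
    return 100.0 * len(first & later) / len(first)
```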
Average correlation for the diabetes cohort
Period length (months)  One period later  Two periods later  Three periods later 

1  0.611  0.566  0.553 
3  0.649  0.592  0.559 
6  0.662  0.594  0.541 
12  0.675  0.581  – 
Percentage of patients from the diabetes cohorts that stayed in the top 10%
Period length (months)  One period later  Two periods later  Three periods later 

1  44.28  40.89  39.28 
3  45.00  41.31  39.56 
6  48.21  44.01  41.63 
12  52.66  46.46  – 
Average percentile ± standard deviation in the following periods of the top 10% in the diabetes cohort
Period length (months)  One period later  Two periods later  Three periods later 

1  76.33 ± 25.80  74.08 ± 27.43  73.30 ± 27.78 
3  78.39 ± 23.24  75.98 ± 24.92  74.93 ± 25.52 
6  80.43 ± 21.75  77.95 ± 23.60  76.50 ± 24.50 
12  83.07 ± 19.61  79.85 ± 22.27  – 
Just as we found in the general population, for each disease cohort, we observed a significant correlation in expenditures from one time period to the next. This correlation was stronger for the HCHN population than the general population, and was sustained for longer period lengths.
Temporal correlation comparison between cohorts
Period length and lags  General Medicaid population  Diabetes cohort  COPD cohort  Hypertension cohort  Asthma cohort 

1 month, One period later  0.564  0.611  0.591  0.588  0.587 
1 month, Two periods later  0.526  0.566  0.548  0.543  0.548 
1 month, Three periods later  0.521  0.553  0.526  0.530  0.533 
3 months, One period later  0.651  0.649  0.611  0.629  0.640 
3 months, Two periods later  0.584  0.592  0.554  0.572  0.600 
3 months, Three periods later  0.516  0.559  0.526  0.541  0.567 
6 months, One period later  0.653  0.662  0.626  0.644  0.680 
6 months, Two periods later  0.566  0.594  0.568  0.581  0.624 
6 months, Three periods later  0.515  0.541  0.522  0.530  0.585 
12 months, One period later  0.676  0.675  0.651  0.664  0.718 
12 months, Two periods later  0.594  0.581  0.563  0.570  0.631 
Prediction
In this section, results for predicting expenditures at the patient level using prior expenditure information are presented. Only expenditure data for the immediately preceding period-of-interest, followed by up to 4 subsequent periods, and other claims-based information available for the patient are used.
Baseline Baseline models using one prior period-of-interest and only prior expenditure variables (prior PMPM, pctlPMPM or logPMPM, depending on the predicting objective) are presented below. Table 8 shows the baseline results of quarter-to-quarter prediction. Baseline models fit reasonably well, as indicated by the R-squared values. More than 40% of the variation is explained by the transformed models (pctlPMPM and logPMPM). RNN is the best model for all measures. The differences between RNN and other models in RMSEs are substantial, implying that expenditures are ranked much closer to the true rankings by RNN.
Baseline predictive model results
Predicting objective  Model  Train R-squared  Train RMSE  Train RMSE for Top 10%  Test R-squared  Test RMSE  Test RMSE for Top 10%

PMPM  LR  0.145  0.306  0.264  0.141  0.306  0.264
PMPM  LASSO  0.145  0.306  0.265  0.141  0.306  0.264
PMPM  GBM  0.199  0.317  0.270  0.172  0.314  0.272
PMPM  RNN  0.302  0.201  0.180  0.298  0.204  0.183
logPMPM  LR  0.401  0.306  0.265  0.402  0.306  0.264
logPMPM  LASSO  0.401  0.306  0.265  0.402  0.306  0.264
logPMPM  GBM  0.399  0.316  0.267  0.394  0.314  0.269
logPMPM  RNN  0.445  0.220  0.184  0.442  0.223  0.187
pctlPMPM  LR  0.399  0.306  0.265  0.400  0.306  0.264
pctlPMPM  LASSO  0.398  0.306  0.265  0.400  0.306  0.264
pctlPMPM  GBM  0.384  0.314  0.275  0.382  0.312  0.277
pctlPMPM  RNN  0.405  0.232  0.203  0.400  0.235  0.203
The models appear to reach an improvement ceiling at approximately three prior periods. The RNN model benefits the most from the use of additional periods, which is consistent with literature showing that RNNs are effective in modeling temporal relationships.
The R-squared of the linear models for the testing data decreases when the predicting objective is PMPM. This is likely due to the fact that PMPM is not linearly distributed in the parameter space, unlike its transformed versions (logPMPM and pctlPMPM). Using a large number of parameters and prior periods (effectively inducing a multiplicative effect on the number of parameters) in a linear model increases the likelihood of overfitting. We found that the R-squared for the training dataset in the same setting increased with the number of prior periods, which supports our claim of overfitting and our preference for models that can control this risk, such as LASSO and GBM. The results were similar when using 6-month periods.
Interpreting the models
We repeated the process described above for 50 patients, and the results for one patient are shown in Figs. 9, 10, and 11. For all these test cases, including the one shown here, the results demonstrate that all three models make robust predictions. LASSO and GBM have comparatively lower standard deviations. LASSO is the most stable model and consistently generates similar contributions. GBM has a larger standard deviation in contributions but can still derive influential ones from all variables. However, the contributions of each variable generated by RNN are very unstable, so this method is not effective for deriving the importance of input variables. Fitting the parameters of an RNN requires nonconvex optimization by stochastic gradient descent, which may end in any of many local minima. It is therefore not surprising that the algorithm gives a different solution (generally corresponding to a different local minimum) each time, leading to much larger variations in the contribution estimates of input variables.
In conclusion, LASSO and GBM are more effective than the RNN model in generating interpretable contributions and finding important input variables.
Choosing the best model
The choice of the best model depends on whether the goal is to best predict expenditure or to better understand the contributions of underlying factors. From Figs. 7 and 8, we can conclude that RNN is the best model for prediction. GBM is slightly better than LASSO in R-squared and RMSE for top 10%, but GBM performs similarly to LASSO in overall RMSE. However, for clearer interpretation of a particular prediction, LASSO and GBM are more suitable.
In terms of comparing different prediction objectives, RNN seems to perform best using pctlPMPM. The reason for this could be that pctlPMPM is strictly contained in [0,1], which is less likely to cause the significant vanishing or exploding gradient issues that are common in backpropagation when optimizing neural networks. For LR, GBM and LASSO, the choice of predicting objective is a task-specific decision. If minimal RMSE for top 10% is the goal, one should use PMPM as the predicting objective. If optimizing R-squared is more important, one should consider using logPMPM or pctlPMPM. All three objectives are similar in overall RMSE.
Discussion
In 2014, the top 1 percent of expenditures in health care accounted for approximately one-fifth of total health care expenditures [2, 44]. As a result, this group has been termed HCHN patients. Disproportionate spending concentration in this group is also prevalent in other countries [5]. Prior literature [14] has suggested that these expenditures may be episodic and not temporally consistent. If this is indeed the case, the benefit of modeling patterns of high expenditures may be severely limited by a high degree of randomness in their health care utilization.
However, our study clearly shows that health care expenditures are significantly autocorrelated within the Texas Medicaid program. With around 5 million enrollees, Texas has the third largest Medicaid population in the United States. This result may motivate preventive interventions. Autocorrelation suggests an underlying process structure that may be driven by modifiable factors. Thus, highly predictive machine-learning models can enable providers to direct these interventions to the right HCHN population.
This study has several limitations. First, we conducted the study within one state's Medicaid program. The results may vary by state and/or payer type. Second, we applied only general-purpose machine-learning models. Some tailored models may have better performance. Third, the predictive models provide little guidance on the preventive factors needed to inform interventions. Finally, health status determined from claims data only is limited. It may be necessary to include additional data sources, such as narrative components of electronic health records (EHR), disease severity measures, and/or social determinants of health.
Future work will address some of these limitations. We plan to expand the analysis to different types of health care programs. We will also collect the additional data mentioned above to evaluate predictive performance. Moreover, we will collaborate with clinicians and policy experts to make the models clinically relevant by integrating domain expertise to better direct preventive interventions.
Conclusions
In this work, we tested the temporal correlation of health care expenditures for multiple time periods. Our results show that health care expenditures are temporally consistent. Further, this correlation is significantly higher for the HCHN patients as compared to the general population. For patients with chronic conditions, the temporal consistency of expenditures was high, but not appreciably higher than the general population. This finding was somewhat surprising, as one would have expected chronic conditions to lead to more consistent expenditures.
Overall, machine learning models forecast health care expenditures well. We iteratively developed several predictive models to forecast expenditures. First, we started from a baseline case using only expenditures and added input variables to the models stepwise. We showed that additional information such as clinical information and demographics is useful for improving prediction performance. In addition, we showed that it is beneficial to have historical data from more prior time periods. The improvements due to additional prior periods saturate after three to four periods. RNN outperforms LR, LASSO and GBM in prediction accuracy. In terms of prediction interpretability, LASSO and GBM consistently select similar variables and generate stable contributions independent of the resampling process.
Declarations
Authors’ contributions Performed the literature review: CY; Designed experiments: CY, CD, and SR; Carried out experiments: CY; Results analysis: CY, CD, and SR. All authors contributed to writing the final manuscript. All authors read and approved the final manuscript.
Acknowledgements
This work was supported in part by Texas HHSC and in part through Patient-Centered Outcomes Research Institute (PCORI) (PCOCOORDCTR2013) for development of the National Patient-Centered Clinical Research Network, known as PCORnet. The views, statements and opinions presented in this work are solely the responsibility of the author(s) and do not necessarily represent the views of the Texas HHSC and Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors or Methodology Committee or other participants in PCORnet. Part of the materials are adapted by permission from Springer Nature: Springer Customer Service Centre GmbH. Yang et al. [43] Copyright 2017.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
The University of Florida Institutional Review Board approved this study and granted a full waiver of informed consent (IRB201401068).
Funding
Publication costs were funded in part by Texas HHSC and in part through Patient-Centered Outcomes Research Institute (PCORI) (PCOCOORDCTR2013).
About this supplement
This article has been published as part of BioMedical Engineering OnLine Volume 17 Supplement 1, 2018: Selected articles from the 5th International Work-Conference on Bioinformatics and Biomedical Engineering: biomedical engineering. The full contents of the supplement are available online at https://biomedicalengineeringonline.biomedcentral.com/articles/supplements/volume17supplement1.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
 Centers for Medicare & Medicaid Services, et al. National health expenditures 2014 highlights. 2014.
 Stanton MW, Rutherford M. The high concentration of US health care expenditures. Rockville: Agency for Healthcare Research and Quality; 2006.
 Berk ML, Monheit AC. The concentration of health expenditures: an update. Health Affairs. 1992;11(4):145–9.
 Blumenthal D, Chernof B, Fulmer T, Lumpkin J, Selberg J. Caring for highneed, highcost patients—an urgent priority. N Engl J Med. 2016;375(10):909–11.
 Gerdtham UG, Jönsson B. International comparisons of health expenditure: theory, data and econometric analysis. Handbook of health economics. 2000;1:11–53.
 Pope GC, Kautter J, Ellis RP, Ash AS, Ayanian JZ, Ingber MJ, Levy JM, Robst J, et al. Risk adjustment of medicare capitation payments using the cmshcc model. 2004.
 Gilmer T, Kronick R, Fishman P, Ganiats TG. The medicaid rx model: pharmacybased risk adjustment for public programs. Med Care. 2001;39(11):1188–202.
 Kronick R, Gilmer T, Dreyfus T, Ganiats T. Cdpsmedicare: The chronic illness and disability payment system modified to predict expenditures for medicare beneficiaries. Final Report to CMS. 2002.
 Margolis R, Derr L, Dunn M, Huerta M, Larkin J, Sheehan J, Guyer M, Green ED. The national institutes of health’s big data to knowledge (bd2k) initiative: capitalizing on biomedical big data. J Am Med Inf Assoc. 2014;21(6):957–8.
 Rockville M. Reducing and preventing adverse drug events to decrease hospital costs. Research in Action. Rockville: AHRQ Publication; 2001. p. 1.
 Minnesota Department of Health. An introductory analysis of potentially preventable health care events in Minnesota. Minnesota: Minnesota Department of Health; 2015. p. 1.
 Yang C, Delcher C, Shenkman E, Ranka S. Identifying high health care utilizers using postregression residual analysis of health expenditures from a state medicaid program. In: AMIA 2017, American Medical Informatics Association Annual Symposium, Washington, DC, November 4–8, 2017. 2017.
 Delcher C, Yang C, Ranka S, Tyndall JA, Vogel B, Shenkman E. Variation in outpatient emergency department utilization in texas medicaid: a statelevel framework for finding superutilizers. Int J Emerg Med. 2017;10(1):31.
 Johnson TL, Rinehart DJ, Durfee J, Brewer D, Batal H, Blum J, Oronce CI, Melinkovich P, Gabow P. For many patients who use large amounts of health care services, the need is intense yet temporary. Health Affairs. 2015;34(8):1312–9.
 Pacala JT, Boult C, Urdangarin C, McCaffrey D. Using selfreported data to predict expenditures for the health care of older people. J Am Geriatr Soc. 2003;51(5):609–14.
 Van der Heyden J, Van Oyen H, Berger N, De Bacquer D, Van Herck K. Activity limitations predict health care expenditures in the general population in Belgium. BMC public health. 2015;15(1):1.
 Huber CA, Schneeweiss S, Signorell A, Reich O. Improved prediction of medical expenditures and health care utilization using an updated chronic disease score and claims data. J Clin Epidemiol. 2013;66(10):1118–27.
 Sen B, Blackburn J, Aswani MS, Morrisey MA, Becker DJ, Kilgore ML, Caldwell C, Sellers C, Menachemi N. Health expenditure concentration and characteristics of highcost enrollees in chip. INQUIRY: J Health Care Org Provis Financ. 2016;53:0046958016645000.
 Lee R, Miller T. An approach to forecasting health expenditures, with application to the US medicare system. Health Serv Res. 2002;37(5):1365–86.
 The Kaiser Family Foundation. Total monthly medicaid and CHIP enrollment. 2016. http://kff.org/healthreform/stateindicator/totalmonthlymedicaidandchipenrollment/. Accessed 22 Aug 2016.
 AHRQ. Agency for healthcare research and quality, clinical classifications software (ccs). Rockville: AHRQ; 2015.
 Harman JS, Lemak CH, AlAmin M, Hall AG, Duncan RP. Changes in per member per month expenditures after implementation of florida’s medicaid reform demonstration. Health Serv Res. 2011;46(3):787–804.
 Onwuegbuzie AJ, Daniel L, Leech NL. Pearson productmoment correlation coefficient. Encycl Meas Stat. 2007;2:751–6.
 Garis RI, Farmer KC. Examining costs of chronic conditions in a medicaid population. Manag Care. 2002;11(8):43–50.
 Futoma J, Morris J, Lucas J. A comparison of models for predicting early hospital readmissions. J Biomed Inform. 2015;56:229–38.
 Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc. 1996;58:267–88.
 Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189–232.
 Chen T, Guestrin C. Xgboost: a scalable tree boosting system. arXiv preprint arXiv:1603.02754. 2016.
 Graves A, Mohamed AR, Hinton G. Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing. 2013. p. 6645–9.
 Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems NIPS. 2014. p. 3104–12.
 Li M, Mehrotra K, Mohan C, Ranka S. Sunspot numbers forecasting using neural networks. In: 5th IEEE international symposium on intelligent control. 1990. p. 524–9.
 Venugopalan S, Xu H, Donahue J, Rohrbach M, Mooney R, Saenko K. Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729. 2014.
 Choi E, Schuetz A, Stewart WF, Sun J. Using recurrent neural network models for early detection of heart failure onset. J Am Med Inf Assoc. 2016;24:112.
 Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Retain: interpretable predictive model in healthcare using reverse time attention mechanism. In: Advances in neural information processing systems NIPS. 2016.
 Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: encoder–decoder approaches. arXiv preprint arXiv:1409.1259. 2014.
 Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. 2014.
 Ba J, Mnih V, Kavukcuoglu K. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755. 2014.
 Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–58.
 Zeiler MD. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701. 2012.
 Yang C, Rangarajan A, Ranka S. Global model interpretation via recursive partitioning. arXiv preprint arXiv:1802.04253. 2018.
 Yang C, Rangarajan A, Ranka S. Visual explanations from deep 3d convolutional neural networks for Alzheimer’s disease classification. arXiv preprint arXiv:1803.02544. 2018.
 Yang C, Delcher C, Shenkman E, Ranka S. Predicting 30day allcause readmissions from hospital inpatient discharge data. In: 2016 IEEE 18th international conference on e-health networking, applications and services (Healthcom). 2016. p. 1–6.
 Yang C, Delcher C, Shenkman E, Ranka S. Machine learning approaches for predicting high utilizers in health care. In: International conference on bioinformatics and biomedical engineering. Berlin: Springer; 2017. p. 382–95.
 Mitchell E. Concentration of health care expenditures in the US civilian noninstitutionalized population. Statistical Brief. 2014;497:1.