Measurement and application of patient similarity in personalized predictive modeling based on electronic medical records

Table 1 Basic characteristics of the samples in the test set and the training set

Characteristic	Test set (n = 6490)	Training set (n = 10,000)	P value^#
Male gender, n (%)	4387 (67.6%)	6838 (68.4%)	0.282
Age (years), mean ± SD	60.1 ± 14.7	60.1 ± 15.0	0.967
Myocardial infarction, n (%)	443 (6.8%)	656 (6.6%)	0.615
Congestive heart failure, n (%)	507 (7.8%)	795 (8.0%)	0.642
Chronic obstructive pulmonary disease, n (%)	288 (4.4%)	467 (4.7%)	0.368
Mild liver disease, n (%)	799 (12.3%)	1301 (13.0%)	0.188
Hypertension, n (%)	3501 (53.9%)	5389 (53.9%)	0.950
Coronary heart disease, n (%)	2206 (34.0%)	3331 (33.3%)	0.366
Serum glucose (mmol/L), mean ± SD	6.6 ± 2.9	6.7 ± 2.9	0.793
Abnormal urine glucose, n (%)	1222 (18.8%)	1884 (18.8%)	0.987
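
The categorical rows of Table 1 compare a count in the test set with a count in the training set. As a minimal sketch of how such a comparison could be checked, the following assumes a Pearson chi-square test on a 2 × 2 table without continuity correction. This is an assumption: the test behind the "#" marker is not stated in this excerpt. The function name `chi_square_2x2` is hypothetical, and only the Python standard library is used.

```python
import math


def chi_square_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-square test (no continuity correction) for two proportions.

    Hypothetical helper for illustration; not taken from the paper.
    Returns the chi-square statistic and its P value (1 degree of freedom).
    """
    observed = [
        [events_a, n_a - events_a],  # group A: with / without the characteristic
        [events_b, n_b - events_b],  # group B: with / without the characteristic
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [events_a + events_b, total - events_a - events_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts copied from Table 1: (test set, training set).
rows = {"Male gender": (4387, 6838), "Hypertension": (3501, 5389)}
for name, (test_count, train_count) in rows.items():
    chi2, p = chi_square_2x2(test_count, 6490, train_count, 10000)
    print(f"{name}: chi2 = {chi2:.3f}, p = {p:.3f}")
```

Under this assumption both rows give P values well above 0.05, in line with the table's indication that the two sets do not differ significantly on these characteristics. Exact agreement with the published P values is not guaranteed, since the authors may have used a different test or a correction.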

ISSN: 1475-925X