Machine learning and medicine: book review and commentary

This article is a review of the book "Master Machine Learning Algorithms: Discover How They Work and Implement Them From Scratch" (ISBN: not available; 37 USD; 163 pages), edited by Jason Brownlee and published by the author (edition v1.10, http://MachineLearningMastery.com). An accompanying commentary discusses some of the issues involved in using machine learning and data mining techniques to develop predictive models for the diagnosis or prognosis of disease, and calls attention to additional requirements for developing diagnostic and prognostic algorithms that are generally useful in medicine. The Appendix provides examples that illustrate potential problems with machine learning that are not addressed in the reviewed book.

Electronic supplementary material: The online version of this article (10.1186/s12938-018-0449-9) contains supplementary material, which is available to authorized users.

The classification results were obtained for two independent groups of data. The second group of input data for classification was constituted by Haberman's Survival Data Set from the UCI Machine Learning Repository [2]. The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer [3, 4]. The data contained four attributes for each of the 306 cases: the age of the patient at the time of the operation, the year of the operation, the number of positive axillary nodes detected, and the survival status. The survival status constitutes the ground truth with two classes: the patient survived 5 years or longer (first class), or the patient died within 5 years (second class).
In all the cases in question, the order of cases in the learning and test vectors was random. The developed test software was implemented on a computer with an Intel® Core i7-3770 CPU @ 3.4 GHz and 10 GB of RAM, running Matlab Version 7.11.0.584 (R2010b) with Java VM Version Java 1.6.0_17-b04 (Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM, mixed mode). Additionally, Statistics Toolbox Version 7.4 (R2010b) was used.

First example
The data were generated randomly. The length of the learning vector u was varied in the range from 4 to 50, and the number of features k was varied in the range from 4 to 50. Reliable results were obtained only for a ratio u/k of 5 or more. This indicates that the length of the learning vector must be at least 5 times the number of features, which agrees with the commentary presented in [1].
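The effect can be sketched with a small simulation (a sketch only; a plain least-squares linear classifier and the NumPy library are assumed here, not the classifiers used in the study): with purely random data and labels, a learning vector that is short relative to the number of features lets the model fit noise perfectly, while u/k = 5 brings the training accuracy back toward the chance level.

```python
import numpy as np

rng = np.random.default_rng(3)

def training_accuracy(u, k):
    """Fit a least-squares linear classifier to purely random data
    (u cases, k features, random +/-1 labels) and return the accuracy
    measured on the same data it was trained on."""
    X = rng.standard_normal((u, k))
    y = rng.choice([-1.0, 1.0], size=u)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean(np.sign(X @ w) == y))

acc_short = training_accuracy(10, 50)   # u/k = 0.2: far too few cases
acc_long = training_accuracy(250, 50)   # u/k = 5: the recommended minimum
print(acc_short, acc_long)
```

With u < k the linear system is underdetermined, so the classifier interpolates the random labels exactly (apparent accuracy 100%); at u/k = 5 the apparent accuracy on equally meaningless data falls much closer to the 50% chance level.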

Second example - the choice of the 'appropriate' learning and test vector
In this example, as well as in the subsequent ones (unless otherwise indicated), the real data from Haberman's Survival Data Set were used. As can be seen from the graph (Fig. 7), an 'appropriate' draw of the learning and test vectors can be found in order to obtain the best results. Therefore, while proceeding fully correctly and in conformity with all the rules, one may still influence the obtained results - see Tab. 1.
As shown in Tab. 1, the accuracy values change by about 33% depending on the chosen classifier and the type of draw (see Fig. 7).
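This kind of 'draw shopping' can be illustrated with a small simulation (an illustration only, assuming a nearest-centroid classifier on synthetic two-class data rather than the classifiers and data used in the study): repeating the random learning/test split many times and keeping only the most favourable draw inflates the reported accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: 30 cases per class, 3 features.
n_per_class, n_features = 30, 3
X = np.vstack([rng.standard_normal((n_per_class, n_features)),
               rng.standard_normal((n_per_class, n_features)) + 1.5])
y = np.repeat([0, 1], n_per_class)

def nearest_centroid_accuracy(train_idx, test_idx):
    """Classify each test case by the nearer of the two class centroids."""
    c0 = X[train_idx][y[train_idx] == 0].mean(axis=0)
    c1 = X[train_idx][y[train_idx] == 1].mean(axis=0)
    d0 = np.linalg.norm(X[test_idx] - c0, axis=1)
    d1 = np.linalg.norm(X[test_idx] - c1, axis=1)
    return float(np.mean((d1 < d0).astype(int) == y[test_idx]))

# 200 random draws of a 40/20 learning/test split.
accs = []
for _ in range(200):
    idx = rng.permutation(len(y))
    accs.append(nearest_centroid_accuracy(idx[:40], idx[40:]))
print(f"worst draw: {min(accs):.2f}, best draw: {max(accs):.2f}")
```

Every one of the 200 splits is individually legitimate, yet the spread between the worst and the best draw is substantial, so reporting only the best one misrepresents the classifier.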

Third example - Leaking test data into the training data
Data leakage will now be simulated by duplicating data between the learning and the test vector. The duplication of data (Fig. 8) concerns the percentage contribution q of training data, varied from 0% to 100%. The lengths of the learning and test vectors do not change (learning vector: 204 cases; test vector: 102 cases). The results for changes of the q value in steps of 0.1% are shown in Fig. 9.
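A minimal simulation of this kind of leakage (a sketch assuming a 1-nearest-neighbour classifier on random data, not the classifiers from the study): copying a fraction q of the learning cases into the test vector inflates the measured accuracy, up to 100% when the test vector is entirely duplicated, even though the data carry no signal at all.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random data with random labels: a fair test accuracy should be ~50%.
n_train, n_test, k = 204, 102, 3
X_train = rng.standard_normal((n_train, k))
y_train = rng.integers(0, 2, size=n_train)

def one_nn_accuracy(X_test, y_test):
    """1-nearest-neighbour prediction from the training set."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return float(np.mean(y_train[d.argmin(axis=1)] == y_test))

accuracy = {}
for q in (0.0, 0.5, 1.0):           # fraction of test cases copied from training
    X_test = rng.standard_normal((n_test, k))
    y_test = rng.integers(0, 2, size=n_test)
    n_leak = int(q * n_test)
    X_test[:n_leak] = X_train[:n_leak]   # duplicated (leaked) cases
    y_test[:n_leak] = y_train[:n_leak]
    accuracy[q] = one_nn_accuracy(X_test, y_test)
print(accuracy)
```

Each duplicated test case finds itself in the training set at distance zero, so it is always classified 'correctly'; the apparent accuracy rises with q regardless of any real predictive power.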

Fourth example - Leaking the correct prediction or ground truth into the test data
The leakage of ground truth into the prediction results allows the obtained results to be influenced arbitrarily. In this case, while the lengths of the test and learning vectors were maintained in the proportions 1/3 to 2/3, the percentage of ground-truth leakage into the prediction results, denoted by the coefficient v (Fig. 10), was varied from 0 to 100% in steps of 1%. The obtained results are shown in Fig. 11. As expected, the bigger the ground-truth leakage into the prediction results, the seemingly bigger the efficacy of the classifier (accuracy value).
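The mechanism can be sketched in a few lines (an illustration only; a chance-level classifier producing random labels is assumed, not the classifiers from the study): replacing a fraction v of the predictions with the ground truth drives the apparent accuracy from about 50% up to 100%.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 306
y_true = rng.integers(0, 2, size=n)     # ground truth
y_pred = rng.integers(0, 2, size=n)     # a chance-level classifier

def leaked_accuracy(v):
    """Replace a fraction v of the predictions with the ground truth."""
    leaked = y_pred.copy()
    n_leak = int(v * n)
    leaked[:n_leak] = y_true[:n_leak]
    return float(np.mean(leaked == y_true))

accs = [leaked_accuracy(v / 100) for v in (0, 25, 50, 75, 100)]
print([f"{a:.2f}" for a in accs])
```

The accuracy grows roughly linearly with v and reaches exactly 100% at v = 100%, although the underlying classifier never learned anything.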

Fifth example - Inclusion of data not present in the model's operational environment
A correctly implemented test of a classifier should be conducted on test data whose range of variability of particular features is the same as, or similar to, that of the learning data. In this example, the data vector was divided into learning and test data in different proportions, dependent on the mean value of a particular feature. The learning vector consisted of the cases for which the value of the first feature (w(1)) was higher than its mean value, while the test vector comprised the remaining cases, for which the value of the first feature was lower than its mean value. By analogy, the other two features, w(2) and w(3), were tested (Fig. 12). The length of the learning vector comprised from 20 to 300 cases and was changed in steps of 2; the upper limit resulted from the maximum length of the data vector, while the bottom limit resulted from the necessity to avoid the situation described in the first example. The learning and test vectors did not have any data in common. A total of 19 600 classifications were conducted for each type of classifier. The obtained results for the three tested classifiers are shown in Figs. 13-18.

The above examples show limitations of machine learning; it is also quite possible to manipulate data in order to obtain better results. This quite important issue was not addressed in the reviewed book, even in a vague way (so as not to increase its volume excessively).
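This failure mode can be sketched with a small simulation (a sketch assuming a 1-nearest-neighbour classifier on synthetic data, not the three classifiers of the study): when the learning vector is restricted to the cases above the mean of one feature and the test vector to the cases below it, the test cases fall outside the region the model has seen, and the accuracy drops well below what an ordinary random split reports.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 3000
X = rng.standard_normal((n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # label depends on both features

def one_nn_accuracy(X_tr, y_tr, X_te, y_te):
    """1-nearest-neighbour prediction of y_te from (X_tr, y_tr)."""
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    return float(np.mean(y_tr[d.argmin(axis=1)] == y_te))

# (a) ordinary random learning/test split
idx = rng.permutation(n)
acc_random = one_nn_accuracy(X[idx[:2000]], y[idx[:2000]],
                             X[idx[2000:]], y[idx[2000:]])

# (b) split by the mean of the first feature, as in this example:
# learn on cases above the mean, test on cases below it
above = X[:, 0] > X[:, 0].mean()
acc_shifted = one_nn_accuracy(X[above], y[above], X[~above], y[~above])
print(f"random split: {acc_random:.2f}, shifted split: {acc_shifted:.2f}")
```

Under the shifted split every test case must borrow its label from a training case near the feature-mean boundary, so the model systematically mislabels the cases far below the mean, while the random split reports a much higher accuracy for the very same classifier.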