Differential sequential patterns supporting insulin therapy of new-onset type 1 diabetes
Rafał Deja, Wojciech Froelich and Grażyna Deja
https://doi.org/10.1186/s12938-015-0004-x
© Deja et al.; licensee BioMed Central. 2015
Received: 2 November 2014
Accepted: 26 January 2015
Published: 21 February 2015
Abstract
Background
In spite of numerous research efforts on supporting the therapy of diabetes mellitus, the subject still involves challenges and creates active interest among researchers. In this paper, a decision support tool is presented for setting insulin therapy in new-onset type 1 diabetes.
Methods
The concept of differential sequential patterns (DSPs) is introduced with the aim of representing deviations in the patient’s blood glucose level (BGL) and the amount of insulin injections administered. The decision support tool is created using data mining algorithms for discovering sequential patterns.
Results
By using the DSPs, it is possible to support the physician’s decision-making concerning changing the treatment (i.e., whether to increase or decrease the insulin dosage). The other contributions of the paper are an algorithm for generating DSPs and a new method for evaluating nocturnal glycaemia. The proposed qualitative evaluation of nocturnal glycaemia improves the generalization capabilities of the DSPs.
Conclusions
The usefulness of the proposed approach was evident in the results of experiments in which actual data from juvenile diabetic patients were used. It was confirmed that the proposed DSPs can be used to guide the therapy of numerous juvenile patients with type 1 diabetes.
Keywords
Data mining, Medical patterns, Diabetes mellitus
Introduction
In recent years, the number of cases of diabetes mellitus has increased significantly [1]. Therefore, controlling the concentration of glucose in the patient’s blood is still an important problem that is being investigated by many researchers. The problem is a challenge since a strict medical procedure for controlling BGL cannot be defined [2]. The main reason is the high variability between people and the numerous different initial clinical states of patients after they are admitted to the hospital. This especially concerns the onset of diabetes, when the patient’s state is unstable. For that reason, the existing medical approach to determining the appropriate therapy is based almost exclusively on the physician’s experience [3]. Research has focused mainly on the so-called issue of ’closing the loop’ [4]. The idea is to create a system that can simulate the pancreas by automatic adjustment of insulin injections or, in some cases, by delivering glucagon to achieve the appropriate glucose level. The problem often requires building a decision-support system based on the prediction of the level of blood glucose.
In this paper, we propose a decision support tool for setting basal insulin therapy at the onset of type 1 diabetes. The proposed method is based on the concept of sequential patterns [5]. After determining the sequential patterns that underlie a given sequence of terms, it is possible to predict the plausible continuation of the sequence. From the medical perspective, sequential patterns can be treated as medical guidelines [3,6,7]. The advantage of using medical guidelines based on sequential patterns is that they are easy for physicians to interpret [8].
The concept of sequential patterns has been used extensively in different application domains [9], especially in medicine [10-12]. Sequential patterns have been used to identify diagnoses that diverse hospitals have in common [13] or to determine the subsets of examinations that were done together and that were followed by patients [14]. Several algorithms to discover sequential patterns have been proposed [5,15]. The concept of mining diabetes data using sequential patterns was first proposed in [16]. The patterns that were discovered were used to construct a decision tree that expressed the possible flow of medical events. Recently, [17] proposed the use of sequential patterns for planning diabetic therapy. The template-based patterns discovered from historical data were used to recommend the insulin dosage on the basis of the known value of the level of blood glucose [17].
One of the limitations of the approach proposed in [17] is the low generalization capability of the patterns that were discovered over the population of all patients. For that reason, the patterns were mined and used separately for each individual patient. In this paper, we describe our effort to extend the application of sequential patterns for supporting diabetic therapy by increasing their generalization capabilities. It is necessary to mention that such generalization is quite difficult for the pre-meal insulin dosage, because each patient starts a meal with a different BGL, and the measurement of the energy value of the meal is highly imprecise. For that reason, we left that challenge for future research and focused our investigation on mining patterns concerning only long-lasting basal insulin.

The main contributions of this paper are as follows:
- A new approach for the qualitative evaluation of nocturnal glycaemia is introduced.
- The concept of differential sequential patterns and an algorithm for mining them are proposed.
- A decision support tool is developed based on differential sequential patterns. The decision tool’s output serves as a medical guideline for physicians when the proper course of therapy must be determined quickly, e.g., for patients admitted to the hospital with new-onset diabetes.
The remainder of the paper is organized in the following way. We start with the Medical background section, where the therapy of diabetes is presented. The basics of sequential patterns are given in the Methods section, which then presents the contribution of this paper. First, the procedure of qualitative evaluation of nocturnal glycaemia is formalized. Second, the concept of differential sequences is introduced, and an algorithm for the generation of sequential patterns based on differential sequences is proposed. The validation of the proposed approach and the discussion of the results that were obtained are described in the Results of experiments section.
Medical background
The main goal of effective diabetes treatment is to maintain the state of near normoglycaemia. It is said that the normal glucose level should range between 70-140 mg/dl throughout the day, although the recommended range before a meal is 70-100 mg/dl [3]. Therefore, repeated measurements of blood glucose are performed during the day [6]. Typically, the onset of type 1 diabetes among children is rapid, and such patients should be hospitalized. The initial daily insulin dosage is prescribed based on several basic data (e.g., the patient’s weight, gender and age), some test results (initial glycaemia level, acid-base balance, CRP, C-peptide and HbA1c) and the proposed diet (patient’s energy demand). Currently, there is no strict medical procedure that allows immediately setting the proper insulin dosage based on the data listed above. Thus, in the following days, the daily glucose profile is taken, and the insulin dosage is adaptively modified. The daily glucose profile consists of glycaemia measurements every 2-3 hours during the night, fasting, before each meal, and 2 hours after each meal. The patient is released from the hospital when treatment is established, so that during the day, the blood glucose level is close to normal.
The overall daily insulin dosage is divided, with 30% of the dosage administered as the basal dosage and 70% administered as the pre-meal insulin dosage. (Every patient with type 1 diabetes mellitus should take two types of insulin.) For the basal insulin, a long-lasting insulin (e.g., insulin glargine) is used. It is a biosynthetic recombinant insulin analog that provides 24 hours of continuous activity, becomes fully active as soon as 2 hours after administration, and is peakless. The basal insulin is usually injected in the evening (one injection per day), and it ensures the correct level of glucose during the night and before meals. In the following days, the results of current measurements of glucose are used to determine the dosage. The dose for the next day can be reduced when hypoglycemia occurred on the previous day, or it can be increased if hyperglycemia is observed in the glucose profile. In the case of an increased glucose level (more than 200 mg/dl) during the night, short-acting insulin is administered immediately as the so-called ’basal correction’.
The second part of the therapy involves administering pre-meal insulin, which is short-acting. This insulin is administered just before each meal, usually 4-5 times a day.
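Table 1 Exemplary clinical data
| Patient | Date | Time [h] | Value | Description |
|---|---|---|---|---|
| 425 | 2008-11-26 | 22 | 10 | Basal insulin |
| 425 | 2008-11-27 | 0 | 109 | Glycaemia |
| 425 | 2008-11-27 | 3 | 62 | Glycaemia |
| 425 | 2008-11-27 | 5 | 60 | Glycaemia |
| 425 | 2008-11-27 | 7 | 71 | Glycaemia |
| 425 | 2008-11-27 | 22 | 9 | Basal insulin |
| 425 | 2008-11-28 | 0 | 114 | Glycaemia |
| 425 | 2008-11-28 | 3 | 112 | Glycaemia |
| 425 | 2008-11-28 | 5 | 105 | Glycaemia |
| 425 | 2008-11-28 | 7 | 99 | Glycaemia |
| 425 | 2008-11-29 | 22 | 8 | Basal insulin |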
Table 2 Glycaemia discretisation
| Blood glucose level [mg/dl] | Interpretation | Discrete value |
|---|---|---|
| <70 | Hypoglycemia | 1 |
| [70, 140) | Normoglycemia | 2 |
| [140, 200] | Mild hyperglycemia | 3 |
| >200 | Hyperglycemia | 4 |
When planning therapy, the physician considers the patient’s clinical status and some test results [3], but the most important factor when the insulin dosage is initially specified is the patient’s weight. Accordingly, we recalculated the given basal insulin dosage into units per 10 kg of the patient’s weight. For example, when considering the first day of therapy for patient 425, the basal insulin dosage is 10 units, and the patient’s weight is 32 kg. Thus, in our research the normalized basal insulin dosage (10/3.2 ≈ 3.1) was rounded to 3 units per 10 kg.
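The weight normalization described above can be sketched in a few lines. This is only an illustration: the function name `normalized_basal_dose` is hypothetical, and the rounding rule (round half up to a whole unit) is an assumption, since the text only says that the value "was rounded".

```python
import math


def normalized_basal_dose(dose_units: float, weight_kg: float) -> int:
    """Return the basal dose in whole units per 10 kg of body weight.

    Assumption: values are rounded half up; the source does not state
    the exact rounding rule.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    per_10_kg = dose_units * 10 / weight_kg
    return math.floor(per_10_kg + 0.5)


# Patient 425 from the exemplary data: 10 units at 32 kg.
first_day = normalized_basal_dose(10, 32)  # 10 * 10 / 32 = 3.125
print(first_day)
```

With this rounding rule, the later doses of 9 and 8 units for the same weight also map to 3 units per 10 kg.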
Methods
In this section, we describe step by step the way the decision support tool for therapy setting was created. We start with preliminaries about the notion of sequential patterns.
The basics of sequential patterns
Let \(E=\{e_{1},e_{2},\ldots,e_{n}\}\) denote the set of items (also called elements or events). Any subset of items \(a_{i} \subseteq E\) is called an itemset. The sequence \(a=\langle a_{1},a_{2},\ldots,a_{k}\rangle\) is defined as an ordered list of itemsets \(a_{i}\). The term k-sequence denotes any sequence of cardinality \(k=\mathrm{card}(a)\), where \(k\) is the number of itemsets it contains. A particular item \(e_{j} \in E\) may recur in the k-sequence by being included in several itemsets \(a_{i}\).
The sequence \(a=\langle a_{1},a_{2},\ldots,a_{n}\rangle\) is contained [5] in the other sequence \(b=\langle b_{1},b_{2},\ldots,b_{m}\rangle\), i.e. \(a \subseteq b\), if there exist integers \(i_{1}<i_{2}<\ldots<i_{n}\) such that \(a_{1} \subseteq b_{i_{1}}, a_{2} \subseteq b_{i_{2}},\ldots,a_{n} \subseteq b_{i_{n}}\) [5]. The sequence \(a\) is called a subsequence of \(b\), and the sequence \(b\) is called a supersequence of \(a\).
The objective is to determine those subsequences \(a \subseteq s^{j}\) that are frequently contained in the sequences \(s^{j}\) of a given set \(S\), i.e., those with support \(sup(a) \geq sup_{min}\), where \(sup_{min}\) is a threshold given by experts. Such sequences are in fact a kind of pattern in the data; therefore, they are called frequent patterns.
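The containment relation and the support threshold can be illustrated with a short sketch. It assumes that support is the fraction of sequences in the set that contain the candidate, which is the usual relative-frequency definition in [5]; the function names `is_contained` and `support` and the toy events are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def is_contained(a: Sequence[Itemset], b: Sequence[Itemset]) -> bool:
    """True if a is contained in b: there are indices i_1 < ... < i_n
    such that each itemset of a is a subset of the matching itemset of b."""
    pos = 0
    for itemset in a:
        # Greedily take the earliest itemset of b that covers this one.
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def support(a: Sequence[Itemset], database: List[Sequence[Itemset]]) -> float:
    """Assumed definition: share of database sequences that contain a."""
    return sum(is_contained(a, s) for s in database) / len(database)


# Toy data with made-up event names.
db = [
    [frozenset({"x"}), frozenset({"y", "z"}), frozenset({"w"})],
    [frozenset({"x"}), frozenset({"w"})],
    [frozenset({"y"}), frozenset({"x"})],
]
candidate = [frozenset({"x"}), frozenset({"w"})]
print(support(candidate, db))  # contained in the first two sequences only
```

A candidate would be kept as a frequent pattern when the returned value is at least the expert-given threshold.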
Several algorithms for mining frequent patterns have been proposed, the first and best known among them being AprioriAll and AprioriSome [5]. The goal of the other algorithms was to decrease the computational cost of mining [18-20]. An overview of sequential pattern mining algorithms is available in [21].
Qualitative estimation of nocturnal glycaemia
In the hospital for diabetic children considered in this study, the measurements of BGL are made at 0:00, 3:00, 5:00 and 7:00 A.M. After numerous experiments, it became clear that the application of the arithmetic mean of the four numerical measurements [17] was not fully correct and led to decreased support for the patterns that were discovered from the sequences stored as historical data. For that reason, we proposed a new approach.
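One way to see the limitation of the arithmetic mean is to apply the thresholds of the glycaemia discretisation table to the first night of patient 425 in the exemplary clinical data. The sketch below does not implement the proposed qualitative evaluation; it only compares the class of the mean with the classes of the individual readings. The function name `discretize_bgl` is hypothetical.

```python
from statistics import mean


def discretize_bgl(bgl_mg_dl: float) -> int:
    """Map a blood glucose level to the discrete values 1-4
    using the thresholds of the glycaemia discretisation table."""
    if bgl_mg_dl < 70:
        return 1  # hypoglycemia
    if bgl_mg_dl < 140:
        return 2  # normoglycemia
    if bgl_mg_dl <= 200:
        return 3  # mild hyperglycemia
    return 4      # hyperglycemia


# Patient 425, night of 2008-11-27 (readings at 0:00, 3:00, 5:00, 7:00).
night = [109, 62, 60, 71]
night_mean = mean(night)
mean_class = discretize_bgl(night_mean)
per_reading = [discretize_bgl(v) for v in night]

print(night_mean, mean_class, per_reading)
```

The mean of 75.5 mg/dl falls in the normoglycemia range even though two of the four readings are below 70 mg/dl, which suggests why a single averaged value can hide a nocturnal hypoglycemia.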
Table 3 Exemplary clinical data after the qualitative evaluation of nocturnal glycaemia
| Patient | Date | Time | Evaluation | Description |
|---|---|---|---|---|
| 425 | 2008-11-26 | 22 | 3 | Basal insulin |
| 425 | 2008-11-27 | Night | Hypoglycemia | Glycaemia |
| 425 | 2008-11-27 | 22 | 3 | Basal insulin |
| 425 | 2008-11-28 | Night | Normoglycemia | Glycaemia |
| 425 | 2008-11-28 | 22 | 3 | Basal insulin |
Differential sequential patterns
Let us define an ordered pair \(s_{i}=\langle z_{i},c_{i}\rangle\), where \(z_{i}\) denotes the value of the basal insulin dosage and \(c_{i}\) is the value of nocturnal BGL. In this way, every sequence \(s^{j}=\langle s_{1},s_{2},\ldots,s_{n}\rangle\) is related to the basal-insulin therapy of the j-th patient. Every itemset within such a sequence is, in fact, an ordered pair of events related to a single day of therapy. The goal is to discover frequent patterns \(p \subset s^{j}, j \in [1,n]\) that with high probability will recur for a newly admitted patient, i.e., \(p \subset s^{n+1}\).
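To represent the therapy of a patient, we define a differential sequence as \(S=\langle s_{1},s_{2},\ldots,s_{n}\rangle\), where \(s_{i}=\langle d_{i},g_{i}\rangle\). For example, the sequence \(S=\langle\langle 3,3\rangle,\langle 3,2\rangle,\langle 3,1\rangle,\langle 2,2\rangle\rangle\) is converted to the differential sequence \(D=\langle\langle a,b\rangle,\langle 0,-\rangle,\langle 0,-\rangle,\langle -,+\rangle\rangle\) or, more briefly when omitting the inner brackets, \(D=\langle a,b,d0,g-,d0,g-,d-,g+\rangle\). Note that d stands for insulin dosage and g for glycaemia. In this example, the first pair \(\langle a,b\rangle\) stands for the initial values, and each later symbol records whether the value decreased (−), stayed the same (0) or increased (+) relative to the previous day.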
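The conversion in the example above can be reproduced with a small sketch. The function names are hypothetical, and the code assumes that each symbol is simply the sign of the day-to-day difference, as the example suggests; it writes the minus sign as an ASCII hyphen.

```python
from typing import List, Sequence, Tuple


def sign_symbol(delta: float) -> str:
    """Return '+', '-' or '0' for the direction of a change."""
    if delta > 0:
        return "+"
    if delta < 0:
        return "-"
    return "0"


def to_differential(days: Sequence[Tuple[int, int]]) -> List[str]:
    """Convert (dosage, glycaemia) pairs into a flattened differential sequence.

    The first day is represented by the placeholders 'a' and 'b';
    every later day contributes one 'd' token and one 'g' token.
    """
    if not days:
        return []
    tokens = ["a", "b"]
    for (z_prev, c_prev), (z, c) in zip(days, days[1:]):
        tokens.append("d" + sign_symbol(z - z_prev))
        tokens.append("g" + sign_symbol(c - c_prev))
    return tokens


example = to_differential([(3, 3), (3, 2), (3, 1), (2, 2)])
print(example)
```

For the sequence used in the text, the result is the token list a, b, d0, g-, d0, g-, d-, g+.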
The standard support measure given by formula (1) is used for the evaluation of DSP.
Discovering DSP from historical data
In this section, an algorithm for mining DSPs is proposed. The goal of the mining algorithm is to determine the set of differential patterns with the given support level. The input data for the algorithm are the set of differential sequences \(S\), prepared by preprocessing the clinical data gathered and interpreted according to the description given in the section Qualitative estimation of nocturnal glycaemia.
First, the initial pattern candidate \(a_{k}\) is initiated with the itemset \(\langle d_{0},g_{0}\rangle\) from the first sequence from the set \(S\). The support of the candidate pattern for each sequence in the set \(S\) is evaluated. Then, the pattern is extended with the following itemset \(\langle d_{j},g_{j}\rangle\) of the considered sequence, creating a new candidate pattern. The support of the new candidate pattern is evaluated, and the algorithm repeats. The loop ends if the support of the candidate pattern is less than the given level \(psup_{min}\) or if there are no more itemsets in the sequence. The patterns with support not below \(psup_{min}\) and having the required length (here ≥6) are stored. The steps described above are repeated for each unique sequence from \(S\).
The algorithm presented above is based on the AprioriAll [5] sequential pattern discovery algorithm. Many variations of this algorithm have been implemented for different purposes with the aim of reducing the time and memory complexity [22-24]. The algorithm above has been implemented by the authors with a worst-case time complexity not greater than \(O(n \cdot k^{2})\), where \(n\) is the number of sequences and \(k\) is the average length of the sequences.
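The loop described above can be sketched as follows. This is not the authors' implementation; it rests on several assumptions that the text leaves open: sequences are flattened token lists such as a, b, d0, g-, candidates grow one token at a time from the start of each unique sequence, support is the fraction of sequences containing the candidate as an ordered subsequence, and a candidate is kept when its support is at least the threshold. The names `mine_dsps`, `psup_min` and `min_len` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the tokens of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    return all(token in remaining for token in pattern)


def support(pattern: Sequence[str], sequences: List[Sequence[str]]) -> float:
    """Assumed definition: share of sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def mine_dsps(sequences: List[Sequence[str]], psup_min: float,
              min_len: int) -> Dict[Tuple[str, ...], float]:
    """Grow candidates along each unique sequence until support drops."""
    patterns: Dict[Tuple[str, ...], float] = {}
    for seq in dict.fromkeys(tuple(s) for s in sequences):  # unique sequences
        for end in range(2, len(seq) + 1):  # start from the first pair
            candidate = seq[:end]
            sup = support(candidate, sequences)
            if sup < psup_min:
                break  # longer candidates cannot have higher support
            if len(candidate) >= min_len:
                patterns[candidate] = sup
    return patterns


example_db = [
    ["a", "b", "d0", "g0", "d0"],
    ["a", "b", "d0", "g0", "d0"],
    ["a", "b", "d0", "g-", "d-"],
    ["a", "b", "d0", "g+", "d0"],
]
print(mine_dsps(example_db, psup_min=0.5, min_len=5))
```

Each candidate is checked against every sequence, so the cost of this sketch grows with the number of sequences and with the square of their length.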
Decision support
The decision support process consists of the following steps:
 1. The patient is admitted to the hospital and cared for by the physician during a certain initial period (1-3 days).
 2. The clinical data of the patient gathered during that initial period are converted into the form of a differential sequence \(a^{j}\) (as described in this section).
 3. The computer program retrieves a subset of the patterns \(S \subseteq P\) from the set \(P\) of all available DSPs such that \(a^{j}\) is supported by each sequence from \(S\), i.e., \(S=\{s: a^{j} \subseteq s \wedge s \in P \}\). This set is called a set of treatment support patterns (TSP).
 4. The selected patterns serve as the medical guidelines for the physician, supporting her or his decision regarding the following therapy.
After the patient’s hospitalization ends, all of the gathered data are added in the form of differential sequences to the set of all sequences and the algorithm for mining DSPs is run again.
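The retrieval of treatment support patterns in step 3 can be sketched as a filter over the mined DSPs. The sketch assumes flattened token sequences and ordered-subsequence containment; the function name `treatment_support_patterns` and the ranking by support are illustrative choices, not part of the described procedure. The supports are taken from the results reported for the whole data set.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if the tokens of short occur in long in the same order."""
    remaining = iter(long)
    return all(token in remaining for token in short)


def treatment_support_patterns(patient_seq: Sequence[str],
                               dsps: Dict[Pattern, float]
                               ) -> List[Tuple[Pattern, float]]:
    """Return the DSPs that contain the patient's differential sequence,
    ordered by decreasing support (the ordering is an assumption)."""
    matches = [(p, sup) for p, sup in dsps.items()
               if is_subsequence(patient_seq, p)]
    return sorted(matches, key=lambda m: m[1], reverse=True)


dsps = {
    ("a", "b", "d0", "g0", "d0"): 0.39,
    ("a", "b", "d0", "g-", "d-"): 0.24,
    ("a", "b", "d0", "g+", "d0"): 0.23,
    ("a", "b", "d0", "g-", "d-", "g+"): 0.11,
}
# A patient whose glycaemia dropped after the first unchanged dose.
tsp = treatment_support_patterns(["a", "b", "d0", "g-"], dsps)
print(tsp)
```

In this example both returned patterns continue with a reduced dose, which is the kind of guideline the physician would see.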
Results of experiments
The objective of the following experiments was to validate the theoretical approach proposed in this paper. We used the data of 102 children with onset of type 1 diabetes collected by the Department of Pediatrics, Endocrinology and Diabetology at the Silesian Medical University in Katowice, Poland. First, the data were processed, i.e., the discretization of BGL and insulin values was performed as described in the Medical background section. Afterwards, the qualitative evaluation of nocturnal glycaemia was made.
Table 4 The set of DSPs mined from the entire set of data
| DSP | Supp |
|---|---|
| 〈a,b,d0,g0,d0〉 | 0.39 |
| 〈a,b,d0,g−,d−〉 | 0.24 |
| 〈a,b,d0,g+,d0〉 | 0.23 |
| 〈a,b,d0,g0,d0,g0〉 | 0.16 |
| 〈a,b,d0,g0,d0,g−〉 | 0.15 |
| 〈a,b,d0,g0,d0,g0,d0〉 | 0.13 |
| 〈a,b,d0,g0,d0,g−,d0〉 | 0.13 |
| 〈a,b,d0,g−,d0,g0〉 | 0.13 |
| 〈a,b,d0,g+,d0,g0〉 | 0.13 |
| 〈a,b,d0,g−,d−,g+〉 | 0.11 |
| 〈a,b,d0,g−,d0,g+〉 | 0.11 |
Table 4 shows that the support of individual DSPs reached quite high values. The first three patterns refer to alternative versions of therapy for the initial three steps; their cumulative support, 0.39 + 0.24 + 0.23 = 0.86, was evaluated as high. This means that, by applying only the three most general DSPs, it is possible to propose the therapy for 86% of the patients.
Table 5 Differential patterns, the results of 5-fold cross-validation
| 1st trial | Supp | 2nd trial | Supp |
|---|---|---|---|
| 〈a,b,d0,g0,d0〉 | 0.4 | 〈a,b,d0,g0,d0〉 | 0.5 |
| 〈a,b,d0,g−,d−〉 | 0.3 | 〈a,b,d0,g+,d0〉 | 0.3 |
| 〈a,b,d0,g0,d0,g0,d0〉 | 0.2 | 〈a,b,d0,g+,d0,g0〉 | 0.25 |
| 〈a,b,d0,g0,d0,g−〉 | 0.2 | 〈a,b,d0,g−,d−〉 | 0.2 |
| 〈a,b,d0,g0,d0,g−,d0〉 | 0.2 | 〈a,b,d0,g0,d0,g−〉 | 0.15 |
| 〈a,b,d0,g0,d0,g0〉 | 0.2 | 〈a,b,d0,g−,d−,g0〉 | 0.15 |
| 〈a,b,d0,g+,d0〉 | 0.15 | | |
| 〈a,b,d−,g−,d+〉 | 0.15 | | |
Table 6 Differential sequential patterns mined from every learning trial
| 3rd trial | Supp | 4th trial | Supp | 5th trial | Supp |
|---|---|---|---|---|---|
| 〈a,b,d0,g0,d0〉 | 0.35 | 〈a,b,d0,g0,d0〉 | 0.35 | 〈a,b,d0,g0,d0〉 | 0.4 |
| 〈a,b,d0,g+,d0〉 | 0.25 | 〈a,b,d0,g+,d0〉 | 0.15 | 〈a,b,d0,g+,d0〉 | 0.3 |
| 〈a,b,d0,g0,d0,g−〉 | 0.15 | 〈a,b,d0,g−,d−,g+,d0〉 | 0.15 | 〈a,b,d0,g+,d0,g0〉 | 0.2 |
| 〈a,b,d0,g+,d0,g0〉 | 0.15 | 〈a,b,d0,g−,d−,g+〉 | 0.15 | 〈a,b,d0,g0,d0,g−,d0〉 | 0.15 |
| 〈a,b,d0,g+,d0,g0,d0〉 | 0.15 | 〈a,b,d0,g0,d0,g0〉 | 0.15 | 〈a,b,d0,g0,d0,g−〉 | 0.15 |
As in the previous experiment, the support that was obtained was evaluated as high. Taking into account the patterns with a support level greater than or equal to 0.2 and with a length of 2.5 days or more, a non-empty set of treatment support patterns was deduced in 89% of test examples in each trial. The mined patterns can be used effectively to support the therapy of newly admitted patients.

The mined DSPs support the following observations:
- The correction of the initially administered insulin dosage is often not needed, e.g., the DSP 〈d0,g0,d0,g0,d0〉 represents that fact.
- The physician does not change the dose hastily after just one day of observation, e.g., the DSPs 〈g+,d0〉 and 〈g−,d0〉, where g+ and g− denote an increase and a decrease in the glycaemia level, respectively.
- The physician tries to reduce the insulin dosage and maintain it at that level even if the body’s response is not clear, e.g., as represented by the DSP 〈d0,g−,d−〉.
- Considering the whole set of DSPs, it was apparent that the last dosage is usually the same as the previous dosage, meaning that the patient leaves the hospital with the treatment settled and verified over the previous days.
- Another interesting observation is that the body’s response can change even when the insulin dosage is not changed; see, e.g., the DSP 〈d0,g−,d0,g+〉.
Conclusions
In this paper, the treatment of patients with new-onset type 1 diabetes was analyzed. It was shown that the proposed differential sequential patterns for setting the basal insulin dosage can serve as guidelines for physicians during the day-by-day therapy. Also, by having these patterns, the physician has the opportunity to foresee the possible consequences of the prescribed therapy. Differential sequences can help the physician to decide on the proper therapy faster. The usability of the proposed differential sequential patterns and the proposed mining algorithm was verified experimentally using real medical data. The results that were obtained provide evidence of the usefulness of the proposed approach.
Declarations
Acknowledgements
The authors would like to thank their colleagues Ms. Ossysek and Mr. Chumiecki for their help in collecting the data.
References
 1. WHO. Fact Sheet No. 312. http://www.who.int/mediacentre/factsheets/fs312/en/.
 2. Toussi M, Lamy JB, Toumelin PL, Venot A. Using data mining techniques to explore physicians’ therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Med Inform Decis Mak. 2009; 9(1):28.
 3. ADA. American Diabetes Association. Standards of medical care in diabetes - 2012. Diabetes Care. 2012; 35:11–63. doi:10.2337/dc12-s011.
 4. Shalitin S, Phillip M. Closing the loop: combining insulin pumps and glucose sensors in children with type 1 diabetes mellitus. Pediatr Diabetes. 2006; 7:45–9.
 5. Agrawal R, Srikant R. Mining sequential patterns. In: Yu PS, Chen ALP, editors. ICDE, Proceedings of the Eleventh International Conference on Data Engineering, March 6-10. Taipei, Taiwan: IEEE Computer Society: 1995. p. 3–14.
 6. Bangstad HJ, Danne T, Deeb L, Jarosz-Chobot P, Urakami T, Hanas R. ISPAD clinical practice consensus guidelines. Insulin treatment in children and adolescents with diabetes. Pediatr Diabetes. 2009; 12:92–9.
 7. Rewers M, Pihoker C, Donaghue K, Hanas R, Swift P, Klingensmith G. ISPAD clinical practice consensus guidelines. Assessment and monitoring of glycemic control in children and adolescents with diabetes. Pediatr Diabetes. 2009; 12:71–81.
 8. Groot P. Experiences in Quality Checking Medical Guidelines using Formal Methods. In: Proceedings verification and validation of software systems (VVSS 2007). The Netherlands: Eindhoven: Technische Universiteit Eindhoven: 2007. p. 164–78.
 9. Koper A, Nguyen H. Sequential pattern mining from stream data. In: Tang J, King I, Chen L, Wang J, editors. Advanced Data Mining and Applications. Lecture Notes in Computer Science, vol. 7121. Berlin Heidelberg: Springer: 2011. p. 278–91. doi:10.1007/978-3-642-25856-5_21.
 10. Yuliana OY, Rostianingsih S, Budhi GS. Discovering sequential disease patterns in medical databases using freespan mining approach. In: International conference on advance computer science and information system. Universitas Indonesia: Faculty of Computer Science: 2009.
 11. Alonso F, López-Illescas Á, Martínez L, Montes C, Valente JP. Knowledge discovery using medical data mining. In: Medical data analysis. Springer: 2002. p. 1–12.
 12. Kléma J, Nováková L, Karel F, Stepankova O, Zelezny F. Sequential data mining: A comparative case study in development of atherosclerosis risk factors. Syst Man Cybernet Part C: Appl Rev IEEE Trans. 2008; 38(1):3–15.
 13. Concaro S, Sacchi L, Bellazzi R. Temporal data mining methods for the analysis of the AHRQ archives. Proc Am Med Inform Assoc 2007 Annu Symp. 2007.
 14. Baralis E, Bruno G, Chiusano S, Domenici VC, Mahoto NA, Petrigni C. Analysis of medical pathways by means of frequent closed sequences. In: Setchi R, Jordanov I, Howlett RJ, Jain LC, editors. KES (3), Lecture notes in computer science. vol. 6278. Berlin Heidelberg: Springer: 2010. p. 418–25.
 15. Srikant R, Agrawal R. Mining sequential patterns: Generalizations and performance improvements. In: Apers PMG, Bouzeghoub M, Gardarin G, editors. EDBT, Lecture notes in computer science. vol. 1057. Berlin Heidelberg: Springer: 1996. p. 3–17.
 16. Rahaman SB, Shashi M. Sequential mining equips e-health with knowledge for managing diabetes. Int J Inf Process Manag. 2011; 2(3):65–71.
 17. Froelich W, Deja R, Deja G. Mining therapeutic patterns from clinical data for juvenile diabetes. Fundamenta Informaticae. 2013; 127(1):513–28.
 18. Zaki MJ. Efficient enumeration of frequent sequences. In: 7th Intl. conf. on information and knowledge management. New York, NY, USA: ACM: 1998. p. 68–75.
 19. Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC. FreeSpan: frequent pattern-projected sequential pattern mining. New York, NY, USA: ACM; 2000. p. 355–59.
 20. Cheng H, Yan X, Han J. IncSpan: incremental mining of sequential patterns in large database. In: KDD ’04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM: 2004. p. 527–32.
 21. Han J, Cheng H, Xin D, Yan X. Frequent pattern mining: current status and future directions. Data Min Knowl Discov. 2007; 15(1):55–86.
 22. Cavique L. A network algorithm to discover sequential patterns. In: Progress in artificial intelligence. Berlin Heidelberg: Springer: 2007. p. 406–14.
 23. He D. Using suffix tree to discover complex repetitive patterns in DNA sequences. In: Conf Proc IEEE Eng Med Biol Soc. vol. 1. IEEE: 2006. p. 3474–7.
 24. Shakya S, Singh A, Singh D. A time efficient algorithm for web log analysis. Int J Comput Appl. 2013; 75(9):23–30.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.