Development of a brain MRI-based hidden Markov model for dementia recognition
BioMedical Engineering OnLine volume 12, Article number: S2 (2013)
Abstract
Background
Dementia is an age-related cognitive decline which is indicated by an early degeneration of cortical and subcortical structures. Characterizing those morphological changes can help to understand the disease development and contribute to early prediction and prevention of the disease. However, building a model that best captures brain structural variability and is valid for both disease classification and interpretation is extremely challenging. The current study aimed to establish a computational approach for modeling the magnetic resonance imaging (MRI)-based structural complexity of the brain using the framework of hidden Markov models (HMMs) for dementia recognition.
Methods
Regularity dimension and semivariogram were used to extract structural features of the brains, and a vector quantization (VQ) method was applied to convert the extracted feature vectors to prototype vectors. The output VQ indices were then utilized to estimate parameters for HMMs. To validate accuracy and robustness, experiments were carried out on individuals who were characterized as nondemented or as having mild Alzheimer's disease. Four HMMs were constructed based on the cohorts of nondemented young, middle-aged, elder and demented elder subjects separately. Classification was carried out using a data set including both nondemented and demented individuals with a wide age range.
Results
The proposed HMMs succeeded in recognizing individuals who have mild Alzheimer's disease and achieved better classification accuracy than other related works using different classifiers. The results show the ability of the proposed modeling to recognize early dementia.
Conclusion
The findings from this research will allow individual classification to support the early diagnosis and prediction of dementia. The brain MRI-based HMMs developed in this research are expected to be efficient and robust, and can be easily used by clinicians as a computer-aided tool for validating imaging biomarkers for early prediction of dementia.
Introduction
Dementia is an age-related neurodegenerative disorder, but its cause is still essentially unknown. Alzheimer's disease (AD), the most common form of dementia, is characterized by loss of neurons and synapses in the cerebral cortex and certain subcortical regions. The current clinical diagnosis of AD is still based on clinical observation and on neurological and neuropsychological testing. Advanced medical imaging techniques such as magnetic resonance imaging (MRI), positron emission tomography (PET), and single photon emission computed tomography (SPECT) have shown promise as noninvasive diagnostic indicators for AD that may lead to the proposal of new diagnostic criteria [1, 2]. In particular, volumetric MRI is less expensive than other imaging methods, and related studies have documented reductions in the size of specific brain regions in people with dementia as they progressed from mild cognitive impairment to severe AD [3, 4]. However, it is still challenging to apply fully automated MRI analytic methods to identify potential AD neuroimaging biomarkers.
Studies involving brain MRI scans have tried to extract the most informative features for diagnosing dementia. Regions-of-interest (ROI) analysis mainly focuses on brain structural changes in specific anatomical regions such as the hippocampal [5, 6], entorhinal [7, 8], frontal [9–11], temporal [6, 12], and parietal cortex [6, 13] during disease progression. Such analyses provide valuable information on histopathological changes but suffer from limitations such as expert dependency, restriction to a limited region, and time consumption. Analyses of multiple regions or of the whole brain give more accurate dementia prediction since they extract typical features from the whole brain and are able to capture early signs of cognitive impairment before the onset of dementia. These studies include measures of voxel-based differences in cortical volume [10, 14], density [12], and thickness [13, 15]. Cortex architecture such as sulcal folds and irregularity is another important aspect worth studying, since sulcal folds are the principal surface landmarks of the human cerebral cortex and exhibit structurally complex patterns [16] that are postulated to reflect underlying connectivity [17]. However, very few studies have reported changes of cortical architecture in dementia. In our previous study, we quantified cortical structure complexity using an entropy method [18] and found a significantly higher global cortical structure complexity in AD subjects compared to cognitively normal subjects. This increase was also found to accompany aging. These features have the potential to serve as sensitive surrogate markers and are capable of quantifying the extent of brain degeneration in dementia. However, building a model that best captures brain structural variability and is valid for both disease classification and interpretation is extremely challenging.
For dementia diagnosis, multivariate classification techniques have been proposed in the literature, such as support vector machines (SVM) [19, 20], artificial neural networks (ANN) [21], and decision trees (Trees) [22]. The hidden Markov model (HMM) is a popular doubly stochastic model which is able to extract the underlying statistics using a compact set of features [23]. Compared to other conventional classifiers, the HMM is especially suitable for modeling sequence data. It has been applied in white matter hyperintensities quantification [24], spatio-temporal analysis of brain MR images [25] and age prediction [26]. In the current study, we aimed to establish a computational approach for modeling the MRI-based structural complexity of the brain using the framework of HMMs for dementia recognition. Regularity dimension and semivariogram were used to extract structural features of the brains. Regularity dimension is based on sample entropy, which quantifies the system complexity. The semivariogram can estimate the spatial distribution of the system. A vector quantization (VQ) method was applied to convert the large set of extracted feature data to a small set of prototype data. The output VQ indices were then utilized to estimate parameters for HMMs. Classification was carried out using a data set including both nondemented and demented individuals with a wide age range.
Materials and Methods
System architecture
Two fundamental steps are required to design the classifier. The first is to extract features that can best characterize and discriminate between different groups of subjects, and the second is to select an appropriate classifier paradigm. Figure 1 shows the architecture of the proposed classification system. In the current study, two features were extracted using regularity dimension and semivariogram separately from the time series generated from each MRI slice. Two feature sequences were then obtained from the time series of subsequent MRI slices over the whole brain. VQ allows each feature sequence to be represented as an index sequence whose entries can be defined as states or observable symbols in HMM construction. Evaluation of the classifier includes training and testing stages. We used n-fold and leave-one-out cross-validation in model testing.
During the training stage, feature vectors were obtained from all training data from two different groups. The model parameters were estimated and optimized by using state and observation sequences generated from VQ. Therefore, each group will have one representative HMM (Figure 1(a)).
Two approaches were applied in HMM testing. The first is to match the observed symbol sequence of the test data against the two group HMMs previously obtained in the training stage and to compute the probability of the sequence being generated from each model separately. The test data, which represent one subject, are assigned to the group that attains the higher probability (Figure 1(b)). The second approach is to construct an HMM for each test subject and compare it with each group HMM separately using the Kullback-Leibler divergence (KLD). The test data are assigned to the group with the smaller KLD, which indicates greater similarity between the two models (Figure 1(c)).
Subjects
All subjects were drawn from the Open Access Series of Imaging Studies (OASIS) database (http://www.oasis-brains.org/) [27]. From the database, we selected a data set consisting of a cross-sectional collection of young, middle-aged, nondemented elder and AD elder adults (Table 1). The subjects were all right-handed and included both men and women. Among the subjects, 75 had been clinically diagnosed with very mild to mild AD (Clinical Dementia Rating (CDR) score of 0.5 or 1); the others had a CDR score of 0. All studies were approved by the Institutional Review Board (IRB) of Washington University. Informed consent was obtained from all subjects at the time of study participation.
MRI acquisition and data preprocessing
All images were corrected for inhomogeneity prior to further segmentation. For each subject, a grey matter (GM)/white matter (WM)/cerebrospinal fluid (CSF) segmented image [28], in which each voxel has been labeled as GM, WM, or CSF, is provided. All images are in 16-bit Analyze 7.5 format and are normalized to 176 × 208 × 176 voxels. Additional details of the image characteristics can be found at http://www.oasis-brains.org/ [27].
Time series extraction
In order to generate 1D signals that allow the application of regularity dimension, the surface structure of the GM should be represented by time series. For each MRI slice, the time series represents the distances measured from subsequent outer boundary points to the GM center of mass. In semivariogram analysis, the spatial location of each point on the outer boundary was also included.
For each MRI slice, the center of the GM was detected by using the Matlab function "regionprops", which calculates the centroids of the image regions labeled in a two-dimensional array of nonnegative integers that represent contiguous regions. Each centroid is a 1-by-ndims vector that specifies the center of mass of the region. The boundaries of the GM in each 2D scan were located by using the Matlab function "bwtraceboundary", which traces the outline of an object in a binary image. Both functions are available in the Matlab Image Processing Toolbox. The distances from consecutive points on the outer boundary to the center of the GM are then calculated. The distances are measured within each single MR slice (in a 2D plane); one MR slice yields one time series. The whole cortical surface structure is then represented by 130–140 time series extracted from the corresponding MR image slices. For each individual, a sequence consisting of 130–140 time series generated from the whole-brain MRI slices can be obtained. The order of the sequences is the same for all individuals, from top to bottom.
By depicting the distances between the boundary points and the GM center of each cortical sheet, we were able to build time series that reflect the microstructures of the cortical surface architecture, including its folding patterns and any shape changes, and thus enable complexity analysis. Figure 2 presents typical examples of brain MRI, the detected boundary contour of the GM and the corresponding generated time series for each group.
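The distance computation described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab pipeline: the helper names `centroid` and `radial_series` are hypothetical, the mask is a toy example, and the boundary is assumed to have been traced already (the role played by "bwtraceboundary" in the text).

```python
import math

def centroid(mask):
    """Center of mass (row, col) of the nonzero pixels of a 2D mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def radial_series(boundary, center):
    """Distances from consecutive boundary points to the center of mass."""
    cr, cc = center
    return [math.hypot(r - cr, c - cc) for r, c in boundary]

# Toy 3x3 filled square standing in for a segmented GM slice (assumption).
mask = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]
# Boundary points listed clockwise from the top-left corner (assumed pre-traced).
boundary = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

center = centroid(mask)
series = radial_series(boundary, center)  # one "time series" for this slice
```

Repeating this for every slice, from top to bottom, would give the per-subject sequence of time series that the later feature extraction operates on.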
Features extraction
Regularity dimension
Entropy measures provide a way to study system complexity and have shown high potential for quantifying the extent of cortical degeneration in dementia in one of our previous studies [18]. The idea of regularity dimension is based on the concept of power laws and sample entropy (SampEn) measures [29]. SampEn quantifies the conditional probability that two sequences that are similar for $m$ points (within a given tolerance $r$) remain similar when one more consecutive point is included. For a given time series $X = x_1, \ldots, x_N$, SampEn is defined as

$$\mathrm{SampEn}(m, r, N) = -\ln\left[\frac{A^{m}(r)}{B^{m}(r)}\right] \tag{1}$$

where $B^{m}(r)$ is the probability that two sequences will match for $m$ points, whereas $A^{m}(r)$ is the probability that two sequences will match for $m + 1$ points. However, since we normally have no a priori knowledge concerning the dimension of a system, it is imperative that we evaluate the method for different $m$ and $r$. The regularity dimension has recently been introduced to model the mathematical relationship governing how information about signal regularity changes across scales. It provides an approach to unify the multiple solutions that arise from the choice of the entropy parameters $m$ and $r$. It is expressed in terms of $I_r$, the SampEn denoting the information subject to $r$: the regularity dimension $D_r$ measures the rate of change of signal regularity/predictability with respect to $\log(1/r)$. It is the rate at which the entropy of a dynamical system is gained with decreasing length $r$.
In the present study, the regularity dimension $D_r$ was estimated with $r$ reduced from 1 to 0.05 in steps of 0.05. A time series was extracted from each slice of the brain MRI; thus, for each slice, we obtain a feature vector consisting of $D_r$ values with increasing $r$ (Figure 3(a)).
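As an illustration of the SampEn quantity on which the regularity dimension is built, a minimal Python sketch is given below. It computes SampEn only, not $D_r$ itself; the function name `sample_entropy`, the use of the Chebyshev distance with a "less than or equal" match criterion, and the handling of the no-match case are assumptions of this sketch rather than details given in the text.

```python
import math

def sample_entropy(x, m, r):
    """SampEn(m, r) = -ln(A / B), using the Chebyshev distance and tolerance r."""
    n = len(x)

    def count_matches(length):
        count = 0
        # The same n - m template start points are used for both lengths.
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)      # pairs matching for m points
    a = count_matches(m + 1)  # pairs still matching for m + 1 points
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found (assumption)
    return -math.log(a / b)

# Tolerance grid from 1 down to 0.05 in steps of 0.05, as used in the study.
r_grid = [round(0.05 * k, 2) for k in range(20, 0, -1)]

# A perfectly regular (alternating) series is fully predictable.
regular = [0.0, 1.0] * 5
entropy_regular = sample_entropy(regular, m=2, r=0.1)
```

In practice the tolerance is often expressed relative to the standard deviation of the series; whether that normalization was applied here is not stated in the text.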
Semivariogram
The variogram is originally a geostatistical method that describes how spatial data are related (correlated) with distance. It constructs a variogram that best estimates the autocorrelation structure of the underlying stochastic process. In the current study, since regularity dimension analysis only includes the distance values from boundary points to the gray matter center, the semivariogram was applied as a supplement to study the spatial structure of the distance series. It includes the information of both the distances and the spatial locations of the boundary points. We applied the experimental semivariogram [30], denoted as $\gamma(h)$, to analyse the spatial (autocorrelation) structure of the data using a plot of the semivariance against lag distance $h$. It is defined as half the average squared difference of values separated by $h$. The semivariogram can be calculated as

$$\gamma(h) = \frac{1}{2N(h)} \sum_{(i,j)\,:\,h_{ij} = h} (x_i - x_j)^2 \tag{3}$$

where the sum runs over all pairs of locations $(i, j)$ separated by $h$, $x_i$ and $x_j$ are the data values at spatial locations $i$ and $j$, $h$ represents the spatial distance that separates $x_i$ and $x_j$, and $N(h)$ is the total number of distinct data pairs $(x_i - x_j)$. In the current study, $x_i$ and $x_j$ are the distance values from the cortical boundary at locations $i$ and $j$ to the center of gray matter (GM) mass, respectively. Thus, for each time series, we obtain a feature vector consisting of semivariogram values $\gamma(h)$ with increasing $h$ (Figure 3(b)).
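The experimental semivariogram can be sketched in a few lines of Python. The sketch below simplifies the setting by measuring the lag $h$ in index steps along the series rather than as a spatial distance between boundary locations; that simplification, and the function name `semivariogram`, are assumptions made for illustration.

```python
def semivariogram(x, max_lag):
    """Experimental semivariance gamma(h) for integer lags h = 1..max_lag."""
    gamma = []
    for h in range(1, max_lag + 1):
        # All pairs of values separated by exactly h index steps.
        pairs = [(x[i], x[i + h]) for i in range(len(x) - h)]
        squared = sum((a - b) ** 2 for a, b in pairs)
        gamma.append(squared / (2 * len(pairs)))
    return gamma

# A linearly increasing toy series: semivariance grows with the lag.
values = semivariogram([1.0, 2.0, 3.0, 4.0], max_lag=2)
```

For the toy series the lag-1 pairs all differ by 1 and the lag-2 pairs by 2, so the semivariance rises from 0.5 to 2.0, reflecting weaker similarity at larger separations.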
Hidden Markov models (HMMs)
To study the similarities or dissimilarities between a test sequence and reference sequences, we can apply HMMs as an efficient recognition tool. An HMM is defined as a doubly stochastic process, composed of an underlying stochastic process (hidden states) that can only be observed through another stochastic process (observable symbols). Each HMM is characterized by $\lambda = (A, B, \pi)$, where $A$ is the transition probability matrix of the hidden states, $B$ denotes the emission probability matrix of the observable symbols within the hidden states, and $\pi$ is the initial probability distribution of the hidden states. To be specific, the following parameters need to be defined to construct an HMM:
$\lambda$: the HMM, $\lambda = (A, B, \pi)$
$N$: the number of states
$M$: the number of different observable symbols per state
$Q$: the state sequence, $Q = (q_1, q_2, \ldots, q_T)$, where $T$ is the length of the state sequence
$O$: the observation sequence, $O = (o_1, o_2, \ldots, o_T)$, where $T$ is the number of observations
$A$: $A = \{a_{ij}\}$, where $a_{ij} = P(q_{t+1} = j \mid q_t = i)$, $1 \le i, j \le N$, is the probability of state $i$ transferring to state $j$
$B$: $B = \{b_j(k)\}$, where $b_j(k) = P(o_t = k \mid q_t = j)$, $1 \le j \le N$, $1 \le k \le M$, is the probability of the $k$-th symbol being emitted in state $j$
$\pi$: $\pi = \{\pi_i\}$, where $\pi_i = P(q_1 = i)$, $1 \le i \le N$, is the initial probability of state $i$
In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible (observable symbols). We demonstrated the capacity of regularity dimension as a sensitive indicator of the extent of cortical degeneration in [18], so we define regularity dimension as the hidden state. The semivariogram describes the spatial structure of the time series. Since the spatial location of each point on the cortical outer boundary is observable, the semivariogram is taken as the observable symbol. The transition probability distribution of the states can then be estimated from the state sequence, and the emission probability distribution can be estimated from the semivariogram symbols that are emitted at the regularity dimension states. The initial state distribution is assumed to be uniform (0.5 for each of the two states).
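Because both index sequences are available during training, an initial estimate of $A$ and $B$ can be obtained by simple counting. The following Python sketch shows one way to do this; the function name `estimate_hmm`, the uniform fallback for unvisited states, and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
def estimate_hmm(states, symbols, n_states, n_symbols):
    """Relative-frequency estimates of A (transitions) and B (emissions)."""
    a = [[0.0] * n_states for _ in range(n_states)]
    b = [[0.0] * n_symbols for _ in range(n_states)]
    for s, s_next in zip(states, states[1:]):
        a[s][s_next] += 1          # count transitions s -> s_next
    for s, o in zip(states, symbols):
        b[s][o] += 1               # count symbol o emitted in state s

    def normalize(rows):
        out = []
        for row in rows:
            total = sum(row)
            # Fall back to a uniform row if a state was never visited.
            out.append([v / total for v in row] if total
                       else [1.0 / len(row)] * len(row))
        return out

    return normalize(a), normalize(b)

# Toy VQ index sequences: 2 states, 3 symbols (hypothetical values).
states = [0, 0, 1, 1, 0]
symbols = [2, 2, 0, 1, 2]
A, B = estimate_hmm(states, symbols, n_states=2, n_symbols=3)
pi = [0.5, 0.5]  # uniform initial distribution, as assumed in the text
```

With one state sequence per subject, the counts would be accumulated over all training subjects of a group before normalizing.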
Features coding
Since the number of feature vectors is too large for hidden Markov modelling (around 20,000 in two groups, 130–140 per individual), a VQ method [31] is required to map the set of vectors onto a smaller, finite set of prototype vectors for the HMM. The output VQ indices were then used as states or observable symbols in the HMM. VQ is an efficient data compression technique which can greatly reduce storage and increase computational efficiency without losing too much information. The VQ process includes two steps [32]: designing a representative codebook that minimizes the expected distortion, and assigning a label (index) from the codebook to each feature vector of the input data. The most commonly used method for generating a codebook is the Linde-Buzo-Gray (LBG) algorithm [33]. In general, for a given training set (regularity dimension or semivariogram features in this study) and codebook size $J$, LBG repeatedly splits the training data into two cells until the desired codebook size is reached. During each splitting step, the centroid of each cell is searched for and updated until the average distortion $D$ is minimized. $D$ is defined by

$$D = \frac{1}{TK} \sum_{t=1}^{T} \lVert y_t - Q(y_t) \rVert^{2} \tag{4}$$

For a training set $\{y_1, y_2, \ldots, y_T\}$, where $y_t = (y_{t1}, y_{t2}, \ldots, y_{tK})$ is a $K$-dimensional feature vector, $t = 1, 2, \ldots, T$, we aimed to find a codebook $C = \{c_1, c_2, \ldots, c_J\}$ and a partition of the space, $V = \{R_1, R_2, \ldots, R_J\}$, where $R_j$ is the encoding region associated with code vector $c_j$, which minimize $D$. Each source vector $y_t$ is then assigned to its nearest-neighbor encoding region $R_j$, denoted by $Q(y_t) = c_j$, and labeled by the index of the code vector. Only the indices are passed on instead of the vectors. Figure 4 shows the flowchart of the LBG algorithm.
In the current study, the training vectors are assembled from the two groups to be compared, in order to keep the coding consistent from subject to subject. For state construction using regularity dimension, we set the codebook size to 2. For symbol construction using semivariogram, we varied the codebook size from 4 to 256 in order to find an "optimal" size. Figure 5 presents the classification rate of AD elders vs. normal elders using a fixed VQ codebook size of 2 for the states and varied VQ codebook sizes for the observable symbols. From Figure 5, it can be observed that both accuracy and sensitivity improve as the size increases from 4 to 32. The accuracy as well as the sensitivity and specificity begin to decrease as the size rises to 128. The optimal size for the symbols was therefore set to 32, because it provides the best trade-off between effective analysis and efficient computation.
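The splitting procedure can be sketched as follows. This is a minimal pure-Python illustration under several assumptions: the function name `lbg`, the additive perturbation `eps`, the stopping tolerance `tol`, and the toy data are all hypothetical, and the codebook size is assumed to be a power of two (as with the sizes 2 and 32 used in the study).

```python
def lbg(vectors, size, eps=0.01, tol=1e-9):
    """Grow a codebook by repeated splitting until it holds `size` code vectors."""
    dim = len(vectors[0])

    def mean(cell):
        return [sum(v[k] for v in cell) / len(cell) for k in range(dim)]

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def nearest(v):
        return min(range(len(codebook)), key=lambda j: dist2(v, codebook[j]))

    codebook = [mean(vectors)]  # start from the global centroid
    while len(codebook) < size:
        # Split every code vector into a slightly perturbed pair.
        codebook = [[c + s for c in cv] for cv in codebook for s in (eps, -eps)]
        previous = float("inf")
        while True:
            labels = [nearest(v) for v in vectors]
            for j in range(len(codebook)):
                cell = [v for v, lab in zip(vectors, labels) if lab == j]
                if cell:  # an empty cell keeps its old code vector
                    codebook[j] = mean(cell)
            # Average distortion per vector component.
            distortion = sum(dist2(v, codebook[lab])
                             for v, lab in zip(vectors, labels)) / (len(vectors) * dim)
            if previous - distortion <= tol:
                break
            previous = distortion
    return codebook, [nearest(v) for v in vectors]

# Two well-separated 1D clusters; a codebook of size 2 should recover them.
codebook, labels = lbg([[0.0], [0.2], [10.0], [10.2]], size=2)
```

The returned labels are the VQ indices that would serve as states or observable symbols in the HMM.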
Similarity comparisons
For a test observation sequence $O$, representing a single individual, and a reference HMM $\lambda$, we can compute $P(O \mid \lambda)$, the probability of the observation sequence $O$ given the model $\lambda$:

$$P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda) \tag{5}$$

Evaluating $P(O \mid \lambda)$ directly requires time exponential in the number of observations $T$. The forward algorithm [34] is a more efficient procedure which reduces the complexity of the calculation from $2TN^{T}$ to $N^{2}T$.
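The forward recursion and the likelihood-based decision rule can be illustrated with a short Python sketch. The function names `forward_probability` and `classify`, the group labels, and the model parameters are hypothetical values chosen for illustration.

```python
def forward_probability(obs, A, B, pi):
    """P(O | lambda) via the forward recursion, on the order of N^2 T operations."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final forward variables.
    return sum(alpha)

def classify(obs, models):
    """Assign the sequence to the group whose HMM gives the higher probability."""
    return max(models, key=lambda name: forward_probability(obs, *models[name]))

# Two toy group models (hypothetical parameters, 2 states and 2 symbols).
A = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.5, 0.5]
models = {
    "group_a": (A, [[0.9, 0.1], [0.2, 0.8]], pi),
    "group_b": (A, [[0.1, 0.9], [0.8, 0.2]], pi),
}
p = forward_probability([0, 1], *models["group_a"])
label = classify([0, 0], models)
```

For sequences as long as those in this study, the raw probabilities become very small, so an implementation would typically work with scaled forward variables or log-probabilities.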
$P(O \mid \lambda)$ can be maximized by using the Baum-Welch algorithm [23], which is also referred to as the forward-backward algorithm. The algorithm iteratively uses $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ instead of $\lambda = (A, B, \pi)$ to repeat the re-estimation process. $P(O \mid \lambda)$ improves until some limiting point is reached. The final estimate is called the maximum likelihood estimate of the HMM. The re-estimated model $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ is better than or equal to the previous model, so that $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$, as desired.
Given a test HMM $\lambda_1 = (A_1, B_1, \pi_1)$ and a reference HMM $\lambda_2 = (A_2, B_2, \pi_2)$, we can compare the similarity/dissimilarity between the two models by using a well-known proximity measure, the Kullback-Leibler divergence (KLD) [35]. The KLD between the two HMMs $\lambda_1$ and $\lambda_2$ is estimated by $D_s$, the symmetrized version of the approximate KLD of $\lambda_1$ and $\lambda_2$, which combines the two directed divergences $D(\lambda_1, \lambda_2)$ and $D(\lambda_2, \lambda_1)$.
Here $D(\lambda_1, \lambda_2)$ is the empirical KLD between $\lambda_1$ and $\lambda_2$, which was originally introduced by Juang and Rabiner [32] using Monte Carlo simulations. The models are assumed to be ergodic, having arbitrary observation probability distributions, and the dissimilarity is defined as the mean divergence of the observation sample. This approximate KLD is computed from the log-likelihoods, under $\lambda_1$ and under $\lambda_2$, of $O_{\lambda_2} = (o_1, o_2, \ldots, o_{T_2})$, a sequence of observations generated by model $\lambda_2$, normalized by $T_2$, the length of the sequence $O_{\lambda_2}$. It measures how well model $\lambda_1$ matches observations generated by model $\lambda_2$, relative to how well model $\lambda_2$ matches observations generated by itself.
To be symmetric, $D(\lambda_2, \lambda_1)$ is defined analogously from $O_{\lambda_1} = (o_1, o_2, \ldots, o_{T_1})$, a sequence of observations generated by model $\lambda_1$, where $T_1$ is the length of the sequence $O_{\lambda_1}$.
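A Monte Carlo estimate of this kind of model divergence can be sketched in Python as below. The sketch rests on explicit assumptions: the directed divergence is taken as the per-symbol log-likelihood difference with the sign chosen so that it is non-negative on average, the symmetrized value is taken as the mean of the two directions, and the function names, sequence length, seed and toy models are all hypothetical.

```python
import math
import random

def log_likelihood(obs, A, B, pi):
    """log P(O | lambda) using the scaled forward recursion."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total

def sample(A, B, pi, length, rng):
    """Draw an observation sequence of the given length from an HMM."""
    def draw(p):
        return rng.choices(range(len(p)), weights=p)[0]
    state = draw(pi)
    obs = []
    for _ in range(length):
        obs.append(draw(B[state]))
        state = draw(A[state])
    return obs

def divergence(m1, m2, length=500, seed=0):
    """How much worse m1 explains data generated by m2, per symbol (assumed sign)."""
    obs = sample(*m2, length, random.Random(seed))
    return (log_likelihood(obs, *m2) - log_likelihood(obs, *m1)) / length

def symmetric_divergence(m1, m2, length=500, seed=0):
    """Mean of the two directed divergences (assumed form of symmetrization)."""
    return 0.5 * (divergence(m1, m2, length, seed) + divergence(m2, m1, length, seed))

# Toy models: a "sticky" HMM versus a memoryless fair-coin HMM.
sticky = ([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
uniform = ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5])
d_same = divergence(sticky, sticky)
d_sym = symmetric_divergence(sticky, uniform)
```

Under this convention a model compared with itself has zero divergence, and a smaller value indicates greater similarity, which matches the decision rule used for the KLD-based test.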
HMM implementation
The implementation of the brain HMM is outlined as follows:

1) Obtain MRI scans of a participant;
2) Extract grey matter from the pre-segmented image using the SPM software package (http://www.fil.ion.ucl.ac.uk/spm);
3) Extract time series from the distances measured from subsequent outer boundary points to the GM center of mass;
4) Extract state and observable symbol sequences based on regularity dimension and semivariogram using the VQ codebook;
5) Make an initial estimate of $\lambda = (A, B, \pi)$;
6) Re-estimate $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$ using the Baum-Welch algorithm.
HMM testing
The classifier was tested between every two groups (elder AD vs. elder nondemented, middle-aged and young, respectively) using leave-one-out (LOO) and n-fold cross-validation. In the LOO method, one subject is taken at a time for evaluation, and the remaining subjects are taken for training. This process is repeated until each subject in both groups has been taken for evaluation, yielding an unbiased estimate of the classification error rate. In the n-fold cross-validation method, the data are divided into a training set and a testing set. First, 50% of the data are randomly selected as the training set, while the remaining 50% are used for testing. The mean classification rates were obtained by repeating the testing process 100 times. Next, we increased the training data to 70% and 90% and tested the models on the remaining 30% and 10% of the data, respectively. For each combination, the testing process was repeated 100 times to obtain mean classification rates.
To quantify the classification results, we calculated the accuracy, defined as the ratio of the number of test subjects correctly classified to the total number of test subjects. Sensitivity and specificity of each tested pair of groups were computed as Sensitivity = TP/(TP + FN) and Specificity = TN/(TN + FP), where true positives (TP) are the number of demented patients correctly classified; true negatives (TN) are the number of control subjects correctly classified; false positives (FP) are the number of controls classified as demented patients; and false negatives (FN) are the number of demented patients classified as normal controls.
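The three measures can be computed from the four counts as in the short Python sketch below. The function name `classification_metrics` and the counts are hypothetical and purely illustrative; they are not taken from the paper's tables.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # demented patients correctly detected
        "specificity": tn / (tn + fp),   # controls correctly recognized
    }

# Illustrative counts only (assumption): 75 patients and 75 controls.
metrics = classification_metrics(tp=61, tn=60, fp=15, fn=14)
```

With these illustrative counts, 61 of 75 patients and 60 of 75 controls are classified correctly, so sensitivity and specificity differ even though the two groups are the same size.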
Results and Discussion
The testing results of the HMM classifier are shown in Table 2. From Table 2, it is noted that the classification rate increased as the age difference became larger. This is in accordance with our previous finding of an age-related progression in cortical structural irregularity [18]. Normal aging can also result in a reduction in brain size. Although AD may exhibit a characteristic pattern of atrophy different from that due to aging, even accelerated aging, when AD acts as an added effect over aging it is difficult to ascertain whether the cognitive decline results simply from normal aging or from AD, especially at the early stage of the disease. This is why the classification accuracy is relatively lower for AD elders versus normal elders. Without this added effect, the accuracy rises (in LOO, from 80.7% to 98.7%), as do the sensitivity (in LOO, from 81.3% to 98.7%) and specificity (in LOO, from 80.0% to 98.7%). Results from the 50%, 70% and 90% training sets showed that the larger the training set, the higher the mean classification rates. However, the two testing approaches using $P(O \mid \lambda)$ and KLD do not differ in classification results (Table 3). Since KLD estimation is based on $P(O \mid \lambda)$, using $P(O \mid \lambda)$ alone is preferable to KLD because it is more computationally efficient. Meanwhile, re-estimation of the HMM parameters did not improve the classification results, as shown in Table 4. One possible reason is the small size of the training data available for parameter re-estimation. Another reason may be that while the parameters of one HMM are updated so as to maximize $P(O \mid \lambda)$, the probability of $O$ being observed from the other model is also improved. When $P(O \mid \lambda)$ for one group HMM is only marginally higher than that for the other group HMM, there is a risk that the re-estimated parameters lead to the opposite classification result.
Using the aforementioned experimental methods, demented images can be classified into three categories: very mild, mild and moderate, as shown in Table 5. As expected, the detection rate of dementia increased as the cognitive impairment worsened (Table 6). The decline of global GM volume accelerates as the disease develops [6], so the morphological abnormalities become more pronounced, which makes them more likely to be detected and differentiated from normal cognitive decline. However, because the OASIS database includes very few moderate AD samples, we are not able to verify the performance of the classifier in late AD recognition, even though the current results show a very encouraging detection rate in the latter two groups. This issue is certainly worth investigating in our future research when more data become available.
We have demonstrated that our proposed HMMs for MRI data analysis have strong potential for discriminating between early AD and healthy controls. We also compared our classification results with recent related works using the same OASIS database. Garcia-Sebastian et al. [36] applied voxel-based morphometry (VBM) to extract classification features and an SVM algorithm to classify patients with mild AD (49) vs. controls (49). They obtained a better result, with an accuracy of 87.5%. Savio et al. [37] applied four different ANN models to the same dataset that had been analyzed using SVM, and reported a best classification accuracy of 83%. Compared to these studies, which achieved higher accuracies but used small sample sizes, our classification result of 80.7% accuracy is still encouraging considering the number of subjects in the database, and it is statistically more reliable. Daliri [38] proposed an automated method using the scale-invariant feature transform (SIFT) and SVM for diagnosing AD. That study achieved a classification accuracy of 86% for mild AD (20) vs. normal (66) and 75% for (very mild + mild AD) (69) vs. normal (66). The author also noted that it is more difficult to classify data from subjects with very mild AD or from elderly subjects. Another study, reported by Zhou et al. [39], used a large sample size and proposed a framework to analyze the hippocampal shape difference between AD (85) and HC subjects (79), achieving a classification accuracy of 52.6–61.5%. Yang et al. [40] proposed a method based on independent component analysis (ICA) coupled with SVM for classifying MRI scans into the categories of AD (100), young healthy controls (yHC) (116), middle-aged healthy controls (mHC) (100) and old healthy controls (oHC) (100). The best classification accuracies between AD and oHC, AD and mHC, and AD and yHC were 73.7%, 91.6% and 97.8%, respectively. Compared to the above studies using large samples, our results show better accuracy as well as better sensitivity and specificity.
HMMs have been applied to audio recognition and gesture/motion detection for monitoring the daily activities of patients with dementia [41–43]. They have also been applied to brain MRI for age prediction [25, 26]. To our knowledge, there has been no report on the application of HMMs in dementia classification. We explored the use of HMMs and demonstrated their potential in early AD diagnosis. The results from the 50%, 70% and 90% training sets show that a larger training set did achieve better accuracy but did not significantly improve the classification performance. This suggests that a small training set (50%) is large enough for HMMs. Conventional modeling methods using SVM or relevance vector regression (RVR) usually need large training data and effective feature selection, and their results are sometimes difficult to interpret clinically; in comparison, training and validation on the extracted features using HMMs can be more time efficient. Meanwhile, the strength of HMMs in sequence statistics allows the sequence information of brain MRI slices to represent the global cortical structure, which makes the results favorable for clinical analysis.
Our main goal in the current study was to propose a method based on HMM modeling for dementia recognition. The current results have shown the capability of our proposed method for early disease diagnosis. Since this is a pilot study, the current classification results are encouraging and reasonable but can still be improved. We noticed that some white matter (WM) hyperintensities were classified as GM, especially in the border voxels between GM and WM, and the GM center of mass may be influenced by the presence of the WM hyperintensities. Due to the limitations of the segmentation and boundary tracing methods, some potential sensitivity of the approach is probably lost. Future work will focus on methodological refinements to achieve higher classification accuracy, especially between AD elders and normal elders. Moreover, we aim to extend the use of this modeling to the longitudinal dataset provided by the OASIS database, and also to larger datasets such as that provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI). We are also interested in whether our proposed model can distinguish between AD and other neurological diseases.
Conclusion
We have presented an approach for modeling MRI-based whole-brain structural complexity and applied it to the diagnosis of Alzheimer's disease. In comparison with other related work using the same OASIS database, the proposed model shows promising classification performance. These results demonstrate that the proposed approach has the potential to serve as an in vivo surrogate tool for disease severity prediction, and as a diagnostic method for mild cognitive impairment and AD.
References
Waldemar G, Dubois B, Emre M, Georges J, McKeith IG, Rossor M, Scheltens P, Tariska P, Winblad B: Recommendations for the diagnosis and management of Alzheimer's disease and other disorders associated with dementia: EFNS guideline. Eur J Neurol 2007, 14: e1–26.
Dubois B, Feldman HH, Jacova C, Dekosky ST, Barberger-Gateau P, Cummings J, Delacourte A, Galasko D, Gauthier S, Jicha G, Meguro K, O'Brien J, Pasquier F, Robert P, Rossor M, Salloway S, Stern Y, Visser PJ, Scheltens P: Research criteria for the diagnosis of Alzheimer's disease: revising the NINCDS-ADRDA criteria. Lancet Neurol 2007, 6(8): 734–746. 10.1016/S14744422(07)701783
Schroeter ML, Stein T, Maslowski N, Neumann J: Neural correlates of Alzheimer's disease and mild cognitive impairment: a systematic and quantitative meta-analysis involving 1351 patients. Neuroimage 2009, 47(4): 1196–1206. 10.1016/j.neuroimage.2009.05.037
Desikan RS, Cabral HJ, Hess CP, Dillon WP, Glastonbury CM, Weiner MW, Schmansky NJ, Greve DN, Salat DH, Buckner RL, et al.: Automated MRI measures identify individuals with mild cognitive impairment and Alzheimer's disease. Brain 2009, 132(8): 2048–2057. 10.1093/brain/awp123
Shi F, Liu B, Zhou Y, Yu C, Jiang T: Hippocampal volume and asymmetry in mild cognitive impairment and Alzheimer's disease: Meta-analyses of MRI studies. Hippocampus 2009, 19: 1055–1064. 10.1002/hipo.20573
Whitwell JL, Przybelski SA, Weigand SD, Knopman DS, Boeve BF, Petersen RC, Jack CR Jr: 3D maps from multiple MRI illustrate changing atrophy patterns as subjects progress from mild cognitive impairment to Alzheimer's disease. Brain 2007, 130: 1777–1786. 10.1093/brain/awm112
Devanand D, Bansal R, Liu J, Hao X, Pradhaban G, Peterson BS: MRI hippocampal and entorhinal cortex mapping in predicting conversion to Alzheimer's disease. Neuroimage 2012, 60(3): 1622–1629. 10.1016/j.neuroimage.2012.01.075
Tapiola T, Pennanen C, Tapiola M, Tervo S, Kivipelto M, Hänninen T, Pihlajamäki M, Laakso MP, Hallikainen M, Hämäläinen A, et al.: MRI of hippocampus and entorhinal cortex in mild cognitive impairment: a follow-up study. Neurobiology of aging 2008, 29: 31–38. 10.1016/j.neurobiolaging.2006.09.007
The Lund and Manchester Groups: Clinical and neuropathological criteria for frontotemporal dementia. J Neurol Neurosurg Psychiatry 1994, 57: 416–418.
Burton EJ, Karas G, Paling SM, Barber R, Williams ED, Ballard CG, McKeith IG, Scheltens P, Barkhof F, O'Brien JT: Patterns of cerebral atrophy in dementia with Lewy bodies using voxel-based morphometry. Neuroimage 2002, 17: 618–630. 10.1006/nimg.2002.1197
Thompson PM, Hayashi KM, de Zubicaray G, Janke AL, Rose SE, Semple J, Herman D, Hong MS, Dittmer SS, Doddrell DM, Toga AW: Dynamics of gray matter loss in Alzheimer's disease. J Neurosci 2003, 23: 994–1005.
Hamalainen A, Tervo S, Grau-Olivares M, Niskanen E, Pennanen C, Huuskonen J, Kivipelto M, Hanninen T, Tapiola M, Vanhanen M, Hallikainen M, Helkala EL, Nissinen A, Vanninen R, Soininen H: Voxel-based morphometry to detect brain atrophy in progressive mild cognitive impairment. Neuroimage 2007, 37: 1122–1131. 10.1016/j.neuroimage.2007.06.016
Im K, Lee JM, Seo SW, Yoon U, Kim ST, Kim YH, Kim SI, Na DL: Variations in cortical thickness with dementia severity in Alzheimer's disease. Neurosci Lett 2008, 436: 227–231. 10.1016/j.neulet.2008.03.032
Giulietti G, Bozzali M, Figura V, Spano B, Perri R, Marra C, Lacidogna G, Giubilei F, Caltagirone C, Cercignani M: Quantitative magnetization transfer provides information complementary to grey matter atrophy in Alzheimer's disease brains. Neuroimage 2012, 59: 1114–1122. 10.1016/j.neuroimage.2011.09.043
Dickerson BC, Bakkour A, Salat DH, Feczko E, Pacheco J, Greve DN, Grodstein F, Wright CI, Blacker D, Rosas HD, Sperling RA, Atri A, Growdon JH, Hyman BT, Morris JC, Fischl B, Buckner RL: The cortical signature of Alzheimer's disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cereb Cortex 2009, 19: 497–510. 10.1093/cercor/bhn113
Welker W: Why does cerebral cortex fissure and fold? A review of determinants of gyri and sulci. Cerebral cortex 1990, 8: 3–136.
Van Essen DC: A tension-based theory of morphogenesis and compact wiring in the central nervous system. Nature 1997, 385(6614): 313–318. 10.1038/385313a0
Chen Y, Pham TD: Sample entropy and regularity dimension in complexity analysis of cortical surface structure in early Alzheimer's disease and aging. J Neurosci Methods 2013, 215(2): 210–217. 10.1016/j.jneumeth.2013.03.018
Magnin B, Mesrob L, Kinkingnéhun S, Pélégrini-Issac M, Colliot O, Sarazin M, Dubois B, Lehéricy S, Benali H: Support vector machine-based classification of Alzheimer's disease from whole-brain anatomical MRI. Neuroradiology 2009, 51(2): 73–83. 10.1007/s002340080463x
Cuingnet R, Gerardin E, Tessieras J, Auzias G, Lehéricy S, Habert MO, Chupin M, Benali H, Colliot O: Automatic classification of patients with Alzheimer's disease from structural MRI: A comparison of ten methods using the ADNI database. Neuroimage 2011, 56(2): 766–781. 10.1016/j.neuroimage.2010.06.013
Aguilar C, Westman E, Muehlboeck J, Mecocci P, Vellas B, Tsolaki M, Kloszewska I, Soininen H, Lovestone S, Spenger C, et al.: Different multivariate techniques for automated classification of MRI data in Alzheimer's disease and mild cognitive impairment. Psychiatry Research: Neuroimaging 2013.
Querbes O, Aubry F, Pariente J, Lotterie JA, Démonet JF, Duret V, Puel M, Berry I, Fort JC, Celsis P, et al.: Early diagnosis of Alzheimer's disease using cortical thickness: impact of cognitive reserve. Brain 2009, 132(8): 2036–2047. 10.1093/brain/awp105
Rabiner LR: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 1989, 77(2): 257–286. 10.1109/5.18626
Pham TD, Salvetti F, Wang B, Diani M, Heindel W, Knecht S, Wersching H, Baune BT, Berger K: The hidden-Markov brain: comparison and inference of white matter hyperintensities on magnetic resonance imaging (MRI). J Neural Eng 2011, 8: 016004. 10.1088/17412560/8/1/016004
Wang Y, Resnick SM, Davatzikos C: Spatio-temporal analysis of brain MRI images using hidden Markov models. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2010. Springer; 2010: 160–168.
Wang B, Pham TD: MRI-based age prediction using hidden Markov models. J Neurosci Methods 2011, 199: 140–145. 10.1016/j.jneumeth.2011.04.022
Marcus DS, Wang TH, Parker J, Csernansky JG, Morris JC, Buckner RL: Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J Cogn Neurosci 2007, 19: 1498–1507. 10.1162/jocn.2007.19.9.1498
Zhang Y, Brady M, Smith S: Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging 2001, 20: 45–57. 10.1109/42.906424
Pham TD: Regularity dimension of sequences and its application to phylogenetic tree reconstruction. Chaos, Solitons & Fractals 2012, 45: 879–887. [http://www.sciencedirect.com/science/article/pii/S0960077912000732] 10.1016/j.chaos.2012.03.001
Clark I, Harper WV: Practical geostatistics 2000. Ecosse North America Columbus; 2000.
Gray R: Vector quantization. ASSP Magazine, IEEE 1984, 1(2): 4–29.
Gersho A, Gray RM: Vector quantization and signal compression. Springer; 1992.
Linde Y, Buzo A, Gray R: An algorithm for vector quantizer design. Communications, IEEE Transactions on 1980, 28: 84–95. 10.1109/TCOM.1980.1094577
Rabiner L, Juang BH: Fundamentals of speech recognition. 1993.
Cover TM, Thomas JA: Elements of information theory. Wiley-Interscience; 2012.
García-Sebastián M, Savio A, Graña M, Villanuá J: On the use of morphometry based features for Alzheimer's disease detection on MRI. In Bio-Inspired Systems: Computational and Ambient Intelligence. Springer; 2009: 957–964.
Savio A, Garcia-Sebastian M, Hernandez C, Grana M, Villanua J: Classification Results of Artificial Neural Networks for Alzheimer's Disease Detection. Intelligent Data Engineering and Automated Learning - IDEAL 2009.
Daliri MR: Automated diagnosis of Alzheimer disease using the scale-invariant feature transforms in magnetic resonance images. J Med Syst 2012, 36(2): 995–1000. 10.1007/s1091601197386
Zhou L, Lieby P, Barnes N, Reglade-Meslin C, Walker J, Cherbuin N, Hartley R: Hippocampal shape analysis for Alzheimer's disease using an efficient hypothesis test and regularized discriminative deformation. Hippocampus 2009, 19(6): 533–540. 10.1002/hipo.20639
Yang W, Lui RL, Gao JH, Chan TF, Yau ST, Sperling RA, Huang X: Independent component analysis-based classification of Alzheimer's disease MRI data. J Alzheimers Dis 2011, 24(4): 775–783.
Karaman S, Benois-Pineau J, Megret R, Dovgalecs V, Dartigues JF, Gaestel Y: Human daily activities indexing in videos from wearable cameras for monitoring of patients with dementia diseases. Pattern Recognition (ICPR), 2010 20th International Conference on, IEEE 2010, 4113–4116.
Peters C, Wachsmuth S, Hoey J: Learning to recognise behaviours of persons with dementia using multiple cues in an HMM-based approach. Proceedings of the 2nd International Conference on Pervasive Technologies Related to Assistive Environments 2009, 65. ACM
Mégret R, Dovgalecs V, Wannous H, Karaman S, Benois-Pineau J, El Khoury E, Pinquier J, Joly P, André-Obrecht R, Gaëstel Y, et al.: The IMMED project: wearable video monitoring of people with age dementia. Proceedings of the international conference on Multimedia 2010, 1299–1302. ACM
Acknowledgements
This work was supported by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (Research Activity Start-up) awarded to the second author (T. D. Pham).
This article has been published as part of BioMedical Engineering OnLine Volume 12 Supplement 1, 2013: Selected articles from the 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society: Workshop on Current Challenging Image Analysis and Information Processing in Life Sciences. The full contents of the supplement are available online at http://www.biomedicalengineeringonline.com/supplement/12/S1
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
YC conducted the study, evaluated the data, performed data analyses, and drafted the manuscript. TP conceived and supervised the study, revised, and approved the final manuscript.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Chen, Y., Pham, T.D. Development of a brain MRI-based hidden Markov model for dementia recognition. BioMed Eng OnLine 12 (Suppl 1), S2 (2013). https://doi.org/10.1186/1475925X12S1S2
DOI: https://doi.org/10.1186/1475925X12S1S2
Keywords
 Dementia
 Hidden Markov Model (HMM)
 Classification
 Magnetic Resonance Imaging (MRI)
 Alzheimer's Disease (AD)
 Regularity Dimension
 Semivariogram
 Vector Quantization (VQ)