Familial or Sporadic Idiopathic Scoliosis – classification based on artificial neural network and GAPDH and ACTB transcription profile
BioMedical Engineering OnLine volume 12, Article number: 1 (2013)
The importance of hereditary factors in the etiology of Idiopathic Scoliosis (IS) is widely accepted. In clinical practice some IS patients present with a positive familial history of the deformity and some do not. Traditionally about 90% of patients have been considered sporadic cases without familial recurrence. However, the exact proportion of Familial and Sporadic Idiopathic Scoliosis is still unknown. Housekeeping genes encode proteins that are usually essential for the maintenance of basic cellular functions. ACTB and GAPDH are two housekeeping genes encoding, respectively, the cytoskeletal protein β-actin and glyceraldehyde-3-phosphate dehydrogenase, an enzyme of glycolysis. Although their expression levels can fluctuate between tissues and individuals, human housekeeping genes seem to exhibit a preserved tissue-wide expression ranking order. It was hypothesized that the expression ranking order of two representative housekeeping genes, ACTB and GAPDH, might be disturbed in the tissues of patients with Familial Idiopathic Scoliosis (a positive family history of idiopathic scoliosis) as opposed to patients with no affected family members (Sporadic Idiopathic Scoliosis). An artificial neural network (ANN) was developed to differentiate between familial and sporadic cases of idiopathic scoliosis based on the expression levels of ACTB and GAPDH in different tissues of scoliotic patients. The aim of the study was to investigate whether the expression levels of ACTB and GAPDH in different tissues of idiopathic scoliosis patients could be used as input data for a specially developed artificial neural network in order to predict a positive family history of the index patient.
The comparison of the developed models showed that the most satisfactory classification accuracy was achieved by the ANN model with 18 nodes in the first hidden layer and 16 nodes in the second hidden layer. Using only the expression measurements of ACTB and GAPDH, the ANN based on the 6-18-16-1 architecture correctly classified a positive Idiopathic Scoliosis anamnesis in 8 of 9 test cases (88%). In only one case was the prediction ambiguous.
A specially designed artificial neural network model indicated a possible association between the expression levels of ACTB and GAPDH and a positive familial history of Idiopathic Scoliosis.
Idiopathic Scoliosis (IS) is the most common structural deformity of the human spine. Although the exact cause or causes of idiopathic scoliosis are still unknown, there is convincing evidence supporting a genetic etiology of this disorder [1–5]. The importance of hereditary factors in the etiology of IS is underlined by the increased risk of scoliosis among first-degree relatives of index patients. Harrington found a scoliosis incidence of 27% among first-degree relatives. Other studies indicate that 11% of first-degree, 2.4% of second-degree and 1.4% of third-degree relatives are affected [7, 8]. A genetic basis of IS is also supported by the results of twin studies. Inoue and colleagues found a concordance rate for scoliosis of 92.3% in monozygous twins, decreasing to 62.5% in dizygous twins. A lower concordance rate was found in the study of Kesling and Reinker: 73% among monozygous and 36% among dizygous twins. A recent study based on the Swedish Twin Registry estimates the heritability of this condition at 38%, indicating the importance of other, still unknown factors in the development of the deformity. The mode of inheritance and the genetic basis of the scoliotic phenotype are still not definitively determined. Autosomal dominant, X-linked and multifactorial inheritance patterns have been suggested [3–7]. According to Miller et al., idiopathic scoliosis is a complex genetic disorder involving one or more genetic loci and complex genetic interactions for disease expression. In clinical practice some IS patients present with a positive familial history of the deformity and some do not. Traditionally about 90% of patients have been considered sporadic cases without familial recurrence. However, the exact proportion of Familial and Sporadic Idiopathic Scoliosis is still unknown. Ogilvie et al., in a population study based on a large database from Utah, concluded that 97% of their patients with idiopathic scoliosis have familial origins and suggested the presence of one or more major gene defects or single gene defects influenced by other factors. According to Cheng et al., predisposition for IS does not have a specific assigned risk of heritability; rather, inheritance is based on multiple factors, potentially both genetic and environmental, which have yet to be defined.
Housekeeping genes encode proteins that are usually essential for the maintenance of basic cellular functions. They are expressed constitutively in every human cell, but it appears that their transcriptional level may be influenced by numerous factors [12, 13]. ACTB and GAPDH are two housekeeping genes encoding, respectively, the cytoskeletal protein β-actin and glyceraldehyde-3-phosphate dehydrogenase, an enzyme of glycolysis. Based on the assumption of their constant expression in various tissues, these genes serve as traditional internal controls in a variety of assays in molecular biology. Although their expression levels can fluctuate between tissues and individuals, human housekeeping genes seem to exhibit a preserved tissue-wide expression ranking order.
It was hypothesized that the expression ranking order of two representative housekeeping genes, ACTB and GAPDH, might be disturbed in the tissues of patients with Familial Idiopathic Scoliosis (a positive family history of idiopathic scoliosis) as opposed to patients with no affected family members (Sporadic Idiopathic Scoliosis). In order to recognize potentially sophisticated patterns in the data, and because of the tensor structure of the ACTB and GAPDH expression data, an artificial neural network (ANN) was developed to differentiate between familial and sporadic cases of idiopathic scoliosis based on the expression levels of ACTB and GAPDH in different tissues of scoliotic patients.
The aim of the study was to investigate whether the expression levels of ACTB and GAPDH in different tissues of idiopathic scoliosis patients could be used as input data for a specially developed artificial neural network in order to predict a positive family history of index patients.
The study design was approved by the Bioethical Committee Board of the Silesian Medical University. Informed, written consent was obtained from each patient participating in the study and, if required, from their parents. Twenty-nine patients (23 females and 6 males) with a definite diagnosis of late-onset Idiopathic Scoliosis were included in the study. Thirteen of them (44%) had a positive familial history of IS. All patients had undergone posterior corrective surgery with segmental spinal instrumentation according to the C-D method. The mean age at surgery was 16 years 8 months (range 13.7 to 24 years). Based on the Lenke classification, 6 curves were of type 1, 6 curves of type 2, 7 curves of type 3, 7 curves of type 4, 4 curves of type 5 and 3 of type 6. Preoperatively, the average frontal and sagittal Cobb angles measured on standard p-a and lateral radiograms were 68.8° (range 36°-114°) and 35.4° (range 12°-70°), respectively. The axial plane deformity was measured on CT scans performed at the curve apex, using the spinal rotation angle relative to the sagittal plane (RAsag) and the rib hump index (RHi) as described by Aaro and Dahlborn. The mean RAsag was 19.3° (range 2.5°-46°) and the mean RHi was 0.4 (range 0.03-0.91). During surgery, bilateral facet removal was performed in the routine manner, and bone specimens from the inferior articular spinal processes at the concavity and convexity of the curve apex were harvested. At the same time, bilateral samples of paravertebral muscle tissue at the apical level and 10 ml of the patient's peripheral blood were collected. Each sample of bone and muscle tissue, as well as each blood specimen, was placed in a separate sterile tube, adequately labelled, immediately snap-frozen in liquid nitrogen and stored at -80°C until molecular analysis.
Tissue samples were homogenized with the use of a Polytron® homogenizer (Kinematica AG). Total RNA was isolated from tissue samples with the use of TRIZOL® reagent (Invitrogen Life Technologies, California, USA), according to the manufacturer’s instructions. Extracts of total RNA were treated with DNase I (Qiagen GmbH, Hilden, Germany) and purified with the use of RNeasy Mini Spin Columns (Qiagen GmbH, Hilden, Germany), in accordance with the manufacturer’s protocol. The quality of RNA was assessed by electrophoresis on a 1% agarose gel stained with ethidium bromide. The RNA concentration was determined by absorbance at 260 nm using a Gene Quant II spectrophotometer (Pharmacia LKB Biochrom Ltd., Cambridge, UK). Total RNA served as the template for QRT-PCR.
ACTB and GAPDH mRNA quantification in osseous, muscular and blood tissue samples by Quantitative Real Time Reverse Transcription Polymerase Chain Reaction.
The quantitative analysis was carried out with the use of a Sequence Detector ABI PRISM™ 7000 (Applied Biosystems, California, USA). The standard curve was generated from ACTB standards (TaqMan® DNA Template Reagents Kit, Applied Biosystems, Foster, CA, USA). The ACTB and GAPDH mRNA abundance in all studied tissue specimens was expressed as mRNA copy number per 1 μg of total RNA.
The QRT-PCR reaction mixture, with a total volume of 25 μl, contained QuantiTect SYBR-Green RT-PCR buffer (Tris–HCl, (NH4)2SO4, 5 mM MgCl2, pH 8.7, dNTP mix, fluorescent dye SYBR-Green I and passive reference dye ROX) mixed with 0.5 μl of QuantiTect RT mix (Omniscript reverse transcriptase, Sensiscript reverse transcriptase) (QuantiTect SYBR-Green RT-PCR kit; Qiagen), forward and reverse primers each at a final concentration of 0.5 μM, and 0.25 μg of total RNA per reaction. Primer sequences for ACTB mRNA: 5’TCACCCACACTGTGCCCATCTACGA3’ (forward primer) and 5’CAGCGGAACCGCTCATTGCCAATGG3’ (reverse primer); for GAPDH mRNA: 5’GAAGGTGAAGGTCGGAGTC3’ (forward primer) and 5’GAAGATGGTGATGGGATT3’ (reverse primer). Reverse transcription was carried out at 50°C for 30 min. After activation of the HotStar Taq DNA polymerase and deactivation of the reverse transcriptases at 95°C for 15 min, subsequent PCR amplification consisted of denaturation at 94°C for 15 sec, annealing at 60°C for 30 sec and extension at 72°C for 30 sec (40 cycles). Final extension was carried out at 72°C for 10 min. QRT-PCR specificity was assessed by electrophoresis in a 6% polyacrylamide gel and by melting curves of the amplimers.
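The standard-curve quantification described in this section can be illustrated with a minimal sketch. It assumes the usual linear relation between the threshold cycle (Ct) and the base-10 logarithm of the template copy number. The dilution series, the Ct values and the function names are hypothetical and are not taken from the study; only the 0.25 μg of total RNA per reaction comes from the text.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_per_ug(ct, slope, intercept, rna_ug):
    """Invert the curve for one sample and normalise to 1 ug of total RNA."""
    copies = 10 ** ((ct - intercept) / slope)
    return copies / rna_ug

# Hypothetical ten-fold dilution series of a standard (made-up Ct values).
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [30.0, 26.7, 23.4, 20.1, 16.8]
slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# A made-up sample Ct, measured with 0.25 ug of total RNA in the reaction.
sample = copies_per_ug(ct=23.4, slope=slope, intercept=intercept, rna_ug=0.25)
```

Dividing by the RNA input is what turns a per-reaction copy number into the "copies per 1 μg of total RNA" unit used for the expression values.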
The results of the laboratory procedures and the family anamnesis of the 29 patients were used to create a dataset consisting of 29 rows. The expression values were transformed to a logarithmic scale. Each row represented the ACTB and GAPDH transcription profile in three kinds of tissue (bone, muscle, and blood) for exactly one patient.
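As an illustration of this row layout, the sketch below builds one dataset row from six copy-number measurements. The column order, the use of log10 (the text states only "logarithmic scale"), the `make_row` helper and all values are assumptions made for the example.

```python
import math

TISSUES = ("bone", "muscle", "blood")
GENES = ("ACTB", "GAPDH")

def make_row(copy_numbers, family_history):
    """Build one dataset row: 6 log-scaled inputs plus a 0/1 label.

    copy_numbers maps (gene, tissue) to mRNA copies per ug of total RNA;
    a missing key stands for a missing measurement.
    """
    features = []
    for tissue in TISSUES:
        for gene in GENES:
            value = copy_numbers.get((gene, tissue))
            # log10 is an assumption; the source does not name the base.
            features.append(None if value is None else math.log10(value))
    return features, 1 if family_history else 0

# Made-up patient with one missing blood measurement (GAPDH).
patient = {
    ("ACTB", "bone"): 1.0e5, ("GAPDH", "bone"): 1.0e4,
    ("ACTB", "muscle"): 1.0e6, ("GAPDH", "muscle"): 1.0e5,
    ("ACTB", "blood"): 1.0e3,
}
features, label = make_row(patient, family_history=True)
```

Missing measurements are kept as `None` here so that they can be handled explicitly in a later step rather than silently dropped.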
Unfortunately, some data were missing from our dataset. To address this problem we could either remove incomplete records from the analyzed dataset or use an appropriate methodology and tool to preserve and utilize them in the analysis. The problem of missing data is widely discussed in the disciplines of data mining and knowledge discovery from data [17–22]. Removing all incomplete records would risk losing important information contained in the whole dataset. We therefore decided to preserve all the records and to replace missing values with random data drawn from normal distributions similar to the original distributions of the variables. The random values are marked in bold [Tables 1 and 2]. Our decision was supported by the experience of one of the co-authors of this study, who has conducted extensive research in the field of advanced data processing, including the processing of incomplete data [23–27]. On this basis, an ANN was chosen as an appropriate classification method in this case. The dataset was randomly divided into a training set (20 rows) [Table 1] and a test set (9 rows) [Table 2].
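The imputation and the random split can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that "similar to the original distributions" means a normal distribution with the mean and standard deviation of the observed values in each column, and the data, the seed and the function names are made up.

```python
import random
import statistics

def impute_normal(rows, rng):
    """Replace None entries with draws from N(mean, sd) of the observed column."""
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]  # copy, so the input rows stay untouched
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        mu = statistics.mean(observed)
        sd = statistics.stdev(observed)
        for r in filled:
            if r[j] is None:
                r[j] = rng.gauss(mu, sd)
    return filled

def split_rows(rows, n_train, rng):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

rng = random.Random(0)  # fixed seed, only so that the sketch is repeatable
# 29 made-up rows of 6 log-scaled values, with two gaps.
rows = [[rng.uniform(3.0, 7.0) for _ in range(6)] for _ in range(29)]
rows[2][5] = None
rows[10][0] = None

complete = impute_normal(rows, rng)
train_rows, test_rows = split_rows(complete, 20, rng)
```

Because the replacement values are random, a different seed gives a different completed dataset; marking the imputed cells, as the authors do in bold in their tables, keeps that visible to the reader.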
Artificial neural network
An artificial neural network is a mathematical model inspired by the structure and functional aspects of biological neural networks [28, 29]. An ANN can be used to detect sophisticated patterns in data. Several studies have applied neural networks in the research and analysis of various diseases (e.g. classification of cardiovascular disease, forecasting of bacteria–antibiotic interactions, prediction of colorectal cancer patient survival).
The ANN used in this study is a multilayered feed-forward network with four layers (two of them hidden). Multilayer feed-forward neural networks can be used to approximate the nonlinear functions that describe complicated relationships in biological data. A schematic representation of the best architecture for our problem is shown in Figure 1. The input layer had 6 neurons, equal to the number of ACTB and GAPDH expression measurements. The ideal outputs were set to 1 for a positive history of IS in the family and to 0 for absence of IS in the anamnesis. The number of hidden nodes was determined by trial and error. Using the training set, we trained 421 neural network models with different numbers of hidden nodes using the backpropagation algorithm (activation function: binary sigmoidal function; learning rate: 0.1; momentum rate: 0.01; epochs: 50, 500 and 5000). Backpropagation was chosen because it is the most common method of training multilayered feed-forward neural networks. Initially, 50 training epochs were considered, but this did not yield a satisfactory result: the mean square error (MSE) was high [Table 3]. The MSE was reduced by increasing the number of epochs from 50 to 500 and finally from 500 to 5000 [Table 3]. Thereafter, we selected the 3 neural networks with the lowest MSE on the training set. To test the classification ability of the ANN approach, we applied the selected models to the test set. The ANN model with the best classification accuracy for Idiopathic Scoliosis in the anamnesis, based on the expression measurements of ACTB and GAPDH, was chosen as the best.
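To make the architecture and training procedure concrete, the following is a minimal pure-Python sketch of a 6-18-16-1 feed-forward network with logistic (binary sigmoid) units, trained by backpropagation with a momentum term. It is not the NeuronDotNet implementation used in the study. The weight initialisation range, the per-example (online) update scheme, the class and function names and the toy data are all assumptions; only the layer sizes, the learning rate of 0.1 and the momentum rate of 0.01 come from the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MLP:
    """Feed-forward network with sigmoid units, e.g. sizes = [6, 18, 16, 1]."""

    def __init__(self, sizes, rng):
        # One weight matrix per layer; the last entry of each row is the bias.
        self.w = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_out)]
                  for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        # Previous weight changes, kept for the momentum term.
        self.dw = [[[0.0] * len(row) for row in layer] for layer in self.w]

    def forward(self, x):
        """Return the activations of every layer, input included."""
        acts = [list(x)]
        for layer in self.w:
            prev = acts[-1] + [1.0]  # constant bias input
            acts.append([sigmoid(sum(wi * pi for wi, pi in zip(row, prev)))
                         for row in layer])
        return acts

    def predict(self, x):
        return self.forward(x)[-1][0]

    def train_step(self, x, target, lr, momentum):
        """One online backpropagation update for a single example."""
        acts = self.forward(x)
        out = acts[-1][0]
        # Delta of the single output unit for squared error.
        deltas = [(out - target) * out * (1.0 - out)]
        for k in range(len(self.w) - 1, -1, -1):
            prev = acts[k] + [1.0]
            below = None
            if k > 0:
                # Propagate deltas downwards using the weights before the update.
                below = [sum(self.w[k][j][i] * d for j, d in enumerate(deltas))
                         * acts[k][i] * (1.0 - acts[k][i])
                         for i in range(len(acts[k]))]
            for j, d in enumerate(deltas):
                for i, p in enumerate(prev):
                    step = -lr * d * p + momentum * self.dw[k][j][i]
                    self.w[k][j][i] += step
                    self.dw[k][j][i] = step
            deltas = below

def mse(net, inputs, targets):
    """Mean square error of the network over a set of examples."""
    return sum((net.predict(x) - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)

def train(net, inputs, targets, epochs, lr=0.1, momentum=0.01):
    """Run the given number of epochs and return the final training MSE."""
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            net.train_step(x, t, lr, momentum)
    return mse(net, inputs, targets)

rng = random.Random(42)  # arbitrary seed for repeatability
net = MLP([6, 18, 16, 1], rng)
# Made-up stand-in data: 20 rows of 6 values in [0, 1]; label depends on column 0.
inputs = [[rng.random() for _ in range(6)] for _ in range(20)]
targets = [1.0 if row[0] > 0.5 else 0.0 for row in inputs]
mse_before = mse(net, inputs, targets)
mse_after = train(net, inputs, targets, epochs=50)
```

Comparing `mse_before` with the value returned by `train` for 50, 500 and 5000 epochs mirrors the comparison the authors report in Table 3, although the numbers from this toy setup are not comparable with theirs.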
The data were analyzed using the NeuronDotNet computer library. Training an ANN is the process of finding the best weights on the inputs of each node. The goal is to use the training set to produce weights for which the output of the network is as close as possible to the desired output for as many of the training examples as possible. Table 3 shows the MSE for all 421 trained artificial neural models. A satisfactory MSE was obtained for ANNs with:
18 nodes in the first hidden layer and 16 nodes in the second hidden layer
19 nodes in the first hidden layer and 19 nodes in the second hidden layer
18 nodes in the first hidden layer and 10 nodes in the second hidden layer
Figure 2 presents the MSE for ANN model based on 6-18-16-1 architecture and the training set.
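The shortlisting step, in which the three architectures with the lowest training MSE are kept, can be sketched as below. The error values are invented for illustration and are not those of Table 3. The source does not state which combinations of hidden-layer sizes make up the 421 models, so the candidate set here is hypothetical as well.

```python
import heapq

def select_best(mse_by_architecture, k=3):
    """Return the k (hidden1, hidden2) pairs with the lowest training MSE."""
    return heapq.nsmallest(k, mse_by_architecture, key=mse_by_architecture.get)

# Made-up training errors for a handful of candidate architectures.
training_mse = {
    (18, 16): 0.010,
    (19, 19): 0.012,
    (18, 10): 0.015,
    (5, 5): 0.120,
    (10, 3): 0.090,
}
shortlist = select_best(training_mse)
```

The shortlisted models would then be compared on the held-out test set, since a low training MSE alone does not show how a model behaves on unseen patients.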
Table 2 lists the classification results of the ANN models on the test set for presence and absence of Idiopathic Scoliosis in the anamnesis. These results indicate how well the artificial neural network may perform on new data. The comparison of the developed models showed that the most satisfactory classification accuracy was achieved by the ANN model with 18 nodes in the first hidden layer and 16 nodes in the second hidden layer. Using the expression measurements of ACTB and GAPDH, the ANN based on the 6-18-16-1 architecture correctly classified Idiopathic Scoliosis in the anamnesis in 8 of 9 test cases (88%). In only one case (ID 27 in the test set) was the prediction ambiguous.
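How a continuous network output might be turned into a familial, sporadic or ambiguous call can be sketched as follows. The cut-offs of 0.3 and 0.7 are assumptions, since the text does not state how an ambiguous prediction was defined, and the nine output values are made up to reproduce the pattern of eight clear calls and one ambiguous one.

```python
def classify(output, low=0.3, high=0.7):
    """Map a network output in (0, 1) to a label; the cut-offs are assumptions."""
    if output >= high:
        return "familial"
    if output <= low:
        return "sporadic"
    return "ambiguous"

def accuracy(outputs, labels):
    """Count predictions that match the 0/1 labels; ambiguous counts as a miss."""
    expected = ["familial" if y == 1 else "sporadic" for y in labels]
    correct = sum(classify(o) == e for o, e in zip(outputs, expected))
    return correct, len(labels)

# Made-up outputs for nine test rows; exactly one falls in the ambiguous band.
outputs = [0.95, 0.05, 0.88, 0.10, 0.52, 0.91, 0.08, 0.97, 0.03]
labels = [1, 0, 1, 0, 1, 1, 0, 1, 0]
correct, total = accuracy(outputs, labels)
```

With a test set of only nine rows, a single case changes the accuracy by about 11 percentage points, which is worth keeping in mind when reading the reported figure.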
The results of this study support the potential benefits of applying artificial neural networks in clinical research and point to human housekeeping genes as a potential target for future molecular investigations of idiopathic scoliosis etiopathogenesis. The analysis indicates a relationship between the expression levels of ACTB and GAPDH and familial Idiopathic Scoliosis.
ANN: Artificial neural network
mRNA: Messenger ribonucleic acid
QRT-PCR: Quantitative Real Time Reverse Transcription Polymerase Chain Reaction
Cheng JC, Tang NLS, Yeung H, Miller N: Genetic association of complex traits. Clin Orthop Rel Res 2007, 462: 36–44.
Grauers A, Rahman I, Gerdhem P: Heritability of scoliosis. Eur Spine J 2011. ahead of print
Cowell HR, Hall JN, MacEwen GD: Genetic aspects of idiopathic scoliosis. Clin Orthop 1972, 86: 121–131.
Miller NH: Genetics of familial idiopathic scoliosis. Clin Orthop Rel Res 2002, 401: 60–64.
Miller NH: Genetics of familial idiopathic scoliosis. Clin Orthop Rel Res 2007, 462: 6–10.
Harrington PR: The etiology of idiopathic scoliosis. Clin Orthop 1977, 126: 43–46.
Riseborough EJ, Wynne-Davies R: A genetic survey of idiopathic scoliosis in Boston, Massachusetts. J Bone Joint Surg 1973, 55A: 974–982.
Wynne-Davies R: Familial (idiopathic) scoliosis: a family survey. J Bone Joint Surg 1968, 50B: 24–30.
Inoue M, Minami S, Kitahara H, Otsuka Y, Nakata Y, Takso M, Moria H: Idiopathic scoliosis in twins studied by DNA fingerprinting. J Bone Joint Surg 1998, 80-B(2):212–217.
Kesling KL, Reinker KA: Scoliosis in twins. A metaanalysis of the literature and report of six cases. Spine 1997, 22: 2009–2015. 10.1097/00007632-199709010-00014
Ogilvie JW, Braun J, Argyle VA, Nelson L, Meade M, Ward K: The search for idiopathic scoliosis genes. Spine 2006, 31(6):679–681. 10.1097/01.brs.0000202527.25356.90
Glare EM, Divjak M, Bailey MJ, Walters EH: β-actin and GAPDH housekeeping gene expression in asthmatic airways is variable and not suitable for normalizing mRNA levels. Thorax 2002, 57: 765–770. 10.1136/thorax.57.9.765
Cheng W, Chang C, Chen C, Tsai M, Shu W, Li C: Identification of reference genes across physiological states for qRT-PCR through microarray meta-analysis. PLoS One 2011, 6(2):e17347. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0017347 10.1371/journal.pone.0017347
Shaw GT, Shih ES, Chen C, Hwang M: Preservation of ranking order in the expression of human housekeeping genes. PLoS One 2011, 6(12):e29314. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0029314 10.1371/journal.pone.0029314
Lenke LG, Betz RR, Harms J, Bridwell KH, Clements DH, Lowe TG, Blanke K: Adolescent idiopathic scoliosis. A new classification to determine extent of spinal arthrodesis. J Bone Joint Surg 2001, 83-A(8):1169–1181.
Aaro S, Dahlborn M: Estimation of vertebral rotation and the spinal and rib cage deformity in scoliosis by computer tomography. Spine 1981, 6: 460–467. 10.1097/00007632-198109000-00007
Enders CK: Applied Missing Data Analysis. 72 Spring Street, New York: The Guilford Press; 2010. chapter 1.3: Missing Data Patterns
Wang J: Data Mining: Opportunities and Challenges. Idea Group Publishing. 2003. chap 7: The impact of Missing Data on Data Mining
Kantardzic M: Data Mining: Concepts, Models, Methods, and Algorithms. Second edition. Hoboken, NJ, USA: IEEE Press, John Wiley & Sons, Inc; 2011. chapter 2.4 Missing Data
Microsoft MSDN Analysis Services - Data Mining Documentation. http://msdn.microsoft.com/en-us/library/cc280406.aspx
Han J, Kamber M: Data Mining: Concepts and Techniques. 3rd edition. The Morgan Kaufmann Series in Data Management Systems: Morgan Kaufmann Publishers; 2012. chapter 3.2.1 Missing Values
Wang H, Wang S: Discovering patterns of missing data in survey databases: an application of rough sets. Expert Syst Appl 2009, 36(3):6256–6260. 10.1016/j.eswa.2008.07.010
Tkacz M: Processing an Incomplete Data Using Artificial Neural Networks. Ostrava: International Workshop Control and Information Technology; 1999.
Tkacz M: Processing an Incomplete and Random Data Using Artificial Neural Networks. Ostrava: International Workshop Control and Information Technology; 2001.
Tkacz M: Geoenvironmental Modelling with Artificial Intelligence Methods in Case of Hybrid Geothermal System. Sosnowiec, Poland: PhD Thesis (in Polish: Modelowanie warunków geośrodowiskowych przy użyciu metod sztucznej inteligencji na przykładzie hybrydowego systemu płytkiej geotermiki), University of Silesia; 2004.
Tkacz M: Artificial Neural Networks in Incomplete Data Sets Processing. In Intelligent Information Processing and Web Mining. Advances in Soft Computing. Edited by: Kłopotek MA, Wierzchon S, Trojanowski K. Berlin Heidelberg: Springer-Verlag; 2005.
Kłopotek MA, Wierzchon S, Trojanowski K, Tkacz M: Artificial Neural Networks Resistance To Incomplete Data. In Intelligent Information Processing and Web Mining. Advances in Soft Computing. Berlin Heidelberg: Springer-Verlag; 2006.
Kan T, Shimada Y, Sato F, Ito T, Kondo K, Watanabe G, Maeda M, Yamasaki S, Meltzer SJ, Imamura M: Prediction of Lymph Node Metastasis with Use of Artificial Neural Networks Based on Gene Expression Profiles in Esophageal Squamous Cell Carcinoma. Ann Surg Oncol 2004, 11(12):1070–1078. 10.1245/ASO.2004.03.007
Peterson LE, Mustafa O, Halime E, Andrew A, Lori G, Collen CN, Michael I: Artificial Neural Network Analysis of DNA Microarray-based Prostate Cancer Recurrence. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology.
Shanthi D, Sahoo G, Saravanan N: Designing an Artificial Neural Network Model for the Prediction of Thrombo-embolic Stroke. International Journal of Biometrics and Bioinformatics (IJBB) 2009, 3(1):10–18.
van der Smagt P, Krose B: An Introduction to Neural Networks. Amsterdam: The University of Amsterdam; 1996.
NeuronDotNet. 2012. http://sourceforge.net/projects/neurondotnet/
The study was supported by grant 2P05C07430 from the State Committee for Scientific Research of the Polish Ministry of Science and Higher Education. We thank all the medical staff of the Orthopedics Clinic of WSS nr 5 in Sosnowiec who participated in the scoliosis surgery of the patients included in the study and helped us to collect the biological samples.
Tomasz Waller and Damian Zapart received a scholarship under the project "DoktoRIS - Scholarship Program for Innovative Silesia" co-financed by the European Union under the European Social Fund.
The authors declare that they have no financial or non-financial competing interests.
RN participated in the design of the study, performed spinal surgeries, prepared tissue samples, performed radiological measurements and statistical analysis and drafted the manuscript. TW carried out analysis based on artificial neural network and participated in the design of the study. MT has supported us with her experience and knowledge concerning advanced data analysis: knowledge discovery from data, data mining, artificial intelligence and machine learning, and together with DZ and UM has been involved in the design of the study and interpretation of the data and helped to draft the manuscript. All authors read and approved the final manuscript.
Tomasz Waller, Roman Nowak, Magdalena Tkacz, Damian Zapart and Urszula Mazurek contributed equally to this work.
Cite this article
Waller, T., Nowak, R., Tkacz, M. et al. Familial or Sporadic Idiopathic Scoliosis – classification based on artificial neural network and GAPDH and ACTB transcription profile. BioMed Eng OnLine 12, 1 (2013). https://doi.org/10.1186/1475-925X-12-1
- Artificial Neural Network
- Hidden Layer
- Mean Square Error
- Artificial Neural Network Model
- Idiopathic Scoliosis