Familial or Sporadic Idiopathic Scoliosis – classification based on artificial neural network and GAPDH and ACTB transcription profile

Background: The importance of hereditary factors in the etiology of Idiopathic Scoliosis (IS) is widely accepted. In clinical practice, some IS patients present with a positive family history of the deformity and some do not. Traditionally, about 90% of patients have been considered sporadic cases without familial recurrence. However, the exact proportion of Familial and Sporadic Idiopathic Scoliosis is still unknown. Housekeeping genes encode proteins that are usually essential for the maintenance of basic cellular functions. ACTB and GAPDH are two housekeeping genes encoding, respectively, the cytoskeletal protein β-actin and glyceraldehyde-3-phosphate dehydrogenase, an enzyme of glycolysis. Although their expression levels can fluctuate between tissues and individuals, human housekeeping genes seem to exhibit a preserved tissue-wide expression ranking order. It was hypothesized that the expression ranking order of two representative housekeeping genes, ACTB and GAPDH, might be disturbed in the tissues of patients with Familial Idiopathic Scoliosis (with a positive family history of idiopathic scoliosis) as opposed to patients with no family members affected (Sporadic Idiopathic Scoliosis). An artificial neural network (ANN) was developed to differentiate between familial and sporadic cases of idiopathic scoliosis based on the expression levels of ACTB and GAPDH in different tissues of scoliotic patients. The aim of the study was to investigate whether the expression levels of ACTB and GAPDH in different tissues of idiopathic scoliosis patients could be used as a source of data for a specially developed artificial neural network in order to predict a positive family history of the index patient.
Results: The comparison of the developed models showed that the most satisfactory classification accuracy was achieved by the ANN model with 18 nodes in the first hidden layer and 16 nodes in the second hidden layer. Using only the expression measurements of ACTB and GAPDH, the ANN based on the 6-18-16-1 architecture classified positive Idiopathic Scoliosis anamnesis correctly in 8 of 9 cases (88%). In only one case was the prediction ambiguous.
Conclusions: A specially designed artificial neural network model indicated a possible association between the expression levels of ACTB and GAPDH and a positive family history of Idiopathic Scoliosis.


Background
Idiopathic Scoliosis (IS) is the most common structural deformity of the human spine. Although the exact cause or causes of idiopathic scoliosis are still unknown, there is convincing evidence supporting a genetic etiology of this disorder [1][2][3][4][5]. The importance of hereditary factors in the etiology of IS is underlined by the increased risk of scoliosis among first-degree relatives of index patients. Harrington found a scoliosis incidence of 27% among first-degree relatives [6]. Other studies indicate that 11% of first-degree, 2.4% of second-degree and 1.4% of third-degree relatives are affected [7,8]. A genetic basis of IS is also supported by the results of twin studies. Inoue and colleagues found a concordance rate for scoliosis of 92.3% in monozygous twins, decreasing to 62.5% in dizygous twins [9]. A lower concordance rate was found in the study of Kesling et al.: 73% among monozygous and 36% among dizygous twins [10]. A recent study based on the Swedish Twin Registry estimates the heritability of this condition at 38%, indicating the importance of other, still unknown factors in the development of the deformity [11]. The mode of inheritance and the genetic basis of the scoliotic phenotype are still not definitively determined. Autosomal dominant, X-linked and multifactorial inheritance patterns have been suggested [3][4][5][6][7]. According to Miller et al., idiopathic scoliosis is a complex genetic disorder involving one or more genetic loci and complex genetic interactions for disease expression [5].
In clinical practice, some IS patients present with a positive family history of the deformity and some do not. Traditionally, about 90% of patients have been considered sporadic cases without familial recurrence [1]. However, the exact proportion of Familial and Sporadic Idiopathic Scoliosis is still unknown [5]. Ogilvie et al., in a population study based on a large database from Utah, concluded that 97% of their patients with idiopathic scoliosis have familial origins and suggested the presence of one or more major gene defects, or single gene defects influenced by other factors [11]. According to Cheng et al., predisposition for IS does not have a specific assigned risk of heritability; rather, inheritance is based on multiple factors, potentially both genetic and environmental, which have yet to be defined [1].
Housekeeping genes encode proteins that are usually essential for the maintenance of basic cellular functions. They are expressed constitutively in every human cell, but it appears that their transcriptional level may be influenced by numerous factors [12,13]. ACTB and GAPDH are two housekeeping genes encoding, respectively, the cytoskeletal protein β-actin and glyceraldehyde-3-phosphate dehydrogenase, an enzyme of glycolysis [12]. Based on the assumption of their constant expression in various tissues, these genes serve as traditional internal controls in a variety of molecular biology assays [13]. Although their expression levels can fluctuate between tissues and individuals, human housekeeping genes seem to exhibit a preserved tissue-wide expression ranking order [14].
It was hypothesized that the expression ranking order of two representative housekeeping genes, ACTB and GAPDH, might be disturbed in the tissues of patients with Familial Idiopathic Scoliosis (with a positive family history of idiopathic scoliosis) as opposed to patients with no family members affected (Sporadic Idiopathic Scoliosis). In order to recognize potentially sophisticated patterns in the data, and because of the tensor structure of the ACTB and GAPDH expression data, an artificial neural network (ANN) was developed to differentiate between familial and sporadic cases of idiopathic scoliosis based on the expression levels of ACTB and GAPDH in different tissues of scoliotic patients.
The aim of the study was to investigate whether the expression levels of ACTB and GAPDH in different tissues of idiopathic scoliosis patients could be used as a source of data for a specially developed artificial neural network in order to predict a positive family history of the index patients.

Patients
The study design was approved by the Bioethical Committee Board of Silesian Medical University. Informed, written consent was obtained from each patient participating in the study and, if required, from their parents. Twenty-nine patients (23 females and 6 males) with a definite diagnosis of late onset Idiopathic Scoliosis were included in the study. Thirteen of them (44%) had a positive family history of IS. All of the patients had undergone posterior corrective surgery with segmental spinal instrumentation according to the C-D method. The mean age at surgery was 16 years 8 months (range 13.7-24 years). Based on the Lenke classification, 6 curves were of type 1, 6 curves of type 2, 7 curves of type 3, 7 curves of type 4, 4 curves of type 5 and 3 of type 6 [15]. Preoperatively, the average frontal and sagittal Cobb angles measured on standard p-a and lateral radiograms were 68.8° (range 36°-114°) and 35.4° (range 12°-70°) respectively. The axial plane deformity was measured on CT scans performed at the curve apex by the spinal rotation angle relative to the sagittal plane (RAsag) and the rib hump index (RHi), as described by Aaro and Dahlborn [16]. The mean RAsag was 19.3° (range 2.5°-46°) and the mean RHi was 0.4 (range 0.03-0.91). During surgery, bilateral facet removal was performed in the routine manner and bone specimens from the inferior articular spinal processes at the concavity and convexity of the curve apex were harvested. At the same time, bilateral samples of paravertebral muscle tissue at the apical level and 10 ml of the patient's peripheral blood were collected. Every sample of bone and muscular tissue, as well as the blood specimens, was placed in a separate, adequately identified sterile tube, immediately snap frozen in liquid nitrogen and stored at -80°C until molecular analysis.

Laboratory procedures
Tissue samples were homogenized with the use of a Polytron® homogenizer (Kinematica AG). Total RNA was isolated from tissue samples with the use of TRIzol® reagent (Invitrogen Life Technologies, California, USA), according to the manufacturer's instructions. Extracts of total RNA were treated with DNase I (Qiagen GmbH, Hilden, Germany) and purified with the use of RNeasy Mini Spin Columns (Qiagen GmbH, Hilden, Germany), in accordance with the manufacturer's protocol. The quality of RNA was estimated by electrophoresis on a 1% agarose gel stained with ethidium bromide. The RNA concentration was determined by absorbance at 260 nm using a Gene Quant II spectrophotometer (Pharmacia LKB Biochrom Ltd., Cambridge, UK). Total RNA served as the template for QRT-PCR.
ACTB and GAPDH mRNA quantification in osseous, muscular and blood tissue samples by Quantitative Real Time Reverse Transcription Polymerase Chain Reaction.
The quantitative analysis was carried out with the use of the Sequence Detector ABI PRISM™ 7000 (Applied Biosystems, California, USA). The standard curve was generated from ACTB standards (TaqMan® DNA Template Reagents Kit, Applied Biosystems, Foster City, CA, USA). The ACTB and GAPDH mRNA abundance in all studied tissue specimens was expressed as mRNA copy number per 1 μg of total RNA. The QRT-PCR reaction mixture, in a total volume of 25 μl, contained QuantiTect SYBR-Green RT-PCR buffer (Tris-HCl, (NH4)2SO4, 5 mM MgCl2, pH 8.7, dNTP mix, the fluorescent dye SYBR-Green I and the passive reference dye ROX), 0.5 μl of QuantiTect RT mix (Omniscript reverse transcriptase, Sensiscript reverse transcriptase) (QuantiTect SYBR-Green RT-PCR kit; Qiagen), forward and reverse primers each at a final concentration of 0.5 μM, and 0.25 μg of total RNA per reaction. Primer sequences for ACTB mRNA: 5'-TCACCCACACTGTGCCCATCTACGA-3' (forward primer) and 5'-CAGCGGAACCGCTCATTGCCAATGG-3' (reverse primer); for GAPDH mRNA: 5'-GAAGGTGAAGGTCGGAGTC-3' (forward primer) and 5'-GAAGATGGTGATGGGATT-3' (reverse primer). Reverse transcription was carried out at 50°C for 30 min. After activation of the HotStar Taq DNA polymerase and deactivation of the reverse transcriptases at 95°C for 15 min, PCR amplification consisted of 40 cycles of denaturation at 94°C for 15 sec, annealing at 60°C for 30 sec and extension at 72°C for 30 sec. Final extension was carried out at 72°C for 10 min. QRT-PCR specificity was assessed by electrophoresis in a 6% polyacrylamide gel and by melting curves of the amplimers.
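The text states only that a standard curve was built from ACTB standards and that abundance was expressed as copies per 1 μg of total RNA. As a minimal sketch of how such a conversion is commonly done, assuming a log-linear relation between the threshold cycle (Ct) and the template copy number, the Python below fits the curve by least squares and normalizes by the 0.25 μg RNA input. The standard dilutions, the Ct readings and the function names are hypothetical and are not taken from the study.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_per_microgram(ct, slope, intercept, rna_input_ug=0.25):
    """Invert the standard curve, then scale the result to 1 ug of total RNA."""
    copies_in_reaction = 10 ** ((ct - intercept) / slope)
    return copies_in_reaction / rna_input_ug


# Hypothetical dilution series of the standard and its measured Ct values.
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_ct = [33.2, 29.9, 26.6, 23.3, 20.0]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
sample_copies = copies_per_microgram(26.6, slope, intercept)
```

Dividing by the RNA input is what turns a per-reaction estimate into the per-microgram unit used in the text; any other normalization would need a different final step.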

Patient data
The results of the laboratory procedures and the family anamnesis of the 29 patients were used to create a dataset consisting of 29 rows. The expression values were transformed to a logarithmic scale. Each row represented the ACTB and GAPDH transcription profile in three kinds of tissue (bone, muscle and blood) for exactly one patient. Unfortunately, some data were missing from our dataset. To address this problem, we could either remove the incomplete records from the analyzed dataset or use an appropriate methodology and tool to preserve and utilize them in the analysis. In the disciplines of data mining and knowledge discovery from data, the problem of missing data is widely discussed [17][18][19][20][21][22]. By removing all incomplete records we would risk losing important information contained in the whole dataset. We therefore decided to preserve all the records and replace missing values with random data drawn from normal distributions similar to the original distributions of the variables. The random values are marked in bold [Tables 1 and 2]. Our decision was supported by the experience of one of the coauthors of this study, who has conducted extensive research in the field of advanced data processing, including the processing of incomplete data [23][24][25][26][27]. On this basis, an ANN was chosen as an appropriate classification method in this case. The dataset was randomly divided into a training set (20 rows) [Table 1] and a test set (9 rows) [Table 2].
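The data preparation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a base-10 logarithm, assumes that imputation is done on the log scale using each column's sample mean and standard deviation, and uses synthetic copy numbers in place of the patient data. All names are hypothetical.

```python
import math
import random
import statistics


def log_transform(rows):
    """Log10-transform expression values, leaving missing entries as None."""
    return [[None if v is None else math.log10(v) for v in row] for row in rows]


def impute_with_normal_draws(rows, rng):
    """Fill each missing value with a draw from N(mean, sd) of its column."""
    completed = [list(row) for row in rows]
    imputed_cells = []
    for col in range(len(rows[0])):
        observed = [row[col] for row in rows if row[col] is not None]
        mu = statistics.mean(observed)
        sigma = statistics.stdev(observed)
        for i, row in enumerate(completed):
            if row[col] is None:
                row[col] = rng.gauss(mu, sigma)
                imputed_cells.append((i, col))  # remembered so the cells can be flagged
    return completed, imputed_cells


def split_rows(rows, n_train, rng):
    """Randomly split rows into a training part and a test part."""
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [rows[i] for i in order[:n_train]], [rows[i] for i in order[n_train:]]


rng = random.Random(0)
# Synthetic stand-in: 29 patients x 6 measurements (2 genes x 3 tissues).
raw = [[10 ** rng.uniform(3, 7) for _ in range(6)] for _ in range(29)]
raw[4][2] = None   # two artificially missing measurements
raw[17][5] = None

logged = log_transform(raw)
completed, imputed_cells = impute_with_normal_draws(logged, rng)
train_rows, test_rows = split_rows(completed, 20, rng)
```

Recording which cells were imputed mirrors the marking of random values in bold in the tables, so that imputed entries stay distinguishable from measured ones.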

Artificial neural network
An artificial neural network is a mathematical model inspired by the structure and functional aspects of biological neural networks [28,29]. ANNs can be used to detect sophisticated patterns in data. Several studies have applied neural networks in the research and analysis of various diseases (e.g. classification of cardiovascular disease, forecasting of bacteria-antibiotic interactions, prediction of colorectal cancer patient survival) [30].
The ANN used in this study is a multilayered feed-forward network with four layers (an input layer, two hidden layers and an output layer). Multilayer feed-forward neural networks can approximate the nonlinear functions used to describe complicated relationships in biological data [31]. A schematic representation of the best architecture of the artificial neural network for our problem is shown in Figure 1. The number of neurons in the input layer was 6, equal to the number of ACTB and GAPDH expression measurements. The ideal outputs were set at 1 for a positive history of IS in the family and at 0 for the absence of IS in the anamnesis. The number of hidden nodes was determined by trial and error. We trained 421 neural network models with different numbers of hidden nodes on the training set using the backpropagation algorithm (activation function: binary sigmoidal function; learning rate: 0.1; momentum rate: 0.01; epochs: 50, 500 and 5000). The backpropagation training method was chosen because it is the most common method of training multilayered feed-forward neural networks [30]. Initially, 50 training epochs were considered, but this did not yield a satisfactory result [Table 3]: the mean square error (MSE) was high. The MSE was reduced by increasing the number of epochs from 50 to 500 and finally from 500 to 5000 [Table 3]. Thereafter, we selected the 3 neural networks with the lowest MSE on the training set. To test the classification ability of the ANN approach, we applied the selected neural models to the test set. The ANN model with the best classification accuracy for Idiopathic Scoliosis in the anamnesis, based on the expression measurements of ACTB and GAPDH, was chosen as the best.
Figure 1. A schematic representation of one of the tested artificial neural networks. The ANN has an input layer, two hidden layers and an output layer. The input layer has 6 neurons, the first hidden layer has 18 neurons, the second hidden layer has 16 neurons and the output layer has 1 neuron.
Table 3. Evaluation and selection of multiple network architectures.
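To make the 6-18-16-1 architecture and the training settings concrete, the following is a minimal sketch of a sigmoid feed-forward network trained by per-example backpropagation with momentum (learning rate 0.1, momentum 0.01). It is written in plain Python for illustration and is not the NeuronDotNet implementation used in the study; the weight initialization range, the per-example update scheme, the synthetic inputs and all names are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    def __init__(self, sizes, rng):
        # weights[l][j] holds the incoming weights of neuron j; the last entry is its bias.
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]
        self.velocity = [[[0.0] * len(neuron) for neuron in layer] for layer in self.weights]

    def forward(self, x):
        """Return the activations of every layer, input first, output last."""
        activations = [list(x)]
        for layer in self.weights:
            prev = activations[-1] + [1.0]  # constant 1.0 feeds the bias weight
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(neuron, prev))) for neuron in layer]
            )
        return activations

    def train_example(self, x, target, lr, momentum):
        """One backpropagation step; returns the squared error before the update."""
        acts = self.forward(x)
        out = acts[-1][0]
        deltas = [(out - target) * out * (1.0 - out)]
        for l in range(len(self.weights) - 1, -1, -1):
            prev = acts[l] + [1.0]
            below = []
            if l > 0:
                # Error signal for the layer below, computed before the weights change.
                below = [
                    sum(self.weights[l][j][i] * d for j, d in enumerate(deltas)) * a * (1.0 - a)
                    for i, a in enumerate(acts[l])
                ]
            for j, d in enumerate(deltas):
                for i, a in enumerate(prev):
                    step = momentum * self.velocity[l][j][i] - lr * d * a
                    self.velocity[l][j][i] = step
                    self.weights[l][j][i] += step
            deltas = below
        return (out - target) ** 2


def train(net, rows, targets, epochs, lr=0.1, momentum=0.01):
    """Train for a number of epochs and return the mean squared error per epoch."""
    history = []
    for _ in range(epochs):
        errors = [net.train_example(x, t, lr, momentum) for x, t in zip(rows, targets)]
        history.append(sum(errors) / len(errors))
    return history


rng = random.Random(1)
# Synthetic stand-in for the 20-row training set: 6 scaled inputs per row.
rows = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(20)]
targets = [1.0 if row[0] > 0 else 0.0 for row in rows]

net = FeedForwardNet([6, 18, 16, 1], rng)
mse_history = train(net, rows, targets, epochs=50)
prediction = net.forward(rows[0])[-1][0]
```

Repeating the call to `train` over a grid of hidden-layer sizes and epoch counts, and keeping the models with the lowest training MSE, would correspond to the trial-and-error selection described above.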

Results
The data were analyzed using the NeuronDotNet computer library [32]. Training an ANN is the process of setting the best weights on the inputs of each of the nodes. The goal is to use the training set to produce weights for which the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible [30]. Table 3 shows the MSE for all 421 trained artificial neural models. A satisfactory MSE was obtained for ANNs with:
- 18 nodes in the first hidden layer and 16 nodes in the second hidden layer
- 19 nodes in the first hidden layer and 19 nodes in the second hidden layer
- 18 nodes in the first hidden layer and 10 nodes in the second hidden layer
Table 2 lists the classification results of ANN modelling on the test set for the presence and absence of Idiopathic Scoliosis in the anamnesis. It indicates how well the artificial neural network may perform on new data. The comparison of the developed models showed that the most satisfactory classification accuracy was achieved by the ANN model with 18 nodes in the first hidden layer and 16 nodes in the second hidden layer. The classification accuracy for Idiopathic Scoliosis in the anamnesis, based on the expression measurements of ACTB and GAPDH and the ANN with the 6-18-16-1 architecture, was 8 of 9 (88%). In only one case (ID 27 in the test set) was the prediction ambiguous.
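The way a continuous network output can yield a correct, incorrect or ambiguous classification may be sketched as below. The cut-offs of 0.3 and 0.7, the output values and the function names are hypothetical, since the text does not state how ambiguity was defined; the targets follow the coding given in the methods (1 for a positive family history, 0 for its absence).

```python
def label_output(output, low=0.3, high=0.7):
    """Map a sigmoid output to a class label, leaving a middle band as ambiguous."""
    if output >= high:
        return "familial"
    if output <= low:
        return "sporadic"
    return "ambiguous"


def classification_accuracy(outputs, targets, low=0.3, high=0.7):
    """Count predictions that match the target; ambiguous outputs count as misses."""
    expected = {1: "familial", 0: "sporadic"}
    labels = [label_output(o, low, high) for o in outputs]
    correct = sum(1 for lab, t in zip(labels, targets) if lab == expected[t])
    return correct, len(targets), labels


# Hypothetical network outputs for a 9-row test set.
outputs = [0.96, 0.04, 0.88, 0.11, 0.52, 0.91, 0.07, 0.93, 0.15]
targets = [1, 0, 1, 0, 1, 1, 0, 1, 0]

correct, total, labels = classification_accuracy(outputs, targets)
accuracy = correct / total
```

With these illustrative values, one output falls in the middle band and the remaining eight match their targets, which reproduces the 8 of 9 pattern reported above.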

Conclusions
The results of this study support the potential benefits of applying artificial neural networks in clinical research and point to human housekeeping genes as a potential target for future molecular investigations of idiopathic scoliosis etiopathogenesis. The analysis indicates a relationship between the expression levels of ACTB and GAPDH and familial Idiopathic Scoliosis.