Intelligent Bayes Classifier (IBC) for ENT infection classification in hospital environment
 Ritaban Dutta^{1} and
 Ritabrata Dutta^{2}
DOI: 10.1186/1475-925X-5-65
© Dutta and Dutta; licensee BioMed Central Ltd. 2006
Received: 20 October 2006
Accepted: 18 December 2006
Published: 18 December 2006
Abstract
Electronic nose based identification of ENT bacteria in a hospital environment is a classical and challenging classification problem. In this paper an electronic nose (e-nose), comprising a hybrid array of 12 tin oxide (SnO_{2}) sensors and 6 conducting polymer sensors, has been used to identify three species of bacteria, Escherichia coli (E. coli), Staphylococcus aureus (S. aureus), and Pseudomonas aeruginosa (P. aeruginosa), responsible for ear, nose and throat (ENT) infections, when collected as swab samples from infected patients and kept in ISO agar solution in the hospital environment. In the next stage a sub-classification technique has been developed for the classification of two different strains of S. aureus, namely Methicillin-Resistant S. aureus (MRSA) and Methicillin-Susceptible S. aureus (MSSA). An innovative Intelligent Bayes Classifier (IBC) based on Bayes' theorem and the "maximum probability rule" was developed and investigated for these three main groups of ENT bacteria. Along with the IBC, three other supervised classifiers (namely, Multilayer Perceptron (MLP), Probabilistic Neural Network (PNN), and Radial Basis Function Network (RBFN)) were used to classify the three main bacteria classes. A comparative evaluation of the classifiers was conducted for this application. IBC outperformed MLP, PNN and RBFN. The best results suggest that the three main bacteria classes can be identified and classified with up to 100% accuracy using IBC. We also achieved 100% classification accuracy for MRSA and MSSA samples with IBC. We conclude that an IBC based e-nose can provide a very effective and rapid solution for the identification of ENT infections in a hospital environment.
1. Background
An electronic nose (e-nose) is an instrument that has been developed as a simplified "electronic" model of the human olfactory system. To humans, the sensation of flavour is due to three main chemoreceptor systems. These are gustation (sense of taste by the tongue), olfaction (sense of smell by the nose) and trigeminal (sense of irritation of trigeminal receptors). The sense of taste is used to detect certain non-volatile chemicals which enter the mouth, while the sense of smell is used to detect certain volatile compounds. Receptors for the trigeminal sense are located in the mucous membranes and in the skin. They respond to certain volatile chemicals, and the trigeminal sense is thought to be especially important in the detection of irritants and chemically reactive species. In the perception of flavour all three chemoreceptor systems are involved, but olfaction plays by far the greatest role. The e-nose is designed to detect and discriminate different complex odours using a sensor array. The sensor array consists of broadly tuned (non-specific) sensors that are treated with a variety of odour-sensitive chemical materials [1, 2].
An odour stimulus generates a characteristic fingerprint (or smell-print) from the sensor array. Patterns, or fingerprints, from known odours are then used to construct a database and train a pattern recognition system so that unknown odours can subsequently be classified, i.e. identified. Thus, e-noses comprise mechanical components to collect and transport odours to the sensor array as well as electronic circuitry to digitize and store the sensor responses for signal processing.
Gardner and Bartlett defined an electronic nose as "An instrument, which comprises an array of electronic chemical sensors with partial specificity and an appropriate pattern recognition system, capable of recognizing simple or complex odours". The e-nose system is designed for automated detection and classification of odours, vapours, and gases. It can also perform simple odour discrimination and provide measurement of odour intensity. The two main components of an e-nose are the sensing system and the automated pattern recognition system [3, 4].
SAW (surface acoustic wave) and QMB (quartz microbalance piezo crystal) sensors both utilize common GC stationary phases to absorb odorant molecules. Both suffer from a lack of sensitivity. CP (conducting polymer) sensors can be very specific but are very sensitive to moisture, which makes them difficult to use in food and beverage analysis. MOS (metal oxide sensors) are the most sensitive and stable of the sensor technologies applied. They have emerged as the benchmark of stable, reproducible instruments.
In this paper we describe the use of a hybrid sensor based e-nose [5] for illness diagnosis. The system, consisting of 12 tin oxide (SnO2) sensors and 6 conducting polymer sensors configured into a hybrid array, has been used to identify three species of bacteria, Escherichia coli (E. coli), Staphylococcus aureus (S. aureus), and Pseudomonas aeruginosa (P. aeruginosa), responsible for ear, nose and throat (ENT) infections, when collected as swab samples from infected patients and kept in ISO agar solution in the hospital environment. In the next stage a sub-classification technique has been developed for the classification of two different strains of S. aureus, namely Methicillin-Resistant S. aureus (MRSA) and Methicillin-Susceptible S. aureus (MSSA) [1].
2. ENT bacteria for this study
2.1. Escherichia coli
Escherichia coli, usually abbreviated to E. coli and named after its discoverer Theodor Escherich, a pediatrician and bacteriologist, is one of the main species of bacteria that live in the lower intestines of warm-blooded animals, including birds and mammals. These bacteria are necessary for the proper digestion of food and are part of the intestinal flora. The presence of E. coli in groundwater is a common indicator of fecal contamination. It belongs among the Enterobacteriaceae, and is commonly used as a model organism for bacteria in general. One of the root words of the family's scientific name, "enteric", refers to the intestine, hence "gastroenteritis" (from 'gastro', stomach, 'entero', intestine, 'itis', disease). "Fecal" is the adjective for organisms that live in feces, so it is often used synonymously with "enteric" [6].
The number of individual E. coli bacteria in the feces that one human passes in one day averages between 100 billion and 10 trillion. All the different kinds of fecal coli bacteria, and all the very similar bacteria that live in the ground (in soil or decaying plants, of which the most common is Enterobacter aerogenes), are grouped together under the name coliform bacteria. Technically, the "coliform group" is defined to be all the aerobic and facultative anaerobic, non-spore-forming, Gram-negative, rod-shaped bacteria that ferment lactose with the production of gas within 48 hours at 35°C (95°F). In the body, this gas is released as flatulence.
2.2. Pseudomonas aeruginosa
Pseudomonas aeruginosa is an opportunistic pathogen that usually causes problems in humans who have weakened immune systems. This bacterium usually infects the urinary tract, burns and wounds, and can also cause blood infections. One in ten hospital-acquired infections is from Pseudomonas. Cystic fibrosis patients are also predisposed to P. aeruginosa infection of the lungs. P. aeruginosa is also the typical cause of "hot-tub rash" (dermatitis), which results from a lack of proper, periodic attention to water cleanliness. This species is also known to be an opportunistic pathogen of plants [6].
2.3. Staphylococcus aureus
Staphylococcus aureus (which is occasionally given the nickname golden staph) is a spherical bacterium, frequently living on the skin or in the nose of a healthy person, that can cause illnesses ranging from minor skin infections (such as pimples, boils, and cellulitis) and abscesses, to life-threatening diseases such as pneumonia, meningitis, endocarditis and septicemia. For example, each year some 500,000 patients in American hospitals contract a staphylococcal infection [6, 7].
Staphylococcus lives as a commensal on the skin and in the nose of humans and other animals, as well as in the environment. It can infect other tissues when normal barriers (e.g. skin or mucosal lining) have broken down. This leads to furuncles (boils) and carbuncles (a collection of furuncles). In infants, Staphylococcus aureus infection can cause a severe disease, staphylococcal scalded skin syndrome (SSSS).
Staphylococcal infections can be spread through contact with pus from an infected wound, skin to skin contact with an infected person, and contact with objects such as towels, sheets, clothing, or athletic equipment used by an infected person.
Deep Staphylococcus infections can be very severe. Prosthetic joints are particularly at risk, and staphylococcal endocarditis (infection of the heart valves) and pneumonia may be rapidly fatal [8].
2.4. MRSA and MSSA
Methicillin-resistant Staphylococcus aureus (MRSA) is a subgroup within a group of organisms known as S. aureus. MRSA are characterized by their resistance to treatment with commonly used antibiotics, in contrast to the remainder of the Staphylococcus aureus group, which are referred to as methicillin-susceptible S. aureus (MSSA). Both MRSA and MSSA can cause infection, but individuals may also carry the organism without being infected by it. An individual who carries the organism but is not infected is said to be a 'carrier' or 'colonized'. At any one time up to 33% of healthy individuals carry Staphylococcus aureus, including MRSA, predominantly in their noses and also at other sites. MRSA is regarded as the hospital 'superbug' by virtue of its ability to spread and cause outbreaks with a high mortality rate of up to 60%.
Staphylococcus aureus can give rise to infections varying from mild, e.g. boils and infected cuts, to severe, e.g. infections of bones, lungs, heart and blood stream. The difficulty in treating MRSA with antibiotics has led to concern about this particular group of staphylococcal infections. All strains of Staphylococcus aureus, including MSSA and MRSA, are capable of causing hospital-acquired infection [6, 9, 10].
Organisms can be passed to patients from contact with hands or directly from the environment. The latter includes air, dust, clothing, soft furnishings, surfaces and equipment.
Some patients are more susceptible to colonization and infection than others. These include the elderly, patients with wounds, ulcers or bedsores, catheterized patients, those who have received antibiotics and those who have been, or who are, hospitalized or institutionalized.
3. Experiment & data gathering
3.1 Swab samples
Swab samples were collected from ENT patients who were suffering from these bacterial infections. The sniffing experiments on the infected swab samples were conducted using a tin oxide sensor based e-nose at Heartlands Hospital, Birmingham, UK, whose ethics committee approved this study [1, 10, 11]. With the assistance of two ENT specialists at Heartlands Hospital, four types of patient swab samples were collected. These four types of samples were as follows:

Sample Type 1: Swab samples representing Escherichia coli. Total no. of real patient swab collected was 120.

Sample Type 2: Swab samples representing Pseudomonas aeruginosa. Total no. of real patient swab collected was 150.

Sample Type 3: Swab samples representing Methicillin Resistant S. aureus (MRSA). Total no. of real patient swab collected was 100.

Sample Type 4: Swab samples representing Methicillin Susceptible S. aureus (MSSA). Total no. of real patient swab collected was 60.
Sample types 3 and 4 were collected as part of the S. aureus database. In total we obtained 430 swab samples from 430 different patients, covering all four bacterial classes. All samples were sniffed following the same procedure.
3.2. Test procedure
A strategy for collecting the ENT odour samples and a sampling procedure were agreed with the ENT specialists and used throughout the data gathering process. Both the protocol and the sampling technology were kept simple so as to be cost effective and non-application-specific. Swab samples were collected from the infected ear, nose and throat regions of the clinical patients. After collection, all swab samples were kept in ISO S type agar solution in typical 15 cm^{3} vials. This vial size was chosen so as to generate sufficient headspace from the bacterial solution for sniffing purposes. All swab samples were kept in the agar solution in a vial to help the growth of the ENT bacteria.
Later we used the electronic nose system to sniff all the samples. We put approximately 3 mg of the agar solution of bacteria in a 10 ml vial and left the vial for 5 min to generate headspace from the bacterial solution. The e-nose inlet was then inserted into the vial and the headspace was sniffed [1, 5, 10].
4. Data representation & feature extraction
4.1 Data representation
Consider an array of n discrete sensors, where each sensor i produces a time-dependent output signal x _{ ij }(t) in response to an odour j. The electrical sensor signal often depends on several physical parameters (e.g. flow rate, ambient pressure, temperature and humidity), but the sensor outputs will reach constant asymptotic values when presented with a constant input stimulus. It is common practice to use only the static or steady-state values of the sensor signals; the response is then defined simply as, for example, an absolute change in sensor signal. However, the choice of the response parameter is fundamental to the subsequent performance of the pattern recognition technique; the pre-processing technique of the response vectors should be chosen to help analyse data from a specific problem. In order to extract relevant key features from the data in terms of the static change in sensor resistance or conductivity, a fractional difference model is generally a good choice: $x_{ij}=(X_{ij}^{odour}-X_{i}^{o})/X_{i}^{o}$ where $X_{ij}^{odour}$ is the response of the sensor i to the odour sample j, and $X_{i}^{o}$ is the corresponding baseline or reference signal, e.g. in ambient room air prior to the odour measurement. The response generated by the n-sensor array to an odour j can then be represented by a time-dependent vector: X(t) = (x _{1j }(t), x _{2j }(t), ..., x _{ ij }(t), ..., x _{ nj }(t)). When the same array is presented to a set of m odours, the response can be regarded as a set of m vectors, which are best represented by a response matrix $\tilde{R}$ (t):
$\tilde{R}(t)=\left(\begin{array}{cccc}x_{11}(t) & x_{12}(t) & \cdots & x_{1m}(t)\\ x_{21}(t) & x_{22}(t) & \cdots & x_{2m}(t)\\ \vdots & \vdots & \ddots & \vdots\\ x_{n1}(t) & x_{n2}(t) & \cdots & x_{nm}(t)\end{array}\right) \qquad (1)$
Each column represents a response vector associated with a particular odour, whereas the rows are the responses of an individual sensor to the different odours. As odour sensors do not in general behave independently, an individual sensor will typically respond to several odours, but with varying cross-sensitivity and intensity. As a result, the off-diagonal terms of R are usually non-zero, and thus, under these conditions, non-trivial data processing techniques (i.e. pattern analysis) are required to process the data and extract knowledge [1–4].
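As an illustration of the fractional difference model, the following minimal sketch builds a time-independent response matrix from steady-state readings. The function name `fractional_difference` and the numerical readings are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import numpy as np

def fractional_difference(raw, baseline):
    """Return x_ij = (X_ij - X_i^o) / X_i^o for every sensor i and odour j.

    raw      : array of shape (n_sensors, m_odours) with steady-state readings
    baseline : array of shape (n_sensors,) with each sensor's reference reading
    """
    raw = np.asarray(raw, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Broadcast the per-sensor baseline across the odour columns.
    return (raw - baseline[:, None]) / baseline[:, None]

# Hypothetical readings: 3 sensors (rows) exposed to 2 odours (columns).
raw = np.array([[12.0, 15.0],
                [8.0, 6.0],
                [20.0, 30.0]])
baseline = np.array([10.0, 8.0, 20.0])

# Each column of R is the response vector of one odour.
R = fractional_difference(raw, baseline)
```

Dividing by the baseline makes the responses dimensionless, so sensors with very different absolute signal levels can be compared within one response matrix.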
4.2 Feature extraction
There is certainly some distinguishing feature among the classes, which is unknown to us. So we want to take some of the known features and explore their capability of classification.
For each sample we know the responses of 18 sensors over 120 seconds. The data from a sensor for each sample is a time series of 120 consecutive values, so we know x _{ ij }(t) for t = 1,..,120. We want a single scalar value in place of x _{ ij }(t) as a representative value. The feature extraction task therefore extracts an 18 × 1 array from 18 × 120 values, converting the time-dependent matrix $\tilde{R}$ (t) to $\tilde{R}$, which is independent of time. This will be our knowledge base. Plotting the 120 values of x _{ ij }(t) against time gives a graph that approximates the function determining the response of the sensor over time.
There are different feature extraction techniques available for this type of response curve. The generally used "scalar feature extraction (x _{ ij }(t) transforms into a scalar quantity)" techniques are the following. Let x _{ ij }(t) be the measured signal of the ith sensor for the jth odour. If it is a steady-state signal, the custom is to choose the feature as the difference between the steady-state response and the baseline response. For transient signals, the choice of features is richer. The most popular feature is, again, the difference between the signal's peak and its baseline. Other options are the area beneath the curve, the area beneath the curve left of the peak, and the time it takes for the signal to reach its peak.
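The scalar features listed above can be sketched as follows for a single sensor signal. This is a minimal illustration, not the authors' implementation: the function name `scalar_features`, the one-second sampling interval `dt`, the trapezoidal integration and the toy signal are all assumptions.

```python
import numpy as np

def scalar_features(signal, baseline, dt=1.0):
    """Compute four scalar features of one transient sensor signal.

    signal   : sequence of samples x_ij(t)
    baseline : scalar reference level of the sensor
    dt       : sampling interval in seconds (assumed to be 1 s here)
    """
    signal = np.asarray(signal, dtype=float)
    peak_idx = int(np.argmax(signal))
    shifted = signal - baseline  # signal measured relative to the baseline

    def area(y):
        # Trapezoidal rule; a single sample encloses zero area.
        return float(np.sum((y[1:] + y[:-1]) * 0.5 * dt))

    return {
        "peak_minus_baseline": float(signal[peak_idx] - baseline),
        "area": area(shifted),
        "area_left_of_peak": area(shifted[:peak_idx + 1]),
        "time_to_peak": peak_idx * dt,
    }

# Hypothetical transient response with a baseline level of 1.0.
signal = [1.0, 2.0, 4.0, 3.0, 1.0]
features = scalar_features(signal, baseline=1.0)
```

Applying the same function to each of the 18 sensor signals of a sample would yield one 18 × 1 array per feature.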
One of the commonest "vector feature extraction (x _{ ij }(t) transforms into a time independent vector quantity)" techniques is wavelet analysis of the available data. A lot of work has been done by Discante et al. [12, 13]. They tried different dictionaries of functions to optimize the number of scalar entities in the extracted feature vector, and reduced the number to 5 at the cost of a 10% classification error. In other approaches the response functions have been modeled with known models and the estimated parameters were used as features. In that case, too, "vector feature extraction" worked better than "scalar feature extraction" [14].
The two new features proposed here, skewness and kurtosis, are well known as measures for checking the symmetry of curves. Here they provide a measure of the asymmetry of the asymmetric response curves. The patterns of asymmetry they capture reflect various properties, such as the peak of the stimulus, the rate at which the stimulus rises and falls, or in some cases the precise time of action.
In many approaches, more than one feature per signal is used, working with a subset of the aforementioned features. So these two new scalar features can be an addition to that super set. These features can lead us to a new exhaustive set of features.
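A minimal sketch of the two additional scalar features, skewness and kurtosis (both named in the feature list of this paper), is given below. It assumes the standard moment-based definitions applied to the sampled values of each sensor signal; the paper's exact formulas are not shown in this section, so this choice is an assumption, as is the function name `skewness_kurtosis`.

```python
import numpy as np

def skewness_kurtosis(signals):
    """Moment-based skewness and (non-excess) kurtosis along the time axis.

    signals : array of shape (n_samples_in_time,) or (n_sensors, n_samples_in_time)
    Returns (skewness, kurtosis), one value per sensor.
    """
    x = np.asarray(signals, dtype=float)
    d = x - x.mean(axis=-1, keepdims=True)   # deviations from the mean
    m2 = np.mean(d ** 2, axis=-1)            # second central moment
    m3 = np.mean(d ** 3, axis=-1)            # third central moment
    m4 = np.mean(d ** 4, axis=-1)            # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical signals: a symmetric ramp and a signal with one late spike.
symmetric = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
spiked = np.array([0.0, 0.0, 0.0, 0.0, 10.0])

skew_sym, kurt_sym = skewness_kurtosis(symmetric)
skew_spk, kurt_spk = skewness_kurtosis(spiked)
```

For an 18 × 120 block of sensor data the function returns two arrays of 18 values, which fit the 18 × 1 representation used for the other scalar features.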
5. Multifeatured knowledge representation
To represent knowledge from the experimental data, generation of patterns and study of pattern recognition algorithms is necessary. Pattern recognition is defined as the process of identifying structure in data by comparison to known structure. Patterns are typically described in terms of multidimensional data vectors, where each component is called a feature. The aim of a pattern recognition system is to associate each pattern with one of the possible pattern classes (or simply classes). Obviously, different patterns should be associated with the same class or with different classes depending on whether they are characterized by similar or dissimilar features, respectively. In the case of the e-nose, the patterns and the classes are, respectively, the responses of the sensor array to odorants, and the odorants being considered [15–19].
After the data representation stage the next step is knowledge representation from the preprocessed data.
After individual feature extraction, each sample is represented by an 18 × 1 array, i.e. the data gathered from that sample. If the problem has k classes, the knowledge contains N _{ g } samples from the gth class (g = 1, ..., k), with the sample data gathered by the sensors. After feature extraction the gth class therefore contains N _{ g } arrays of dimension 18 × 1. This collection of data is known as the knowledge. The features considered are:
 1. Difference between the signal's peak and its baseline
 2. Area beneath the signal curve
 3. Area beneath the signal curve (left of the peak)
 4. Required time for the signal to reach the peak
 5. Skewness
 6. Kurtosis
This multi-featured knowledge base can be used to train an artificial neural network (ANN) for supervised bacteria classification, or to develop a conventional fuzzy rule based system for unsupervised classification of unknown samples.
When a new sample is examined, we extract its features in the same manner as the knowledge was generated. The knowledge is then used to classify the new unknown sample using the following classification techniques.
6. The problem being solved
The existing laboratory-based screening method has the following drawbacks:
 1. Expensive
 2. Lengthy, time-consuming process
 3. More use of infrastructure, which increases operating cost
 4. Less care, as results are always delayed
The typical time required to get a single bacteria screening result from a pathology lab is 24 to 72 hours. The problem is so serious that every year the NHS spends £1 billion just to tackle the MRSA superbug. Still, the overall success rate of correct and timely diagnosis of ENT infections in the hospital environment is only 65%. It is therefore a serious ongoing problem that is virtually impossible to solve with the existing method [1, 7, 8, 25, 26]. Over the last 5 years researchers have started looking at alternative methods for novel, rapid screening and diagnosis of ENT infections. In our previous research papers we have investigated "Intelligent Adaptive Sensory Signal Processing (IASSP)" techniques for solving this persistent problem in the hospital environment. Several research papers have shown that an e-nose based on IASSP methods could be a very effective, novel and rapid solution for screening and diagnosis of ENT infections.
IASSP is the heart of the gas sensor based 'Electronic Nose' concept. All the intelligence of the e-nose system resides in its IASSP blocks.
The expected benefits of an IASSP based e-nose are:
 1. Very rapid diagnosis
 2. Less expensive
 3. Less use of infrastructure and lower operating cost
 4. Better care
6.1 PCA for data estimation
PCA is a linear unsupervised method that has been widely used by various researchers to discriminate the response of an e-nose to simple and complex odours [20–24]. It is a multivariate statistical method, based on the Karhunen-Loève expansion, used in classification models to produce classification results for e-nose pattern recognition techniques. The method consists of expressing the response vectors r _{ ij }in terms of linear combinations of orthogonal vectors, and is sometimes referred to as vector decomposition. Each orthogonal vector, or principal component (PC), accounts for a certain amount of variance in the data with a decreasing degree of importance. The scalar product of the orthogonal vectors with the response vector gives the value of the p th principal component:
X _{ p }= α _{1p } r _{1j }+ α _{2p } r _{2j }+ ... + α _{ ip } r _{ ij }+ ... + α _{ np } r _{ nj } (2)
The variance of each PC score, X _{ p }, is maximized under the constraint that the sum of the squared coefficients of each eigenvector α _{ p }= (α _{ 1p }... α _{ ip }... α _{ np }) is set to unity, and that the eigenvectors are mutually orthogonal. Since there is often a high degree of sensor co-linearity in e-nose data, the majority of the information held in response space can often be displayed using a small number of PCs. PCA is in essence a data reduction technique for correlated data, such that an n-dimensional problem can be described by a two or three dimensional plot. It can be applied to high dimensional datasets to identify their variation in structure for discrimination in gas sensor applications.
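The projection in equation (2) can be sketched with a singular value decomposition of the mean-centred data. This is a generic PCA sketch rather than the software used in the study; the function name `pca_scores` and the random stand-in data are assumptions.

```python
import numpy as np

def pca_scores(responses, n_components=3):
    """Project response vectors onto the leading principal components.

    responses : array of shape (n_samples, n_sensors), one response vector per row
    Returns (scores, components, explained_ratio).
    """
    responses = np.asarray(responses, dtype=float)
    centred = responses - responses.mean(axis=0)
    # Rows of vt are unit-length, mutually orthogonal eigenvectors (alpha_p).
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    scores = centred @ components.T          # X_p = sum_i alpha_ip * r_ij
    explained = s ** 2 / np.sum(s ** 2)      # fraction of variance per PC
    return scores, components, explained[:n_components]

# Random stand-in data: 50 samples from an 18-sensor array (not study data).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 18))
scores, components, explained = pca_scores(data, n_components=3)
```

Plotting the three columns of `scores` against each other gives the kind of three dimensional PC plot described above.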
6.2 PCA for subclassification of S. aureus strains
Our early work has shown that it is possible to classify MRSA and MSSA strains on the basis of headspace analysis using an intelligent conducting polymer sensor based e-nose system. The challenge now is to show that S. aureus can be discriminated from E. coli and P. aeruginosa, and that in the next stage the original S. aureus data can be sub-classified into two sub-clusters for MRSA and MSSA.
We carried out PC analysis using kurtosis and skewness as features and plotted the first three significant principal components to estimate the inner characteristics of the S. aureus data cluster. Microbiological strain similarity between E. coli and S. aureus has a significant impact on the sub-classification of S. aureus strains into MRSA and MSSA, as some E. coli PC score points are very similar to some score points of MRSA and MSSA. Apart from those few points, however, there was no significant overlap between them.
7. Classification methodology
7.1 Artificial Neural Networks approach
The six bacteria datasets were analyzed using three supervised ANN classifiers, namely the Multilayer Perceptron (MLP), Probabilistic Neural Network (PNN) and Radial Basis Function (RBF) network paradigms [2–4, 27]. Training of the neural networks was performed with 80% of the whole data set. The remaining 20% of the data were used for testing the neural networks. These percentages were selected arbitrarily and were applied to all data sets. The aim of this comparative study was to identify the most appropriate ANN paradigm, i.e. the one that can be trained with the best accuracy, to predict the "type of ENT infection" or, in other words, the "type of ENT bacteria".
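The 80%/20% partition can be sketched as below. The text does not say how samples were assigned to the two subsets, so the per-class (stratified) random split, the function name `stratified_split` and the fixed seed are assumptions made for illustration; only the class sizes are taken from Section 3.1.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut at the training fraction.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_fraction * idx.size))
        train.append(idx[:cut])
        test.append(idx[cut:])
    return np.concatenate(train), np.concatenate(test)

# Class sizes as reported in Section 3.1.
counts = {"E. coli": 120, "P. aeruginosa": 150, "MRSA": 100, "MSSA": 60}
labels = np.repeat(list(counts), list(counts.values()))

train_idx, test_idx = stratified_split(labels)
```

Splitting per class keeps the class proportions identical in the training and test subsets, which matters here because the four classes have different sizes.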
7.1.1 MLP classifier
7.1.2 RBF classifier
These networks use a static Gaussian function as the nonlinearity for the hidden layer processing elements. The Gaussian function responds only to a small region of the input space around its centre. The key to a successful implementation of these networks is to find suitable centres for the Gaussian functions. This can be done with supervised learning, but an unsupervised approach usually produces better results. For this reason, NeuroSolutions implements RBF networks as a hybrid supervised-unsupervised topology.
The simulation starts with the training of an unsupervised layer. Its function is to derive the Gaussian centres and the widths from the input data. These centres are encoded within the weights of the unsupervised layer using competitive learning. During the unsupervised learning, the widths of the Gaussians are computed based on the centres of their neighbors. The output of this layer is derived from the input data weighted by a Gaussian mixture.
Once the unsupervised layer has completed its training, the supervised segment then sets the centres of Gaussian functions (based on the weights of the unsupervised layer) and determines the width (standard deviation) of each Gaussian. Any supervised topology (such as a MLP) may be used for the classification of the weighted input.
7.1.3 PNN classifier
The PNN networks are variants of the radial basis function (RBF) network. Unlike the standard RBF, the weights of these networks can be calculated analytically. In this case, the number of cluster centres is by definition equal to the number of exemplars, and they are all set to the same variance.
7.2 "Maximum Probability Rule" based classification
Each patient is selected randomly from a large collection of patients having a specific disease, so that the collected samples can be treated as randomly drawn samples. Here we denote each sample as a unit "u", reflecting one specific case of disease or, in other words, one specific class of bacteria. The feature extracted from the sensor response is treated as the observation vector X_{ u }of unit u. For each class, our knowledge contains observation vectors (of dimension 18 × 1) of randomly selected sample units from that class. We denote the number of units of class g in the knowledge by N_{ g } [1, 28, 29].
7.2.1 Basic idea: modeling of each class by a probability model
At the preliminary stage, we assign probability models to each of the classes. We assume that if we draw random samples from one class it will be selected from a fixed probability distribution, which is specific for that class. In general this means that the bacteria class has been modeled with that probability distribution model. So we assign a distribution to each of the classes. Assuming continuous probability models we specify a probability density function f(X/g) for class g.
7.2.3 Decision rule: maximum (Bayesian) probability rule
Bayes' rule: The posterior and prior probabilities of class membership are related through the typicality probability of the class by Bayes' rule in the following manner, where π _{ g } denotes the prior probability of class g and k the number of classes,
$P(g/X_{u})=\frac{\pi_{g}\cdot P(X_{u}/g)}{\sum_{g'=1}^{k}\pi_{g'}\cdot P(X_{u}/g')} \qquad (3)$
So for classification purpose of a new unit of observation u to any one of the classes, our decision rule becomes:
Assign unit u to class g if P(g/X _{ u } ) > P(h/X _{ u } ) for g ≠ h.
This is called the "Maximum Probability Rule".
Again, k values of P(X_{ u }/g) need to be determined for each unit. Because the denominator in (3) is the same for all classes, the rule can be based more simply on the k values of π _{ g }·f(X_{ u }/g), and (3) can be stated equivalently as,
$P(g/X_{u})=\frac{\pi_{g}\cdot f(X_{u}/g)}{\sum_{g'=1}^{k}\pi_{g'}\cdot f(X_{u}/g')} \qquad (4)$
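The "Maximum Probability Rule" can be sketched numerically as follows. The prior and density values are hypothetical numbers chosen only to illustrate the arithmetic, and the function names `posterior` and `assign_class` are assumptions.

```python
import numpy as np

def posterior(priors, densities):
    """Posterior P(g | X_u) for each class from priors pi_g and densities f(X_u | g)."""
    joint = np.asarray(priors, dtype=float) * np.asarray(densities, dtype=float)
    return joint / joint.sum()   # the denominator is shared by all classes

def assign_class(priors, densities):
    """Index of the class with the largest posterior probability."""
    return int(np.argmax(posterior(priors, densities)))

# Hypothetical values for k = 3 classes and one new unit u.
priors = [0.5, 0.3, 0.2]        # pi_g
densities = [0.1, 0.4, 0.2]     # f(X_u | g)

post = posterior(priors, densities)
chosen = assign_class(priors, densities)
```

Because the denominator is common to every class, ranking the products π _{ g }·f(X_{ u }/g) gives the same decision as ranking the posteriors; in this example the products are 0.05, 0.12 and 0.04, so the second class is selected.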
7.2.4 Assignment of probability model
The "Maximum Probability Rule" involving posterior probability of class membership can be applied only if the probability density functions of the distributions of the classes are known [1, 29].
Each class may be modeled by assigning a density function f(X/g). Our only knowledge about class g is the N_{ g }observation units of randomly drawn samples from class g. There are many commonly used techniques of estimating f(X/g) from this knowledge. Commonly these methods are divided in two classes, parametric and nonparametric ones.
Parametric approach: Specify a theoretical probability distributional model $f_{\underset{\sim}{\theta}}$ (X/g), assume that the data on hand fit the model, estimate the model parameters $\underset{\sim}{\theta}$ using the data, and construct a rule using these estimates.
Non-parametric approach: Estimate the density values directly from the data with no prior model specification, and construct a rule using these estimates.
We applied one method from each of these two families to classify our ENT data. For the parametric model a multinominal model was used, whereas for the non-parametric approach our choice was the "kernel density estimator". There are other very successful non-parametric methods, such as the k-NN classification rule. The k-NN method is faster than the kernel method, but in terms of classification error the two methods are equivalent when the number of sensors is large.
7.3 Non-parametric approach
In this approach we do not assume any pre-specified theoretical parametric form for the pdf of a distribution, so it is also known as a distribution-free method. There are four major types of non-parametric pdf estimators: the histogram, the kernel method, the k-nearest-neighbour method, and the series method. In this case we use only the kernel estimates of the pdf and a further development of this method, generally described as adaptive kernel estimators.
7.3.1 General kernel estimator based on Bayesian rule
In general the form of the kernel estimator is,
$\widehat{f}(X_{u}/g)=\frac{1}{N_{g}}\sum_{i=1}^{N_{g}}K(X_{u}-X_{i}) \qquad (5)$
Imposing the conditions K(z) ≥ 0 and ∫K(z)dz = 1 on K, then it is easy to see that $\widehat{f}$ also satisfies $\widehat{f}$ (z) ≥ 0 and ∫$\widehat{f}$ (z)dz = 1 so that $\widehat{f}$ is a legitimate pdf. Using product form of the kernel estimates we get,
$\widehat{f}({X}_{u}/g)=\frac{1}{{N}_{g}}\sum _{i=1}^{{N}_{g}}\prod _{j=1}^{p}\frac{1}{{h}_{jg}}K\left(\frac{{X}_{uj}-{X}_{ij}}{{h}_{jg}}\right)\qquad (6)$
Here K is known as the kernel function and ${\left\{{h}_{jg}\right\}}_{j=1,\ldots ,p}$ are called the smoothing parameters for the gth class.
In practice we use normal kernel functions and take equal values for all ${\left\{{h}_{jg}\right\}}_{j=1,\ldots ,p}$ of the gth class, so that, up to a constant factor that is the same for every class, our estimator becomes
$\widehat{f}({X}_{u}/g)=\frac{1}{{N}_{g}}\sum _{i=1}^{{N}_{g}}\prod _{j=1}^{p}\frac{1}{{h}_{g}}\mathrm{exp}\left[-\frac{1}{2}{\left(\frac{{X}_{uj}-{X}_{ij}}{{h}_{g}}\right)}^{2}\right]\qquad (7)$
Here we estimated h values for each class g and denoted them as h _{ g }.
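As an illustration of equation (7), the fixed-window estimate can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names `gaussian_kernel` and `kernel_density` are hypothetical, and the sketch keeps the normalising constant 1/√(2π) of the normal kernel so that the estimate integrates to one.

```python
import math


def gaussian_kernel(z):
    """Standard normal density K(z); satisfies K(z) >= 0 and integrates to 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_density(x_u, class_samples, h):
    """Product-kernel estimate of f(x_u / g) with a single window width h.

    x_u           : feature vector of the unit to be scored (length p)
    class_samples : list of the N_g training vectors of class g
    h             : smoothing parameter h_g of the class (must be > 0)
    """
    total = 0.0
    for x_i in class_samples:
        prod = 1.0
        # Product over the p features of (1/h) * K((x_uj - x_ij) / h).
        for u_j, x_ij in zip(x_u, x_i):
            prod *= gaussian_kernel((u_j - x_ij) / h) / h
        total += prod
    return total / len(class_samples)
```

A small h gives a spiky estimate that follows individual samples, while a large h spreads each sample over a wide region; the value used for classification therefore has to be chosen from the data.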
7.3.2 Adaptive kernel estimator as IBC
A practical drawback of the kernel method of density estimation is its inability to deal satisfactorily with the tails of distributions without oversmoothing the main part of the density. Our data contain a reasonable number of outliers, and their densities are multimodal. The general method therefore smoothed the inner part of the density, which is undesirable because the data overlap closely there. This pattern calls for a density estimator that can sense small masses of probability and is robust at the same time. From the kernel density point of view, the optimum is obtained if the window function can be adapted locally, depending on the local data. We therefore used one of the well-known adaptive approaches, the adaptive kernel method. In this method a local smoothing parameter is used in addition to the global smoothing parameter, and the local one is estimated from a pilot estimate of the density. Previous work shows that this method handles tail probabilities very well [1, 28–30].
The adaptive kernel estimator is,
$\widehat{f}({X}_{u}/g)=\frac{1}{{N}_{g}}\sum _{i=1}^{{N}_{g}}\frac{1}{{h}_{ig}^{p}}K\left(\frac{{X}_{u}-{X}_{i}}{{h}_{ig}}\right)\qquad (8)$
where the N _{ g }smoothing parameters h _{ ig }(i = 1,......., N _{ g }) are based on some pilot estimate of the density $\tilde{f}$ (X/g). The smoothing parameters h _{ ig }are specified as h _{ g } a _{ ig }, where h _{ g }is a global smoothing parameter of a class and a _{ ig }are local smoothing parameters given by
${a}_{ig}={\left\{\tilde{f}({X}_{ig}/g)/{C}_{g}\right\}}^{-{\alpha}_{g}}\qquad (i=1,\ldots ,{N}_{g})\qquad (9)$
where $\mathrm{log}{C}_{g}=\frac{1}{{N}_{g}}\sum _{i=1}^{{N}_{g}}\mathrm{log}\tilde{f}({X}_{ig}/g)$, so that C _{ g } is the geometric mean of the pilot density values,
and $\tilde{f}$ (X _{ ig }/g) > 0 (i = 1,........, N _{ g }), and where α _{ g }is the sensitivity parameter satisfying 0 ≤ α _{ g }≤ 1. In practical application we assumed α _{ g }to be 0.5.
Here in our case we have taken the general kernel estimator as the pilot estimate of the density $\tilde{f}$ (X/g).
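The construction in equations (8) and (9) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the normal kernel carries its usual normalising constant, and the exponent of the local factor is taken as −α_g (the usual adaptive-kernel convention), so that samples lying in sparse regions receive wider windows.

```python
import math


def _product_gauss(x_u, x_i, h):
    """Product of p univariate normal kernels, each scaled by window width h."""
    p = len(x_u)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_u, x_i))
    return math.exp(-0.5 * sq_dist / (h * h)) / ((math.sqrt(2.0 * math.pi) * h) ** p)


def pilot_density(x, samples, h_g):
    """Pilot estimate: the general (fixed-window) kernel estimator."""
    return sum(_product_gauss(x, x_i, h_g) for x_i in samples) / len(samples)


def local_factors(samples, h_g, alpha=0.5):
    """Local smoothing parameters a_ig of equation (9), one per training sample."""
    pilot = [pilot_density(x_i, samples, h_g) for x_i in samples]
    # log C_g is the mean of the log pilot densities (C_g is their geometric mean).
    log_c = sum(math.log(f) for f in pilot) / len(pilot)
    c_g = math.exp(log_c)
    return [(f / c_g) ** (-alpha) for f in pilot]


def adaptive_density(x_u, samples, h_g, alpha=0.5):
    """Adaptive kernel estimate of equation (8) with h_ig = h_g * a_ig."""
    factors = local_factors(samples, h_g, alpha)
    total = sum(_product_gauss(x_u, x_i, h_g * a_i)
                for x_i, a_i in zip(samples, factors))
    return total / len(samples)
```

With alpha = 0 every local factor equals 1 and the estimate reduces to the pilot (fixed-window) estimate; with alpha = 0.5, the value assumed in the text, isolated samples are smoothed more strongly than samples in dense regions.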
Plugging the above estimates $\widehat{f}$ (X _{ u }/g) into equation 2 we get,
$\widehat{P}(g/{\text{X}}_{\text{u}})=\frac{{q}_{g}\cdot \widehat{f}({X}_{u}/g)}{{\displaystyle \sum _{{g}^{\prime}=1}^{k}{q}_{{g}^{\prime}}\cdot \widehat{f}({X}_{u}/{g}^{\prime})}}.\left(10\right)$
Then the decision rule becomes,
Assign unit u to class g if q _{ g }·$\widehat{f}$ (X _{ u }/g) > q _{ h }·$\widehat{f}$ (X _{ u }/h) for all h ≠ g,
where $\widehat{f}$ (X _{ u }/g) differs between the two methods (general and adaptive kernel).
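Equation (10) and the decision rule amount to normalising the products q_g·f(X_u/g) and picking the largest. A minimal sketch follows; the function names and the numerical values in the usage note are hypothetical and serve only to show the arithmetic.

```python
def posteriors(priors, densities):
    """Posterior probabilities P(g / X_u) of equation (10).

    priors    : dict mapping class label -> prior probability q_g
    densities : dict mapping class label -> estimated density f(X_u / g)
    """
    joint = {g: priors[g] * densities[g] for g in priors}
    evidence = sum(joint.values())
    if evidence == 0.0:
        raise ValueError("all class densities are zero at this point")
    return {g: value / evidence for g, value in joint.items()}


def assign_class(priors, densities):
    """Maximum probability rule: return the most probable class and all posteriors."""
    post = posteriors(priors, densities)
    return max(post, key=post.get), post
```

For example, with equal priors of 1/3 and made-up density values of 0.02, 0.10 and 0.08 for E. coli, S. aureus and P. aeruginosa, the posteriors are 0.1, 0.5 and 0.4, so the unit is assigned to S. aureus. Because the denominator is common to all classes, comparing the numerators alone gives the same decision.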
In both of these kernel methods some parameters have to be estimated, namely the window width h for each class. To simplify the problem we assumed the same window width for every class; because in the kernel method these windows are smoothed again, a collective choice is not a bad idea. We selected the smoothing parameters simultaneously by minimizing the cross-validated estimate of the classification error of Bayes' rule, obtained by plugging in these kernel estimates of the group-conditional densities from the knowledge base.
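One way to realise this selection is a leave-one-out search over a list of candidate window widths. The sketch below is only one possible reading of the procedure: it uses the fixed-window normal kernel, takes the priors to be proportional to the class sizes, and requires at least two samples per class. All function names and the candidate grid are assumptions.

```python
import math


def density(x, samples, h, skip=None):
    """Fixed-window normal product-kernel density; sample `skip` is left out."""
    p = len(x)
    norm = (math.sqrt(2.0 * math.pi) * h) ** p
    values = [
        math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, s)) / h ** 2) / norm
        for i, s in enumerate(samples) if i != skip
    ]
    return sum(values) / len(values)


def loo_error(data, h):
    """Leave-one-out misclassification rate of the Bayes rule for window width h.

    data : dict mapping class label -> list of feature vectors
    """
    n_total = sum(len(v) for v in data.values())
    errors = 0
    for g, samples in data.items():
        for i, x in enumerate(samples):
            scores = {}
            for c, c_samples in data.items():
                prior = len(c_samples) / n_total  # assumed prior q_c
                # Leave the unit out of its own class only.
                left_out = i if c == g else None
                scores[c] = prior * density(x, c_samples, h, skip=left_out)
            if max(scores, key=scores.get) != g:
                errors += 1
    return errors / n_total


def select_bandwidth(data, candidates):
    """Return the candidate window width with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_error(data, h))
```

When several candidates give the same error, `min` returns the first of them, so the order of the candidate list acts as a tie-break.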
Note that the maximum probability rule is just a basic modeling concept and decision-making rule. It depends heavily on the estimation step described above, which is the heart of the modeling process. As the estimation methods improve, the modeling becomes more accurate and the classification technique becomes stronger. This was observed at the time of testing: the adaptive kernel method, with its superior estimation of tail probabilities, gets an edge over the other methods.
7. Classification performance of ANN
The three different bacteria data sets were analyzed using three supervised ANN classifiers: the multilayer perceptron (MLP), probabilistic neural network (PNN), and radial basis function (RBF) network paradigms. Training of the neural networks was performed with 40% of the data set. The remaining 60% of the data were used for testing the neural networks. These percentages were selected arbitrarily and were applied for all data sets. The aim of this comparative study was to identify the most appropriate ANN paradigm, which can be trained with best accuracy, to predict the type of ENT infections or, in other words, the type of ENT bacteria [3, 27].
7.1 Performance of MLP
An MLP network (with learning rate equal to 0.2 and a momentum term equal to 0.3) with 3–18 inputs, 2 hidden layers and 3 output neurons reached a success rate of 75% in the overall classification of the three ENT bacteria.
7.2 Performance of RBF and PNN
Neurons were added to the network until the sum-squared error fell beneath an error goal (0.000015) or a maximum number of internal neurons (up to 50) was reached. It is important that the spread parameter be large enough for the radial basis neurons to respond to overlapping regions of the input space, but not so large that all the neurons respond in essentially the same manner. For both networks, the spread parameter was set to 1. PNN was able to classify 85% of the response vectors correctly, whereas the level of correct classification with the RBF network was up to 92%.
8. IBC implementation and results
The best classification system we achieved is based on 'newly extracted feature based knowledge' and the 'Adaptive Kernel Estimator'. The implemented Bayes classification system was used to classify unknown bacterial samples [1].
General Algorithm steps:

Knowledge base creation based on multiple features.

Modeling of the classes, by estimating the density functions for these classes.

New sample collected by experiment.

Feature extraction and creation of observation vector from sample.

Calculation of the probability of the class membership of this unit in those classes.

Determine the class for which the above mentioned probability is maximum.

Final classification and decision.
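The steps above can be sketched end to end as a small class. This is a hedged illustration rather than the IBC itself: the class name `KernelBayesClassifier` and its methods are hypothetical, the fixed-window normal kernel stands in for the adaptive estimator to keep the example short, the priors are assumed proportional to the class sizes, and sample collection and feature extraction (steps 3 and 4) are assumed to happen before `classify` is called.

```python
import math


class KernelBayesClassifier:
    """Knowledge base plus kernel-density Bayes rule (illustrative only)."""

    def __init__(self, h):
        self.h = h            # shared smoothing parameter, assumed already chosen
        self.knowledge = {}   # class label -> list of feature vectors

    def add(self, label, features):
        """Step 1: extend the knowledge base with a labelled feature vector."""
        self.knowledge.setdefault(label, []).append(list(features))

    def _density(self, x, samples):
        """Step 2: model a class by its normal product-kernel density."""
        norm = (math.sqrt(2.0 * math.pi) * self.h) ** len(x)
        total = sum(
            math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, s)) / self.h ** 2)
            for s in samples
        )
        return total / (norm * len(samples))

    def classify(self, features):
        """Steps 5 to 7: class-membership probabilities, then the maximum."""
        n = sum(len(s) for s in self.knowledge.values())
        joint = {g: len(s) / n * self._density(features, s)
                 for g, s in self.knowledge.items()}
        total = sum(joint.values())
        if total == 0.0:
            raise ValueError("observation is too far from every class")
        post = {g: v / total for g, v in joint.items()}
        return max(post, key=post.get), post
```

A new observation vector is passed to `classify`, which returns the winning class together with the posterior probabilities, so that a borderline decision can be recognised from the margin between the two largest values.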
The main novelty of this paper is that an innovative knowledge-based Bayes classifier relying on "Bayes' theorem" and the "maximum probability rule" has been investigated for these three groups of ENT bacteria. Two innovative feature extraction techniques, namely 'Kurtosis of the sensory signal' and 'Skewness of the sensory signal', were developed and tested on the three main bacteria data sets and the two substrains. Several different feature extraction techniques are available for e-nose sensory signal processing. These two new approaches were selected because the most popular techniques, such as 'the difference between the signal's peak and its baseline', 'the area beneath the curve', 'the area beneath the curve left of the peak', and 'the time it takes for the signal to reach its peak', did not give sufficiently consistent overall classification. Approaches based on these popular techniques also produce significant overlap between data clusters (please see previous section).
For the ENT bacteria identification problem, these two newly extracted features together with the Adaptive Kernel Estimator allowed us to achieve a high level of overall classification consistency.
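For comparison, the four conventional features named above can be computed from a sampled sensor response roughly as follows. The sketch makes several assumptions that are not stated in the text: the response is sampled at a uniform interval `dt`, the first sample is taken as the baseline, the areas are measured above that baseline with the trapezoidal rule, and the function names are hypothetical.

```python
def trapezoid_area(values, dt):
    """Area under uniformly sampled values by the trapezoidal rule."""
    return sum((a + b) * 0.5 * dt for a, b in zip(values, values[1:]))


def classic_features(signal, dt=1.0):
    """Four conventional scalar features of one sensor response curve."""
    baseline = signal[0]  # assumption: the response starts at its baseline
    peak_index = max(range(len(signal)), key=lambda i: signal[i])
    shifted = [v - baseline for v in signal]
    return {
        "peak_minus_baseline": signal[peak_index] - baseline,
        "area_under_curve": trapezoid_area(shifted, dt),
        "area_left_of_peak": trapezoid_area(shifted[:peak_index + 1], dt),
        "time_to_peak": peak_index * dt,
    }
```

For the toy response [1, 2, 4, 3, 1] sampled every 0.5 time units, this gives a peak height of 3, a total area of 3, an area left of the peak of 1.25 and a time to peak of 1.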
8.1 Classification performance of IBC
To check the classification ability of IBC over the six different features (the difference between the signal's peak and its baseline, the area beneath the curve, the area beneath the curve left of the peak, the time it takes for the signal to reach its peak, Skewness of the data and Kurtosis of the data) we used the following two testing procedures.
8.1.1 Existing data manipulation method
We randomly removed 20% of the feature points from the existing knowledge base and tried to classify them, treating the remaining 80% of the feature points as the "present knowledge base". In this way we classified all the data in the knowledge base and noted the percentage of correct classification for the different classes. We also aimed to keep this testing method consistent with the ANN-based testing methods. The advantage of this method is that the tested data behave virtually like unknown data. Following this approach we achieved 100% overall classification of the three species of bacteria, Escherichia coli (E. coli), Staphylococcus aureus (S. aureus), and Pseudomonas aeruginosa (P. aeruginosa), responsible for ear, nose and throat (ENT) infections.
In the next stage a subclassification was performed using IBC to separate two different strains of S. aureus, namely MRSA and MSSA. In this subclassification problem we also achieved a 100% rate of correct classification. All classification percentages in this study are based on the data sets we collected at the hospital.
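The testing procedure of this subsection can be sketched as below, under the assumption that "classifying all the data" means holding out five disjoint random blocks of 20% in turn. The function `holdout_accuracy` is hypothetical, and `nearest_neighbour` is only a stand-in classifier used to make the example runnable; it is not the IBC.

```python
import random


def holdout_accuracy(samples, labels, classify, seed=0):
    """Hold out each 20% block once and classify it from the remaining 80%.

    classify(train_samples, train_labels, x) must return a predicted label.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    blocks = [indices[k::5] for k in range(5)]  # five disjoint ~20% blocks
    correct = 0
    for block in blocks:
        held = set(block)
        train = [i for i in indices if i not in held]
        train_x = [samples[i] for i in train]
        train_y = [labels[i] for i in train]
        for i in block:
            if classify(train_x, train_y, samples[i]) == labels[i]:
                correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier for the demonstration only."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]
```

Because every unit is classified only while it is absent from the knowledge base, the resulting percentage is an estimate of performance on unseen data rather than a resubstitution rate.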
8.1.2 Real data based method
To complete this study we decided to take a more realistic approach to testing. So far our classification performance testing had been based on the existing data sets that were originally used to develop the knowledge base. To validate our IBC model in the hospital scenario, we gathered a completely new data set from a total of 60 new patients infected with one of the three species of bacteria, E. coli, S. aureus (including MRSA and MSSA), and P. aeruginosa.
Kurtosis and Skewness of this unknown data set were extracted as features, and the IBC was used to classify these species of bacteria on the basis of the existing knowledge base.
Results from this new set of data (completely unknown to our previously developed knowledge base) were very encouraging. The best results suggest that we are able to identify and classify the three main bacteria classes with up to 100% accuracy using IBC. We also achieved 100% classification accuracy for MRSA and MSSA samples with IBC.
8.2 Conclusion
From our results we can draw the following important conclusions:

The newly suggested features, Skewness of the data and Kurtosis of the data, give better results than the previously used features in the identification of some classes, and consistent results in the other cases.

A comparative evaluation of the classifiers was conducted for this application. IBC outperformed MLP (75%), PNN (85%) and RBFN (92%). The best percentage of IBC based correct classification is 100%.

The nonparametric approaches give good results, and the adaptive kernel method was superior to the advanced ANNs throughout. This speaks for its greater accuracy in estimating the tail probabilities of the data distribution. There is a steady increase in the percentage of correct classification when IBC is used in place of ANN.

The good results from this study indicate the effectiveness of Bayes' maximum probability rule, which minimizes the total number of misclassification errors.
The problem of suitable feature extraction from e-nose sensory data is addressed extensively in this paper, where two innovative scalar feature extraction approaches, 'Kurtosis of the sensory data' and 'Skewness of the sensory data', have been used alongside older ones such as 'area under the curve of sensory data' for extracting representative data. These new feature extraction techniques are also shown to give excellent results for class discrimination in these cases, as the methods have been tested on a database of 430 patients. This can be treated as a major achievement, and further research can be carried out along this line.
Kernel methods are the foundation of the most statistically significant neural network method, the "Probabilistic neural network". The adaptive kernel method, which is a developed version of the kernel method, can also promote further work in this direction. IBC could be a significant breakthrough in the field of sensory signal processing for electronic nose technology.
We can conclude that this study proves that an IBC based on the "maximum probability rule" and the adaptive kernel estimator can provide a very strong solution for the identification and very rapid detection of ENT infections in the hospital environment.
8.3 Discussion
Identification and very rapid detection of ENT infections in hospitals is the most challenging problem worldwide in securing a safe and better hospital environment. Contamination from ENT bacteria in hospital wards is so serious that it sometimes affects other patients, nurses and even doctors. For example, around 40% of cases of S. aureus in the UK are resistant to methicillin and other antibiotics. These types of S. aureus tend to be more common in hospitals, because people are more susceptible to infections when they are already unwell. From April to September 2004, 3,519 NHS patients were infected with MRSA. It is estimated that the NHS spends around £1 billion per annum on hospital-acquired infections including MRSA.
At present hospitals use their full pathological infrastructure for any patient with any type of infection. But long-term statistics show that only 30% of infected persons have a serious bacterial problem, so there is a need for prioritization. E-nose technology based on the IASSP method could be used as a quick screening system to prioritise which patients or healthcare workers need more rigorous testing.
With this new electronic nose based technology we will at least be able to exclude 70% of the patient population, which would be a bonus in terms of the number of people we would want to screen further. Early exclusion of 70% of all patients (those who are not suffering from serious ENT infections) would be extremely cost-effective and time-saving, and would make less use of the pathological infrastructure. This will help us to make rapid decisions and to provide much better care to those who really need it.
9. Appendix
9.1. Skewness of the data
Skewness is a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right. The skewness of the normal distribution (or any perfectly symmetric distribution) is zero. The skewness of a distribution is defined as $y=\frac{E{(X-\mu )}^{3}}{{\sigma}^{3}}$, where μ is the mean of X, σ is the standard deviation of X, and E(t) represents the expected value of the quantity t.
9.2. Kurtosis of the data
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3. Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3. The kurtosis of a distribution is defined as $y=\frac{E{(X-\mu )}^{4}}{{\sigma}^{4}}$, where μ is the mean of X, σ is the standard deviation of X, and E(t) represents the expected value of the quantity t.
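Both moment-based features follow directly from the definitions in sections 9.1 and 9.2. The sketch below estimates the expectations by plain sample averages (dividing by n, with no bias correction) and reports kurtosis on the scale where the normal distribution scores 3; the function names are hypothetical, and a constant signal (zero variance) is not handled.

```python
def central_moment(x, k):
    """k-th central moment of the samples: the average of (x - mean)^k."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** k for v in x) / len(x)


def skewness(x):
    """E(X - mu)^3 / sigma^3, estimated from the samples."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    """E(X - mu)^4 / sigma^4, estimated from the samples (normal = 3)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2
```

For the symmetric sequence 1, 2, 3, 4, 5 the skewness is 0 and the kurtosis is 1.7, which is below 3 because the sequence has no outlying values. A sequence with one large value, such as 0, 0, 0, 10, has positive skewness.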
9.3. Typicality probability
P(X/g) denotes the probability that a randomly selected unit has a profile close to X given that the unit is a member of class g; note that P(X/g) is, in the limit, proportional to f(X/g).
9.4. Prior probability of class membership
π _{ g }is used to denote the "prior probability" of membership in class g, "prior" in the sense that this is a probability of class membership before X_{ u }is known.
9.5. Posterior probability of class membership
The probability P(g/X_{ u }) that unit u belongs to group g, given that the unit has a particular observation vector X_{ u }, is called the "posterior probability".
Declarations
Acknowledgements
The authors would like to thank Dr. David Morgan and Mr. Matt Devoy for their help and collaborative support during ENT data gathering in hospital environment.
References
1. Dutta R, Dutta R: "Maximum Probability Rule" based classification of MRSA infections in hospital environment: using Electronic Nose. Sensors and Actuators B 29 March 2006.
2. Gardner JW, Bartlett PN: Electronic noses: Principles and applications. Oxford University Press, UK; 1999.
3. Dutta R, Morgan D, Baker N, Gardner JW, Hines E: Identification of Staphylococcus aureus infections in hospital environment: electronic nose based approach. Sensors and Actuators B 2005, 109: 355–362. 10.1016/j.snb.2005.01.013
4. Pearce TC, Schiffman SS, Nagle HT, Gardner JW, eds: Handbook of Machine Olfaction: Electronic Nose Technology. 1st edition. Wiley-VCH; 2003.
5. alphamos 2006. [http://www.alphamos.com/en/products/profox.php]
6. wikipedia 2006. [http://en.wikipedia.org/wiki/Main_Page]
7. nhsdirect 2006. [http://www.nhsdirect.nhs.uk/articles/article.aspx?articleID=252]
8. biospace 2006. [http://www.biospace.com/news_story.aspx?StoryID=19379920&full=1]
9. news.bbc 2006. [http://news.bbc.co.uk/1/hi/health/4267686.stm]
10. Dutta R, Hines EL, Gardner JW, Boilot P: Biomedical Engineering Online. 2002, 1: 4. 10.1186/1475925X14
11. Gardner JW, Craven M, Dow CS, Hines EL: Measurement Science and Technology. 1998, 9: 120. 10.1088/09570233/9/1/016
12. Leone A, Distante C, Ancona N, Persaud KC, Stellab E, Siciliano P: A powerful method for feature extraction and compression of electronic nose responses. Sensors and Actuators B 2005, 105: 378–392. 10.1016/j.snb.2004.06.026
13. Distante C, Ancona N, Siciliano P: Support vector machines for olfactory signals recognition. Sensors and Actuators B 2003, 88: 30–39. 10.1016/S09254005(02)003064
14. Carmel L, Levy S, Lancet D, Harel D: A feature extraction method for chemical sensors in electronic noses. Sensors and Actuators B 2003, 93: 67–76. 10.1016/S09254005(03)002478
15. Shin HW, Llobet E, Gardner JW, Hines EL, Dow CS: IEE Proc. Science Measurement and Technology 2000, 147: 158. 10.1049/ipsmt:20000422
16. Artursson T, Eklöv T, Lundström I, Mårtensson P, Sjöström M, Holmberg M: Drift correction for gas sensors using multivariate methods. J Chemomet 2000, 14: 1–13. 10.1002/1099128X(200009/12)14:5/6%3C;711::AIDCEM607%3E;3.0.CO;24
17. Davide F, Di Natale C, Holmberg M, Winquist F: Frequency analysis of drift in chemical sensors. Proceedings of the 1st Italian Conference on Sensors and Microsystems, Rome, Italy 1996, 150–154.
18. Holmberg M, Winquist F, Lundström I, Davide F, Di Natale C, D'Amico A: Drift counteraction for an electronic nose. Sens Actuators B 1996, 35/36: 528–535. 10.1016/S09254005(97)801244
19. Holmberg M, Winquist F, Lundström I, Davide F, Di Natale C, D'Amico A: Drift counteraction in odour recognition application: lifelong calibration method. Sens Actuators B 1997, 42: 185–194. 10.1016/S09254005(97)803358
20. Haugen JE, Tomic O, Kvaal K: A calibration method for handling the temporal drift of solid state gas-sensors. Anal Chim Acta 2000, 407: 23–39. 10.1016/S00032670(99)007849
21. Duda RO, Hart PE, Stork DG: Pattern Classification. 2nd edition. Wiley-Interscience; 2000.
22. Michael F, Ulmer H, Ruiz J, Visani P, Weimar U: Complementary analytical measurements based upon gas chromatography-mass spectrometry, sensor system and human sensory panel: a case study dealing with packaging materials. Anal Chim Acta 2001, 431: 11–29. 10.1016/S00032670(00)013167
23. Mitrovics J, Ulmer H, Weimar U, Gopel W: Modular sensor systems for gas sensing and odor monitoring: the MOSES concept. Acc Chem Res 1998, 31: 307–315. 10.1021/ar970064n
24. Nagle HT, Schiffman SS, Gutierrez-Osuna R: The how and why of electronic noses. IEEE Spectrum 1998, 22–34. 10.1109/6.715180
25. Lin YJ, Guo HR, Chang YH, Kao MT, Wang HH, Hong RI: Application of the electronic nose for uremia diagnosis. Sens Actuators B 2001, 76: 177–180. 10.1016/S09254005(01)006256
26. Hahn S, Frank M, Weimar U: Rancidity investigation on olive oil: a comparison of multiple headspace analysis using an electronic nose and GC/MS. Proceedings of the Seventh International Symposium on Olfaction and Electronic Nose (ISOEN 2000) 2000, 49–50.
27. Jang JSR, Sun CT, Mizutani E: Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence. Upper Saddle River NJ: Prentice Hall; 1997.
28. Hand DJ: Discrimination and classification. New York: Wiley; 1981.
29. Huberty CJ: Applied Discriminant Analysis. New York: Wiley; 1994.
30. McLachlan GJ: Discriminant analysis and statistical pattern recognition. New York: Wiley; 1992.
31. Mathwork 2006. [http://www.mathworks.com]
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.