It is known that there is no standard approach to establishing the optimal decision criteria for a generic problem. Nonparametric classification approaches [19–22] overcome the difficulty of accurately estimating the underlying distributions from a small population. These approaches make no assumptions about the form of the density function; their decision procedures bypass probability estimation and go directly to decision functions. Because of the small population, we employ nonparametric techniques here rather than parametric ones such as K-means, vector quantization (VQ), and the Gaussian mixture model (GMM), which are otherwise very useful with large populations. Fuzzy modeling methods could also be used to enhance the decision ability [23].

Among the various nonparametric techniques, we choose a local classifier, the k-NN rule, and a global polynomial classifier, the MLP. Locality and globality refer to the location of the information extracted for the class decision: the k-NN rule employs local information, whereas an MLP extracts global information.

It is known that the nearest neighbor in the feature space carries half of the classification information obtained by the Bayes classifier [19]. Polynomials, on the other hand, have excellent properties as classifiers; polynomial classifiers are universal approximators relative to the optimal Bayes classifier [20]. Typical polynomial classifiers have been based either on statistical methods or on minimizing a mean-squared error criterion. The focus has been on linear and second-degree polynomial classifiers, but both methods traditionally have had limitations. Classical statistical methods estimate the mean and covariance for a parametric model [19–22]; with a large number of features or variables, these methods lead to large, intractable problems [17, 18].

With the k-NN rule, the class decision for an unknown sample is based on the majority of its k nearest samples. In other words, a local voting process assigns the unknown to the class with the highest vote. This is, in fact, an approximation of Bayes decision theory in a local environment [19]. The process can be further extended to weighted voting and Gaussian-weighted voting (Parzen window). To recognize pattern v, the k minimum-distance samples are found among all the samples. The distance can be computed under various norms: Minkowski-norm-based distances (e.g., Euclidean as second order), a covariance-based or Mahalanobis distance, or an entropy-based or Kullback-Leibler distance. The simplest and most suitable choice for a small sample size is the Euclidean metric, which is defined as

d_{j} = ||v - v_{j}||^{2} = (v - v_{j})^{T}(v - v_{j}) (3)

where d_{j} is the distance between patterns v and v_{j}. The k-NN rule classifies v by assigning it the label most frequently represented among its k nearest samples:

k_{i} = max {k_{1}, ..., k_{L}} ⇒ x ∈ w_{i} (4)

k_{1} + ... + k_{L} = k

Here, k_{i} is the number of neighbors belonging to class w_{i} (i = 1, ..., L) among the k nearest neighbors.
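The k-NN rule described by Eqs. (3) and (4) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; the function and variable names are our own.

```python
import numpy as np

def knn_classify(v, samples, labels, k=3):
    """Classify pattern v by majority vote among its k nearest samples.

    Distances follow Eq. (3), the squared Euclidean metric
    d_j = (v - v_j)^T (v - v_j); the vote counts play the role of
    k_1, ..., k_L in Eq. (4), with k_1 + ... + k_L = k.
    """
    d = np.sum((samples - v) ** 2, axis=1)   # squared Euclidean distance to every sample
    nearest = np.argsort(d)[:k]              # indices of the k closest samples
    votes = np.bincount(labels[nearest])     # k_i: neighbors per class among the k nearest
    return int(np.argmax(votes))             # class with the highest vote
```

For two well-separated clusters, a query near the first cluster is assigned its label by the local vote.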

The global MLP [22] is a parallel, feedforward structure consisting of input, hidden, and output layers. Each layer has sigmoidal units interconnected through weighted connections. The MLP is trained with the supervised backpropagation (BP) algorithm, which efficiently computes the partial derivatives of an approximating function F(w; x). The network has an adjustable weight vector w that is computed with respect to all the training data for a given input vector x and output vector y. The weights are adjusted to fit a set of surfaces to the input space; the surfaces are constructed with sigmoids by using the best linear regression with respect to the cluster membership. The mean-squared error (MSE) function (Eq. 5), the difference between the network's output and the supervisor's output, is minimized to find the cluster membership:

E(w) = (1/N) Σ_{n=1}^{N} (y_{n} - F(w; x_{n}))^{2} (5)
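A minimal sketch of such an MLP, trained by backpropagation to minimize the MSE of Eq. (5), is given below. This is our own toy illustration (one hidden layer of sigmoidal units on XOR-like data), not the network configuration used in the paper; the layer sizes, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy training data: X is the input matrix, Y the supervisor output.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

# One hidden layer of sigmoidal units; the weight vector w comprises W1, b1, W2, b2.
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

lr = 1.0
for _ in range(10000):
    # Forward pass: F(w; x)
    H = sigmoid(X @ W1 + b1)
    F = sigmoid(H @ W2 + b2)
    # Backpropagation: partial derivatives of the MSE with respect to the weights
    dF = (F - Y) * F * (1 - F)
    dH = (dF @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dF; b2 -= lr * dF.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

mse = float(np.mean((F - Y) ** 2))  # should be small after training
```

Each hidden sigmoid contributes one soft linear constraint; their combination lets the network fit a nonlinear decision surface to the input space.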

With the k-NN rule, an arbitrary discriminant function is constructed for the class decision; the nearest neighbor alone contributes half of the classification information. An MLP can generate any nonlinear discriminant function of the input by incorporating multiple constraints; each sigmoidal unit contributes a linear constraint to the global discriminant function.