Abstract:
The global classification paradigm uses the entire training set to produce a single discriminating model for the distinct classes. Alternatively, the cluster-based local classification approach builds multiple discriminating models from smaller subsets of the training data. Considering these two paradigms as the extremes of a spectrum of possibilities, this thesis introduces a novel two-stage framework for building pattern classification models based on the clustering of the self-organizing map (SOM) (VESANTO et al., 2000). In this technique, data samples are first submitted to the SOM as a preprocessing stage. Then, clustering algorithms (e.g. K-means) are applied to the prototype vectors of the SOM in order to organize them into well-defined regions. By applying this two-stage strategy to labeled data, it is shown how to build accurate classification models, henceforth referred to as regional classifiers, using the subset of samples mapped to a specific cluster of SOM units. A comprehensive comparative study is carried out to evaluate the effectiveness of the proposed approach on several benchmark data sets, using both linear models, i.e. the least squares classifier with linear basis functions (LSC-LBF), and nonlinear ones, i.e. the least squares support vector machine (LSSVM) with nonlinear kernel functions. As an additional step in the training of the cluster-based local and regional models, a set of twelve cluster validation metrics is evaluated during the model validation phase to assess their ability to predict the best number of prototypes given a well-defined objective function. The capability of the local and regional approaches to build nonlinear decision functions from a set of linear classifiers is also assessed; the regional paradigm proves to be a sparser alternative to the local approach, achieving similar performance while using fewer prototypes/models.
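The sketch below illustrates the two-stage regional idea described above, under stated assumptions: it relies on the MiniSom and scikit-learn packages, uses RidgeClassifier as a stand-in for the linear least squares classifier (LSC-LBF), and picks an illustrative SOM grid size and number of clusters rather than the values selected in the thesis via cluster validation metrics.

```python
# Minimal sketch of two-stage regional classification: SOM preprocessing,
# K-means clustering of the SOM prototypes, one linear classifier per cluster.
# Hyperparameters, dataset and the RidgeClassifier stand-in are illustrative
# assumptions, not the thesis' exact experimental setup.
import numpy as np
from minisom import MiniSom
from sklearn.cluster import KMeans
from sklearn.linear_model import RidgeClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: train a SOM on the training samples (labels are not used here).
grid = 10
som = MiniSom(grid, grid, X_train.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X_train, num_iteration=5000)

# Stage 2: cluster the SOM prototype vectors (here with K-means).
prototypes = som.get_weights().reshape(-1, X_train.shape[1])
n_clusters = 5  # illustrative; the thesis selects this via cluster validation metrics
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(prototypes)

def unit_cluster(x):
    """Cluster label of the SOM unit that wins for sample x."""
    i, j = som.winner(x)
    return kmeans.labels_[i * grid + j]

# Build one regional classifier per cluster from the samples mapped to it.
regional_models = {}
train_clusters = np.array([unit_cluster(x) for x in X_train])
for c in range(n_clusters):
    Xc, yc = X_train[train_clusters == c], y_train[train_clusters == c]
    if len(np.unique(yc)) < 2:
        # Degenerate region: fall back to a constant prediction.
        label = yc[0] if len(yc) else np.bincount(y_train).argmax()
        regional_models[c] = ("const", label)
    else:
        regional_models[c] = ("model", RidgeClassifier().fit(Xc, yc))

def predict(x):
    """Route a sample to the regional model of its SOM cluster."""
    kind, m = regional_models[unit_cluster(x)]
    return m if kind == "const" else m.predict(x.reshape(1, -1))[0]

y_pred = np.array([predict(x) for x in X_test])
print("regional accuracy:", (y_pred == y_test).mean())
```

The global and local paradigms fall out as the two extremes of this scheme: with a single cluster every sample trains one global model, while assigning each SOM unit to its own cluster yields one local model per prototype.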
---