Bat-inspired algorithm for variable selection in classification problems

Bibliographic details
Year of defense: 2023
Main author: Souza, Juliana da Cruz
Advisor: Not reported by the institution
Examination committee: Not reported by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal da Paraíba (UFPB), Brazil; Chemistry; Graduate Program in Chemistry (Programa de Pós-Graduação em Química)
Graduate program: Not reported by the institution
Department: Not reported by the institution
Country: Not reported by the institution
Access link: https://repositorio.ufpb.br/jspui/handle/123456789/32100
Abstract: Linear Discriminant Analysis (LDA) builds multivariate classification models in the domain of the original variables, so the results can be interpreted chemically in a direct way. However, LDA requires data of low dimensionality, and its models generalize poorly when the variables are highly multicollinear. Variable selection algorithms have proved very effective at overcoming these drawbacks, especially for spectral data such as UV-Vis and NIR. In this context, bio-inspired algorithms such as the genetic algorithm (GA) have been used successfully for variable selection. The present work proposes a bat-inspired algorithm (BA) for selecting variables for LDA modeling. The algorithm, named BA-LDA, was implemented in MATLAB and uses a cost function based on the average risk of misclassification (Gcost). Its performance was evaluated in four case studies involving mass spectrometry (MS), NIR and UV-Vis data, as well as a dataset containing a class of simulated samples. For each dataset, the BA-LDA parameters were optimized with a 2^(4-1) fractional factorial design. The MS data came from the analysis of 216 serum samples from patients with and without ovarian cancer. The NIR data were acquired from 60 coffee samples belonging to two classes (gourmet and traditional). The UV-Vis data were spectra of vegetable oil samples from four classes: soybean, canola, corn and sunflower. The simulated-class study used NIR spectra of diesel. BA-LDA was compared with two other variable selection methods, GA-LDA and SPA-LDA (successive projections algorithm), and with partial least squares discriminant analysis (PLS-DA) and soft independent modeling of class analogy (SIMCA).
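To make the method described above more concrete, here is a minimal, dependency-free Python sketch of BA-LDA-style variable selection. It is not the thesis's MATLAB code, and several pieces are assumptions: Gcost is taken to have the form used in the SPA-LDA literature (for each validation sample, the squared Mahalanobis distance to its own class mean divided by the smallest squared distance to any other class mean, using the pooled within-class covariance of a calibration set, averaged over the validation set); the search is a simplified binary bat algorithm with a V-shaped transfer function; and the names `gcost`, `bat_select` and `subset_of`, as well as every parameter default, are hypothetical.

```python
import math
import random


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular pooled covariance")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gcost(x_cal, y_cal, x_val, y_val, subset):
    """Assumed Gcost: mean ratio of the squared Mahalanobis distance to the true
    class mean over the smallest distance to another class mean (pooled covariance)."""
    classes = sorted(set(y_cal))
    k, dof = len(subset), len(y_cal) - len(classes)
    if k == 0 or k > dof:  # empty subset or rank-deficient pooled covariance
        return math.inf
    cal = [[row[j] for j in subset] for row in x_cal]
    means = {}
    for c in classes:
        rows = [r for r, lab in zip(cal, y_cal) if lab == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    cov = [[0.0] * k for _ in range(k)]
    for r, lab in zip(cal, y_cal):
        d = [ri - mi for ri, mi in zip(r, means[lab])]
        for i in range(k):
            for j in range(k):
                cov[i][j] += d[i] * d[j] / dof
    total = 0.0
    for row, lab in zip(x_val, y_val):
        x = [row[j] for j in subset]
        r2 = {}
        for c in classes:
            d = [xi - mi for xi, mi in zip(x, means[c])]
            try:
                r2[c] = sum(di * si for di, si in zip(d, _solve(cov, d)))
            except ValueError:
                return math.inf
        nearest_other = min(r2[c] for c in classes if c != lab)
        total += r2[lab] / max(nearest_other, 1e-12)
    return total / len(y_val)


def subset_of(bits):
    """Indices of the variables switched on in a binary bat position."""
    return [j for j, on in enumerate(bits) if on]


def bat_select(cost, n_vars, n_bats=12, n_iter=30, f_max=2.0, loudness=0.9,
               pulse_rate=0.5, alpha=0.9, gamma=0.9, v_max=4.0, seed=0):
    """Simplified binary bat algorithm (assumed variant) minimising cost(subset)."""
    rng = random.Random(seed)
    pos = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(n_bats)]
    vel = [[0.0] * n_vars for _ in range(n_bats)]
    loud, rate = [loudness] * n_bats, [pulse_rate] * n_bats
    fit = [cost(subset_of(p)) for p in pos]
    i_best = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[i_best][:], fit[i_best]
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_max * rng.random()  # pulse frequency in [0, f_max]
            cand = []
            for j in range(n_vars):
                v = vel[i][j] + (int(pos[i][j]) - int(best[j])) * freq
                vel[i][j] = max(-v_max, min(v_max, v))
                # V-shaped transfer: a large |velocity| makes the bit likely to flip
                p_flip = abs(2 / math.pi * math.atan(math.pi / 2 * vel[i][j]))
                cand.append(not pos[i][j] if rng.random() < p_flip else pos[i][j])
            if rng.random() > rate[i]:
                # Local random walk: flip one random bit of the best solution
                cand = best[:]
                j = rng.randrange(n_vars)
                cand[j] = not cand[j]
            c_fit = cost(subset_of(cand))
            if c_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, c_fit
                loud[i] *= alpha  # accepted bats get quieter...
                rate[i] = pulse_rate * (1 - math.exp(-gamma * t))  # ...and re-tune pulse rate
            if c_fit <= best_fit:
                best, best_fit = cand[:], c_fit
    return subset_of(best), best_fit
```

With `cost = lambda s: gcost(x_cal, y_cal, x_val, y_val, s)`, calling `bat_select(cost, n_vars)` returns the indices of the selected variables and their Gcost. Because the search is stochastic, a different `seed` can return a different subset.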
The proposed algorithm selected 11, 3, 7 and 9 variables and achieved correct classification rates (TCC) of 93, 100, 100 and 100% for the MS, NIR, UV-Vis and simulated-class (NIR) datasets, respectively. For the MS data, BA-LDA outperformed SPA-LDA (79.1% TCC) and GA-LDA (88.4% TCC) but fell short of PLS-DA, which reached a TCC of 98%. For the other datasets, BA-LDA performed comparably to the classical algorithms. BA-LDA outperformed SIMCA in all case studies. It also proved less sensitive to noise added to the spectra of the test samples from the simulated dataset. Although BA-LDA is stochastic, its main distinguishing feature was the convergence and robustness it showed across all datasets, where the selected variables allowed a reliable chemical interpretation.
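The abstract states that the BA-LDA parameters were tuned for each dataset with a 2^(4-1) fractional factorial design, that is, four two-level factors studied in 2^3 = 8 runs instead of 16. A minimal sketch of such a design follows, assuming the common generator D = ABC (defining relation I = ABCD). The abstract does not say which four parameters were varied or at which levels, so the factor names and levels in the hypothetical `FACTORS` dictionary are illustrative only.

```python
import math
from itertools import product

# Hypothetical factors and (low, high) levels; not taken from the thesis.
FACTORS = {
    "n_bats": (10, 30),
    "n_iter": (50, 200),
    "loudness": (0.5, 0.9),
    "pulse_rate": (0.3, 0.7),
}


def design_2_4_1():
    """Coded runs (A, B, C, D) of a 2^(4-1) design with generator D = ABC."""
    # product() varies the last element fastest, so unpack as (c, b, a)
    # to obtain standard order with A changing fastest.
    return [(a, b, c, a * b * c) for c, b, a in product((-1, 1), repeat=3)]


def decode(run):
    """Map coded levels -1/+1 to the actual (hypothetical) factor settings."""
    return {
        name: low + (x + 1) / 2 * (high - low)
        for x, (name, (low, high)) in zip(run, FACTORS.items())
    }


for run in design_2_4_1():
    print(run, decode(run))
```

With D aliased to ABC, the design has resolution IV: main effects are not confounded with two-factor interactions, but two-factor interactions are aliased in pairs (AB with CD, AC with BD, AD with BC).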