SELEÇÃO DE VARIÁVEIS NA MINERAÇÃO DE DADOS AGRÍCOLAS:Uma abordagem baseada em análise de componentes principais

Detalhes bibliográficos
Ano de defesa: 2012
Autor(a) principal: Jr., Juscelino Izidoro de Oliveira
Orientador(a): Rocha, Jose Carlos Ferreira da lattes
Banca de defesa: Mathias, Ivo Mario lattes, Kikuti, Daniel lattes
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: UNIVERSIDADE ESTADUAL DE PONTA GROSSA
Programa de Pós-Graduação: Programa de Pós Graduação Computação Aplicada
Departamento: Computação para Tecnologias em Agricultura
País: BR
Palavras-chave em Português:
Palavras-chave em Inglês:
Área do conhecimento CNPq:
Link de acesso: http://tede2.uepg.br/jspui/handle/prefix/152
Resumo: Multivariate data analysis allows the researcher to verify the interaction among a lot of attributes that can influence the behavior of a response variable. That analysis uses models that can be induced from experimental data set. An important issue in the induction of multivariate regressors and classifers is the sample size, because this determines the reliability of the model for tasks of regression or classification of the response variable. This work approachs the sample size issue through the Theory of Probably Approximately Correct Learning, that comes from problems about machine learning for induction of models. Given the importance of agricultural modelling, this work shows two procedures to select variables. Variable Selection by Principal Component Analysis is an unsupervised procedure and allows the researcher to select the most relevant variables from the agricultural data by considering the variation in the data. Variable Selection by Supervised Principal Component Analysis is a supervised procedure and allows the researcher to perform the same process as in the previous procedure, but concentrating the focus of the selection over the variables with more influence in the behavior of the response variable. Both procedures allow the sample complexity informations to be explored in variable selection process. Those procedures were tested in five experiments, showing that the supervised procedure has allowed to induce models that produced better scores, by mean, than that models induced over variables selected by unsupervised procedure. Those experiments also allowed to verify that the variables selected by the unsupervised and supervised procedure showed reduced indices of multicolinearity.