A practical methodology for testing the cost-complexity criterion when pruning regression trees

Bibliographic details
Year of defense: 2022
Main author: Heitor Blesa Farias
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Degree-granting institution: Universidade Federal de Minas Gerais
Brasil
FAF - DEPARTAMENTO DE PSICOLOGIA
Programa de Pós-graduação em Psicologia: Cognição e Comportamento
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/50516
https://orcid.org/0000-0002-8090-4012
Abstract: Tree-based methods are a well-established approach in machine learning. Most tree algorithms grow an initial tree to predict a given outcome and then "prune" it to reduce overfitting. The cost-complexity criterion is probably the most widely used pruning criterion, because it offers an objective way to identify the point in the tree at which prediction is best, taking into account the model's predictive ability in other samples from the population. Researchers have used this criterion to prune regression trees on the strength of recommendations in the literature. However, there is no methodology that lets a researcher assess whether the criterion actually yields an adequately pruned empirical tree, that is, a tree that does not overfit and that achieves the best possible prediction for other samples from the population. Given the relevance of regression tree techniques for prediction and the need to prune these trees to deal with overfitting, a methodology is needed that allows researchers to assess whether the cost-complexity criterion is adequate, taking their own pruned empirical tree as the reference. This dissertation aimed to develop a practical methodology for evaluating the adequacy of the cost-complexity criterion for pruning regression trees. The dissertation consists of two articles. Study one is a simulation that presents initial evidence that the cost-complexity criterion is sensitive to sample size and, depending on the sample size, generates inadequately pruned trees. Because of this, it is necessary to test whether pruning via the cost-complexity criterion is adequate for a given empirical dataset. Study two presents the problem with the cost-complexity criterion in a didactic way, along with the methodology developed to verify the criterion's adequacy. It also presents an example of how to implement the methodology and an evaluation of the methodology via simulation.
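For readers unfamiliar with the criterion named in the abstract, the following is a minimal sketch of how cost-complexity selection is commonly defined: each candidate subtree is scored as its training error plus a penalty `alpha` times its number of leaves, and the subtree with the lowest score is kept. This is not the dissertation's methodology. The names `Subtree`, `cost_complexity`, and `select_subtree` are hypothetical, and the candidate values are invented purely for illustration.

```python
from typing import List, NamedTuple


class Subtree(NamedTuple):
    """One candidate in a nested sequence of pruned subtrees (hypothetical)."""
    n_leaves: int      # number of terminal nodes
    train_sse: float   # sum of squared errors on the training sample


def cost_complexity(tree: Subtree, alpha: float) -> float:
    """Penalized error: training error plus alpha per leaf."""
    return tree.train_sse + alpha * tree.n_leaves


def select_subtree(candidates: List[Subtree], alpha: float) -> Subtree:
    """Return the candidate with the smallest penalized error."""
    return min(candidates, key=lambda t: cost_complexity(t, alpha))


# Invented values for illustration only: training error falls as leaves are added.
candidates = [
    Subtree(n_leaves=1, train_sse=100.0),
    Subtree(n_leaves=2, train_sse=60.0),
    Subtree(n_leaves=4, train_sse=40.0),
    Subtree(n_leaves=8, train_sse=34.0),
]

for alpha in (0.0, 5.0, 25.0, 50.0):
    chosen = select_subtree(candidates, alpha)
    print(f"alpha={alpha:>5}: keep the subtree with {chosen.n_leaves} leaves")
```

With `alpha = 0` the largest tree is always chosen, and larger values of `alpha` favor smaller trees. In practice `alpha` is usually tuned by cross-validation, and that tuning step is where sample size can plausibly influence how much pruning is applied.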