Bibliographic details
Year of defense: 2016
Main author: Motta, Porthos Ribeiro de Albuquerque
Advisor: Ambrósio, Ana Paula Laboissière
Defense committee: Ambrósio, Ana Paula Laboissière; Soares, Anderson da Silva; Almeida, Leandro da Silva
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de Goiás
Graduate program: Programa de Pós-graduação em Ciência da Computação (INF)
Department: Instituto de Informática - INF (RG)
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: http://repositorio.bc.ufg.br/tede/handle/tede/6563
Abstract:
Educational Data Mining, guided by the triad of quality improvement, cost reduction, and educational effectiveness, seeks to better understand the teaching and learning process. In this context, this work presents an exploratory study of classification methods for predicting student performance and dropout from data in university academic databases. We used demographic, socioeconomic, and academic data, obtained from the Vestibular (the university entrance exam) and the university database, to analyze several classification techniques, as well as balancing and attribute selection techniques, all identified through a systematic review of the literature. Following a trend found in the selected articles, we chose decision trees as the primary classification algorithm, although comparative studies showed better results with logistic regression and Bayesian networks. The reason is that decision trees allow the attributes used in the generated models to be analyzed while maintaining acceptable accuracy, whereas the other techniques work as black boxes. The tests showed that balanced datasets yield better results. Here, the Resample technique, which selects a balanced subset of the data, performed better than the SMOTE technique, which generates synthetic data to balance the dataset. Attribute selection techniques did not bring significant advantages. Among the attributes used, grades and economic factors often appear as nodes in the generated models. An attempt to predict performance in each subject based on data from previous courses was less successful, possibly because of the use of ternary predictive classes. Nevertheless, the analysis showed that classifiers are a promising way to predict performance and dropout, although further studies are still needed.
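The difference between the two balancing strategies mentioned in the abstract can be sketched in plain Python. This is an illustrative sketch only, not the filters used in the study: `balanced_subsample` and `smote_like` are hypothetical helper names, the toy data is invented, and the sketch assumes the simplest form of subset balancing (sampling every class down to the size of the rarest one).

```python
import random
from collections import Counter, defaultdict


def balanced_subsample(rows, labels, seed=0):
    """Pick a class-balanced subset by sampling every class down to the rarest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):  # only real records are kept
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority neighbours of base, by squared Euclidean distance
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base))
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Hypothetical toy data: (entrance grade, income bracket) -> outcome
rows = [(7.5, 3), (8.0, 4), (6.5, 2), (9.0, 5), (7.0, 3), (8.5, 4), (4.0, 1), (5.0, 2)]
labels = ["stay"] * 6 + ["dropout"] * 2

sub_rows, sub_labels = balanced_subsample(rows, labels)
minority = [r for r, l in zip(rows, labels) if l == "dropout"]
new_points = smote_like(minority, n_new=4, k=1)
smote_labels = labels + ["dropout"] * len(new_points)

print(Counter(sub_labels))
print(Counter(smote_labels))
```

The subsample contains only records that exist in the original data, while the SMOTE-style step adds interpolated records that no student actually produced. Interpolation also turns the discrete income bracket into a fractional value here, which is one reason synthetic balancing can behave differently on mixed demographic data.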