Analysis of the prediction of child violence using decision trees and association rules

Bibliographic details
Year of defense: 2020
Main author: Silva Osses, Aníbal Tomás
Advisor: Fernandes, Ricardo Augusto Souza
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese
Institution: Universidade Federal de São Carlos
Campus: São Carlos
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC (Graduate Program in Computer Science)
Department: Not provided by the institution
Country: Not provided by the institution
Access link: https://repositorio.ufscar.br/handle/20.500.14289/12915
Abstract: According to the United Nations International Children's Emergency Fund (UNICEF), around 300 million children worldwide currently suffer from various types of abuse, including psychological, physical, and sexual abuse and neglect. Beyond the severity of the problem, its analysis is hampered by factors such as: the use of different definitions of violence in different contexts, with no common standard to ease the study; the difficulty of determining which type of abuse is occurring in each case, since more than one may be present; and the fact that victims' reports and official abuse statistics do not always have the expected quality, among others. Data collection is further hindered because, most of the time, the only people aware of the abuse are the children and their aggressors, which renders the situation invisible and makes prevention much more difficult. The aim of this work is to generate and evaluate machine learning models, represented by rules that humans can easily understand, that estimate in which cases child abuse is occurring or could occur. The research follows an ex post facto method based on two structured datasets built by Chilean organizations, both containing numeric and categorical attributes: one suited to supervised learning (labeled) and the other to unsupervised learning (unlabeled). Feature selection techniques were applied to retain the most relevant attributes, and the C4.5 and Apriori algorithms were then applied to the labeled and unlabeled datasets, respectively. The C4.5 models were evaluated with the areas under the receiver operating characteristic (ROC) and precision-recall curves, and the Apriori association rules with the lift, conviction, and leverage metrics.
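The abstract names C4.5 without describing how it grows a tree. C4.5 chooses each split by gain ratio: the information gain of the split divided by its split information, which penalizes attributes with many values. The following is a minimal sketch of that criterion for categorical attributes only, assuming records stored as Python dictionaries. The function names, attribute names, and toy records are hypothetical illustrations, not the dissertation's code or data, and C4.5's numeric-threshold splits, pruning, and missing-value handling are omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on one categorical attribute (C4.5 criterion).

    rows:   list of dicts mapping attribute name -> categorical value
    labels: class label of each row, in the same order
    attr:   name of the attribute to evaluate
    """
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    weights = [len(p) / total for p in partitions.values()]
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(w * entropy(p) for w, p in zip(weights, partitions.values()))
    gain = entropy(labels) - remainder
    # Split information grows with the number of branches, penalizing
    # attributes that fragment the data into many small subsets.
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records; attribute names and values are invented.
rows = [
    {"prior_report": "yes", "household_size": "small"},
    {"prior_report": "yes", "household_size": "large"},
    {"prior_report": "no", "household_size": "small"},
    {"prior_report": "no", "household_size": "large"},
]
labels = ["abuse", "abuse", "no_abuse", "no_abuse"]

print(gain_ratio(rows, labels, "prior_report"))    # → 1.0
print(gain_ratio(rows, labels, "household_size"))  # → 0.0
```

In the toy data, the first attribute separates the two classes perfectly and gets the maximum ratio, while the second leaves each branch as mixed as the original set and gets zero; C4.5 would split on the attribute with the highest ratio.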
Regarding the results, the classification technique produced models with values close to 0.9 on each metric; for the association rules, in every run the rules found had values above the thresholds that indicate an implication between their antecedents and consequents.
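The rule metrics named in the abstract can be computed from itemset supports. The sketch below uses the usual definitions for a rule from antecedent to consequent: lift is confidence divided by the consequent's support; leverage is the joint support minus the product of the two individual supports; conviction is (1 - consequent support) divided by (1 - confidence). It assumes transactions represented as Python sets of invented items, and the function names are hypothetical. Reading lift > 1, leverage > 0, and conviction > 1 as the thresholds mentioned above is an assumption, not something the abstract states.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift, leverage and conviction of antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_ac = support(transactions, antecedent | consequent)
    confidence = s_ac / s_a
    return {
        "support": s_ac,
        "confidence": confidence,
        # > 1: the itemsets co-occur more often than under independence.
        "lift": confidence / s_c,
        # > 0: the same reading, expressed as a difference of probabilities.
        "leverage": s_ac - s_a * s_c,
        # > 1: the rule is violated less often than independence predicts;
        # by convention it is infinite for a rule that is never violated.
        "conviction": float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence),
    }


def indicates_dependence(m):
    """Assumed thresholds for a positive antecedent-consequent dependence."""
    return m["lift"] > 1 and m["leverage"] > 0 and m["conviction"] > 1


# Invented transactions; each set lists the items present in one record.
transactions = [
    {"item_a", "item_b"},
    {"item_a", "item_b"},
    {"item_a", "item_b"},
    {"item_a", "item_c"},
    {"item_b", "item_c"},
    {"item_c"},
    {"item_c"},
    {"item_d"},
]

m = rule_metrics(transactions, {"item_a"}, {"item_b"})
print(m)
# → {'support': 0.375, 'confidence': 0.75, 'lift': 1.5, 'leverage': 0.125, 'conviction': 2.0}
print(indicates_dependence(m))  # → True
```

Under these definitions the three checks agree on direction: each holds exactly when the rule's confidence exceeds the support of its consequent (leaving aside a consequent present in every transaction), so the metrics differ in scale rather than in which rules they accept.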