Analysis of child violence prediction using decision trees and association rules
Year of defense: | 2020 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal de São Carlos, Câmpus São Carlos |
Graduate program: | Programa de Pós-Graduação em Ciência da Computação - PPGCC |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Keywords in English: | |
Keywords in Spanish: | |
CNPq knowledge area: | |
Access link: | https://repositorio.ufscar.br/handle/20.500.14289/12915 |
Abstract: | According to the United Nations International Children's Emergency Fund (UNICEF), around 300 million children worldwide currently suffer some form of abuse, whether psychological, physical, sexual, or neglect. Beyond the severity of the problem, several factors make it hard to analyze: definitions of violence differ across contexts, with no standard that would ease study; it is difficult to determine which type of abuse is occurring in each case, since more than one may be present; and victim reports and official abuse statistics are not always of the expected quality, among others. Data collection is further complicated by the fact that, most of the time, the only people aware of the abuse are the children and their aggressors, which renders the situation invisible and makes prevention much more difficult. The aim of this work is to generate and evaluate models based on machine learning techniques that can estimate in which cases child abuse is happening or could happen, using models represented by rules that humans can easily understand. The scientific method used in this project is ex post facto, based on two structured datasets, one supervised and the other unsupervised, both built by Chilean organizations and containing numeric and categorical attributes. Feature selection techniques were applied to retain the most relevant attributes, and the C4.5 and Apriori algorithms were then applied to the supervised and unsupervised datasets, respectively. The C4.5 models were evaluated with the areas under the receiver operating characteristic and precision-recall curves, and the Apriori rules with the lift, conviction, and leverage metrics.
As for the results, the classification models reached performance close to 0.9 on each metric; for the association rules, in every run the rules found had values above the thresholds that define an implication between their antecedents and consequents. |
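
The three association-rule metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of lift, leverage, and conviction; it is not the dissertation's code. The function name `rule_metrics` and the toy transactions with items "a", "b", and "c" are hypothetical and do not come from the Chilean datasets. Under these definitions, a rule suggests a positive association when lift > 1, leverage > 0, and conviction > 1, which is presumably the kind of threshold the abstract refers to.

```python
import math


def rule_metrics(transactions, antecedent, consequent):
    """Compute standard quality metrics for the rule antecedent -> consequent.

    transactions: list of sets of items (hypothetical toy data).
    antecedent, consequent: sets of items.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions given")

    # Relative supports: fraction of transactions containing each itemset.
    p_a = sum(antecedent <= t for t in transactions) / n
    p_b = sum(consequent <= t for t in transactions) / n
    p_ab = sum((antecedent | consequent) <= t for t in transactions) / n
    if p_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")

    confidence = p_ab / p_a
    # Lift: confidence relative to the baseline frequency of the consequent.
    lift = confidence / p_b if p_b > 0 else math.inf
    # Leverage: observed co-occurrence minus what independence would predict.
    leverage = p_ab - p_a * p_b
    # Conviction: how much more often the rule would fail under independence.
    conviction = math.inf if confidence == 1 else (1 - p_b) / (1 - confidence)

    return {
        "support": p_ab,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Hypothetical toy transactions, for illustration only.
TOY = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}]

if __name__ == "__main__":
    for name, value in rule_metrics(TOY, {"a"}, {"b"}).items():
        print(f"{name}: {value:.4f}")
```

For the toy rule {a} -> {b}, all three metrics fall slightly above their independence baselines (1 for lift and conviction, 0 for leverage), so the rule would count as a weak positive association.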