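Natural language processing applied to the identification of semantic patterns in reports by women victims of domestic and family violence (original title: Processamento de linguagem natural aplicado à identificação de padrões semânticos em relatos de mulheres vítimas de violência doméstica e familiar)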

Bibliographic details
Year of defense: 2023
Lead author: Foroni, Deborah Quenia Gouveia
Advisor: Belan, Peterson Adriano
Examination committee: Belan, Peterson Adriano; Martins, Fellipe Silva; Sassi, Renato José
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Nove de Julho
Graduate program: Programa de Pós-Graduação em Informática e Gestão do Conhecimento
Department: Informática
Country: Brazil
Access link: http://bibliotecatede.uninove.br/handle/tede/3275
Abstract: Domestic and family violence (DFV) against women is still treated as a taboo subject by many people. This directly affects the data collection cycle, because the professionals involved often feel neither prepared for nor responsible for collecting such data. Meanwhile, millions of people use social media to share their everyday experiences, including victims' accounts of the violence they have suffered. The volume of data generated on social media keeps growing and exceeds what human beings can process manually. In this context, Machine Learning (ML), Natural Language Processing (NLP), and topic modeling techniques can help identify patterns and semantic characteristics in reports published by women victims of DFV. This research aims to identify representative themes in reports from women victims of DFV so that they can serve as tools for addressing the problem, while recognizing how difficult it is to capture all the nuances and complexities of this phenomenon. To this end, a methodology was developed to identify the topics that emerge from spontaneous, unstructured reports by women victims of DFV collected from the YouTube platform. Three experiments were conducted using the PCA, UMAP, K-means, and HDBSCAN algorithms. The experiment that combined UMAP and HDBSCAN demonstrated the feasibility of using NLP to identify semantic patterns in the topics that emerged from the collected dataset. As a result, 27 topics were identified that offer a better semantic representation for interpreting patterns in reports from women victims of DFV.