Detalhes bibliográficos
Ano de defesa: |
2023 |
Autor(a) principal: |
Trajano, Douglas de Oliveira
 |
Orientador(a): |
Bordini, Rafael Heitor
 |
Banca de defesa: |
Não Informado pela instituição |
Tipo de documento: |
Dissertação
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Pontifícia Universidade Católica do Rio Grande do Sul
|
Programa de Pós-Graduação: |
Programa de Pós-Graduação em Ciência da Computação
|
Departamento: |
Escola Politécnica
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
https://tede2.pucrs.br/tede2/handle/tede/10782
|
Resumo: |
The advent of social media has transformed the way in which individuals and communities interact and communicate. However, the messages on social media may contain expressions of opinion, and support messages, but they can also hate speech. The proliferation of hate speech in the digital sphere has become an increasingly pressing issue, with polarized opinions and a sense of anonymity and impunity among users often serving as contributing factors. The haters, users who spread hate speech, can be found in a variety of topics, including political discussions, entertainment, gaming, and corporate environments. The Natural Language Processing (NLP) area can contribute with tools to ensure healthy communication and protect users’ rights online. NLP applications are efficient, standardized, and automated, eliminating the need for manual moderation of such content. In this study, we used advanced machine learning and deep learning techniques to develop a toxic language detection system in Portuguese messages. The dataset used for training the models consists of 6,354 (with the possibility of extending to 13,538) comments manually annotated by experts. This dataset, made available as part of the work, has annotations for 5 NLP tasks, using a hierarchical annotation scheme with different levels of granularity. The results of the experiments demonstrate the usefulness of this dataset for the development of NLP systems aimed at detecting toxic language in texts in Portuguese. |