Hate speech detection in social networks using deep learning (original title: Detecção de discurso de ódio em redes sociais utilizando deep learning)

Bibliographic details
Year of defense: 2021
Main author: Venturott, Lígia Iunes
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: por
Defending institution: Universidade Federal do Espírito Santo
BR
Mestrado em Engenharia Elétrica
Centro Tecnológico
UFES
Programa de Pós-Graduação em Engenharia Elétrica
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.ufes.br/handle/10/15540
Abstract: In the last decade, online social networks have expanded rapidly. The main goal of these platforms is to allow communication between people from different backgrounds, religions, cultures and countries. However, this new form of contact, combined with the feeling of anonymity and impunity in the digital environment, has turned social networks into a favorable environment for disseminating hate speech, such as xenophobia, racism, sexism, and homophobia. Most platforms, such as Twitter and Facebook, explicitly forbid this kind of behaviour. However, the large volume of daily posts makes manually detecting hate speech an almost impossible task. In this context, there is a need for automatic hate speech detection tools for social networks, but most works focus on detecting hateful content in English. This work develops a method for detecting hate speech in social networks focused on Portuguese, using deep neural networks as the main resource. To that end, we first identified the main issues regarding hate speech detection in Portuguese and observed a lack of labeled datasets for hate speech and offensive language in Portuguese. The few existing datasets contain few documents, which makes applying deep learning techniques difficult. To mitigate this problem, we propose using data augmentation techniques. Three techniques were selected from the literature and applied in different scenarios in order to identify the cases in which they are most beneficial. We conclude that the selected data augmentation techniques can be helpful when applied to very small datasets, ranging from 1,000 to 2,000 documents.