Análise de sentimento multiclasse: uma abordagem com o uso de aprendizado de máquina

Detalhes bibliográficos
Ano de defesa: 2019
Autor(a) principal: Santos, Allisfrank dos
Orientador(a): Camargo, Heloisa de Arruda lattes
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal de São Carlos
Câmpus São Carlos
Programa de Pós-Graduação: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Palavras-chave em Inglês:
Área do conhecimento CNPq:
Link de acesso: https://repositorio.ufscar.br/handle/20.500.14289/20474
Resumo: In the globalized world, the analysis of data generated from the most varied sources, especially the textual ones, has become of great importance for the acquisition of knowledge and information. In this respect, the Internet and social networks make up the main textual database. The Sentiment Analysis is a form of data mining in text format, and the purpose of this type of analysis is to identify and / or analyze users' opinions about an entity or about sentiment related to various topics. Several researchers have used the Sentiment Analysis to understand user behavior through polarity, which can be separated into two or three classes. However, the challenge ahead is to find ways that go beyond the traditional classification and achieve a more real analysis of the expressed feelings, exploring t he idea of multiclass analysis (through emotional classes). Based on these facts, this paper aims to study aspects of the Sentiment Analysis related to the number of classes of emotions to be analyzed, as well as the representation form of the texts to be submitted for classification. For this, classic Machine Learning algorithms (SVM, kNN and Naive Bayes) as well as vectorization techniques such as TF - IDF and Word2Vec were used. The results show that a reduced number of classes allied to the use of Word2Vec as a textual representation method improves the classification result, especially with the use of the SVM classifier, obtaining an accuracy of 58.8% for the emotional base and 68.6% for the basis of polarity.