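Comparing classification models for customer comments: opinion mining in a Brazilian online job classifieds company (original title: Comparando modelos de classificação dos comentários de clientes: mineração de opiniões em empresa brasileira de classificados online de empregos)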

Bibliographic details
Year of defense: 2014
Main author: Miranda, Marcelo Drudi
Advisor: Sassi, Renato José
Examination committee: Chaves, Marcírio Silveira; Santana, José Carlos Curvelo
Document type: Master's dissertation
Access type: Open access
Language: Portuguese
Defending institution: Universidade Nove de Julho
Graduate program: Master's and Doctoral Graduate Program in Production Engineering
Department: Engineering
Country: Brazil
Access link: http://bibliotecatede.uninove.br/tede/handle/tede/225
Abstract: The Internet is now part of everyday life, which has enabled the growth of many online service companies. To sustain their activities and remain in the market, these companies must attend to the quality of the services they provide, which makes it important to assess client satisfaction with those services. One way to assess clients' sentiment is Opinion Mining, the set of techniques used to extract and assess the sentiment expressed in texts. The goal of this work was to compare classification models for clients' comments for Opinion Mining in a Brazilian online job search company. Three models were implemented: one based on a commercial software product named Repustate, one dictionary-based, and one based on Naive Bayes. The models were applied to a database of non-structured comments written by clients in Portuguese and captured in a service cancellation form. A non-structured comment contains typos and grammatical agreement errors and can be almost unintelligible. Classifying non-structured comments in Portuguese is a complex task for a classifier for two reasons: the errors in the comments and the scarcity of classification models for Portuguese comments that could serve as examples. These reasons make the models developed in this work relevant to the research field of Sentiment Analysis and Opinion Mining. The performance of the classification models was evaluated using the Kappa coefficient and the confusion matrix. The classifications produced by the three models were compared with those made by human judges, whose agreement with one another was in turn assessed using Kappa and the confusion matrix. The non-structured nature of the comments caused divergence among the judges' classifications and also among the classification models. The agreement between the classifiers and the judges was moderate at best, with the best performance achieved by the Naive Bayes based classifier. The models were applied to the database and, despite the characteristics of the comments, Opinion Mining was performed. The conclusion is that the performance of the classifiers in Opinion Mining at a Brazilian online job search company was positive and that the goal of this work was reached. It is worth noting that Opinion Mining in non-structured comments in Portuguese is a complex task that demands research, and this scenario is open to new studies.
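As an illustration of the evaluation described in the abstract, the following is a minimal sketch of how a confusion matrix and a Kappa coefficient can be computed for one classifier against one human judge. It is not taken from the dissertation. It assumes Cohen's kappa for two raters (the abstract says only "Kappa"), a three-class label set, and toy data; the names `LABELS`, `confusion_matrix`, `cohen_kappa`, `judge`, and `model` are hypothetical.

```python
# Assumed label set; the abstract does not state which classes were used.
LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(reference, predicted, labels=LABELS):
    """Count label pairs: rows are reference (judge) labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def cohen_kappa(matrix):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (invented for illustration, not from the dissertation's database).
judge = ["negative", "negative", "negative", "negative", "positive",
         "positive", "positive", "neutral", "neutral", "neutral"]
model = ["negative", "negative", "negative", "positive", "positive",
         "positive", "neutral", "neutral", "neutral", "negative"]

matrix = confusion_matrix(judge, model)
kappa = cohen_kappa(matrix)
print(matrix)
print(round(kappa, 3))
```

On this toy data the raw agreement is 7 out of 10, while kappa is lower because it discounts the agreement expected by chance. Under the commonly cited Landis and Koch scale, values between 0.41 and 0.60 are read as "moderate" agreement; whether the dissertation uses that scale is an assumption here.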