SOM4SImD: an ontology-based semantic method for detecting similarity between documents

Bibliographic details
Year of defense: 2017
Main author: Arruda, Claudineia Gonçalves de
Advisor: Santos, Marilde Terezinha Prado
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal de São Carlos
São Carlos Campus
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/8961
Abstract: Interviews are a widely used means of collecting data in many research areas. They are usually spread across several documents and written in informal language, since they record conversations among several people at once. Analyzing such documents is arduous and time-consuming, and the resulting fatigue makes a correct analysis harder. One solution is to group the documents by similarity, so that experts can analyze documents on similar subjects more quickly. This work presents SOM4SImD, a method for detecting semantic similarity between documents composed of interviews written in informal Brazilian Portuguese. The method relies on an ontology of the same domain as the documents: the formal terms of the ontology, together with their synonyms and variants, are used to semantically annotate the documents and to compute the similarity between pairs of interviews. Building on the method, the SimIGroup approach was developed to assist researchers in the qualitative analysis of the documents using the Coding technique. The results show that the SOM4SImD method and the SimIGroup approach reduce the difficulty and fatigue annotators experience when analyzing the documents, helping to increase the number of documents analyzed. In addition, SOM4SImD proved more advantageous for obtaining similarity between documents than the other methods found in the literature, reaching 0.96 accuracy, 0.93 recall, and 0.94 F-measure.