Paraphrase plagiarism detection using text characteristics

Bibliographic details
Year of defense: 2019
Main author: Silva, Egberto Caetano Araujo da
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal do Rio de Janeiro
Brazil
Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia
Programa de Pós-Graduação em Engenharia de Sistemas e Computação
UFRJ
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
RST
RAE
Access link: http://hdl.handle.net/11422/14051
Abstract: Plagiarism is the improper appropriation of abstract or concrete artifacts, such as texts, artwork, ideas, or intentions, without proper attribution to their original author. One form of plagiarism is paraphrase plagiarism, in which the text of a document is manipulated to obscure its real source. To identify plagiarism, we use the External Plagiarism Detection System (EPDS) framework, which includes a detailed analysis task: given a suspicious document, determine whether it contains plagiarism when compared against a set of source documents. The objective of this research is to carry out the detailed analysis task using the lexical, syntactic, semantic, and structural characteristics of the text to help identify paraphrase plagiarism between documents. The underlying hypothesis is that fully representing a document, including its organization as tree structures, helps identify paraphrase plagiarism from the simplest to the most complex types. To this end, we propose using Rhetorical Structure Theory (RST) and Part-of-Speech (POS) Tagging to represent the document's characteristics, together with a Recursive Autoencoder (RAE) and Dynamic Pooling to detect cases of paraphrase plagiarism. In the experiments, the proposed approaches achieved between 83% and 89% accuracy on the paraphrase plagiarism dataset.
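The abstract combines a Recursive Autoencoder (RAE) with Dynamic Pooling. As a rough illustration of how these two pieces can fit together, and not the dissertation's implementation, the Python sketch below encodes every node of a binary tree with the forward pass of an untrained RAE, builds the matrix of Euclidean distances between all node vectors of two sentences, and min-pools that variable-size matrix into a fixed-size grid that a classifier could consume. Everything here is an assumption for illustration: the toy word embeddings (`embed`), random weights without bias terms, the hand-written trees standing in for trees derived from RST or POS tagging, the grid size of 3, and all names (`RecursiveAutoencoder`, `distance_matrix`, `dynamic_pool`, `normalize`). Training and the final classifier are omitted.

```python
import math
import random

DIM = 4  # hypothetical embedding size, for illustration only


def embed(word):
    """Toy word embedding: a deterministic random vector per word (assumption)."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class RecursiveAutoencoder:
    """Forward pass of an untrained RAE (random weights, no biases)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(2 * dim)]

    def encode(self, c1, c2):
        # parent = tanh(W_enc [c1; c2])
        return [math.tanh(x) for x in matvec(self.w_enc, c1 + c2)]

    def reconstruction_error(self, c1, c2):
        # Training would minimize this over all parent nodes; it is not done here.
        recon = matvec(self.w_dec, self.encode(c1, c2))
        return sum((r - x) ** 2 for r, x in zip(recon, c1 + c2))

    def node_vectors(self, tree, embed_fn):
        """Vectors of all nodes in post-order; a tree is a word (str) or a (left, right) tuple."""
        nodes = []

        def visit(t):
            if isinstance(t, str):
                vec = embed_fn(t)
            else:
                vec = self.encode(visit(t[0]), visit(t[1]))
            nodes.append(vec)
            return vec

        visit(tree)
        return nodes


def distance_matrix(nodes_a, nodes_b):
    """Euclidean distance between every node of sentence A and every node of sentence B."""
    return [[math.dist(a, b) for b in nodes_b] for a in nodes_a]


def dynamic_pool(matrix, size):
    """Min-pool a variable-size matrix into a fixed size x size grid."""
    # Duplicate rows/columns until the matrix is at least size x size.
    while len(matrix) < size:
        matrix = [r for row in matrix for r in (row, row)]
    while len(matrix[0]) < size:
        matrix = [[x for v in row for x in (v, v)] for row in matrix]

    def bounds(n):
        # Split n indices into `size` contiguous, non-empty chunks.
        return [(i * n // size, (i + 1) * n // size) for i in range(size)]

    return [[min(matrix[r][c] for r in range(r0, r1) for c in range(c0, c1))
             for c0, c1 in bounds(len(matrix[0]))]
            for r0, r1 in bounds(len(matrix))]


def normalize(grid):
    """Rescale the pooled grid to mean 0 and standard deviation 1."""
    flat = [x for row in grid for x in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((x - mean) ** 2 for x in flat) / len(flat)) or 1.0
    return [[(x - mean) / std for x in row] for row in grid]


# Usage with two hand-written binary trees (hypothetical stand-ins for parsed sentences).
rae = RecursiveAutoencoder(DIM)
nodes_a = rae.node_vectors((("the", "cat"), ("sat", "down")), embed)  # 4 leaves + 3 parents
nodes_b = rae.node_vectors(("a", ("cat", "sat")), embed)              # 3 leaves + 2 parents
pooled = normalize(dynamic_pool(distance_matrix(nodes_a, nodes_b), size=3))
```

Min pooling keeps the smallest distance in each region, so each grid cell records whether some pair of words or phrases in that region is close, independent of sentence length; rows and columns are duplicated first when a sentence has fewer nodes than the grid size.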