Detecção de plágio de paráfrase utilizando as características do texto
Ano de defesa: | 2019 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso aberto |
Idioma: | por |
Instituição de defesa: |
Universidade Federal do Rio de Janeiro
Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia de Sistemas e Computação UFRJ |
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Link de acesso: | http://hdl.handle.net/11422/14051 |
Resumo: | [EN] Plagiarism is the improper adoption of abstract or concrete artifacts such as: texts, artwork, ideas or intentions without proper reference to their original author. The ways to commit plagiarism, there is paraphrase plagiarism, which occurs through manipulations in the document text trying to obscure its real source. For the identification of plagiarism, we use the External Plagiarism Detection System (EPDS) framework, which contains the detailed analysis task, where, given a suspicious document, it should identify whether or not plagiarism when compared to the set of document source. The objective of the research is to perform the detailed analysis task in order to, with the lexical, syntactic, semantic and structural characteristics of the text, assist in the identification of paraphrase plagiarism between documents. For this, it is believed that when the document is fully represented, taking into consideration its organization, tree structures contribute to the identification of paraphrase plagiarism from the simplest to the most complex type. For this task, it was proposed to use Rhetorical Structure Theory and Part-of-Speech Tagging to represent document characteristics along with Recursive Autoencoder and Dynamic Pooling to detect cases of paraphrase plagiarism in documents. During the experiments, the proposed approaches obtained between 83% and 89% accuracy in the paraphrase plagiarism data set. |