Automatic discovery of multiword expressions from parallel texts

Bibliographic details
Year of defense: 2018
Main author: Vargas, Natalie Lourenço
Advisor: Caseli, Helena de Medeiros
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: por
Defending institution: Universidade Federal de São Carlos
Câmpus São Carlos
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
EM
Keywords in English:
MWE
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/10836
Abstract: Multiword expressions (MWEs) are a current challenge for the field of Natural Language Processing, and different automatic methods have been proposed to treat and discover them. In this work we propose two new methods for bilingual discovery in parallel texts, implemented as the Bilingual Discovery MWE Toolkit (BiDiMWEToolkit). The proposed methods build on similar ideas from related work and use bilingual word embeddings to find the best translations of the automatically discovered MWEs. In the first method, source and target MWEs are extracted separately using predefined morphosyntactic patterns and are then paired based on bilingual word embeddings. In the second method, only source MWEs are extracted, and their best translations are determined using bilingual word embeddings. From the experiments presented, we conclude that both methods are capable of bilingual discovery, but the second method proved to be more complete than the first: (1) it can generate translations without target MWEs, so no prior knowledge of the target language is needed, and (2) it can generate single-word translations, covering cases in which the translation of an MWE is not itself an expression.
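The two strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the BiDiMWEToolkit implementation: the toy embedding values, the function names (`phrase_vector`, `pair_mwes`, `nearest_words`), the choice to represent an MWE by the average of its word vectors, and the use of cosine similarity are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

# Toy bilingual embeddings in one shared space.
# The values are invented for illustration; real vectors would be trained.
EMB = {
    "pt": {
        "rede": np.array([1.0, 0.0, 0.0, 0.0]),
        "neural": np.array([0.0, 1.0, 0.0, 0.0]),
        "caixa": np.array([0.0, 0.0, 1.0, 0.0]),
        "postal": np.array([0.0, 0.0, 0.0, 1.0]),
    },
    "en": {
        "network": np.array([0.9, 0.1, 0.0, 0.0]),
        "neural": np.array([0.1, 0.9, 0.0, 0.0]),
        "box": np.array([0.0, 0.0, 0.9, 0.1]),
        "post": np.array([0.0, 0.0, 0.1, 0.9]),
        "mailbox": np.array([0.0, 0.0, 0.7, 0.7]),
    },
}


def phrase_vector(tokens, lang):
    """Represent a phrase as the average of its word vectors (assumed rule)."""
    return np.mean([EMB[lang][t] for t in tokens], axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def pair_mwes(source_mwes, target_mwes):
    """First strategy: pair each source MWE with the most similar target MWE."""
    pairs = {}
    for src in source_mwes:
        src_vec = phrase_vector(src.split(), "pt")
        pairs[src] = max(
            target_mwes,
            key=lambda tgt: cosine(src_vec, phrase_vector(tgt.split(), "en")),
        )
    return pairs


def nearest_words(source_mwe, k=1):
    """Second strategy: rank single target words by similarity to a source MWE."""
    src_vec = phrase_vector(source_mwe.split(), "pt")
    ranked = sorted(
        EMB["en"],
        key=lambda word: cosine(src_vec, EMB["en"][word]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(pair_mwes(["rede neural", "caixa postal"], ["neural network", "post box"]))
    print(nearest_words("caixa postal"))
```

In this sketch the first function needs a list of target-language MWEs extracted in advance, while the second only searches the target vocabulary, which is why it can return a single word as the translation of a multiword source expression.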