Exploring linguistic information and semantic contextual models for a relation extraction task using deep learning

Bibliographic details
Year of defense: 2020
Main author: Schmitt, Bruna Koch
Advisor: Rigo, Sandro José
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: eng
Defending institution: Universidade do Vale do Rio dos Sinos
Graduate program: Programa de Pós-Graduação em Computação Aplicada
Department: Escola Politécnica
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: http://www.repositorio.jesuita.org.br/handle/UNISINOS/9214
Abstract: Deep Learning (DL) methods have been extensively used in many Natural Language Processing (NLP) tasks, including semantic relation extraction. However, the performance of these methods depends on the type and quality of the information used as features. In NLP, linguistic information such as pre-trained word embeddings, part-of-speech (POS) tags, and synonyms is increasingly used to improve the performance of DL algorithms, and it is now present in several state-of-the-art relation extraction algorithms. However, no effort has been made to understand exactly what impact linguistic information from different levels of abstraction (morphological, syntactic, semantic) has on these algorithms in a semantic relation extraction task. We believe such an understanding may bring insights into how deep learning algorithms generalize language constructs compared to how humans process language. To this end, we performed several experiments using a recurrent neural network (RNN) and analyzed how linguistic information (part-of-speech tags, dependency tags, hypernyms, frames, verb classes) and different word embeddings (tokenizer, word2vec, GloVe, and BERT) affect model performance. Our results show that the different word embedding techniques did not differ significantly in performance. As for the linguistic information, hypernyms did improve model performance, but the improvement was small, so it may not be cost-effective to use a semantic resource to achieve this degree of improvement. Overall, given the simplicity of the deep learning architecture used, our model performed well compared to existing models from the literature, and in some experiments it outperformed several of them.
We conclude that this analysis gave us a better understanding of whether deep learning algorithms require linguistic information across distinct levels of abstraction to achieve human-like performance in a semantic task.