Bibliographic details
Year of defense: |
2018 |
Main author: |
Melo, Fabrício Silva |
Advisor: |
Macedo, Hendrik Teixeira |
Defense committee: |
Not provided by the institution |
Document type: |
Dissertation
|
Access type: |
Open access |
Language: |
por |
Defending institution: |
Not provided by the institution
|
Graduate program: |
Pós-Graduação em Ciência da Computação
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Keywords in English: |
|
CNPq knowledge area: |
|
Access link: |
http://ri.ufs.br/jspui/handle/riufs/10677
|
Abstract: |
Relation extraction is the task of extracting relations between named entities from natural language text. This work presents an information extraction technique that extracts relations with convolutional neural networks trained to recognize sentence patterns represented by low-dimensional word2vec and position embeddings. Significant studies on relation extraction with classifiers trained under distant supervision have used the data set constructed by Riedel, Yao, and McCallum (2010) to train and test relation classifiers. However, important limitations of this data have been raised: the use of a statistically inappropriate sampling methodology to select the samples that make up the data set; the lack of evaluation of classifier accuracy by type (class) of relation; and the neglect of the class imbalance in this data set, as well as of measures for training classifiers on imbalanced data. In view of these problems, this dissertation proposes and evaluates a model based on deep convolutional neural networks to improve the classification precision of relations between named entities extracted under distant supervision. The distribution of samples across relation types was evaluated in a data set constructed by distant supervision from the Freebase knowledge base, which is widely used for training in recent relation extraction work. It was found that these studies drew very optimistic, generalized conclusions about relation classification quality from a strongly imbalanced data set, while also using statistically inappropriate sampling methodologies to construct the test set. This data set was processed with stratified random sampling and used to train and test the proposed convolutional model under stratified k-fold cross-validation. Experiments show that the proposed model can achieve 87.0% precision and 88.0% recall.
These results indicate that our model outperforms the state of the art in relation classification. |
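The evaluation protocol named in the abstract (stratified k-fold splitting plus precision and recall reported per relation class) can be illustrated with a minimal sketch. This is not the dissertation's code: the function names, the toy labels (`NA`, `place_of_birth`, `nationality`), and the choice of k = 3 are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k lists of sample indices with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin across the folds,
        # so every fold receives about 1/k of each class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)}, computed separately per class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up, strongly imbalanced labels (hypothetical relation names).
labels = ["NA"] * 12 + ["place_of_birth"] * 6 + ["nationality"] * 3
folds = stratified_k_fold(labels, k=3)

# Each fold in turn serves as the test set; the rest form the training set.
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(labels)) if i not in held_out]
    assert len(train) + len(test_fold) == len(labels)
```

Reporting scores per class, rather than a single aggregate, is what exposes weak performance on rare relations that a dominant class would otherwise mask.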