Extração de relação entre entidades nomeadas no contexto econômico-financeiro [Relation extraction between named entities in the economic-financial context]

Bibliographic details
Year of defense: 2021
Main author: Reyes, Daniel Alessandro Guimarães de los
Advisor: Manssour, Isabel Harb
Defense committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: por
Defense institution: Pontifícia Universidade Católica do Rio Grande do Sul
Graduate program: Programa de Pós-Graduação em Ciência da Computação
Department: Escola Politécnica
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: http://tede2.pucrs.br/tede2/handle/tede/9970
Abstract: Competitive Intelligence (CI) is a relevant area of a corporation that can support strategic business decisions, helping decision makers position their organization in the market. In the financial domain, identifying the organizations mentioned in a news story can be insufficient; it is also necessary to extract the relations between entities, a task known as Relation Extraction (RE). Therefore, the main goal of this work is to propose an approach for extracting any semantic relation between Named Entities (NEs) in the Financial Market domain for the Portuguese language. To achieve this goal, a state-of-the-art review was first carried out, analyzing 76 articles to identify the techniques applied and the datasets used to assess them. This study shows that there are readings for the RE task in the Portuguese language. Following the Knowledge Discovery in Databases (KDD) methodology created by Fayyad, we propose a five-step approach that goes from data collection to evaluation of the results. The approach uses two models based on Bidirectional Encoder Representations from Transformers (BERT) to process a sentence and its named entities. We first classify whether a given pair of entities has a semantic relation and then extract the parts of the sentence that represent or describe the relation between these named entities. The approach was developed for the Portuguese language and the financial domain, exploring deep linguistic representations without using other lexical-semantic resources. In the experiments, the extractor model reaches 76.3% on the Jaccard metric, which measures the similarity between the extracted relations and the annotated ones, and the complete approach achieves 87%, 84.5%, and 85.8% for Recall, Precision, and F-Measure, respectively.
Another important contribution is the manually built corpus of more than 9,114 tuples (phrase, entity, entity), annotated from tweets and news provided by CI analysts to support decision making.
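
For readers unfamiliar with the metrics named in the abstract, the following is a minimal Python sketch of how a Jaccard score and an F-Measure could be computed. It is an illustration under stated assumptions, not the thesis's evaluation code: the whitespace tokenization, the treatment of two empty strings as identical, the balanced (F1) form of the F-Measure, the function names `jaccard` and `f_measure`, and the example phrases are all hypothetical.

```python
def jaccard(predicted: str, reference: str) -> float:
    """Jaccard similarity between the token sets of two relation strings.

    Assumption: tokens are lowercase whitespace-separated words; the thesis
    may tokenize differently.
    """
    a = set(predicted.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0  # assumption: two empty extractions count as identical
    return len(a & b) / len(a | b)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-Measure (harmonic mean of precision and recall).

    Assumption: the abstract's F-Measure is the balanced F1 variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical extracted relation vs. hypothetical annotated relation.
print(jaccard("acquired a stake in", "acquired a majority stake in"))

# Precision and recall as reported in the abstract.
print(f_measure(0.845, 0.87))
```

With the reported Precision of 84.5% and Recall of 87%, the balanced F-Measure comes out at roughly 0.857, close to the 85.8% stated in the abstract; the small difference is plausibly due to rounding of the reported inputs.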