Detalhes bibliográficos
Ano de defesa: |
2021 |
Autor(a) principal: |
Reyes, Daniel Alessandro Guimarães de los |
Orientador(a): |
Manssour, Isabel Harb
 |
Banca de defesa: |
Não Informado pela instituição |
Tipo de documento: |
Dissertação
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Pontifícia Universidade Católica do Rio Grande do Sul
|
Programa de Pós-Graduação: |
Programa de Pós-Graduação em Ciência da Computação
|
Departamento: |
Escola Politécnica
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
http://tede2.pucrs.br/tede2/handle/tede/9970
|
Resumo: |
Competitive Intelligence (CI) is a relevant area of a corporation and can support the strategic business area, helping those responsible for decision making and how to position your organization in the market. In the financial domain, identifying the organizations contained in a news story can become insufficient, and it is also necessary to extract relations (ER) between entities. Therefore, the main goal of this work is to propose an approach for the extraction of any semantic relation between Named Entities (NEs) in the Financial Market domain for the Portuguese language. To achieve this goal, a state-of-the-art review was initially carried out, which led to the analysis of 76 articles to identify techniques and datasets used to assess them. This study shows that there are readings for the RE task in Portuguese language. Therefore, following the methodology of Knowledge Discovery in Databases (KDD) created by Fayyad, we proposed a five-step approach, which goes from collecting data to evaluating the results. This approach uses two models based on Bidirectional Transformer Encoding Representations (BERT) to process a sentence and its named entities. We first classify whether or not a given pair of entities has a semantic relation and then extract the sentence parts representing or describing the semantic relation between these named entities. The approach was developed for the Portuguese language, considering the financial domain and exploring deep linguistic representations without using other lexical-semantic resources. The results of the experiments show an accuracy of 76.3% using the Jaccard metric, which measures the similarity between the relations extracted by the extractor model, in addition to achieving scores of 87%, 84.5% and 85.8%, respectively for the Recall, Precision and F-Measure metrics when assessing the complete approach. Another important contribution is the manually built corpus with more than 9,114 tuples (phrase, entity, entity) annotated from tweets and news provided by CI analysts to support the decision. |