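Descrição sintático-semântica de nomes predicadores em tweets do mercado financeiro em português (Syntactic-semantic description of predicative nouns in Portuguese-language financial market tweets)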

Bibliographic details
Year of defense: 2024
Main author: Barbosa, Bryan Khelven da Silva
Advisor: Di Felippo, Ariani
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Universidade Federal de São Carlos
Câmpus São Carlos
Graduate program: Programa de Pós-Graduação em Linguística - PPGL
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
PLN
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/20.500.14289/20503
Abstract: This study describes the argument structure (A-structure) of predicative nouns (Npred) occurring in DANTEStocks, a corpus of financial market tweets, motivated by the preference for this type of predicator in digital genres of this domain. The specific objective was to verify: (i) the presence or absence of arguments (Arg) in the tweets, (ii) the syntactic realization of the Args, and (iii) the influence of tweet-specific linguistic phenomena on the realization of the nouns' A-structure. In total, 145 Npreds and 1,756 instances (tweets with at least one Npred) from the corpus were described at the syntactic-semantic level. At the syntactic level, the entire DANTEStocks corpus was annotated semi-automatically according to the Universal Dependencies (UD) model. At the semantic level, the syntactic dependency trees guided the manual annotation of the instances according to NomBank. The syntactic-semantic mapping revealed that: (i) the A-structure of Npreds of valency one (V1) is always filled in the syntax, (ii) the A-structure of Npreds with V>1 shows some missing Args, (iii) most of the analyzed Npreds are of V3, with only 2 Args realized in most instances, (iv) the dependency relations (deprels) that most frequently connect Npreds to their Args are nmod and amod, and (v) the syntactic realization of the A-structure was reduced in 24 instances by tweet-specific phenomena (truncation and juxtaposition of elements). These results enrich the descriptive framework of lexical aspects of the language used in financial market tweets. Moreover, the syntactic-semantic valency description of the Npreds was systematized in NounBank.DS, an online lexical repository that can support further linguistic-computational research. A contribution to Natural Language Processing (NLP) is the UD syntactic annotation of DANTEStocks, which led to the creation of the first Portuguese tweebank. This resource enabled the development of the first UD parser of user-generated content (UGC) for this language. The NomBank-like semantic annotation of a portion of the corpus also generated a significant resource. Thus, this study produced reference linguistic resources and a tool (the parser) for the automatic processing of Portuguese tweets, which are essential for developing NLP applications targeting this type of UGC.