Augmenting product knowledge graphs with subjective information

Bibliographic details
Year of defense: 2023
Main author: SILVA, Johny Moreira da
Advisor: BARBOSA, Luciano de Andrade
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Universidade Federal de Pernambuco
Graduate program: Programa de Pos Graduacao em Ciencia da Computacao
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/49464
Abstract: Product Graphs (PGs) are knowledge graphs built on consumer product data. They have become popular lately due to their potential to enable AI-related tasks in e-commerce. PGs contain facts on products (e.g., mobile phones) and their characteristics (e.g., brand, dimensions, and processor) automatically gathered from several sources. Enriching these structures with dynamic and subjective information, such as users’ opinions, is essential for improving recommendation, search, comparison, and pricing. However, this is a novel task, and existing works that address it rely on supervised approaches. In this thesis, we address this task in two complementary stages: (1) We build a weakly supervised pipeline called Product Graph enriched with Opinions (PGOpi), which augments PGs with users’ opinions extracted from product reviews. For that, we combine a traditional method for opinion mining, Distant Supervision based on word embeddings to reduce the dependency on manual labor for training, and Deep Learning approaches to map extracted opinions to targets in the PG; (2) We devise SYNthetiC OPinionAted TriplEs (SYNCOPATE), a generator that autonomously builds opinionated triples and can replace traditional methods for extracting aspect-opinion pairs from opinionated reviews. We build it by exploring In-Context Learning on an adapted pretrained Language Model, and then apply post-processing to clean up and label the autonomously generated text. We evaluate both frameworks experimentally. We evaluate PGOpi on five product categories of two representative real-world datasets. The proposed weakly supervised approach achieves a superior micro F1 score over more complex weakly supervised models. It also presents comparable results to a fully supervised state-of-the-art (SOTA) model. We evaluate SYNCOPATE by augmenting existing benchmark datasets with the generated data and comparing the performance of four SOTA models on aspect-opinion pair extraction. The results show that the models trained on the generated synthetic data outperform those trained on a small percentage of human-labeled data. Furthermore, manual inspection of these triples by three human raters attested to their quality.
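The abstract reports a micro F1 score over several product categories. As a minimal sketch of what that metric computes, the snippet below pools true positives, false positives, and false negatives across categories before computing F1. The category names and counts are hypothetical placeholders, not results from the thesis, and the record does not state how the thesis itself pools its scores.

```python
from typing import Iterable, Tuple


def micro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-category (tp, fp, fn) counts.

    Counts are summed over all categories first, so categories with
    more examples weigh more in the final score.
    """
    tp = fp = fn = 0
    for c_tp, c_fp, c_fn in counts:
        tp += c_tp
        fp += c_fp
        fn += c_fn
    denom = 2 * tp + fp + fn
    # With no predictions and no gold labels, define the score as 0.0.
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-category counts (illustration only, not thesis data).
per_category = {
    "phones": (80, 10, 20),   # (tp, fp, fn)
    "laptops": (40, 20, 10),
}
score = micro_f1(per_category.values())
print(f"micro F1 = {score:.3f}")
```

Because the counts are pooled before the ratio is taken, micro F1 differs from macro F1, which would average the per-category F1 scores with equal weight.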