A domain-independent framework for knowledge graph question answering based on large language models

Bibliographic details
Year of defense: 2024
Main author: Ávila, Caio Viktor da Silva
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese (por)
Defense institution: Not informed by the institution
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords (in Portuguese):
Link de acesso: http://repositorio.ufc.br/handle/riufc/78251
Abstract: Knowledge graph question answering (KGQA) systems are computational systems capable of answering questions posed in natural language, using a knowledge graph (KG) as the source of knowledge to be consulted. These systems stand out for their curated and deep answers. Over the years, several architectures and approaches have been proposed for KGQA systems, with systems based on pre-trained end-to-end deep learning models becoming popular in recent years. Currently, large language models (LLMs) are the state of the art among pre-trained language models, which creates the opportunity to develop KGQA systems based on LLMs. With this in mind, as its main contribution, this thesis presents Auto-KGQA, a domain-independent autonomous framework based on LLMs for KGQA. The framework automatically selects fragments of the KG that are relevant to the question, which the LLM uses as context to translate the natural language question into a SPARQL query over the KG. The framework is accessible through its HTTP API or through a chat-messenger web interface. In addition, the framework is integrated with the RDF browser LiRB, allowing iterative navigation of the resources returned by queries. Preliminary experiments with Auto-KGQA using ChatGPT indicate that the framework substantially reduced the number of tokens passed to the LLM without sacrificing performance. Finally, an evaluation of Auto-KGQA on a benchmark of enterprise queries in the insurance domain showed that the framework is competitive, achieving a 13.2% improvement in accuracy over the state of the art and a 51.12% reduction in the number of tokens passed to the LLM. The experiments also revealed that few-shot learning strategies combined with the subgraph selected by Auto-KGQA yield robust and generalizable KGQA systems, outperforming their competitors in zero-shot scenarios and matching them in few-shot scenarios.
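The abstract describes a two-step pipeline: select the KG fragment relevant to the question, then hand that fragment to an LLM as context for translating the question into SPARQL. The following is a minimal, hypothetical sketch of that idea only; it is not the thesis's actual code. The example triples, the keyword-overlap scoring heuristic, and the prompt format are all illustrative assumptions.

```python
# Hypothetical sketch of the Auto-KGQA idea: pick KG triples relevant to a
# question, then build a compact LLM prompt asking for a SPARQL translation.
# The data, scoring, and prompt layout are illustrative, not from the thesis.

def select_fragment(triples, question, k=2):
    """Rank (s, p, o) triples by naive keyword overlap with the question."""
    words = set(question.lower().replace("?", "").split())

    def score(triple):
        text = " ".join(triple).lower()
        return sum(1 for w in words if w in text)

    ranked = sorted(triples, key=score, reverse=True)
    return [t for t in ranked[:k] if score(t) > 0]

def build_prompt(fragment, question):
    """Assemble the LLM context: far fewer tokens than serializing the full KG."""
    lines = ["Translate the question into a SPARQL query, using only these triples:"]
    lines += [f"  {s} {p} {o} ." for s, p, o in fragment]
    lines.append(f"Question: {question}")
    return "\n".join(lines)

# Toy KG in the insurance flavor of the thesis's benchmark (made-up data).
kg = [
    ("ex:Policy1", "ex:coveredBy", "ex:InsurerA"),
    ("ex:Policy1", "ex:premium", '"1200"'),
    ("ex:City1", "ex:population", '"50000"'),
]

question = "What is the premium of Policy1?"
fragment = select_fragment(kg, question)
prompt = build_prompt(fragment, question)
print(prompt)  # only the Policy1 triples survive; the City1 triple is pruned
```

In a real system the scoring step would be replaced by the framework's actual fragment-selection strategy and the prompt sent to an LLM such as ChatGPT, whose SPARQL output would then be executed against the KG.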