Storage and Navigation Operations on Graphs in Relational DBMS

Detalhes bibliográficos
Ano de defesa: 2021
Autor(a) principal: Scabora, Lucas de Carvalho
Orientador(a): Não Informado pela instituição
Banca de defesa: Não Informado pela instituição
Tipo de documento: Tese
Tipo de acesso: Acesso aberto
Idioma: eng
Instituição de defesa: Biblioteca Digitais de Teses e Dissertações da USP
Programa de Pós-Graduação: Não Informado pela instituição
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Link de acesso: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-26052021-125443/
Resumo: A complex network models relationships among data elements in several applications, such as social networks, urban street organization, disease monitoring and treatment co-occurrences, and communication and transport infrastructures. To store and manage large volumes of data, the Relational Database Management Systems (RDBMSs) provide efficient disk-based data management, being present in the most diverse application domains, including enterprise information, finance, medical or clinical support, and in airline reservations and scheduling. However, RDBMSs still face challenges to efficiently process queries that navigate over complex networks represented as graphs. Those queries, named in this work as graph queries, are often expressed by recursive queries, which execute multiple join operations between an internal, temporary table that stores intermediary results and the table that stores the graphs edges. This Ph.D. dissertation focuses on developing methods and specialized data structures based on adjacency-lists aiming at improving graph query performance in an RDBMS through recursive queries. This projects main goal targets the question: How can we improve the performance of queries over graph data executed in a RDBMS?. To answer it, we proposed specializations over each table involved in a recursive query: the table of edges E, and the temporary result table T. The modifications in table T consists of adding extra measurement columns to enable employing it in multiple graph queries executed in a single database connection. In its turn, table E is specialized to allow storing multiple edges into a single row optimized for the execution of each join operation of the recursive query. Experiments using data from RDBMSs modeled as a graph allow concluding that: (i) enriching table T provides increased efficiency to execute multiple graph queries, such as the query types Single-Source Shortest Path (SSSP), Connected Components (CC), and PageRank (PR); (ii) grouping multiple edges from table E into a single row significantly improves the performance of individual recursive queries, reducing not only their elapsed time but also the required size to store the graph data; and (iii) employing Clustered Tables to group the edges into a single page also enables improved performance for recursive queries, with the advantage of requiring just a few changes in the SQL statements. Overall, specializing both tables T and E helps diminishing the gap of efficiently executing graph queries in a RDBMS