Bibliographic details
Year of defense: |
2024 |
Main author: |
Eleutério, Igor Alberte Rodrigues |
Advisor: |
Not provided by the institution |
Examination committee: |
Not provided by the institution |
Document type: |
Master's dissertation
|
Access type: |
Open access |
Language: |
eng |
Institution of defense: |
Biblioteca Digitais de Teses e Dissertações da USP
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-23072024-143549/
|
Abstract: |
RDBMSs are ubiquitous systems that store and retrieve data in diverse scenarios. They handle scalar data well, such as numbers, short strings, and dates, for which the identity (=, ≠) and order (≤, ≥, <, >) relations are meaningful. However, they struggle with complex data such as images, videos, and audio tracks, for which identity and order relations are not meaningful. In this context, similarity queries stand out as an approach to comparing and evaluating complex objects; two notable examples are the range query and the k-nearest-neighbor (k-NN) query. Many works in the literature implement systems to perform similarity queries. However, they have limitations, such as not using RDBMS structures to support traditional queries, not implementing indexes, or requiring changes to SQL commands to run similarity queries. In this master's research, we implemented two systems, MIGUE-Sim and CoSIM-Gres, each with its own contributions to the literature. MIGUE-Sim focuses on implementing similarity queries using only native Postgres resources. With this system, we evaluated different ways to express a k-NN query in plain SQL, and our proposed query is up to 10% faster than our main competitor. We also used the native GiST R-tree index to perform k-NN queries, achieving a speed-up of up to 96% over our competitor. CoSIM-Gres focuses on implementing three different access methods to perform similarity queries in an RDBMS: Sequential Access, the Slim-tree metric access method (MAM), and the GiST R-tree. To the best of our knowledge, this is the first in-depth discussion of the performance of similarity queries involving different access methods in an RDBMS.
We evaluated different cardinalities, dimensionalities, and distance functions, and our results indicate that: i) distance functions of the Minkowski family do not significantly affect the performance of the access methods; ii) when the expected number of elements retrieved is low compared with the total number of elements in the table (around 5%), the MAM is much better than Sequential Access; iii) when the expected number of elements retrieved by the query is up to 50% of the dataset, the MAM is better than Sequential Access; otherwise, Sequential Access is the better choice; iv) when the GiST R-tree is available, it is better than the MAM Slim-tree and Sequential Access for retrieving up to 20% of the dataset. Our results are relevant to future work on optimizing similarity queries in RDBMSs. |
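
As an illustration of the two similarity queries named in the abstract, the following is a minimal Python sketch of a range query and a k-NN query evaluated by sequential access under a Minkowski (L_p) distance. It is not taken from MIGUE-Sim or CoSIM-Gres; the function names (`minkowski`, `range_query`, `knn_query`) and the toy dataset are assumptions made only for this example.

```python
import heapq


def minkowski(a, b, p=2.0):
    """L_p distance between two equal-length vectors (p=1 Manhattan, p=2 Euclidean)."""
    if p == float("inf"):
        # Chebyshev distance is the limit of L_p as p grows.
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def range_query(data, center, radius, p=2.0):
    """Sequential scan: keep every (id, vector) within `radius` of `center`."""
    return [(oid, v) for oid, v in data if minkowski(v, center, p) <= radius]


def knn_query(data, center, k, p=2.0):
    """Sequential scan: return the k (id, vector) pairs closest to `center`, nearest first."""
    return heapq.nsmallest(k, data, key=lambda item: minkowski(item[1], center, p))


# Hypothetical toy dataset of (id, feature vector) pairs.
dataset = [(1, (0.0, 0.0)), (2, (1.0, 1.0)), (3, (3.0, 4.0)), (4, (0.5, 0.0))]
query = (0.0, 0.0)

in_range = range_query(dataset, query, radius=1.5)
nearest = knn_query(dataset, query, k=2)
```

Both functions compute the distance to every stored object, which corresponds to the Sequential Access baseline; an index such as a Slim-tree or a GiST R-tree aims to answer the same queries while pruning most of those distance computations.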