Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing

Bibliographic details
Main author: Queimado, João Pedro Ferro
Publication date: 2023
Document type: Dissertation
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10451/63676
Summary: Master's thesis, Informatics Engineering, 2023, Universidade de Lisboa, Faculdade de Ciências
id RCAP_763ae89b273eaaa6ec9641bf5fc7d8e8
oai_identifier_str oai:repositorio.ulisboa.pt:10451/63676
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Locality-sensitive hashing (LSH) is an efficient hash-based solution for the Approximate Nearest Neighbor (ANN) problem in high-dimensional environments. The growing popularity of the Internet has increased the number of online users on its platforms, which in turn increases the amount of user-generated information these platforms produce. This information accumulates into datasets that store large quantities of multidimensional objects, i.e., objects defined by several features. To keep search latencies as low as possible, faster and more efficient search methods need to be developed to counteract the performance issues of large multidimensional datasets. LSH achieves fast search times by indexing similar data objects close to each other, but it incurs extra computational steps and requires additional data to be stored. The high storage demand of traditional LSH algorithms in big-data environments calls for the development of distributed and resource-efficient LSH solutions. In this thesis, we propose a distributed LSH approach that focuses on balancing storage resources with low storage overhead and without compromising LSH's search efficiency. The proposed solution uses erasure codes to store object information directly in the LSH index nodes, removing the need for additional storage structures and providing fault tolerance. To further increase the proposed solution's performance, we also include a process optimization that reduces query latency by directly comparing LSH hash values instead of object values.
Additionally, we develop an open-source generalized framework capable of implementing multiple distributed LSH approaches using several LSH hashing algorithms and storage backends. We evaluated the proposed model and its optimized variant and proved not only that the proposed storage solution balances storage resources across the system nodes, but also that it scales with the size of objects. This evaluation also shows that the proposed optimization significantly increases the solution's performance, at some cost in service quality.
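The abstract describes two ideas that a short example can make concrete: indexing similar objects under the same hash bucket, and ranking candidates by comparing hash values rather than the objects themselves. The following is a minimal single-machine sketch of both, assuming random-hyperplane LSH for cosine similarity. The record does not say which LSH families the thesis uses, and every name here (`LSHIndex`, `make_hyperplanes`, `signature`, `hamming`) is hypothetical; it is not the thesis's framework and omits distribution and erasure coding entirely.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


class LSHIndex:
    """Illustrative multi-table LSH index (assumed design, not the thesis's)."""

    def __init__(self, dim, num_bits=16, num_tables=4, seed=0):
        # Each table has its own hyperplanes and its own bucket map.
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed + t), {})
            for t in range(num_tables)
        ]
        self.signatures = {}  # object id -> tuple of per-table signatures

    def add(self, obj_id, vector):
        sigs = []
        for planes, buckets in self.tables:
            sig = signature(vector, planes)
            buckets.setdefault(sig, []).append(obj_id)
            sigs.append(sig)
        self.signatures[obj_id] = tuple(sigs)

    def query(self, vector, k=1):
        q_sigs = [signature(vector, planes) for planes, _ in self.tables]
        # Candidates: objects sharing a bucket with the query in any table.
        candidates = set()
        for sig, (_, buckets) in zip(q_sigs, self.tables):
            candidates.update(buckets.get(sig, []))

        # Rank by summed Hamming distance between stored and query
        # signatures, so the original object vectors are never read.
        def score(obj_id):
            stored = self.signatures[obj_id]
            return sum(hamming(a, b) for a, b in zip(q_sigs, stored))

        return sorted(candidates, key=score)[:k]


if __name__ == "__main__":
    index = LSHIndex(dim=3)
    index.add("a", [1.0, 0.0, 0.0])
    index.add("b", [0.9, 0.1, 0.0])
    index.add("c", [-1.0, 0.0, 0.0])
    print(index.query([1.0, 0.0, 0.0], k=2))
```

Ranking on signatures avoids fetching full objects at query time, which is where the latency saving would come from. The price is that Hamming distance only approximates the true similarity, consistent with the abstract's remark that the optimization costs some service quality.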
dc.title.none.fl_str_mv Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing
title Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing
author Queimado, João Pedro Ferro
author_facet Queimado, João Pedro Ferro
author_role author
dc.contributor.none.fl_str_mv Cogo, Vinicius Vielmo
Medeiros, Ibéria Vitória de Sousa, 1971-
Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Queimado, João Pedro Ferro
dc.subject.por.fl_str_mv Locality Sensitive Hashing (LSH)
Client-Server, Big data
Approximate Nearest Neighbour (ANN)
Erasure codes
Master's theses - 2024
Departamento de Informática
topic Locality Sensitive Hashing (LSH)
Client-Server, Big data
Approximate Nearest Neighbour (ANN)
Erasure codes
Master's theses - 2024
Departamento de Informática
description Master's thesis, Informatics Engineering, 2023, Universidade de Lisboa, Faculdade de Ciências
publishDate 2023
dc.date.none.fl_str_mv 2023
2024-03-22T09:11:15Z
2024
2024-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/63676
TID:203882008
url http://hdl.handle.net/10451/63676
identifier_str_mv TID:203882008
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833601766345670656