Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing
| Main author: | Queimado, João Pedro Ferro |
|---|---|
| Publication date: | 2023 |
| Document type: | Dissertation |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10451/63676 |
| Abstract: | Master's thesis, Engenharia Informática (Informatics Engineering), 2023, Universidade de Lisboa, Faculdade de Ciências |
| id |
RCAP_763ae89b273eaaa6ec9641bf5fc7d8e8 |
|---|---|
| oai_identifier_str |
oai:repositorio.ulisboa.pt:10451/63676 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing; Locality Sensitive Hashing (LSH); Client-Server, Big-data; Approximate Nearest Neighbour (ANN); Erasure codes; Master's theses - 2024; Department of Informatics; Master's thesis, Engenharia Informática (Informatics Engineering), 2023, Universidade de Lisboa, Faculdade de Ciências.
Locality sensitive hashing (LSH) is an efficient hash-based solution for the Approximate Nearest Neighbor problem in high-dimensional environments. The Internet's recent growth in popularity has brought its platforms more online users, and this larger user base increases the amount of user-generated information those platforms produce. This information accumulates into datasets that store large quantities of multidimensional objects, i.e., objects defined by several features. To keep search latencies as low as possible, faster and more efficient search methods need to be developed to counteract the performance issues of large multidimensional datasets. LSH achieves fast search times by indexing similar data objects close to each other, but it incurs extra computational steps and requires additional data to be stored. The high storage demand of traditional LSH algorithms in big-data environments calls for the development of distributed and resource-efficient LSH solutions. In this thesis, we propose a distributed LSH approach that focuses on balancing storage resources with low storage overhead and without compromising LSH's search efficiency. The proposed solution uses erasure codes to store object information directly in the LSH index nodes, removing the need for additional storage structures and providing fault tolerance. To further increase the proposed solution's performance, we also include a process optimization that reduces query latency by directly comparing LSH hash values instead of object values.
Additionally, we develop an open-source, generalized framework capable of implementing multiple distributed LSH approaches using several LSH hashing algorithms and storage backends. We evaluated the proposed model and its optimized variant and showed not only that the proposed storage solution balances storage resources across the system's nodes, but also that it scales with the size of the objects. This evaluation also shows that the proposed optimization significantly increases the solution's performance, at some cost in service quality.
Cogo, Vinicius Vielmo; Medeiros, Ibéria Vitória de Sousa, 1971-; Repositório da Universidade de Lisboa; Queimado, João Pedro Ferro; 2024-03-22T09:11:15Z; 2024; 2023; 2024-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10451/63676; TID:203882008; eng; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2025-03-17T15:13:17Z; oai:repositorio.ulisboa.pt:10451/63676; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-29T03:37:18.909155; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv |
Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing |
| title |
Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing |
| spellingShingle |
Large-Scale Distributed Similarity Search with Locality-Sensitive Hashing; Queimado, João Pedro Ferro; Locality Sensitive Hashing (LSH); Client-Server, Big-data; Approximate Nearest Neighbour (ANN); Erasure codes; Master's theses - 2024; Department of Informatics |
| author |
Queimado, João Pedro Ferro |
| author_facet |
Queimado, João Pedro Ferro |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Cogo, Vinicius Vielmo; Medeiros, Ibéria Vitória de Sousa, 1971-; Repositório da Universidade de Lisboa |
| dc.contributor.author.fl_str_mv |
Queimado, João Pedro Ferro |
| dc.subject.por.fl_str_mv |
Locality Sensitive Hashing (LSH); Client-Server, Big-data; Approximate Nearest Neighbour (ANN); Erasure codes; Master's theses - 2024; Department of Informatics |
| topic |
Locality Sensitive Hashing (LSH); Client-Server, Big-data; Approximate Nearest Neighbour (ANN); Erasure codes; Master's theses - 2024; Department of Informatics |
| description |
Master's thesis, Engenharia Informática (Informatics Engineering), 2023, Universidade de Lisboa, Faculdade de Ciências |
| publishDate |
2023 |
| dc.date.none.fl_str_mv |
2023; 2024-03-22T09:11:15Z; 2024; 2024-01-01T00:00:00Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10451/63676; TID:203882008 |
| url |
http://hdl.handle.net/10451/63676 |
| identifier_str_mv |
TID:203882008 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833601766345670656 |
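
The abstract in this record describes two ideas that a small example may make more concrete: LSH indexes similar objects close to each other, and the optimized variant ranks candidates by comparing LSH hash values instead of the objects themselves. The sketch below is only an illustration under stated assumptions, not the thesis's implementation: it assumes a random-hyperplane (sign-bit) LSH family, a single in-memory index, and Hamming distance between signatures as the comparison. All function names are hypothetical, and erasure coding and distribution across nodes are left out.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(objects, planes):
    """Group object ids into buckets keyed by their LSH signature."""
    index = {}
    for obj_id, vec in objects.items():
        index.setdefault(signature(vec, planes), []).append(obj_id)
    return index


def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def query(vec, index, planes, max_distance=1):
    """Return ids whose bucket signature is within max_distance bits of the query's.

    Candidates are ranked by signature distance only; the original vectors
    are never compared, mirroring the hash-value comparison idea.
    """
    sig = signature(vec, planes)
    hits = []
    for bucket_sig, ids in index.items():
        d = hamming(sig, bucket_sig)
        if d <= max_distance:
            hits.extend((d, obj_id) for obj_id in ids)
    return [obj_id for _, obj_id in sorted(hits)]
```

Comparing short signatures is cheaper than comparing full high-dimensional vectors, but it ranks candidates only approximately, which fits the abstract's remark that the optimization comes at some cost in service quality.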