Algoritmos de espaço quase ótimo para hashing perfeito

Fabiano Cupertino Botelho

Algoritmos de espaço quase ótimo para hashing perfeito

Detalhes bibliográficos
Ano de defesa:	2008
Autor(a) principal:	Fabiano Cupertino Botelho
Orientador(a):	Não Informado pela instituição
Banca de defesa:	Não Informado pela instituição
Tipo de documento:	Tese
Tipo de acesso:	Acesso aberto
Idioma:	por
Instituição de defesa:	Universidade Federal de Minas Gerais UFMG
Programa de Pós-Graduação:	Não Informado pela instituição
Departamento:	Não Informado pela instituição
País:	Não Informado pela instituição
Palavras-chave em Português:	algoritmos Estrutura de dados (Computação) Organização de arquivos (Computação) Algoritmos de computador Computação Hashing (Computação)
Link de acesso:	http://hdl.handle.net/1843/RVMR-7LAMRS
Resumo:	A perfect hash function (PHF) h : S [0, m 1] for a key set S U of size n, where m n and U is a key universe, is an injective function that maps the keys of S to unique values. A minimal perfect hash function (MPHF) is a PHF with m = n, the smallest possible range. Minimal perfect hash functions are widely used for memory efficient storage and fast retrieval of items from static sets, such as words in natural languages, reserved words in programming languages or interactive systems, universal resource locations (URLs) in web search engines, or item sets in data mining techniques. In this thesis we present a simple, highly scalable and near-space optimal perfect hashing algorithm. Evaluation of a PHF on a given element of S requires constant time, and thedominating phase in the construction algorithm consists of sorting n fingerprints of O(log n) bits in O(n) time. The space usage depends on the relation between m and n. For m = n the space usage is in the range 2.62n to 3.3n bits, depending on the constants involved in the construction and in the evaluation phases. For m = 1.23n the space usage is in the range 1.95n to 2.7n bits. In all cases, this is within a small constant factor from the information theoretical minimum of approximately 1.44n bits for MPHFs and 0.89n bits for PHFs, something that has not been achieved by previous algorithms, except asymptotically for very large n. This small space usage opens up the use of MPHFs to applications forwhich they were not useful in the past. We demonstrate the scalability of our algorithm by constructing an MPHF for a set of 1.024 billion URLs from the World Wide Web of average length 64 characters in approximately 50 minutes, using a commodity PC. We also present a distributed and parallel implementation of the algorithm, which generates an MPHF for the same URL set, using a 14 computer cluster, in approximately 4 minutes, achieving an almost linear speedup.Also, for 14.336 billion 16-byte random integers distributed among the 14 participating machines, the algorithm outputs an MPHF in approximately 50 minutes, with a performance degradation of 20%.

Algoritmos de espaço quase ótimo para hashing perfeito

Registros relacionados