Junções por similaridade com expressões complexas em ambientes distribuídos (Similarity joins with complex expressions in distributed environments)

Oliveira, Diego Junior do Carmo

Bibliographic details
Year of defense:	2018
Main author:	Oliveira, Diego Junior do Carmo
Advisor:	Ribeiro, Leonardo Andrade
Defense committee:	Ribeiro, Leonardo Andrade; Martins, Wellington Santos; Esmin, Ahmed Ali Abdalla
Document type:	Dissertation
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Universidade Federal de Goiás
Graduate program:	Programa de Pós-graduação em Ciência da Computação (INF)
Department:	Instituto de Informática - INF (RG)
Country:	Brazil
Keywords in Portuguese:	Junção por similaridade; Sistemas distribuídos; Apache Spark; Big data
Keywords in English:	Similarity joins; Distributed platforms
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
Access link:	http://repositorio.bc.ufg.br/tede/handle/tede/8928
Abstract:	A recurrent problem that degrades the quality of information in databases is the presence of duplicates, i.e., multiple representations of the same real-world entity. Although computationally expensive, similarity operations are fundamental to identifying duplicates. Furthermore, real-world data is typically composed of several attributes, each representing a distinct type of information. Complex similarity expressions are important in this context because they allow the importance of each attribute to be taken into account in the similarity evaluation. However, given the large volumes of data in Big Data applications, it has become crucial to perform these operations in parallel and distributed processing environments. To address these problems, which are of great relevance to organizations, this work proposes a novel strategy for identifying duplicates in textual data using similarity joins with complex expressions in a distributed environment.
