Projeto e avaliação de algoritmos paralelos para sistemas Multicore e Manycore aplicados no processamento de documentos

Detalhes bibliográficos
Ano de defesa: 2017
Autor(a) principal: Freitas, Mateus Ferreira e lattes
Orientador(a): Martins, Wellington Santos lattes
Banca de defesa: Martins , Wellington Santos, Ribeiro, Leonardo Andrade, Melo , Alba Cristina Magalhães Alves de
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal de Goiás
Programa de Pós-Graduação: Programa de Pós-graduação em Ciência da Computação (INF)
Departamento: Instituto de Informática - INF (RG)
País: Brasil
Palavras-chave em Português:
Palavras-chave em Inglês:
GPU
Área do conhecimento CNPq:
Link de acesso: http://repositorio.bc.ufg.br/tede/handle/tede/7829
Resumo: Several applications process documents in different ways, aiming to filter, organize or learn with them. Nowadays, a great computational power is necessary in order to do that efficiently, due to the large and increasing number of documents. Usually, documents are independent of each other, which facilitates the use of parallelism to speed up this processing. This work explores three problems: active learning, learning to rank (L2R) and top-k search. Using the parallelism on multicore CPUs and manycore GPUs (Graphics Processing Unit), parallel algorithms were proposed and evaluated for each problem, and implemented with the OpenMP and CUDA APIs. For the active learning problem a multicore algorithm was proposed, which obtained 10.8x of speedup in the best case with 12 threads. The proposed manycore version obtained 128x of speedup over the serial version, and a solution with 4 GPUs achieved 3.5x of speedup over 1 GPU. For the L2R problem a manycore algorithm was proposed, which follows a thread-block approach using the concept of Combinadic, and uses a cache with fingerprint to speed up the processing. The best case speedups were 508x over the serial, 9x over a GPU baseline, and 4x over our solution when using 4 GPUs. When comparing with a version without combinadic, the speedup over it was 4.4x with both versions using 1 GPU and 3.9x with 4. These solutions used bitmap structures to speed up the association rules creation. In the top-k search a serial and multicore solutions were implemented from a state of the art manycore algorithm for exact searches. These implementations served as baselines for our extension of this algorithm, which includes the use of multi-GPU, group searches and an intra-block load balancing. The speedups were 2.7x over the original algorithm, 17x over the serial, 4x over the multicore, and 4x over our version when using 4 GPUs.