Accelerating the alignment phase of Minimap2 genome assembly algorithm Using GACT-X in a commercial Cloud FPGA machine.

Bibliographic details
Year of defense: 2022
Main author: Teng, Carolina
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/3/3140/tde-05092022-084236/
Abstract: Genetic sequencing can provide crucial information in medicine and in biology studies. The technologies developed in the field are advancing rapidly, and the current third generation of genome sequencers offers significant improvements over the second generation. In parallel, sequencing throughput has been increasing at an exponential rate, which, coupled with price reduction, has resulted in a leap in the amount of genomic data to be processed. Transistor technology is reaching its fundamental limits, and Moore's Law is becoming obsolete, so other alternatives are required to process such an amount of data efficiently. Long reads from the third generation of sequencers are an emerging type of genetic data, with average lengths of thousands of nucleotides each. The state-of-the-art algorithm Minimap2 is able to assemble these reads into the genome that was sampled, but it is a computationally intensive process: for the human genome size with sufficient coverage, running times can reach up to dozens of CPU hours. Hardware acceleration has been proposed as an effort to make Minimap2 more efficient, but up to the present moment, only one of its main bottlenecks, the chaining step, has been successfully accelerated on FPGA. No efficient solution has been proposed for the aligning step, implemented as the ksw function. GACT-X is a Cloud FPGA design that performs a banded SWG alignment with fixed memory, suitable for any size of input. GACT-X with tiles of size 4,000 can be 2x faster than ksw when aligning long sequences. Replacing the alignment function ksw in Minimap2 with GACT-X on a Cloud hybrid system can provide up to 1.41x acceleration of the entire execution relative to the software counterpart, with comparable accuracy for data that have high similarity to the reference genome.
This dissertation presents all the relevant background information, the development stages and methods, the results achieved on three different datasets, and the proposed future work on this acceleration project.
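As a rough illustration of the banded alignment idea mentioned in the abstract, the Python sketch below computes a local alignment score with affine gap penalties while visiting only the cells within a fixed distance of the main diagonal. The function name `banded_swg_score`, the scoring values, and the fixed band width are assumptions chosen for illustration; they are not taken from the dissertation, from ksw, or from GACT-X. The sketch runs on a CPU, returns only the score, and does not reproduce the tiling, traceback, or hardware design described in the abstract.

```python
NEG_INF = float("-inf")


def banded_swg_score(query, target, band=8, match=2, mismatch=-3,
                     gap_open=-5, gap_extend=-1):
    """Best local alignment score with affine gaps, limited to |i - j| <= band.

    All parameter values are illustrative assumptions. The first gap
    character costs gap_open; each further one costs gap_extend.
    """
    n, m = len(query), len(target)
    # Only the previous row is kept. Cells outside the band stay at 0,
    # which in local alignment means "an alignment may start here".
    prev_h = [0] * (m + 1)          # best score ending at each cell
    prev_f = [NEG_INF] * (m + 1)    # best score ending in a vertical gap
    best = 0
    for i in range(1, n + 1):
        cur_h = [0] * (m + 1)
        cur_f = [NEG_INF] * (m + 1)
        e = NEG_INF                 # best score ending in a horizontal gap
        lo = max(1, i - band)       # first column inside the band
        hi = min(m, i + band)       # last column inside the band
        for j in range(lo, hi + 1):
            e = max(cur_h[j - 1] + gap_open, e + gap_extend)
            f = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            sub = match if query[i - 1] == target[j - 1] else mismatch
            h = max(0, prev_h[j - 1] + sub, e, f)
            cur_h[j] = h
            cur_f[j] = f
            if h > best:
                best = h
        prev_h, prev_f = cur_h, cur_f
    return best


if __name__ == "__main__":
    # The matching region lies 4 diagonals away from the main diagonal,
    # so a band of 4 can reach it while a band of 1 cannot.
    print(banded_swg_score("GGGGACGT", "ACGT", band=4))
    print(banded_swg_score("GGGGACGT", "ACGT", band=1))
```

A narrow band reduces the number of cells computed per row from the full target length to at most `2 * band + 1`, at the risk of missing alignments that drift further from the diagonal, which is the trade-off the example at the bottom is meant to show.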