Comparison of the performance of different genome assembly strategies for Bertholletia excelsa Bonpl. (Lecythidaceae)

Bibliographic details
Year of defense: 2020
Main author: Barros, Laís Rossetto Ferraz de
Advisor: Martins, Karina
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: Portuguese
Defending institution: Universidade Federal de São Carlos
Câmpus Sorocaba
Graduate program: Programa de Pós-Graduação em Biotecnologia e Monitoramento Ambiental - PPGBMA-So
Department: Not provided by the institution
Country: Not provided by the institution
Access link: https://repositorio.ufscar.br/handle/20.500.14289/12390
Abstract: The Brazil nut (Bertholletia excelsa) is a tree species native to the Amazon forest whose ecological sustainability is a matter of concern. Sequencing and assembling its genome may therefore contribute to the conservation of genetic diversity and the maintenance of natural populations, and may help in understanding the success of future populations. The objectives of this work are to assemble the genome of Bertholletia excelsa, compare different assembly methodologies, and identify the methodology that yields the highest-quality assembly. DNA samples from an adult individual of Bertholletia excelsa were sequenced with Pacific Biosciences Single Molecule Real-Time (SMRT) technology, yielding a genomic coverage of 187× and a raw-read N50 of 14.02 kb. The haploid genome size was then estimated from the k-mer distribution (k=22) at ~596 Mbp. Five assemblers (wtdbg2, MECAT2, SMARTdenovo, Flye and Canu) were tested at six genomic coverages (47×, 63×, 97×, 126×, 187×, and 60× with reads corrected by Canu). QUAST was used to compare the performance of these methods on the following parameters: contig N50, number of contigs, L50, and assembly size relative to the estimated genome size. BUSCO was used to assess genome completeness. The LAI score was used to validate assembly quality, taking into account the contiguity and integrity of repetitive sequences. The best assembly was produced by SMARTdenovo at 187× coverage, with a contig N50 of ~2.6 Mb, gene completeness of 95.1%, an LAI score of 10.53, and a total assembly size of 649,349,366 bp.
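For readers unfamiliar with the contiguity metrics named in the abstract, the following is a minimal Python sketch of how N50 and L50 are conventionally computed from a list of contig lengths, plus the ratio of assembly size to estimated genome size. It is not the code used in the dissertation (which relied on QUAST). The function name `n50_l50` and the toy contig lengths are hypothetical. Only the two sizes in the last step come from the abstract, and the genome size is the approximate estimate of ~596 Mbp.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50: length of the contig at which the running total of contigs,
         sorted from longest to shortest, first reaches half of the
         total assembly size.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy contig lengths, NOT data from the dissertation.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(f"toy N50 = {n50}, toy L50 = {l50}")

# Sizes reported in the abstract; the genome size is an approximate estimate.
assembly_size_bp = 649_349_366
estimated_genome_size_bp = 596_000_000
size_ratio = assembly_size_bp / estimated_genome_size_bp
print(f"assembly size / estimated genome size = {size_ratio:.2f}")
```

With the reported values, the ratio comes out slightly above 1, meaning the selected assembly is somewhat larger than the k-mer-based estimate of the haploid genome size.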