Hybrid assembly and aneuploidy analyses in complex genomes: Trypanosoma cruzi CL Brener as a model

Bibliographic details
Year of defense: 2023
Main author: Anderson Coqueiro dos Santos
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Degree-granting institution: Universidade Federal de Minas Gerais
Brazil
ICB - INSTITUTO DE CIÊNCIAS BIOLÓGICAS
Programa de Pós-Graduação em Bioinformática
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/73392
Abstract: Trypanosoma cruzi, the etiological agent of Chagas disease, is a flagellated unicellular protozoan parasite whose first genome version was sequenced and published in 2005. The strain selected was CL Brener, a hybrid lineage derived from DTUs TcII and TcIII. This hybrid nature, together with the large number of multigene family members and other repetitive elements in the T. cruzi genome, compromised the quality of that assembly. In this work, a new assembly was therefore performed using long reads (PacBio) combined with short reads (Illumina); Sanger reads of BACs and fosmids generated by the 2005 genome project were also used to aid assembly. Different assemblers, such as Canu and HGAP, were tested for contig construction. Scaffolding was performed iteratively, reducing the number of reads used to join contigs at each iteration, so that regions with better support were assembled first. A total of 446 sequences were obtained at the end of assembly and were then corrected using short reads. A de novo annotation of this new assembly was performed with the Augustus program, based on the CL Brener annotation available in TriTrypDB and on the annotations of other, previously annotated strains. In addition, the telomeric and subtelomeric regions were evaluated, yielding 24 sequences with telomeres. Compared with the public CL Brener assembly and with other recent long-read assemblies of different strains, this new CL Brener genome assembly showed good results. We also evaluated the occurrence of recombination in the CL Brener genome using short Illumina reads from strains representative of the parental lineages (Y TcII and 231 TcIII). We detected possible recombination sites exclusive to CL Brener, as well as recombination sites shared by CL Brener and TCC, another hybrid strain, of DTU TcVI.
Finally, we developed CADIn, a tool that infers genomic ploidy and chromosomal somy variations from NGS data with a single command. To this end, CADIn uses both the allele frequencies of heterozygous SNPs and read depth of coverage analysis. CADIn removes chromosomal regions with atypical coverage that may complicate read depth analysis and statistically validates ploidy variations. With this tool, aneuploidies were detected in the CL Brener genome as well as in other genomes with distinct levels of complexity, such as Leishmania sp. and Saccharomyces cerevisiae. In addition, simulated data demonstrated CADIn's ability to use reads of different lengths obtained by different sequencing methods.
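
The two signals mentioned above, read depth and heterozygous SNP allele frequencies, can be illustrated with a minimal sketch. This is not CADIn's implementation, which the abstract does not describe in detail. The function names, the input layout (per-window depths per chromosome), the default baseline ploidy of 2, and the candidate somy values are all assumptions made for illustration. The sketch also skips the removal of atypical-coverage regions and the statistical validation that CADIn performs.

```python
from statistics import median


def estimate_somy_from_depth(chrom_depths, base_ploidy=2):
    """Estimate per-chromosome somy from read depth (illustrative only).

    chrom_depths: dict mapping chromosome name -> list of per-window depths.
    Assumes most chromosomes are at the baseline ploidy, so the median of
    the chromosome medians represents the baseline coverage.
    """
    chrom_medians = {c: median(d) for c, d in chrom_depths.items()}
    baseline = median(chrom_medians.values())
    return {c: base_ploidy * m / baseline for c, m in chrom_medians.items()}


def expected_allele_fractions(somy):
    """Expected heterozygous allele fractions, e.g. somy 3 -> [1/3, 2/3]."""
    return [k / somy for k in range(1, somy)]


def somy_from_allele_freqs(freqs, candidates=(2, 3, 4)):
    """Pick the candidate somy whose expected peaks best fit observed freqs.

    Score = mean distance from each observed frequency to the nearest
    expected peak. Ties go to the lowest somy: frequencies near 0.5 alone
    cannot distinguish disomy from a 2:2 tetrasomy.
    """
    def score(somy):
        peaks = expected_allele_fractions(somy)
        return sum(min(abs(f - p) for p in peaks) for f in freqs) / len(freqs)

    return min(candidates, key=score)


# Hypothetical toy data: chr2 has about 1.5x the baseline coverage.
depths = {"chr1": [30, 31, 29], "chr2": [45, 46, 44], "chr3": [30, 30, 30]}
somy_by_depth = estimate_somy_from_depth(depths)
somy_by_alleles = somy_from_allele_freqs([0.33, 0.67, 0.34, 0.66])
```

In this toy case both signals agree on a trisomic chr2. Combining them is useful because depth alone cannot separate a change in somy from a uniform change in overall ploidy, while allele frequencies alone are uninformative where heterozygous SNPs are scarce.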