Population genomics of viruses and its analytical tools

Bibliographic details
Year of defense: 2020
Main author: Santos, Matheus de Morais
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: English
Degree-granting institution: Universidade Federal de Uberlândia, Brazil, Programa de Pós-graduação em Agronomia (Graduate Program in Agronomy)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: https://repositorio.ufu.br/handle/123456789/30358
DOI: http://doi.org/10.14393/ufu.di.2020.228
Abstract: Comparative genomics makes it possible to study virus populations with a high degree of accuracy by using the largest possible number of molecular markers. Knowledge of the evolutionary processes that affect these populations is of ecological and epidemiological importance, since viral populations show high levels of genetic diversity. Begomoviruses (genus Begomovirus, family Geminiviridae) have genomes composed of either one single-stranded DNA molecule (monopartite begomoviruses) or two (bipartite begomoviruses, with DNA-A and DNA-B components), infect dicotyledonous plants, and evolve as rapidly as RNA viruses. A previous study based on DNA-A sequences determined the genetic structure of the global begomovirus metapopulation and showed that it consists of large, genetically distinct and cohesive subpopulations, shaped by geographical barriers, host range, and genetic barriers to recombination. However, the structure of the global metapopulation based on DNA-B sequences had not been determined, even though public databases contain enough sequences for such a study. Comparing three or more biological sequences requires constructing multiple sequence alignments (MSAs). All programs used to construct MSAs rely on heuristic strategies and therefore differ in accuracy, which makes accuracy a critical criterion when choosing an MSA program. Notably, few studies have assessed the accuracy of these programs on data sets composed of viral genomes. Thus, this work (i) determined the genetic structure of the global begomovirus metapopulation based on DNA-B sequences and compared the recombination patterns of its subpopulations; and (ii) evaluated the accuracy of the main MSA programs on viral genomes and the practical implications of that accuracy for genomic studies.
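Why MSA programs fall back on heuristics can be seen from the cost of exact alignment. As a rough illustration (not taken from the dissertation), aligning k sequences of length about n exactly by dynamic programming fills a k-dimensional table, and each cell looks back at up to 2^k - 1 neighbouring cells; finding an optimal alignment under the sum-of-pairs objective is also known to be NP-hard. The values k = 10 and n = 2,600 nt below are hypothetical, chosen only to show the scale:

```latex
\[
\underbrace{(n+1)^k}_{\text{DP cells}} \times \underbrace{(2^k - 1)}_{\text{predecessors per cell}} = O\!\left(2^k n^k\right)
\]
\[
k = 2:\; O(n^2) \qquad\qquad k = 10,\; n = 2600:\; 2601^{10} = 51^{20} \approx 1.4 \times 10^{34}\ \text{cells}
\]
```

Pairwise alignment stays quadratic, but the table grows exponentially with the number of sequences, so practical tools replace the exact search with strategies such as progressive or iterative alignment.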
To achieve the first objective, full-length DNA-B sequences were obtained from GenBank, divided into eight subpopulations by discriminant analysis of principal components (DAPC), and analyzed with seven recombination-detection methods. The inferred subpopulations were genetically distinct and showed two different recombination patterns. The second objective was achieved by estimating the accuracy of 13 MSA programs/settings on six data sets composed of full-length genomes of species from five virus genera and one genus of a subviral agent. The accuracy of the programs varied with the data set. Notably, MSAs generated by the most accurate programs (as determined in this study) and those generated by programs widely used in virological genomic studies yielded incongruent phylogenetic trees, suggesting that the evolutionary histories frequently presented in the literature may not be the most likely ones.
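The abstract does not state how MSA accuracy was measured. A common approach, assumed here rather than taken from the dissertation, compares each test alignment with a trusted reference alignment of the same sequences and reports the sum-of-pairs (SP) score, the fraction of reference residue pairs that the test alignment reproduces, and the total-column (TC) score, the fraction of reference columns reproduced exactly. The function names (`sp_score`, `tc_score`, `residue_columns`, `aligned_pairs`) and the toy sequences are hypothetical; gaps are written as `-`.

```python
from itertools import combinations

GAP = "-"  # gap symbol assumed for this sketch


def _check_same_sequences(test, reference):
    """Both alignments must contain the same ungapped sequences, in the same order."""
    if len(test) != len(reference) or any(
        t.replace(GAP, "") != r.replace(GAP, "") for t, r in zip(test, reference)
    ):
        raise ValueError("test and reference must align the same sequences in the same order")


def residue_columns(alignment):
    """Describe each column as a tuple of per-sequence residue indices (None for a gap)."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next residue index in each ungapped sequence
    columns = []
    for col in range(lengths.pop()):
        column = []
        for s, row in enumerate(alignment):
            if row[col] == GAP:
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(alignment):
    """Set of residue pairs (seq_i, res_i, seq_j, res_j) placed in the same column."""
    pairs = set()
    for column in residue_columns(alignment):
        for i, j in combinations(range(len(column)), 2):
            if column[i] is not None and column[j] is not None:
                pairs.add((i, column[i], j, column[j]))
    return pairs


def sp_score(test, reference):
    """Sum-of-pairs score: fraction of reference residue pairs also aligned in the test MSA."""
    _check_same_sequences(test, reference)
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


def tc_score(test, reference):
    """Total-column score: fraction of reference columns reproduced exactly in the test MSA."""
    _check_same_sequences(test, reference)
    # Ignore columns that are gaps in every row.
    ref_cols = [c for c in residue_columns(reference) if any(x is not None for x in c)]
    if not ref_cols:
        return 0.0
    test_cols = set(residue_columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    # Toy alignments of three short hypothetical sequences (not data from the dissertation).
    reference = ["AC-GT", "ACTGT", "A--GT"]
    test = ["ACG-T", "ACTGT", "AG--T"]
    print(sp_score(test, reference))  # 0.7
    print(tc_score(test, reference))  # 0.4
```

In the toy example, the test alignment recovers 7 of the 10 residue pairs in the reference (SP = 0.7) but only 2 of its 5 columns (TC = 0.4). TC is the stricter measure, because a single misplaced residue invalidates the whole column.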