SNP LANE, a database of SNP neighborhoods and analyses of the effect of neighboring nucleotides on the probability of nucleotide substitution in mammals

Bibliographic details
Year of defense: 2019
Main author: Fernanda Stussi Duarte Lage
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese
Degree-granting institution: Universidade Federal de Minas Gerais
Brazil
ICB - INSTITUTO DE CIÊNCIAS BIOLÓGICAS
Programa de Pós-Graduação em Bioinformática
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/30434
Abstract: The dynamics of DNA composition in the genomes of different taxa are one of the main issues in genomic analysis. Changes in composition may be consequences of errors in both the replication and the repair machinery. Nucleotide substitution, for example, is one of the sources of genetic variation and drives evolution. These substitutions are assumed to occur randomly throughout the genome over time. However, recent analyses of genomic data, especially SNP (single nucleotide polymorphism) data, have revealed several features that support the non-random nature of this event, such as the high variation in mutation rate along the genome. Recent studies have shown that genome composition affects mutation patterns to some extent: intense nucleotide neighborhood biases were observed at positions immediately adjacent to mutations, and a less pronounced bias extends to regions more distant from the substitution site. This phenomenon may be attributed mainly to the enzymes that modify or mutate the genetic material, since most enzymes tend to have specific sequence contexts that dictate their activity. Thus, identifying context effects may lead to the discovery of additional editing sites or unknown enzymatic factors. To investigate this event comprehensively, we built an online database that shows the pattern of bases in the SNP neighborhood, available at http://bioinfo.icb.ufmg.br/snplane/. The latest SNP datasets from five mammalian species (Mus musculus, Homo sapiens, Bos taurus, Rattus norvegicus and Sus scrofa) were downloaded, parsed according to the genomic region to which each SNP belonged (intron, exon, 5’ upstream, 3’ downstream and coding sequences) and classified by substitution type: K, M, R, Y, W or S. For each SNP class, nucleotide frequencies were calculated for the first five positions upstream and downstream of the SNP.
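The classification and counting steps described above can be illustrated with a minimal sketch. It is not the code behind SNP LANE: the function names `classify_snp` and `flank_frequencies`, the use of 0-based positions on a single sequence string, and the choice to skip SNPs too close to the sequence ends are all assumptions made for illustration. The one-letter classes follow the standard IUPAC ambiguity codes (K = G/T, M = A/C, R = A/G, Y = C/T, W = A/T, S = C/G).

```python
from collections import Counter

# IUPAC ambiguity codes for the six biallelic substitution classes.
IUPAC_CLASS = {
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
}


def classify_snp(allele_a, allele_b):
    """Return the IUPAC class (K, M, R, Y, W or S) of a biallelic SNP."""
    return IUPAC_CLASS[frozenset((allele_a.upper(), allele_b.upper()))]


def flank_frequencies(sequence, positions, flank=5):
    """Nucleotide frequencies at each offset around the given positions.

    Returns {offset: {base: frequency}} for offsets -flank..-1 and 1..flank.
    Positions are 0-based (an assumption of this sketch); positions without
    a full flank on both sides are skipped.
    """
    counts = {off: Counter() for off in range(-flank, flank + 1) if off != 0}
    for pos in positions:
        if pos - flank < 0 or pos + flank >= len(sequence):
            continue  # not enough sequence on one side
        for off in counts:
            base = sequence[pos + off].upper()
            if base in "ACGT":  # ignore N and other ambiguity symbols
                counts[off][base] += 1
    freqs = {}
    for off, counter in counts.items():
        total = sum(counter.values())
        freqs[off] = {b: (counter[b] / total if total else 0.0) for b in "ACGT"}
    return freqs
```

In such a setup, `flank_frequencies` would be called once per combination of species, genomic region and SNP class, with `positions` holding the coordinates of the SNPs in that group.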
Expected baseline nucleotide frequencies for positions neighboring the SNP were estimated by randomly choosing positions in the genome and retrieving the nucleotides flanking them. Two graphs are presented for each of 1200 distinct situations. In the majority of cases, the baseline frequency was not significantly different from the observed data, indicating that the observed neighboring effect did not reflect an influence on the mutation. For example, if T or A is more frequent downstream of C, it would seem that C might be influencing the T/A substitution, but the baseline frequency indicates that this is just an effect of the non-randomness of the genome. When we deaminated all remaining C in CpG, there was a small increase in bias. When we simulated different percentages of amination of CpA and TpG back to CpG dinucleotides, the bias was completely erased at 25% to 35% amination; under these conditions we do not see the neighboring nucleotide effect. R and Y substitutions did not respond to amination, probably because amination itself causes R and Y substitutions. We suggest that dinucleotide composition produces the previously reported neighborhood bias in SNP probability. Most of this effect might be explained by deamination of C in CpG, and we suggest that the human genome originally had 25% to 35% of the present CpA and TpG dinucleotides in the form of CpG.
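The baseline sampling and the amination simulation mentioned in the abstract could be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the procedure used in the dissertation: the names `sample_baseline_positions`, `aminate` and `count_cpg` are hypothetical, the sequence is treated as a single strand, and each TpG or CpA dinucleotide is converted to CpG independently with probability `fraction` (TpG by T to C, CpA by A to G).

```python
import random


def sample_baseline_positions(genome_length, n, flank, rng):
    """Pick n random 0-based positions that keep `flank` bases on each side."""
    return [rng.randrange(flank, genome_length - flank) for _ in range(n)]


def aminate(sequence, fraction, rng):
    """Convert a fraction of TpG and CpA dinucleotides back to CpG.

    Assumption of this sketch: every TpG (T -> C) and CpA (A -> G) is
    converted independently with probability `fraction`. TpG and CpA can
    never overlap each other, so scanning the original string is safe.
    """
    bases = list(sequence)
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair == "TG" and rng.random() < fraction:
            bases[i] = "C"
        elif pair == "CA" and rng.random() < fraction:
            bases[i + 1] = "G"
    return "".join(bases)


def count_cpg(sequence):
    """Number of CpG dinucleotides on the given strand."""
    return sequence.count("CG")


# Toy usage: restore roughly 30% of TpG/CpA sites in a short sequence.
rng = random.Random(0)
toy = "ATGCATTGACA"
restored = aminate(toy, 0.3, rng)
print(count_cpg(toy), count_cpg(restored))
```

Flanking frequencies computed at the sampled baseline positions of the aminated sequence could then be compared with those observed around SNPs, sweeping `fraction` over a range of values to see where the neighborhood bias disappears.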