Development of bioinformatics tools for genome-wide association studies
Year of defense: | 2018 |
---|---|
Main author: | |
Advisor: | |
Examining committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | Portuguese (por) |
Defense institution: | Universidade Federal de Minas Gerais (UFMG), Brazil; ICB - Instituto de Ciências Biológicas; Programa de Pós-Graduação em Bioinformática |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/35077 |
Abstract: | The EPIGEN-Brazil Project is one of the largest Latin American initiatives in genomic epidemiology and population genomics. Its main objective is to understand the association between complex traits and genetic variants in Brazilian populations, which are highly admixed. This thesis describes two projects developed within the scope of the EPIGEN-Brazil Project. The first is the development of an imputation panel for admixed Latin American populations. Genotype imputation is one of the main steps of genome-wide association studies (GWAS); however, its efficiency depends on how well the genotyped data match the imputation panel used. Because the available imputation panels lack data from admixed populations, imputation can introduce erroneous variants owing to mismatches between the genotyped data and the reference panel. Our imputation panel merges data on 4.3 million single nucleotide polymorphisms (SNPs) from 265 individuals with the imputation panel of the 1000 Genomes Project (1KGP). Compared with the 1KGP panel, our panel imputes 140,452 more SNPs in total and 788,873 more SNPs with high imputation quality, increasing imputation efficiency. The second project is the development of NAToRA (Network Algorithm To Relatedness Analysis), a tool designed to minimize relatedness in samples that contain related individuals. NAToRA combines graph and complex network theory with relatedness measures: it successively excludes individuals ranked by centrality metrics, reducing relatedness in population samples while avoiding large sample losses. In tests on simulated and real data, node degree centrality produced the best results among the metrics tested. Furthermore, reducing kinship with NAToRA had little impact on the genetic diversity of the resulting subsamples relative to the original samples. We also implemented a method that generates sets of unrelated individuals that can be analyzed without excluding any individual from the original sample. |
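
The abstract describes NAToRA's strategy only at a high level. The Python sketch below illustrates the general idea under stated assumptions; it is not NAToRA's actual code or interface. Individuals are nodes, an edge joins any pair whose kinship coefficient is at or above a threshold, and the individual with the highest degree centrality is removed repeatedly, with degrees recomputed after each removal, until no edges remain. The second function shows one possible way to build sets of mutually unrelated individuals without excluding anyone: greedy graph coloring, where each color class contains no related pair. Whether the thesis uses coloring is an assumption. The function names, the `(id_1, id_2, kinship)` input format and the 0.0884 cutoff (used only for illustration; it is the lower bound KING uses for second-degree relatives) are all hypothetical.

```python
# Illustrative sketch of degree-based relatedness pruning; NOT NAToRA's code.
# Function names and the (id_1, id_2, kinship) input format are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_kinship_graph(pairs: Iterable[Tuple[str, str, float]], threshold: float) -> Graph:
    """Link two individuals when their kinship coefficient is >= threshold."""
    graph: Graph = defaultdict(set)
    for a, b, kinship in pairs:
        if a != b and kinship >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def prune_by_degree(graph: Graph) -> Set[str]:
    """Remove the highest-degree individual, recompute degrees, and repeat
    until no edges remain. Returns the set of removed individuals."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}  # work on a copy
    removed: Set[str] = set()
    while g:
        # Highest degree first; ties go to the smallest identifier.
        node = min(g, key=lambda n: (-len(g[n]), n))
        if not g[node]:
            break  # every remaining individual is unrelated to the others
        for nbr in g.pop(node):
            g[nbr].discard(node)
        removed.add(node)
    return removed


def partition_unrelated(individuals: Iterable[str], graph: Graph) -> List[List[str]]:
    """Greedy graph coloring (highest degree first): each color class is a set
    of mutually unrelated individuals, and every individual gets exactly one set."""
    color: Dict[str, int] = {}
    order = sorted(individuals, key=lambda n: (-len(graph.get(n, ())), n))
    for node in order:
        taken = {color[nbr] for nbr in graph.get(node, ()) if nbr in color}
        c = 0
        while c in taken:  # smallest color not used by an already-colored relative
            c += 1
        color[node] = c
    groups: Dict[int, List[str]] = defaultdict(list)
    for node in order:
        groups[color[node]].append(node)
    return [sorted(groups[c]) for c in sorted(groups)]


if __name__ == "__main__":
    # Toy data; the 0.0884 threshold is purely illustrative.
    pairs = [("A", "B", 0.25), ("A", "C", 0.25), ("B", "C", 0.25),
             ("A", "D", 0.12), ("E", "F", 0.02)]
    individuals = ["A", "B", "C", "D", "E", "F"]
    graph = build_kinship_graph(pairs, threshold=0.0884)
    removed = prune_by_degree(graph)
    print("removed:", sorted(removed))                  # removed: ['A', 'B']
    print("kept:", sorted(set(individuals) - removed))  # kept: ['C', 'D', 'E', 'F']
    print("unrelated sets:", partition_unrelated(individuals, graph))
    # unrelated sets: [['A', 'E', 'F'], ['B', 'D'], ['C']]
```

Recomputing degrees after every removal is what limits sample loss in this sketch: once the most connected individual (A) is dropped, D has no remaining relatives and is kept. Pruning always discards some individuals, whereas the coloring keeps everyone at the cost of splitting the sample into several smaller analysis sets.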