Network-based methods for analyzing the genetics of human complex diseases

Bibliographic details
Year of defense: 2017
Main author: Gilderlanio Santana de Araújo
Advisor: Not informed by the institution
Examining committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: por
Defending institution: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/BUOS-APSR9L
Abstract: Interpreting the high volume of genomic data is a challenge for epidemiologists, anthropologists and geneticists who aim to understand the genetic basis of population and phenotype variation (diseases/traits). Several computational methods and tools have been developed to extract knowledge and data patterns from public omic data for a diverse set of populations in genetic studies, and also to address transparency and reproducibility, two subjects of high importance in science. The genetic architecture of parental populations, such as African, European and Asian, has been studied to understand the process of population structure and the origins of genetic diseases. It is well known that allele frequencies and allele differentiation estimated from genotype data reveal hallmarks of differential demographic history in worldwide populations. Moreover, some alleles are risk variants, and differences in their frequencies across populations may lead to varying susceptibility to complex diseases. In this context, this thesis makes two main contributions: a network-based approach to integrate and visualize data from the NHGRI-EBI GWAS Catalog and the 1000 Genomes Project, and a scientific workflow approach to disclose scientific knowledge. First, we present DANCE (Disease-Ancestry Network), a new web tool to improve the understanding of the genetic architecture of diseases from a cross-ethnic perspective. DANCE integrates, summarizes and visualizes molecular profiles of genetic disease associations using a network-based approach. It was implemented as a web-based tool to explore genetic associations and risk-allele differentiation across global populations, supporting a broad set of population genetic analyses such as GWAS replication and admixture mapping. Our networks are bipartite: nodes are either phenotypes (diseases/traits) or SNPs, and a phenotype is connected to a SNP if a known association has been reported in current GWAS studies.
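As a minimal sketch of the bipartite structure described above, the following Python builds both adjacency maps (phenotype to SNPs and SNP to phenotypes) and lists the phenotypes that share at least one associated SNP with a given phenotype. It assumes associations arrive as (phenotype, SNP) pairs, which is a hypothetical input format rather than DANCE's actual data model. The function names and the trait/SNP identifiers are placeholders, not real GWAS Catalog records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite(
    associations: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build a phenotype-SNP bipartite graph as two adjacency maps.

    Hypothetical helper: each (phenotype, snp) pair is one reported association.
    """
    pheno_to_snps: Dict[str, Set[str]] = defaultdict(set)
    snp_to_phenos: Dict[str, Set[str]] = defaultdict(set)
    for phenotype, snp in associations:
        # An edge only ever links a phenotype node to a SNP node.
        pheno_to_snps[phenotype].add(snp)
        snp_to_phenos[snp].add(phenotype)
    return dict(pheno_to_snps), dict(snp_to_phenos)


def shared_snp_phenotypes(
    pheno_to_snps: Dict[str, Set[str]],
    snp_to_phenos: Dict[str, Set[str]],
    phenotype: str,
) -> Set[str]:
    """Phenotypes reachable from `phenotype` through at least one shared SNP."""
    neighbours: Set[str] = set()
    for snp in pheno_to_snps.get(phenotype, set()):
        neighbours |= snp_to_phenos[snp]
    neighbours.discard(phenotype)  # exclude the query phenotype itself
    return neighbours


if __name__ == "__main__":
    # Placeholder associations for illustration only.
    assocs = [("trait_A", "rs_1"), ("trait_A", "rs_2"),
              ("trait_B", "rs_2"), ("trait_C", "rs_3")]
    p2s, s2p = build_bipartite(assocs)
    print(shared_snp_phenotypes(p2s, s2p, "trait_A"))  # prints {'trait_B'}
```

Keeping two dictionaries of sets makes the one-mode projection onto phenotypes a two-hop lookup. A graph library could be used instead, but the standard library is enough to show the structure.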
In a graphical projection, the population variability of risk-allele frequencies is represented as a color gradient based on the pairwise FST values between populations, where higher values indicate SNPs that are highly differentiated between the two populations. In addition, this study presents the EPIGEN Scientific Workflow (EPIGEN-SW), which aims to improve transparency and reproducibility in genetic and epidemiological studies. EPIGEN-SW is implemented as a web tool and facilitates access to computational resources through an integrative and interactive approach based on flowcharts, master scripts and auxiliary scripts. Both tools are freely available to the scientific community: DANCE is available online at www.ldgh.com.br/dance, and the EPIGEN-Brazil Scientific Workflow is available at www.ldgh.com.br/scientificworkflow.
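The FST-based coloring of risk alleles can be sketched in a few lines of Python. This sketch assumes the simple two-population estimator FST = (HT - HS) / HT, computed from the frequency of the same allele in each population (Nei's GST form). The estimator and palette DANCE actually uses are not stated here. `pairwise_fst` and `fst_to_hex` are hypothetical helpers, and the white-to-red gradient is an illustrative choice.

```python
def pairwise_fst(p1: float, p2: float) -> float:
    """Two-population FST for one biallelic SNP, as (HT - HS) / HT.

    Assumed estimator (Nei-style GST). p1, p2 are the frequencies of the
    same allele, e.g. the risk allele, in each population.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_t == 0.0:
        # Allele fixed or absent in both populations: no differentiation.
        return 0.0
    return (h_t - h_s) / h_t


def fst_to_hex(fst: float) -> str:
    """Map FST (clamped to [0, 1]) to a white-to-red hex color; higher is redder."""
    f = min(max(fst, 0.0), 1.0)
    fade = round(255 * (1.0 - f))  # green/blue channels fade as FST grows
    return f"#ff{fade:02x}{fade:02x}"


if __name__ == "__main__":
    # Hypothetical allele frequencies, not data from the thesis.
    fst = pairwise_fst(0.2, 0.8)
    print(round(fst, 2), fst_to_hex(fst))
```

With these hypothetical frequencies, the allele is common in one population and rare in the other, so FST is 0.36 and the SNP is drawn in a mid-intensity red. Identical frequencies give FST = 0 (white), and fixation of opposite alleles gives FST = 1 (pure red).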