Entropy, mutual information, and population structure in genome-wide selection

Bibliographic details
Year of defense: 2020
Main author: Simiqueli, Guilherme Ferreira
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Degree-granting institution: Universidade Federal de Viçosa
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://locus.ufv.br//handle/123456789/28716
Abstract: Different populations can be combined in the training set to improve the predictive ability of genomic prediction models. However, this practice has not always resulted in higher predictive ability, and some studies have proposed accounting for the effect of population structure to improve prediction. Different strategies, such as principal-component covariates, uni- and multi-population models, alternative genomic relationship matrices, admixture-proportion covariates, or a mix of them, have been applied to genomic prediction. Thus, the first chapter evaluates some combinations of these strategies to support the decision of whether to account for population structure in genomic prediction. Simulated polygenic traits with heritabilities of 0.1 and 0.5 and real data were used to evaluate the strategies. Bias was lower when the multi-population model was used for the low-heritability simulated trait. The accuracy for the high-heritability trait was lower for strategies that used alternative genomic matrices accounting for differences in allele frequency, but only in admixed populations. Further, for real data, two commonly used genomic relationship matrices showed lower predictive ability for all traits, which are likely controlled by few quantitative trait loci. Therefore, whether accounting for population structure lowers bias without reducing accuracy, and consequently whether genomic prediction succeeds, depends on trait heritability, trait architecture, and the admixture level of the population. The second chapter addresses the fact that random k-fold cross-validation in genome-wide selection can yield high estimates of predictive ability because of the high degree of kinship between the training and validation sets. However, many tree breeding populations are less genetically related to the training sets and have different levels of phenotypic diversity.
Therefore, this chapter proposes novel methods of splitting cross-validation sets that account for genetic similarity and phenotypic diversity, estimated via mutual information and entropy, respectively. These methods were also used to verify how the distribution of phenotypic and genotypic information affects genome-wide selection of trees. The methods fitted models reliably, according to the entropy of tree breeding populations and their genetic relatedness to the training sets. Validation sets with more phenotypic diversity showed higher predictive ability and lower bias. Therefore, phenotypic diversity should be added to tree breeding populations for higher genetic gain, better estimation of genomic breeding values, and consistent long-term tree breeding success. Keywords: Population structure. Accuracy. Bias. Mutual information. Entropy. K-fold cross-validation.
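
Illustrative note: the abstract names entropy as the measure of phenotypic diversity and mutual information as the measure of genetic similarity, but does not give the estimators used in the thesis. The following is only a minimal sketch of the two textbook quantities for discrete data. The function names, the equal-width binning of phenotypes, and the 0/1/2 genotype coding in the usage note are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def bin_phenotypes(values, n_bins=4):
    """Equal-width binning of continuous phenotypes.

    The binning scheme is an assumption of this sketch, not the thesis method.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
    return sum(
        (c / n) * math.log2((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )
```

Under these assumptions, `shannon_entropy(bin_phenotypes(phenotypes))` would give a diversity score for a candidate validation set, and `mutual_information(g1, g2)` applied to two individuals' marker vectors (hypothetically coded 0/1/2) would give a similarity score that could guide how folds are split.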