Sample size determination in principal component, canonical variable, and cluster analyses of soybean cultivars

Souza, Rafael Rodrigues de

Bibliographic details
Year of defense:	2024
Main author:	Souza, Rafael Rodrigues de
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	Portuguese (por)
Degree-granting institution:	Universidade Federal de Santa Maria (UFSM), Brazil; Agronomia; Programa de Pós-Graduação em Agronomia; Centro de Ciências Rurais
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Bootstrap; Extreme Gradient Boosting; Modeling; Experimental planning; CNPQ::CIENCIAS AGRARIAS::AGRONOMIA
Access link:	http://repositorio.ufsm.br/handle/1/32041
Abstract:	Research on soybean genetic divergence relies heavily on indirect methods based on phenotypic traits, chiefly principal component analysis, canonical variable analysis, and hierarchical cluster analysis. Although these tools are widely applicable, they are not always applied to a representative sample: sample size is rarely defined in advance and is often chosen empirically. This study therefore aimed to analyze how genetic divergence techniques respond to variation in the number of sampled plants; to define a reference sample size for principal component, canonical variable, and clustering techniques in soybean; and to propose new robust approaches for determining sample size. Field trials were conducted in the 2017/2018 growing season at two locations in Rio Grande do Sul, Brazil, on three sowing dates, totaling six experiments. Each experimental unit consisted of five 3 m rows spaced 0.45 m apart. Twenty soybean cultivars were evaluated in a randomized complete block design with three replicates per experiment. At grain maturation, ten morphological traits were measured on 20 plants per experimental unit, for a total of 7,200 individually measured plants (6 experiments × 20 cultivars × 3 replicates × 20 plants). Bootstrap resampling (sampling with replacement) was then performed in scenarios ranging from 1 to 100 plants per experimental unit to evaluate the eigenvalues of the principal components, the eigenvalues of the canonical variables, and the cophenetic correlation coefficient obtained from combinations of nine dissimilarity measures and seven clustering methods. The bootstrap simulations were run for each of the six experiments individually and then for a joint analysis of all experiments.
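The bootstrap scheme for principal components can be sketched in Python. The record does not state several of the details below, so each one is an assumption. The sketch uses only two hypothetical traits instead of ten, which gives the first eigenvalue of the 2 × 2 covariance matrix a closed form. It uses the covariance matrix of cultivar means rather than the correlation matrix. Its input is synthetic plant data built by the hypothetical `simulate_field_data` helper. Within each experimental unit, `n_plants` plants are drawn with replacement from the measured plants, so scenarios larger than the 20 plants actually measured are possible. All function names are illustrative and do not come from the thesis.

```python
import math
import random
import statistics


def first_eigenvalue_2x2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    half_trace = (a + c) / 2
    return half_trace + math.sqrt(((a - c) / 2) ** 2 + b ** 2)


def first_pc_eigenvalue(means):
    """First principal-component eigenvalue of the sample covariance
    matrix of cultivar means, each mean being a (trait1, trait2) pair."""
    x = [m[0] for m in means]
    y = [m[1] for m in means]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return first_eigenvalue_2x2(statistics.variance(x), cov_xy, statistics.variance(y))


def bootstrap_eigenvalues(data, n_plants, n_boot, rng):
    """data[cultivar][block] is a list of measured plants (trait1, trait2).

    In each bootstrap replicate, n_plants plants are drawn WITH replacement
    from every experimental unit, so n_plants may exceed the number measured.
    Unit means are averaged over blocks to obtain cultivar means, and the
    first PC eigenvalue is computed across cultivars."""
    estimates = []
    for _ in range(n_boot):
        cultivar_means = []
        for blocks in data:
            unit_means = []
            for plants in blocks:
                draw = rng.choices(plants, k=n_plants)
                unit_means.append((statistics.fmean(p[0] for p in draw),
                                   statistics.fmean(p[1] for p in draw)))
            cultivar_means.append((statistics.fmean(u[0] for u in unit_means),
                                   statistics.fmean(u[1] for u in unit_means)))
        estimates.append(first_pc_eigenvalue(cultivar_means))
    return estimates


def simulate_field_data(n_cultivars=20, n_blocks=3, n_plants=20, seed=1):
    """Hypothetical synthetic data shaped like one experiment:
    20 cultivars x 3 blocks x 20 plants, two traits per plant."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_cultivars):
        mu1, mu2 = rng.gauss(80, 8), rng.gauss(40, 5)  # cultivar effects
        data.append([[(rng.gauss(mu1, 10), rng.gauss(mu2, 6))
                      for _ in range(n_plants)]
                     for _ in range(n_blocks)])
    return data


if __name__ == "__main__":
    data = simulate_field_data()
    rng = random.Random(2024)
    for n in (1, 5, 10, 20, 50, 100):
        est = bootstrap_eigenvalues(data, n, n_boot=100, rng=rng)
        # Print sample size, mean and spread of the bootstrap eigenvalues.
        print(n, round(statistics.fmean(est), 2), round(statistics.stdev(est), 2))
```

As the number of plants per unit grows, the spread of the bootstrap eigenvalues narrows. Sample size criteria are then applied to that narrowing.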
Sample size for principal component analysis was determined by the method of error as a percentage of the mean. In the second study, on canonical variables, sample size was estimated by combining nonlinear models with the maximum curvature point. In the third study, a sample size methodology was developed that combines unsupervised machine learning and Bayesian optimization with a modification of the maximum curvature point based on perpendicular distances. Overall, the estimates of the eigenvalues of the canonical variables and of the cophenetic correlation coefficient improved gradually as the number of sampled plants increased. Eighteen plants per experimental unit were sufficient to estimate the first two principal components, whereas 36 plants were needed to estimate the canonical variables. In the hierarchical analyses, the representative sample size varied with the dissimilarity measure and the clustering method used; nevertheless, 27 plants per experimental unit are suggested as sufficient for representative sampling in hierarchical analyses. These results make it possible to optimize the use of principal component, canonical variable, and hierarchical analyses in soybean, ensuring the reliability of their results and avoiding empirical decisions on the number of sampled plants.
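Two of the sample size criteria mentioned in the abstract can be illustrated with a minimal Python sketch. The thesis's exact formulations are not reproduced, so both are assumptions. First, the error as a percentage of the mean is taken here as the half-width of an approximate 95% normal interval (1.96 × the standard deviation of the bootstrap estimates) divided by their mean. Second, the perpendicular-distance rule is implemented as the common "knee" heuristic. It rescales both axes to [0, 1] and picks the curve point farthest from the straight line joining the first and last points. The function names are hypothetical, and the bootstrap estimates in the demo are synthetic.

```python
import math
import random
import statistics


def error_pct_of_mean(estimates, z=1.96):
    """Half-width of an approximate 95% normal interval (z * SD of the
    bootstrap estimates), expressed as a percentage of their mean."""
    return 100 * z * statistics.stdev(estimates) / statistics.fmean(estimates)


def smallest_n_within(errors_by_n, tolerance):
    """Smallest sample size whose error (% of mean) is <= tolerance,
    or None if no scenario reaches it."""
    for n in sorted(errors_by_n):
        if errors_by_n[n] <= tolerance:
            return n
    return None


def knee_by_perpendicular_distance(xs, ys):
    """x of the curve point farthest (perpendicular distance) from the chord
    joining the first and last points, after rescaling both axes to [0, 1]."""
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    u = [(x - xmin) / (xmax - xmin) for x in xs]
    v = [(y - ymin) / (ymax - ymin) for y in ys]
    du, dv = u[-1] - u[0], v[-1] - v[0]
    norm = math.hypot(du, dv)
    dist = [abs(dv * (ui - u[0]) - du * (vi - v[0])) / norm for ui, vi in zip(u, v)]
    return xs[dist.index(max(dist))]


if __name__ == "__main__":
    rng = random.Random(7)
    errors = {}
    for n in range(1, 101):
        # Synthetic stand-in for bootstrap estimates of one eigenvalue:
        # mean 3.0, standard deviation shrinking like 1/sqrt(n).
        est = [rng.gauss(3.0, 1.2 / math.sqrt(n)) for _ in range(1000)]
        errors[n] = error_pct_of_mean(est)
    sizes = sorted(errors)
    print("error <= 10% of mean at n =", smallest_n_within(errors, 10.0))
    print("perpendicular-distance knee at n =",
          knee_by_perpendicular_distance(sizes, [errors[n] for n in sizes]))
```

Rescaling both axes matters. Without it, the perpendicular distance, and therefore the chosen sample size, would change with the units of the error axis. The two criteria also answer different questions. The tolerance rule returns the first sample size that meets a fixed precision, while the knee marks where further plants stop paying off.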