Sample size determination in principal component, canonical variable, and cluster analyses of soybean cultivars
Year of defense: | 2024 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defense institution: | Universidade Federal de Santa Maria, Brasil Agronomia UFSM Programa de Pós-Graduação em Agronomia Centro de Ciências Rurais |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://repositorio.ufsm.br/handle/1/32041 |
Abstract: | Research on genetic divergence in soybean relies heavily on indirect methods based on phenotypic traits. Principal component analysis, canonical variable analysis, and hierarchical clustering are among the most widely applied. Although these tools are broadly applicable, they are often used without a representative sampling basis: sample size is rarely defined beforehand, so it is frequently chosen empirically. This study therefore aims to analyze how genetic divergence techniques respond to variation in the number of sampled plants; to define a reference sample size for principal component, canonical variable, and clustering techniques in soybean; and to propose new robust approaches for defining sample size. Field trials were conducted during the 2017/2018 growing season at two locations in Rio Grande do Sul and on three sowing dates, totaling six experiments. Each experimental unit consisted of five rows, 3 m long and spaced 0.45 m apart. A randomized complete block design with three replications was used to evaluate 20 soybean cultivars in each experiment. At grain maturation, ten morphological traits were measured on 20 plants per experimental unit, totaling 7,200 individually measured plants. Bootstrap resampling (sampling with replacement) was then performed for sampling scenarios ranging from 1 to 100 plants per experimental unit to evaluate the eigenvalues of the principal components, the canonical components of the canonical variables, and the cophenetic correlation coefficient derived from combinations of nine dissimilarity measures and seven clustering methods. The bootstrap simulations were run separately for each of the six experiments, followed by a joint analysis of the experiments.
For sample size determination in the principal component technique, the method of error as a percentage of the mean was used. In the second study, on canonical variables, sample size was estimated with an approach combining nonlinear models and the maximum curvature point. In the third study, a methodology for sample size definition was developed based on unsupervised machine learning and Bayesian optimization, together with a modification of the maximum curvature point that uses perpendicular distances. Overall, the estimates of the eigenvalues of the canonical variables and of the cophenetic coefficient improved gradually as the number of sampled plants increased. Eighteen plants per experimental unit were sufficient to estimate the first two principal components, whereas 36 plants were needed to estimate the canonical variables. In the hierarchical analyses, the representative sample size varied with the dissimilarity measure and the clustering method used; nevertheless, 27 plants per experimental unit are suggested as sufficient for representative sampling. It is thus possible to optimize the use of principal component, canonical variable, and hierarchical analyses, ensuring the reliability of their results and avoiding empirical decisions about sample size in soybean. |
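
The abstract describes bootstrap resampling over a range of plants per experimental unit, followed by a maximum curvature point located through perpendicular distances. The sketch below is only an illustration of that general idea, not the thesis's procedure: it assumes synthetic data with just two traits (so the largest covariance eigenvalue has a closed form), uses the width of a 95% percentile interval as the precision measure, and omits the unsupervised learning and Bayesian optimization steps. All function names are hypothetical.

```python
import math
import random


def largest_eigenvalue_2x2(rows):
    """Largest eigenvalue of the sample covariance matrix of two traits."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    a = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    c = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    b = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    # Closed form for a symmetric 2x2 matrix [[a, b], [b, c]].
    return (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)


def bootstrap_interval_widths(plants, sample_sizes, n_boot=200, seed=1):
    """For each n, draw n plants with replacement n_boot times and return
    the width of the 95% percentile interval of the largest eigenvalue."""
    rng = random.Random(seed)
    widths = []
    for n in sample_sizes:
        stats = sorted(
            largest_eigenvalue_2x2(rng.choices(plants, k=n))
            for _ in range(n_boot)
        )
        lo = stats[int(0.025 * (n_boot - 1))]
        hi = stats[int(0.975 * (n_boot - 1))]
        widths.append(hi - lo)
    return widths


def knee_by_perpendicular_distance(x, y):
    """Return the x value of the point farthest from the chord joining the
    first and last points, after scaling both axes to [0, 1]."""
    x_min, x_max, y_min, y_max = min(x), max(x), min(y), max(y)
    xn = [(v - x_min) / (x_max - x_min) for v in x]
    yn = [(v - y_min) / (y_max - y_min) for v in y]
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    chord = math.hypot(dx, dy)
    dist = [
        abs(dx * (b - yn[0]) - dy * (a - xn[0])) / chord
        for a, b in zip(xn, yn)
    ]
    return x[dist.index(max(dist))]


# Synthetic stand-in for 20 measured plants with two traits (assumption).
data_rng = random.Random(0)
plants = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(20)]

# Start at 2 plants: a covariance matrix is undefined for a single plant.
sizes = list(range(2, 101))
widths = bootstrap_interval_widths(plants, sizes)
suggested_n = knee_by_perpendicular_distance(sizes, widths)
```

Here `suggested_n` is the sample size at which adding more plants stops narrowing the interval quickly. Because the bootstrap curve is noisy, a real analysis would probably smooth it or fit a model before locating the knee; this sketch applies the distance rule to the raw widths.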