Proposal of two new methods for principal component analysis

Bibliographic details
Year of defense: 2020
Main author: Reis, Carlos José dos
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: por
Defense institution: Universidade Federal de Lavras
Programa de Pós-Graduação em Estatística e Experimentação Agropecuária
UFLA
Brazil
Departamento de Ciências Exatas
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.ufla.br/jspui/handle/1/46110
Abstract: Principal component analysis (PCA) is a widely used multivariate method, mainly because of its ability to summarize a large proportion of the total variance of the original variables in a few latent variables, known as principal components. However, each principal component is a linear combination of a very large number of original variables, which often makes the results difficult to interpret. One common way to overcome this difficulty is to inspect the loading of each variable and ignore those whose values are small. The resulting component is then a linear combination of the remaining variables only. Although widely used, this practice is potentially misleading because it relies on a subjective choice. Sparse principal component analysis (SPCA) has emerged as a method that can mitigate this disadvantage of PCA. A subject of intense research for over a decade, the SPCA method proposed by Zou, Hastie and Tibshirani in 2006 reformulates PCA as a regression problem and adds the LASSO (Least Absolute Shrinkage and Selection Operator) penalty, which induces sparsity (null loadings) in the principal components. In view of this, two new methods are proposed to facilitate the interpretation of PCA results, mainly in scenarios where the problem under investigation involves a very large number of variables. The proposed methods are called Sparse Group for Principal Component Analysis (SGPCA) and Pairwise Absolute Clustering and Sparsity for Principal Component Analysis (PACSPCA). They are based on the Octagonal Shrinkage and Clustering Algorithm for Regression (OSCAR) and Pairwise Absolute Clustering and Sparsity (PACS) regression methods, respectively.
Besides inducing sparsity in the components, as the SPCA method does, the two proposed methods can also group correlated variables by assigning them equal loadings. As an illustration, the SGPCA and PACSPCA methods were applied to real and simulated data to elucidate some of their characteristics.
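
The ad hoc practice the abstract describes as potentially misleading (computing ordinary PCA loadings and then ignoring the small ones) can be sketched as follows. This is only an illustrative sketch, not the SGPCA or PACSPCA methods of the thesis. The function names `pca_loadings` and `threshold_loadings`, the cutoff of 0.2, and the simulated data are all assumptions introduced here.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Return the loadings (p x k) and explained-variance ratios of the first k components."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                  # share of total variance per component
    return Vt[:n_components].T, ratio[:n_components]

def threshold_loadings(V, cutoff):
    """Zero out loadings with absolute value below `cutoff`, then rescale columns to unit norm."""
    V_sparse = np.where(np.abs(V) < cutoff, 0.0, V)
    norms = np.linalg.norm(V_sparse, axis=0)
    norms[norms == 0.0] = 1.0                    # guard against an all-zero column
    return V_sparse / norms

# Simulated data (assumption): 8 variables, the first 3 share a common latent factor.
rng = np.random.default_rng(0)
n, p = 200, 8
latent = rng.normal(size=(n, 1))
X = rng.normal(scale=0.5, size=(n, p))
X[:, :3] += latent

V, ratio = pca_loadings(X, n_components=2)
V_sparse = threshold_loadings(V, cutoff=0.2)     # the cutoff is a subjective choice

print("dense loadings, PC1: ", np.round(V[:, 0], 2))
print("sparse loadings, PC1:", np.round(V_sparse[:, 0], 2))
```

Changing `cutoff` changes which variables survive, which is the subjectivity the abstract points to. Penalized formulations such as SPCA let a penalty term in the fitting criterion determine which loadings become zero.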