Multiple factor analysis model with scale mixture of normal distributions in the latent factors

Bibliographic details
Year of defense: 2018
Main author: MARQUES, Alexandre Henrique Carvalho
Advisor: GARAY, Aldo William Medina
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: English
Degree-granting institution: Universidade Federal de Pernambuco
Graduate program: Programa de Pos Graduacao em Estatistica
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/32306
Abstract: Statistical tools for modeling covariance structures have proven useful in medicine, particularly in genetic studies. In that context, factor analysis models stand out for their ability to identify latent factors that reduce data dimensionality and explain the observed variability. Latent factors are usually interpreted as unobserved physiological mechanisms underlying the phenomenon under study. Confirmatory factor analysis models allow the researcher to pre-specify elements of the model, such as the number of latent factors, the structure of the loading matrix, and linear restrictions on the parameters. Such models allow hypotheses to be validated in gene co-expression studies. Confirmatory factor analysis models under a normality assumption for the data are well established in the literature. Our aim is to develop a more general class of models capable of integrating several independent populations, extending the normality assumption for the data to a more flexible class of distributions, the scale mixture of normal (SMN) class. The SMN class includes, as special cases, the normal distribution and heavy-tailed distributions such as the Student's t, contaminated normal, and slash distributions. The model allows parameter restrictions to be specified, which lead to important particular cases of covariance structures, making it more flexible in both its specification and its distributional assumptions. Model identifiability is studied, and necessary and/or sufficient conditions for parameter identification are presented. To estimate the model's parameters we propose an ECM algorithm, and the finite-sample performance of the estimators is evaluated through Monte Carlo simulation studies. We conclude the study with an illustration that considers a confirmatory model for the pathological dynamics of pancreatic cancer based on real gene expression data.
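
As a rough illustration of the scale mixture of normal class named in the abstract, the sketch below draws samples through the stochastic representation Y = mu + U^(-1/2) Z, where Z is multivariate normal and U is a positive mixing variable. The function name `sample_smn`, its parameter names (`nu`, `eps`, `gamma`), and the mixing distributions chosen for each family are assumptions that follow one common convention in the SMN literature; they are not taken from the dissertation and may differ from its parameterization.

```python
import numpy as np


def sample_smn(n, mu, sigma, family="normal", nu=4.0, eps=0.1, gamma=0.2, rng=None):
    """Draw n samples from a scale mixture of normal (SMN) distribution.

    Hypothetical helper for illustration only. Assumed mixing variables:
      normal:        U = 1
      t:             U ~ Gamma(shape=nu/2, rate=nu/2)
      slash:         U ~ Beta(nu, 1)
      contaminated:  U = gamma with probability eps, otherwise 1
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    if family == "normal":
        u = np.ones(n)
    elif family == "t":
        # NumPy's gamma takes a scale, so rate nu/2 becomes scale 2/nu.
        u = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    elif family == "slash":
        u = rng.beta(nu, 1.0, size=n)
    elif family == "contaminated":
        u = np.where(rng.random(n) < eps, gamma, 1.0)
    else:
        raise ValueError(f"unknown family: {family!r}")

    # Z ~ N(0, sigma); dividing by sqrt(U) inflates the scale when U is small.
    z = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)
    return mu + z / np.sqrt(u)[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for fam in ("normal", "t", "slash", "contaminated"):
        y = sample_smn(5000, mu=[0.0, 0.0], sigma=np.eye(2), family=fam, rng=rng)
        # Larger maximum absolute values suggest heavier tails.
        print(fam, y.shape, float(np.abs(y).max()))
```

Setting `family="normal"` fixes U at 1 and recovers the usual multivariate normal, which is the sense in which the normal distribution is a special case of the class.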