Essays on misspecification detection in double bounded random variables modeling

SILVA, José Jairo de Santana e

Bibliographic details
Year of defense:	2023
Main author:	SILVA, José Jairo de Santana e
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Thesis
Access type:	Embargoed access
Language:	eng
Defending institution:	Universidade Federal de Pernambuco UFPE Brasil Programa de Pos Graduacao em Estatistica
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords (translated from Portuguese):	Mathematical statistics; Bootstrap; Beta distribution
Access link:	https://repositorio.ufpe.br/handle/123456789/52017
Abstract:	The beta distribution is routinely used to model variables that assume values in the standard unit interval. Several alternative laws have, nonetheless, been proposed in the literature, such as the Kumaraswamy and simplex distributions. A natural and empirically motivated question is: does the beta law provide an adequate representation for a given dataset? We test the null hypothesis that the beta model is correctly specified against the alternative hypothesis that it does not provide an adequate data fit. Our tests are based on the information matrix equality, which only holds when the model is correctly specified. They are thus sensitive to model misspecification. Simulation evidence shows that the tests perform well, especially when coupled with bootstrap resampling. We model state and county Covid-19 mortality rates in the United States. The misspecification tests indicate that the beta law successfully represents Covid-19 death rates when they are computed using either data from prior to the start of the vaccination campaign or data collected when such a campaign was under way. In the latter case, the beta law is only accepted when the negative impact of vaccination reach on death rates is moderate. The beta model is rejected under data heterogeneity, i.e., when mortality rates are computed using information gathered during both time periods.

The beta regression model is tailored for responses that assume values in the standard unit interval. In its more general formulation, it comprises two submodels, one for the mean response and another for the precision parameter. We develop tests of correct specification for such a model. The tests are based on the information matrix equality, which fails to hold when the model is incorrectly specified. We establish the validity of the tests in the class of varying precision beta regressions, provide closed-form expressions for the quantities used in the test statistics, and present simulation evidence on the tests’ null and non-null behavior. We show it is possible to achieve very good control of the type I error probability when data resampling is employed and that the tests are able to reliably detect incorrect model specification, especially when the sample size is not small. Two empirical applications are presented and discussed.

Diagnostic analyses in regression modeling are usually based on residuals or local influence measures. They are used for detecting atypical observations. We develop a new approach for detecting such observations when the parameters of the model are estimated by maximum likelihood. It is based on the information matrix equality, which holds when the model is correctly specified. We consider different measures of the distance between two symmetric matrices and use them with the sample counterparts of the matrices in the information matrix equality in such a way that zero distance corresponds to correct model specification. The distance measures we use thus quantify the degree of model adequacy. We use such measures to identify observations that are atypical because they disproportionately alter the degree of model adequacy. We also introduce a modified generalized Cook's distance and a new criterion that uses the two generalized Cook’s distances (modified and unmodified). Empirical applications involving Gaussian and beta models are presented and discussed.
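
The idea behind the tests described in the abstract can be illustrated with a small sketch. Under correct specification, the expected Hessian of the log-likelihood plus the expected outer product of the score is the zero matrix, so the sample counterpart of that sum should be close to zero. The code below is an illustration only and is not the thesis's test statistic: the function names (`beta_mle`, `im_discrepancy`, `im_bootstrap_test`), the unweighted sum-of-squares discrepancy, and the parametric bootstrap scheme are all assumptions made for this example, applied to the two-parameter beta law with shape parameters `a` and `b`.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, digamma, polygamma


def beta_mle(y):
    """Maximum likelihood estimates (a, b) of a beta law for data y in (0, 1)."""
    log_y, log_1y = np.log(y), np.log1p(-y)

    def negloglik(theta):
        a, b = np.exp(theta)  # optimize on the log scale so a, b stay positive
        return -np.sum(gammaln(a + b) - gammaln(a) - gammaln(b)
                       + (a - 1) * log_y + (b - 1) * log_1y)

    # Method-of-moments starting values.
    m, v = y.mean(), y.var()
    k = max(m * (1 - m) / v - 1, 1e-2)
    start = np.log([m * k, (1 - m) * k])
    res = minimize(negloglik, start, method="Nelder-Mead")
    return np.exp(res.x)


def im_discrepancy(y, a, b):
    """Illustrative discrepancy: n times the sum of squared distinct elements
    of (Hessian + average score outer product). Hypothetical statistic."""
    n = y.size
    # Per-observation score vectors with respect to (a, b).
    score = np.column_stack([
        digamma(a + b) - digamma(a) + np.log(y),
        digamma(a + b) - digamma(b) + np.log1p(-y),
    ])
    # Per-observation Hessian; for the beta law it does not depend on y.
    t = polygamma(1, a + b)
    hess = np.array([[t - polygamma(1, a), t],
                     [t, t - polygamma(1, b)]])
    d = hess + score.T @ score / n  # near zero under correct specification
    return n * np.sum(d[np.triu_indices(2)] ** 2)


def im_bootstrap_test(y, n_boot=199, seed=0):
    """Parametric bootstrap p-value for the illustrative discrepancy."""
    rng = np.random.default_rng(seed)
    a, b = beta_mle(y)
    stat = im_discrepancy(y, a, b)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        # Resample from the fitted beta law, i.e. under the null hypothesis.
        y_star = np.clip(rng.beta(a, b, size=y.size), 1e-12, 1 - 1e-12)
        boot[i] = im_discrepancy(y_star, *beta_mle(y_star))
    p_value = (1 + np.sum(boot >= stat)) / (n_boot + 1)
    return stat, p_value
```

A call such as `im_bootstrap_test(y)` on a one-dimensional NumPy array of rates strictly between 0 and 1 returns the discrepancy and its bootstrap p-value; a small p-value suggests the beta law does not fit. Calibrating the statistic by resampling rather than by an asymptotic approximation mirrors the abstract's remark that the tests perform best when coupled with the bootstrap.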
