Multiple imputation via the MICE algorithm and the IMLD method
Year of defense: | 2016 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defending institution: |
Universidade Estadual de Maringá
Brasil Departamento de Estatística Programa de Pós-Graduação em Bioestatística UEM Maringá, PR Centro de Ciências Exatas |
Graduate program: |
Not reported by the institution
|
Department: |
Not reported by the institution
|
Country: |
Not reported by the institution
|
Keywords in Portuguese: | |
Access link: | http://repositorio.uem.br:8080/jspui/handle/1/4362 |
Abstract: | A common problem in statistical analyses is the occurrence of incomplete databases. In these situations, the analysis is generally restricted to subjects with complete data on all variables. This reduces the sample size and can result in biased estimates. The missing data can instead be "filled in" by multiple imputation (MI), in which each missing value is replaced by a set of plausible values, incorporating the uncertainty about the value to be imputed. Multiple imputation is currently available in the main statistical software packages, but most of the implemented methods are parametric and rely on strong assumptions about the distribution of the data, which are difficult to verify in practice. In order to promote interdisciplinarity in Biostatistics, we present two multiple imputation procedures that offer greater flexibility regarding the distribution of the data: the MICE algorithm (Multivariate Imputation by Chained Equations) and the IMLD method (Distribution-Free Multiple Imputation). The MICE algorithm is applied to data from a cross-sectional study of live births to residents of the state of Paraná in 2012. A random sample of 3380 complete records was obtained, and a logistic regression model was fitted for the outcome low birth weight. Three incomplete data sets were then generated by simulation, with missing values in the weight outcome. Models were fitted in three different situations for comparison with the standard model. The estimates show that the models fitted with imputation perform better than the analysis restricted to the data with missing records. The standard errors estimated from the imputed models closely match those obtained with the gold-standard model.
The IMLD method is applied to an array of data on the average plant height (m) of 20 early and genetically modified corn cultivars, evaluated at seven locations in the state of Paraná (SHIOGA et al., 2015). Random removals (5%, 15%, 30%) were made from the original array, and the IMLD method was then used to fill in these missing values. The method was implemented in the R software, and the code is provided in the annex. Measures of variability and accuracy indicate that the method was effective. This provides evidence that multiple imputation should be considered as an option when data are missing. |
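
The removal experiment described in the abstract can be sketched as follows. This is a minimal Python illustration, not the thesis code, which was written in R and is not reproduced in this record. The 20 x 7 array is filled with synthetic heights, all function names are hypothetical, and a column-mean fill stands in for the IMLD method, whose details the abstract does not give. The accuracy measure (RMSE over the removed cells) is likewise an assumption.

```python
import math
import random


def remove_at_random(matrix, rate, rng):
    """Return a copy of `matrix` with round(rate * n_cells) cells set to None,
    together with the list of (row, col) positions that were removed."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    removed = rng.sample(cells, round(rate * len(cells)))
    masked = [row[:] for row in matrix]  # copy so the original stays intact
    for i, j in removed:
        masked[i][j] = None
    return masked, removed


def impute_column_means(masked):
    """Stand-in imputer (NOT IMLD): fill each gap with its column's observed mean."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse(original, filled, removed):
    """Root mean squared error computed over the removed cells only."""
    squared = [(original[i][j] - filled[i][j]) ** 2 for i, j in removed]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(2016)
# Synthetic 20 x 7 array of plant heights in metres (illustrative values only,
# not the data of SHIOGA et al., 2015).
heights = [[round(rng.uniform(1.8, 2.6), 2) for _ in range(7)] for _ in range(20)]

results = {}
for rate in (0.05, 0.15, 0.30):
    masked, removed = remove_at_random(heights, rate, rng)
    filled = impute_column_means(masked)
    results[rate] = (len(removed), rmse(heights, filled, removed))
    print(f"{rate:.0%} removed: {len(removed)} cells, "
          f"RMSE = {results[rate][1]:.3f} m")
```

Because the true values of the removed cells are known, any imputation method can be swapped in for `impute_column_means` and compared on the same masks. A multiple imputation method would produce several filled arrays per mask, so variability across those fills could be reported alongside accuracy.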