Statistical monitoring applied to data science: an approach for the continuous validation of predictive classification models
Year of defense: | 2022 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal de São Carlos, Câmpus São Carlos |
Graduate program: | Programa de Pós-Graduação em Engenharia de Produção - PPGEP |
Department: | Not informed by the institution |
Country: | Not informed by the institution |
Keywords in Portuguese: | |
Keywords in English: | |
CNPq knowledge area: | |
Access link: | https://repositorio.ufscar.br/handle/20.500.14289/16304 |
Abstract: | Predictive models apply to data on observable variables, called independent, to infer the behavior of another variable, observable or not, called dependent. A particular and widely used case is the binary classification model, in which the dependent variable can take one of two values: yes or no (positive/negative, success/failure). This thesis shows that increasingly digitized operational environments allow more complex applications of these classification models. Added to this is the need to increase business competitiveness through the search for information that reduces costs or increases profitability: it is the "perfect storm" that increases the importance, scope, financial impact, and time horizon of use of these models. This phenomenon occurs both within industry, with Big Data Analytics (BDA), and in other sectors, with the development of Data Science (DS). However, the boundary conditions, i.e., the operational conditions existing when the model was created, can undergo significant variations, due to technical problems in the generation, capture, or flow of information, or even to changes in the relationships between the variables studied, which can reduce the predictive quality of the created model. The literature review showed that several researchers argue that it is important to periodically check the hit-and-miss performance of these models; however, more specific criteria and methods defining a suitable checking frequency and sample sizes for this monitoring are lacking. To fill this gap, the concepts of Design Science Research were used to integrate Statistical Process Monitoring (SPM) with the model-building methods applied in the field of DS. In this integration, Phases I and II of SPM were related to a structured process of data analysis and model generation, creating an approach for the continuous validation of such models. The approach was validated using analytical and simulation techniques applied to Cohen's kappa index, resulting in prescriptive criteria for its use, supported by comparisons based on the Matthews correlation coefficient (MCC) and Youden's index. Control charts based on kappa were found to perform well with m = 5 samples of size n = 500, provided the expected agreement Pe is less than 0.8. The simulations also showed that monitoring with kappa requires fewer samples than the other indices studied. |
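The indices compared in the abstract can all be computed from a binary confusion matrix. The sketch below is an illustration of those standard formulas only, not the thesis's monitoring procedure; the function name and example counts are hypothetical. It shows Cohen's kappa, the expected agreement Pe that appears in the Pe < 0.8 criterion, the MCC, and Youden's index:

```python
from math import sqrt

def agreement_indices(tp, fp, fn, tn):
    """Standard agreement indices for a binary confusion matrix
    (illustrative sketch; not the thesis's implementation)."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement (accuracy)
    # expected agreement by chance, from the marginal proportions
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (po - pe) / (1 - pe)          # Cohen's kappa
    mcc = (tp * tn - fp * fn) / sqrt(     # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    j = tp / (tp + fn) + tn / (tn + fp) - 1  # Youden's J: sens + spec - 1
    return {"kappa": kappa, "pe": pe, "mcc": mcc, "youden_j": j}

# Hypothetical confusion matrix for one monitoring sample
print(agreement_indices(tp=40, fp=5, fn=10, tn=45))
```

In a kappa control chart, such a value would be computed for each periodic sample of n classified cases and plotted against control limits derived in Phase I.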