Data mining techniques applied to historical data of industrial processes as a tool to find time intervals suitable for system identification.

Bibliographic details
Year of defense: 2020
Main author: Santo, Giulio Cesare Mastrocinque
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: eng
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/3/3139/tde-05032021-111034/
Abstract: System identification is a set of model estimation techniques traditionally used by industries to improve and optimize their processes. Estimating dynamic process models requires informative and representative data of the system, which are usually generated through physical experiments on the plants. However, such procedures often need to be performed multiple times to produce adequate datasets, which may result in products that are out of specification. On the other hand, the emergence of powerful data storage and management software, as well as the constant development in the areas of data mining and data science, represents a potential paradigm shift in industry, in which robust data-driven solutions can be adopted. The direct use of historical data to extract useful information from industrial processes is a central part of this work, which proposes a comparison of data mining techniques with the objective of finding time intervals with sufficient information to perform system identification. For this purpose, a detailed review of the literature regarding the problem is initially provided. Then, different mining algorithms are applied to both Single-Input Single-Output and Multiple-Input Multiple-Output systems operating in open loop and in closed loop. Simulated data are used to didactically exemplify how each method works and to validate the expected outcomes in an ideal scenario. Regressive models are then estimated with the obtained intervals, which are used to perform cross-validation. Finally, the proposed methods are applied to real multivariable data coming from an industrial petrochemical furnace. Results obtained through simulated data show that the proposed data mining strategies allowed the estimation of good models in cross-validation scenarios with 1, 10, 100 and infinite prediction steps.
Real data applications, in turn, proved challenging due to the noisy nature of the data and the scarcity of historical intervals in which all the inputs of the multivariable system are sufficiently active to estimate a model. However, this problem is overcome through the use of multiple intervals in the estimation process, showing that the adopted algorithms can also produce reasonable models in real scenarios.
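
The general idea of searching historical data for informative intervals and then estimating a model from several of them can be illustrated with a minimal sketch. The abstract does not name the specific mining algorithms, so everything below is an assumption made for illustration: the rolling-variance activity criterion, the first-order ARX structure, the threshold and window values, and the function names `find_active_intervals`, `fit_arx_from_intervals` and `simulate_first_order` are all hypothetical and are not taken from the dissertation.

```python
import numpy as np


def simulate_first_order(u, a=0.9, b=0.5):
    """Simulate y[k] = a*y[k-1] + b*u[k-1] (noise-free toy process, assumed)."""
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = a * y[k - 1] + b * u[k - 1]
    return y


def find_active_intervals(u, window=50, var_threshold=0.05):
    """Return (start, end) index pairs where the input looks 'active'.

    Assumed criterion: a sample is active if it belongs to at least one
    sliding window whose input variance exceeds var_threshold.
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    active = np.zeros(n, dtype=bool)
    for start in range(0, n - window + 1):
        if np.var(u[start:start + window]) > var_threshold:
            active[start:start + window] = True

    # Group consecutive active samples into half-open intervals [start, end).
    intervals = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, n))
    return intervals


def fit_arx_from_intervals(u, y, intervals):
    """Least-squares fit of y[k] = a*y[k-1] + b*u[k-1] using several intervals.

    Regressors are built inside each interval separately, so no row mixes
    samples from two different intervals; the rows are then stacked.
    """
    rows, targets = [], []
    for start, end in intervals:
        ys, us = y[start:end], u[start:end]
        rows.append(np.column_stack([ys[:-1], us[:-1]]))
        targets.append(ys[1:])
    phi = np.vstack(rows)
    target = np.concatenate(targets)
    theta, *_ = np.linalg.lstsq(phi, target, rcond=None)
    return theta  # estimated (a, b)


# Toy historical record: mostly idle input with two short excitation periods.
u = np.zeros(1000)
u[300:500] = np.where((np.arange(200) // 25) % 2 == 0, 1.0, -1.0)
u[700:800] = np.where((np.arange(100) // 25) % 2 == 0, 1.0, -1.0)
y = simulate_first_order(u)

intervals = find_active_intervals(u)
a_hat, b_hat = fit_arx_from_intervals(u, y, intervals)
print("intervals:", intervals)
print("estimated a, b:", a_hat, b_hat)
```

Stacking the regressors from each interval, rather than concatenating the raw signals, avoids creating artificial transitions at the interval boundaries. On real, noisy data the threshold, window length and model order would need to be chosen for the process at hand.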