Comparison of interpretable machine learning models for predicting heat of combustion and heat of formation
Year of defense: | 2023 |
---|---|
Lead author: | |
Advisor: | |
Examination committee: | |
Document type: | Master's dissertation |
Access type: | Open access |
Language: | por |
Defense institution: | Universidade Federal de Santa Maria, Brazil; Engenharia Química; UFSM; Programa de Pós-Graduação em Engenharia Química; Centro de Tecnologia |
Graduate program: | Not reported by the institution |
Department: | Not reported by the institution |
Country: | Not reported by the institution |
Keywords in Portuguese: | |
Access link: | http://repositorio.ufsm.br/handle/1/29514 |
Abstract: | Determining the physicochemical properties of substances is of paramount importance in chemical engineering, because these properties govern equipment sizing, operating conditions, and process efficiency. Since experimental data are not available for every substance, equations must be developed and used to estimate these properties. In recent decades, machine learning algorithms have become widely used; through an iterative training process on a database, they learn to make predictions. To compare different methods for property prediction, a total of 551 data points for pure substances composed of carbon, hydrogen, oxygen, nitrogen, and sulfur were used. Each substance was represented computationally either by the number and type of its atoms or by the number and type of chemical bonds between those atoms, and these variables served as inputs for all trained models. To relate the substances to their thermodynamic properties, namely the heat of combustion and the heat of formation, five methods were employed: multivariable linear regression, symbolic regression, artificial neural networks, gradient boosting based on decision trees, and support vector regression. All methods were trained with a data split of 70% for training, 15% for validation, and 15% for testing. The multivariable linear regression model with the chemical-bond description outperformed the other methods, reaching Pearson correlation coefficients of 99.93% and 96.43% on the test data for heat of combustion and heat of formation, respectively. This indicates that a linear model is suitable for organic substances composed of C, H, O, N, and S.
In addition to evaluating goodness of fit, the local contribution of each input variable was analyzed using Shapley values, a calculation method derived from game theory. This analysis identified how much each variable shifts a prediction relative to the average value predicted by the model. |
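
The Shapley analysis described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical linear model over four bond-count descriptors; the feature names, weights, bias, and background rows are illustrative placeholders, not values from the dissertation. The exact enumeration over coalitions shown here is one textbook way to compute Shapley values and is not necessarily the procedure the author used. Features left out of a coalition are filled in from a background set, so each value measures a shift relative to the average prediction.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear model: prediction = BIAS + sum(w_i * x_i).
# Names and numbers below are placeholders, not fitted values from the dissertation.
FEATURES = ["C-C", "C-H", "C-O", "O-H"]
WEIGHTS = [-10.0, -5.0, 20.0, 15.0]
BIAS = 3.0


def predict(x):
    """Evaluate the assumed linear model on one row of bond counts."""
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, x))


def coalition_value(x, background, subset):
    """Mean prediction when features in `subset` come from x and the
    remaining features come from each background row in turn."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition (cost grows as 2^n)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (coalition_value(x, background, s | {i})
                        - coalition_value(x, background, s))
                phi[i] += weight * gain
    return phi


# Toy background set of bond counts (illustrative rows only).
BACKGROUND = [
    [1, 6, 0, 0],  # ethane
    [1, 5, 1, 1],  # ethanol
    [0, 3, 1, 1],  # methanol
    [2, 8, 0, 0],  # propane
]

x = [1, 5, 1, 1]  # instance to explain
phi = shapley_values(x, BACKGROUND)
baseline = sum(predict(row) for row in BACKGROUND) / len(BACKGROUND)

for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
print(f"baseline + sum(phi) = {baseline + sum(phi):.3f}")
print(f"prediction          = {predict(x):.3f}")
```

For a linear model with this value function, each Shapley value reduces to the weight times the feature's deviation from its background mean, and the values sum to the gap between the prediction and the average prediction.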