Prediction of soil organic carbon by Vis-NIR spectroscopy

Bibliographic details
Defense year: 2021
Main author: Heinen, Taciara Zborowski Horst
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: English
Defense institution: Universidade Federal de Santa Maria
Brazil
Agronomy
UFSM
Programa de Pós-Graduação em Ciência do Solo (Graduate Program in Soil Science)
Centro de Ciências Rurais (Center of Rural Sciences)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: http://repositorio.ufsm.br/handle/1/24215
Abstract: The development of large databases usually involves combining data collected for different purposes under different standards and methodologies, which often leaves these databases with disparate and inconsistent soil data. Despite the potential of visible and near-infrared (Vis-NIR) spectroscopy to predict soil organic carbon (SOC) from such databases, the effectiveness of, and consistency among, the analytical methods used to produce the target data are seldom discussed. The main purpose of this research was to investigate the interplay among preprocessing techniques, model architectures, and especially the analytical methods used to produce the SOC target data. To this end, we set two specific objectives: i) to evaluate the interplay among analytical methods, preprocessing techniques, and model architectures in SOC predictions, and ii) to assess whether this interplay can be translated into a hierarchy across validation metrics. This PhD thesis comprises two chapters that address these objectives. Chapter I shows how changes in the analytical method (dry combustion, SOCDC; wet combustion with quantification by titrimetry, SOCWCt, or by colorimetry, SOCWCc) and in the preprocessing technique (smoothing, SMO; continuum removal, CRR; Savitzky-Golay first derivative, SGD) affect the empirical relationship captured by different machine learning algorithms (random forest, cubist, and partial least squares regression, PLSR). Cross-validation metrics were used to compare side by side the performance of the 27 resulting predictive models (3 analytical methods × 3 preprocessing techniques × 3 algorithms). The relationship between the covariate matrix and the target data was explored through variable importance. Chapter II shows how the interplay among these three factors can be translated into a hierarchy. A resampling technique was used to split the dataset into training and validation sets 100 times, both to obtain realistic performance estimates and to explore how predictive performance changed as the training set changed. A conditional inference tree analysis was performed to evaluate how the three factors influenced the global validation metrics. In both studies, predictive performance varied depending on the SOC analytical method, preprocessing technique, and model architecture employed. Among the three analytical methods tested, DC and WCt yielded a higher correlation between SOC and spectra than WCc and, consequently, higher model performance. Model architecture influenced the validation metrics more than either the preprocessing technique or the analytical method did. PLSR models were more sensitive to the analytical method, whereas random forest and cubist were more sensitive to the preprocessing technique. Cubist models combined with CRR minimized the accuracy differences among the SOC analytical methods; however, this combination produced overfitted models and high prediction uncertainty. PLSR performed more consistently than random forest and cubist. Overall, the analytical method used to produce the SOC data in a training dataset significantly affected prediction reliability, capability, and assessment. These results can guide the selection of analytical methods for new projects as well as the management of existing databases. They also highlight the need for transparent and precise documentation of spectroscopic modeling to enable fair comparisons across publications.
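The abstract names continuum removal (CRR) and the Savitzky-Golay first derivative (SGD) as spectral preprocessing techniques but does not give their settings. The following is a minimal, dependency-free sketch of what each transformation does to a reflectance spectrum, assuming evenly spaced bands, strictly increasing wavelengths, a 5-point window, and a quadratic local fit; the function names and parameters are hypothetical illustrations, not the configuration used in the thesis.

```python
from typing import List, Sequence


def sg_first_derivative(y: Sequence[float], window: int = 5, spacing: float = 1.0) -> List[float]:
    """Savitzky-Golay first derivative with a local polynomial of order 1 or 2.

    For a symmetric odd-length window, the least-squares slope at the window
    centre reduces to sum(k * y[i + k]) / sum(k**2) for k = -h..h, because the
    quadratic term is symmetric and does not affect the slope at the centre.
    Edge points without a full window are dropped (assumption), so the output
    has len(y) - (window - 1) values.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    h = window // 2
    denom = sum(k * k for k in range(-h, h + 1))
    return [
        sum(k * y[i + k] for k in range(-h, h + 1)) / denom / spacing
        for i in range(h, len(y) - h)
    ]


def continuum_removed(wl: Sequence[float], refl: Sequence[float]) -> List[float]:
    """Divide reflectance by its upper convex hull (the 'continuum').

    Assumes at least two bands and strictly increasing wavelengths.
    Values equal 1 on the hull and fall below 1 inside absorption features.
    """
    if len(wl) < 2:
        raise ValueError("need at least two bands")
    pts = sorted(zip(wl, refl))
    hull = []
    for p in pts:
        # Monotone-chain upper hull: pop while the turn is not clockwise.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    out = []
    j = 0
    for x, y in pts:
        # Advance to the hull segment [hull[j], hull[j + 1]] containing x.
        while j < len(hull) - 2 and hull[j + 1][0] < x:
            j += 1
        (x0, y0), (x1, y1) = hull[j], hull[j + 1]
        continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        out.append(y / continuum)
    return out


# Toy spectrum (hypothetical values) with an absorption dip at 600 nm.
print(continuum_removed([400, 500, 600, 700, 800], [0.2, 0.3, 0.1, 0.35, 0.4]))
print(sg_first_derivative([0, 1, 4, 9, 16, 25, 36]))
```

On the quadratic test signal y = x², the derivative sketch returns 2x at the interior points, which is a quick check that the slope coefficients are correct. Continuum removal emphasises absorption-feature depth, whereas the derivative emphasises slope changes, which is one reason the two can interact differently with each model architecture.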
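The abstract describes a full factorial design (three analytical methods, three preprocessing techniques, and three model architectures, giving 27 models) and a resampling scheme that splits the data into training and validation sets 100 times. The sketch below illustrates both ideas under stated assumptions: a 70/30 split, synthetic data, and a one-variable least-squares line as a hypothetical stand-in for random forest, cubist, and PLSR, which are not implemented here.

```python
import itertools
import random
import statistics

# Factor levels named in the abstract; crossing them gives 3 x 3 x 3 = 27 configurations.
ANALYTICAL_METHODS = ["DC", "WCt", "WCc"]
PREPROCESSING = ["SMO", "CRR", "SGD"]
MODELS = ["random forest", "cubist", "PLSR"]
DESIGN = list(itertools.product(ANALYTICAL_METHODS, PREPROCESSING, MODELS))


def fit_line(xs, ys):
    """Hypothetical stand-in model: least-squares line y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + b * (x - mx)


def rmse(obs, pred):
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5


def r2(obs, pred):
    mo = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def repeated_split(xs, ys, n_repeats=100, train_frac=0.7, seed=0):
    """Randomly split into training/validation sets n_repeats times and
    return one (RMSE, R2) pair per split, computed on the validation set."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(round(train_frac * len(idx)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        obs = [ys[i] for i in valid]
        pred = [model(xs[i]) for i in valid]
        scores.append((rmse(obs, pred), r2(obs, pred)))
    return scores


# Usage on synthetic, hypothetical data (not from the thesis).
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 1.0) for _ in range(60)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]
scores = repeated_split(xs, ys)
rmses = [s[0] for s in scores]
r2s = [s[1] for s in scores]
print(f"{len(DESIGN)} model configurations")
print(f"RMSE mean={statistics.fmean(rmses):.3f} sd={statistics.stdev(rmses):.3f}")
print(f"R2   mean={statistics.fmean(r2s):.3f} sd={statistics.stdev(r2s):.3f}")
```

Reporting the spread of the metrics across the 100 splits, rather than a single split, is what lets performance differences between configurations be judged against the variability caused by the choice of training set. In a full analysis, each of the 27 configurations would be run through the same resampling loop.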