From seed to canopy: high-throughput phenotyping and machine learning in soybean breeding

Bibliographic details
Year of defense: 2024
Main author: Miranda, Melissa Cristina de Carvalho
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Access link: https://www.teses.usp.br/teses/disponiveis/11/11137/tde-02072024-112314/
Abstract: Soybean breeding faces the challenge of evaluating large and complex populations across different environments to obtain accurate genetic values that can be used as selection criteria. This study addresses that challenge by examining the potential of high-throughput phenotyping (HTP) and machine learning (ML) models to predict classic phenotypic traits in soybean breeding programs from seed images and aerial canopy images. The methodology consisted of the phenotypic characterization of 275 soybean genotypes under different environments and management practices, including management with and without fungicide application for the control of Asian rust. Predictions based on regression algorithms (support vector machine (SVM), random forest (RF), multilayer perceptron (MLP) neural network, and AdaBoost) were evaluated first, followed by transfer learning with convolutional neural networks (CNNs) used to extract features from images (VGG16, VGG19, ResNet50, InceptionV3, and Inception-ResNetV2), which were then fed to the same regression models for prediction. In the first chapter, RGB (red-green-blue) images of seeds from each plot were collected, with seeds arranged both sparsely and densely. A custom image processing pipeline was developed for seed segmentation, which allowed a detailed morphological evaluation. ML algorithms and different CNN architectures were compared in predicting hundred-seed weight. The image segmentation technique correctly identified over 98% of the seeds, and the morphological measurements achieved a predictive ability of 0.71, with a mean squared error (MSE) of 3.15. The same results were observed for the CNN features, highlighting the efficiency of the morphological measurements as extractors of image features. ResNet50 stood out as the most accurate CNN for feature extraction.
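The two metrics reported for hundred-seed weight can be illustrated with a minimal sketch. It assumes that "predictive ability" means the Pearson correlation between observed and predicted values, a common convention in breeding studies that the abstract itself does not state. The function names and the four example weights are hypothetical and are not taken from the thesis.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted values.

    Assumption: this is the definition of predictive ability used here;
    the abstract does not specify it.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


# Hypothetical hundred-seed weights in grams (illustrative values only).
observed_weights = [14.0, 16.0, 18.0, 20.0]
predicted_weights = [15.0, 15.0, 19.0, 19.0]

mse = mean_squared_error(observed_weights, predicted_weights)
ability = predictive_ability(observed_weights, predicted_weights)
print(f"MSE = {mse:.2f}, predictive ability = {ability:.2f}")
```

Under this reading, the correlation measures how well the model ranks genotypes, while the MSE measures error on the original scale of the trait, so the two values are complementary.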
In the second chapter, we investigated the heritability of vegetation indices obtained from aerial images and their correlation with traditional phenotypic traits. The RGBVI and GLI vegetation indices showed higher heritability (mean H² of 0.56) than other RGB-based indices, making them promising for genetic evaluations. Advanced ML techniques, especially transfer learning with ResNet50, improved the prediction of traits such as days to R7 stage (DR7) and plant height measurement (PHM) from canopy images. The combination of ResNet50 with RF for DR7 prediction and with MLP for PHM prediction showed promising results, highlighting the potential of these approaches to support decision-making in soybean breeding. In summary, the research concludes that integrating image data with machine learning models offers a robust decision support system that predicts classic phenotypic traits of soybean from images and helps identify high-performance genotypes.
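For reference, the two indices named above can be computed from RGB values as in the sketch below. It uses the commonly published definitions, RGBVI = (G² − R·B) / (G² + R·B) and GLI = (2G − R − B) / (2G + R + B). The abstract does not describe how the thesis computed or aggregated them, so the per-plot averaging, the function names, and the pixel values are assumptions made only for illustration.

```python
def rgbvi(r, g, b):
    """Red-green-blue vegetation index for one pixel."""
    denom = g * g + r * b
    # Guard against division by zero on pure black pixels.
    return (g * g - r * b) / denom if denom else 0.0


def gli(r, g, b):
    """Green leaf index for one pixel."""
    denom = 2 * g + r + b
    return (2 * g - r - b) / denom if denom else 0.0


def plot_mean_index(pixels, index_fn):
    """Average an index over the pixels of one plot.

    Assumption: plots are summarized by the mean of per-pixel values;
    other summaries (median, masking soil first) are equally plausible.
    """
    if not pixels:
        raise ValueError("plot has no pixels")
    values = [index_fn(r, g, b) for r, g, b in pixels]
    return sum(values) / len(values)


# Hypothetical plot with channel values scaled to [0, 1]:
# two green canopy pixels and one neutral grey pixel.
plot_pixels = [(0.2, 0.4, 0.1), (0.2, 0.4, 0.1), (0.3, 0.3, 0.3)]

print(f"RGBVI = {plot_mean_index(plot_pixels, rgbvi):.3f}")
print(f"GLI   = {plot_mean_index(plot_pixels, gli):.3f}")
```

Both indices are normalized ratios that equal zero for neutral grey pixels and increase as the green channel dominates, which is why they track canopy greenness.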