Maize common rust resistance classification with machine learning analyzes
Year of defense: | 2023 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | eng |
Defense institution: | Universidade Federal de Viçosa, Genética e Melhoramento |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | https://locus.ufv.br//handle/123456789/31122 https://doi.org/10.47328/ufvbbt.2023.202 |
Abstract: | Maize (Zea mays ssp. mays) is a widely cultivated crop with one of the highest productivities among cereals, and it is important for human consumption, both in natura and processed. It also has industrial applications as an energy source, through corn ethanol, and as animal feed. Many diseases can reduce maize yield, among them Maize Common Rust (MCR) (Puccinia sorghi Schwein), a leaf disease that causes pustules to appear. The aim of this study was to classify maize lines as resistant or susceptible, selecting 50% of them to advance in the breeding pipeline. A dataset containing three time-point evaluations in two years, scored on a visual scale and measured with two sensors (multispectral and thermal) coupled to an Unmanned Aerial Vehicle (UAV), was analyzed with six machine learning algorithms to identify which evaluation time point, used as the training set, delivers the best classification performance. The phenotypic data from the three time-point evaluations, along with genetic marker data, were used to explore the performance of the Support Vector Machine (SVM) and Artificial Neural Network (ANN) algorithms in a k-fold cross-validation analysis with nine datasets. Learning curves and feature importance ranks for these datasets were analyzed using the SVM algorithm. The training set from the last evaluation delivered the highest accuracies, approximately 80%, with Logistic Regression and SVM outperforming the other algorithms. The results of the analysis by year suggest that a homogeneous distribution of scores is of great importance for effective MCR resistance classification. Our results also demonstrated the advantage of the SVM algorithm, whose models were able to generalize using a smaller number of features. SVM achieved similar performance metrics with the third evaluation alone and with the three time-point evaluations combined. The SVM learning curves indicate that adding more training samples would be beneficial for all datasets analyzed. The five most important features for each dataset were listed; the Red wavelength predominated in the first position of the ranking. In addition, the protein-coding genes aligned with the allele sequences of the markers ranked as important should be further explored in functional genomics studies. Keywords: Maize common rust. Machine learning. SVM. ANN. Data mining. |
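
The k-fold cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the data are synthetic, the median split that labels 50% of lines as resistant is a guess at how a 50% selection might be encoded, lower scores are assumed to mean more resistant, and a nearest-centroid classifier stands in for the SVM and ANN models the study actually used. The function names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`) are hypothetical.

```python
import random
from statistics import median


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat k times."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_centroids([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies


# Synthetic stand-in data: 40 hypothetical lines, each with a visual score
# and two made-up sensor features that are noisy functions of that score.
rng = random.Random(42)
scores = [rng.uniform(1, 9) for _ in range(40)]
X = [[s + rng.gauss(0, 0.5), -s + rng.gauss(0, 0.5)] for s in scores]

# Assumed labelling: lines scoring below the median are "resistant" (1),
# which marks half of the lines for selection; the rest are "susceptible" (0).
cutoff = median(scores)
y = [1 if s < cutoff else 0 for s in scores]

accuracies = cross_validate(X, y, k=5)
print("fold accuracies:", accuracies)
print("mean accuracy:", sum(accuracies) / len(accuracies))
```

Each line is tested exactly once by a model that never saw it during training, so the mean of the fold accuracies estimates performance on unseen lines. Repeating the loop with progressively larger training subsets would give the kind of learning curve the abstract refers to.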