SNP Selection in Rice Crops Using Machine Learning
Year of defense: | 2024 |
---|---|
Main author: | |
Advisor: | |
Examining committee: | |
Document type: | Master's dissertation |
Access type: | Open access |
Language: | Portuguese (por) |
Defense institution: | Universidade Federal de São Carlos (São Carlos campus) |
Graduate program: | Programa de Pós-Graduação em Ciência da Computação - PPGCC |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords (Portuguese): | |
Keywords (English): | |
CNPq knowledge area: | |
Access link: | https://repositorio.ufscar.br/handle/20.500.14289/19636 |
Abstract: | Rice (*Oryza sativa*) has one of the largest collections of genetic resources among plant species of economic interest. To increase the productivity of this crop, several studies of its genetic variability have been conducted. In this context, single nucleotide polymorphisms (SNPs), which are single-base variations in DNA sequences, have been widely studied because they act as molecular markers linked to productivity and resistance in rice cultivation. However, because conventional methods are ineffective at selecting SNPs, methods based on Machine Learning (ML) have been used, with SNP selection modeled as a Feature Selection (FS) problem. Although FS is widely used in the literature, gaps remain regarding its use in rice genetic improvement. There is also a need to investigate the SNPs selected by these methods in genetic improvement studies, so as to offer possible biological explanations for the results. To advance this discussion, this work proposes ensemble methods for SNP selection that combine several FS algorithms to produce robust results. These methods were implemented as a SNP selection pipeline, which was applied to a dataset with multiple phenotypes linked to rice productivity. The proposed methods were compared with other methods from the literature and achieved the best results in some cases. In addition, functional enrichment analysis was explored as a strategy to explain the results. The dataset belongs to the Coleção Nuclear de Arroz of Embrapa Arroz e Feijão and was provided so that the results of this work could subsequently be investigated and used in rice genetic improvement. |
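
The abstract describes ensemble methods that combine several FS algorithms into one more robust SNP selection, but it does not name the algorithms or the combination rule. The sketch below illustrates one common ensemble strategy, mean-rank aggregation, and is not necessarily the method used in the dissertation. It assumes genotypes coded as 0/1/2 (count of alternate alleles) and a single numeric phenotype; the three filter scorers, all function names, and the toy data are hypothetical.

```python
"""Hypothetical sketch: ensemble SNP selection by mean-rank aggregation."""
from statistics import mean


def column(X, j):
    # Genotype calls (assumed 0/1/2 dosage) of SNP j across all samples.
    return [row[j] for row in X]


def pearson_score(x, y):
    # Absolute Pearson correlation between genotype dosage and phenotype.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype: no signal
    return abs(sxy) / (sxx * syy) ** 0.5


def anova_f_score(x, y):
    # One-way ANOVA F statistic of the phenotype grouped by genotype class.
    groups = {}
    for g, v in zip(x, y):
        groups.setdefault(g, []).append(v)
    k, n = len(groups), len(y)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or degrees of freedom
    grand = mean(y)
    means = {g: mean(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0:
        return float("inf")  # genotype classes perfectly separate the phenotype
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def homozygote_diff_score(x, y):
    # |mean phenotype of genotype 2 - mean phenotype of genotype 0|.
    hom_ref = [v for g, v in zip(x, y) if g == 0]
    hom_alt = [v for g, v in zip(x, y) if g == 2]
    if not hom_ref or not hom_alt:
        return 0.0
    return abs(mean(hom_alt) - mean(hom_ref))


def to_ranks(scores):
    # Rank 1 = highest score; ties are broken by SNP index for determinism.
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks


def ensemble_select(X, y, k,
                    scorers=(pearson_score, anova_f_score, homozygote_diff_score)):
    # Score every SNP with each selector, convert scores to ranks, average them.
    n_snps = len(X[0])
    rank_lists = []
    for scorer in scorers:
        scores = [scorer(column(X, j), y) for j in range(n_snps)]
        rank_lists.append(to_ranks(scores))
    mean_rank = [mean(r[j] for r in rank_lists) for j in range(n_snps)]
    # Keep the k SNPs with the best (lowest) mean rank.
    return sorted(range(n_snps), key=lambda j: (mean_rank[j], j))[:k]


if __name__ == "__main__":
    # Toy data: 6 samples x 4 SNPs; SNP 3 is monomorphic.
    X = [
        [0, 1, 2, 1],
        [0, 0, 2, 1],
        [1, 1, 0, 1],
        [1, 2, 0, 1],
        [2, 0, 1, 1],
        [2, 2, 1, 1],
    ]
    y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
    print(ensemble_select(X, y, k=2))  # → [0, 2]
```

Averaging ranks rather than raw scores lets selectors on very different scales (a correlation in [0, 1], an unbounded F statistic, a phenotype difference in trait units) contribute equally; other aggregation rules, such as intersection of top-k sets or weighted voting, are equally plausible designs.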