Classification of genes associated with breast cancer using expression data

Bibliographic details
Year of defense: 2025
Main author: Valentin, Ana Beatriz Miranda
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese
Defending institution: Universidade Tecnológica Federal do Paraná
Cornelio Procopio
Brazil
Graduate Program in Bioinformatics
UTFPR
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Link de acesso: http://repositorio.utfpr.edu.br/jspui/handle/1/36417
Abstract: Understanding the characteristics of breast cancer tumors and subtypes from gene expression data is crucial for identifying cancer types, reaching a more accurate diagnosis, and quickly directing appropriate treatment. In this context, this study applies machine learning and deep learning methods to the multiclass classification of genes associated with breast cancer, using gene expression datasets, and evaluates the predictive performance of these methods. The datasets are obtained from repositories such as TCGA and GEO and are preprocessed; because of the high number of variables, dimensionality reduction is also applied. First, principal component analysis reduces the dimensionality of the data. Then traditional machine learning methods (Logistic Regression, Support Vector Machine, and Random Forest) and deep learning models (Multilayer Perceptron, MLP, and Convolutional Neural Network, CNN) are applied. To improve these models, the Optuna library is used for hyperparameter optimization, and each algorithm is evaluated both with and without this optimization. The comparison showed that Logistic Regression and Support Vector Machine achieved high accuracy on the GEO and TCGA databases, respectively. The MLP and CNN models also delivered competitive results, especially when optimized with Optuna. The optimization adjusted parameters such as the learning rate and the number of layers, leading to significant improvements in performance: MLP and CNN showed substantial gains, while Random Forest was less affected. Additionally, the SHAP library was applied to analyze variable importance and the influence of each dimension on each classifier. The analysis highlighted that hyperparameter optimization can be crucial in improving classifier accuracy.
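The first two steps described in the abstract (dimensionality reduction with principal component analysis, followed by a multiclass classifier) can be illustrated with a minimal NumPy-only sketch. This is not the author's code: the data are synthetic stand-ins for an expression matrix, the logistic regression is trained with plain batch gradient descent as an assumption made for brevity, and all function names, the number of components, and the learning rate are hypothetical choices.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (one per row)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the principal axes learned from the training set."""
    return (X - mean) @ components.T

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_logistic(Z, y, n_classes, lr=0.05, epochs=500):
    """Multinomial logistic regression trained by batch gradient descent."""
    n, d = Z.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot encoded labels
    for _ in range(epochs):
        P = softmax(Z @ W + b)
        W -= lr * Z.T @ (P - Y) / n
        b -= lr * (P - Y).mean(axis=0)
    return W, b

def predict(Z, W, b):
    return np.argmax(Z @ W + b, axis=1)

# Synthetic stand-in for an expression matrix: 3 classes, 200 features each.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 40, 200, 3
centers = rng.normal(0.0, 2.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_class, n_features))
               for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Shuffled 75/25 train/test split.
idx = rng.permutation(len(y))
cut = int(0.75 * len(y))
train, test = idx[:cut], idx[cut:]

# Fit PCA on the training samples only, then project both splits.
mean, components = pca_fit(X[train], n_components=5)
Z_train = pca_transform(X[train], mean, components)
Z_test = pca_transform(X[test], mean, components)

W, b = fit_logistic(Z_train, y[train], n_classes)
accuracy = float((predict(Z_test, W, b) == y[test]).mean())
print(f"held-out accuracy: {accuracy:.3f}")
```

The principal axes are estimated from the training split alone so that no information from the held-out samples leaks into the projection. In practice, a library implementation of PCA and of the classifiers would likely replace the hand-written functions above.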