Machine learning prediction of protein abundance by codon usage metrics

Bibliographic details
Year of defense: 2020
Main author: Ferreira, Maurício Alexander de Moura
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: English
Defending institution: Universidade Federal de Viçosa
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://locus.ufv.br//handle/123456789/27942
Abstract: Proteins are responsible for most physiological processes in the cell, and their abundance provides crucial information for systems biology research. Protein abundance is determined by a number of factors, such as mRNA abundance, translation efficiency, protein turnover rate, and codon usage bias. New frameworks of genome-scale metabolic models have recently been developed to simulate phenotypes by taking into account protein abundance data along with enzyme kinetics. However, these models are still limited by the availability of such datasets, which hampers their reconstruction. This scarcity stems from limitations of absolute protein quantification methods, such as mass spectrometry. Moreover, absolute protein quantification has been mostly restricted to model species, such as Escherichia coli and Saccharomyces cerevisiae, which hinders systems biology efforts in non-model species. Codon usage bias directly affects translation dynamics, which in turn affects protein levels, and many codon usage metrics have been developed to characterize this phenomenon. In this study, the effect of gene codon usage bias on protein abundance was evaluated. Notably, many differences in codon usage patterns were observed between genes coding for highly abundant proteins and genes coding for less abundant proteins. Based on these differences, various codon usage metrics coupled with machine learning algorithms were applied to predict the absolute abundance of S. cerevisiae proteins. The machine learning models predicted protein abundances from codon usage metrics with high accuracy. Upon integration of the predicted protein abundances into enzyme-constrained genome-scale metabolic models, the simulated phenotypes closely matched experimental data, demonstrating that the predictive models built here are valuable tools for systems metabolic engineering approaches.
Keywords: Codon usage bias. Metabolic modelling. Metabolic engineering.
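The abstract refers to codon usage metrics without naming them. As an illustration only, the following is a minimal Python sketch of two widely used metrics: relative synonymous codon usage (RSCU) and the codon adaptation index (CAI) of Sharp and Li (1987). This record does not state which metrics the dissertation used, how its reference gene set was chosen, or how zero counts were handled; the function names, the 0.5 pseudocount, and the choice of reference sequences below are assumptions for the example.

```python
from collections import Counter
from math import exp, log

# Standard genetic code grouped by amino acid (stop codons omitted).
SYNONYMOUS = {
    "A": ["GCT", "GCC", "GCA", "GCG"], "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"], "D": ["GAT", "GAC"], "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"], "E": ["GAA", "GAG"], "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"], "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTT", "TTC"], "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"], "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"], "Y": ["TAT", "TAC"], "V": ["GTT", "GTC", "GTA", "GTG"],
}
CODON_TO_AA = {c: aa for aa, syn in SYNONYMOUS.items() for c in syn}


def codons(seq):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def count_codons(seqs):
    """Count sense codons across all sequences (stop and ambiguous codons are skipped)."""
    return Counter(c for s in seqs for c in codons(s) if c in CODON_TO_AA)


def rscu(seqs):
    """RSCU = observed codon count / count expected if all synonyms were used equally."""
    counts = count_codons(seqs)
    values = {}
    for syn in SYNONYMOUS.values():
        total = sum(counts[c] for c in syn)
        for c in syn:
            values[c] = counts[c] * len(syn) / total if total else 0.0
    return values


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = count of a codon / count of its most used synonym in a reference set.

    Hypothetical choice: unseen codons get `pseudocount` so that log(w) stays finite.
    """
    counts = count_codons(reference_seqs)
    w = {}
    for syn in SYNONYMOUS.values():
        if len(syn) == 1:
            continue  # Met and Trp have no synonymous choice, so they are excluded
        adjusted = {c: counts[c] or pseudocount for c in syn}
        top = max(adjusted.values())
        for c in syn:
            w[c] = adjusted[c] / top
    return w


def cai(seq, w):
    """CAI = geometric mean of w over the gene's codons that have a w value."""
    logs = [log(w[c]) for c in codons(seq) if c in w]
    return exp(sum(logs) / len(logs)) if logs else float("nan")
```

In a pipeline like the one the abstract describes, per-gene values such as these could be assembled into a feature table and passed to a regression model trained on measured absolute abundances; that modelling step is not sketched here because the record does not specify the algorithms or features used.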