Machine learning approach for predicting plant antimicrobial peptide families (original title: Abordagem de aprendizagem de máquina para predição de famílias de peptídeos antimicrobianos vegetais)

Bibliographic details
Year of defense: 2022
Main author: Andrade, Annie Elisabeth Beltrão de
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Dissertation
Access type: Embargoed access
Language: por
Defense institution: Universidade Federal da Paraíba
Brazil
Informatics
Programa de Pós-Graduação em Informática
UFPB
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://repositorio.ufpb.br/jspui/handle/123456789/26066
Abstract: Plant antimicrobial peptides (AMPs) are molecules of 29 to 100 amino acids that are generally positively charged; AMPs are found in a wide variety of life forms. Most AMPs have direct antipathogenic action, while others have immunomodulatory activity. Given the rapid rise of multi-resistance to conventional antibiotics and the need to reduce pesticide use, efforts to develop biotechnological products based on AMPs have accelerated. AMP families are known to have a specific sequence composition, a characteristic that can be used to prospect and design AMPs and that helps to characterize functions, biochemical patterns and properties of industrial interest. However, experimental approaches are costly and laborious. Moreover, in silico prediction and characterization of AMPs, although crucial, are difficult because the sequences are short. The objective of this work was therefore to develop supervised learning models capable of classifying six families of plant antimicrobial peptides: Thionins, LTPs, Heveins, Snakins, Defensins and Cyclotides, since this characterization can accelerate AMP research. The classification methods used were LightGBM, logistic regression, nearest neighbors, decision tree, support vector machine and naive Bayes. Training was performed using the PhytAMP and PlantpepDB databases, whose labels were assigned by experimental methodologies. The results were compared with the CAMPSign system and with the one described in Quintans (2019). The developed model was also tested on two further databases: CAMPR3, which includes the Thionin and Defensin families, and OneKP, labeled with an alignment-based system developed by our study group, which includes the Thionin, Lipid Transfer Protein, Hevein and Snakin families. Both databases have labels assigned using in silico methodologies.
On the training databases, the LightGBM algorithm performed better than the other methods. After hyperparameter tuning, LightGBM reached an average accuracy of 91.5%. On the OneKP database, the model reached an average accuracy of 91.2%, with prediction performance varying between classes. Finally, on the CAMPR3 database, the model reached an average accuracy of 93%.
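
The abstract notes that AMP families have a characteristic sequence composition and lists nearest neighbors among the classifiers tested. The following is a minimal sketch of that general idea, not the dissertation's actual pipeline: it assumes a plain 20-dimensional amino acid composition vector as the feature set and a 1-nearest-neighbor rule with Euclidean distance. The function names, the toy sequences and the labels `family_A` and `family_B` are hypothetical placeholders and do not correspond to real peptides or to the data used in the work.

```python
from collections import Counter
import math

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the relative frequency of each standard amino acid in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbor(train, query):
    """Return the label of the training sequence closest in composition space."""
    q = composition(query)
    best_label, best_dist = None, math.inf
    for seq, label in train:
        d = math.dist(q, composition(seq))  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Made-up placeholder sequences (assumption): NOT real peptides.
TOY_TRAIN = [
    ("CCKKCCRRCCKK", "family_A"),  # cysteine- and lysine-rich toy example
    ("GGSGGSGGAGGS", "family_B"),  # glycine- and serine-rich toy example
]

if __name__ == "__main__":
    print(nearest_neighbor(TOY_TRAIN, "CKCRCKCC"))
```

In practice the composition vectors would be computed for labeled sequences from curated databases and passed to a library classifier; this sketch only illustrates how a composition-based feature can separate sequences with different residue profiles.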