Machine learning approach for predicting plant antimicrobial peptide families
Year of defense: | 2022 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Master's thesis |
Access type: | Embargoed access |
Language: | por |
Defending institution: |
Universidade Federal da Paraíba
Brazil; Informatics; Programa de Pós-Graduação em Informática; UFPB |
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: | |
Access link: | https://repositorio.ufpb.br/jspui/handle/123456789/26066 |
Abstract: | Antimicrobial peptides (AMPs) are found in a wide variety of life forms; plant AMPs are generally positively charged molecules of 29 to 100 amino acids. Most AMPs have direct antipathogenic action, while others have immunomodulatory activity. With multi-drug resistance to conventional antibiotics developing rapidly and a growing need to reduce pesticide use, efforts to develop AMP-based biotechnological products have accelerated. AMP families are known to have characteristic sequence compositions, which can be used to prospect for and design AMPs and to help characterize functions, biochemical patterns and properties of industrial interest. However, experimental approaches are costly and laborious, and in silico prediction and characterization of AMPs, although crucial, are difficult because the sequences are short. The objective of this work was therefore to develop supervised learning models capable of classifying six families of plant antimicrobial peptides: Thionins, LTPs, Heveins, Snakins, Defensins and Cyclotides, since this characterization can accelerate AMP research. The classification methods used were LightGBM, logistic regression, nearest neighbors, decision tree, support vector machine and naive Bayes. Training used the PhytAMP and PlantPepDB databases, whose labels were assigned by experimental methodologies. The results were compared with the CAMPSign system and with the system described in Quintans (2019). The developed model was also tested on two further databases: CAMPR3, which includes the Thionin and Defensin families, and a OneKP-based set produced with an alignment-based system developed by our study group, which includes the Thionin, Lipid Transfer Protein, Hevein and Snakin families. Both databases have labels assigned by in silico methodologies. 
On the training databases, LightGBM performed best among the algorithms tested. After hyperparameter tuning, LightGBM reached an average accuracy of 91.5%. On the OneKP database, the method reached an average accuracy of 91.2%, with prediction performance varying between classes. Finally, on the CAMPR3 database, the model reached an average accuracy of 93%. |
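
The abstract's central idea, that AMP families have characteristic sequence compositions a supervised classifier can exploit, can be illustrated with a minimal sketch. This is not the thesis pipeline: the record does not describe the features or code used, so amino acid composition as the feature vector, the function names, and the toy sequences below are all assumptions made for illustration. A plain nearest-neighbors vote (one of the methods the abstract lists) stands in for LightGBM so that the example needs only the Python standard library.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train, query_seq, k=3):
    """Majority vote among the k training sequences closest in composition."""
    q = composition(query_seq)
    nearest = sorted(train, key=lambda item: euclidean(composition(item[0]), q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of (sequence, label) pairs in test that are predicted correctly."""
    hits = sum(knn_predict(train, seq, k) == label for seq, label in test)
    return hits / len(test)


# Invented toy sequences (NOT real peptides) with hypothetical family labels.
TOY_TRAIN = [
    ("CCKCGCACCKCSCC", "toy_cys_rich"),
    ("KCCGCCSCCACKCC", "toy_cys_rich"),
    ("CSCCKCCGCCACCT", "toy_cys_rich"),
    ("KKLKGKKLAKKLGK", "toy_lys_rich"),
    ("GKKLKKAKKLKKGL", "toy_lys_rich"),
    ("LKKAKKGKKLKKLA", "toy_lys_rich"),
]

# Invented held-out toy sequences, again purely illustrative.
TOY_TEST = [
    ("CCACCKCCGC", "toy_cys_rich"),
    ("KKLKKGKKAL", "toy_lys_rich"),
]

if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, "CCACCKCCGC"))
    print(accuracy(TOY_TRAIN, TOY_TEST))
```

With real data, the same composition vectors could be passed to a gradient-boosting model such as `lightgbm.LGBMClassifier` in place of `knn_predict`; the accuracy figures quoted in the abstract come from the thesis itself, not from this sketch.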