ProPythia, an automated platform for the classification of peptides/proteins using machine learning

Bibliographic Details
Main Author: Sequeira, Ana Marta Fernandes Tavares
Publication Date: 2020
Other Authors: Pereira, S., Lousa, Diana, Rocha, Miguel
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: https://hdl.handle.net/1822/64073
Summary: One of the most challenging problems in bioinformatics is the computational characterization of protein sequences, structures and functions. Sequence-derived structural and physicochemical properties of proteins have been used to develop machine learning (ML) models for protein-related problems. However, tools and platforms that calculate protein features and perform ML are scarce and limited in effectiveness, user-friendliness and applicability. Here, a generic, modular, automated ML-based platform for classifying proteins by their physicochemical properties is proposed. ProPythia, developed as a Python package, supports the major ML tasks and includes modules to read and alter sequences, calculate protein features, pre-process datasets, carry out feature reduction and selection, perform clustering, train and optimize ML models, and make predictions. The platform was validated by testing its ability to classify anticancer and antimicrobial peptides, and was then used to explore viral fusion peptides. Membrane-interacting peptides play a crucial role in several biological processes. Fusion peptides, a subclass found in enveloped viruses, are particularly relevant for membrane fusion. Determining which properties characterize fusion peptides and distinguish them from other proteins is a relevant scientific question with important technological implications. Using three datasets of well-annotated sequences, together with different feature extraction techniques and feature selection methods, ML models were trained, tested and used to predict the location of a known fusion peptide in a protein sequence from the Dengue virus. Feature importance was also analysed. The models obtained will be useful in future research and also provide biological insight into the distinctive physicochemical characteristics of fusion peptides.
This work presents a freely available tool for ML-based protein classification and the first global analysis and prediction of viral fusion peptides using ML, reinforcing the usability and importance of ML in protein classification problems.
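The summary describes a general pipeline: compute physicochemical descriptors from sequences, train a classifier, and then use it to locate a fusion peptide inside a longer protein. The sketch below illustrates that idea in plain Python. It does not use ProPythia's API. The four-descriptor feature set, the logistic-regression model, the fixed-length sliding window, the invented toy sequences (not the paper's datasets), and every function name are assumptions made for illustration. Only the Kyte-Doolittle hydropathy values are a standard published scale.

```python
# Illustrative sketch only; this is NOT ProPythia's API or the paper's models.
import math
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def features(seq: str) -> List[float]:
    """Hypothetical descriptor: [GRAVY, net charge per residue,
    glycine fraction, aromatic (F/W/Y) fraction]."""
    seq = seq.upper()
    n = len(seq)
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    # Crude charge near neutral pH: K/R = +1, D/E = -1, His ignored.
    charge = sum((aa in "KR") - (aa in "DE") for aa in seq) / n
    glycine = seq.count("G") / n
    aromatic = sum(aa in "FWY" for aa in seq) / n
    return [gravy, charge, glycine, aromatic]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: List[float], b: float, x: List[float]) -> float:
    """Predicted probability of the positive ("fusion-like") class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.1, epochs: int = 1000) -> Tuple[List[float], float]:
    """Unregularised logistic regression fitted by batch gradient descent."""
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def scan(protein: str, w: List[float], b: float,
         window: int) -> List[Tuple[int, float]]:
    """Score every fixed-length window; returns (start, probability) pairs."""
    return [(start, score(w, b, features(protein[start:start + window])))
            for start in range(len(protein) - window + 1)]


# Invented toy sequences (NOT the paper's datasets): hydrophobic, Gly-rich
# "fusion-like" positives versus charged/polar negatives.
TOY_POSITIVES = ["GAIGFLAGGVLGAF", "FGGLIGAVGAWLGG", "GLLGAFIGGVAGAL"]
TOY_NEGATIVES = ["KDEERSKNQDTEKR", "SDKREQNTKEDRSE", "EKKNDRQSTEDKKR"]

X = [features(s) for s in TOY_POSITIVES + TOY_NEGATIVES]
y = [1] * len(TOY_POSITIVES) + [0] * len(TOY_NEGATIVES)
weights, bias = train_logistic(X, y)

# Made-up query: a hydrophobic segment (indices 10-22) between polar flanks.
QUERY = "SNQTSNQTSN" + "GLAGFIGGVLGAF" + "NQTSNQTSNQ"
window_scores = scan(QUERY, weights, bias, window=13)
best_start, best_prob = max(window_scores, key=lambda t: t[1])
print(f"best window starts at {best_start} (p = {best_prob:.3f})")
```

Scanning a fixed-length window and taking the highest-scoring start is one simple way to turn a peptide-level classifier into a location predictor. The window length, descriptors and models used in the actual study may differ.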
Institution: Universidade do Minho
Keywords: Machine learning; Peptide classification; Viral fusion peptides
Publication Type: Conference object (published version)
Date: 2020-02-19
Citation: Sequeira, Ana; Pereira, Sara; Lousa, Diana; Rocha, Miguel, ProPythia, an automated platform for the classification of peptides/proteins using machine learning. BOD 2020 - IX Bioinformatics Open Days (Conference Book). Braga, Feb 19-21, 2020. http://www.bioinformaticsopendays.com/
Access: Open access
Format: PDF