A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces
| Main Author: | Melo, Rita |
|---|---|
| Publication Date: | 2016 |
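| Other Authors: | Fieldhouse, Robert; Melo, André; Correia, João D. G.; Cordeiro, Maria Natália D. S.; Gümüş, Zeynep H.; Costa, Joaquim; Bonvin, Alexandre M. J. J.; Moreira, Irina S. |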
| Format: | Article |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | https://hdl.handle.net/10316/108631 https://doi.org/10.3390/ijms17081215 |
| Summary: | Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from the native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We evaluated twenty-seven algorithms, ranging from a simple linear function to support-vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm, using a dataset pre-processed by normalization of the features and up-sampling of the minority class. On the independent test set, the method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82. |
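
The pre-processing steps and evaluation measures named in the summary can be illustrated with a minimal sketch. It is written in plain Python as a stand-in and is not the authors' code; the record does not say which software was used. The function names (`zscore_normalize`, `upsample_minority`, `binary_metrics`) and all toy values are hypothetical, z-score scaling is only one possible reading of "normalization", and the c-forest classifier itself is not reproduced here.

```python
import random
from statistics import mean, pstdev


def zscore_normalize(rows):
    """Scale each feature column to zero mean and unit variance (assumed normalization)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid a ZeroDivisionError.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def upsample_minority(rows, labels, seed=0):
    """Resample the smaller class with replacement until both classes have equal size.

    Assumes binary labels (1 = hot-spot, 0 = null-spot) and that both classes are present.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([positives, negatives], key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive (hot-spot) class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data only: three features per residue, one hot-spot among four residues.
    features = [[0.2, 5.0, 1.0], [0.4, 3.0, 1.0], [0.9, 8.0, 1.0], [0.1, 4.0, 1.0]]
    labels = [1, 0, 0, 0]
    balanced_rows, balanced_labels = upsample_minority(zscore_normalize(features), labels)
    print(len(balanced_rows), balanced_labels.count(1), balanced_labels.count(0))
    # Hypothetical confusion-matrix counts, not taken from the paper.
    print(binary_metrics(tp=8, fp=2, tn=18, fn=2))
```

In practice, up-sampling is applied to the training split only, so that duplicated minority-class examples do not leak into the independent test set on which the four measures are reported.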
| id | RCAP_12475a3b51d8d98bb95d75e419afeb71 |
|---|---|
| oai_identifier_str | oai:estudogeral.uc.pt:10316/108631 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| spelling | Funding: Marie Skłodowska-Curie Individual Fellowship MSCA-IF-2015 (MEMBRANEPROT 659826); FCT Investigator program IF/00578/2014 and Center for Basic and Translational Research on Disorders of the Digestive System, Rockefeller University, through the generosity of the Leona M. and Harry B. Helmsley Charitable Trust and start-up funds of the Icahn School of Medicine at Mount Sinai. |
| dc.title.none.fl_str_mv | A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces |
| author | Melo, Rita |
| author_facet | Melo, Rita; Fieldhouse, Robert; Melo, André; Correia, João D. G.; Cordeiro, Maria Natália D. S.; Gümüş, Zeynep H.; Costa, Joaquim; Bonvin, Alexandre M. J. J.; Moreira, Irina S. |
| author_role | author |
| dc.subject.por.fl_str_mv | protein-protein interfaces; hot-spots; machine learning; Solvent Accessible Surface Area (SASA); evolutionary sequence conservation; Algorithms; Computational Biology; Databases, Protein; Humans; Protein Conformation; Protein Interaction Domains and Motifs; Protein Interaction Mapping; Proteins; Machine Learning |
| publishDate | 2016 |
| dc.date.none.fl_str_mv | 2016-07-27 |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
| format | article |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/10316/108631 https://doi.org/10.3390/ijms17081215 |
| url | https://hdl.handle.net/10316/108631 https://doi.org/10.3390/ijms17081215 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | 1422-0067 |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.publisher.none.fl_str_mv | MDPI |
| publisher.none.fl_str_mv | MDPI |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| institution | RCAAP |
| reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833602543380332544 |