A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

Bibliographic Details
Main Author: Melo, Rita
Publication Date: 2016
Other Authors: Fieldhouse, Robert, Melo, André, Correia, João D. G., Cordeiro, Maria Natália D. S., Gümüş, Zeynep H., Costa, Joaquim, Bonvin, Alexandre M. J. J., Moreira, Irina S.
Format: Article
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full text: https://hdl.handle.net/10316/108631
DOI: https://doi.org/10.3390/ijms17081215
Summary: Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology for predicting Hot-Spots (HS) in protein-protein interfaces from their native complex structure that is more accurate than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of structural and evolutionary sequence-based features. In particular, we added interface size, the type of interaction between residues at the interface of the complex, the number of different types of residues at the interface, and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We evaluated twenty-seven algorithms, ranging from a simple linear function to support vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm, using a dataset pre-processed by feature normalization and up-sampling of the minority class. On the independent test set, the method achieves an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82.
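The summary names two pre-processing steps (feature normalization and up-sampling of the minority class) and four evaluation metrics. The Python sketch below illustrates them under stated assumptions: it uses z-score scaling as one common form of normalization and random duplication with replacement as one common form of up-sampling, since the summary does not say which variants were used. The helper names (`fit_zscore`, `apply_zscore`, `upsample_minority`, `binary_metrics`) are hypothetical, hot-spots are assumed to be the positive class (label `1`), and the c-forest classifier itself is not reproduced.

```python
import random
import statistics


def fit_zscore(rows):
    """Learn per-feature mean and population standard deviation from training rows.
    Assumption: z-score scaling; the summary only says 'normalization'."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant feature has stdev 0; fall back to 1.0 so it maps to 0 instead of dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs


def apply_zscore(rows, params):
    """Normalize rows with parameters from fit_zscore (reuse the training parameters on the test set)."""
    means, stdevs = params
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]


def upsample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes (sampling with replacement)
    until every class is as large as the largest one. Apply to training data only."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = _ratio(tp, tp + fn)  # recall on the positive (hot-spot) class
    specificity = _ratio(tn, tn + fp)  # recall on the negative (non-hot-spot) class
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

For example, with hypothetical labels `y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and predictions `y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]`, `binary_metrics` reports a sensitivity of 0.75, a specificity of 4/6 (about 0.67), a precision of 0.6, an accuracy of 0.7 and an F1-score of about 0.67. Because F1 depends on precision rather than specificity, it does not have to lie between sensitivity and specificity.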
Subjects: protein-protein interfaces; hot-spots; machine learning; Solvent Accessible Surface Area (SASA); evolutionary sequence conservation; Algorithms; Computational Biology; Databases, Protein; Humans; Protein Conformation; Protein Interaction Domains and Motifs; Protein Interaction Mapping; Proteins; Machine Learning
Funding: Marie Skłodowska-Curie Individual Fellowship MSCA-IF-2015 (MEMBRANEPROT 659826); FCT Investigator program (IF/00578/2014); Center for Basic and Translational Research on Disorders of the Digestive System, Rockefeller University, through the generosity of the Leona M. and Harry B. Helmsley Charitable Trust; start-up funds of the Icahn School of Medicine at Mount Sinai.
Publisher: MDPI
Date Issued: 2016-07-27
ISSN: 1422-0067
Version: Published Version
Rights: Open Access