Automated knowledge extraction from protein sequence

Faria, Daniel

Bibliographic details
Main author:	Faria, Daniel
Publication date:	2012
Language:	eng
Source title:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text:	http://hdl.handle.net/10451/7159
Abstract:	Efficient and reliable prediction of protein functions based on their sequences is one of the standing problems in genetics and bioinformatics, as experimental methods to determine protein function are unable to keep up with the rate at which new sequences are published. The function of a protein is conditioned by its three-dimensional structure, which is deeply tied to the sequence, but we cannot yet model this information with sufficient reliability to make de novo protein function predictions. Thus, protein function predictions are necessarily comparative. The most common approaches to protein function prediction rely on sequence alignments and on the assumption that proteins of similar sequence have evolved from a common ancestor and thus should perform similar functions. However, cases of divergent evolution are relatively common, and can lead to prediction errors from these approaches. Machine learning approaches not involving sequence alignment methods have also been applied to protein function prediction. However, their application has been mostly restricted to predicting generic functional aspects of proteins. My thesis is that it is possible to extract sufficient information from protein sequences to make reliable detailed function predictions without the use of sequence alignments, and therefore develop machine learning approaches that can compete in general with alignment-based approaches. To prove this thesis, I developed and evaluated multiple machine learning approaches in the context of detailed function prediction. Several of these approaches were able to compete with alignment-based classifiers in precision, and two outperformed them notably in small classification problems. The main contribution of my work was the discovery of the informativeness of tripeptide subsequences. The tripeptide composition of protein sequences not only led to the most precise classification of all approaches tested, but also was sufficiently informative to measure similarity between proteins directly, and compete with sequence alignments.
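
The abstract's central idea, describing a protein by its tripeptide composition and comparing proteins without an alignment, can be illustrated with a minimal sketch. This is not the thesis's implementation, which this record does not describe. The function names are hypothetical, overlapping three-residue windows are an assumed way of counting tripeptides, and cosine similarity is an assumed stand-in for whatever similarity measure the author actually used.

```python
from collections import Counter
from math import sqrt


def tripeptide_composition(sequence: str) -> dict[str, float]:
    """Relative frequency of each overlapping 3-residue window (assumed definition)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    # Sequences shorter than three residues contain no tripeptides.
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse composition vectors (assumed measure)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two sequences that share no tripeptide score 0 under this measure, and a sequence compared with itself scores 1, so the value can serve as an alignment-free similarity score or as input features for a classifier.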

Item metadata

id	RCAP_c58d2cc30f9548b1439a3f1832966e1f
oai_identifier_str	oai:repositorio.ulisboa.pt:10451/7159
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	Automated knowledge extraction from protein sequence
title	Automated knowledge extraction from protein sequence
title_short	Automated knowledge extraction from protein sequence
title_full	Automated knowledge extraction from protein sequence
title_fullStr	Automated knowledge extraction from protein sequence
title_full_unstemmed	Automated knowledge extraction from protein sequence
title_sort	Automated knowledge extraction from protein sequence
author	Faria, Daniel
author_facet	Faria, Daniel
author_role	author
dc.contributor.none.fl_str_mv	Falcão, André Osório e Cruz de Azerêdo, 1969-; Ferreira, António Eduardo do Nascimento, 1964-; Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv	Faria, Daniel
dc.subject.por.fl_str_mv	Bioinformatics; Automation; Doctoral theses - 2012
topic	Bioinformatics; Automation; Doctoral theses - 2012
publishDate	2012
dc.date.none.fl_str_mv	2012-11-02T15:38:02Z 2012 2012-01-01T00:00:00Z
dc.type.driver.fl_str_mv	doctoral thesis
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10451/7159
url	http://hdl.handle.net/10451/7159
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833601391753428992
