Multiple sequence alignment correction using constraints

Bibliographic details
Main author: Guasco, Luciano M.
Publication date: 2010
Document type: Dissertation
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10362/5143
Summary: Work presented within the scope of the European Master in Computational Logics, as a partial requirement for obtaining the degree of Master in Computational Logics
id RCAP_68e20ccba41524ba0c38f2d4ec5496af
oai_identifier_str oai:run.unl.pt:10362/5143
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Multiple sequence alignment correction using constraints
Multiple sequence alignment
Molecular coevolution
Mutual information
Constraints
Amino acids correlation
Work presented within the scope of the European Master in Computational Logics, as a partial requirement for obtaining the degree of Master in Computational Logics
One of the most important fields in bioinformatics has been the study of protein sequence alignments. The study of homologous proteins, related by evolution, shows that many amino acids are conserved because of their functional and structural importance. One particular relationship between amino acid sites, within the same sequence or across different sequences, is protein coevolution. Interest in it has increased with the development of mathematical and computational methods for understanding the spatial, functional and evolutionary dependencies between amino acid sites. The principle of coevolution states that some amino acids are related through evolution because mutations at one site can create evolutionary pressure to select compensatory mutations at other sites that are functionally or structurally related. Using current methods to detect coevolution, specifically mutual information techniques from information theory, we show in this work that much of the information between coevolved sites is lost because of mistakes in the multiple sequence alignment of variable regions. Moreover, we show that using these statistical methods to detect coevolved sites in multiple sequence alignments results in a high rate of false positives. Given the number of errors in the detection of coevolved sites from multiple sequence alignments, we propose a method to improve the detection of coevolved sites, and we implement an algorithm that fixes such sites by correcting the misalignment produced at those specific locations.
The detection part of our work is based on the mutual information between sites that are presumed to have coevolved because of their high statistical correlation score. With this information we search those regions for possible misalignments caused by the incorrect matching of amino acids during alignment. The realignment part is based on constraint programming techniques, to avoid the combinatorial complexity that arises when one amino acid can be aligned with many others and to avoid inconsistencies in the alignments. In this work, we present a framework for imposing constraints over the sequences, and we show how alignments can be computed according to different criteria just by setting constraints between the amino acids. This framework can be applied not only to improve the alignment and detection of coevolved regions, but also with any desired constraints that express functional or structural relations among the amino acids in multiple sequences. We also show that after we fix these misalignments using constraint-based techniques, the correlation between coevolved sites increases and, in general, the new alignment is closer to the correct alignment than the original MSA.
Finally, we outline possible lines of future research aimed at overcoming some drawbacks detected during this work.
Faculdade de Ciências e Tecnologia
Krippahl, Ludwig
RUN
Guasco, Luciano M.
2011-02-16T15:29:32Z
2010
2010-01-01T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10362/5143
eng
info:eu-repo/semantics/openAccess
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
2024-05-22T17:08:27Z
oai:run.unl.pt:10362/5143
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
info@rcaap.pt
opendoar:https://opendoar.ac.uk/repository/7160
2025-05-28T16:39:24.941755
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
false
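The abstract describes detecting candidate coevolved sites by the mutual information between alignment columns. As a rough illustration of that score only, the following Python sketch computes the plain mutual information (in bits) between two columns of a small alignment. The function names and the toy sequences are hypothetical and are not taken from the thesis, whose own scoring may apply corrections or gap handling not shown here; in this sketch a gap character would simply be counted as another symbol.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Plain mutual information (bits) between two alignment columns.

    Assumption: probabilities are estimated as raw frequencies over the
    sequences, with no pseudocounts or background correction.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) / (p(a) * p(b)) simplifies to n_ab * n / (n_a * n_b)
        ratio = (n_ab * n) / (count_a[a] * count_b[b])
        mi += (n_ab / n) * log2(ratio)
    return mi


# Hypothetical toy alignment: columns 0 and 2 vary together,
# while column 1 varies independently of column 0.
toy_msa = ["AKL", "AEL", "GKV", "GEV"]
score_linked = mutual_information(column(toy_msa, 0), column(toy_msa, 2))
score_unlinked = mutual_information(column(toy_msa, 0), column(toy_msa, 1))
```

In the toy alignment, the pair of columns that always change together receives a higher score than the pair that changes independently, which is the kind of signal a misalignment in a variable region could weaken.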
dc.title.none.fl_str_mv Multiple sequence alignment correction using constraints
title Multiple sequence alignment correction using constraints
author Guasco, Luciano M.
author_facet Guasco, Luciano M.
author_role author
dc.contributor.none.fl_str_mv Krippahl, Ludwig
RUN
dc.contributor.author.fl_str_mv Guasco, Luciano M.
dc.subject.por.fl_str_mv Multiple sequence alignment
Molecular coevolution
Mutual information
Constraints
Amino acids correlation
description Work presented within the scope of the European Master in Computational Logics, as a partial requirement for obtaining the degree of Master in Computational Logics
publishDate 2010
dc.date.none.fl_str_mv 2010
2010-01-01T00:00:00Z
2011-02-16T15:29:32Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/5143
url http://hdl.handle.net/10362/5143
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Faculdade de Ciências e Tecnologia
publisher.none.fl_str_mv Faculdade de Ciências e Tecnologia
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833596089029099520