Employing compact intra-genomic language models to predict genomic sequences and characterize their entropy

Bibliographic Details
Main Author: Deusdado, Sérgio
Publication Date: 2010
Other Authors: Carvalho, Paulo
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10198/4357
Summary: Probabilistic language models are fundamental to understanding and learning the profile of the underlying code and to estimating its entropy, which makes it possible to verify and predict “natural” emanations of the language. Language models aim to capture the salient statistical characteristics of the distribution of word sequences; transposed to the genomic language, they allow modeling a predictive system of the peculiarities and regularities of the genomic code under different inter- and intra-genomic conditions. In this paper, we propose applying compact intra-genomic language models to predict the composition of genomic sequences, with the aim of providing valuable resources for data compression and broadening the perspectives of similarity analysis in genomic sequences. The results obtained encourage further investigation and support the use of language models in biological sequence analysis.
id RCAP_349a37dbe1d2bb5bd20c4cb259c0d242
oai_identifier_str oai:bibliotecadigital.ipb.pt:10198/4357
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
author Deusdado, Sérgio
author_facet Deusdado, Sérgio
Carvalho, Paulo
author_role author
author2 Carvalho, Paulo
author2_role author
dc.contributor.none.fl_str_mv Biblioteca Digital do IPB
dc.contributor.author.fl_str_mv Deusdado, Sérgio
Carvalho, Paulo
dc.subject.por.fl_str_mv Language models
Genomic sequences modeling
DNA entropy estimation
topic Language models
Genomic sequences modeling
DNA entropy estimation
publishDate 2010
dc.date.none.fl_str_mv 2010
2010-01-01T00:00:00Z
2011-05-18T10:25:35Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10198/4357
url http://hdl.handle.net/10198/4357
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Deusdado, Sérgio; Carvalho, Paulo (2010). Employing compact intra-genomic language models to predict genomic sequences and characterize their entropy. In Rocha, Miguel P. [et al.] 4th International Workshop on Practical Applications of Computational Biology & Bioinformatics. Guimarães. p. 143-150. ISBN 978-3-642-13214-8
978-3-642-13214-8
10.1007/978-3-642-13214-8_19
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer-Verlag
publisher.none.fl_str_mv Springer-Verlag
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833591798006546432