Employing compact intra-genomic language models to predict genomic sequences and characterize their entropy
| Main Author: | Deusdado, Sérgio |
|---|---|
| Publication Date: | 2010 |
| Other Authors: | Carvalho, Paulo |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | http://hdl.handle.net/10198/4357 |
| Summary: | Probabilistic language models are fundamental to understanding and learning the profile of an underlying code and to estimating its entropy, which makes it possible to verify and predict “natural” productions of the language. Language models aim to capture the salient statistical characteristics of the distribution of word sequences; transposed to the genomic language, they allow a predictive model to be built of the peculiarities and regularities of the genomic code under different inter- and intra-genomic conditions. In this paper, we propose applying compact intra-genomic language models to predict the composition of genomic sequences, with the aim of providing useful resources for data compression and of broadening the scope of similarity analysis in genomic sequences. The results obtained encourage further investigation and validate the use of language models in biological sequence analysis. |
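The summary does not say which model family the authors use, so the following is only an illustrative sketch of the general idea. It assumes a fixed-order k-mer (Markov) model with additive smoothing, trained on one region of a genome and scored on another region of the same genome. The function names, the order `k`, the smoothing constant `alpha`, and the toy sequence are all hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, other symbols are skipped


def train_kmer_model(seq, k):
    """Count how often each base follows each length-k context in seq."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    for i in range(k, len(seq)):
        context, base = seq[i - k:i], seq[i]
        if base in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][base] += 1
    return counts


def prob(counts, context, base, alpha=1.0):
    """Additively smoothed estimate of P(base | context)."""
    ctx = counts.get(context)
    if ctx is None:
        # Unseen context: fall back to a uniform distribution.
        return 1.0 / len(ALPHABET)
    total = sum(ctx.values())
    return (ctx[base] + alpha) / (total + alpha * len(ALPHABET))


def cross_entropy_bits(counts, seq, k):
    """Average number of bits per base the model needs to predict seq."""
    total_bits, n = 0.0, 0
    for i in range(k, len(seq)):
        context, base = seq[i - k:i], seq[i]
        if base not in ALPHABET or any(c not in ALPHABET for c in context):
            continue
        total_bits -= math.log2(prob(counts, context, base))
        n += 1
    return total_bits / n if n else float("nan")


# Toy usage: train on one region, score another region of the same sequence.
genome = "ACGT" * 60                      # hypothetical, highly regular sequence
train_region, test_region = genome[:200], genome[200:]
model = train_kmer_model(train_region, k=2)
print(cross_entropy_bits(model, test_region, k=2))
```

A uniform predictor over four bases costs 2 bits per base, so a cross-entropy well below 2 indicates regularities that the model has captured. That same quantity bounds what a compressor driven by the model could achieve, which is the link to data compression drawn in the summary.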
| id |
RCAP_349a37dbe1d2bb5bd20c4cb259c0d242 |
|---|---|
| oai_identifier_str |
oai:bibliotecadigital.ipb.pt:10198/4357 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| dc.title.none.fl_str_mv |
Employing compact intra-genomic language models to predict genomic sequences and characterize their entropy |
| author |
Deusdado, Sérgio |
| author_facet |
Deusdado, Sérgio; Carvalho, Paulo |
| author_role |
author |
| author2 |
Carvalho, Paulo |
| author2_role |
author |
| dc.contributor.none.fl_str_mv |
Biblioteca Digital do IPB |
| dc.contributor.author.fl_str_mv |
Deusdado, Sérgio; Carvalho, Paulo |
| dc.subject.por.fl_str_mv |
Language models; Genomic sequences modeling; DNA entropy estimation |
| topic |
Language models; Genomic sequences modeling; DNA entropy estimation |
| publishDate |
2010 |
| dc.date.none.fl_str_mv |
2010; 2010-01-01T00:00:00Z; 2011-05-18T10:25:35Z |
| dc.type.driver.fl_str_mv |
conference object |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10198/4357 |
| url |
http://hdl.handle.net/10198/4357 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
Deusdado, Sérgio; Carvalho, Paulo (2010). Employing compact intra-genomic language models to predict genomic sequences and characterize their entropy. In Rocha, Miguel P. [et al.] 4th International Workshop on Practical Applications of Computational Biology & Bioinformatics. Guimarães. p. 143-150. ISBN 978-3-642-13214-8. DOI 10.1007/978-3-642-13214-8_19 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Springer-Verlag |
| publisher.none.fl_str_mv |
Springer-Verlag |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833591798006546432 |