Predicting non-coding RNA function using Artificial Intelligence
| Main Author: | Correia, David Alexandre da Costa |
|---|---|
| Publication Date: | 2024 |
| Format: | Master's thesis |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | http://hdl.handle.net/10400.5/96901 |
Summary: | Master's thesis, Bioinformatics and Computational Biology, 2024, Universidade de Lisboa, Faculdade de Ciências |
| id |
RCAP_7f895bfebe1fa39f58e214476489ccef |
|---|---|
| oai_identifier_str |
oai:repositorio.ulisboa.pt:10400.5/96901 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Predicting non-coding RNA function using Artificial Intelligence. Keywords: Non-coding RNAs; Relation Extraction; Text Mining; Distant Supervision; Large Language Models; Master's theses - 2024; Departamento de Informática. Master's thesis, Bioinformatics and Computational Biology, 2024, Universidade de Lisboa, Faculdade de Ciências.
Non-coding RNAs (ncRNAs) represent the majority of human gene products and are involved in various important biological processes, and they are considered relevant disease biomarkers and therapeutic agents. However, few functional annotation databases are dedicated to ncRNAs, and information about these biomolecules remains sparsely distributed, mostly in scientific research articles. It is therefore of pivotal importance to aggregate and summarize the existing information. Natural Language Processing methods applied to text mining enable automatic information extraction and summarization from textual data. These techniques can be used to generate relational corpora: collections of annotated sentences expressing relations between entities. In this work, a text mining pipeline was implemented to generate an ncRNA-phenotype relational corpus (ncoRP) using Distant Supervision Relation Extraction (DSRE). The corpus consists of 21,608 annotated articles, 2,835 unique ncRNAs, 1,118 unique phenotypes and 35,295 unique relations, with a precision of 0.761 and an F1-score of 0.593, calculated through human validation. Because DSRE methods require a set of previously documented relations, a high-fidelity ncRNA-phenotype relation dataset of 214,300 unique relations was created by aggregating five ncRNA-disease functional annotation databases. Both ncoRP and the relation dataset are important contributions towards addressing the sparseness of information about ncRNAs.
Large Language Models (LLMs) are an emerging type of language model that shows great capabilities in general task-solving through text generation, without requiring fine-tuning on large datasets. This benefit shows promise for applications in Relation Extraction (RE), compared with data-intensive state-of-the-art deep learning methods. In this work, an LLM RE methodology is proposed and evaluated, achieving an F1-score of 0.978 by combining the RE task with a preceding sentence filtering task and applying prompting principles such as in-context learning and Chain-of-Thought self-explanation.
Contributors: Martiniano, Hugo Filipe de Mesquita Costa, 1978-; Couto, Francisco José Moreira; Repositório da Universidade de Lisboa. Author: Correia, David Alexandre da Costa. 2025-01-07T10:51:19Z; 2024; 2024; 2024-01-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10400.5/96901; eng; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2025-03-17T16:31:17Z; oai:repositorio.ulisboa.pt:10400.5/96901; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-29T04:18:01.434059; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv |
Predicting non-coding RNA function using Artificial Intelligence |
| title |
Predicting non-coding RNA function using Artificial Intelligence |
| author |
Correia, David Alexandre da Costa |
| author_facet |
Correia, David Alexandre da Costa |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Martiniano, Hugo Filipe de Mesquita Costa, 1978-; Couto, Francisco José Moreira; Repositório da Universidade de Lisboa |
| dc.contributor.author.fl_str_mv |
Correia, David Alexandre da Costa |
| dc.subject.por.fl_str_mv |
RNAs não codificantes; Extração de Relações; Prospeção de Texto; Supervisão à Distância; Grandes Modelos de Linguagem; Teses de mestrado - 2024; Departamento de Informática |
| topic |
Non-coding RNAs; Relation Extraction; Text Mining; Distant Supervision; Large Language Models; Master's theses - 2024; Departamento de Informática |
| description |
Master's thesis, Bioinformatics and Computational Biology, 2024, Universidade de Lisboa, Faculdade de Ciências |
| publishDate |
2024 |
| dc.date.none.fl_str_mv |
2024 2024 2024-01-01T00:00:00Z 2025-01-07T10:51:19Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.5/96901 |
| url |
http://hdl.handle.net/10400.5/96901 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833602010061996032 |
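
The abstract in this record describes Distant Supervision Relation Extraction only at a high level. The sketch below illustrates the general idea under simplifying assumptions, and it is not the thesis pipeline: a sentence that mentions both an ncRNA and a phenotype is labelled positive when that pair appears in a set of previously documented relations. The function name `distant_labels`, the toy sentences, and the two example relations are hypothetical, and entities are matched by naive case-insensitive substring search where a real pipeline would use named-entity recognition.

```python
from typing import Iterable, Iterator, Set, Tuple

Pair = Tuple[str, str]

def distant_labels(
    sentences: Iterable[str],
    ncrnas: Iterable[str],
    phenotypes: Iterable[str],
    known: Set[Pair],
) -> Iterator[Tuple[str, str, str, bool]]:
    """Yield (sentence, ncRNA, phenotype, label) for every co-occurring pair.

    The label is True when the pair is in the set of known relations,
    which is the core assumption of distant supervision.
    """
    ncrna_list = list(ncrnas)
    phenotype_list = list(phenotypes)
    for sentence in sentences:
        text = sentence.lower()
        for rna in ncrna_list:
            # Naive substring match (assumption); stands in for entity recognition.
            if rna.lower() not in text:
                continue
            for pheno in phenotype_list:
                if pheno.lower() in text:
                    yield sentence, rna, pheno, (rna, pheno) in known

# Toy data for illustration only; not taken from the thesis or its databases.
KNOWN = {("MALAT1", "lung cancer"), ("HOTAIR", "breast cancer")}
SENTENCES = [
    "MALAT1 is upregulated in lung cancer.",
    "HOTAIR was measured in lung cancer samples.",
    "This sentence mentions no entities.",
]

labelled = list(
    distant_labels(
        SENTENCES,
        ncrnas=["MALAT1", "HOTAIR"],
        phenotypes=["lung cancer", "breast cancer"],
        known=KNOWN,
    )
)

if __name__ == "__main__":
    for sentence, rna, pheno, label in labelled:
        print(f"{label!s:5} {rna} / {pheno}: {sentence}")
```

Labels produced this way are noisy: a sentence can mention a known pair without asserting the relation, and a true relation missing from the known set is labelled negative. This is consistent with the abstract reporting precision and F1-score from human validation.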