Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching

Bibliographic Details
Main Author: Amorim, Sofia Pessoa de
Publication Date: 2021
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10451/53906
Summary: Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2022
id RCAP_073ffd96d600d94fd67c9a3736b56a32
oai_identifier_str oai:repositorio.ulisboa.pt:10451/53906
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching
Embeddings de Palavras
Alinhamento de Ontologias
Ontologias Biomédicas
Teses de mestrado - 2022
Departamento de Informática
Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2022
Ontology matching aims to discover mappings between concepts from two distinct ontologies, a source and a target. It is a fundamental step when integrating heterogeneous data sources that are described by ontologies. The problem is even more challenging for complex data such as biomedical data. Motivated by the need to keep improving ontology matching techniques, this dissertation implemented a new approach in the AML pipeline for calculating similarities between entities from two distinct ontologies. We used some of the OAEI tracks, such as Anatomy and LargeBio, to apply the new algorithm and evaluate whether it improves AML's results against a reference alignment. The new approach uses five types of pre-trained word embeddings: BioWordVec Extrinsic, BioWordVec Intrinsic, PubMed+PC, PubMed+PC+Wikipedia and English Wikipedia. These embeddings use a machine learning technique, Word2Vec, and were chosen because the vector that represents each word carries the semantic meaning inherent to that word. With word embeddings, each concept of each ontology was represented by a corresponding vector, to see whether that information could improve how relations between concepts are determined in the AML system. The similarity between concepts was calculated through the cosine distance, and the new alignment was evaluated using precision, recall and F-measure. Although we could not prove that word embeddings improve AML's current results, the implementation could be refined, and the technique remains an option to consider in future work if applied in some other way.
Pesquita, Cátia, 1980-
Repositório da Universidade de Lisboa
Amorim, Sofia Pessoa de
2022-07-22T08:59:36Z
2022
2021
2022-01-01T00:00:00Z
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/masterThesis
application/pdf
http://hdl.handle.net/10451/53906
TID:203205685
eng
info:eu-repo/semantics/openAccess
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
2025-03-17T14:48:18Z
oai:repositorio.ulisboa.pt:10451/53906
Portal Agregador
ONG
https://www.rcaap.pt/oai/openaire
info@rcaap.pt
opendoar:https://opendoar.ac.uk/repository/7160
2025-05-29T03:25:15.783212
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
false
dc.title.none.fl_str_mv Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching
title Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching
spellingShingle Evaluating Pre-trained Word Embeddings in domain specific Ontology Matching
Amorim, Sofia Pessoa de
Embeddings de Palavras
Alinhamento de Ontologias
Ontologias Biomédicas
Teses de mestrado - 2022
Departamento de Informática
author Amorim, Sofia Pessoa de
author_facet Amorim, Sofia Pessoa de
author_role author
dc.contributor.none.fl_str_mv Pesquita, Cátia, 1980-
Repositório da Universidade de Lisboa
dc.contributor.author.fl_str_mv Amorim, Sofia Pessoa de
dc.subject.por.fl_str_mv Embeddings de Palavras
Alinhamento de Ontologias
Ontologias Biomédicas
Teses de mestrado - 2022
Departamento de Informática
description Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2022
publishDate 2021
dc.date.none.fl_str_mv 2021
2022-07-22T08:59:36Z
2022
2022-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10451/53906
TID:203205685
url http://hdl.handle.net/10451/53906
identifier_str_mv TID:203205685
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833601692778627072
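
Illustrative sketch (not part of the catalog record, and not AML's actual code). The abstract describes representing each concept by a vector, comparing concepts by cosine, and scoring the resulting alignment with precision, recall and F-measure. The Python below is a minimal sketch of that idea under stated assumptions: the three-dimensional vectors are made-up toy values rather than any of the five pre-trained embeddings, averaging the word vectors of a label is one assumed way to obtain a concept vector (the record does not say how the thesis does it), and the names label_vector, match, evaluate and the 0.9 threshold are hypothetical.

```python
import math

# Toy 3-dimensional "embeddings" (made-up values, not from a real model).
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.1],
    "muscle": [0.1, 0.9, 0.0],
    "lung": [0.0, 0.1, 0.9],
    "pulmonary": [0.1, 0.0, 0.8],
}


def label_vector(label, vectors):
    """Average the vectors of the known words in a label.

    Averaging is an assumption made for this sketch; returns None when
    no word of the label has a vector.
    """
    words = [w for w in label.lower().split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (cosine distance is 1 minus this)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match(source, target, vectors, threshold):
    """Return the (source_id, target_id) pairs scoring at or above threshold."""
    alignment = set()
    for s_id, s_label in source.items():
        s_vec = label_vector(s_label, vectors)
        if s_vec is None:
            continue
        for t_id, t_label in target.items():
            t_vec = label_vector(t_label, vectors)
            if t_vec is None:
                continue
            if cosine_similarity(s_vec, t_vec) >= threshold:
                alignment.add((s_id, t_id))
    return alignment


def evaluate(alignment, reference):
    """Precision, recall and F-measure of an alignment against a reference."""
    true_pos = len(alignment & reference)
    precision = true_pos / len(alignment) if alignment else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Two tiny hypothetical ontologies and a hand-written reference alignment.
source = {"S1": "heart muscle", "S2": "lung"}
target = {"T1": "cardiac muscle", "T2": "pulmonary"}
reference = {("S1", "T1"), ("S2", "T2")}

alignment = match(source, target, TOY_VECTORS, threshold=0.9)
precision, recall, f_measure = evaluate(alignment, reference)
print(sorted(alignment), precision, recall, f_measure)
```

On this toy data both reference pairs are recovered, so all three metrics equal 1.0. With real pre-trained embeddings the threshold and the way label vectors are built would need tuning, which is consistent with the abstract's remark that the implementation could be refined.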