Evaluation of word embedding vector averaging functions for biomedical word sense disambiguation

Bibliographic Details
Main Author: Antunes, Rui
Publication Date: 2017
Other Authors: Matos, Sérgio
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10773/25119
Summary: The biomedical lexicon contains a large amount of term ambiguity, which hinders correct identification of concepts and reduces the accuracy of semantic indexing and information retrieval tools. Previous work on biomedical word sense disambiguation has shown that supervised machine learning leads to better results than knowledge-based approaches. However, machine learning approaches require sufficient training data, and their generalization performance beyond the test data is not known. Knowledge-based methods, on the other hand, make use of existing knowledge bases and are therefore mostly limited by the quality of these sources of information about concepts. In this work, we use word embedding vectors to complement the knowledge-base information. We represent the context of an ambiguous term by the average of the embedding vectors of the words around the term, and evaluate the impact of using word distance to weight this average. We show that this weighting improves the disambiguation accuracy of the knowledge-based approach on a subset of the reference MSH WSD data set from 86% to 88%.
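The context representation described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record does not state which weighting functions were evaluated or how a sense is finally selected, so the inverse-distance weight, the window size, the cosine-similarity selection step, and all function names here are assumptions.

```python
import math


def weighted_context_vector(tokens, target_index, embeddings, window=5, weight_fn=None):
    """Average the embedding vectors of the words around tokens[target_index].

    weight_fn maps a word distance (1, 2, ...) to a weight.
    With weight_fn=None every context word gets weight 1 (plain average).
    Returns None when no context word has an embedding.
    """
    if weight_fn is None:
        weight_fn = lambda distance: 1.0

    total = None
    weight_sum = 0.0
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)

    for i in range(start, end):
        # Skip the ambiguous term itself and out-of-vocabulary words.
        if i == target_index or tokens[i] not in embeddings:
            continue
        weight = weight_fn(abs(i - target_index))
        vector = embeddings[tokens[i]]
        if total is None:
            total = [0.0] * len(vector)
        for k, value in enumerate(vector):
            total[k] += weight * value
        weight_sum += weight

    if total is None or weight_sum == 0.0:
        return None
    return [value / weight_sum for value in total]


def inverse_distance(distance):
    """Hypothetical weighting: closer words count more (assumed, not from the record)."""
    return 1.0 / distance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_sense(context_vector, sense_vectors):
    """Assumed selection step: return the sense whose vector is closest to the context."""
    return max(sense_vectors, key=lambda sense: cosine(context_vector, sense_vectors[sense]))
```

Passing `weight_fn=inverse_distance` instead of the default switches from the plain average to a distance-weighted one, which is the kind of comparison the summary reports.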
id RCAP_48bc9ab6528b9e3bfeaa2525b3c70f8d
oai_identifier_str oai:ria.ua.pt:10773/25119
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
author Antunes, Rui
author_facet Antunes, Rui
Matos, Sérgio
author_role author
author2 Matos, Sérgio
author2_role author
dc.contributor.author.fl_str_mv Antunes, Rui
Matos, Sérgio
dc.subject.por.fl_str_mv Biomedical word sense disambiguation
Knowledge-based approaches
Word embeddings
topic Biomedical word sense disambiguation
Knowledge-based approaches
Word embeddings
publishDate 2017
dc.date.none.fl_str_mv 2017-10-01T00:00:00Z
2017-10
2019-01-15T16:25:23Z
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/25119
url http://hdl.handle.net/10773/25119
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 978-972-789-522-9
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv UA Editora
publisher.none.fl_str_mv UA Editora
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833594256063725568