Evaluating semantic textual similarity in clinical sentences using deep learning and sentence embeddings

Bibliographic Details
Main Author: Antunes, Rui
Publication Date: 2020
Other Authors: Silva, João Figueira; Matos, Sérgio
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10773/31473
Summary: The wide adoption of electronic health records (EHRs) has fostered an improvement in healthcare quality, with EHRs currently representing a major source of medical information. Nevertheless, this process has also brought new challenges to the medical environment: the ease of replicating information (e.g., via copy-paste) has resulted in less concise and sometimes incorrect information, which hinders the understandability of this data and can compromise the quality of medical decisions drawn from it. Due to the high volume and redundancy of medical data, it is imperative to develop solutions that can condense information whilst retaining its value, and one possible methodology involves assessing the semantic similarity between clinical text excerpts. In this paper we present an approach that explores neural networks and different types of text preprocessing pipelines, and that evaluates the impact of using word embeddings or sentence embeddings. We present the results of our participation in the n2c2 shared task on clinical semantic textual similarity, perform an error analysis, and discuss the obtained results along with possible future improvements.
Subjects: Natural language processing; Clinical information extraction; Semantic textual similarity; Deep learning; Sentence embeddings
Publisher: Association for Computing Machinery
Type: Book part (published version)
ISBN: 978-1-4503-6866-7
DOI: 10.1145/3341105.3373987
Format: application/pdf
Access rights: Open access
Date available: 2021-06-14
OAI identifier: oai:ria.ua.pt:10773/31473
Repository provider: FCCN, the digital services unit of FCT (Fundação para a Ciência e a Tecnologia)
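The Summary contrasts word embeddings with sentence embeddings for scoring pairs of clinical sentences. The sketch below is a minimal, hypothetical illustration of a word-embedding baseline, not the authors' neural network pipeline: it assumes a toy vocabulary (`WORD_VECTORS`, invented here), builds each sentence vector by averaging its word vectors, and rescales cosine similarity to a 0 to 5 range, which is an assumed score scale. A sentence-embedding variant would replace `sentence_vector` with a pretrained sentence encoder.

```python
import math

# Hypothetical toy word vectors (assumption); a real system would load
# pretrained embeddings, e.g. ones trained on clinical or biomedical text.
WORD_VECTORS = {
    "patient": [0.9, 0.1, 0.0],
    "denies": [0.1, 0.8, 0.2],
    "reports": [0.2, 0.7, 0.3],
    "chest": [0.0, 0.2, 0.9],
    "pain": [0.1, 0.3, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of known tokens (a simple word-embedding baseline)."""
    tokens = [t for t in sentence.lower().split() if t in WORD_VECTORS]
    if not tokens:
        raise ValueError("no known tokens in sentence")
    dim = len(next(iter(WORD_VECTORS.values())))
    return [sum(WORD_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_score(s1, s2, max_score=5.0):
    """Map cosine similarity from [-1, 1] onto an assumed [0, max_score] scale."""
    cos = cosine(sentence_vector(s1), sentence_vector(s2))
    return max_score * (cos + 1) / 2
```

For example, `similarity_score("Patient denies chest pain", "Patient reports chest pain")` returns a score close to the maximum, because averaging dilutes the single-word difference between "denies" and "reports"; this is a known weakness of plain bag-of-vectors baselines.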