Leveraging TFR-BERT for ICD diagnoses ranking

Bibliographic Details
Main Author: Silva, Ana
Publication Date: 2023
Other Authors: Chaves, Pedro; Rijo, Sara; Boné, João; Oliveira, Tiago José Martins; Novais, Paulo
Language: English
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: https://hdl.handle.net/1822/89870
Summary: This work applies a transformer-based ranking solution to the specific problem of ordering ICD diagnosis codes. Building on the TFR-BERT framework and adapting it to the biomedical context with pre-trained, publicly available language representation models (BioBERT, BlueBERT and ClinicalBERT, the Bio + Discharge Summary BERT Model), we demonstrate the effectiveness of this framework and the benefits of using pre-trained models adapted to the biomedical domain. We show this on MIMIC-III, a benchmark dataset in the healthcare field, where the model learns to sequence diagnoses: to identify the main (primary) diagnosis and the order in which the secondary diagnoses are presented. A window-based approach and a summary approach (using only the sentences that mention diagnoses) were also tested to circumvent the maximum sequence length limitation of BERT-based models. BioBERT outperformed the other models in all approaches, achieving its best results with the summary approach.
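The summary names two ways of fitting long clinical notes into a BERT input. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it uses whitespace tokenization as a stand-in for the checkpoint's WordPiece tokenizer, illustrative values of a 512-token budget and a 256-token stride, and a hypothetical list of diagnosis terms for selecting sentences in the summary approach. The names `tokenize`, `window_chunks` and `summarize` are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical stand-in: whitespace tokenization. A real BERT pipeline would
    # use the WordPiece tokenizer of the chosen checkpoint (e.g. BioBERT).
    return text.split()


def window_chunks(tokens: List[str], max_len: int = 512, stride: int = 256) -> List[List[str]]:
    """Window-based approach (sketch): split a long token sequence into
    overlapping windows of at most max_len tokens, advancing by stride.

    max_len is assumed to be the budget left after reserving room for special
    tokens ([CLS], [SEP]) and any paired text; 512/256 are illustrative values.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


def summarize(text: str, diagnosis_terms: List[str]) -> str:
    """Summary approach (sketch): keep only the sentences that mention one of
    the given diagnosis terms (hypothetical list; case-insensitive substring match)."""
    # Naive sentence split on ".", "!" or "?" followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    terms = [t.lower() for t in diagnosis_terms]
    kept = [s for s in sentences if any(t in s.lower() for t in terms)]
    return " ".join(kept)


if __name__ == "__main__":
    note = ("Patient admitted with chest pain. Vitals stable. "
            "Diagnosed with acute myocardial infarction. Discharged home.")
    print(summarize(note, ["myocardial infarction"]))
    print(len(window_chunks(tokenize(note), max_len=4, stride=3)))
```

With overlap (max_len greater than stride), text near a window boundary also appears in the next window, at the cost of more model inputs per note. How per-window scores are combined into a single ranking is not stated in the summary and is left open here.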
Additional Details
Document Type: Conference paper (published version)
Contributor: Universidade do Minho
Keywords: Biomedical language models; Learning-to-rank; Ranking diagnoses
DOI: 10.1007/978-3-031-49011-8_25
ISBN: 9783031490101
ISSN: 0302-9743
Funding: FCT - Fundação para a Ciência e a Tecnologia (UIDB/00319/2020)
Rights: Open access
Format: PDF (application/pdf)
Repository: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Institution: FCCN, FCT digital services (Fundação para a Ciência e a Tecnologia)
Contact: info@rcaap.pt
OAI Identifier: oai:repositorium.sdum.uminho.pt:1822/89870