Leveraging TFR-BERT for ICD diagnoses ranking
| Main Author: | Silva, Ana |
|---|---|
| Publication Date: | 2023 |
| Other Authors: | Chaves, Pedro; Rijo, Sara; Boné, João; Oliveira, Tiago José Martins; Novais, Paulo |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | https://hdl.handle.net/1822/89870 |
| Summary: | This work describes applying a transformer-based ranking solution to the specific problem of ordering ICD diagnoses codes. Taking advantage of the TFR-BERT framework and adapting it to the biomedical context using pre-trained and publicly available language representation models, namely BioBERT, BlueBERT and ClinicalBERT (Bio + Discharge Summary BERT Model), we demonstrate the effectiveness of such a framework and the strengths of using pre-trained models adapted to the biomedical domain. We showcase this by using a benchmark dataset in the healthcare field, MIMIC-III, showing how it was possible to learn how to sequence the main or primary diagnoses and the order in which the secondary diagnoses are presented. A window-based approach and a summary approach (using only the sentences with diagnoses) were also tested in an attempt to circumvent the maximum sequence length limitation of BERT-based models. BioBERT demonstrated superior performance in all approaches, achieving the best results in the summary approach. |
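
The window-based approach mentioned in the summary can be illustrated with a minimal sketch. This record does not state the window size, the overlap, or the tokenizer the authors used, so the function name `sliding_windows`, the parameters `max_len` and `stride`, and the whitespace tokenization below are all assumptions made for illustration; this is not the authors' implementation. The only fixed point is that standard BERT models accept at most 512 tokens per input, which is why a long discharge summary has to be split at all.

```python
from typing import List


def sliding_windows(tokens: List[str], max_len: int = 512, stride: int = 256) -> List[List[str]]:
    """Split a token sequence into overlapping windows of at most max_len tokens.

    Hypothetical helper: max_len and stride are illustrative defaults,
    not values taken from the paper.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride must not exceed max_len, or tokens would be skipped")

    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


# Whitespace splitting stands in for a real WordPiece tokenizer (assumption).
note = "patient admitted with chest pain and shortness of breath history of diabetes"
tokens = note.split()
chunks = sliding_windows(tokens, max_len=5, stride=3)
```

Each window could then be scored separately by the ranking model and the per-window scores combined; how (or whether) the authors aggregate window scores is not stated in this record.
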
| id | RCAP_6c0f6ad8e8aad8fbe55f929fd5444c4c |
|---|---|
| oai_identifier_str | oai:repositorium.sdum.uminho.pt:1822/89870 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| spelling | Leveraging TFR-BERT for ICD diagnoses ranking Biomedical language models Learning-to-rank Ranking diagnoses This work describes applying a transformer-based ranking solution to the specific problem of ordering ICD diagnoses codes. Taking advantage of the TFR-BERT framework and adapting it to the biomedical context using pre-trained and publicly available language representation models, namely BioBERT, BlueBERT and ClinicalBERT (Bio + Discharge Summary BERT Model), we demonstrate the effectiveness of such a framework and the strengths of using pre-trained models adapted to the biomedical domain. We showcase this by using a benchmark dataset in the healthcare field, MIMIC-III, showing how it was possible to learn how to sequence the main or primary diagnoses and the order in which the secondary diagnoses are presented. A window-based approach and a summary approach (using only the sentences with diagnoses) were also tested in an attempt to circumvent the maximum sequence length limitation of BERT-based models. BioBERT demonstrated superior performance in all approaches, achieving the best results in the summary approach. FCT - Fundação para a Ciência e a Tecnologia (UIDB/00319/2020) Universidade do Minho Silva, Ana Chaves, Pedro Rijo, Sara Boné, João Oliveira, Tiago José Martins Novais, Paulo 2023-01-01 2023-01-01T00:00:00Z conference paper info:eu-repo/semantics/publishedVersion application/pdf https://hdl.handle.net/1822/89870 eng 9783031490101 0302-9743 10.1007/978-3-031-49011-8_25 info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2024-05-11T07:09:09Z oai:repositorium.sdum.uminho.pt:1822/89870 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-28T16:17:10.769865 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false |
| dc.title.none.fl_str_mv | Leveraging TFR-BERT for ICD diagnoses ranking |
| title | Leveraging TFR-BERT for ICD diagnoses ranking |
| spellingShingle | Leveraging TFR-BERT for ICD diagnoses ranking Silva, Ana Biomedical language models Learning-to-rank Ranking diagnoses |
| title_short | Leveraging TFR-BERT for ICD diagnoses ranking |
| title_full | Leveraging TFR-BERT for ICD diagnoses ranking |
| title_fullStr | Leveraging TFR-BERT for ICD diagnoses ranking |
| title_full_unstemmed | Leveraging TFR-BERT for ICD diagnoses ranking |
| title_sort | Leveraging TFR-BERT for ICD diagnoses ranking |
| author | Silva, Ana |
| author_facet | Silva, Ana Chaves, Pedro Rijo, Sara Boné, João Oliveira, Tiago José Martins Novais, Paulo |
| author_role | author |
| author2 | Chaves, Pedro Rijo, Sara Boné, João Oliveira, Tiago José Martins Novais, Paulo |
| author2_role | author author author author author |
| dc.contributor.none.fl_str_mv | Universidade do Minho |
| dc.contributor.author.fl_str_mv | Silva, Ana Chaves, Pedro Rijo, Sara Boné, João Oliveira, Tiago José Martins Novais, Paulo |
| dc.subject.por.fl_str_mv | Biomedical language models Learning-to-rank Ranking diagnoses |
| topic | Biomedical language models Learning-to-rank Ranking diagnoses |
| description | This work describes applying a transformer-based ranking solution to the specific problem of ordering ICD diagnoses codes. Taking advantage of the TFR-BERT framework and adapting it to the biomedical context using pre-trained and publicly available language representation models, namely BioBERT, BlueBERT and ClinicalBERT (Bio + Discharge Summary BERT Model), we demonstrate the effectiveness of such a framework and the strengths of using pre-trained models adapted to the biomedical domain. We showcase this by using a benchmark dataset in the healthcare field, MIMIC-III, showing how it was possible to learn how to sequence the main or primary diagnoses and the order in which the secondary diagnoses are presented. A window-based approach and a summary approach (using only the sentences with diagnoses) were also tested in an attempt to circumvent the maximum sequence length limitation of BERT-based models. BioBERT demonstrated superior performance in all approaches, achieving the best results in the summary approach. |
| publishDate | 2023 |
| dc.date.none.fl_str_mv | 2023-01-01 2023-01-01T00:00:00Z |
| dc.type.driver.fl_str_mv | conference paper |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | https://hdl.handle.net/1822/89870 |
| url | https://hdl.handle.net/1822/89870 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | 9783031490101 0302-9743 10.1007/978-3-031-49011-8_25 |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| institution | RCAAP |
| reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833595853858668544 |