Codification of clinical episodes in natural language

Bibliographic Details
Main Author: Silva, Hugo Filipe da Fonseca e
Publication Date: 2024
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.26/51918
Summary: The International Classification of Diseases, 10th Revision (ICD-10), is widely used to classify patient diagnostic information. Encoding the pathologies of clinical episodes into ICD-10 codes is a laborious task, usually done by dedicated physicians with specific training. Automatically classifying Electronic Health Records (EHR) from text into diagnostic codes has been a challenge for the Natural Language Processing (NLP) community. This work presents a literature review on coding clinical episodes in natural language, covering the main problems and barriers that affect it, the use of NLP in parallel with ontologies, the use of NLP in healthcare, automatic ICD-10 coding, the use of Pretrained Language Models (PLM), work developed to solve the clinical abbreviation problem, and the detection of clinical symptom negation. It also proposes the method PLM-ICD-C, based on cosine similarity, to process EHRs containing natural language text and give coders useful suggestions of ICD-10 codes, with the aim of facilitating the coding process. To that end, it proposes a technique of multiple runs and a bucket category strategy, applied to the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. The results show that the bucket category strategy improves the results while providing useful suggestions: Precision improves 5-fold, Recall 2- to 3-fold, and F1-score 4-fold. This methodology is then combined with PLM-ICD in order to increase the number of probably useful ICD-10 code suggestions. The results show that PLM-ICD-C, consisting of the improved cosine method and PLM-ICD, increases the F1-score by 0.5% and, most importantly, increases Precision from 46.3% to 50%, which means a significant improvement in the code suggestions given to the medical doctors performing encoding functions.
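The idea of ranking ICD-10 codes by cosine similarity, and of scoring the suggestions with Precision, Recall and F1-score, can be illustrated with a minimal sketch. It is not the PLM-ICD-C implementation from the thesis: the term-count vectors, the function names (`suggest_codes`, `precision_recall_f1`), the three-entry code table and the sample note are all assumptions made for illustration, and the multiple-runs and bucket category strategies are not modelled.

```python
import math
import re
from collections import Counter

# Tiny illustrative code table (an assumption; a real system would load the
# full ICD-10 catalogue and likely use learned embeddings, not word counts).
CODES = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "J18.9": "Pneumonia, unspecified organism",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def suggest_codes(note, code_descriptions, top_k=3):
    """Return the top_k (code, score) pairs, most similar description first."""
    note_vec = Counter(tokenize(note))
    scored = [
        (code, cosine_similarity(note_vec, Counter(tokenize(desc))))
        for code, desc in code_descriptions.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def precision_recall_f1(suggested, gold):
    """Set-based Precision, Recall and F1 of suggested codes against gold codes."""
    suggested, gold = set(suggested), set(gold)
    true_positives = len(suggested & gold)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    note = "Patient with type 2 diabetes mellitus, no complications reported."
    for code, score in suggest_codes(note, CODES):
        print(f"{code}: {score:.3f}")
```

In this sketch a suggestion list with higher Precision contains fewer codes the coder has to reject, which is why the summary singles out the Precision gain as the most important result.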
id RCAP_b0386c65f78ba534ab5feb0666ddb2b0
oai_identifier_str oai:comum.rcaap.pt:10400.26/51918
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Codification of clinical episodes in natural language
title Codification of clinical episodes in natural language
author Silva, Hugo Filipe da Fonseca e
author_facet Silva, Hugo Filipe da Fonseca e
author_role author
dc.contributor.none.fl_str_mv Mendes, Mateus Daniel Almeida
Repositório Comum
dc.contributor.author.fl_str_mv Silva, Hugo Filipe da Fonseca e
dc.subject.por.fl_str_mv Automatic Coding
Clinical Episodes
ICD-10
NLP
PLM
PLM-ICD-C
Electronic Health Record
Cosine Similarity
publishDate 2024
dc.date.none.fl_str_mv 2024-08-27T10:36:25Z
2024-07-25
2024-07-25T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.26/51918
urn:tid:203676963
url http://hdl.handle.net/10400.26/51918
identifier_str_mv urn:tid:203676963
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602780424568832