Codification of clinical episodes in natural language
| Main Author: | Silva, Hugo Filipe da Fonseca e |
|---|---|
| Publication Date: | 2024 |
| Format: | Master's thesis |
| Language: | English |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10400.26/51918 |
| Summary: | The International Classification of Diseases, 10th Revision (ICD-10), has been widely used to classify patient diagnostic information. Encoding the pathologies of clinical episodes into ICD-10 codes is a laborious task, usually performed by dedicated physicians with specific training. Automatically classifying the text of Electronic Health Records (EHR) into diagnostic codes remains a challenge for the Natural Language Processing (NLP) community. This work presents a literature review on coding clinical episodes in natural language, covering the main problems and barriers that affect it, the use of NLP together with ontologies, the use of NLP in healthcare, automatic ICD-10 coding, the use of Pretrained Language Models (PLM), and existing work on the clinical abbreviation problem and on the detection of clinical symptom negation. It also proposes PLM-ICD-C, a method based on cosine similarity that processes natural-language EHR text to give coders useful ICD-10 code suggestions and thereby facilitate the coding process. To this end, a multiple-runs technique and a bucket-category strategy are proposed and applied to the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. The results show that the bucket-category strategy improves the results while still providing useful suggestions, with a 5-fold improvement in Precision, 2- to 3-fold improvements in Recall and a 4-fold improvement in F1-score. This methodology is then combined with PLM-ICD to increase the number of potentially useful ICD-10 code suggestions. The results show that PLM-ICD-C, consisting of the improved cosine method and PLM-ICD, improves the results, increasing the F1-score by 0.5% and, more importantly, raising Precision from 46.3% to 50%, a notable improvement in the code suggestions given to the physicians who perform coding. |
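
The summary names cosine similarity as the basis for suggesting ICD-10 codes and reports results as Precision, Recall and F1-score. The Python sketch below illustrates only those two ideas; it is not the PLM-ICD-C method, whose text representation, bucket-category strategy and combination with PLM-ICD are not described in this record. It assumes naive term-count vectors, a hypothetical four-code dictionary instead of MIMIC-IV data, a hypothetical note and gold code set, and hypothetical helper names (`tokenize`, `cosine`, `suggest_codes`, `precision_recall_f1`).

```python
import math
import re
from collections import Counter

# Hypothetical toy dictionary of ICD-10 codes with ICD-10-CM-style titles;
# a real system would index the full code set, not four entries.
ICD10_TOY = {
    "I10": "Essential (primary) hypertension",
    "E11.9": "Type 2 diabetes mellitus without complications",
    "J18.9": "Pneumonia, unspecified organism",
    "N39.0": "Urinary tract infection, site not specified",
}

# Hypothetical free-text fragment standing in for an EHR note.
NOTE = "Patient with type 2 diabetes mellitus and essential hypertension, no complications."


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # terms absent from b count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_codes(note: str, descriptions: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Score every code description against the note and return the k best (code, score) pairs."""
    note_vec = Counter(tokenize(note))
    scored = [(code, cosine(note_vec, Counter(tokenize(desc)))) for code, desc in descriptions.items()]
    # Highest similarity first; ties broken alphabetically by code so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def precision_recall_f1(suggested: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Set-based Precision, Recall and F1 for one episode's suggested vs. assigned codes."""
    tp = len(suggested & gold)
    precision = tp / len(suggested) if suggested else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    top = suggest_codes(NOTE, ICD10_TOY, k=2)
    print(top)
    # Hypothetical gold codes assigned by a human coder for this note.
    print(precision_recall_f1({code for code, _ in top}, {"E11.9", "I10", "Z79.4"}))
```

Run as a script, the sketch ranks E11.9 (score about 0.62) above I10 (about 0.35); against the hypothetical gold set {E11.9, I10, Z79.4}, those two suggestions give Precision 1.0, Recall about 0.67 and F1 about 0.8. Replacing the term-count vectors with embeddings from a pretrained language model would change only how the vectors are built; the ranking and evaluation steps stay the same.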

| id | RCAP_b0386c65f78ba534ab5feb0666ddb2b0 |
|---|---|
| oai_identifier_str | oai:comum.rcaap.pt:10400.26/51918 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| author | Silva, Hugo Filipe da Fonseca e |
| author_facet | Silva, Hugo Filipe da Fonseca e |
| author_role | author |
| dc.contributor.none.fl_str_mv | Mendes, Mateus Daniel Almeida; Repositório Comum |
| dc.contributor.author.fl_str_mv | Silva, Hugo Filipe da Fonseca e |
| dc.subject.por.fl_str_mv | Codificação Automática (Automatic Coding); Episódios Clínicos (Clinical Episodes); ICD-10; NLP; PLM; PLM-ICD-C; Registo de saúde eletrónico (Electronic Health Record); Similaridade do Cosseno (Cosine Similarity) |
| topic | Codificação Automática (Automatic Coding); Episódios Clínicos (Clinical Episodes); ICD-10; NLP; PLM; PLM-ICD-C; Registo de saúde eletrónico (Electronic Health Record); Similaridade do Cosseno (Cosine Similarity) |
| publishDate | 2024 |
| dc.date.none.fl_str_mv | 2024-08-27T10:36:25Z; 2024-07-25; 2024-07-25T00:00:00Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/masterThesis |
| format | masterThesis |
| status_str | publishedVersion |
| dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10400.26/51918; urn:tid:203676963 |
| url | http://hdl.handle.net/10400.26/51918 |
| identifier_str_mv | urn:tid:203676963 |
| dc.language.iso.fl_str_mv | eng |
| language | eng |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str | RCAAP |
| institution | RCAAP |
| reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
| _version_ | 1833602780424568832 |