Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology
Main Author: | Caetano, Nuno |
---|---|
Publication Date: | 2015 |
Other Authors: | Cortez, Paulo; Laureano, Raul |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/1822/38196 |
Summary: | Hospitals now collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at extracting useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, relating to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then opened using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. This extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers. |
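
The summary ranks the learning methods by their coefficient of determination, with Average Prediction serving as the naive baseline. The sketch below illustrates that metric only, using the standard definition R² = 1 − SS_res / SS_tot. The toy ages and lengths of stay are invented for illustration, the single-input least-squares fit stands in for Multiple Regression, and the scores are computed in-sample; none of this reproduces the data, the 14 inputs, or the evaluation protocol of the paper.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b * x with a single input."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical toy data (not from the paper): patient age in years
# and length of stay in days.
ages = [30, 40, 50, 60, 70, 80]
los_days = [2, 3, 3, 5, 6, 8]

# "Average Prediction" baseline: always predict the mean length of stay.
baseline_pred = [mean(los_days)] * len(los_days)

# One-input linear regression as a stand-in for a learned model.
a, b = fit_simple_regression(ages, los_days)
regression_pred = [a + b * x for x in ages]

r2_baseline = r_squared(los_days, baseline_pred)
r2_regression = r_squared(los_days, regression_pred)

print(f"R2 baseline:   {r2_baseline:.3f}")
print(f"R2 regression: {r2_regression:.3f}")
```

Scored on the same data it was fitted to, the average predictor has an R² of zero by construction, so any model worth keeping should score above it; a value of 1 would mean perfect predictions.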
id |
RCAP_1f521e69c064e94504baa1e27cfc5cdc |
---|---|
oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/38196 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
dc.title.none.fl_str_mv |
Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology |
author |
Caetano, Nuno |
author_facet |
Caetano, Nuno Cortez, Paulo Laureano, Raul |
author_role |
author |
author2 |
Cortez, Paulo Laureano, Raul |
author2_role |
author author |
dc.contributor.none.fl_str_mv |
Universidade do Minho |
dc.contributor.author.fl_str_mv |
Caetano, Nuno Cortez, Paulo Laureano, Raul |
dc.subject.por.fl_str_mv |
Medical data mining Hospitalization process Length of stay CRISP-DM Regression Random forest Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática Science & Technology |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-09 2015-09-01T00:00:00Z |
dc.type.driver.fl_str_mv |
conference paper |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/1822/38196 |
url |
http://hdl.handle.net/1822/38196 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
Caetano, N., Cortez, P., & Laureano, R. S. (2015). Using Data Mining for Prediction of Hospital Length of Stay: An Application of the CRISP-DM Methodology. In J. Cordeiro, S. Hammoudi, L. Maciaszek, O. Camp & J. Filipe (Eds.), Enterprise Information Systems (Vol. 227, pp. 149-166): Springer International Publishing. ISBN: 978-3-319-22347-6. ISSN: 1865-1348. DOI: 10.1007/978-3-319-22348-3_9. The original publication is available at: http://link.springer.com/chapter/10.1007%2F978-3-319-22348-3_9# |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Springer |
publisher.none.fl_str_mv |
Springer |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833595078211272704 |