Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology

Bibliographic Details
Main Author: Caetano, Nuno
Publication Date: 2015
Other Authors: Cortez, Paulo; Laureano, Raul
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/1822/38196
Summary: Hospitals now collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at extracting useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, related to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, in which six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. This extracted knowledge confirmed that the predictive model is credible and has potential value for supporting the decisions of hospital managers.
id RCAP_1f521e69c064e94504baa1e27cfc5cdc
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/38196
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
author Caetano, Nuno
author_facet Caetano, Nuno
Cortez, Paulo
Laureano, Raul
author_role author
author2 Cortez, Paulo
Laureano, Raul
author2_role author
author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Caetano, Nuno
Cortez, Paulo
Laureano, Raul
dc.subject.por.fl_str_mv Medical data mining
Hospitalization process
Length of stay
CRISP-DM
Regression
Random forest
Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática
Science & Technology
publishDate 2015
dc.date.none.fl_str_mv 2015-09
2015-09-01T00:00:00Z
dc.type.driver.fl_str_mv conference paper
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/1822/38196
url http://hdl.handle.net/1822/38196
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Caetano, N., Cortez, P., & Laureano, R. S. (2015). Using Data Mining for Prediction of Hospital Length of Stay: An Application of the CRISP-DM Methodology. In J. Cordeiro, S. Hammoudi, L. Maciaszek, O. Camp & J. Filipe (Eds.), Enterprise Information Systems (Vol. 227, pp. 149-166): Springer International Publishing.
978-3-319-22347-6
1865-1348
10.1007/978-3-319-22348-3_9
The original publication is available at: http://link.springer.com/chapter/10.1007%2F978-3-319-22348-3_9#
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833595078211272704