Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality
Autor(a) principal: | |
---|---|
Data de Publicação: | 2015 |
Outros Autores: | , , , |
Tipo de documento: | Artigo |
Idioma: | eng |
Título da fonte: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Texto Completo: | http://repositorio.inesctec.pt/handle/123456789/5306 http://dx.doi.org/10.1007/s10618-014-0378-6 |
Resumo: | Knowledge discovery on biomedical data can be based on on-line, data-stream analyses, or using retrospective, timestamped, off-line datasets. In both cases, changes in the processes that generate data or in their quality features through time may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of data temporal stability. This work establishes the temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes, and (2) characterizing changes, trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen-Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change in three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, discovering conceptually-related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset. |
id |
RCAP_f86a996e888d17f85f3255b7cb6168cd |
---|---|
oai_identifier_str |
oai:repositorio.inesctec.pt:123456789/5306 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data qualityKnowledge discovery on biomedical data can be based on on-line, data-stream analyses, or using retrospective, timestamped, off-line datasets. In both cases, changes in the processes that generate data or in their quality features through time may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of data temporal stability. This work establishes the temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes, and (2) characterizing changes, trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen-Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change in three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, discovering conceptually-related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset.2018-01-03T10:34:35Z2015-01-01T00:00:00Z2015info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://repositorio.inesctec.pt/handle/123456789/5306http://dx.doi.org/10.1007/s10618-014-0378-6engSaez,CRodrigues,PJoão GamaRobles,MGarcia Gomez,JMinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-10-12T02:22:18Zoai:repositorio.inesctec.pt:123456789/5306Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T18:58:10.991331Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse |
dc.title.none.fl_str_mv |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
title |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
spellingShingle |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality Saez,C |
title_short |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
title_full |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
title_fullStr |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
title_full_unstemmed |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
title_sort |
Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality |
author |
Saez,C |
author_facet |
Saez,C Rodrigues,P João Gama Robles,M Garcia Gomez,JM |
author_role |
author |
author2 |
Rodrigues,P João Gama Robles,M Garcia Gomez,JM |
author2_role |
author author author author |
dc.contributor.author.fl_str_mv |
Saez,C Rodrigues,P João Gama Robles,M Garcia Gomez,JM |
description |
Knowledge discovery on biomedical data can be based on on-line, data-stream analyses, or using retrospective, timestamped, off-line datasets. In both cases, changes in the processes that generate data or in their quality features through time may hinder either the knowledge discovery process or the generalization of past knowledge. These problems can be seen as a lack of data temporal stability. This work establishes the temporal stability as a data quality dimension and proposes new methods for its assessment based on a probabilistic framework. Concretely, methods are proposed for (1) monitoring changes, and (2) characterizing changes, trends and detecting temporal subgroups. First, a probabilistic change detection algorithm is proposed based on the Statistical Process Control of the posterior Beta distribution of the Jensen-Shannon distance, with a memoryless forgetting mechanism. This algorithm (PDF-SPC) classifies the degree of current change in three states: In-Control, Warning, and Out-of-Control. Second, a novel method is proposed to visualize and characterize the temporal changes of data based on the projection of a non-parametric information-geometric statistical manifold of time windows. This projection facilitates the exploration of temporal trends using the proposed IGT-plot and, by means of unsupervised learning methods, discovering conceptually-related temporal subgroups. Methods are evaluated using real and simulated data based on the National Hospital Discharge Survey (NHDS) dataset. |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-01-01T00:00:00Z 2015 2018-01-03T10:34:35Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
format |
article |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://repositorio.inesctec.pt/handle/123456789/5306 http://dx.doi.org/10.1007/s10618-014-0378-6 |
url |
http://repositorio.inesctec.pt/handle/123456789/5306 http://dx.doi.org/10.1007/s10618-014-0378-6 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833597787239874560 |