Bibliographic details
Year of defense: 2016
Main author: Veiga, Allan Koch
Advisor: Not provided by the institution
Examining committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.teses.usp.br/teses/disponiveis/3/3141/tde-17032017-085248/
Abstract:
The increasing availability of digitized biodiversity data worldwide, provided by a growing number of sources, and the expanding use of those data for a variety of purposes have raised concerns about the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports and decision making. A consistent approach to assessing and managing DQ is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of the idiosyncrasies inherent to the concept of quality. DQ assessment and management cannot be suitably carried out unless the meaning of quality has been clearly established from the data user's standpoint. This thesis presents a formal conceptual framework to help the Biodiversity Informatics (BI) community consistently describe the meaning of data "fitness for use". Principles behind data fitness for use are used to establish a formal and common ground for the collaborative definition of DQ needs, solutions and reports useful for DQ assessment and management. Based on a study of the DQ domain and its contextualization in the BI domain, carried out through an iterative process of discussions with experts in DQ and BI, a comprehensive framework was designed and formalized. The framework defines eight fundamental concepts and 21 derived concepts, organized into three classes: DQ Needs, DQ Solutions and DQ Report. The concepts of each class describe, respectively, the meaning of DQ in a given context, the methods and tools that can serve as solutions for meeting DQ needs, and reports that present the current status of quality of a data resource. The framework was formalized using concept map notation and set theory notation. To validate the framework, we present a proof of concept based on a case study conducted at the Museum of Comparative Zoology of Harvard University.
The FP-Akka Kurator and BDQ Toolkit tools were used in the case study to perform DQ measures, validations and improvements on a dataset of the Arizona State University Hasbrouck Insect Collection. The results illustrate how the framework enables data users to assess and manage the DQ of datasets and single records using quality control and quality assurance approaches. The proof of concept also showed that the framework is adequately formalized, flexible, and sufficiently complete for defining DQ needs, solutions and reports in the BI domain. The framework is able to formalize human thinking into well-defined components, making it possible to share and reuse definitions of DQ in different scenarios, to describe and find DQ tools and services, and to communicate the current status of data quality among stakeholders in a standardized format. In addition, the framework supports members of that community in joining efforts to collaboratively gather and develop the components needed for DQ assessment and management in different contexts. The framework is also the foundation of a Task Group on Data Quality, under the auspices of Biodiversity Information Standards (TDWG) and the Global Biodiversity Information Facility (GBIF), and is initially being used to help collect users' data quality needs in agrobiodiversity and species distribution modeling. In future work, we plan to use the framework to engage the BI community to formalize and share DQ profiles related to a number of other data uses, and to recommend methods, guidelines, protocols, metadata schemas and controlled vocabularies for supporting data fitness for use assessment and management in distributed system and data environments. We also plan to build a platform based on the framework to serve as a common backbone for registering and retrieving DQ concepts, such as DQ profiles, methods, tools and reports.
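As an illustration only, the relationship among the three classes of concepts (a DQ need is met by a DQ solution, and applying solutions to data yields a DQ report) can be sketched in Python. This is a minimal sketch under simplifying assumptions. The class names (DQNeed, DQSolution, DQAssertion), the functions build_report and fitness_for_use, the single "eventDate must not be empty" check (eventDate is a Darwin Core term chosen only as an example), and the compliance-ratio summary are all hypothetical. None of them reproduce the thesis's formal definitions of its eight fundamental and 21 derived concepts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, simplified stand-ins for the DQ Needs, DQ Solutions and
# DQ Report classes; names and fields are illustrative assumptions only.
Record = Dict[str, str]


@dataclass(frozen=True)
class DQNeed:
    """A quality requirement for a given use (hypothetical structure)."""
    use_case: str
    information_element: str  # the record field the need refers to
    description: str


@dataclass(frozen=True)
class DQSolution:
    """A method that tests one record against one DQ need."""
    need: DQNeed
    test: Callable[[Record], bool]


@dataclass(frozen=True)
class DQAssertion:
    """One entry of a DQ report: the outcome of a solution on a record."""
    record_id: str
    need: DQNeed
    compliant: bool


def build_report(records: Dict[str, Record],
                 solutions: List[DQSolution]) -> List[DQAssertion]:
    """Apply every solution to every record and collect the outcomes."""
    return [
        DQAssertion(record_id, s.need, s.test(record))
        for record_id, record in records.items()
        for s in solutions
    ]


def fitness_for_use(report: List[DQAssertion]) -> float:
    """Fraction of compliant assertions (a hypothetical summary measure)."""
    if not report:
        return 1.0
    return sum(a.compliant for a in report) / len(report)


# Usage: one hypothetical need, one solution, two toy records.
need = DQNeed("species distribution modeling", "eventDate",
              "eventDate must not be empty")
solution = DQSolution(need, lambda r: bool(r.get("eventDate", "").strip()))
records = {"r1": {"eventDate": "1998-06-12"}, "r2": {"eventDate": ""}}
report = build_report(records, [solution])
print(fitness_for_use(report))  # 0.5
```

Keeping needs, solutions and report entries as separate immutable types mirrors, in a very reduced way, the separation the abstract describes: the same need can be reused with different solutions, and the report records outcomes without changing the data.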