Uma abordagem semiautomática dirigida a métricas para avaliação da qualidade de datasets conectados (A semiautomatic, metric-driven approach for assessing the quality of linked datasets)
Year of defense: | 2017 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal de Alagoas, Brazil, Programa de Pós-Graduação em Informática, UFAL |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://www.repositorio.ufal.br/handle/riufal/1779 |
Abstract: | Linked Data has led to a large amount of information on the Web being represented in structured formats and linked to other information. The main purpose of linked data initiatives is to create knowledge by linking scattered and relational data. The current Linked Open Data Cloud (LOD Cloud) consists of more than 50 billion facts represented as RDF triples. This information belongs to a large number of datasets covering various domains, such as science, geography, and government. However, recent studies show that most of these datasets suffer from data quality problems, such as representational problems, inconsistency, and interoperability issues. These problems make the data difficult to interpret and degrade the quality of results obtained from it. A challenge in the area is therefore to analyze the quality of linked datasets and make it explicit. This work aims to create a computational solution, based on quality dimensions and best practices for publishing, that performs semiautomatic verification and validation of the quality of linked datasets. To this end, quality dimensions were analyzed and correlated with the data quality best practices contained in the documents “Data on the Web Best Practices” and “Best Practices for Publishing Linked Data”. To validate the proposal, an experiment was carried out comparing the semiautomatic computational solution proposed in this dissertation with the manual approach to evaluating the quality of linked datasets, in order to determine whether the solution makes that evaluation more efficient. The semiautomatic solution is expected to be an efficient way of evaluating the quality of a linked dataset, reducing both the evaluation time and the user's workload. The contribution of this dissertation is an evaluation alternative focused on the W3C best practices and based on the quality dimensions existing in the literature. |