Bibliographic details
Year of defense: |
2011 |
Main author: |
Pinheiro, João Carlos |
Advisor: |
Not provided by the institution |
Defense committee: |
Not provided by the institution |
Document type: |
Thesis
|
Access type: |
Open access |
Language: |
por |
Defending institution: |
Not provided by the institution
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
http://www.repositorio.ufc.br/handle/riufc/61237
|
Abstract: |
The Web has evolved from a global information space of hypertext documents into the Linked Data network, also known as the Web of Data. RDF, one of the cornerstones of the Semantic Web, has been crucial for storing and publishing Linked Data, which is accessible through SPARQL endpoints using the SPARQL query language. SPARQL makes it possible to answer distributed queries that could not be answered by a single data source or even by Web search engines. However, the difficulty of formulating distributed queries has been an obstacle to taking advantage of these data, because the data sources are autonomous, distributed, and use heterogeneous vocabularies. This scenario confirms the need for data integration mechanisms that make reusing such data simple and efficient. In this context, this work presents a mediator-based framework for integrating Linked Data accessible via SPARQL endpoints, in which the global schema is represented by a domain ontology that provides a shared vocabulary. Each data source, published on the Web according to the Linked Data principles, is described by an application ontology whose vocabulary is restricted to a subset of the domain ontology vocabulary. Within this framework, this work proposes a method for processing distributed SPARQL queries, comprising: a) a query reformulation algorithm that addresses two key issues: restricting the search for data to the data sources that may contribute some intermediate result, without resorting to inference mechanisms for query expansion, and using same-as and URI-links to deal with incomplete information; and b) an execution step that employs algorithms and techniques for reducing the volume of intermediate data, parallel query processing, pull and push models for data delivery, and processing that effectively combines adaptive join algorithms. 
These techniques are essential in the highly dynamic Linked Data environment, which has two characteristics that challenge distributed SPARQL query evaluation: large scale and unpredictable data delivery times. The optimization strategy was evaluated through several experiments, and the results provide empirical evidence of its scalability and performance gains for data integration. |