Bibliographic details
Year of defense: |
2023 |
Main author: |
Poletti, Marcus Vinicius Santana |
Advisor: |
Rodrigues Júnior, Methanias Colaço |
Defense committee: |
Not provided by the institution |
Document type: |
Dissertation
|
Access type: |
Open access |
Language: |
por |
Defending institution: |
Not provided by the institution
|
Graduate program: |
Graduate Program in Computer Science
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Keywords in English: |
|
CNPq knowledge area: |
|
Access link: |
https://ri.ufs.br/jspui/handle/riufs/19473
|
Abstract: |
Context: Open data portals are built on ETL (Extract, Transform and Load) processes, which increase data quality and interoperability. This makes ETL a critical subsystem of these applications and a subject of evaluative research aimed at improving it. Objective: To analyze publications on the use of ETL in transparency portals in order to characterize them in terms of their scenarios, impacts, empirical methods and general bibliometric data and, based on this characterization, to develop and evaluate an ETL module for a transparency portal, comparing it qualitatively with modules developed in two ETL tools widely used in the market. Additionally, the efficiency of the loading procedures produced by the three evaluated treatments was analyzed. Method: A systematic mapping of the literature was carried out using the PICO (Population, Intervention, Comparison and Outcome) strategy. In addition, action research was conducted to build ETL procedures for the Economic Yearbook of Sergipe. The tools evaluated during development were (1) Pentaho Data Integration (Kettle), open source, and (2) SQL Server Integration Services (SSIS), closed source, against (3) ETL code written in Python. Results: Of 204 publications surveyed, 25 works were selected. Of these, 40% present, as the main impact for the portals, support for building loads through a graphical interface, followed by connectivity between heterogeneous databases (27%) and load monitoring capability (22%). Only 8% and 3% of the works discussed the impacts of actual load automation and of load quality control, respectively. In the action research, Kettle stood out in usability and in development efficiency through its graphical interface, as well as in learning curve, followed by the Python programming language and the SSIS tool. Regarding efficiency, load time measurements showed the best performance for Python, followed by Kettle and SSIS. Conclusion: The work showed that the use of ETL in transparency portals still lacks comparative and feasibility studies. One existing challenge is the scarcity of studies that carry out replications to consolidate and validate already published work, as evidenced by the shortage of controlled experiments in the area. In addition, the analysis of load quality control was identified as an important gap. Finally, once the contextual priorities of a transparency portal are defined, such as load efficiency or development efficiency, a systematic evaluation of the available solutions, such as the one proposed in this dissertation, guides trade-off decisions and the selection of the best cost-benefit option. |