Uma análise exploratória e prática do uso do ETL em portais de transparência [An exploratory and practical analysis of the use of ETL in transparency portals]

Bibliographic details
Main author: Poletti, Marcus Vinicius Santana
Publication date: 2023
Document type: Master's dissertation
Language: Portuguese (por)
Source title: Repositório Institucional da UFS
Full text: https://ri.ufs.br/jspui/handle/riufs/19473
Abstract: Context: Open data portals are built on ETL (Extract, Transform and Load) processes, which improve data quality and interoperability. This makes ETL a critical subsystem of these applications and a worthwhile target for evaluative research. Objective: To analyze publications on the use of ETL in transparency portals and characterize them in terms of scenarios, impacts, empirical methods and general bibliometric data; then, building on this characterization, to develop and evaluate an ETL module for a transparency portal, comparing it qualitatively with modules developed in two ETL tools widely used in industry. In addition, the efficiency of the load procedures produced by the three evaluated treatments was analyzed. Method: A systematic mapping of the literature was carried out using the PICO (Population, Intervention, Comparison and Outcome) strategy. In addition, action research was conducted to build the ETL procedures for the Economic Yearbook of Sergipe (Anuário Econômico de Sergipe). The treatments evaluated during development were (1) Pentaho Data Integration (Kettle), open source, and (2) SQL Server Integration Services (SSIS), closed source, compared against (3) ETL code written in Python. Results: Of 204 publications retrieved, 25 were selected. Of these, 40% report as the main impact for the portals the support for building loads through a graphical interface, followed by connectivity between heterogeneous databases (27%) and load monitoring capability (22%). Only 8% and 3% of the works, respectively, discussed the impacts of actual load automation and of load quality control. In the action research, Kettle stood out in usability, in development efficiency through its graphical interface and in learning curve, followed by Python and then SSIS. In terms of efficiency, load-time measurements showed Python performing best, followed by Kettle and SSIS. Conclusion: The study showed that the use of ETL in transparency portals still lacks comparative and feasibility studies. One open challenge is the scarcity of replication studies that consolidate and validate published work, as evidenced by the shortage of controlled experiments in the area. Analyses of load quality control were also identified as an important gap. Finally, once the contextual priorities of a transparency portal are defined (for example, load efficiency or development efficiency), a systematic evaluation of the available solutions, such as the one proposed in this dissertation, can guide trade-off decisions and the selection of the most cost-effective option.
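The abstract names the three ETL stages and a load-time comparison but does not reproduce the dissertation's Python code. As a minimal sketch only, assuming a small semicolon-separated CSV extract and an SQLite target (the sample rows, the column names `municipio`, `ano`, `pib_mil_reais`, the table `pib` and all function names are hypothetical, not taken from the dissertation), the stages and a wall-clock load timing could look like this. The closing row-count reconciliation is one simple form of the load quality control that the abstract reports as under-studied.

```python
import csv
import io
import sqlite3
import time

# Hypothetical raw extract: illustrative values only, NOT data from the
# Anuário Econômico de Sergipe. The last row has a missing value on purpose.
RAW_CSV = """municipio;ano;pib_mil_reais
Aracaju;2020;16.123,45
Lagarto;2020;1.987,10
Itabaiana;2020;
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse semicolon-separated CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, int, float]]:
    """Transform: normalise types and drop rows with a missing value."""
    clean = []
    for row in rows:
        value = (row["pib_mil_reais"] or "").strip()
        if not value:
            continue  # simple quality rule: skip incomplete records
        # Brazilian number format: '.' groups thousands, ',' is the decimal mark
        number = float(value.replace(".", "").replace(",", "."))
        clean.append((row["municipio"].strip(), int(row["ano"]), number))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple[str, int, float]]) -> int:
    """Load: insert into the target table and return the stored row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pib (municipio TEXT, ano INTEGER, valor REAL)"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO pib VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM pib").fetchone()[0]


def run_pipeline(text: str) -> tuple[int, float]:
    """Run extract -> transform -> load and time the whole run in seconds."""
    conn = sqlite3.connect(":memory:")
    try:
        start = time.perf_counter()
        rows = transform(extract(text))
        loaded = load(conn, rows)
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    # Reconciliation check: every transformed row must reach the target table
    if loaded != len(rows):
        raise RuntimeError(f"expected {len(rows)} rows, found {loaded}")
    return loaded, elapsed
```

Calling `run_pipeline(RAW_CSV)` stores 2 of the 3 sample rows, because the Itabaiana row has no value and is dropped by the transform rule. A single timing on toy data says nothing about the relative performance reported in the abstract; a fair comparison would time the same data volume, over repeated runs, for each treatment.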
id UFS-2_dbe6b9d3a2886a1e2a1f3bd25fb7c27f
oai_identifier_str oai:oai:ri.ufs.br:repo_01:riufs/19473
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Full-text file: MARCUS_VINICIUS_SANTANA_POLETTI.pdf (application/pdf, 1,298,583 bytes)
Repository OAI endpoint: https://ri.ufs.br/oai/request
dc.title.pt_BR.fl_str_mv Uma análise exploratória e prática do uso do ETL em portais de transparência
dc.contributor.author.fl_str_mv Poletti, Marcus Vinicius Santana
dc.contributor.advisor1.fl_str_mv Rodrigues Júnior, Methanias Colaço
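dc.subject.por.fl_str_mv Computação [Computing]
Armazenamento de dados [Data storage]
Banco de dados [Databases]
Portais da transparência [Transparency portals]
Extração, Transformação e Carga (ETL) [Extract, Transform and Load]
Eficiência [Efficiency]
Usabilidade [Usability]
Qualidade [Quality]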
dc.subject.eng.fl_str_mv Transparency portals
Extract, Transform and Load (ETL)
Efficiency
Usability
Quality
dc.subject.cnpq.fl_str_mv CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO [Exact and Earth Sciences :: Computer Science]
publishDate 2023
dc.date.issued.fl_str_mv 2023-08-30
dc.date.accessioned.fl_str_mv 2024-07-05T19:11:57Z
dc.date.available.fl_str_mv 2024-07-05T19:11:57Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
dc.identifier.citation.fl_str_mv POLETTI, Marcus Vinicius Santana. Uma análise exploratória e prática do uso do ETL em portais de transparência. 2023. 60 f. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de Sergipe, São Cristóvão, 2023.
dc.identifier.uri.fl_str_mv https://ri.ufs.br/jspui/handle/riufs/19473
dc.language.iso.fl_str_mv por
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.publisher.program.fl_str_mv Pós-Graduação em Ciência da Computação
dc.publisher.initials.fl_str_mv Universidade Federal de Sergipe (UFS)
dc.source.none.fl_str_mv reponame:Repositório Institucional da UFS
instname:Universidade Federal de Sergipe (UFS)
instacron:UFS
bitstream.url.fl_str_mv https://ri.ufs.br/jspui/bitstream/riufs/19473/1/license.txt
https://ri.ufs.br/jspui/bitstream/riufs/19473/2/MARCUS_VINICIUS_SANTANA_POLETTI.pdf
bitstream.checksum.fl_str_mv 098cbbf65c2c15e1fb2e49c5d306a44c
b07cf18c1a9ca63d94525621cd93a7d4
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
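The record publishes an MD5 checksum for each bitstream, so a downloaded copy of the full text can be checked against it. A minimal sketch, assuming the PDF has already been saved locally (the helper names `md5_of_file` and `matches_record` are hypothetical); it reads the file in chunks so large files do not need to fit in memory.

```python
import hashlib
from pathlib import Path

# Checksum listed in this record for MARCUS_VINICIUS_SANTANA_POLETTI.pdf
EXPECTED_PDF_MD5 = "b07cf18c1a9ca63d94525621cd93a7d4"


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: Path, expected: str = EXPECTED_PDF_MD5) -> bool:
    """True if the local file has the checksum published in the record."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_record(Path("MARCUS_VINICIUS_SANTANA_POLETTI.pdf"))` should return `True` for an intact download. MD5 is adequate for detecting accidental corruption in transfer, but it is not a defence against deliberate tampering.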
repository.name.fl_str_mv Repositório Institucional da UFS - Universidade Federal de Sergipe (UFS)
repository.mail.fl_str_mv repositorio@academico.ufs.br