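A non-intrusive and reactive architecture for performing the ETL process in real time in data warehousing environments (original title: Uma arquitetura não intrusiva e reativa para realizar o processo ETL em tempo real em ambientes de data warehousing)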

Bibliographic details
Year of defense: 2021
Main author: Vilela, Flávio de Assis
Advisor: Ciferri, Ricardo Rodrigues
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese
Defending institution: Universidade Federal de São Carlos
São Carlos campus
Graduate program: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
ETL
Keywords in English:
CNPq knowledge area:
Access link: https://repositorio.ufscar.br/handle/ufscar/15889
Abstract: There is great interest in obtaining data that support business decision making. These data reside in data sources in the operational environment, which are autonomous, heterogeneous, and distributed. They are extracted through the Extract, Transform, and Load (ETL) process and stored in the informational environment in a homogeneous, integrated, and dimensional database called a data warehouse. The ETL process traditionally runs at predefined intervals, such as daily, weekly, or monthly, or according to the organization's data update rules. However, some applications need operational data as quickly as possible, or immediately after the data become available in the data sources. Examples are medical systems, highway control systems, and digital farming systems. The traditional ETL process and currently available techniques cannot make data available for decision making in real time while ensuring availability, low elapsed time, and scalability. This work presents an innovative, non-intrusive, and reactive architecture, called Data Magnet, that performs the ETL process in real time in data warehousing environments. Non-intrusive means that the solution does not need to search for data in the operational environment, so it neither connects to the data sources nor deals directly with the heterogeneity of the data. Reactive means that the solution reacts to events in the operational environment and performs an automatic action in order to guarantee real-time requirements. Two experimental tests were performed: the first in a real dairy farming environment, and the second in a synthetic environment, in order to assess Data Magnet with a high volume of data. In the experiments, Data Magnet achieved good performance, with low elapsed time, guaranteed availability, and good scalability as the data volume increased. Data Magnet also produced a large performance gain on the average metric compared with the traditional trigger technique commonly used in real-time ETL processes.
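
The non-intrusive and reactive properties described in the abstract can be illustrated with a minimal sketch. It is not the Data Magnet implementation: every name here (Event, EventBus, Warehouse, transform, make_etl_handler, and the sample milking record) is hypothetical and chosen only for illustration. The sketch assumes that operational sources push change events to a shared bus, so the ETL side never opens a connection to a source, and that a subscribed handler transforms and loads each event as soon as it arrives.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are hypothetical; they illustrate the idea only and are
# not taken from the thesis.


@dataclass
class Event:
    source: str    # which operational source emitted the change
    payload: dict  # the changed record, as pushed by the source


@dataclass
class EventBus:
    subscribers: list = field(default_factory=list)

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: Event) -> None:
        # Non-intrusive: sources push events; the ETL side never queries them.
        for handler in self.subscribers:
            handler(event)


@dataclass
class Warehouse:
    fact_rows: list = field(default_factory=list)

    def load(self, row: dict) -> None:
        self.fact_rows.append(row)


def transform(event: Event) -> dict:
    # Lowercase the keys and tag provenance (a stand-in for real cleaning rules).
    row = {key.lower(): value for key, value in event.payload.items()}
    row["source"] = event.source
    return row


def make_etl_handler(warehouse: Warehouse) -> Callable[[Event], None]:
    # Reactive: the handler runs automatically whenever an event is published.
    def handle(event: Event) -> None:
        warehouse.load(transform(event))
    return handle


bus = EventBus()
warehouse = Warehouse()
bus.subscribe(make_etl_handler(warehouse))

# A hypothetical dairy-farm source pushes a new milking record.
bus.publish(Event(source="milking_station", payload={"CowId": 17, "Liters": 12.5}))
print(warehouse.fact_rows)
```

The synchronous in-process bus keeps the sketch short. A real deployment would more plausibly place a message broker between the sources and the ETL handler, which is also an assumption rather than something the abstract states.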