SAVIME: enabling declarative array processing in memory

Bibliographic details
Year of defense: 2020
Lead author: Lustosa, Hermano Lourenço Souza
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Laboratório Nacional de Computação Científica
Coordenação de Pós-Graduação e Aperfeiçoamento (COPGA)
Brazil
LNCC
Programa de Pós-Graduação em Modelagem Computacional
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://tede.lncc.br/handle/tede/328
Abstract: Current limitations in array database management systems prevent their adoption in scientific applications, even though arrays are pervasive in scientific datasets. Most traditional DBMSs impose a huge performance penalty during data ingestion, since they require data to be converted to the DBMS’s internal format. In addition, when the data is kept in the DBMS’s format, it is concealed from users unless they know in detail how the DBMS stores it. This is fine for many applications that access data only through the DBMS’s query language, but it is inconvenient for domain-specific applications whose complex analytical code is unlikely to be executed efficiently by such languages, requiring a more involved approach in which files are accessed directly. As a consequence, users adopt in-situ analysis libraries, in-transit I/O interfaces, and scientific data file formats to manage their data. However, these alternatives might not offer the same benefits a DBMS does, such as richer data model semantics, declarative analytical query languages, and isolation between data and applications. Therefore, in this work, we propose a novel array data model named TARS and a database management system named Savime that implements this data model. We show how Savime, by using a flexible data model, can foster declarative scientific data analysis and visualization without imposing costly data rearrangements and format conversions. We also compare Savime with a state-of-the-art array DBMS. The results show that Savime is up to 20 times faster than the other array DBMS for data ingestion, while providing similar performance for the execution of basic array operations. We believe that Savime can substantially empower scientists in developing scientific data analysis in silico.