SAVIME: enabling declarative array processing in memory
Year of defense: | 2020 |
---|---|
Main author: | |
Advisor: | |
Examination committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | eng |
Defending institution: | Laboratório Nacional de Computação Científica; Coordenação de Pós-Graduação e Aperfeiçoamento (COPGA); Brazil; LNCC; Programa de Pós-Graduação em Modelagem Computacional |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords (Portuguese): | |
Access link: | https://tede.lncc.br/handle/tede/328 |
Abstract: | Current limitations in array database management systems (DBMSs) prevent their adoption in scientific applications, even though arrays are pervasive in scientific datasets. Most traditional DBMSs impose a large performance penalty during data ingestion, since they require data to be converted to the DBMS's internal format. In addition, once the data is stored in the DBMS's format, it is hidden from users unless they know in detail how the DBMS stores it. This is acceptable for many applications that access data only through the DBMS's query language, but it is inconvenient for domain-specific applications whose complex analytical code is unlikely to be expressed and executed efficiently in such languages, and which therefore need to access files directly. Consequently, users turn to in-situ analysis libraries, in-transit I/O interfaces, and files in scientific data formats to manage their data. However, these alternatives may not offer the benefits a DBMS provides, such as richer data model semantics, declarative analytical query languages, and isolation between data and applications. In this work, we therefore propose a novel array data model, named TARS, and a database management system, named Savime, that implements it. We show how Savime, through its flexible data model, supports declarative scientific data analysis and visualization without imposing costly data rearrangements and format conversions. We also compare Savime with a state-of-the-art array DBMS. The results show that Savime is up to 20 times faster than that DBMS for data ingestion, while delivering similar performance for basic array operations. We believe that Savime can substantially help scientists develop in silico scientific data analyses. |