Text mining applied to SQL queries: a case study for SDSS SkyServer

Bibliographic details
Year of defense: 2015
Main author: Vitor Hirota Makiyama
Advisor: Rafael Duarte Coelho dos Santos
Examining committee: Karine Reis Ferreira Gomes, Gilberto Ribeiro de Queiroz, Daniela Leal Musa
Document type: Master's thesis
Access type: Open access
Language: eng
Defending institution: Instituto Nacional de Pesquisas Espaciais (INPE)
Graduate program: Programa de Pós-Graduação do INPE em Computação Aplicada
Department: Not provided by the institution
Country: BR
Access link: http://urlib.net/sid.inpe.br/mtc-m21b/2015/08.31.17.43
Abstract: SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) catalog, provides a set of tools that give astronomers and science educators access to the data. One of the available interfaces lets users enter ad hoc SQL statements to query the catalog, and it has logged over 280 million queries since 2001. To assess and investigate usage behavior, log analyses were performed after the portal's 5th and 10th years in production. Those analyses, however, focused on HTTP access and reported only simple information about database usage. This work applies text mining techniques to the SQL logs to define a methodology that parses, cleans, and tokenizes statements into an intermediate numerical representation for data mining and knowledge discovery. That representation can support deeper analysis of SQL usage and has a number of foreseen applications in database optimization and in improving user experience.
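To make the parse, clean, and tokenize steps concrete, the following is a minimal Python sketch of how SQL statements might be turned into token-count vectors. It is not the pipeline used in the thesis: the token pattern, the `<num>` and `<str>` placeholders, the function names (`clean`, `tokenize`, `vectorize`), and the bag-of-tokens representation are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical token pattern (an assumption, not the thesis's grammar).
TOKEN_RE = re.compile(
    r"""
    '(?:[^']|'')*'          # single-quoted string literal
    | \d+(?:\.\d+)?         # integer or decimal number
    | [A-Za-z_][\w.]*       # keyword or (possibly dotted) identifier
    | <> | <= | >= | !=     # two-character comparison operators
    | [=<>*(),+\-/]         # single-character operators and punctuation
    """,
    re.VERBOSE,
)


def clean(sql):
    """Strip comments, collapse whitespace, and lowercase a statement.

    Simplification: comment markers inside string literals are not protected.
    """
    sql = re.sub(r"--[^\n]*", " ", sql)                    # line comments
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)  # block comments
    return re.sub(r"\s+", " ", sql).strip().lower()


def tokenize(sql):
    """Split a cleaned statement into tokens, replacing literals by placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(clean(sql)):
        if tok[0] == "'":
            tokens.append("<str>")   # any string literal
        elif tok[0].isdigit():
            tokens.append("<num>")   # any numeric literal
        else:
            tokens.append(tok)
    return tokens


def vectorize(queries):
    """Return (vocabulary, rows), where each row holds per-token counts."""
    counts = [Counter(tokenize(q)) for q in queries]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


if __name__ == "__main__":
    # Illustrative queries only; they are not taken from the SkyServer logs.
    logs = [
        "SELECT ra, dec FROM PhotoObj WHERE ra > 180.5 -- bright region",
        "select ra,dec from photoobj where ra > 10",
    ]
    vocab, rows = vectorize(logs)
    print(vocab)
    for row in rows:
        print(row)
```

Replacing literals with placeholders means that queries differing only in their constants map to the same vector, which is one plausible way to group statements generated from a common template. A real pipeline would likely need a proper SQL parser to handle nested queries, quoted identifiers, and dialect-specific syntax.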