Resumo: |
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) catalog, provides a set of tools that allows data access for astronomers and scientific education. One of the available interfaces allows users to enter ad-hoc SQL statements to query the catalog, and has logged over 280 million queries since 2001. To assess and investigate usage behavior, log analyses were performed after the 5$^{th}$ and 10$^{th}$ year of the portal being in production. Such analyses, however, focused on the HTTP access, and just simple information for the database usage. This work aims to apply text mining techniques over the SQL logs to define a methodology to parse, clean and tokenize statements into an intermediate numerical representation for data mining and knowledge discovery, which can provide deeper analysis over SQL usage, and also has a number of foreseen applications in database optimization and improving user experience. |
---|