Avaliação de um sistema escalável de classificação CNAE implementado em cloud computing
Ano de defesa: | 2011 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso aberto |
Idioma: | por |
Instituição de defesa: |
Universidade Federal do Espírito Santo
BR Mestrado em Informática Centro Tecnológico UFES Programa de Pós-Graduação em Informática |
Programa de Pós-Graduação: |
Não Informado pela instituição
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Link de acesso: | http://repositorio.ufes.br/handle/10/6356 |
Resumo: | In problems in automatic text classification with a large number of labels, training databases are large, therefore the classification time can become prohibitive for online rating systems. Thus, our motivation for this work came from the need of the Federal Government to implement a Cadastro Sincronizado Nacional (CSN) of companies, where the Classificação Nacional de Atividades Econômicas (CNAE) would compose the system. In this classification task one or more CNAE-Subclasses codes are associated to the description of the economic activities of companies. It is worth noticing that in 2009, the task of assigning codes or revise the CNAE was done in the country about 2 million times. This way, we investigated the use ofWeb servers based on Cloud Computing on its scalability and low cost of development and operation. Due to the ease of use and free quotas, the Cloud Computing server chosen for this application development was Google App Engine. Thus, we designed, implemented and hosted a system of classification of such texts on the server. However, Google App Engine service charges for exceeding the amount of free quota (renewable every day), whereas the lower the complexity of the processing system, the lower the financial cost of implementation. Aiming this, an optimization was performed on the storage system of classifiers, taking advantage of the features of the text base. We successfully reduced the computational cost of the system and, in consequence, it was estimated that for the current demand of requests the CNAE annual financial cost would be $ 2,000. This is a small amount when it is compared to the cost of infrastructure, maintenance and power that would take to perform a similar service to a traditional Web server. |