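Classificação de data streams utilizando árvore de decisão estatística e a teoria dos fractais na análise evolutiva dos dados [Data stream classification using a statistical decision tree and fractal theory in the evolutionary analysis of data]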

Cazzolato, Mirela Teixeira

Bibliographic details
Year of defense:	2014
Main author:	Cazzolato, Mirela Teixeira
Advisor:	Ribeiro, Marcela Xavier
Examination committee:	Not provided by the institution
Document type:	Master's dissertation
Access type:	Open access
Language:	por
Degree-granting institution:	Universidade Federal de São Carlos
Graduate program:	Programa de Pós-Graduação em Ciência da Computação - PPGCC
Department:	Not provided by the institution
Country:	BR
Keywords in Portuguese:	Ciência da computação; Banco de dados; Fluxo de dados; Classificação; Data mining (Mineração de dados); Fractais; Árvore de decisão; Algoritmo Incremental
Keywords in English:	Data streams; Classification; Data mining; Decision tree; Incremental algorithm; StARMiner Tree; FDDM; Fractal theory
CNPq knowledge area:	CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO (Exact and Earth Sciences::Computer Science)
Access link:	https://repositorio.ufscar.br/handle/20.500.14289/565
Abstract:	A data stream is generated quickly, continuously, in order, and in large volumes. Processing data streams requires attention to, among other factors, limited memory, the need for real-time processing, the accuracy of the results, and concept drift (a change in the concept underlying the data being analyzed). The decision tree is a popular classifier representation that is intuitive, fast to build, and generally highly accurate. Incremental decision tree techniques in the literature generally incur high computational costs to build and update the model, especially in the calculation used to split decision nodes. Existing methods behave conservatively when only limited amounts of data are available, and their results tend to improve as the number of examples grows. In addition, many real-world applications generate noisy data, and existing techniques have low tolerance to noise. This work aims to develop decision tree methods for data streams that address these deficiencies in the current state of the art. A further objective is to develop a technique for detecting concept drift using fractal theory. This functionality should indicate when the model needs to be corrected, so that it adequately describes the most recent events. To achieve these objectives, three decision tree algorithms were developed: StARMiner Tree, Automatic StARMiner Tree, and Information Gain StARMiner Tree. These algorithms split nodes using a statistical heuristic that is fast and does not depend on the number of examples. In the experiments, the algorithms achieved high accuracy and were tolerant of noise when classifying noisy data. Finally, a drift detection method based on fractal theory was proposed to detect changes in the data distribution. The method, called Fractal Detection Method, detects significant changes in the data distribution and triggers a model update when the model no longer describes the data (that is, when it has become obsolete). The method achieved good results in classifying data containing concept drift, proving suitable for the evolutionary analysis of data.
