Fuzzy approach for classification and novelty detection in data streams
Year of defense: | 2022 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | eng |
Defending institution: | Universidade Federal de São Carlos, Câmpus São Carlos |
Graduate program: | Programa de Pós-Graduação em Ciência da Computação - PPGCC |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Keywords in English: | |
CNPq knowledge area: | |
Access link: | https://repositorio.ufscar.br/handle/20.500.14289/20010 |
Abstract: | Learning in data streams (DS) is a research area that seeks to extract knowledge from a large amount of continuously generated data in a short period of time. Novelty detection (ND) is the task of identifying the emergence of new concepts and changes in known concepts. The true labels of the instances can be used so that algorithms adapt to concept evolution and concept drift. The time between the classification of an instance and the arrival of its true label is called latency. Most approaches assume that the true labels will never be available (extreme latency). Others are more optimistic and assume that the true label will be available shortly after the instance has been classified. A third option is to assume that the true labels will be available after a certain time (intermediate latency), which is applicable in most real-world scenarios. Concepts from fuzzy set theory make it possible to adapt learning to possible inaccuracies in the data. However, few approaches both use fuzzy set theory and consider intermediate latency when obtaining the labels. Therefore, this work proposes a method for classification and multiclass ND in DS for intermediate and extreme latency scenarios, based on the ECSMiner and PFuzzND algorithms. The results show that the proposed algorithm achieved good accuracy in classification and in the detection of multiclass novelties, classifying outliers that approaches based on crisp clustering were not able to classify. In addition, improvements to the algorithm's initialization parameters reduce the complexity of its use while maintaining good results. |
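
The contrast the abstract draws between fuzzy and crisp clustering can be illustrated with a minimal sketch. It is not the method proposed in the dissertation, nor ECSMiner or PFuzzND. It only assumes the standard fuzzy c-means membership formula, and the names `fuzzy_memberships` and `crisp_assignment`, the fuzzifier `m`, and the example centroids are all hypothetical choices made for this illustration.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Return fuzzy c-means style membership degrees of `point`, one per centroid.

    `m` > 1 is the fuzzifier (an assumed illustrative default of 2.0).
    The returned degrees sum to 1.
    """
    dists = [math.dist(point, c) for c in centroids]
    zeros = [d == 0.0 for d in dists]
    # A point lying exactly on one or more centroids is shared among them.
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]


def crisp_assignment(point, centroids):
    """Return the index of the nearest centroid (hard, all-or-nothing label)."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))


# Hypothetical cluster centres for two known classes.
centroids = [(0.0, 0.0), (4.0, 0.0)]

near_first = fuzzy_memberships((1.0, 0.0), centroids)   # mostly class 0
in_between = fuzzy_memberships((2.0, 0.0), centroids)   # equally shared
hard_label = crisp_assignment((2.0, 0.0), centroids)    # forced to pick one
```

A crisp assignment forces the midpoint `(2.0, 0.0)` into a single cluster, whereas the membership degrees record that it belongs equally to both. That graded information is the kind a fuzzy approach can use when handling imprecise data.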