Active learning for clustering-based data stream classifiers
Year of defense: | 2021 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Dissertation |
Access type: | Open access |
Language: | eng |
Defending institution: |
Universidade Federal de Uberlândia
Brazil Programa de Pós-graduação em Ciência da Computação |
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: | |
Access link: | https://repositorio.ufu.br/handle/123456789/34035 http://doi.org/10.14393/ufu.di.2021.673 |
Abstract: | The update process of clustering-based data stream classifiers generates clusters from partially or fully unlabeled data instances. Each cluster is then categorized as either the extension of a known class or the emergence of a new one, summarized, and added to the classification model. Given the cost of label acquisition, clustering-based strategies have an advantage over exclusively supervised approaches: they can use unlabeled data to update the classification model. However, how much unlabeled data reveals about the class distribution depends on assumptions about how the feature distribution interacts with the class distribution. As a result, the update process of clustering-based data stream classifiers is prone to failure when this interaction changes unexpectedly because of the stream's non-stationary nature, leading to class inference errors and, consequently, to miscategorized clusters that compromise the consistency of the classification model. To address this problem, we propose an active learning strategy that selects the clusters whose categorization is most uncertain and then, for each selected cluster, queries the labels of the instances that are most informative with respect to the cluster's inner distribution. By dividing the query responsibility between two strategies, one at the cluster level and the other at the instance level, the approach makes efficient and effective use of label resources by acquiring labels only for the clusters most likely to need them. To test the proposed strategy, we applied it to two clustering-based data stream classifiers from the literature: MINAS and ECHO. In the results, the active learning strategy recovered a significant number of cluster miscategorizations at the cost of a few additional label acquisitions. |
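
The two-level query idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual uncertainty or informativeness measures, so everything concrete here is an assumption made for illustration: cluster uncertainty is taken as the ratio between the distances from the cluster centroid to its nearest and second-nearest known-class centroids, and instance informativeness is taken as closeness to the cluster centroid. The function names (`cluster_uncertainty`, `select_queries`) and parameters (`cluster_budget`, `labels_per_cluster`) are hypothetical and are not taken from the dissertation, MINAS, or ECHO.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Arithmetic mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def cluster_uncertainty(center: Point, class_centroids: Dict[str, Point]) -> float:
    """Assumed cluster-level score in [0, 1].

    Ratio of the distance to the nearest known-class centroid over the
    distance to the second nearest. Values near 1 mean the cluster sits
    between two classes (ambiguous); values near 0 mean one class clearly wins.
    """
    dists = sorted(math.dist(center, c) for c in class_centroids.values())
    if len(dists) < 2:
        return 0.0  # a single known class leaves nothing to confuse it with
    if dists[1] == 0.0:
        return 1.0  # both centroids coincide with the cluster center
    return dists[0] / dists[1]


def select_queries(
    clusters: List[List[Point]],
    class_centroids: Dict[str, Point],
    cluster_budget: int,
    labels_per_cluster: int,
) -> Dict[int, List[int]]:
    """Return {cluster index: indices of instances whose labels to request}."""
    # Cluster level: rank clusters from most to least uncertain.
    scored = []
    for idx, points in enumerate(clusters):
        center = centroid(points)
        scored.append((cluster_uncertainty(center, class_centroids), idx, center))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Instance level (assumed criterion): within each chosen cluster,
    # ask for the labels of the instances closest to the centroid.
    queries: Dict[int, List[int]] = {}
    for _, idx, center in scored[:cluster_budget]:
        points = clusters[idx]
        ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
        queries[idx] = ranked[:labels_per_cluster]
    return queries


if __name__ == "__main__":
    known = {"a": [0.0, 0.0], "b": [10.0, 0.0]}
    new_clusters = [
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],              # close to class "a"
        [[5.0, 0.0], [5.0, 1.0], [4.0, 0.0], [6.0, 0.0]],  # halfway between classes
    ]
    print(select_queries(new_clusters, known, cluster_budget=1, labels_per_cluster=2))
```

With a budget of one cluster, only the cluster lying halfway between the two known classes is selected, so labels are spent where the categorization is least certain. Other measures, such as a vote entropy at the cluster level or a boundary-based criterion at the instance level, would fit the same two-step structure.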