Detecção de novidade em fluxos contínuos de dados multirrótulo

Detalhes bibliográficos
Ano de defesa: 2019
Autor(a) principal: Costa Júnior, Joel David
Orientador(a): Cerri, Ricardo lattes
Banca de defesa: Não Informado pela instituição
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Federal de São Carlos
Câmpus São Carlos
Programa de Pós-Graduação: Programa de Pós-Graduação em Ciência da Computação - PPGCC
Departamento: Não Informado pela instituição
País: Não Informado pela instituição
Palavras-chave em Português:
Palavras-chave em Inglês:
Área do conhecimento CNPq:
Link de acesso: https://repositorio.ufscar.br/handle/20.500.14289/12197
Resumo: Multi-label Stream Classification (MLSC) aims at classifying streaming examples into multiple classes simultaneously. It is a very challenging task, since new classes may emerge (concept evolution), and known classes may change over the stream (concept drift. Most of classifier methods deal with the aforementioned characteristics updating their models when actuals labels of examples become available, it is a very optimistic scenario though, since for many data streams application these labels are never available, hence is unfeasible to update models in a supervised fashion. This is known as infinitely delayed labels problems and a task that has been stood out for dealing with it, is the Novelty Detection (ND). ND aims at detecting new patterns in arriving streaming examples that, in some aspect, might differ from the previously observed patterns and then used for updating the decision models. Despite different ND procedures have been proposed in the literature, it is still an open problem for MLSC. In this work we proposed two new methods: the MultI-label learNing Algorithm for data Streams with Label Combination-based methods (MINAS-LC) which applies ND to deal exclusively with concept drifts and the MultIlabel learNing Algorithm for data Streams with Binary Relevance transformation (MINAS-BR) which applies ND to deal with both concept drifts and concept evolution either. The methods address problem transformation techniques to deal with multi-labelled data at the offline phase and build micro-clusters based decision models. At the online phase the arrival examples are classified or rejected (labelled as unknown) by the current decision models. Rejected examples are used in ND phase to shape new micro-clusters, maintaining the decision models updated. The ND phase is run from time to time to work around concept drifts and concept evolution. Experiments over synthetic and real-world data sets with different degrees of concept drift and concept evolution attested the potential of MINAS-BR and MINAS-LC, in which have obtained significant results when compared to baseline methods.