AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios

Ferreira, Raul Sena

AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios

Detalhes bibliográficos
Autor(a) principal:	Ferreira, Raul Sena
Data de Publicação:	2018
Tipo de documento:	Dissertação
Idioma:	eng
Título da fonte:	Repositório Institucional da UFRJ
Texto Completo:	http://hdl.handle.net/11422/12982
Resumo:	Gradual concept-drift refers to a smooth and gradual change in the relations between input and output data in the underlying distribution over time. The problem generates a model obsolescence and consequently a quality decrease in predictions. Besides, there is a challenging task during the stream: The extreme verification latency (EVL) to verify the labels. For batch scenarios, state-of-the-art methods propose an adaptation of a supervised model by using an unconstrained least squares importance fitting (uLSIF) algorithm or a semi-supervised approach along with a core support extraction (CSE) method. However, these methods do not properly tackle the mentioned problems due to their high computational time for large data volumes, lack in representing the right samples of the drift or even for having several parameters for tuning. Therefore, we propose a density-based adaptive model for nonstationary data (AMANDA), which uses a semi-supervised classifier along with a CSE method. AMANDA has two variations: AMANDA with a fixed cutting percentage (AMANDA-FCP); and AMANDA with a dynamic cutting percentage (AMANDADCP). Our results indicate that the two variations of AMANDA outperform the state-of-the-art methods for almost all synthetic datasets and real ones with an improvement up to 27.98% regarding the average error. We have found that the use of AMANDA-FCP improved the results for a gradual concept-drift even with a small size of initial labeled data. Moreover, our results indicate that SSL classifiers are improved when they work along with our static or dynamic CSE methods. Therefore, we emphasize the importance of research directions based on this approach.

Metadados do item

id	UFRJ_ed833607705d10bc45a60716e67b5130
oai_identifier_str	oai:pantheon.ufrj.br:11422/12982
network_acronym_str	UFRJ
network_name_str	Repositório Institucional da UFRJ
repository_id_str
spelling	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenariosAprendizagem semi-supervisionadaDeriva do conceitoCNPQ::ENGENHARIASGradual concept-drift refers to a smooth and gradual change in the relations between input and output data in the underlying distribution over time. The problem generates a model obsolescence and consequently a quality decrease in predictions. Besides, there is a challenging task during the stream: The extreme verification latency (EVL) to verify the labels. For batch scenarios, state-of-the-art methods propose an adaptation of a supervised model by using an unconstrained least squares importance fitting (uLSIF) algorithm or a semi-supervised approach along with a core support extraction (CSE) method. However, these methods do not properly tackle the mentioned problems due to their high computational time for large data volumes, lack in representing the right samples of the drift or even for having several parameters for tuning. Therefore, we propose a density-based adaptive model for nonstationary data (AMANDA), which uses a semi-supervised classifier along with a CSE method. AMANDA has two variations: AMANDA with a fixed cutting percentage (AMANDA-FCP); and AMANDA with a dynamic cutting percentage (AMANDADCP). Our results indicate that the two variations of AMANDA outperform the state-of-the-art methods for almost all synthetic datasets and real ones with an improvement up to 27.98% regarding the average error. We have found that the use of AMANDA-FCP improved the results for a gradual concept-drift even with a small size of initial labeled data. Moreover, our results indicate that SSL classifiers are improved when they work along with our static or dynamic CSE methods. Therefore, we emphasize the importance of research directions based on this approach.Concept-drift gradual refere-se à mudança suave e gradual na distribuição dos dados conforme o tempo passa. Este problema causa obsolescência no modelo de aprendizado e queda na qualidade das previsões. Além disso, existe um complicador durante o processamento dos dados: a latência de verificação extrema (LVE) para se verificar os rótulos. Métodos do estado da arte propõem uma adaptação do modelo supervisionado usando uma abordagem de estimação de importância baseado em mínimos quadrados ou usando uma abordagem semi-supervisionada em conjunto com a extração de instâncias centrais, na sigla em inglês (CSE). Entretanto, estes métodos não tratam adequadamente os problemas mencionados devido ao fato de requererem alto tempo computacional para processar grandes volumes de dados, falta de correta seleção das instâncias que representam a mudança da distribuição, ou ainda por demandarem o ajuste de grande quantidade de parâmetros. Portanto, propomos um modelo adaptativo baseado em densidades para dados não-estacionários (AMANDA), que tem como base um classificador semi-supervisionado e um método CSE baseado em densidade. AMANDA tem duas variações: percentual de corte fixo (AMANDAFCP); e percentual de corte dinâmico (AMANDA-DCP). Nossos resultados indicam que as duas variações da proposta superam o estado da arte em quase todas as bases de dados sintéticas e reais em até 27,98% em relação ao erro médio. Concluímos que a aplicação do método AMANDA-FCP faz com que a classificação melhore mesmo quando há uma pequena porção inicial de dados rotulados. Mais ainda, os classificadores semi-supervisionados são melhorados quando trabalham em conjunto com nossos métodos de CSE, estático ou dinâmico.Universidade Federal do Rio de JaneiroBrasilInstituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de EngenhariaPrograma de Pós-Graduação em Engenharia de Sistemas e ComputaçãoUFRJSilva, Geraldo Zimbrão dahttp://lattes.cnpq.br/3937502490683382http://lattes.cnpq.br/7007150957758256Alvim, Leandro Guimarães MarquesLima, Alexandre de Assis BentoOgasawara, Eduardo SoaresFerreira, Raul Sena2020-08-25T14:25:05Z2023-12-21T03:02:14Z2018-06info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/masterThesishttp://hdl.handle.net/11422/12982enginfo:eu-repo/semantics/openAccessreponame:Repositório Institucional da UFRJinstname:Universidade Federal do Rio de Janeiro (UFRJ)instacron:UFRJ2023-12-21T03:02:14Zoai:pantheon.ufrj.br:11422/12982Repositório InstitucionalPUBhttp://www.pantheon.ufrj.br/oai/requestpantheon@sibi.ufrj.bropendoar:2023-12-21T03:02:14Repositório Institucional da UFRJ - Universidade Federal do Rio de Janeiro (UFRJ)false
dc.title.none.fl_str_mv	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
title	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
spellingShingle	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios Ferreira, Raul Sena Aprendizagem semi-supervisionada Deriva do conceito CNPQ::ENGENHARIAS
title_short	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
title_full	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
title_fullStr	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
title_full_unstemmed	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
title_sort	AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios
author	Ferreira, Raul Sena
author_facet	Ferreira, Raul Sena
author_role	author
dc.contributor.none.fl_str_mv	Silva, Geraldo Zimbrão da http://lattes.cnpq.br/3937502490683382 http://lattes.cnpq.br/7007150957758256 Alvim, Leandro Guimarães Marques Lima, Alexandre de Assis Bento Ogasawara, Eduardo Soares
dc.contributor.author.fl_str_mv	Ferreira, Raul Sena
dc.subject.por.fl_str_mv	Aprendizagem semi-supervisionada Deriva do conceito CNPQ::ENGENHARIAS
topic	Aprendizagem semi-supervisionada Deriva do conceito CNPQ::ENGENHARIAS
description	Gradual concept-drift refers to a smooth and gradual change in the relations between input and output data in the underlying distribution over time. The problem generates a model obsolescence and consequently a quality decrease in predictions. Besides, there is a challenging task during the stream: The extreme verification latency (EVL) to verify the labels. For batch scenarios, state-of-the-art methods propose an adaptation of a supervised model by using an unconstrained least squares importance fitting (uLSIF) algorithm or a semi-supervised approach along with a core support extraction (CSE) method. However, these methods do not properly tackle the mentioned problems due to their high computational time for large data volumes, lack in representing the right samples of the drift or even for having several parameters for tuning. Therefore, we propose a density-based adaptive model for nonstationary data (AMANDA), which uses a semi-supervised classifier along with a CSE method. AMANDA has two variations: AMANDA with a fixed cutting percentage (AMANDA-FCP); and AMANDA with a dynamic cutting percentage (AMANDADCP). Our results indicate that the two variations of AMANDA outperform the state-of-the-art methods for almost all synthetic datasets and real ones with an improvement up to 27.98% regarding the average error. We have found that the use of AMANDA-FCP improved the results for a gradual concept-drift even with a small size of initial labeled data. Moreover, our results indicate that SSL classifiers are improved when they work along with our static or dynamic CSE methods. Therefore, we emphasize the importance of research directions based on this approach.
publishDate	2018
dc.date.none.fl_str_mv	2018-06 2020-08-25T14:25:05Z 2023-12-21T03:02:14Z
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/11422/12982
url	http://hdl.handle.net/11422/12982
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Universidade Federal do Rio de Janeiro Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia de Sistemas e Computação UFRJ
publisher.none.fl_str_mv	Universidade Federal do Rio de Janeiro Brasil Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia Programa de Pós-Graduação em Engenharia de Sistemas e Computação UFRJ
dc.source.none.fl_str_mv	reponame:Repositório Institucional da UFRJ instname:Universidade Federal do Rio de Janeiro (UFRJ) instacron:UFRJ
instname_str	Universidade Federal do Rio de Janeiro (UFRJ)
instacron_str	UFRJ
institution	UFRJ
reponame_str	Repositório Institucional da UFRJ
collection	Repositório Institucional da UFRJ
repository.name.fl_str_mv	Repositório Institucional da UFRJ - Universidade Federal do Rio de Janeiro (UFRJ)
repository.mail.fl_str_mv	pantheon@sibi.ufrj.br
_version_	1834470883420274688

AMANDA : density-based adaptive model for nonstationary data under extreme verification latency scenarios

Registros relacionados