Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
Main Author: | Pereira, Mariana Matoso |
---|---|
Publication Date: | 2019 |
Format: | Master thesis |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10362/63810 |
Summary: | Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
id |
RCAP_d7331435b97005d662b3eda1a8493d24 |
---|---|
oai_identifier_str |
oai:run.unl.pt:10362/63810 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
spelling |
Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning
Imbalanced Learning; Oversampling; Clustering; Supervised Learning
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Imbalanced datasets remain an ongoing challenge in supervised learning: standard algorithms are designed for balanced class distributions and perform poorly on imbalanced problems. Many methods have been developed to address this problem, but the more general approach to achieving a balanced class distribution is to modify the data rather than the algorithm. Although class imbalance is responsible for significant performance losses in standard classifiers across many types of problems, the small disjuncts problem is another important aspect to consider. It is therefore important to understand solutions that take into account not only the between-class imbalance (the imbalance between the two classes) but also the within-class imbalance (the imbalance between the sub-clusters of each class), and that oversample the dataset by rectifying both types of imbalance simultaneously. Cluster-based oversampling has been shown to be a robust solution that addresses both problems. This work studies the effect of combining different existing oversampling methods with a clustering-based approach. Empirical results from extensive experiments show that combining different oversampling techniques with the k-means clustering algorithm (K-Means Oversampling) improves on the classification results obtained from the oversampling techniques alone, with no prior clustering step.
Bação, Fernando José Ferreira Lucas; RUN; Pereira, Mariana Matoso; 2019-03-19T17:54:35Z; 2019-03-01; 2019-03-01T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/63810; TID:202200000; eng; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2024-05-22T17:37:56Z; oai:run.unl.pt:10362/63810; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-28T17:08:47.214286; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
dc.title.none.fl_str_mv |
Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning |
title |
Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning |
spellingShingle |
Comparing the performance of oversampling techniques in combination with a clustering algorithm for imbalanced learning Pereira, Mariana Matoso Imbalanced Learning Oversampling Clustering Supervised Learning |
author |
Pereira, Mariana Matoso |
author_facet |
Pereira, Mariana Matoso |
author_role |
author |
dc.contributor.none.fl_str_mv |
Bação, Fernando José Ferreira Lucas RUN |
dc.contributor.author.fl_str_mv |
Pereira, Mariana Matoso |
dc.subject.por.fl_str_mv |
Imbalanced Learning Oversampling Clustering Supervised Learning |
topic |
Imbalanced Learning Oversampling Clustering Supervised Learning |
description |
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-03-19T17:54:35Z 2019-03-01 2019-03-01T00:00:00Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/63810 TID:202200000 |
url |
http://hdl.handle.net/10362/63810 |
identifier_str_mv |
TID:202200000 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833596471290626048 |
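
The abstract in this record describes oversampling that corrects the between-class and the within-class imbalance at the same time by clustering before generating samples. The following is a minimal sketch of that general idea, not the procedure evaluated in the thesis. Everything named here is an assumption made for illustration: the function names `kmeans_labels` and `cluster_oversample`, the parameters `k` and `seed`, the rule that a cluster is used only when more than half of its members belong to the minority class, the weighting that gives smaller minority clusters more synthetic points, and the SMOTE-like interpolation between random pairs of minority points in the same cluster (SMOTE itself interpolates towards nearest neighbours).

```python
import numpy as np


def kmeans_labels(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its closest already-chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_oversample(X, y, minority=1, k=3, seed=0):
    """Cluster the data, then interpolate new minority points inside clusters."""
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X, k)
    is_min = y == minority
    # Between-class imbalance: number of points needed to equalise the classes.
    n_needed = int((~is_min).sum() - is_min.sum())

    # Assumed filter: keep only clusters dominated by the minority class.
    groups = []
    for j in range(k):
        members = labels == j
        if members.any() and is_min[members].mean() > 0.5:
            groups.append(X[members & is_min])
    if n_needed <= 0 or not groups:
        return X.copy(), y.copy()

    # Within-class imbalance: smaller minority clusters receive more samples
    # (assumed weighting, inversely proportional to cluster size).
    weights = np.array([1.0 / len(g) for g in groups])
    weights /= weights.sum()
    counts = np.floor(weights * n_needed).astype(int)
    counts[weights.argmax()] += n_needed - counts.sum()

    synthetic = []
    for g, c in zip(groups, counts):
        a = g[rng.integers(len(g), size=c)]
        b = g[rng.integers(len(g), size=c)]
        # Each synthetic point lies on the segment between two cluster members.
        synthetic.append(a + rng.random((c, 1)) * (b - a))

    X_new = np.vstack([X, *synthetic])
    y_new = np.concatenate([y, np.full(n_needed, minority, dtype=y.dtype)])
    return X_new, y_new
```

Calling `cluster_oversample(X, y, minority=1, k=3)` on a feature matrix `X` and a binary label vector `y` returns the original rows followed by the synthetic minority rows. In practice a maintained implementation such as the `KMeansSMOTE` sampler in the imbalanced-learn package would be the more sensible starting point; the sketch only makes the two rebalancing steps visible.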