Automatic Handling of Imbalanced Datasets for Classification

Bibliographic Details
Main Author: Vieira, Pedro Marques
Publication Date: 2022
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.22/22518
Summary: Imbalanced data is present in various business areas, and handling it without proper knowledge can have undesired consequences. In addition, the most common evaluation metrics used in machine learning to assess a solution can be inappropriate and misleading. Multiple combinations of methods have been proposed to handle imbalanced data; however, they often require specialised knowledge to be used correctly. In imbalanced classification, correctly classifying the underrepresented class tends to be more important than correctly classifying the overrepresented class, while also being more challenging and time-consuming. Several approaches, ranging from more accessible to more advanced, in the domains of data resampling and cost-sensitive techniques are considered to handle imbalanced data. The application developed recommends the most suitable combinations of techniques for the specific dataset imported, by extracting its meta-feature values and comparing them with those recorded in a knowledge base. It facilitates effortless classification and automates part of the machine learning pipeline, with results comparable to or better than a state-of-the-art solution and with a much smaller execution time.
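The recommendation step described in the summary (extract meta-features from the imported dataset, compare them with a knowledge base, return the best-matching technique) can be illustrated with a minimal sketch. This is not the thesis implementation: the meta-features chosen, the log-scaled distance, the knowledge-base entries and every function name below are hypothetical and serve only to show the general idea.

```python
import math

# Hypothetical knowledge base: meta-features of previously seen datasets and
# the technique assumed to have worked best on each (illustrative values only).
KNOWLEDGE_BASE = [
    {"meta": {"imbalance_ratio": 1.5, "n_features": 20, "n_rows": 5000},
     "technique": "no resampling"},
    {"meta": {"imbalance_ratio": 9.0, "n_features": 4, "n_rows": 120},
     "technique": "random oversampling"},
    {"meta": {"imbalance_ratio": 99.0, "n_features": 30, "n_rows": 100000},
     "technique": "cost-sensitive learning"},
]


def extract_meta_features(rows, labels):
    """Compute a few simple meta-features for a labelled dataset."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return {
        # Size of the largest class divided by the size of the smallest class.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
        "n_features": len(rows[0]),
        "n_rows": len(rows),
    }


def distance(a, b):
    """Euclidean distance on log-scaled values, so row counts do not dominate."""
    return math.sqrt(sum((math.log1p(a[k]) - math.log1p(b[k])) ** 2 for k in a))


def recommend(meta):
    """Return the technique stored with the most similar knowledge-base entry."""
    nearest = min(KNOWLEDGE_BASE, key=lambda entry: distance(meta, entry["meta"]))
    return nearest["technique"]


# Usage: a toy dataset with 90 majority-class and 10 minority-class rows.
rows = [[0.0, 1.0, 2.0]] * 100
labels = [0] * 90 + [1] * 10
print(recommend(extract_meta_features(rows, labels)))
```

A nearest-neighbour lookup is only one plausible way to compare meta-feature values; the summary does not state which comparison the application actually uses.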
id RCAP_e33342fdf4bd2ff306c907fc6d02451a
oai_identifier_str oai:recipp.ipp.pt:10400.22/22518
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
author Vieira, Pedro Marques
author_facet Vieira, Pedro Marques
author_role author
dc.contributor.none.fl_str_mv Rodrigues, Maria de Fátima Coutinho
REPOSITÓRIO P.PORTO
dc.contributor.author.fl_str_mv Vieira, Pedro Marques
dc.subject.por.fl_str_mv Imbalanced Classification
Handling Imbalanced Data
Automated Machine Learning
Classificação Não Balanceada
Manipulação de Dados Não Balanceados
Automatização de Aprendizagem de Máquina
publishDate 2022
dc.date.none.fl_str_mv 2022
2022-01-01T00:00:00Z
2023-03-15T15:06:26Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.22/22518
urn:tid:203113730
url http://hdl.handle.net/10400.22/22518
identifier_str_mv urn:tid:203113730
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833600724756332544