Automatic Handling of Imbalanced Datasets for Classification
Main author: | Vieira, Pedro Marques |
---|---|
Publication date: | 2022 |
Document type: | Dissertation |
Language: | eng |
Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Full text: | http://hdl.handle.net/10400.22/22518 |
Abstract: | Imbalanced data occurs in many business areas, and handling it without proper knowledge can have undesirable consequences. In addition, the most common machine learning evaluation metrics can be inappropriate and misleading for such data. Many combinations of methods have been proposed to handle imbalanced data; however, they often require specialised knowledge to be used correctly. In imbalanced classification, correctly classifying the underrepresented class tends to be more important than correctly classifying the overrepresented class, while also being more challenging and time-consuming. Several approaches, ranging from more accessible to more advanced, in the domains of data resampling and cost-sensitive techniques are considered for handling imbalanced data. The application developed recommends the most suitable combinations of techniques for the imported dataset by extracting its meta-feature values and comparing them with those recorded in a knowledge base. It facilitates effortless classification and automates part of the machine learning pipeline, with results comparable to or better than a state-of-the-art solution and a much shorter execution time. |
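
The abstract describes recommending techniques by comparing a dataset's meta-features with those stored in a knowledge base. The record does not say which meta-features or similarity measure the thesis uses, so the following is only a minimal sketch under stated assumptions: the three meta-features, the log-scaled Euclidean distance, the function names, and the two knowledge-base entries are all hypothetical and chosen purely for illustration.

```python
import math


def meta_features(X, y):
    """Compute a few simple dataset-level meta-features (illustrative choice)."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    return {
        "n_instances": float(len(y)),
        "n_features": float(len(X[0])),
        # Size of the largest class divided by the size of the smallest class.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


def distance(a, b):
    """Euclidean distance on log-scaled values so large counts do not dominate."""
    return math.sqrt(sum((math.log1p(a[k]) - math.log1p(b[k])) ** 2 for k in a))


def recommend(new_mf, knowledge_base, k=1):
    """Return the techniques recorded for the k most similar known datasets."""
    ranked = sorted(knowledge_base, key=lambda e: distance(new_mf, e["meta_features"]))
    return [entry["best_technique"] for entry in ranked[:k]]


# Hypothetical knowledge base: these entries are invented, not taken from the thesis.
KNOWLEDGE_BASE = [
    {
        "meta_features": {"n_instances": 1000.0, "n_features": 10.0, "imbalance_ratio": 9.0},
        "best_technique": "SMOTE + decision tree",
    },
    {
        "meta_features": {"n_instances": 50000.0, "n_features": 30.0, "imbalance_ratio": 500.0},
        "best_technique": "class-weighted logistic regression",
    },
]

# Toy dataset with 90 majority and 10 minority examples.
X = [[0.1, 0.2]] * 90 + [[0.9, 0.8]] * 10
y = [0] * 90 + [1] * 10

mf = meta_features(X, y)
print(mf)
print(recommend(mf, KNOWLEDGE_BASE))
```

In this sketch the toy dataset has an imbalance ratio of 9, so the nearest knowledge-base entry is the first one and its recorded technique is returned. A real system would presumably use a richer set of meta-features and validate the recommendation on the new dataset.
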
id |
RCAP_e33342fdf4bd2ff306c907fc6d02451a |
---|---|
oai_identifier_str |
oai:recipp.ipp.pt:10400.22/22518 |
network_acronym_str |
RCAP |
network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str |
https://opendoar.ac.uk/repository/7160 |
dc.title.none.fl_str_mv |
Automatic Handling of Imbalanced Datasets for Classification |
author |
Vieira, Pedro Marques |
author_facet |
Vieira, Pedro Marques |
author_role |
author |
dc.contributor.none.fl_str_mv |
Rodrigues, Maria de Fátima Coutinho; REPOSITÓRIO P.PORTO |
dc.contributor.author.fl_str_mv |
Vieira, Pedro Marques |
dc.subject.por.fl_str_mv |
Imbalanced Classification; Handling Imbalanced Data; Automated Machine Learning; Classificação Não Balanceada; Manipulação de Dados Não Balanceados; Automatização de Aprendizagem de Máquina |
topic |
Imbalanced Classification; Handling Imbalanced Data; Automated Machine Learning; Classificação Não Balanceada; Manipulação de Dados Não Balanceados; Automatização de Aprendizagem de Máquina |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022 2022-01-01T00:00:00Z 2023-03-15T15:06:26Z |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10400.22/22518 urn:tid:203113730 |
url |
http://hdl.handle.net/10400.22/22518 |
identifier_str_mv |
urn:tid:203113730 |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str |
RCAAP |
institution |
RCAAP |
reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv |
info@rcaap.pt |
_version_ |
1833600724756332544 |