Geometric SMOTE for imbalanced datasets with nominal and continuous features
| Main author: | Fonseca, Joao |
|---|---|
| Publication date: | 2023 |
| Other authors: | Bacao, Fernando |
| Document type: | Article |
| Language: | English (eng) |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10362/157223 |
| Summary: | Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications, 234 (December), 1-9. [121053]. https://doi.org/10.1016/j.eswa.2023.121053. This research was supported by research grants of the Portuguese Foundation for Science and Technology ("Fundação para a Ciência e a Tecnologia"), references SFRH/BD/151473/2021 and DSAIPA/DS/0116/2019, and by project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC). |
| id | RCAP_b8e90456f3966e10ada23295faa0ee9b |
|---|---|
| oai_identifier_str | oai:run.unl.pt:10362/157223 |
| network_acronym_str | RCAP |
| network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str | https://opendoar.ac.uk/repository/7160 |
| abstract | Imbalanced learning can be addressed in three different ways: resampling, algorithmic modifications and cost-sensitive solutions. Resampling, and oversampling in particular, is a more general approach than algorithmic and cost-sensitive methods. Since the proposal of the Synthetic Minority Oversampling TEchnique (SMOTE), various SMOTE variants and neural network-based oversampling methods have been developed. However, the options to oversample datasets with nominal and continuous features are limited. We propose Geometric SMOTE for Nominal and Continuous features (G-SMOTENC), based on a combination of G-SMOTE and SMOTENC. Our method modifies SMOTENC's encoding and generation mechanism for nominal features, while using G-SMOTE's data selection mechanism to determine the center observation and its k-nearest neighbors, and G-SMOTE's generation mechanism for continuous features. G-SMOTENC's performance is compared against SMOTENC and two other baselines: a state-of-the-art oversampling method and no oversampling. The experiment was performed over 20 datasets with varying imbalance ratios, numbers of metric and non-metric features, and numbers of target classes. We found a significant improvement in classification performance when using G-SMOTENC as the oversampling method. An open-source implementation of G-SMOTENC is made available in the Python programming language. |
| author | Fonseca, Joao |
| author_role | author |
| author2 | Bacao, Fernando |
| author2_role | author |
| dc.contributor.none.fl_str_mv | Information Management Research Center (MagIC) - NOVA Information Management School; NOVA Information Management School (NOVA IMS); RUN |
| dc.subject.por.fl_str_mv | Imbalanced learning; Oversampling; SMOTE; Data generation; Nominal data; Engineering(all); Computer Science Applications; Artificial Intelligence |
| publishDate | 2023 |
| dc.date.none.fl_str_mv | 2023-09-01T22:15:42Z; 2023-12-30; 2023-12-30T00:00:00Z |
| dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
| dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10362/157223 |
| dc.language.iso.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | 0957-4174; PURE: 67759778; https://doi.org/10.1016/j.eswa.2023.121053 |
| dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.format.none.fl_str_mv | 9; application/pdf |
| dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
| repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv | info@rcaap.pt |
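The abstract above describes G-SMOTENC only at a high level. The Python sketch below illustrates the two ingredients it names, under stated assumptions. Continuous features are generated G-SMOTE-style: a point is sampled inside a hypersphere centred on a minority observation, with the radius set by the distance to one of its k nearest neighbours, and with truncation and deformation parameters modelled on G-SMOTE's. Nominal features use the original SMOTENC rule: the most frequent category among the k nearest neighbours, where neighbours are found with a Euclidean distance that adds the median standard deviation of the continuous features for each nominal mismatch. This is not the authors' G-SMOTENC, which modifies SMOTENC's nominal encoding and generation in ways not reproduced here. The function names (`g_smotenc_sketch`, `geometric_point`, `smotenc_distances`) and the minority-only neighbour selection are hypothetical simplifications; the authors' open-source Python implementation remains the reference.

```python
import numpy as np


def smotenc_distances(x_cont, x_nom, X_cont, X_nom, med_std):
    """SMOTENC-style distance (assumed rule): squared Euclidean distance on
    continuous features plus med_std**2 for every nominal feature that differs."""
    cont_sq = np.sum((X_cont - x_cont) ** 2, axis=1)
    mismatches = np.sum(X_nom != x_nom, axis=1)
    return np.sqrt(cont_sq + mismatches * med_std ** 2)


def geometric_point(center, surface, truncation=1.0, deformation=0.0, rng=None):
    """Draw one point inside the hypersphere centred at `center` whose radius is
    the distance to `surface`, with G-SMOTE-like truncation and deformation."""
    rng = np.random.default_rng() if rng is None else rng
    radius = np.linalg.norm(surface - center)
    if radius == 0.0:
        return center.copy()
    # Uniform point in the unit ball: Gaussian direction scaled by U**(1/p).
    direction = rng.normal(size=center.size)
    direction /= np.linalg.norm(direction)
    point = rng.uniform() ** (1.0 / center.size) * direction
    axis = (surface - center) / radius
    proj = point @ axis
    # Truncation: reflect points lying on the side of the sphere to be cut off.
    if (truncation > 0 and proj < truncation - 1) or (truncation < 0 and proj > truncation + 1):
        point = point - 2 * proj * axis
    # Deformation: shrink the component perpendicular to the center-surface axis.
    parallel = (point @ axis) * axis
    point = parallel + (1.0 - deformation) * (point - parallel)
    return center + radius * point


def g_smotenc_sketch(X_cont, X_nom, n_samples, k=5, truncation=1.0,
                     deformation=0.0, seed=0):
    """Generate synthetic samples from minority-class data only (needs >= 2 rows)."""
    rng = np.random.default_rng(seed)
    n = X_cont.shape[0]
    k = min(k, n - 1)
    med_std = np.median(X_cont.std(axis=0))
    new_cont, new_nom = [], []
    for _ in range(n_samples):
        i = rng.integers(n)
        d = smotenc_distances(X_cont[i], X_nom[i], X_cont, X_nom, med_std)
        d[i] = np.inf  # never pick the center as its own neighbour
        neighbours = np.argsort(d)[:k]
        j = rng.choice(neighbours)
        # Continuous part: geometric sample between center i and neighbour j.
        new_cont.append(geometric_point(X_cont[i], X_cont[j], truncation, deformation, rng))
        # Nominal part: SMOTENC majority vote over the k neighbours, per feature.
        row = []
        for f in range(X_nom.shape[1]):
            values, counts = np.unique(X_nom[neighbours, f], return_counts=True)
            row.append(values[np.argmax(counts)])
        new_nom.append(row)
    return np.array(new_cont), np.array(new_nom)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    X_cont = demo_rng.normal(size=(6, 2))
    X_nom = np.array([["red"], ["red"], ["blue"], ["red"], ["blue"], ["red"]])
    synth_cont, synth_nom = g_smotenc_sketch(X_cont, X_nom, n_samples=4, k=3)
    print(synth_cont.shape, synth_nom.shape)  # (4, 2) (4, 1)
```

With `truncation=1.0` and `deformation=0.0`, every continuous sample stays inside the hypersphere and on the half facing the selected neighbour. G-SMOTE also defines selection strategies that use majority-class neighbours to set the hypersphere radius; the sketch omits them for brevity.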