Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance

Bibliographic details
Main author: Alonso, Hugo
Publication date: 2023
Other authors: Candeias, Teresa
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10773/39596
Abstract: The wine industry has become increasingly important worldwide and is one of the most significant industries in Portugal. In a previous paper, the problem of predicting how much a Portuguese consumer is willing to pay for a bottle of wine was considered for the first time. The problem was treated as a multi-class ordinal classification task. Although we achieved good prediction results overall, it was difficult to identify rare cases of consumers who are interested in paying for more expensive wines. We found that this was a direct consequence of data imbalance. Therefore, here, we present a first attempt to deal with this issue, based on the use of re-sampling strategies to balance the training data, namely random under-sampling, random over-sampling with replacement and the synthetic minority over-sampling technique. We consider several learning methods and develop various predictive models. A comparative study is carried out and its results highlight the importance of a careful choice of the re-sampling strategy and the learning method in order to get the best possible prediction results.
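Illustration: the three re-sampling strategies named in the abstract can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy feature rows and the "low"/"high" price labels are hypothetical, and `smote_like` is a simplified stand-in for the synthetic minority over-sampling technique (it interpolates between a minority example and one of its `k` nearest same-class neighbours, using squared Euclidean distance on numeric features). As in the abstract, re-sampling would be applied to the training data only.

```python
import random
from collections import Counter, defaultdict


def group_by_label(X, y):
    """Collect the feature rows of each class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def flatten(groups):
    """Turn {label: rows} back into parallel X, y lists."""
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class is as small as the rarest one."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    return flatten({label: rng.sample(rows, target) for label, rows in groups.items()})


def random_over_sample(X, y, seed=0):
    """Randomly repeat rows (with replacement) until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    return flatten({
        label: rows + [rng.choice(rows) for _ in range(target - len(rows))]
        for label, rows in groups.items()
    })


def smote_like(X, y, k=2, seed=0):
    """Simplified SMOTE: add synthetic rows on the segment between a row and a near neighbour."""
    rng = random.Random(seed)
    groups = group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    balanced = {}
    for label, rows in groups.items():
        synthetic = []
        while len(rows) + len(synthetic) < target:
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            if not others:  # a lone example has no neighbour, so fall back to a copy
                synthetic.append(list(base))
                continue
            # Sort same-class rows by squared Euclidean distance to the base row.
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # position along the segment base -> neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        balanced[label] = rows + synthetic
    return flatten(balanced)


# Hypothetical imbalanced training set: four "low" price rows, two "high" price rows.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["low"] * 4 + ["high"] * 2

for strategy in (random_under_sample, random_over_sample, smote_like):
    _, y_balanced = strategy(X, y)
    print(strategy.__name__, dict(Counter(y_balanced)))
```

Under-sampling discards majority-class information, while the two over-sampling variants enlarge the minority classes with repeated or synthetic rows; which trade-off works best depends on the learning method, which is the point the abstract makes.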
id RCAP_110998fda823ad3260eeb5ea0607a15e
oai_identifier_str oai:ria.ua.pt:10773/39596
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
title Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
spellingShingle Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
Alonso, Hugo
Wine
Classification
Data imbalance
Re-sampling
Learning methods
Predictive models
title_short Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
title_full Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
title_fullStr Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
title_full_unstemmed Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
title_sort Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
author Alonso, Hugo
author_facet Alonso, Hugo
Candeias, Teresa
author_role author
author2 Candeias, Teresa
author2_role author
dc.contributor.author.fl_str_mv Alonso, Hugo
Candeias, Teresa
dc.subject.por.fl_str_mv Wine
Classification
Data imbalance
Re-sampling
Learning methods
Predictive models
topic Wine
Classification
Data imbalance
Re-sampling
Learning methods
Predictive models
description The wine industry has become increasingly important worldwide and is one of the most significant industries in Portugal. In a previous paper, the problem of predicting how much a Portuguese consumer is willing to pay for a bottle of wine was considered for the first time. The problem was treated as a multi-class ordinal classification task. Although we achieved good prediction results overall, it was difficult to identify rare cases of consumers who are interested in paying for more expensive wines. We found that this was a direct consequence of data imbalance. Therefore, here, we present a first attempt to deal with this issue, based on the use of re-sampling strategies to balance the training data, namely random under-sampling, random over-sampling with replacement and the synthetic minority over-sampling technique. We consider several learning methods and develop various predictive models. A comparative study is carried out and its results highlight the importance of a careful choice of the re-sampling strategy and the learning method in order to get the best possible prediction results.
publishDate 2023
dc.date.none.fl_str_mv 2023-10-23T14:10:18Z
2023-01-01T00:00:00Z
2023
dc.type.driver.fl_str_mv conference object
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10773/39596
url http://hdl.handle.net/10773/39596
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2184-285X
10.5220/0012068800003541
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv SciTePress
publisher.none.fl_str_mv SciTePress
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833594534868549632