Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance
| Main author: | Alonso, Hugo |
|---|---|
| Publication date: | 2023 |
| Other authors: | Candeias, Teresa |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10773/39596 |
| Abstract: | The wine industry has become increasingly important worldwide and is one of the most significant industries in Portugal. In a previous paper, the problem of predicting how much a Portuguese consumer is willing to pay for a bottle of wine was considered for the first time. The problem was treated as a multi-class ordinal classification task. Although we achieved good prediction results overall, it was difficult to identify rare cases of consumers who are interested in paying for more expensive wines. We found that this was a direct consequence of data imbalance. Therefore, here, we present a first attempt to deal with this issue, based on the use of re-sampling strategies to balance the training data, namely random under-sampling, random over-sampling with replacement and the synthetic minority over-sampling technique. We consider several learning methods and develop various predictive models. A comparative study is carried out and its results highlight the importance of a careful choice of the re-sampling strategy and the learning method in order to get the best possible prediction results. |
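
The three re-sampling strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy data, the price-class labels and the choice of `k` are all assumptions made for illustration, and the third function is a simplified SMOTE-style interpolation between same-class neighbours rather than a full reproduction of the published technique.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Map each class label to a list of (copied) feature vectors."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(list(features))
    return groups


def _flatten(groups):
    """Turn {label: [vectors]} back into parallel X, y lists."""
    X, y = [], []
    for label, rows in groups.items():
        X.extend(rows)
        y.extend([label] * len(rows))
    return X, y


def random_under_sample(X, y, rng):
    """Keep a random subset of every class, sized to the smallest class."""
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    return _flatten({c: rng.sample(rows, target) for c, rows in groups.items()})


def random_over_sample(X, y, rng):
    """Grow every class to the largest class size by drawing with replacement."""
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    return _flatten({
        c: rows + rng.choices(rows, k=target - len(rows))
        for c, rows in groups.items()
    })


def smote(X, y, rng, k=5):
    """Simplified SMOTE: grow each class with points interpolated between a
    sample and one of its k nearest same-class neighbours.
    Classes with a single sample cannot be interpolated and are left as-is."""
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    balanced = {}
    for c, rows in groups.items():
        synthetic = []
        while len(rows) > 1 and len(rows) + len(synthetic) < target:
            base = rng.choice(rows)
            # Rank the other samples of this class by squared Euclidean distance.
            others = [r for r in rows if r is not base]
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            gap = rng.random()  # position along the segment base -> neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        balanced[c] = rows + synthetic
    return _flatten(balanced)


# Hypothetical toy training set: two numeric features, three price classes.
rng = random.Random(0)
X_train = [[float(i), float(i % 3)] for i in range(12)]
y_train = ["low"] * 7 + ["mid"] * 3 + ["high"] * 2

for name, resample in [("under", random_under_sample),
                       ("over", random_over_sample),
                       ("smote", smote)]:
    _, y_balanced = resample(X_train, y_train, rng)
    print(name, dict(Counter(y_balanced)))
```

As the abstract states, re-sampling is applied to the training data only; a held-out test set would keep its original class distribution so that the evaluation reflects the real imbalance.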
| id |
RCAP_110998fda823ad3260eeb5ea0607a15e |
|---|---|
| oai_identifier_str |
oai:ria.ua.pt:10773/39596 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| dc.title.none.fl_str_mv |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| title |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| spellingShingle |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance Alonso, Hugo Wine Classification Data imbalance Re-sampling Learning methods Predictive models |
| title_short |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| title_full |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| title_fullStr |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| title_full_unstemmed |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| title_sort |
Predicting how much a consumer is willing to pay for a bottle of wine: dealing with data imbalance |
| author |
Alonso, Hugo |
| author_facet |
Alonso, Hugo; Candeias, Teresa |
| author_role |
author |
| author2 |
Candeias, Teresa |
| author2_role |
author |
| dc.contributor.author.fl_str_mv |
Alonso, Hugo; Candeias, Teresa |
| dc.subject.por.fl_str_mv |
Wine; Classification; Data imbalance; Re-sampling; Learning methods; Predictive models |
| topic |
Wine; Classification; Data imbalance; Re-sampling; Learning methods; Predictive models |
| description |
The wine industry has become increasingly important worldwide and is one of the most significant industries in Portugal. In a previous paper, the problem of predicting how much a Portuguese consumer is willing to pay for a bottle of wine was considered for the first time. The problem was treated as a multi-class ordinal classification task. Although we achieved good prediction results overall, it was difficult to identify rare cases of consumers who are interested in paying for more expensive wines. We found that this was a direct consequence of data imbalance. Therefore, here, we present a first attempt to deal with this issue, based on the use of re-sampling strategies to balance the training data, namely random under-sampling, random over-sampling with replacement and the synthetic minority over-sampling technique. We consider several learning methods and develop various predictive models. A comparative study is carried out and its results highlight the importance of a careful choice of the re-sampling strategy and the learning method in order to get the best possible prediction results. |
| publishDate |
2023 |
| dc.date.none.fl_str_mv |
2023-10-23T14:10:18Z 2023-01-01T00:00:00Z 2023 |
| dc.type.driver.fl_str_mv |
conference object |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10773/39596 |
| url |
http://hdl.handle.net/10773/39596 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
2184-285X 10.5220/0012068800003541 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
SciTePress |
| publisher.none.fl_str_mv |
SciTePress |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833594534868549632 |