Anomaly detection on natural language processing to improve predictions on tourist preferences

Bibliographic Details
Main Author: Meira, Jorge
Publication Date: 2022
Other Authors: Carneiro, João, Bolón-Canedo, Verónica, Alonso-Betanzos, Amparo, Novais, Paulo, Marreiros, Goreti
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/1822/79084
Summary: Argumentation-based dialogue models have shown to be appropriate for decision contexts in which it is intended to overcome the lack of interaction between decision-makers, either because they are dispersed, they are too many, or they are simply not even known. However, to support decision processes with argumentation-based dialogue models, it is necessary to have knowledge of certain aspects that are specific to each decision-maker, such as preferences, interests, and limitations, among others. Failure to obtain this knowledge could ruin the model’s success. In this work, we sought to facilitate the information acquisition process by studying strategies to automatically predict the tourists’ preferences (ratings) in relation to points of interest based on their reviews. We explored different Machine Learning methods to predict users’ ratings. We used Natural Language Processing strategies to predict whether a review is positive or negative and the rating assigned by users on a scale of 1 to 5. We then applied supervised methods such as Logistic Regression, Random Forest, Decision Trees, K-Nearest Neighbors, and Recurrent Neural Networks to determine whether a tourist likes/dislikes a given point of interest. We also used a distinctive approach in this field through unsupervised techniques for anomaly detection problems. The goal was to improve the supervised model in identifying only those tourists who truly like or dislike a particular point of interest, in which the main objective is not to identify everyone, but fundamentally not to fail those who are identified in those conditions. The experiments carried out showed that the developed models could predict with high accuracy whether a review is positive or negative but have some difficulty in accurately predicting the rating assigned by users. Unsupervised method Local Outlier Factor improved the results, reducing Logistic Regression false positives with an associated cost of increasing false negatives.
id RCAP_d0ff2fb392f7920999dbc8e017aff55c
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/79084
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling Anomaly detection on natural language processing to improve predictions on tourist preferencesMachine LearningNatural Language ProcessingSentiment analysisArgumentation-based dialoguesTourismTripAdvisorScience & TechnologyArgumentation-based dialogue models have shown to be appropriate for decision contexts in which it is intended to overcome the lack of interaction between decision-makers, either because they are dispersed, they are too many, or they are simply not even known. However, to support decision processes with argumentation-based dialogue models, it is necessary to have knowledge of certain aspects that are specific to each decision-maker, such as preferences, interests, and limitations, among others. Failure to obtain this knowledge could ruin the model’s success. In this work, we sought to facilitate the information acquisition process by studying strategies to automatically predict the tourists’ preferences (ratings) in relation to points of interest based on their reviews. We explored different Machine Learning methods to predict users’ ratings. We used Natural Language Processing strategies to predict whether a review is positive or negative and the rating assigned by users on a scale of 1 to 5. We then applied supervised methods such as Logistic Regression, Random Forest, Decision Trees, K-Nearest Neighbors, and Recurrent Neural Networks to determine whether a tourist likes/dislikes a given point of interest. We also used a distinctive approach in this field through unsupervised techniques for anomaly detection problems. The goal was to improve the supervised model in identifying only those tourists who truly like or dislike a particular point of interest, in which the main objective is not to identify everyone, but fundamentally not to fail those who are identified in those conditions. The experiments carried out showed that the developed models could predict with high accuracy whether a review is positive or negative but have some difficulty in accurately predicting the rating assigned by users. Unsupervised method Local Outlier Factor improved the results, reducing Logistic Regression false positives with an associated cost of increasing false negatives.This work was supported by the GrouPlanner Project under the European Regional Development Fund POCI-01-0145-FEDER-29178 and by National Funds through the FCT—Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within the Projects UIDB/00319/2020 and UIDP/00760/2020.Multidisciplinary Digital Publishing Institute (MDPI)Universidade do MinhoMeira, JorgeCarneiro, JoãoBolón-Canedo, VerónicaAlonso-Betanzos, AmparoNovais, PauloMarreiros, Goreti2022-03-032022-03-03T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttps://hdl.handle.net/1822/79084engMeira, J.; Carneiro, J.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Novais, P.; Marreiros, G. Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences. Electronics 2022, 11, 779. https://doi.org/10.3390/electronics110507792079-929210.3390/electronics11050779779https://www.mdpi.com/2079-9292/11/5/779info:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-05-11T06:13:31Zoai:repositorium.sdum.uminho.pt:1822/79084Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T15:45:34.291378Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv Anomaly detection on natural language processing to improve predictions on tourist preferences
title Anomaly detection on natural language processing to improve predictions on tourist preferences
spellingShingle Anomaly detection on natural language processing to improve predictions on tourist preferences
Meira, Jorge
Machine Learning
Natural Language Processing
Sentiment analysis
Argumentation-based dialogues
Tourism
TripAdvisor
Science & Technology
title_short Anomaly detection on natural language processing to improve predictions on tourist preferences
title_full Anomaly detection on natural language processing to improve predictions on tourist preferences
title_fullStr Anomaly detection on natural language processing to improve predictions on tourist preferences
title_full_unstemmed Anomaly detection on natural language processing to improve predictions on tourist preferences
title_sort Anomaly detection on natural language processing to improve predictions on tourist preferences
author Meira, Jorge
author_facet Meira, Jorge
Carneiro, João
Bolón-Canedo, Verónica
Alonso-Betanzos, Amparo
Novais, Paulo
Marreiros, Goreti
author_role author
author2 Carneiro, João
Bolón-Canedo, Verónica
Alonso-Betanzos, Amparo
Novais, Paulo
Marreiros, Goreti
author2_role author
author
author
author
author
dc.contributor.none.fl_str_mv Universidade do Minho
dc.contributor.author.fl_str_mv Meira, Jorge
Carneiro, João
Bolón-Canedo, Verónica
Alonso-Betanzos, Amparo
Novais, Paulo
Marreiros, Goreti
dc.subject.por.fl_str_mv Machine Learning
Natural Language Processing
Sentiment analysis
Argumentation-based dialogues
Tourism
TripAdvisor
Science & Technology
topic Machine Learning
Natural Language Processing
Sentiment analysis
Argumentation-based dialogues
Tourism
TripAdvisor
Science & Technology
description Argumentation-based dialogue models have shown to be appropriate for decision contexts in which it is intended to overcome the lack of interaction between decision-makers, either because they are dispersed, they are too many, or they are simply not even known. However, to support decision processes with argumentation-based dialogue models, it is necessary to have knowledge of certain aspects that are specific to each decision-maker, such as preferences, interests, and limitations, among others. Failure to obtain this knowledge could ruin the model’s success. In this work, we sought to facilitate the information acquisition process by studying strategies to automatically predict the tourists’ preferences (ratings) in relation to points of interest based on their reviews. We explored different Machine Learning methods to predict users’ ratings. We used Natural Language Processing strategies to predict whether a review is positive or negative and the rating assigned by users on a scale of 1 to 5. We then applied supervised methods such as Logistic Regression, Random Forest, Decision Trees, K-Nearest Neighbors, and Recurrent Neural Networks to determine whether a tourist likes/dislikes a given point of interest. We also used a distinctive approach in this field through unsupervised techniques for anomaly detection problems. The goal was to improve the supervised model in identifying only those tourists who truly like or dislike a particular point of interest, in which the main objective is not to identify everyone, but fundamentally not to fail those who are identified in those conditions. The experiments carried out showed that the developed models could predict with high accuracy whether a review is positive or negative but have some difficulty in accurately predicting the rating assigned by users. Unsupervised method Local Outlier Factor improved the results, reducing Logistic Regression false positives with an associated cost of increasing false negatives.
publishDate 2022
dc.date.none.fl_str_mv 2022-03-03
2022-03-03T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://hdl.handle.net/1822/79084
url https://hdl.handle.net/1822/79084
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv Meira, J.; Carneiro, J.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Novais, P.; Marreiros, G. Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences. Electronics 2022, 11, 779. https://doi.org/10.3390/electronics11050779
2079-9292
10.3390/electronics11050779
779
https://www.mdpi.com/2079-9292/11/5/779
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Multidisciplinary Digital Publishing Institute (MDPI)
publisher.none.fl_str_mv Multidisciplinary Digital Publishing Institute (MDPI)
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833595521198981120