Text Mining Techniques for Car Price Prediction
| Main author: | Gonçalves, Ricardo Miguel Galvão |
|---|---|
| Publication date: | 2022 |
| Document type: | Dissertation |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10362/135551 |
| Abstract: | Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
| id |
RCAP_897930f8a73a178a83a41e6cab8edf06 |
|---|---|
| oai_identifier_str |
oai:run.unl.pt:10362/135551 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Text Mining Techniques for Car Price Prediction; Text Mining; Regression Analysis; Car Price Prediction; Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Modern data sources routinely contain information in both unstructured and structured forms, combining text with the usual numerical and categorical data. For instance, on websites dedicated to buying and selling cars, listings typically include a textual description of the car. Others also include a detailed list of numerical or categorical attributes, such as the car's total mileage in kilometers or its model. In this work project we apply text mining techniques to create predictors for car price regression from unstructured data, namely the textual description in car listings. Two types of predictors were studied: the tf-idf features obtained from the n-gram count matrix, and the singular vectors derived from the decomposition of the tf-idf matrix. We also examine the effect of reducing the vocabulary dimension by applying stemming, lemmatization, or neither. We further compare the effects of building the initial n-gram count matrix with only unigrams, with unigrams and bigrams, or with only bigrams. Our regression experiment shows that Support Vector Regression performs best at car price prediction using text data as predictors, with R² = 0.77, MSE = 0.19 and MAE = 0.32. These results can be seen as respectable given the complex nature of the task.
Henriques, Roberto André Pereira; RUN; Gonçalves, Ricardo Miguel Galvão; 2022-03-30T16:58:48Z; 2022-03-02; 2022-03-02T00:00:00Z; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/masterThesis; application/pdf; http://hdl.handle.net/10362/135551; TID:202979733; eng; info:eu-repo/semantics/openAccess; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP; 2024-05-22T18:00:46Z; oai:run.unl.pt:10362/135551; Portal Agregador; ONG; https://www.rcaap.pt/oai/openaire; info@rcaap.pt; opendoar:https://opendoar.ac.uk/repository/7160; 2025-05-28T17:31:49.981787; Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; false |
| dc.title.none.fl_str_mv |
Text Mining Techniques for Car Price Prediction |
| author |
Gonçalves, Ricardo Miguel Galvão |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Henriques, Roberto André Pereira RUN |
| dc.contributor.author.fl_str_mv |
Gonçalves, Ricardo Miguel Galvão |
| dc.subject.por.fl_str_mv |
Text Mining Regression Analysis Car Price Prediction |
| topic |
Text Mining Regression Analysis Car Price Prediction |
| description |
Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science |
| publishDate |
2022 |
| dc.date.none.fl_str_mv |
2022-03-30T16:58:48Z 2022-03-02 2022-03-02T00:00:00Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/135551 TID:202979733 |
| url |
http://hdl.handle.net/10362/135551 |
| identifier_str_mv |
TID:202979733 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833596760544509952 |
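
The abstract describes building tf-idf predictors from an n-gram count matrix made of unigrams, bigrams, or both. The following is a minimal sketch of that idea in plain Python, not the thesis's actual implementation: the function names `ngrams` and `tfidf_matrix`, the whitespace tokenizer, the smoothed idf variant, and the sample listings are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """Return all n-grams (space-joined strings) for n in [n_min, n_max]."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_matrix(docs, n_min=1, n_max=2):
    """Build a dense tf-idf matrix from the n-gram counts of each document.

    Assumptions: lowercase whitespace tokenization, raw counts as term
    frequency, and a smoothed idf of ln((1 + N) / (1 + df)) + 1.
    """
    counts = [Counter(ngrams(d.lower().split(), n_min, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: number of documents that contain each n-gram.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    # A Counter returns 0 for missing n-grams, so absent terms score 0.
    rows = [[c[t] * idf[t] for t in vocab] for c in counts]
    return vocab, rows


# Hypothetical listing descriptions, invented for this example.
listings = ["diesel low mileage", "diesel one owner"]
vocab, rows = tfidf_matrix(listings, n_min=1, n_max=2)
for term, weight in zip(vocab, rows[0]):
    print(f"{term!r}: {weight:.3f}")
```

Setting `n_min=1, n_max=1` gives the unigram-only variant and `n_min=2, n_max=2` the bigram-only variant mentioned in the abstract. The rows could then feed a regressor directly or be reduced first with a singular value decomposition.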