Web scraping and analysis of car data

Bibliographic details
Main author: Silva, João Luís Magalhães da
Publication date: 2024
Document type: Dissertation
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10400.22/26596
Abstract: The growth of online car marketplaces, together with frequent price fluctuations and increasing reliance on digital channels, has made it difficult to gather and analyze car data efficiently. This thesis addresses the problem by combining web scraping with data analysis to support market insights. It reviews web scraping tools such as BeautifulSoup, Requests, and Selenium, alongside data analysis libraries such as Pandas. A system was developed to scrape car data from Standvirtual and analyze key attributes such as price and mileage. The data was processed with Python tools, and a Flask-based server application was built to provide easy access, with offline analysis supported through Excel. Challenges such as incomplete data and anti-scraping measures were resolved with advanced extraction techniques and error handling. Proposed improvements include optimizing the scraping process and integrating machine learning models for more accurate price predictions. Overall, the project demonstrates the potential of web scraping for car market analysis and provides a foundation for future predictive analytics and real-time data applications.
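The extract-then-analyze flow summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the markup in `SAMPLE_HTML`, the `data-price` and `data-mileage` attributes, and the names `ListingParser`, `to_int`, `parse_listings`, and `summarize` are all hypothetical, and the standard-library `html.parser` and `statistics` modules stand in for BeautifulSoup and Pandas so the example runs without third-party packages. Incomplete records are kept as `None` and skipped during analysis, which is one possible way to handle the missing data the abstract mentions.

```python
from html.parser import HTMLParser
from statistics import mean, median

# Hypothetical listing markup. The real Standvirtual page structure is not
# described in the abstract and will differ.
SAMPLE_HTML = """
<article class="listing" data-price="15500" data-mileage="82000"><h2>Car A</h2></article>
<article class="listing" data-price="9900" data-mileage="143000"><h2>Car B</h2></article>
<article class="listing" data-price="" data-mileage="60000"><h2>Car C</h2></article>
"""


def to_int(value):
    """Convert an attribute value to int, returning None when missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


class ListingParser(HTMLParser):
    """Collect price and mileage from <article class="listing"> tags (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag != "article" or attr.get("class") != "listing":
            return
        self.rows.append({
            "price": to_int(attr.get("data-price")),
            "mileage": to_int(attr.get("data-mileage")),
        })


def parse_listings(html):
    """Return one dict per listing found in the HTML string."""
    parser = ListingParser()
    parser.feed(html)
    return parser.rows


def summarize(rows):
    """Compute simple statistics over listings that have both fields.

    Raises statistics.StatisticsError if no listing is complete.
    """
    complete = [r for r in rows if r["price"] is not None and r["mileage"] is not None]
    return {
        "listings": len(rows),
        "complete": len(complete),
        "mean_price": mean(r["price"] for r in complete),
        "median_mileage": median(r["mileage"] for r in complete),
    }


rows = parse_listings(SAMPLE_HTML)
summary = summarize(rows)
print(summary)
```

In a setup closer to the one the abstract describes, the HTML would presumably come from an HTTP client such as Requests and the rows would be loaded into a Pandas DataFrame; those steps are left out here because the abstract gives no details about them.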
id RCAP_2a163f4d6e9406f98e5f3ae75f118a26
oai_identifier_str oai:recipp.ipp.pt:10400.22/26596
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Web scraping and analysis of car data
Agregamento e análise de dados de automóveis
title Web scraping and analysis of car data
author Silva, João Luís Magalhães da
author_facet Silva, João Luís Magalhães da
author_role author
dc.contributor.none.fl_str_mv Araújo, Susana Cláudia Nicola de
REPOSITÓRIO P.PORTO
dc.contributor.author.fl_str_mv Silva, João Luís Magalhães da
dc.subject.por.fl_str_mv Web scraping
Data analysis
Application
Data extraction
Data insertion
Python libraries
Car data
Agregamento de dados
Análise de dados
Aplicação
Extração de dados
Inserção de dados
Bibliotecas Python
Dados de automóveis
publishDate 2024
dc.date.none.fl_str_mv 2024-12-02T16:09:09Z
2024-10-29
2024-10-29T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.22/26596
urn:tid:203733142
url http://hdl.handle.net/10400.22/26596
identifier_str_mv urn:tid:203733142
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833600801485881344