A Google trends spatial clustering approach for a worldwide Twitter user geolocation
| Main Author: | Zola, Paola |
|---|---|
| Publication Date: | 2020 |
| Other Authors: | Ragno, Costantino; Cortez, Paulo |
| Format: | Article |
| Language: | eng |
| Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Download full: | https://hdl.handle.net/1822/66815 |
| Summary: | User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating a worldwide city-level Twitter user location considering only historical tweets. We propose a purely unsupervised approach that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically by using a recently collected dataset, with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast, and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively. |
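The summary reports results as the share of users whose predicted location falls within a tolerance distance of the ground truth. A minimal Python sketch of that kind of metric is given here for illustration. It assumes great-circle (haversine) distance and (latitude, longitude) pairs in degrees; the function names `haversine_km` and `accuracy_within` and the sample coordinates are hypothetical, and the record does not state how the authors computed distances.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predicted, truth, tolerances_km=(250, 500, 1000, 2000)):
    """Fraction of users whose prediction lies within each tolerance distance.

    `predicted` and `truth` are equal-length lists of (lat, lon) tuples.
    """
    if len(predicted) != len(truth) or not truth:
        raise ValueError("predicted and truth must be non-empty and of equal length")
    distances = [haversine_km(*p, *t) for p, t in zip(predicted, truth)]
    return {
        tol: sum(d <= tol for d in distances) / len(distances)
        for tol in tolerances_km
    }


# Hypothetical example: two users whose true location is Lisbon;
# one is predicted in Lisbon, the other in Porto.
lisbon = (38.7223, -9.1393)
porto = (41.1579, -8.6291)
scores = accuracy_within([lisbon, porto], [lisbon, lisbon])
print(scores)
```

With this definition the scores are non-decreasing in the tolerance, which matches the pattern of the reported figures (15% at 250 km up to 58% at 2,000 km).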
| id |
RCAP_df95b87eebbb7f15bc837af2eb6619a9 |
|---|---|
| oai_identifier_str |
oai:repositorium.sdum.uminho.pt:1822/66815 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation City-level geolocation Clustering Google Trends Natural language processing Twitter Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática Science & Technology User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating a worldwide city-level Twitter user location considering only historical tweets. We propose a purely unsupervised approach that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically by using a recently collected dataset, with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast, and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively. The work of P. Cortez was supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. We would also like to thank the anonymous reviewers for their helpful suggestions. Elsevier Universidade do Minho Zola, Paola Ragno, Costantino Cortez, Paulo 2020 2020-01-01T00:00:00Z info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/article application/pdf https://hdl.handle.net/1822/66815 eng 0306-4573 10.1016/j.ipm.2020.102312 https://www.sciencedirect.com/science/article/pii/S0306457320308074 info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2025-04-12T05:19:34Z oai:repositorium.sdum.uminho.pt:1822/66815 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-28T16:23:34.903595 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false |
| dc.title.none.fl_str_mv |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| title |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| spellingShingle |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation Zola, Paola City-level geolocation Clustering Google Trends Natural language processing Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática Science & Technology |
| title_short |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| title_full |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| title_fullStr |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| title_full_unstemmed |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| title_sort |
A Google trends spatial clustering approach for a worldwide Twitter user geolocation |
| author |
Zola, Paola |
| author_facet |
Zola, Paola Ragno, Costantino Cortez, Paulo |
| author_role |
author |
| author2 |
Ragno, Costantino Cortez, Paulo |
| author2_role |
author author |
| dc.contributor.none.fl_str_mv |
Universidade do Minho |
| dc.contributor.author.fl_str_mv |
Zola, Paola Ragno, Costantino Cortez, Paulo |
| dc.subject.por.fl_str_mv |
City-level geolocation Clustering Google Trends Natural language processing Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática Science & Technology |
| topic |
City-level geolocation Clustering Google Trends Natural language processing Engenharia e Tecnologia::Engenharia Eletrotécnica, Eletrónica e Informática Science & Technology |
| description |
User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating a worldwide city-level Twitter user location considering only historical tweets. We propose a purely unsupervised approach that is based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns and three clustering algorithms. The approach was validated empirically by using a recently collected dataset, with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast, and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively. |
| publishDate |
2020 |
| dc.date.none.fl_str_mv |
2020 2020-01-01T00:00:00Z |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/article |
| format |
article |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://hdl.handle.net/1822/66815 |
| url |
https://hdl.handle.net/1822/66815 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
0306-4573 10.1016/j.ipm.2020.102312 https://www.sciencedirect.com/science/article/pii/S0306457320308074 |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
Elsevier |
| publisher.none.fl_str_mv |
Elsevier |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833595917108772864 |