Bibliographic details
Defense year: |
2024 |
Main author: |
Melo, Wilken Charles Dantas de |
Advisor: |
Not provided by the institution |
Defense committee: |
Not provided by the institution |
Document type: |
Dissertation
|
Access type: |
Open access |
Language: |
por |
Defense institution: |
Not provided by the institution
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
http://repositorio.ufc.br/handle/riufc/76829
|
Abstract: |
The proliferation of electronic devices capable of providing geospatial information, such as cell phones, automobiles, and personal devices, has led to an unprecedented increase in the generation of trajectory data. This data is crucial for various machine learning domains, especially mobility analysis. The present work focuses on an inherent problem in this domain: the evaluation of trajectory similarity. Recent research seeks to transform trajectories into embeddings, compact vector representations that can efficiently capture the characteristics of paths. The fundamental idea is that similar trajectories have nearby embeddings in the vector space. Although there are several deep learning methods that generate trajectory embeddings, this work focuses on those that first discretize trajectories using a uniform grid and then generate the embeddings. In this context, the t2vec approach is taken as the reference. A parallel field is Natural Language Processing (NLP), which involves converting vast textual corpora into numerical vectors that can capture a variety of semantic contexts. Advances in more robust language models, such as BERT and GPT, demonstrate their remarkable ability to mimic human reasoning. This suggests a potential alternative for capturing spatiotemporal mobility patterns, a relatively unexplored research direction. The present work investigates whether language models can be repurposed to produce high-quality embeddings for trajectories. In particular, it is argued that adequately discretized trajectories can be treated as words or sentences, thus allowing language models to identify patterns and relationships in this data. In the experimental evaluation, two public trajectory datasets (Porto and T-Drive) were considered. The performance of four well-established language models (Word2Vec, Doc2Vec, BERT, and SBERT) was then compared with that of t2vec. Classical similarity methods were also considered to provide a more comprehensive comparison. The results indicate that language models, when trained on datasets with dense trajectories, can generate higher-quality embeddings than t2vec, thus highlighting the strong potential of these approaches. |
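
The discretization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `trajectory_to_tokens`, the grid origin, the cell size, the token format, and the choice to collapse consecutive repeated cells are all assumptions made for illustration. It only shows how a trajectory of (longitude, latitude) points might become a "sentence" of grid-cell tokens that a language model could consume.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def trajectory_to_tokens(
    points: List[Point],
    min_lon: float,
    min_lat: float,
    cell_size: float,
    n_cols: int,
) -> List[str]:
    """Map each point to its uniform-grid cell and return the cell IDs as tokens.

    All parameters are hypothetical: (min_lon, min_lat) is the grid origin,
    cell_size is the side of a square cell in degrees, and n_cols is the
    number of columns used to flatten (row, col) into a single cell ID.
    """
    tokens: List[str] = []
    for lon, lat in points:
        col = int((lon - min_lon) // cell_size)
        row = int((lat - min_lat) // cell_size)
        token = f"c{row * n_cols + col}"
        # Assumption: collapse consecutive repeats so that several samples
        # inside the same cell yield a single token.
        if not tokens or tokens[-1] != token:
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    # Toy trajectory on a 10-column grid with unit cells starting at (0, 0).
    toy = [(0.5, 0.5), (0.7, 0.6), (1.2, 0.9), (1.5, 1.5)]
    print(trajectory_to_tokens(toy, 0.0, 0.0, 1.0, 10))
```

Token sequences of this kind could then be passed to a word- or sentence-embedding model in place of natural-language text. How the cell size is chosen and how out-of-grid points are handled are left open here.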