Bibliographic details
Year of defense: |
2023 |
Main author: |
Mesquita, Kaio Gefferson de Almeida |
Advisor: |
Not provided by the institution |
Examination committee: |
Not provided by the institution |
Document type: |
Master's thesis
|
Access type: |
Open access |
Language: |
por |
Defending institution: |
Not provided by the institution
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
http://repositorio.ufc.br/handle/riufc/74404
|
Abstract: |
In recent years, automatic fare collection (electronic ticketing) and automatic vehicle location data have been used to support the planning and analysis of public transportation systems. However, using this massive data poses several challenges, such as identifying boarding and alighting locations from system usage patterns in open systems with multimodal, trunk-and-feeder networks. The objective of this work is to develop a method that uses machine learning to identify the boarding locations of trips with habitual usage patterns, based on the Big Data of the Integrated Public Transportation System of Fortaleza (SIT-FOR). The specific objectives are: (i) to consolidate a management structure for the SIT-FOR Big Data; (ii) to identify the habitual patterns of system use through data mining methods; and (iii) to analyze, through supervised modeling, how these habitual patterns can help identify boarding locations. The working hypothesis is that recurring spatial or temporal patterns of system usage allow travel attributes to be identified in the data. To this end, the data was processed and integrated into a single relational database. The usage patterns were identified with a clustering technique (K-means), which made it possible to assess how different attributes influence the formation of each group. Based on the identified patterns, different supervised models (Naive Bayes, Random Forest, Neural Network) were applied to predict the probability of a user validating when boarding the first trip of the day. Four habitual usage patterns were identified, characterized by temporal and spatial aspects. Among the supervised models fitted separately for each group, Random Forest performed best, with accuracies between 0.58 and 0.67.
The modeling results mainly indicated that segmenting the data into habitual patterns improved model performance, supporting the hypothesis that understanding different usage patterns can help identify travel attributes in ticketing data. |