Method for identifying usage patterns and boarding locations from public transport Big Data: a Machine Learning-based approach

Bibliographic details
Year of defense: 2023
Main author: Mesquita, Kaio Gefferson de Almeida
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: por
Institution of defense: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://repositorio.ufc.br/handle/riufc/74404
Abstract: In recent years, automated fare collection (electronic ticketing) and automatic vehicle location data have been exploited to support public transportation system planning and analysis. However, using this massive data poses several challenges, such as identifying boarding and alighting locations in open systems with multimodal, trunk-and-feeder networks from system usage patterns. The objective of this work is to develop a method that uses machine learning to identify the boarding locations of trips from the Big Data of the Integrated Public Transportation System of Fortaleza (SIT-FOR), for habitual patterns of system use. The specific objectives are: (i) to consolidate a management structure for the SIT-FOR Big Data; (ii) to identify the habitual patterns of system use through data mining methods; and (iii) to analyze through supervised modeling how the habitual patterns can help identify boarding locations. The underlying hypothesis is that recurring spatial or temporal patterns of system usage allow travel attributes to be identified in the data. To that end, the Big Data was processed and integrated into a single relational database. The usage patterns were identified with a clustering technique (K-means), which made it possible to assess how different attributes influence the formation of each group. From the identified patterns, different supervised models (Naive Bayes, Random Forest, Neural Network) were applied to predict the probability of a user validating when boarding the first trip of the day. As a result, 4 habitual usage patterns were identified, characterized by temporal and spatial aspects. Finally, among the supervised models segregated by group, Random Forest performed best (accuracies between 0.58 and 0.67).
The modeling results mainly indicated that segmenting the data into habitual patterns improved the performance of the models, supporting the hypothesis that understanding different usage patterns can aid the identification of travel attributes in ticketing data.
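
The segment-then-model idea described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes synthetic data and two hypothetical features per user (typical hour of the first validation and a regularity score). A majority-class baseline stands in for the Random Forest, and the K-means initialization is a deterministic simplification. The names `kmeans`, `majority_accuracy`, `features`, and `targets` are illustrative only.

```python
from collections import Counter, defaultdict


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per point.

    Assumption: initial centroids are evenly spaced picks from the sorted
    points (a deterministic simplification that requires k >= 2).
    """
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def majority_accuracy(targets):
    """Accuracy of always predicting the most frequent class."""
    return Counter(targets).most_common(1)[0][1] / len(targets)


# Synthetic, hypothetical users: (hour of first validation, regularity score).
# Target: 1 if the user validates when boarding the first trip of the day.
features, targets = [], []
for i in range(40):  # morning riders, mostly validating at boarding
    features.append((6.5 + (i % 5) * 0.25, 0.80 + (i % 3) * 0.05))
    targets.append(0 if i % 5 == 0 else 1)
for i in range(40):  # evening riders, mostly not validating at boarding
    features.append((17.0 + (i % 5) * 0.25, 0.30 + (i % 3) * 0.05))
    targets.append(1 if i % 4 == 0 else 0)

labels = kmeans(features, k=2)

# One global baseline versus one baseline per usage-pattern cluster.
global_acc = majority_accuracy(targets)
groups = defaultdict(list)
for lab, t in zip(labels, targets):
    groups[lab].append(t)
correct = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
segmented_acc = correct / len(targets)

print(f"global baseline accuracy:    {global_acc:.3f}")
print(f"segmented baseline accuracy: {segmented_acc:.3f}")
```

In this toy setting the per-cluster baseline can never score below the global one, which mirrors only the direction of the reported finding. With real ticketing data the features would normally be standardized before clustering, and a proper classifier would be trained and evaluated on held-out data within each cluster.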