Estudo da influência de diversas medidas de similaridade na previsão de séries temporais utilizando o algoritmo KNN-TSP

Detalhes bibliográficos
Ano de defesa: 2012
Autor(a) principal: Aikes Junior, Jorge lattes
Orientador(a): Lee, Huei Diana lattes
Banca de defesa: Lotero, Roberto Cayetano lattes, Batista, Gustavo Enrique Almeida Prado Alves lattes
Tipo de documento: Dissertação
Tipo de acesso: Acesso aberto
Idioma: por
Instituição de defesa: Universidade Estadual do Oeste do Parana
Foz do Iguaçu
Programa de Pós-Graduação: Programa de Pós-Graduação em Engenharia de Sistemas Dinâmicos e Energéticos
Departamento: Centro de Engenharias e Ciências Exatas
País: BR
Palavras-chave em Português:
Palavras-chave em Inglês:
Área do conhecimento CNPq:
Link de acesso: http://tede.unioeste.br:8080/tede/handle/tede/1084
Resumo: Time series can be understood as any set of observations which are time ordered. Among the many possible tasks appliable to temporal data, one that has attracted increasing interest, due to its various applications, is the time series forecasting. The k-Nearest Neighbor - Time Series Prediction (kNN-TSP) algorithm is a non-parametric method for forecasting time series. One of its advantages, is its easiness application when compared to parametric methods. Even though its easier to define kNN-TSP s parameters, some issues remain opened. This research is focused on the study of one of these parameters: the similarity measure. This parameter was empirically evaluated using various similarity measures in a large set of time series, including artificial series with seasonal and chaotic characteristics, and several real world time series. It was also carried out a case study comparing the predictive accuracy of the kNN-TSP algorithm with the Moving Average (MA), univariate Seasonal Auto-Regressive Integrated Moving Average (SARIMA) and multivariate SARIMA methods in a time series of a Korean s hospital daily patients flow in the Emergency Department. This work also proposes an approach to the development of a hybrid similarity measure which combines characteristics from several measures. The research s result demonstrated that the Lp Norm s measures have an advantage over other measures evaluated, due to its lower computational cost and for providing, in general, greater accuracy in temporal data forecasting using the kNN-TSP algorithm. Although the literature in general adopts the Euclidean similarity measure to calculate de similarity between time series, the Manhattan s distance can be considered an interesting candidate for defining similarity, due to the absence of statistical significant difference and to its lower computational cost when compared to the Euclidian measure. The measure proposed in this work does not show significant results, but it is promising for further research. Regarding the case study, the kNN-TSP algorithm with only the similarity measure parameter optimized achieves a considerably lower error than the MA s best configuration, and a slightly greater error than the univariate e multivariate SARIMA s optimal settings presenting less than one percent of difference.