Enriquecendo a previsão de séries temporais usando informação textual
Ano de defesa: | 2021 |
---|---|
Autor(a) principal: | |
Orientador(a): | |
Banca de defesa: | |
Tipo de documento: | Dissertação |
Tipo de acesso: | Acesso aberto |
Idioma: | por |
Instituição de defesa: |
Universidade Federal de São Carlos
Câmpus São Carlos |
Programa de Pós-Graduação: |
Programa de Pós-Graduação em Ciência da Computação - PPGCC
|
Departamento: |
Não Informado pela instituição
|
País: |
Não Informado pela instituição
|
Palavras-chave em Português: | |
Palavras-chave em Inglês: | |
Área do conhecimento CNPq: | |
Link de acesso: | https://repositorio.ufscar.br/handle/ufscar/14258 |
Resumo: | The ability to extract knowledge and forecast stock trends is crucial to mitigate investors' risks and uncertainties in the market. The stock trend is affected by non-linearity, complexity, noise, and especially the surrounding events. External factors such as daily news became one of the investors' primary resources for making decisions about buying or selling assets. However, this kind of information appears very fast. There are thousands of news generated by numerous web sources, taking a long time to analyze them, which can cost millions of dollars losses for investors due to a late decision. Recent contextual language models have transformed the area of natural language processing. However, classification models that use news that influence stock values need to deal with the unlabeled, class imbalance, and dissimilar texts. Recent studies show that the prediction of time series substantially improves by considering external information. This work proposes a hybrid methodology with three phases, one for news mining, a model for representation compact features, and the forecast model of time series, which merge for a more accurate prediction of prices. Initially, a small corpus is built using as support the time series. After that, we label the corpus based on semi-supervised learning to assign labels to other unlabeled news. In the second phase, the mining model with a classifier is used, whose output is concatenated with time series features, so the compact model representation extracts new features in a latent space. Finally, we predicted future prices with this fused knowledge. In a case study with Bitcoin cryptocurrency, the proposed methodology achieved a 1.62% decrease in the mean absolute percentage error. |