Planning in stochastic computation graphs: solving stochastic nonlinear problems with backpropagation

Bueno, Thiago Pereira

Bibliographic details
Defense year:	2021
Main author:	Bueno, Thiago Pereira
Advisor:	Not provided by the institution
Examining committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	English
Defense institution:	Biblioteca Digital de Teses e Dissertações da USP
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Deep learning; Deep neural nets; Information relaxation; Markov Decision Process (MDP); Policy search; Probabilistic planning; Stochastic computation graphs; Stochastic gradient descent; Trajectory optimization
Access link:	https://www.teses.usp.br/teses/disponiveis/45/45134/tde-29102021-133418/
Abstract:	Deep learning has achieved remarkable success in a range of complex perception tasks, games, and other real-world applications. At a high level, it can be argued that the main reason behind the astonishing performance of deep neural networks is stochastic gradient descent, which relies on gradients computed by the well-known error backpropagation algorithm. Inspired by these recent applications of deep learning, we investigate the opportunities and challenges of adapting the backpropagation algorithm as a planning technique for continuous sequential decision-making problems. We make the key observation that if a differentiable model of a system's dynamics is available, an autonomous agent can leverage the advanced gradient-based optimizers developed for learning algorithms to solve long-horizon planning problems. Besides reformulating the recently proposed deterministic planning-through-backpropagation algorithm as a gradient-based trajectory optimization technique, we propose several extensions to the more general setting of stochastic decision processes in AI planning. In particular, we propose a framework, based on stochastic computation graphs and the reparameterization trick, to train Deep Reactive Policies offline for fast decision-making. In addition, we investigate how the duality theory of information relaxation can be adapted to obtain a gradient-based online planning algorithm that interleaves optimization and execution. Empirical experiments show the effectiveness of the proposed approaches in a variety of sequential decision-making problems with nonlinear dynamics and stochastic exogenous events, such as path planning, multi-reservoir control, and HVAC control.
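The abstract's key observation (a differentiable dynamics model lets a planner reuse gradient-based optimizers), combined with the reparameterization trick, can be illustrated on a toy problem. What follows is a minimal Python sketch under stated assumptions, not the thesis's implementation: a hypothetical 1-D system x[t+1] = x[t] + a[t] + sigma * eps[t] with eps[t] ~ N(0, 1), a quadratic cost, and an open-loop action sequence. Because eps[t] is sampled independently of the actions, each sampled trajectory is a deterministic, differentiable function of the actions, so the gradient of a Monte Carlo cost estimate can be obtained by backpropagating through the rollout. The reverse pass is written by hand as an adjoint recursion instead of using an autodiff library, and the names `rollout_and_grad` and `plan`, as well as all parameter values, are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def rollout_and_grad(actions: Sequence[float], x0: float, goal: float,
                     noise: Sequence[float], sigma: float,
                     lam: float) -> Tuple[float, List[float]]:
    """Simulate one reparameterized trajectory and backpropagate its cost.

    Hypothetical 1-D dynamics (an assumption for illustration):
        x[t+1] = x[t] + a[t] + sigma * noise[t]
    Cost: sum over t of (x[t+1] - goal)**2 + lam * a[t]**2.
    """
    horizon = len(actions)
    # Forward pass: the noise is a fixed input, so every state is a
    # deterministic, differentiable function of the actions.
    xs = [x0]
    for t in range(horizon):
        xs.append(xs[t] + actions[t] + sigma * noise[t])
    cost = sum((xs[t + 1] - goal) ** 2 + lam * actions[t] ** 2
               for t in range(horizon))

    # Backward pass (reverse mode): `adj` holds dCost/dx[t+1], accumulated
    # from the cost term at step t and from every later step.
    grads = [0.0] * horizon
    adj = 0.0
    for t in reversed(range(horizon)):
        adj += 2.0 * (xs[t + 1] - goal)           # local cost term at x[t+1]
        grads[t] = adj + 2.0 * lam * actions[t]  # dx[t+1]/da[t] = 1
        # dx[t+1]/dx[t] = 1, so `adj` flows back to x[t] unchanged.
    return cost, grads


def plan(x0: float, goal: float, horizon: int = 5, sigma: float = 0.05,
         lam: float = 0.01, batch: int = 64, steps: int = 300,
         lr: float = 0.05, seed: int = 0) -> List[float]:
    """Optimize an open-loop action sequence with stochastic gradient
    descent on a Monte Carlo estimate of the expected cost."""
    rng = random.Random(seed)
    actions = [0.0] * horizon
    for _step in range(steps):
        avg_grad = [0.0] * horizon
        for _sample in range(batch):
            # Sample the exogenous noise independently of the actions.
            noise = [rng.gauss(0.0, 1.0) for _ in range(horizon)]
            _, g = rollout_and_grad(actions, x0, goal, noise, sigma, lam)
            for t in range(horizon):
                avg_grad[t] += g[t] / batch
        actions = [a - lr * g for a, g in zip(actions, avg_grad)]
    return actions


if __name__ == "__main__":
    planned = plan(x0=0.0, goal=1.0)
    print([round(a, 3) for a in planned])
```

Calling `plan(x0=0.0, goal=1.0)` should return actions whose first entry carries nearly all of the displacement toward the goal, because the cost penalizes the distance to the goal at every step. The sketch optimizes a fixed open-loop plan; the Deep Reactive Policies mentioned in the abstract would instead map the current state to an action, and the information-relaxation planner is not covered here.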