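LMS-Family Algorithms for the Approximate Solution of the HJB Equation in Online Design of Discrete Multivariable Optimal Control and Reinforcement Learning (original title: Algoritmos da Família LMS para a Solução Aproximada da HJB em Projetos Online de Controle Ótimo Discreto Multivariável e Aprendizado por Reforço)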

SILVA, Márcio Eduardo Gonçalves

Bibliographic details
Year of defense:	2014
Main author:	SILVA, Márcio Eduardo Gonçalves
Advisor:	FONSECA NETO, João Viana da
Examination committee:	FONSECA NETO, João Viana da, SOUZA, Francisco das Chagas de, PINTO, Vandilberto Pereira, SANTANA, Ewaldo Eder Carvalho
Document type:	Master's dissertation
Access type:	Open access
Language:	Portuguese
Defending institution:	Universidade Federal do Maranhão
Graduate program:	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
Department:	DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
Country:	Brazil
Keywords in Portuguese:	Programação Dinâmica; Aprendizagem por Reforço; Regulador Linear Quadrático; Crítico Adaptativo
Keywords in English:	Dynamic Programming; Reinforcement Learning; Linear Quadratic Regulator; Adaptive Critic
CNPq knowledge area:	Computing Systems
Access link:	http://tedebc.ufma.br:8080/jspui/handle/tede/1891
Abstract:	The linear control technique based on minimizing a quadratic performance index uses Lyapunov's second method to guarantee the stability of the system, provided the system is controllable and observable. However, this technique requires solving the Hamilton-Jacobi-Bellman (HJB) or Riccati equation. An online control system design must adjust its feedback gain in real time to maintain a given dynamic behavior, which requires computing the solution of the Riccati equation at every sampling instant; the resulting computational load can make implementation infeasible. This work presents an intelligent control system design that finds the optimal or suboptimal control action from sensory data of the process states and the instantaneous cost observed after each state transition. To find this optimal control action, or policy, approximate dynamic programming and adaptive critics are used, based on the parameterizations given by the linear quadratic regulator (LQR) problem, but without explicitly solving the associated Riccati equation. More specifically, the LQR problem is solved by four different algorithms: Heuristic Dynamic Programming (HDP), Dual Heuristic Dynamic Programming (DHP), Action Dependent Heuristic Dynamic Programming (ADHDP), and Action Dependent Dual Heuristic Dynamic Programming (ADDHP). These algorithms, however, depend on knowledge of the value functions to derive the optimal control actions. The value functions have known structures, and their parameters are estimated using algorithms of the least mean squares (LMS) family and the Recursive Least Squares (RLS) algorithm. Two processes that have the Markov property were used in the computational validation of the implemented adaptive critic algorithms: one corresponds to the longitudinal dynamics of an aircraft and the other to an electrical circuit.
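The idea of estimating a value function with known quadratic structure from observed states and costs, without solving a Riccati or Lyapunov equation, can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it only evaluates one fixed feedback gain with a standard RLS update on the temporal-difference relation V(x_k) - V(x_{k+1}) = r_k. The plant matrices, the gain `K`, the random state sampling used for excitation, and all function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 2-state, 1-input plant and a fixed stabilizing gain
# (illustrative values, not taken from the dissertation).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K = np.array([[0.1, 0.2]])


def phi(x):
    """Quadratic basis: theta @ phi(x) equals x' P x for theta = [P11, P12, P22]."""
    return np.array([x[0] ** 2, 2.0 * x[0] * x[1], x[1] ** 2])


def rls_policy_evaluation(n_samples=200, seed=0):
    """Estimate P in V(x) = x' P x for u = -K x from (state, cost, next state) data."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    cov = 1e6 * np.eye(3)  # large initial covariance: weak prior on theta
    for _ in range(n_samples):
        x = rng.standard_normal(2)       # random state, assumed to give enough excitation
        u = -K @ x
        x_next = A @ x + B @ u           # observed state transition
        cost = x @ Q @ x + u @ R @ u     # observed instantaneous cost r_k
        h = phi(x) - phi(x_next)         # regressor: V(x_k) - V(x_{k+1}) = r_k
        gain = cov @ h / (1.0 + h @ cov @ h)
        theta = theta + gain * (cost - h @ theta)
        cov = cov - np.outer(gain, h @ cov)
    return np.array([[theta[0], theta[1]], [theta[1], theta[2]]])


def lyapunov_reference(iters=500):
    """Model-based reference: fixed-point iteration of P = Qc + Ac' P Ac."""
    Ac = A - B @ K
    Qc = Q + K.T @ R @ K
    P = np.zeros((2, 2))
    for _ in range(iters):
        P = Qc + Ac.T @ P @ Ac
    return P


P_rls = rls_policy_evaluation()
P_ref = lyapunov_reference()
print(np.round(P_rls, 4))
print(np.round(P_ref, 4))
```

The RLS estimate uses only sampled states and costs, while `lyapunov_reference` uses the model directly and serves only as a check. In a full adaptive critic scheme, an evaluation step of this kind would alternate with a policy update.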
