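Effect of Probe Noise on LQR Optimal Control via Q-Learning Based on Adaptive Filtering (original title: Efeito do ruído de Prova no Controle Ótimo LQR via Q-Learning baseado em filtragem adaptativa)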

YÁNEZ, Williams Jesús López

Bibliographic details
Year of defense:	2022
Main author:	YÁNEZ, Williams Jesús López
Advisor:	SOUZA, Francisco das Chagas de
Examination committee:	SOUZA, Francisco das Chagas de; FONSECA NETO, João Viana da; SERRA, Ginalber Luiz de Oliveira; RÊGO, Patrícia Helena Moraes; CORTES, Omar Andres Carmona
Document type:	Thesis
Access type:	Open access
Language:	Portuguese (por)
Degree-granting institution:	Universidade Federal do Maranhão
Graduate program:	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
Department:	DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
Country:	Brazil
Keywords in Portuguese:	aprendizagem por reforço; Controle ótimo discreto LQR; persistência de excitação; Q-learning; ruído de prova.
Keywords in English:	reinforcement learning; Optimal discrete LQR control; persistence of excitation; Q-learning; probe noise.
CNPq knowledge area:	Computer Science
Access link:	https://tedebc.ufma.br/jspui/handle/tede/3845
Abstract:	Q-learning is a model-free reinforcement learning (RL) method used to solve the optimal control problem by learning the action-value function (or Q-function). The usual way to learn the action-value function is to solve a Bellman equation. In this thesis, the Bellman equation of the LQR optimal control problem is solved with an adaptive filtering algorithm based on the normalized least-mean-square (NLMS) algorithm instead of the recursive least-squares (RLS) algorithm. A general requirement for convergence of adaptive filtering algorithms is the persistence of excitation condition, which is imposed so that the matrix formed by the regressor vectors has linearly independent columns. In optimal control via Q-learning, persistence of excitation is obtained by adding a probe noise to the control action. The probe noise affects the actual system states and may affect the performance of the adaptive filter in solving the Bellman equation. This work studies the effect of the probe noise through the covariance matrices of the system states and control inputs, for which a closed-form expression and convergence properties are obtained. Furthermore, numerical experiments verify that the NLMS algorithm outperforms the RLS algorithm when the probe noise has small variance. Using the NLMS algorithm in this approach has two advantages: first, it has lower computational complexity than the RLS algorithm; second, the persistence of excitation condition can be met with low-variance probe noise, which is desirable in real-world applications.
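
The idea summarized in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the scalar plant (a, b), the cost weights (q, r), the fixed gain K, the step size mu, and all function names are assumptions chosen for illustration, and only a single policy-evaluation step is shown. For a fixed policy u = -K x, the Q-function is quadratic, so the Bellman equation becomes a linear regression in the parameters [H11, H12, H22], which an NLMS update can solve while a low-variance probe noise added to the control keeps the regressors exciting.

```python
import numpy as np


def quad_basis(x, u):
    """Quadratic basis: Q(x, u) = quad_basis(x, u) @ [H11, H12, H22]."""
    return np.array([x * x, 2.0 * x * u, u * u])


def nlms_policy_evaluation(a, b, q, r, K, probe_std,
                           steps=20000, mu=0.5, eps=1e-12, seed=0):
    """Estimate the Q-function parameters of the policy u = -K x with NLMS.

    Hypothetical scalar plant x[k+1] = a x[k] + b u[k], stage cost q x^2 + r u^2.
    The plant parameters are used only to simulate data, not by the learner.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)  # estimate of [H11, H12, H22]
    x = 0.0
    for _ in range(steps):
        # Behavior action: policy action plus probe noise (excitation).
        u = -K * x + probe_std * rng.standard_normal()
        cost = q * x * x + r * u * u
        x_next = a * x + b * u
        # Bellman equation: Q(x, u) - Q(x_next, -K x_next) = cost.
        phi = quad_basis(x, u) - quad_basis(x_next, -K * x_next)
        err = cost - phi @ theta
        # NLMS update: step normalized by the regressor energy.
        theta = theta + mu * err * phi / (eps + phi @ phi)
        x = x_next
    return theta


def exact_q_parameters(a, b, q, r, K):
    """Closed-form Q-function parameters for the scalar case (for checking)."""
    P = (q + r * K * K) / (1.0 - (a - b * K) ** 2)  # value of the policy
    return np.array([q + a * a * P, a * b * P, r + b * b * P])


# Assumed example values (stable closed loop: |a - b K| < 1).
a, b, q, r, K = 0.9, 0.5, 1.0, 1.0, 0.5
theta_hat = nlms_policy_evaluation(a, b, q, r, K, probe_std=0.1)
theta_true = exact_q_parameters(a, b, q, r, K)
improved_gain = theta_hat[1] / theta_hat[2]  # greedy gain, u = -(H12 / H22) x
print("estimated:", theta_hat)
print("exact:    ", theta_true)
print("improved gain:", improved_gain)
```

In this noise-free toy setting the Bellman equation holds exactly for every sample, so the estimate depends on the probe noise only through how well the regressors span the parameter space. The interaction between probe-noise variance and filter performance studied in the thesis is not reproduced here.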
