Detalhes bibliográficos
Ano de defesa: |
2022 |
Autor(a) principal: |
YÁNEZ, Williams Jesús López
 |
Orientador(a): |
SOUZA, Francisco das Chagas de
 |
Banca de defesa: |
SOUZA, Francisco das Chagas de
,
FONSECA NETO, João Viana da
,
SERRA, Ginalber Luiz de Oliveira
,
RÊGO, Patrícia Helena Moraes
,
CORTES, Omar Andres Carmona
 |
Tipo de documento: |
Tese
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Universidade Federal do Maranhão
|
Programa de Pós-Graduação: |
PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
|
Departamento: |
DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
https://tedebc.ufma.br/jspui/handle/tede/3845
|
Resumo: |
Q-learning is a reinforcement learning (RL) method, model-free, that is used to solve the optimal control problem based on learning the action value function (or function Q). The usual way to learn the action value function is to solve a Bellman equation. In this thesis, to solve the Bellman equation in the LQR optimal control problem, an adaptive filtering algorithm based on the normalized least-mean-square (NLMS) algorithm is used instead of the recursive least-squares (RLS). A general requirement for achieving convergence in adaptive filtering algorithms is the excitation persistence condition. The persistence of excitation is a condition imposed so that the matrix formed by the regressor vectors has all columns linearly independent. In the context of optimal control via Q-learning, persistence of excitation is obtained by adding a probe noise to the control action. The probe noise affects real system states and may affect the performance of the adaptive filter in solving the Bellman equation. In this work, a study is carried out on the effect of probe noise based on the covariance matrices of the states and control inputs of the system, where a closed formula and convergence properties of such matrices are obtained. Furthermore, it is verified through numerical experiments that the NLMS algorithm presents superior performance when compared to the RLS algorithm, in cases where the probe noise has small variance. The use of the NLMS algorithm in our approach has two advantages: first, the NLMS algorithm presents lower computational complexity when compared to the RLS algorithm; the second, to obtain the persistence of the excitation condition, one can use probe noises with low variance, which is desirable in real-world applications. |