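Data-driven state observers: tuning, approximate dynamic programming and reinforcement learning (original title: Observadores de estado orientados por dados: sintonia, programação dinâmica aproximada e aprendizagem por reforço)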

SILVA, Fábio Nogueira da

Bibliographic details
Defense year:	2021
Main author:	SILVA, Fábio Nogueira da
Advisor:	FONSECA NETO, João Viana da
Examining committee:	FONSECA NETO, João Viana da; SERRA, Ginalber Luiz de Oliveira; SOUZA, Francisco das Chagas de; BARRA JUNIOR, Walter; SILVEIRA, Antônio da Silva
Document type:	Thesis
Access type:	Open access
Language:	Portuguese (por)
Defense institution:	Universidade Federal do Maranhão
Graduate program:	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
Department:	DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
Country:	Brazil
Keywords in Portuguese:	observadores de estado; programação dinâmica aproximada; aprendizagem por reforço; controle ótimo; realimentação de saída; sistemas dinâmicos; sintonia
Keywords in English:	state observers; approximate dynamic programming; reinforcement learning; optimal control; output feedback; dynamical systems
CNPq knowledge area:	Computer Science
Access link:	https://tedebc.ufma.br/jspui/handle/tede/3695
Abstract:	This thesis proposes, develops, applies, and analyzes formulations of state observers for dynamical systems based on the fundamentals of approximate dynamic programming (ADP), optimal control, and reinforcement learning (RL). It also presents algorithm proposals, performance evaluation metrics, and analyses of robustness, convergence, and solvability, together with studies of the parametric sensitivity of the algorithms to noise signals, initial parameter values, and initial states of the dynamic system. The proposed observers rest on approximate dynamic programming, with the value function approximated by a reinforcement learning algorithm that uses temporal-difference errors. They are intended for coupling to online applications and can also be implemented offline. The observer formulation is based on the discrete-time optimal control problem associated with the discrete linear quadratic regulator (DLQR) with output feedback, and requires only the measured input and output signals. State estimation with the ADP-based structure requires two matrices to be available, and a formulation is proposed that leads to a system of nonlinear algebraic equations for recovering them. A feedforward multilayer neural network was initially applied to solve this system, but its high computational complexity throughout the iterative process made that solution infeasible. An alternative based on an approximation is proposed that avoids solving the system of equations and thus reduces the computational complexity. Because the algorithms have several tunable parameters, error metrics are proposed to evaluate their performance.
To support tuning and analysis, error surfaces are constructed over parameter variations in order to observe the parametric sensitivity of the algorithm with respect to the error metrics and to evaluate solvability and convergence. The proposed methodologies offer advantages such as requiring no modeling or identification of the dynamical system and incorporating dynamic changes through approaches based on reinforcement learning, in addition to assisting the tuning and analysis process.
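As an illustration only, the idea of approximating a quadratic value function from temporal-difference errors can be sketched as below. This is not the thesis's observer algorithm: it is a minimal batch least-squares policy-evaluation sketch that assumes full state measurements (the thesis works with input and output signals only), a fixed stabilizing gain, and a hypothetical 2-state plant. The matrices `A`, `B`, `Q`, `R`, `K` and the helper names `features`, `unpack`, and `fit_value_matrix` are all assumptions introduced for this example. The plant model is used only to generate data; the fit itself reads nothing but the sampled transitions.

```python
import numpy as np

# Hypothetical 2-state plant, used only to simulate data (assumption).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                # state weight (assumed)
R = np.array([[1.0]])        # input weight (assumed)
K = np.array([[0.1, 0.3]])   # fixed stabilizing gain, u = -K x (assumed)


def features(x):
    """Quadratic basis chosen so that features(x) @ theta == x.T @ P @ x."""
    n = len(x)
    return np.array([(1.0 if i == j else 2.0) * x[i] * x[j]
                     for i in range(n) for j in range(i, n)])


def unpack(theta, n):
    """Rebuild the symmetric matrix P from its upper-triangular entries."""
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = theta
    return P + P.T - np.diag(np.diag(P))


def fit_value_matrix(samples):
    """Least-squares fit that minimizes the temporal-difference errors
    V(x) - [x'Qx + u'Ru + V(x_next)] over the sampled transitions."""
    Phi = np.array([features(x) - features(x_next) for x, _, x_next in samples])
    cost = np.array([x @ Q @ x + u @ R @ u for x, u, _ in samples])
    theta, *_ = np.linalg.lstsq(Phi, cost, rcond=None)
    return unpack(theta, len(samples[0][0]))


# Collect transitions (x, u, x_next) from randomly drawn states.
rng = np.random.default_rng(0)
samples = []
for _ in range(50):
    x = rng.standard_normal(2)
    u = -K @ x
    samples.append((x, u, A @ x + B @ u))

P_td = fit_value_matrix(samples)
print(P_td)
```

With noise-free data and enough distinct samples, the fitted matrix should agree with the solution of the closed-loop Lyapunov equation for the chosen gain, which gives a simple consistency check for the sketch.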
