Bibliographic details

Year of defense: 2023
Main author: Pereira Neto, Eduardo Lopes
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's dissertation
Access type: Open access
Language: English (eng)
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/45/45134/tde-06122023-173644/
Abstract:
Reinforcement Learning has proven highly successful at addressing sequential decision problems in complex environments, with a focus on maximizing the expected accumulated reward. Although Reinforcement Learning has shown its value, real-world scenarios often involve inherent risks that go beyond expected outcomes: in the same situation, different agents may be willing to take different levels of risk. In such cases, Risk-Sensitive Reinforcement Learning emerges as a solution, incorporating risk criteria into the decision-making process. Among these criteria, exponential-based methods have been extensively studied and applied. However, the behavior of exponential criteria when combined with learning parameters and approximations, particularly in Deep Reinforcement Learning, remains relatively unexplored. This gap can directly limit the practical applicability of these methods in real-world scenarios. In this dissertation, we present a comprehensive framework that facilitates the comparison of exponential risk criteria, such as Exponential Expected Utility, Exponential Temporal Difference Transformation, and Soft Indicator Temporal Difference Transformation, with Reinforcement Learning algorithms such as Q-Learning and Deep Q-Learning. We formally demonstrate that Exponential Expected Utility and Exponential Temporal Difference Transformation converge to the same value. We also perform experiments to explore the relationship of each exponential risk criterion with the learning rate, the risk factor, and the sampling algorithm. The results reveal that Exponential Expected Utility exhibits superior stability. Additionally, this dissertation empirically analyzes numerical overflow issues and examines a truncation technique for handling them. Furthermore, we propose applying the LogSumExp technique to mitigate this problem in algorithms that use Exponential Expected Utility.
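For reference, the Exponential Expected Utility criterion named in the abstract is commonly written in the risk-sensitive RL literature as follows (a standard formulation, not a quotation from the dissertation):

U_\beta(X) = \frac{1}{\beta} \log \mathbb{E}\left[ e^{\beta X} \right]

where X is the accumulated reward and \beta is the risk factor (\beta < 0 yields risk-averse behavior, \beta > 0 risk-seeking). The LogSumExp technique mentioned at the end of the abstract stabilizes empirical estimates of this quantity through the identity

\log \sum_{i=1}^{n} e^{x_i} = m + \log \sum_{i=1}^{n} e^{x_i - m}, \quad m = \max_i x_i,

which avoids overflow because every term e^{x_i - m} is at most 1.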
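A minimal sketch of how this stabilization can be applied when estimating the exponential utility from sampled returns; the function name, the sample values, and the use of SciPy are illustrative assumptions, not code from the dissertation:

import numpy as np
from scipy.special import logsumexp  # numerically stable log-sum-exp

def exponential_utility(returns, beta):
    # Estimate U_beta = (1/beta) * log E[exp(beta * X)] from samples.
    # The naive np.mean(np.exp(beta * returns)) overflows once any
    # beta * X exceeds ~709; logsumexp shifts by the maximum internally,
    # so every exponentiated term is at most 1.
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    # log E[exp(beta * X)] = logsumexp(beta * X) - log(n)
    return (logsumexp(beta * returns) - np.log(n)) / beta

# Hypothetical returns large enough to break the naive estimator.
samples = [800.0, 900.0, 1000.0]
print(exponential_utility(samples, beta=1.0))   # ~998.9 (risk-seeking)
print(exponential_utility(samples, beta=-1.0))  # ~801.1 (risk-averse)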