Bibliographic details
Year of defense: 2013
Main author: Koga, Marcelo Li
Advisor: Not informed by the institution
Examining committee: Not informed by the institution
Document type: Master's thesis (dissertação)
Access type: Open access
Language: English (eng)
Defending institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: http://www.teses.usp.br/teses/disponiveis/3/3141/tde-04112014-103827/
Abstract:
When designing intelligent agents that must solve sequential decision problems, we often do not have enough knowledge to build a complete model of the problems at hand. Reinforcement learning enables an agent to learn behavior by acquiring experience through trial-and-error interactions with the environment. However, knowledge is usually built from scratch, and learning the optimal policy may take a long time. In this work, we improve learning performance by exploring transfer learning; that is, the knowledge acquired in previous source tasks is used to accelerate learning in new target tasks. If the tasks present similarities, the transferred knowledge guides the agent towards faster learning. We explore the use of a relational representation that allows the description of relationships among objects. This representation simplifies the use of abstraction and the extraction of similarities among tasks, enabling the generalization of solutions that can be used across different, but related, tasks. This work presents two model-free algorithms for online learning of abstract policies: AbsSarsa(λ) and AbsProb-RL. The former builds a deterministic abstract policy from value functions, while the latter builds a stochastic abstract policy through direct search on the space of policies. We also propose the S2L-RL agent architecture, containing two levels of learning: an abstract level and a ground level. The agent simultaneously builds a ground policy and an abstract policy; not only can the abstract policy accelerate learning on the current task, but it can also guide the agent in a future task. Experiments in a robotic navigation environment show that these techniques are effective in improving the agent's learning performance, especially during the early stages of the learning process, when the agent is completely unaware of the new task.
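To make the two-level idea in the abstract concrete, the following is a minimal, illustrative Python sketch of an agent that maintains a ground Q-table and an abstract Q-table side by side and updates both with Sarsa(λ). It is not the thesis's AbsSarsa(λ), AbsProb-RL, or S2L-RL implementation; the class and function names (TwoLevelSarsaLambda, abstract_state) are hypothetical, and the abstraction function is assumed to be supplied by the user (for example, a relational description of the robot's surroundings).

    # Illustrative sketch only: a tabular Sarsa(lambda) learner with two levels,
    # a ground Q-table over concrete states and an abstract Q-table over a
    # user-supplied state abstraction. Hypothetical names, not the thesis's code.
    import random
    from collections import defaultdict

    class TwoLevelSarsaLambda:
        def __init__(self, actions, abstract_state, alpha=0.1, gamma=0.95,
                     lam=0.9, epsilon=0.1):
            self.actions = actions
            self.abstract_state = abstract_state   # maps a ground state to an abstract one
            self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
            self.q_ground = defaultdict(float)     # Q(s, a) on the ground level
            self.q_abstract = defaultdict(float)   # Q(sigma(s), a) on the abstract level
            self.e_ground = defaultdict(float)     # eligibility traces, ground level
            self.e_abstract = defaultdict(float)   # eligibility traces, abstract level

        def policy(self, state):
            """Epsilon-greedy; unvisited ground states fall back to the abstract policy."""
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            if any((state, a) in self.q_ground for a in self.actions):
                return max(self.actions, key=lambda a: self.q_ground[(state, a)])
            sigma = self.abstract_state(state)
            return max(self.actions, key=lambda a: self.q_abstract[(sigma, a)])

        def update(self, s, a, r, s2, a2, done):
            """One Sarsa(lambda) step applied to both levels simultaneously."""
            sigma, sigma2 = self.abstract_state(s), self.abstract_state(s2)
            target_g = r + (0.0 if done else self.gamma * self.q_ground[(s2, a2)])
            target_a = r + (0.0 if done else self.gamma * self.q_abstract[(sigma2, a2)])
            delta_g = target_g - self.q_ground[(s, a)]
            delta_a = target_a - self.q_abstract[(sigma, a)]
            self.e_ground[(s, a)] += 1.0
            self.e_abstract[(sigma, a)] += 1.0
            for key in list(self.e_ground):
                self.q_ground[key] += self.alpha * delta_g * self.e_ground[key]
                self.e_ground[key] *= self.gamma * self.lam
            for key in list(self.e_abstract):
                self.q_abstract[key] += self.alpha * delta_a * self.e_abstract[key]
                self.e_abstract[key] *= self.gamma * self.lam
            if done:
                self.e_ground.clear()
                self.e_abstract.clear()

In this sketch, transfer to a new but related task would amount to keeping q_abstract (learned over the relational abstraction) while resetting q_ground, so that the abstract policy biases early exploration before any ground knowledge exists; this is only meant to illustrate the general pattern described in the abstract.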