ERG-ARCH: a reinforcement learning architecture for propositionally constrained multi-agent state spaces

Bibliographic details
Year of defense: 2014
Main author: Anderson Viçoso de Araújo
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: eng
Defending institution: Instituto Tecnológico de Aeronáutica
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=3096
Abstract: The main goal of this work is to present an approach that finds an appropriate set of sequential actions for a group of cooperative agents interacting in a constrained environment. This search is a complex task for autonomous agents, and standard reinforcement learning algorithms cannot be used directly to learn an adequate policy. In this thesis, a technique that deals with propositionally constrained state spaces and employs a Reinforcement Learning algorithm based on Markov Decision Processes is proposed. A new model is also presented that formally defines this restricted search space. By doing so, this work aims at reducing the overall exploratory need, thus improving the performance of the learning algorithm. To constrain the state space, the concept of extended reachability goals is employed. Through them, it is possible to define one objective to be preserved during the interaction with the environment and another that defines a goal state. In this cooperative environment, the information about the propositions is shared among the agents during their interaction. An architecture to solve problems in such environments is also presented. Experiments to validate the proposed algorithm were performed on different test cases and showed promising results. A performance evaluation against standard Reinforcement Learning techniques showed that extending autonomous learning with propositional constraints updated along the learning process can produce faster convergence to adequate policies. The best results achieved show a significant reduction in execution time (34.32%) and in the number of iterations (67.94%). This occurs due to the early state-space reduction caused by shared information on state-space constraints.
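The abstract does not give the algorithm itself; the sketch below is only an illustration of the general idea it describes, namely tabular Q-learning in which states known to violate a preservation proposition are pruned from exploration and the discovered constraints are shared. The grid layout, reward values, and all names here are assumptions made for illustration, not the thesis's ERG-ARCH implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy problem (an assumption, not taken from the thesis): a 4x4
# grid whose cells carry propositional labels. The preservation goal is
# "never enter an 'unsafe' cell"; the reachability goal is "reach the 'goal' cell".
SIZE = 4
UNSAFE = {(1, 1), (2, 2)}                    # cells violating the preservation proposition
GOAL = (3, 3)                                # cell satisfying the goal proposition
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

# Constraints discovered so far, shared by all cooperating agents
# (a rough stand-in for the shared propositional constraints in the abstract).
shared_forbidden = set()

def move(state, action):
    """Deterministic grid transition, clamped to the board."""
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c)

def step(state, action):
    nxt = move(state, action)
    if nxt in UNSAFE:
        return nxt, -10.0, True   # violating the preservation goal ends the episode
    if nxt == GOAL:
        return nxt, +10.0, True
    return nxt, -1.0, False

def constrained_q_learning(episodes=500, alpha=0.5, gamma=0.95, eps=0.1):
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = (0, 0), False
        for _ in range(200):                 # step cap per episode
            # Prune actions whose successor is already known to violate the
            # preservation proposition: the state-space reduction idea.
            allowed = [a for a in ACTIONS if move(state, a) not in shared_forbidden]
            if not allowed or done:
                break
            if random.random() < eps:
                action = random.choice(allowed)
            else:
                action = max(allowed, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            if nxt in UNSAFE:
                shared_forbidden.add(nxt)    # publish the newly found constraint
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

if __name__ == "__main__":
    constrained_q_learning()
    print("forbidden states learned:", sorted(shared_forbidden))
```

Once a constraint is shared, every agent skips the corresponding successor states during action selection, so later episodes spend no exploration on them; this is the intuition behind the reduction in iterations reported in the abstract, although the actual ERG-ARCH formulation and results are given in the thesis itself.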