Learning generalized policies for Markov decision processes with imprecise probabilities

Bibliographic details
Year of defense: 2024
Main author: Moukarzel, André Ferrari
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Master's thesis
Access type: Open access
Language: English
Defense institution: Biblioteca Digitais de Teses e Dissertações da USP
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/45/45134/tde-16122024-154940/
Abstract: ASNet is a neural network architecture used in probabilistic planning that exploits the relational structure between the actions and propositions of a given domain to learn generalized policies. By using imitation learning over the action choices of a teacher (e.g. a state-of-the-art planner), ASNets are able to learn a policy that solves large problems using a training set of small problems. Motivated by that, this work investigates the application of ASNets to probabilistic planning with imprecise probabilities modeled as Stochastic Shortest Path problems with Imprecise Probabilities (SSP-IPs), for which off-the-shelf planners can only solve small instances. We also show that training ASNets with relaxed SSP-IP problems, based on state-set transition problems (SSP-STs), whose solutions are less costly to compute, can still lead to the learning of good generalized policies. To define the optimal configuration of ASNets for learning generalized policies in environments with imprecise probability transitions, we present an extensive empirical analysis with training sets of different sizes and variations of hyper-parameters. The results show that, while state-of-the-art MDP-IP solutions were able to solve problems with up to 80 state variables (i.e. 2^80 states) in less than 1000 seconds, the ASNet-based solution, with a policy trained on small MDP-IP domain instances, was able to solve problems with more than 260 state variables (i.e. 2^260 states) in less than 1 second (ASNet inference time), using a single generalized policy learned with only 6480 seconds of training.
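To make the "imprecise probabilities" setting concrete: in an MDP-IP/SSP-IP, each transition probability is only known to lie in an interval, and a robust planner evaluates actions against the worst-case distribution consistent with those intervals. The snippet below is a minimal sketch of one such robust Bellman backup, assuming interval-valued transition probabilities and a standard greedy rule for interval credal sets; the function names and the toy numbers are illustrative, not taken from the thesis.

```python
def worst_case_cost(values, lower, upper):
    """Adversarial expectation over an interval credal set: start every
    successor at its lower probability bound, then push the remaining
    probability mass onto the highest-cost successors first (a standard
    greedy rule for interval-valued distributions)."""
    probs = list(lower)
    slack = 1.0 - sum(lower)  # mass left to distribute adversarially
    for i in sorted(range(len(values)), key=lambda i: -values[i]):
        bump = min(upper[i] - probs[i], slack)
        probs[i] += bump
        slack -= bump
    return sum(p * v for p, v in zip(probs, values))

# Robust Bellman backup for one state of a tiny SSP-IP:
# action cost 1, two successors with interval transition probabilities.
V = [10.0, 0.0]                       # current value estimates of successors
lower, upper = [0.2, 0.3], [0.7, 0.8]
q = 1.0 + worst_case_cost(V, lower, upper)
print(q)  # 1 + 0.7*10 + 0.3*0 = 8.0
```

Solving these worst-case inner problems at every backup is what makes exact SSP-IP planning expensive, and it is this cost that motivates both the SSP-ST relaxation for generating training data and the use of a learned ASNet policy at inference time.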