Learning generalized policies for Markov decision processes with imprecise probabilities
Main Author: | Moukarzel, André Ferrari |
---|---|
Publication Date: | 2024 |
Format: | Master thesis |
Language: | eng |
Source: | Biblioteca Digital de Teses e Dissertações da USP |
Download full: | https://www.teses.usp.br/teses/disponiveis/45/45134/tde-16122024-154940/ |
Summary: | ASNet is a neural network architecture used in probabilistic planning that exploits the relational structure between actions and propositions of a given domain to learn generalized policies. By using imitation learning over the action choices of a teacher (e.g. a state-of-the-art planner), ASNets are able to learn a policy that solves large problems from a training set of small problems. Motivated by that, this work investigates the application of ASNets to probabilistic planning with imprecise probabilities, modeled as Stochastic Shortest Path problems with imprecise probabilities (SSP-IPs), for which off-the-shelf planners can only solve small instances. We also show that training ASNets with relaxed SSP-IP problems, based on state-set transition problems (SSP-STs) whose solutions are less costly to compute, can still lead to learning good generalized policies. To define the optimal configuration of ASNets for learning generalized policies in environments with imprecise transition probabilities, we present an extensive empirical analysis with training sets of different sizes and variations of hyper-parameters. The results show that, while state-of-the-art MDP-IP solvers were able to solve problems with up to 80 state variables (i.e. 2^80 states) in less than 1000 seconds, the ASNet-based solution, with a policy trained on small MDP-IP domain instances, was able to solve problems with more than 260 state variables (i.e. 2^260 states) in less than 1 second (ASNet inference time), using a single generalized policy learned with only 6480 seconds of training. |
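For context on the SSP-IP model mentioned in the summary: in the MDP-IP literature, imprecise transition probabilities are usually represented by a credal set K(s, a) of admissible distributions, and the robust (minimax) optimality criterion leads to the Bellman equation below. This is the standard formulation from that literature, given here as a sketch rather than quoted from the thesis itself.

```latex
% Pessimistic (minimax) Bellman equation commonly used for SSP-IPs.
% K(s,a) is the credal set of transition distributions admitted by the
% imprecise probabilities; C(s,a) is the action cost.
V^{*}(s) \;=\; \min_{a \in A} \; \max_{P \in K(s,a)}
  \Big[\, C(s,a) \;+\; \sum_{s' \in S} P(s' \mid s,a)\, V^{*}(s') \,\Big]
```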
id |
USP_031736cfcc2beab1a37ea37c6f6f2790 |
---|---|
oai_identifier_str |
oai:teses.usp.br:tde-16122024-154940 |
network_acronym_str |
USP |
network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
repository_id_str |
2721 |
spelling |
Learning generalized policies for Markov decision processes with imprecise probabilities; Aprendendo políticas generalizadas para processos de decisão de Markov com probabilidades imprecisas
Generalized planning; Imprecise probabilities; Neural network; Planejamento generalizado; Planejamento probabilístico; Probabilidades imprecisas; Probabilistic planning; Rede neural
ASNet is a neural network architecture used in probabilistic planning (with SSP-MDPs, or SSPs for short) that exploits the relational structure between actions and propositions of a given domain to learn generalized policies. By using imitation learning over the action choices of a teacher (e.g. a state-of-the-art planner), ASNets are able to learn a policy that solves large problems from a training set of small problems. Motivated by that, this work investigates the application of ASNets to probabilistic planning with imprecise probabilities, modeled as Stochastic Shortest Path problems with imprecise probabilities (SSP-IPs), for which off-the-shelf planners can only solve small instances. We also show that training ASNets with relaxed SSP-IP problems, based on state-set transition problems (SSP-STs) whose solutions are less costly to compute, can still lead to learning good generalized policies. To define the optimal configuration of ASNets for learning generalized policies in environments with imprecise transition probabilities, we present an extensive empirical analysis with training sets of different sizes and variations of hyper-parameters over three planning domains. The results show that, while state-of-the-art MDP-IP solvers were able to solve Triangle Tireworld problems with up to 80 state variables (i.e. 2^80 states) in less than 1000 seconds, the ASNet-based solution, with a policy trained on small MDP-IP domain instances, was able to solve problems with more than 260 state variables (i.e. 2^260 states) in less than 1 second (ASNet inference time), using a single generalized policy learned with only 6480 seconds of training. |
dc.title.none.fl_str_mv |
Learning generalized policies for Markov decision processes with imprecise probabilities; Aprendendo políticas generalizadas para processos de decisão de Markov com probabilidades imprecisas
title |
Learning generalized policies for Markov decision processes with imprecise probabilities |
spellingShingle |
Learning generalized policies for Markov decision processes with imprecise probabilities; Moukarzel, André Ferrari; Generalized planning; Imprecise probabilities; Neural network; Planejamento generalizado; Planejamento probabilístico; Probabilidades imprecisas; Probabilistic planning; Rede neural
title_short |
Learning generalized policies for Markov decision processes with imprecise probabilities |
title_full |
Learning generalized policies for Markov decision processes with imprecise probabilities |
title_fullStr |
Learning generalized policies for Markov decision processes with imprecise probabilities |
title_full_unstemmed |
Learning generalized policies for Markov decision processes with imprecise probabilities |
title_sort |
Learning generalized policies for Markov decision processes with imprecise probabilities |
author |
Moukarzel, André Ferrari |
author_facet |
Moukarzel, André Ferrari |
author_role |
author |
dc.contributor.none.fl_str_mv |
Barros, Leliane Nunes de |
dc.contributor.author.fl_str_mv |
Moukarzel, André Ferrari |
dc.subject.por.fl_str_mv |
Generalized planning; Imprecise probabilities; Neural network; Planejamento generalizado; Planejamento probabilístico; Probabilidades imprecisas; Probabilistic planning; Rede neural
topic |
Generalized planning; Imprecise probabilities; Neural network; Planejamento generalizado; Planejamento probabilístico; Probabilidades imprecisas; Probabilistic planning; Rede neural
description |
ASNet is a neural network architecture used in probabilistic planning that exploits the relational structure between actions and propositions of a given domain to learn generalized policies. By using imitation learning over the action choices of a teacher (e.g. a state-of-the-art planner), ASNets are able to learn a policy that solves large problems from a training set of small problems. Motivated by that, this work investigates the application of ASNets to probabilistic planning with imprecise probabilities, modeled as Stochastic Shortest Path problems with imprecise probabilities (SSP-IPs), for which off-the-shelf planners can only solve small instances. We also show that training ASNets with relaxed SSP-IP problems, based on state-set transition problems (SSP-STs) whose solutions are less costly to compute, can still lead to learning good generalized policies. To define the optimal configuration of ASNets for learning generalized policies in environments with imprecise transition probabilities, we present an extensive empirical analysis with training sets of different sizes and variations of hyper-parameters. The results show that, while state-of-the-art MDP-IP solvers were able to solve problems with up to 80 state variables (i.e. 2^80 states) in less than 1000 seconds, the ASNet-based solution, with a policy trained on small MDP-IP domain instances, was able to solve problems with more than 260 state variables (i.e. 2^260 states) in less than 1 second (ASNet inference time), using a single generalized policy learned with only 6480 seconds of training. |
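As a rough illustration of the imitation-learning scheme described above, the sketch below trains a stand-in policy network with cross-entropy against a teacher planner's action choices. All names, shapes, and the plain MLP are hypothetical stand-ins; the actual ASNet architecture alternates action and proposition layers derived from the domain's relational structure.

```python
import torch
import torch.nn as nn

# Illustrative only: a generic imitation-learning step. A plain MLP
# stands in for the ASNet architecture described in the abstract.
class TinyPolicyNet(nn.Module):
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state_features: torch.Tensor) -> torch.Tensor:
        return self.net(state_features)  # action logits

def imitation_step(policy, optimizer, states, teacher_actions):
    """One supervised step: match the teacher planner's action choices."""
    logits = policy(states)
    loss = nn.functional.cross_entropy(logits, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data (8 state features, 4 actions):
policy = TinyPolicyNet(n_features=8, n_actions=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(32, 8)                   # batch of encoded states
teacher_actions = torch.randint(0, 4, (32,))  # teacher's chosen actions
imitation_step(policy, optimizer, states, teacher_actions)
```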
publishDate |
2024 |
dc.date.none.fl_str_mv |
2024-11-06 |
dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
format |
masterThesis |
status_str |
publishedVersion |
dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/45/45134/tde-16122024-154940/ |
url |
https://www.teses.usp.br/teses/disponiveis/45/45134/tde-16122024-154940/ |
dc.language.iso.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
|
dc.rights.driver.fl_str_mv |
Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv |
Release the content for public access.
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.coverage.none.fl_str_mv |
|
dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv |
reponame: Biblioteca Digital de Teses e Dissertações da USP; instname: Universidade de São Paulo (USP); instacron: USP
instname_str |
Universidade de São Paulo (USP) |
instacron_str |
USP |
institution |
USP |
reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
collection |
Biblioteca Digital de Teses e Dissertações da USP |
repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
repository.mail.fl_str_mv |
virginia@if.usp.br || atendimento@aguia.usp.br
_version_ |
1831147752677965824 |