Signatures and Consequences of Distributional Reinforcement Learning

Bibliographic details
Main author: Sousa, Margarida
Publication date: 2024
Language: eng
Source title: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Full text: http://hdl.handle.net/10362/182540
Abstract: "Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)"
id RCAP_e9c49bdc2d3ce1d0a1dfc33243276668
oai_identifier_str oai:run.unl.pt:10362/182540
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Signatures and Consequences of Distributional Reinforcement Learning
title Signatures and Consequences of Distributional Reinforcement Learning
spellingShingle Signatures and Consequences of Distributional Reinforcement Learning
Sousa, Margarida
Reinforcement learning
dopamine
basal ganglia
timing
decision-making
Domain/Scientific Area::Medical Sciences
author Sousa, Margarida
author_facet Sousa, Margarida
author_role author
dc.contributor.none.fl_str_mv Paton, Joe
McNamee, Daniel
RUN
dc.contributor.author.fl_str_mv Sousa, Margarida
dc.subject.por.fl_str_mv Reinforcement learning
dopamine
basal ganglia
timing
decision-making
Domain/Scientific Area::Medical Sciences
topic Reinforcement learning
dopamine
basal ganglia
timing
decision-making
Domain/Scientific Area::Medical Sciences
description "Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)"
publishDate 2024
dc.date.none.fl_str_mv 2024-11-29
2024-09-16
2024-11-29T00:00:00Z
2025-04-23T14:55:33Z
dc.type.driver.fl_str_mv doctoral thesis
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10362/182540
TID:101809093
url http://hdl.handle.net/10362/182540
identifier_str_mv TID:101809093
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602717777395712