Signatures and Consequences of Distributional Reinforcement Learning
| Main author: | Sousa, Margarida |
|---|---|
| Publication date: | 2024 |
| Language: | eng |
| Source title: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| Full text: | http://hdl.handle.net/10362/182540 |
Abstract: | "Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)" |
| id |
RCAP_e9c49bdc2d3ce1d0a1dfc33243276668 |
|---|---|
| oai_identifier_str |
oai:run.unl.pt:10362/182540 |
| network_acronym_str |
RCAP |
| network_name_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository_id_str |
https://opendoar.ac.uk/repository/7160 |
| spelling |
Signatures and Consequences of Distributional Reinforcement Learning Reinforcement learning dopamine basal ganglia timing decision-making Domínio/Área Científica::Ciências Médicas "Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)" Paton, Joe MCNamee, Daniel RUN Sousa, Margarida 2025-04-23T14:55:33Z 2024-11-29 2024-09-16 2024-11-29T00:00:00Z doctoral thesis info:eu-repo/semantics/publishedVersion application/pdf http://hdl.handle.net/10362/182540 TID:101809093 eng info:eu-repo/semantics/openAccess reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP 2025-04-28T01:33:58Z oai:run.unl.pt:10362/182540 Portal Agregador ONG https://www.rcaap.pt/oai/openaire info@rcaap.pt opendoar:https://opendoar.ac.uk/repository/7160 2025-05-29T06:33:30.764660 Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia false |
| dc.title.none.fl_str_mv |
Signatures and Consequences of Distributional Reinforcement Learning |
| title |
Signatures and Consequences of Distributional Reinforcement Learning |
| spellingShingle |
Signatures and Consequences of Distributional Reinforcement Learning Sousa, Margarida Reinforcement learning dopamine basal ganglia timing decision-making Domínio/Área Científica::Ciências Médicas |
| title_short |
Signatures and Consequences of Distributional Reinforcement Learning |
| title_full |
Signatures and Consequences of Distributional Reinforcement Learning |
| title_fullStr |
Signatures and Consequences of Distributional Reinforcement Learning |
| title_full_unstemmed |
Signatures and Consequences of Distributional Reinforcement Learning |
| title_sort |
Signatures and Consequences of Distributional Reinforcement Learning |
| author |
Sousa, Margarida |
| author_facet |
Sousa, Margarida |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Paton, Joe MCNamee, Daniel RUN |
| dc.contributor.author.fl_str_mv |
Sousa, Margarida |
| dc.subject.por.fl_str_mv |
Reinforcement learning dopamine basal ganglia timing decision-making Domínio/Área Científica::Ciências Médicas |
| topic |
Reinforcement learning dopamine basal ganglia timing decision-making Domínio/Área Científica::Ciências Médicas |
| description |
"Learning to predict rewards is fundamental for adaptive behavior. Midbrain dopamine neurons (DANs) play a key role in such learning by signaling reward prediction errors (RPEs) that teach recipient circuits about expected rewards given current circumstances and actions [114]. However, the algorithm that DANs are thought to provide a substrate for, temporal difference (TD) reinforcement learning (RL), learns the mean of temporally discounted expected future rewards, discarding useful information concerning experienced distributions of reward amounts and delays [135]. Here we present time-magnitude RL (TMRL), a multidimensional variant of distributional reinforcement learning that learns the joint distribution of future rewards over time and magnitude using an efficient code that adapts to environmental statistics. In addition, we discovered signatures of TMRL-like computations in the activity of optogenetically identified DANs in mice during behavior. Specifically, we found significant diversity in both temporal discounting and tuning for the magnitude of rewards across DANs, features that allow the computation of a two-dimensional, probabilistic map of future rewards from just 450 ms of neural activity recorded from a population of DANs in response to a reward-predictive cue. Furthermore, reward time predictions derived from this population code correlated with the timing of anticipatory behavior, suggesting that similar information is used to guide decisions regarding when to act.(...)" |
| publishDate |
2024 |
| dc.date.none.fl_str_mv |
2024-11-29 2024-09-16 2024-11-29T00:00:00Z 2025-04-23T14:55:33Z |
| dc.type.driver.fl_str_mv |
doctoral thesis |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
http://hdl.handle.net/10362/182540 TID:101809093 |
| url |
http://hdl.handle.net/10362/182540 |
| identifier_str_mv |
TID:101809093 |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.rights.driver.fl_str_mv |
info:eu-repo/semantics/openAccess |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.source.none.fl_str_mv |
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP |
| instname_str |
FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| instacron_str |
RCAAP |
| institution |
RCAAP |
| reponame_str |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| collection |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
| repository.name.fl_str_mv |
Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
| repository.mail.fl_str_mv |
info@rcaap.pt |
| _version_ |
1833602717777395712 |