End-to-end reinforcement learning for autonomous driving in urban environments
Main Author: | Coelho, Daniel Filipe Silveira |
---|---|
Publication Date: | 2024 |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10773/42846 |
Summary: | This thesis advances the field of end-to-end Autonomous Driving (AD) in urban environments, focusing primarily on Reinforcement Learning (RL) techniques to address existing limitations. The research begins with a comprehensive review of current end-to-end AD systems, highlighting the strengths and weaknesses of Imitation Learning (IL) and RL approaches, and identifying critical areas for improvement and promising avenues for future research. To address these challenges, we introduced RLAD, the first Reinforcement Learning from Pixels (RLfP) method for urban AD. Several techniques were introduced to enhance the performance of state-of-the-art methods. First, we developed an image encoder that utilizes both image augmentations and Adaptive Local Signal Mixing (A-LIX) layers. Additionally, we introduced WayConv1D, a waypoint encoder that captures the 2D geometrical information of waypoints using 1D convolutions. Furthermore, we designed an auxiliary loss function to emphasize the significance of traffic lights in the latent representation of the environment. RLAD demonstrated good performance on the NoCrash benchmark but required further integration of demonstrations to match state-of-the-art AD systems. Consequently, we developed RLfOLD (Reinforcement Learning from Online Demonstrations), which combines IL and RL by incorporating online demonstrations into RL training. We proposed a policy network that outputs two standard deviations, enabling adaptive control for exploration and IL training while considering uncertainty in both domains. Additionally, we incorporated an uncertainty-based technique guided by an online expert to enhance the exploration process. RLfOLD achieves state-of-the-art results on the NoCrash benchmark with enhanced efficiency and resource utilization. We further tackled the CARLA Leaderboard 2.0 benchmark’s challenges by developing PRIBOOT, an expert agent leveraging privileged information and limited human driving logs to navigate demanding scenarios. PRIBOOT’s novel techniques, such as the bird’s-eye view (BEV) representation and processing the BEV as an RGB image instead of a set of masks, significantly improved performance. This approach provided an expert capable of navigating this challenging benchmark, enabling researchers to generate extensive datasets and potentially resolving the data availability issues that have hindered progress. Collectively, our work presents significant contributions to the field of AD, offering insights and tools that pave the way for safer and more reliable autonomous systems in urban environments. |
id | RCAP_32383b487d5ca9ae08059a1a9ac42d56 |
---|---|
oai_identifier_str | oai:ria.ua.pt:10773/42846 |
network_acronym_str | RCAP |
network_name_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository_id_str | https://opendoar.ac.uk/repository/7160 |
dc.title.none.fl_str_mv | End-to-end reinforcement learning for autonomous driving in urban environments |
author | Coelho, Daniel Filipe Silveira |
author_facet | Coelho, Daniel Filipe Silveira |
author_role | author |
dc.contributor.author.fl_str_mv | Coelho, Daniel Filipe Silveira |
dc.subject.por.fl_str_mv | Autonomous driving; Reinforcement learning; Deep learning; End-to-End systems |
topic | Autonomous driving; Reinforcement learning; Deep learning; End-to-End systems |
publishDate | 2024 |
dc.date.none.fl_str_mv | 2024-11-14T14:40:34Z; 2024-10-04T00:00:00Z; 2024-10-04 |
dc.type.driver.fl_str_mv | doctoral thesis |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
status_str | publishedVersion |
dc.identifier.uri.fl_str_mv | http://hdl.handle.net/10773/42846 |
url | http://hdl.handle.net/10773/42846 |
dc.language.iso.fl_str_mv | eng |
language | eng |
dc.rights.driver.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
instname_str | FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
instacron_str | RCAAP |
institution | RCAAP |
reponame_str | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
collection | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv | info@rcaap.pt |
_version_ | 1833597965233553408 |