End-to-end reinforcement learning for autonomous driving in urban environments

Coelho, Daniel Filipe Silveira

Bibliographic Details
Main Author:	Coelho, Daniel Filipe Silveira
Publication Date:	2024
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10773/42846
Summary:	This thesis advances the field of end-to-end Autonomous Driving (AD) in urban environments, focusing primarily on Reinforcement Learning (RL) techniques to address existing limitations. The research begins with a comprehensive review of current end-to-end AD systems, highlighting the strengths and weaknesses of Imitation Learning (IL) and RL approaches, and identifying critical areas for improvement and promising avenues for future research. To address these challenges, we introduced RLAD, the first Reinforcement Learning from Pixels (RLfP) method for urban AD. Several techniques were introduced to enhance the performance of state-of-the-art methods. First, we developed an image encoder that utilizes both image augmentations and Adaptive Local Signal Mixing (A-LIX) layers. Additionally, we introduced WayConv1D, a waypoint encoder that captures the 2D geometrical information of waypoints using 1D convolutions. Furthermore, we designed an auxiliary loss function to emphasize the significance of traffic lights in the latent representation of the environment. RLAD demonstrated good performance on the NoCrash benchmark but required the integration of demonstrations to match state-of-the-art AD systems. Consequently, we developed RLfOLD (Reinforcement Learning from Online Demonstrations), which combines IL and RL by incorporating online demonstrations into RL training. We proposed a policy network that outputs two standard deviations, enabling adaptive control for exploration and IL training while considering uncertainty in both domains. Additionally, we incorporated an uncertainty-based technique guided by an online expert to enhance the exploration process. RLfOLD achieves state-of-the-art results on the NoCrash benchmark with greater efficiency and lower resource usage.
We further tackled the CARLA Leaderboard 2.0 benchmark’s challenges by developing PRIBOOT, an expert agent leveraging privileged information and limited human driving logs to navigate demanding scenarios. PRIBOOT’s novel techniques, such as the bird’s-eye view (BEV) representation and processing the BEV as an RGB image instead of a set of masks, significantly improved performance. This approach provided an expert capable of navigating this challenging benchmark, enabling researchers to generate extensive datasets and potentially resolving the data availability issues that have hindered progress. Collectively, our work presents significant contributions to the field of AD, offering insights and tools that pave the way for safer and more reliable autonomous systems in urban environments.

Item metadata

id	RCAP_32383b487d5ca9ae08059a1a9ac42d56
oai_identifier_str	oai:ria.ua.pt:10773/42846
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv	End-to-end reinforcement learning for autonomous driving in urban environments
title	End-to-end reinforcement learning for autonomous driving in urban environments
spellingShingle	End-to-end reinforcement learning for autonomous driving in urban environments Coelho, Daniel Filipe Silveira Autonomous driving Reinforcement learning Deep learning End-to-End systems
title_short	End-to-end reinforcement learning for autonomous driving in urban environments
title_full	End-to-end reinforcement learning for autonomous driving in urban environments
title_fullStr	End-to-end reinforcement learning for autonomous driving in urban environments
title_full_unstemmed	End-to-end reinforcement learning for autonomous driving in urban environments
title_sort	End-to-end reinforcement learning for autonomous driving in urban environments
author	Coelho, Daniel Filipe Silveira
author_facet	Coelho, Daniel Filipe Silveira
author_role	author
dc.contributor.author.fl_str_mv	Coelho, Daniel Filipe Silveira
dc.subject.por.fl_str_mv	Autonomous driving Reinforcement learning Deep learning End-to-End systems
topic	Autonomous driving Reinforcement learning Deep learning End-to-End systems
publishDate	2024
dc.date.none.fl_str_mv	2024-11-14T14:40:34Z 2024-10-04T00:00:00Z 2024-10-04
dc.type.driver.fl_str_mv	doctoral thesis
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	http://hdl.handle.net/10773/42846
url	http://hdl.handle.net/10773/42846
dc.language.iso.fl_str_mv	eng
language	eng
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833597965233553408

Similar Items