Bibliographic details
Year of defense: 2023
Lead author: CHEVTCHENKO, Sergio Fernandovitch
Advisor: LUDERMIR, Teresa Bernarda
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: eng
Institution of defense: Universidade Federal de Pernambuco
Graduate program: Programa de Pos Graduacao em Ciencia da Computacao
Department: Not provided by the institution
Country: Brazil
Keywords in Portuguese:
Access link: https://repositorio.ufpe.br/handle/123456789/54351
Abstract:
Artificial intelligence systems have made impressive progress in recent years, but they still lag behind simple biological brains in terms of control capabilities and power consumption. Spiking neural networks (SNNs) seek to emulate the energy efficiency, learning speed, and temporal processing of biological brains. However, in the context of reinforcement learning (RL), SNNs still fall short of traditional neural networks. The primary aim of this work is to bridge the performance gap between spiking models and powerful deep RL (DRL) algorithms on specific tasks. To this end, we propose new architectures and compare them, in terms of both learning speed and final accuracy, to DRL algorithms and classical tabular RL approaches. This thesis consists of three stages. The initial stage presents a simple spiking model that addresses the scalability limitations of related models in terms of the state space. The model is evaluated on two classical RL problems: grid-world and acrobot. The results suggest that the proposed spiking model is comparable to both tabular and DRL algorithms, while maintaining an advantage in terms of complexity over the DRL algorithm. In the second stage, we further explore the proposed model by combining it with a binary feature extraction network. A binary convolutional neural network (CNN) is pre-trained on a set of naturalistic RGB images, and a separate set of images is used as observations in a modified grid-world task. We present improvements in architecture and dynamics to address this more challenging task with image observations. As before, the model is experimentally compared to state-of-the-art DRL algorithms. Additionally, we provide supplementary experiments to present a more detailed view of the connectivity and plasticity between different layers of the network. The third stage of this thesis presents a novel neuromorphic architecture for solving RL problems with real-valued observations.
The proposed model incorporates feature extraction layers, with the addition of temporal difference (TD)-error modulation and eligibility traces, building upon prior work. An ablation study confirms the significant impact of these components on the proposed model’s performance. Our model consistently outperforms the tabular approach and successfully discovers stable control policies in mountain car, cart-pole and acrobot environments. Although the proposed model does not outperform PPO in terms of optimal performance, it offers an appealing trade-off in terms of computational and hardware implementation requirements: the model requires neither an external memory buffer nor global error gradient computation, and synaptic updates occur online, driven by local learning rules and a broadcast TD-error signal. We conclude by highlighting the limitations of our approach and suggesting promising directions for future research.
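
Illustrative note (not part of the thesis): the abstract mentions online updates driven by a broadcast TD-error signal and eligibility traces. The sketch below shows that general idea with a plain linear TD(lambda) value learner, not the spiking architecture proposed in the thesis. The function names `td_lambda_step` and `run_chain`, the two-state chain task, and all parameter values are assumptions chosen only for illustration.

```python
def td_lambda_step(w, trace, x, x_next, reward, done,
                   alpha=0.1, gamma=0.99, lam=0.9):
    """One online TD(lambda) step for a linear value estimate v(s) = w . x.

    Hypothetical helper: a generic textbook-style update, not the thesis model.
    """
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))
    # Scalar TD error: the single signal shared ("broadcast") by every weight.
    td_error = reward + gamma * v_next - v
    # Accumulating eligibility trace: each weight keeps its own local memory.
    trace = [gamma * lam * e + xi for e, xi in zip(trace, x)]
    # Each weight changes using only its local trace and the shared TD error.
    w = [wi + alpha * td_error * e for wi, e in zip(w, trace)]
    return w, trace, td_error


def run_chain(episodes=500, gamma=0.99):
    """Assumed toy task: state 0 -> state 1 -> terminal, reward 1 on the last step."""
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features for the two states
    terminal = [0.0, 0.0]
    w = [0.0, 0.0]
    for _ in range(episodes):
        trace = [0.0, 0.0]  # traces are reset at the start of each episode
        w, trace, _ = td_lambda_step(w, trace, features[0], features[1],
                                     0.0, False, gamma=gamma)
        w, trace, _ = td_lambda_step(w, trace, features[1], terminal,
                                     1.0, True, gamma=gamma)
    return w


if __name__ == "__main__":
    print(run_chain())
```

In this toy setting the learned values should approach `gamma` for the first state and 1 for the second. No replay buffer or backpropagated gradient is involved; the update is applied after every transition, which is the property the abstract highlights.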