Optimizing CleanUNet architecture for speech denoising
| Field | Value |
|---|---|
| Year of defense | 2024 |
| Main author | |
| Advisor | |
| Defense committee | |
| Document type | Master's thesis (dissertação) |
| Access type | Open access |
| Language | English |
| Degree-granting institution | Universidade Federal de Uberlândia, Brazil, Programa de Pós-graduação em Ciência da Computação |
| Graduate program | Not informed by the institution |
| Department | Not informed by the institution |
| Country | Not informed by the institution |
| Keywords in Portuguese | |
| Access links | https://repositorio.ufu.br/handle/123456789/44653 · http://doi.org/10.14393/ufu.di.2024.5523 |
Abstract: Speech enhancement techniques are crucial for recovering clean speech from signals degraded by noise and suboptimal acoustic conditions, such as background noise and echo. These challenges demand effective denoising methods to improve speech clarity. This work presents an optimized version of CleanUNet, a convolutional neural network based on the U-Net architecture and explicitly designed for causal speech-denoising tasks. Our approach introduces the Mamba architecture as a novel alternative to the traditional transformer bottleneck, enabling more efficient handling of encoder outputs with linear complexity. Additionally, we integrated batch normalization across the convolutional layers, stabilizing and accelerating the training process, and we experimented with various activation functions to identify the most effective configuration for our model. By reducing the number of hidden channels in the convolutional layers, we significantly reduced the model's parameter count, thereby speeding up training and inference on a single GPU with only a slight degradation in performance. These improvements make the model particularly suitable for real-time applications. Our best model, 52.53% smaller than the baseline, achieves PESQ (WB), PESQ (NB), and STOI scores of 2.745, 3.288, and 0.911, respectively. We also optimized a smallest variant that uses only 1.36% of the original parameters and still achieves competitive results. To the best of our knowledge, this work is the first to integrate the Mamba architecture as a replacement for the vanilla transformer in CleanUNet; combined with the other architectural optimizations, it offers a streamlined, computationally efficient solution for speech enhancement.
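The record does not link to code, but the architectural change the abstract describes, swapping the transformer bottleneck of a CleanUNet-style 1-D U-Net for a Mamba block, can be sketched as below. This is a minimal sketch, assuming PyTorch and the public `mamba_ssm` package's `Mamba(d_model=...)` interface; the depth, kernel sizes, channel widths, and the `DenoiserUNet` name are illustrative assumptions, not the dissertation's actual configuration.

```python
# A minimal sketch, assuming PyTorch and the public `mamba_ssm` package.
# Depth, kernel sizes, and channel widths are illustrative, not the
# dissertation's hyperparameters.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # linear-time sequence model (requires CUDA)


class DenoiserUNet(nn.Module):
    """Hypothetical CleanUNet-style 1-D U-Net with a Mamba bottleneck."""

    def __init__(self, depth: int = 4, hidden: int = 32):
        super().__init__()
        self.encoder = nn.ModuleList()
        self.decoder = nn.ModuleList()
        ch_in = 1
        for i in range(depth):
            ch_out = hidden * (2 ** i)
            # BatchNorm after each convolution, as the abstract describes;
            # ReLU stands in for whichever activation the experiments chose.
            # Symmetric padding keeps the sketch short; a causal model would
            # pad on the left only.
            self.encoder.append(nn.Sequential(
                nn.Conv1d(ch_in, ch_out, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm1d(ch_out),
                nn.ReLU(),
            ))
            last = (i == 0)  # first encoder stage mirrors the final decoder stage
            self.decoder.insert(0, nn.Sequential(
                nn.ConvTranspose1d(ch_out, ch_in, kernel_size=4, stride=2,
                                   padding=1),
                nn.Identity() if last else nn.BatchNorm1d(ch_in),
                nn.Identity() if last else nn.ReLU(),
            ))
            ch_in = ch_out
        # Mamba in place of the transformer bottleneck: cost grows linearly
        # with sequence length instead of quadratically as in self-attention.
        self.bottleneck = Mamba(d_model=ch_in)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, samples); samples must divide by 2**depth here,
        # whereas a real implementation would pad and trim.
        skips = []
        for enc in self.encoder:
            x = enc(x)
            skips.append(x)
        # Mamba expects (batch, length, channels)
        x = self.bottleneck(x.transpose(1, 2)).transpose(1, 2)
        for dec in self.decoder:
            x = x + skips.pop()  # U-Net skip connection
            x = dec(x)
        return x


model = DenoiserUNet(hidden=32).cuda()          # shrink `hidden` to cut parameters
noisy = torch.randn(1, 1, 1600, device="cuda")  # 0.1 s of audio at 16 kHz
denoised = model(noisy)                          # same shape as the input
```

The `hidden` argument is where the parameter reduction described in the abstract would come from: convolution weight counts scale roughly with the product of input and output channel widths, so halving the hidden width removes most of the parameters at a modest cost in denoising quality.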