Noise attenuation in speech signals using deep neural networks (original title: Atenuação de ruído em sinais de voz utilizando redes neurais profundas)

Bibliographic details
Year of defense: 2020
Main author: Araújo, Jacques Henrique Bessa
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defending institution: Not provided by the institution
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://www.repositorio.ufc.br/handle/riufc/52548
Abstract: Automatic speech recognition (ASR) systems have become increasingly common in people's daily lives and have changed the way humans interact with devices. ASR applications include voice search, virtual and electronic assistants, home and vehicular automation systems, games and entertainment, text transcription, voice translation, and person and emotion recognition, among others. The advent and growth of the Internet of Things and the evolution of mobile communications increase the use of such systems, since the input and output devices commonly used with computers, such as the mouse and keyboard, are inadequate for human interaction with, for example, household appliances and vehicular automation systems. ASR systems provide natural and convenient human-machine interaction, so they must be reliable and accurate. In adverse conditions, however, high levels of noise, reverberation, and interfering signals from multiple sources prevent speech from being recognized correctly. The word error rate (WER), a metric commonly used to evaluate ASR systems, is intrinsically tied to the signal processing algorithms used to minimize the effects of these factors. This dissertation investigates the efficiency and limitations of deep neural networks (DNNs) applied to noise attenuation in noisy environments. Four objective metrics are used for performance evaluation, and the results are compared with those obtained by the spectral subtraction algorithm and the Wiener filter. The DNN algorithm obtained the best results in all signal-to-noise ratio (SNR) scenarios when compared with the other algorithms. For the log-spectral distance (LSD) metric, the average DNN result is 36% lower than that of the Wiener filter and 25% lower than that of the spectral subtraction algorithm. For the short-time objective intelligibility (STOI) metric, the average DNN result is 12% higher than that of the Wiener filter and 8% higher than that of spectral subtraction. For the perceptual evaluation of speech quality (PESQ) metric, the DNN result is 17% higher than that of the Wiener filter and 13% higher than that of spectral subtraction. For the WER metric, the DNN result is 29% lower than that of the Wiener filter and 13% lower than that of spectral subtraction.