Abstract:
Automatic speech recognition (ASR) systems have become increasingly common in people's daily lives and have changed the way humans interact with devices. Voice search applications, virtual and electronic assistants, home and vehicle automation systems, games and entertainment applications, text transcribers, voice translators, and speaker and emotion recognition systems are among their applications. The advent and growth of the Internet of Things (IoT) and the evolution of mobile communications further encourage the use of such systems, because the input and output devices commonly used with computers, such as the mouse and keyboard, are inadequate for interacting with, for example, household appliances and vehicle automation systems. ASR systems provide natural and convenient human-machine interaction, so they must be reliable and accurate. However, in adverse conditions, high levels of noise, reverberation, and interfering signals from multiple sources hinder correct speech recognition. The word error rate (WER), a metric commonly used to evaluate ASR systems, therefore depends strongly on the signal processing algorithms used to minimize the effects of these factors. This dissertation investigates the efficiency and limitations of deep neural networks (DNNs) applied to noise attenuation in noisy environments. Four objective metrics are used for performance evaluation: log-spectral distance (LSD), short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and WER (lower LSD and WER are better; higher STOI and PESQ are better). The results are compared with those obtained by spectral subtraction and the Wiener filter. The DNN obtained the best results in all signal-to-noise ratio (SNR) scenarios. For LSD, the average result of the DNN is 36% lower than that of the Wiener filter and 25% lower than that of spectral subtraction. For STOI, the average result of the DNN is 12% higher than that of the Wiener filter and 8% higher than that of spectral subtraction. For PESQ, the DNN scores 17% higher than the Wiener filter and 13% higher than spectral subtraction. For WER, the DNN result is 29% lower than that of the Wiener filter and 13% lower than that of spectral subtraction.
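The WER is conventionally defined as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, and N is the number of words in the reference. The following minimal Python sketch computes it with a word-level Levenshtein (edit) distance. The function name `word_error_rate` and the example sentences are illustrative only, and the sketch assumes no text normalization (case, punctuation), which real evaluations usually apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein distance.

    Hypothetical helper for illustration; no case or punctuation
    normalization is performed.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so
    # far and the first j hypothesis words (a rolling single-row DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "turn on the kitchen lights"
    hyp = "turn on the kitten light"  # two substitutions out of five words
    print(f"WER = {word_error_rate(ref, hyp):.2f}")  # prints "WER = 0.40"
```

Because insertions are counted, the WER is not bounded by 1: a hypothesis much longer than the reference can score above 100%.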
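Both baselines can be expressed as a gain applied to each frequency bin of the noisy signal's short-time spectrum. The sketch below uses common textbook forms of each rule. This is an assumption, because the abstract does not say which variants the dissertation used. It applies power spectral subtraction with a spectral floor (the value 0.01 is an arbitrary illustrative choice) and a Wiener gain whose a priori SNR is approximated by a simple maximum-likelihood estimate. The per-bin noise power is assumed to be already estimated, and the function names are hypothetical.

```python
import math


def spectral_subtraction_gain(noisy_power: float, noise_power: float,
                              floor: float = 0.01) -> float:
    """Magnitude gain of power spectral subtraction for one STFT bin.

    The clean power is estimated as |Y|^2 - |N|^2. The (assumed) spectral
    floor keeps the gain above zero, which limits "musical noise".
    Both powers are assumed to be positive.
    """
    ratio = 1.0 - noise_power / noisy_power
    return math.sqrt(max(ratio, floor))


def wiener_gain(noisy_power: float, noise_power: float) -> float:
    """Wiener gain xi / (1 + xi) for one STFT bin.

    The a priori SNR xi is approximated here by the maximum-likelihood
    estimate max(|Y|^2 / |N|^2 - 1, 0); both powers are assumed positive.
    """
    xi = max(noisy_power / noise_power - 1.0, 0.0)
    return xi / (1.0 + xi)


if __name__ == "__main__":
    noise = 1.0  # assumed noise power estimate for this bin
    for noisy in (0.8, 1.5, 4.0, 16.0):  # noisy-bin powers to compare
        ss = spectral_subtraction_gain(noisy, noise)
        w = wiener_gain(noisy, noise)
        print(f"|Y|^2={noisy:5.1f}  spectral subtraction={ss:.3f}  Wiener={w:.3f}")
```

With this simple SNR estimate, the Wiener gain (1 - |N|^2/|Y|^2) equals the square of the spectral-subtraction gain wherever that ratio stays above the floor, so the Wiener rule attenuates more aggressively. Practical implementations often replace the estimate with a smoothed, decision-directed one.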