Classification of voice signal quality in VoIP communication using Deep Learning (original title: Classificação da qualidade do sinal de voz em comunicação VoIP utilizando Deep Learning)

Bibliographic details
Year of defense: 2019
Main author: Costa, Lucas Hilário da
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's thesis
Access type: Open access
Language: Portuguese
Defense institution: Universidade Federal de Lavras
Graduate program: Programa de Pós-graduação em Engenharia de Sistemas e Automação
Institution acronym: UFLA
Country: Brazil
Department: Departamento de Engenharia
Access link: http://repositorio.ufla.br/jspui/handle/1/36412
Abstract: Voice over IP (VoIP) is currently one of the most widely used communication services; however, its quality depends on several external factors that cause various types of voice signal degradation. In communication channels, packet loss significantly degrades the voice signal, lowering communication quality and directly affecting the user’s quality of experience (QoE). The objective of this work was the development and implementation of two Deep Learning (DL) models able to classify the quality of the voice signal transmitted in a VoIP communication, affected mainly by packet loss. The proposed models are based on a Deep Neural Network (DNN) that analyzes voice signals degraded at different packet loss rates (PLR) and classifies them into four distinct classes according to the user experience. To perform the tests, two databases were prepared, each containing four distinct classes: one was prepared with the ITU-T P.862 recommendation database files, with different packet loss rates, and the other was prepared with the ITU-T P.501 recommendation files, according to the mean opinion score (MOS) of each degraded file. To build the databases, a program was implemented in MATLAB that degrades the original voice files by varying the packet loss rate. After processing, the files were grouped into four classes according to the packet loss rate applied to each original voice signal. For the database prepared by MOS, the degraded files were processed by the ITU-T P.862 recommendation algorithm, which determines the MOS by comparing the degraded voice signal with the original signal of each audio file, and were then grouped into four classes according to the MOS obtained. To validate the models, two additional databases were prepared from VoxCeleb audio files, divided into four classes with 250 files each and grouped by PLR and by MOS.
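The degradation and grouping steps described above can be illustrated with a minimal Python sketch. It is not the MATLAB program from the thesis: the 160-sample frame size (20 ms at 8 kHz), the independent random loss pattern, the replacement of lost packets with silence, and the class edges in `plr_class` are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def apply_packet_loss(samples, plr, frame_len=160, seed=0):
    """Silence whole frames ("packets") with probability plr.

    Assumptions (not from the source): 160-sample frames, independent
    random losses, and lost packets replaced by zeros.
    Returns the degraded samples and the loss rate actually observed.
    """
    if not 0.0 <= plr <= 1.0:
        raise ValueError("plr must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    degraded = list(samples)
    lost = 0
    total = 0
    for start in range(0, len(degraded), frame_len):
        total += 1
        if rng.random() < plr:
            end = min(start + frame_len, len(degraded))
            degraded[start:end] = [0.0] * (end - start)
            lost += 1
    observed_plr = lost / total if total else 0.0
    return degraded, observed_plr


def plr_class(plr, edges=(0.05, 0.10, 0.20)):
    """Map a packet loss rate to one of four classes (0 = least loss).

    The edges are hypothetical placeholders, not the thesis thresholds.
    """
    return sum(plr >= edge for edge in edges)


# Usage: degrade two seconds of placeholder "audio" and label the result.
signal = [0.5] * 16000
degraded, observed = apply_packet_loss(signal, plr=0.10)
label = plr_class(0.10)
```

Labeling by the applied rate rather than the observed rate mirrors the abstract, which groups files by the packet loss rate applied to each original signal.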
The model using the database prepared by packet loss rate reached 94% accuracy in validation, and the model using the database prepared by MOS reached 91% accuracy. The models achieved an accuracy of 86.96% on the additional database prepared according to packet loss rate and 83.29% on the additional database prepared according to MOS. To assess the efficiency of the developed model, its results were compared with those of the ITU-T P.563 and P.862 recommendation algorithms; an average accuracy of 53.21% was obtained when comparing the MOS determined by the ITU-T P.563 recommendation algorithm with that determined by the ITU-T P.862 recommendation algorithm. From these results it can be concluded that the generated models were able to classify the packet loss rate and the MOS in a non-intrusive way and with high accuracy. Among the non-intrusive methods, the proposed model for MOS, with 91% accuracy, outperformed the ITU-T P.563 recommendation algorithm, which reached 53.21% accuracy when compared with the results of the intrusive ITU-T P.862 recommendation algorithm. Thus, the generated model is able to determine the MOS of degraded voice files more accurately than the ITU-T P.563 recommendation algorithm. Consequently, an important contribution of this work is a non-intrusive evaluation model capable of identifying voice signal quality in real time.
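One plausible way to read the comparison between a non-intrusive estimate and the intrusive reference is as class agreement: each MOS value is mapped to one of four classes, and accuracy is the fraction of files on which both methods land in the same class. The sketch below follows that reading. The abstract does not state the class boundaries or the exact comparison procedure, so the edges in `mos_class` and the sample scores are hypothetical.

```python
def mos_class(mos, edges=(2.0, 3.0, 4.0)):
    """Map a MOS value to one of four classes (0 = worst, 3 = best).

    The edges are hypothetical placeholders, not the thesis thresholds.
    """
    return sum(mos >= edge for edge in edges)


def class_agreement(reference_mos, estimated_mos):
    """Fraction of files whose estimated class matches the reference class."""
    if len(reference_mos) != len(estimated_mos):
        raise ValueError("both score lists must have the same length")
    if not reference_mos:
        raise ValueError("score lists must not be empty")
    matches = sum(
        mos_class(ref) == mos_class(est)
        for ref, est in zip(reference_mos, estimated_mos)
    )
    return matches / len(reference_mos)


# Made-up scores for four files: an intrusive reference (the role played by
# ITU-T P.862) and a non-intrusive estimate (the role played by ITU-T P.563).
reference = [4.2, 3.1, 2.5, 1.4]
estimate = [4.0, 2.8, 2.6, 2.1]
agreement = class_agreement(reference, estimate)  # 2 of 4 classes match: 0.5
```

Comparing classes instead of raw scores keeps the metric on the same four-class scale as the reported DNN accuracies.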