Reconhecimento Automático de Fonemas via RNA Profunda (Automatic Phoneme Recognition via a Deep Artificial Neural Network)

Bibliographic details
Year of defense: 2020
Main author: CARVALHO, Mateus Barros Frota de
Advisor: ALMEIDA NETO, Areolino de
Defense committee: ALMEIDA NETO, Areolino de; OLIVEIRA, Alexandre César Muniz de; SILVA, Rogério Moreira Lima
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Degree-granting institution: Universidade Federal do Maranhão
Graduate program: PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCET
Department: DEPARTAMENTO DE INFORMÁTICA/CCET
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: https://tedebc.ufma.br/jspui/handle/tede/3355
Abstract: This work presents a phoneme recognition model based on object detection techniques. The Single Shot Detector (SSD) was used in conjunction with the MobileNet convolutional network architecture. The databases used for model training were TIMIT and LibriSpeech, both of which contain audio spoken in English. To generate a graphical representation of the audio bases, the spectrogram of each audio clip was computed on the Mel scale, and, to train the phoneme localization algorithm, the temporal position of each phoneme occurrence was annotated on its respective spectrogram. Additionally, the training data set had to be enlarged to improve the generalization of the model; to that end, the two databases were merged and data augmentation techniques were applied to the audio. The results of this work were close to those reported in other state-of-the-art works. This research used two models with different architectures: MobileNet-Large, which achieved an accuracy of 0.72 mAP@0.5 IOU and a phoneme error rate of 19.47%, and MobileNet-Small, which achieved an accuracy of 0.63 mAP@0.5 IOU and a phoneme error rate of 31.02%.
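The pipeline described in the abstract (Mel-scale spectrograms with phoneme time annotations converted into detection targets) can be illustrated with a short sketch. The snippet below is an assumption-laden illustration, not the dissertation's code: it uses librosa for the spectrogram and maps each annotated phoneme interval to a full-height box on the spectrogram's time axis; the function names, the 16 kHz sampling rate, and the box layout are choices made for this example only.

```python
# Minimal sketch (assumption): turn an audio clip plus phoneme time annotations
# into a log-Mel spectrogram and detection-style box labels. Library choice
# (librosa) and the full-height box layout are illustrative, not the author's
# exact implementation.
import librosa
import numpy as np

def mel_spectrogram(audio_path, sr=16000, n_mels=128, hop_length=160):
    """Compute a log-Mel spectrogram; TIMIT/LibriSpeech audio is 16 kHz."""
    y, sr = librosa.load(audio_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    return librosa.power_to_db(mel, ref=np.max), sr, hop_length

def phonemes_to_boxes(phoneme_intervals, n_frames, sr, hop_length, n_mels=128):
    """Map (start_s, end_s, label) phoneme intervals to [x_min, y_min, x_max, y_max]
    boxes on the spectrogram "image": x is the time-frame axis, y spans all Mel
    bands, so each phoneme becomes a full-height box over its time interval."""
    boxes, labels = [], []
    for start_s, end_s, label in phoneme_intervals:
        x_min = int(start_s * sr / hop_length)
        x_max = min(int(end_s * sr / hop_length), n_frames - 1)
        boxes.append([x_min, 0, x_max, n_mels - 1])
        labels.append(label)
    return np.array(boxes), labels

# Hypothetical usage with an illustrative file and two annotated phonemes:
# spec, sr, hop = mel_spectrogram("sample.wav")
# boxes, labels = phonemes_to_boxes([(0.05, 0.12, "ah"), (0.12, 0.20, "t")],
#                                   spec.shape[1], sr, hop)
```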