Deep neural networks with feature fusion in biometric verification by voice

Bibliographic details
Year of defense: 2022
Main author: Virgilli, Rafaello
Advisor: Soares, Anderson da Silva
Examination committee: Galvão Filho, Arlindo Rodrigues; Soares, Anderson da Silva; Cândido Júnior, Arnaldo
Document type: Dissertation
Access type: Open access
Language: por
Defending institution: Universidade Federal de Goiás
Graduate program: Programa de Pós-graduação em Ciência da Computação (INF)
Department: Instituto de Informática - INF (RG)
Country: Brazil
Keywords in Portuguese:
Keywords in English:
CNPq knowledge area:
Access link: http://repositorio.bc.ufg.br/tede/handle/tede/12071
Abstract: A person's voice varies considerably, both because of factors related to the speaker, such as accent, emotional state, and changes in the voice with age, and because of factors external to the speaker, such as background noise, reverberation, capture equipment, and the digitization process. Biometric verification by voice therefore poses many challenges. Neural networks brought a large leap in performance on this problem compared to previous techniques, and the main input format they use is the spectrogram. For voices, a spectrogram can emphasize different characteristics depending on the parameters used to generate it. The purpose of this work is to explore feature fusion in biometric verification by voice, in particular by using dual spectrograms as input to the model. This approach is motivated by existing work that uses it in other voice and speech tasks, such as keyword spotting, detection of voiced excerpts, and music classification. The results validate the hypothesis that dual spectrograms yield a performance gain in existing models, which implies that certain types of spectrogram carry complementary information. The model trained with dual spectrograms obtained an Equal Error Rate (EER) of 1.61%, which is 26% lower than the EER of 2.22% obtained by the reference work [Chung et al. 2020]. Furthermore, the proposed model outperforms the reference work at any decision threshold, whether the goal is to minimize false positives or false negatives.
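
The Equal Error Rate mentioned in the abstract is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how such a value can be estimated from verification scores, not the evaluation code used in the dissertation. The function names, the toy scores, and the convention that a trial is accepted when its score is at or above the threshold are all assumptions made for illustration. The result is a fraction, so an EER of 1.61% corresponds to 0.0161.

```python
from typing import List, Tuple


def error_rates(scores: List[float], labels: List[int],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when trials with score >= threshold are accepted.

    labels use 1 for a genuine (same-speaker) trial and 0 for an impostor
    trial; this convention is an assumption of the sketch.
    """
    impostors = [s for s, y in zip(scores, labels) if y == 0]
    genuines = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= threshold for s in impostors) / len(impostors)
    frr = sum(s < threshold for s in genuines) / len(genuines)
    return far, frr


def equal_error_rate(scores: List[float], labels: List[int]) -> float:
    """Approximate the EER by trying every observed score as a threshold
    and averaging FAR and FRR at the threshold where they are closest."""
    best_gap, best_eer = float("inf"), 0.0
    for threshold in sorted(set(scores)):
        far, frr = error_rates(scores, labels, threshold)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical toy trials: four genuine scores followed by four impostor scores.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)  # 0.25 for this toy data
```

Raising the threshold lowers the false acceptance rate at the cost of more false rejections, which is why the abstract also compares the models away from the EER point, at thresholds chosen to minimize false positives or false negatives.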