Detalhes bibliográficos
Ano de defesa: |
2022 |
Autor(a) principal: |
Virgilli, Rafaello
 |
Orientador(a): |
Soares, Anderson da Silva
 |
Banca de defesa: |
Galvão Filho, Arlindo Rodrigues,
Soares, Anderson da Silva,
Cândido Júnior, Arnaldo |
Tipo de documento: |
Dissertação
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Universidade Federal de Goiás
|
Programa de Pós-Graduação: |
Programa de Pós-graduação em Ciência da Computação (INF)
|
Departamento: |
Instituto de Informática - INF (RG)
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
http://repositorio.bc.ufg.br/tede/handle/tede/12071
|
Resumo: |
The voice spoken by a person has a considerable variability which is due both to factors related to the speaker himself, such as accent, emotional state, and voice transition over age, as well as on factors external to the speaker, such as background noise, reverberation, capture equipment, and the digitalization process. Therefore, there are many challenges present in the task of biometric verification by voice. The use of neural networks to tackle this problem brought a big leap in performance when compared to previous techniques, and the main input format used is the spectrogram. For voices, the spectrogram can emphasize different characteristics depending on the generation parameters. The purpose of this work is to explore feature fusion in biometric verification by voice, particularly with by using dual spectrograms as input to the model. This approach is justified by the existence of works that also use it in other tasks related to voice and speech, such as keyword spotting, detection of voiced excerpts and musical classification. From the results, it was possible to validate the hypothesis that the use of dual spectrograms allows a performance gain in existing models, implying that certain types of spectrogram carry complementary information. The Equal Error Rate obtained was 1.61 for the model trained with dual spectrograms, which is 26% less than the EER rate of 2.22 obtained by the reference work [Chung et al. 2020]. Furthermore, the model proposed in this work has better performance for any decision threshold when compared to the reference work, either to minimize false positives or false negatives. |