Measurement of the harmonics-to-noise ratio in dysphonic voices by digital processing of spectrographic images

Bibliographic details
Year of defense: 2009
Main author: Joao Pedro Hallack Sansao
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Master's dissertation
Access type: Open access
Language: Portuguese (por)
Defense institution: Universidade Federal de Minas Gerais
UFMG
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/BUOS-8CZGJ3
Abstract: This work presents S2NR (Spectrographic Signal-to-Noise Ratio), a signal-to-noise ratio measurement obtained by processing vowel spectrograms with adaptations of fingerprint image enhancement algorithms. To validate the S2NR method, a test bench was set up to generate synthetic vowels with controlled values of fundamental frequency, amplitude, additive white noise, and cycle-to-cycle perturbations in waveform amplitude (shimmer) and phonatory period (jitter). For comparison purposes, the vowels were synthesized with known signal-to-noise ratio values. The signal-to-noise ratio was then measured with the S2NR algorithm and with a method based on time-domain periodicity analysis. For most of the synthetic voices, S2NR was more robust to jitter and shimmer perturbations than the time-domain algorithm and was also less sensitive to vowel type. Both male and female fundamental frequencies were tested with /a/, /i/, and /u/ vocal tract shapes. Initially, jitter and shimmer were assessed independently, with the simulated perturbation values ranging from absent to extreme conditions for the human voice (0% to 3% for jitter and 0% to 30% for shimmer). With jitter and F0 = 120 Hz, the S2NR estimates deviated from the reference values by 2.1 dB, 11.5 dB, and 2.9 dB for /a/, /i/, and /u/, respectively. With shimmer, these differences were 2.5 dB, 4.4 dB, and 3.6 dB. Subsequently, both perturbations were varied simultaneously within the same ranges, with no performance degradation beyond that observed when the perturbations were applied separately. Finally, the S2NR algorithm was tested with real, dysphonic, predominantly breathy voices. The results showed a consistent relation between S2NR values and perceptual ratings of breathiness. Additionally, the potential application of the S2NR algorithm to running speech was explored.
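
Illustrative sketch (not from the dissertation): the abstract describes a test bench that synthesizes vowels with controlled jitter, shimmer, and additive white noise at a known signal-to-noise ratio. The Python fragment below shows one way such a reference signal could be built. Every name in it (synth_voice, add_white_noise, measured_snr_db), the three-harmonic waveform, and the uniform perturbation model are assumptions made for illustration only; it does not model a vocal tract and it is not the S2NR algorithm.

```python
import math
import random


def synth_voice(f0=120.0, fs=16000, n_cycles=60, jitter=0.01, shimmer=0.10, seed=0):
    """Hypothetical periodic source with cycle-to-cycle perturbations.

    jitter  : maximum relative deviation of each period (0.01 = 1%)
    shimmer : maximum relative deviation of each cycle's amplitude
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_cycles):
        # Perturb the period and amplitude of this cycle independently.
        period = (1.0 / f0) * (1.0 + jitter * rng.uniform(-1.0, 1.0))
        amp = 1.0 + shimmer * rng.uniform(-1.0, 1.0)
        # Rounding to whole samples quantizes the jitter; a finer model
        # would need a higher sampling rate or interpolation.
        n = max(2, round(period * fs))
        for i in range(n):
            phase = 2.0 * math.pi * i / n
            # Three harmonics with decaying amplitude (assumed waveform).
            samples.append(amp * (math.sin(phase)
                                  + 0.5 * math.sin(2.0 * phase)
                                  + 0.25 * math.sin(3.0 * phase)))
    return samples


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(signal, target_db, seed=1):
    """Add Gaussian white noise scaled so the SNR equals target_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_db / 10.0)))
    noise = [gain * v for v in noise]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


def measured_snr_db(signal, noise):
    """Reference SNR in dB from the separately known signal and noise."""
    return 10.0 * math.log10(power(signal) / power(noise))


clean = synth_voice()
noisy, noise = add_white_noise(clean, target_db=20.0)
reference_snr = measured_snr_db(clean, noise)
```

Because the clean signal and the noise are kept separately, the reference SNR is known by construction, and an estimator applied to `noisy` alone can then be compared against it, which is the kind of comparison the abstract reports.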