Um estudo sobre limitações de técnicas de mascaramento espectral na separação cega de sinais de voz reverberados

Bibliographic details
Year of defense: 2008
Main author: Gustavo Fernandes Rodrigues
Advisor: Not provided by the institution
Defense committee: Not provided by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese
Defense institution: Universidade Federal de Minas Gerais (UFMG)
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: http://hdl.handle.net/1843/BUOS-8CVHTG
Abstract: The objective of this study is to analyze the limitations of techniques for blind source separation (BSS) of convolutive mixtures based on binary masking in the time-frequency domain. These techniques rely on the sparsity of speech signals and assume that, in a mixture of independent speech signals, more than one source is unlikely to be active at any given time and frequency. First, the performance of the DUET algorithm is analyzed for convolutive mixtures. The use of phase information for source separation is evaluated by analyzing the performance of a proposed algorithm that uses only relative amplitude information to estimate the spectral masks. It was verified that, for the specific case of separating two speech signals, the use of phase information does not affect the algorithm's performance. Next, the limitations of the spectral masking technique are analyzed as a function of the reverberation time of the signals that compose the mixture to be separated. When the ideal masks are known, a separation of about 9 dB is obtained for environments with a reverberation time below 300 ms. Beyond this point, as the reverberation time increases, the signals that compose the mixture spread over the time-frequency plane, progressively reducing separation performance. Then, given that finding ideal masks from a single mixture is still an open problem, the performance loss of the separation process is measured as a function of the distance between the ideal mask and the mask actually used. The results show a performance loss of 3 dB when approximately 10% of the bits of the ideal mask are inverted. Finally, preliminary analyses are carried out on finding the ideal mask based on negentropy, kurtosis, and the energy of the separated signals.
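
The ideal binary mask and the bit-inversion experiment mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`ideal_binary_mask`, `flip_mask_bits`, `masked_sir_db`), the synthetic sparse magnitude arrays standing in for speech spectrograms, and the simplified energy-ratio measure are all assumptions made here for illustration, so the numbers it produces are not comparable to the 9 dB and 3 dB figures reported above.

```python
import numpy as np


def ideal_binary_mask(target_mag, interferer_mag):
    """Return 1 where the target dominates a time-frequency bin, else 0."""
    return (target_mag > interferer_mag).astype(np.uint8)


def flip_mask_bits(mask, fraction, rng):
    """Invert a given fraction of mask entries chosen uniformly at random."""
    flat = mask.ravel().copy()
    n_flip = int(round(fraction * flat.size))
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = 1 - flat[idx]
    return flat.reshape(mask.shape)


def masked_sir_db(mask, target_mag, interferer_mag):
    """Ratio in dB of target energy to interferer energy passed by the mask.

    Simplified proxy computed on magnitudes (an assumption of this sketch),
    not the evaluation metric used in the thesis.
    """
    target_energy = np.sum((mask * target_mag) ** 2)
    interferer_energy = np.sum((mask * interferer_mag) ** 2)
    return 10.0 * np.log10(target_energy / interferer_energy)


# Synthetic, sparse magnitude "spectrograms" (frequency bins x time frames).
# Each source is active in roughly 30% of the bins, mimicking sparsity.
rng = np.random.default_rng(0)
shape = (257, 100)
target = rng.exponential(size=shape) * (rng.random(shape) < 0.3)
interferer = rng.exponential(size=shape) * (rng.random(shape) < 0.3)

ideal = ideal_binary_mask(target, interferer)
noisy = flip_mask_bits(ideal, fraction=0.10, rng=rng)

sir_ideal = masked_sir_db(ideal, target, interferer)
sir_noisy = masked_sir_db(noisy, target, interferer)
print(f"ideal mask: {sir_ideal:.1f} dB, 10% of bits inverted: {sir_noisy:.1f} dB")
```

With real recordings, the magnitude arrays would come from a short-time Fourier transform of each reverberated source, and the mask would be applied to the mixture spectrogram before resynthesis; those steps are omitted here to keep the sketch self-contained.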