A study on the limitations of spectral masking techniques in the blind separation of reverberant speech signals
Year of defense: | 2008 |
---|---|
Lead author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | por |
Defending institution: | Universidade Federal de Minas Gerais (UFMG) |
Graduate program: | Not provided by the institution |
Department: | Not provided by the institution |
Country: | Not provided by the institution |
Keywords in Portuguese: | |
Access link: | http://hdl.handle.net/1843/BUOS-8CVHTG |
Abstract: | The objective of this study is to analyze the limitations of techniques for blind source separation (BSS) of convolved mixtures based on binary masking in the time-frequency domain. These techniques rely on the sparsity of speech signals and assume that, in a mixture of independent speech signals, more than one source is unlikely to be active at any given time and frequency. First, the performance of the DUET algorithm is analyzed for convolved mixtures. The value of phase information for source separation is evaluated by analyzing the performance of a proposed algorithm that uses only relative amplitude information to estimate the spectral masks. It was verified that, for the specific case of separating two speech signals, the use of phase information does not affect the algorithm's performance. Next, the limitations of the spectral masking technique are analyzed as a function of the reverberation time of the signals that compose the mixture to be separated. When the ideal masks are known, a separation of about 9 dB is obtained in an environment with a reverberation time below 300 ms. Beyond this point, as the reverberation time increases, the signals that compose the mixture spread over the time-frequency plane, progressively reducing the performance of the separation process. Then, given that finding ideal masks from a single mixture is still an open problem, the performance loss of the separation process is measured as a function of the distance between the ideal mask and the mask actually used. The results show a performance loss of 3 dB when approximately 10% of the bits of the ideal mask are inverted. Finally, preliminary analyses are carried out on finding the ideal mask based on the negentropy, the kurtosis, and the energy of the separated signals. |
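
The two mask operations mentioned in the abstract, building an ideal binary mask and inverting a fraction of its bits, can be illustrated with a minimal sketch. It is not the thesis implementation. The function names `ideal_binary_mask` and `flip_mask_bits`, the array sizes, and the random stand-in spectra are all assumptions made for illustration. Random Gaussian spectra are not sparse the way speech is, so this sketch shows only the mechanics and says nothing about the reported dB figures.

```python
import numpy as np


def ideal_binary_mask(spec_a, spec_b):
    """Return a boolean mask that is True where source A dominates source B."""
    return np.abs(spec_a) > np.abs(spec_b)


def flip_mask_bits(mask, fraction, rng):
    """Invert a given fraction of mask entries, chosen uniformly at random."""
    flat = mask.ravel().copy()
    n_flip = int(round(fraction * flat.size))
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = ~flat[idx]
    return flat.reshape(mask.shape)


rng = np.random.default_rng(0)

# Hypothetical time-frequency representations: (frequency bins, time frames).
shape = (257, 100)
spec_a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
spec_b = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = spec_a + spec_b

# Ideal mask: requires the individual sources, which are unknown in practice.
mask = ideal_binary_mask(spec_a, spec_b)

# Masking keeps only the mixture bins assigned to source A.
estimate_a = mask * mixture

# Degrade the ideal mask by inverting about 10% of its bits.
noisy_mask = flip_mask_bits(mask, 0.10, rng)
changed = np.mean(noisy_mask != mask)
```

Here `changed` is the fraction of mask entries that differ from the ideal mask, which is one simple way to express a distance between the ideal mask and the mask actually used.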