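Implementation of an automatic speech recognition system using MFCC and Vector Quantization with dynamic features, normalization, and voice activity detection (original title: Implementação de um sistema de reconhecimento automático de voz utilizando as técnicas MFCC e Quantização Vetorial com atributos dinâmicos, de normalização e detecção de voz ativa)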

Machado, Mateus Lichfett


Bibliographic details
Defense year:	2016
Main author:	Machado, Mateus Lichfett
Advisor:	Not provided by the institution
Examination committee:	Not provided by the institution
Document type:	Dissertation
Access type:	Open access
Language:	Portuguese (por)
Defending institution:	Universidade Federal de Uberlândia; Brazil; Programa de Pós-graduação em Engenharia Mecânica
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords:	Automatic Speech Recognition; Mel Frequency Cepstral Coefficients; Vector Quantization; Mechanical engineering; Voice - Coding; Pattern recognition systems; CNPQ::ENGENHARIAS::ENGENHARIA MECANICA
Access link:	https://repositorio.ufu.br/handle/123456789/20710 http://dx.doi.org/10.14393/ufu.di.2018.82
Abstract:	This research investigates and develops a robust automatic speech recognition system that uses Mel Frequency Cepstral Coefficients (MFCC) to extract the acoustic properties of speech signals and Vector Quantization (VQ) for classification and pattern recognition. To improve the system, these techniques were combined with dynamic features, normalization techniques, and voice activity detection. Two types of dynamic coefficients were tested, Delta-Delta Coefficients (DDC) and Shifted Delta Coefficients (SDC), along with three normalization techniques: Cepstral Mean and Variance Normalization (CMVN), Windowed Cepstral Mean and Variance Normalization (WCMVN), and Short-Time Gaussianization (STG). Voice Activity Detection (VAD) was implemented according to the algorithm developed by Qiang He, which combines Short-Time Energy (STE) and Zero Crossing Rate (ZCR). The research examines the system's ability to perform three tasks: recognition of words or commands, speaker identification, and the combination of the two. It also investigates which configuration of the tested techniques performs these tasks best and analyzes its efficiency. Five experiments were conducted in a noise-controlled environment with eight participants. Four of them provided training recordings used to build the databases; the other four took part only in the test phase, alongside the speakers who had trained the system. A total of 144 speech samples were captured: 24 were used to build the database and the remaining 120 were used in the test phase. To ensure the integrity of the experiments, the training and testing samples were mirrored to be processed according to the configuration of each experiment.
The results confirmed that these techniques are capable of performing the tasks for which the system was proposed. The best configuration found combined MFCC and VQ with VAD, Shifted Delta Coefficients, and Short-Time Gaussianization normalization.
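The abstract names Short-Time Energy (STE) and Zero Crossing Rate (ZCR) as the two quantities combined for voice activity detection. As a rough illustration only, the sketch below computes both per frame and applies a simple two-threshold rule. It is not the dissertation's implementation and not Qiang He's algorithm: the function names, the frame sizes, the threshold values, and the decision rule itself are all assumptions chosen for readability.

```python
def frame_signal(samples, frame_len, hop):
    """Split a list of samples into frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared amplitudes in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_voice(samples, frame_len=200, hop=100,
                 high_energy=1.0, low_energy=0.1, zcr_thr=0.3):
    """Return one boolean per frame (True = frame treated as speech).

    Hypothetical rule, assumed for illustration: a frame is active if its
    energy exceeds `high_energy`, or if its energy exceeds `low_energy`
    and its zero crossing rate exceeds `zcr_thr` (a crude way to keep
    low-energy, noise-like consonants).
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        ste = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        flags.append(ste > high_energy or (ste > low_energy and zcr > zcr_thr))
    return flags
```

In practice the thresholds would have to be tuned to the recording level and background noise; the values above only make the sketch runnable.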
