Speech recognition using dynamic selection of neural networks

ROCHA, Priscila Lima

Bibliographic details
Year of defense:	2018
Main author:	ROCHA, Priscila Lima
Advisor:	BARROS FILHO, Allan Kardec Duailibe
Examination committee:	PRINCIPE, José Carlos; SOUZA, Francisco das Chagas de
Document type:	Master's thesis
Access type:	Open access
Language:	Portuguese
Defending institution:	Universidade Federal do Maranhão
Graduate program:	PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE ELETRICIDADE/CCET
Department:	DEPARTAMENTO DE ENGENHARIA DA ELETRICIDADE/CCET
Country:	Brazil
Keywords in Portuguese:	Redes Neurais; Reconhecimento Automático de Voz; Coeficientes Mel-Cepstrais; Modelos TCD; Perceptron de Múltiplas Camadas; Aprendizado por Quantização Vetorial; Função de Base Radial Gaussiana; Mistura de Especialistas
Keywords in English:	Automatic Speech Recognition; Neural Network; DCT Models; Multilayer Perceptron; Learning Vector Quantization; Gaussian Radial Basis Function; Mixture of Experts
CNPq knowledge area:	Formal Languages and Automata
Access link:	https://tedebc.ufma.br/jspui/handle/tede/2131
Abstract:	This work proposes a hierarchical architecture composed of a set of specialist neural networks, based on the ensemble method with dynamic classifier selection, for use in speech recognition systems. The pattern recognition task involves a set of 30 commands in Brazilian Portuguese. Each command is encoded as a two-dimensional temporal matrix obtained by applying the Discrete Cosine Transform (DCT) to the mel-cepstral coefficients. To mitigate the pattern separability problem, the patterns are mapped by a nonlinear transformation into a high-dimensional space using a suitable set of Gaussian Radial Basis Functions (GRBFs). Classification is performed by the dynamic classifier selection method, in which Multilayer Perceptron (MLP) and Learning Vector Quantization (LVQ) configurations are analyzed as candidates for the multiple classifiers, each specialized in one of the subdivisions of the full set of classes to be recognized. The performance of these configurations is evaluated during the training, validation and testing phases of the speech recognition system. A new test pattern is applied to the GRBF set, in which each function is parameterized by the centroid and variance of the classes. The GRBF that yields the highest output value indicates the class region in which the pattern lies and thus directs the pattern to the specialist neural network, which provides the final classification result based on local accuracy. Finally, the performance of the neural network configurations chosen to compose the multiple classifiers was verified. Comparing the MLP and LVQ configurations in the proposed system, the overall accuracy of the LVQ networks using patterns of dimension 4, 9 and 16 in the original feature space was 87.52 %, 88.39 % and 89.6 %, respectively. The MLP networks obtained an overall accuracy of 91.44 %, 93.15 % and 94.9 %, respectively.
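
The routing step described in the abstract (evaluate every GRBF on the test pattern, then pass the pattern to the specialist whose GRBF responds most strongly) can be sketched as follows. This is only an illustrative sketch and not the implementation used in the thesis: the function names `fit_grbf_gates`, `grbf_value` and `select_specialist` are hypothetical, one GRBF per group of classes is assumed, and a single scalar variance per group (mean squared distance to the centroid) is assumed because the abstract does not state how the variance is computed. The specialist networks themselves (MLP or LVQ) are left out.

```python
import numpy as np

def fit_grbf_gates(X, groups):
    """Estimate one Gaussian RBF per group (assumption: one gate per group).

    Returns {group_id: (centroid, variance)}, where variance is a scalar:
    the mean squared Euclidean distance of the group's patterns to its centroid.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    gates = {}
    for g in np.unique(groups):
        members = X[groups == g]
        centroid = members.mean(axis=0)
        variance = np.mean(np.sum((members - centroid) ** 2, axis=1))
        # Guard against a zero variance for degenerate groups.
        gates[int(g)] = (centroid, max(float(variance), 1e-12))
    return gates

def grbf_value(x, centroid, variance):
    """Gaussian RBF response: exp(-||x - c||^2 / (2 * variance))."""
    diff = np.asarray(x, dtype=float) - centroid
    return float(np.exp(-np.dot(diff, diff) / (2.0 * variance)))

def select_specialist(x, gates):
    """Return the id of the group whose GRBF gives the highest response."""
    return max(gates, key=lambda g: grbf_value(x, *gates[g]))

# Synthetic demo data (not from the thesis): two well-separated groups
# of 4-dimensional patterns.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 4)),
    rng.normal(5.0, 0.1, size=(20, 4)),
])
groups_demo = np.array([0] * 20 + [1] * 20)
gates_demo = fit_grbf_gates(X_demo, groups_demo)
```

In a complete system, the identifier returned by `select_specialist` would index the specialist network trained on that subdivision of the classes, and that network would produce the final class label.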
