A novel word boundary detector based on the Teager energy operator for automatic speech recognition

Peretta, Igor Santos

Bibliographic details
Year of defense:	2010
Main author:	Peretta, Igor Santos
Advisor:	Not provided by the institution
Defense committee:	Not provided by the institution
Document type:	Master's thesis
Access type:	Open access
Language:	por
Defending institution:	Universidade Federal de Uberlândia BR Programa de Pós-graduação em Engenharia Elétrica Engenharias UFU
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Segmentação da fala; Detecção de fronteiras de palavra falada; TEO; Independente de locutor; Palavras isoladas; Sistema de reconhecimento de voz; MFCC; MLP; Reconhecimento automático da voz; Redes neurais artificiais; Speech segmentation; Spoken word boundary detection; Speaker-independent; Isolated words; Speech recognition system; Mel-frequency cepstral coefficients; Artificial neural network; CNPQ::ENGENHARIAS::ENGENHARIA ELETRICA
Access link:	https://repositorio.ufu.br/handle/123456789/14446
Abstract:	This work is part of a major research project and contributes to the development of a speaker-independent speech recognition system for isolated words from a limited vocabulary. It proposes a novel spoken word boundary detection method named the TEO-based method for Spoken Word Segmentation (TSWS). The TSWS, which is based on the Teager Energy Operator (TEO), is presented and compared with two widely used speech segmentation methods: Classical, which uses energy and zero-crossing rate computations, and Bottom-up, which is based on the concepts of adaptive level equalization, energy pulse detection and endpoint ordering. The TSWS detects spoken word boundaries with substantially better precision than the Classical method (67.8% error reduction) and the Bottom-up method (61.2% error reduction). A complete isolated spoken word recognition system (ISWRS) is also presented. This ISWRS uses Mel-frequency Cepstral Coefficients (MFCC) as the parametric representation of the speech signal, and a standard multilayer feed-forward network (MLP) as the recognizer. Two sets of tests were conducted: one with a database of 50 different words and a total of 10,350 utterances, and another with a smaller vocabulary of 17 words and a total of 3,519 utterances. Two thirds of the utterances formed the training set for the ISWRS, and the remaining third formed the testing set. The tests were repeated with each of the TSWS, Classical and Bottom-up methods used in the ISWRS speech segmentation stage. With TSWS, the ISWRS achieved a 99.0% success rate on generalization tests, against 98.6% with either the Classical or the Bottom-up method. White Gaussian noise was then artificially added to the ISWRS inputs to reach a signal-to-noise ratio of 15 dB. With noise present, ISWRS performance on generalization tests fell to 96.5%, 93.6%, and 91.4% when using the TSWS, Classical and Bottom-up methods, respectively.
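For readers unfamiliar with the two signal-level operations named in the abstract, the following is a minimal sketch, not the TSWS method itself (the abstract does not give its details). It assumes the standard discrete form of the Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and one common way of scaling white Gaussian noise to a target signal-to-noise ratio. The function names `teager_energy` and `add_white_noise` are hypothetical and chosen only for illustration.

```python
import numpy as np


def teager_energy(x):
    """Standard discrete Teager Energy Operator (illustrative helper).

    psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples.
    The first and last samples have no two-sided neighbours and are set to 0.
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def add_white_noise(x, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR.

    Assumption: SNR is defined as the ratio of mean signal power to mean
    noise power over the whole input, expressed in dB.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise
```

For a pure tone `A * cos(w * n)`, the operator above returns the constant `A**2 * sin(w)**2` at every interior sample, which is why its output tracks both amplitude and frequency. A call such as `add_white_noise(signal, 15.0)` corresponds to the 15 dB condition mentioned in the abstract, under the SNR definition assumed here.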
