Detalhes bibliográficos
Ano de defesa: |
2024 |
Autor(a) principal: |
PEREIRA , Bianca Valéria Lopes
 |
Orientador(a): |
ALMEIDA NETO, Areolino de
 |
Banca de defesa: |
ALMEIDA NETO, Areolino de
,
OLIVEIRA, Alexandre César Muniz de
,
SAMPAIO NETO, Nelson Cruz
 |
Tipo de documento: |
Dissertação
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Universidade Federal do Maranhão
|
Programa de Pós-Graduação: |
PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA DA COMPUTAÇÃO/CCET
|
Departamento: |
DEPARTAMENTO DE INFORMÁTICA/CCET
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
https://tedebc.ufma.br/jspui/handle/tede/5486
|
Resumo: |
Phoneme recognition is an area of linguistics and speech processing that involves identifying and distinguishing the distinctive sounds that make up a language. Recognizing phonemes involves the ability to discern and categorize the different sounds of speech, even when there are variations in pronunciation, context or intonation. In this work, a phoneme recognition model is proposed using a stacked autoencoder network, called CollabNet. CollabNet introduces a collaborative method for inserting new hidden layers, in contrast to the traditional stacking of autoencoders. In CollabNet, the addition of a new layer is done in a coordinated and gradual manner, allowing the designer to control its influence on the training. This collaboration ensures that the learning of the new layer is effectively integrated with the previous layers, resulting in more aligned and efficient training. To represent the phonemes, the frequencies were compacted using centroids so as to preserve the particularities of the sound. In order to create a geometric representation of the audios in the databases, the fast Fourier transform (FFT) was calculated for each audio sample, then the frequencies were grouped and the centroid of each group was calculated. Subsequently, the deep stacked autoencoder network was parameterized and trained to recognize phonetic syllables. With this representation of the audios, one could maintain their particular characterization so that CollabNet could identify the various sounds of the Brazilian Portuguese language, thus achieving an accuracy of 75.96% and a PER of 23.73%. |