Detalhes bibliográficos
Ano de defesa: |
2023 |
Autor(a) principal: |
Oliveira, Frederico Santos de
 |
Orientador(a): |
Soares, Anderson da Silva
 |
Banca de defesa: |
Soares, Anderson da Silva,
Aluisio, Sandra Maria,
Duarte, Julio Cesar,
Laureano, Gustavo Teodoro,
Galvão Filho, Arlindo Rodrigues |
Tipo de documento: |
Tese
|
Tipo de acesso: |
Acesso aberto |
Idioma: |
por |
Instituição de defesa: |
Universidade Federal de Goiás
|
Programa de Pós-Graduação: |
Programa de Pós-graduação em Ciência da Computação (INF)
|
Departamento: |
Instituto de Informática - INF (RMG)
|
País: |
Brasil
|
Palavras-chave em Português: |
|
Palavras-chave em Inglês: |
|
Área do conhecimento CNPq: |
|
Link de acesso: |
http://repositorio.bc.ufg.br/tede/handle/tede/12916
|
Resumo: |
With the emergence of intelligent personal assistants, the need for high-quality conversational interfaces has increased. While text-based chatbots are popular, the development of voice interfaces is equally important. However, the primary method for evaluating voice-based conversational models is mainly done through Mean Opinion Score (MOS), which relies on a manual and subjective process. In this context, this thesis aims to contribute with a new methodology for evaluating voice-based conversational interfaces, with a case study specifically conducted in Brazilian Portuguese. The proposed methodology includes an architecture for predicting the quality of synthesized speech in Brazilian Portuguese, correlated with MOS. To evaluate the proposed methodology, this work included training Text-to-Speech models to create the dataset called BRSpeechMOS. Details about the creation of this dataset are presented, along with a qualitative and quantitative analysis of it. A series of experiments were conducted to train various architectures using the BRSpeechMOS dataset. The architectures used are based on supervised and self-supervised learning. The results obtained confirm the hypothesis raised that pre-trained models on voice processing tasks such as speaker verification and automatic speech recognition produce suitable acoustic representations for the task of predicting speech quality, contributing to the advancement of the state of the art in the development of evaluation methodologies for conversational models. |