Development of a model for classifying the typology of vocal signals based on Deep Learning

Bibliographic details
Year of defense: 2022
Main author: Leite, Danilo Rangel Arruda
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Thesis
Access type: Open access
Language: Portuguese
Defense institution: Universidade Federal da Paraíba, Brazil
Area: Exact and Health Sciences
Program: Programa de Pós-Graduação em Modelos de Decisão e Saúde (UFPB)
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Link de acesso: https://repositorio.ufpb.br/jspui/handle/123456789/25176
Abstract: The voice is one of the main means of human communication; its emission should be pleasant, effortless, and consistent with the professional, social, and personal needs of the speaker. Any change in its emission can be classified as a voice disorder. Diagnosing the disorder at an early stage can be crucial to avoiding more serious morbidity, as it gives the patient the opportunity for an uncomplicated treatment and a better quality of life. In traditional clinical practice, several medical examinations are needed to detect a voice disorder, such as observation of the vocal folds by means of laryngoscopy, to visualize possible morphological alterations, or acoustic analysis, useful for revealing possible functional alterations. These exams are often invasive and time-consuming and may cause discomfort to the patient during the procedure. Acoustic analysis has been indicated as an auxiliary tool that relies on non-invasive, low-cost procedures based on digital voice-signal processing techniques, supporting the diagnosis of voice pathologies. Among the possibilities of acoustic analysis, spectrography is a resource of great relevance, from which information such as the presence of noise at medium and high frequencies, intensity, instability of harmonics, and breaks in sound, among others, can be observed. Given the above, this study built an intelligent model using a pre-trained Deep Neural Network (DNN) to classify spectrographic images of the voice-signal typology of the sustained vowel "é", following the proposals of Titze (1995) and Sprecher et al. (2010). The classification proposed by Titze (1995), the one most used in research procedures, categorizes signals into Types I, II, and III; Sprecher et al. (2010) proposed adding a Type IV signal to Titze's original classification. Grad-CAM was also used to mark on the spectrogram the regions most relevant to the model's classification.
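The Grad-CAM step mentioned above highlights, on the input spectrogram, the regions that most influenced the network's prediction. The core computation (Selvaraju et al., 2017) can be sketched in NumPy alone; the feature maps and gradients below are toy values, not taken from the thesis's actual DNN:

```python
import numpy as np

def grad_cam_heatmap(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap from convolutional feature maps and the
    gradients of the target class score with respect to those maps.

    feature_maps, gradients: arrays of shape (channels, height, width).
    Returns a (height, width) map normalized to [0, 1].
    """
    # Channel weights: global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))  # shape: (channels,)
    # Weighted sum of the feature maps, then ReLU to keep only the
    # regions that push the target class score up.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1]
    return cam

# Toy example: two 3x3 feature maps with equal gradient weight.
fmaps = np.array([[[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]],
                  [[0.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]]])
grads = np.ones_like(fmaps)
heat = grad_cam_heatmap(fmaps, grads)
print(heat)  # brightest cell = most relevant region
```

In practice the heatmap is upsampled to the spectrogram's size and overlaid on it, so a clinician can check whether the model attended to, e.g., noise between harmonics rather than image artifacts.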
In this sense, an automatic classification following the proposals of Titze (1995) and Sprecher et al. (2010) may be useful as a treatment-outcome measure, since the classification reflects the intensity of the vocal deviation and the presence of laryngeal alteration. Such an automatic model for classifying signal typology may help the clinician in decision-making during treatment follow-up. The architecture developed in the methodology achieved an overall test accuracy of 0.94, precision of 0.94, F1-score of 0.94, Cohen's kappa of 0.91, and sensitivity and specificity of 0.94 and 0.98, respectively. The model can be used as a tool in the pre-processing stage, before calculating any perturbation measure, and can also enhance the efficiency of spectrographic analyses, helping the clinician in decision-making.
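All of the reported figures (accuracy, precision, F1-score, Cohen's kappa, sensitivity, specificity) can be derived from a single multi-class confusion matrix. A minimal NumPy sketch, using a toy 4-class matrix (Types I-IV) that is invented for illustration and is not the thesis's data:

```python
import numpy as np

def classification_metrics(cm: np.ndarray) -> dict:
    """Derive overall accuracy, macro-averaged precision, sensitivity
    (recall), specificity, F1-score, and Cohen's kappa from a square
    confusion matrix cm[true_class, predicted_class]."""
    n = cm.sum()
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class k, but wrong
    fn = cm.sum(axis=1) - tp          # class k samples that were missed
    tn = n - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # per-class sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / n
    # Cohen's kappa: agreement beyond what chance alone would yield.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (accuracy - pe) / (1 - pe)
    return {
        "accuracy": accuracy,
        "precision": precision.mean(),
        "sensitivity": recall.mean(),
        "specificity": specificity.mean(),
        "f1": f1.mean(),
        "kappa": kappa,
    }

# Toy confusion matrix: rows = true type, columns = predicted type.
cm = np.array([[24, 1, 0, 0],
               [1, 23, 1, 0],
               [0, 1, 24, 0],
               [0, 0, 1, 24]])
m = classification_metrics(cm)
print({k: round(v, 3) for k, v in m.items()})
```

Macro averaging (the mean of per-class values) is one common choice for multi-class reporting; the thesis does not state here which averaging it used, so this is an assumption of the sketch.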