Development of a model for classifying voice signal typology based on Deep Learning
Year of defense: | 2022 |
---|---|
Main author: | |
Advisor: | |
Defense committee: | |
Document type: | Thesis |
Access type: | Open access |
Language: | Portuguese (por) |
Defending institution: | Universidade Federal da Paraíba, Brasil (Ciências Exatas e da Saúde, Programa de Pós-Graduação em Modelos de Decisão e Saúde, UFPB) |
Graduate program: | Not informed by the institution |
Department: | Not informed by the institution |
Country: | Not informed by the institution |
Keywords in Portuguese: | |
Access link: | https://repositorio.ufpb.br/jspui/handle/123456789/25176 |
Abstract: | The voice is one of the main means of human communication; its production should be pleasant, effortless, and in line with the speaker's professional, social, and personal interests. Any change in voice production can be classified as a voice disorder. Diagnosing a disorder at an early stage can be crucial to avoiding more serious morbidity, as it gives the patient the opportunity for uncomplicated treatment and a better quality of life. In traditional clinical practice, several medical tests are needed to detect a voice disorder, such as observation of the vocal folds by laryngoscopy, to visualize possible morphological alterations, or acoustic analysis, useful for revealing possible functional alterations. These exams are often invasive and time-consuming and may cause the patient discomfort during the procedure. Acoustic analysis has been indicated as an auxiliary, non-invasive, low-cost tool that uses digital voice signal processing techniques to support the diagnosis of voice pathologies. Among the possibilities of acoustic analysis, spectrography is a highly relevant resource, from which information such as the presence of noise at medium and high frequencies, intensity, instability of harmonics, and breaks in sound, among others, can be observed. Given the above, this study built an intelligent model using a pre-trained Deep Neural Network (DNN) to classify spectrographic images of the voice signal typology of the sustained vowel "é", according to the proposals of Titze (1995) and Sprecher et al. (2010). The classification proposed by Titze (1995), the most widely used in research, categorizes signals into Types I, II, and III. Sprecher et al. (2010) proposed adding a Type IV signal to Titze's original classification. Grad-CAM was also used to highlight on the spectrogram the regions most relevant to the model's classification.
In this sense, automatic classification using the proposals of Titze (1995) and Sprecher et al. (2010) may be useful as a treatment outcome measure, since the classification reflects the intensity of vocal deviation and the presence of laryngeal alteration. This automatic model for classifying signal typology may help clinicians in decision-making during treatment follow-up. The architecture developed in the methodology achieved an overall test accuracy of 0.94, precision of 0.94, F1-score of 0.94, kappa of 0.91, and sensitivity and specificity of 0.94 and 0.98, respectively. The model can be used as a pre-processing tool before calculating any disturbance measure, as well as to enhance the efficiency of spectrographic analyses, supporting the clinician's decision-making. |
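The abstract describes classifying spectrographic images of a sustained vowel. A minimal sketch of the first step, computing a magnitude spectrogram with a short-time Fourier transform, is shown below; the synthetic signal, sample rate, and frame parameters are illustrative assumptions, not the thesis's actual preprocessing settings.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=128):
    """Magnitude spectrogram: Hann-windowed STFT, shape (frames, freq bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic "sustained vowel" (assumed for illustration): a 100 Hz fundamental
# plus two weaker harmonics, sampled at 8 kHz for one second.
sr = 8000
t = np.arange(sr) / sr
voice = (np.sin(2 * np.pi * 100 * t)
         + 0.5 * np.sin(2 * np.pi * 200 * t)
         + 0.25 * np.sin(2 * np.pi * 300 * t))
S = spectrogram(voice)
print(S.shape)  # (59, 257): 59 frames, 257 frequency bins
```

In practice the resulting matrix would be rendered as an image (e.g. on a log scale) and fed to the pre-trained DNN.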
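The reported evaluation metrics (accuracy, precision, F1-score, Cohen's kappa) can all be derived from a multi-class confusion matrix. The sketch below shows the standard computations; the 4x4 matrix (rows = true Type I-IV, columns = predicted) is invented for illustration and is NOT the thesis's actual test result.

```python
import numpy as np

# Hypothetical confusion matrix for the four signal types (illustrative only).
cm = np.array([[48,  2,  0,  0],
               [ 1, 45,  4,  0],
               [ 0,  3, 46,  1],
               [ 0,  0,  2, 48]])

total = cm.sum()
accuracy = np.trace(cm) / total  # fraction of correct predictions

# Macro-averaged precision, recall, and F1 over the four classes.
precision_k = np.diag(cm) / cm.sum(axis=0)
recall_k = np.diag(cm) / cm.sum(axis=1)
f1_k = 2 * precision_k * recall_k / (precision_k + recall_k)
precision, recall, f1 = precision_k.mean(), recall_k.mean(), f1_k.mean()

# Cohen's kappa: observed agreement corrected for chance agreement.
expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
kappa = (accuracy - expected) / (1 - expected)

print(round(accuracy, 3), round(kappa, 3))  # 0.935 0.913
```

Note that, as in the abstract's figures, kappa is typically a bit lower than raw accuracy because it discounts agreement expected by chance.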