Segmentation of oral lesions through convolutional neural networks

Bibliographic details
Year of defense: 2025
Main author: Souza, Eduardo Santos Carlos de
Advisor: Not provided by the institution
Examination committee: Not provided by the institution
Document type: Dissertation
Access type: Open access
Language: English
Defending institution: Biblioteca Digital de Teses e Dissertações da USP
Graduate program: Not provided by the institution
Department: Not provided by the institution
Country: Not provided by the institution
Keywords in Portuguese:
Access link: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-29072025-102150/
Abstract: Artificial intelligence has been widely used in the medical field in recent years, especially in medical imaging, with the goal of creating models capable of quickly and precisely identifying conditions or relevant characteristics present in images, in order to aid medical professionals in their practice. Oral cancer and oral potentially malignant disorders are a group of conditions that has received relatively little attention from the scientific community, which is especially concerning since oral cancer is among the most common and deadly forms of cancer. Most existing works on these conditions focus on classifying the lesions in the images, which is not sufficient for medical practice, as practitioners also need to differentiate healthy tissue from the affected areas in order to properly diagnose and subsequently treat the disease. This work addresses this gap by investigating artificial intelligence models for the semantic segmentation of such images. Aside from building the models themselves, this work also explores transfer learning to remedy the lack of annotated data needed to build such models. In particular, the large ImageNet classification dataset was compared with smaller semantic segmentation datasets, namely COCO and ISIC 2018, in terms of the transfer learning performance each provides. Finally, to provide a meaningful reference for comparison, the pairwise difference in performance between the developed models and human performance was calculated. Among the models developed, two leading models based on the Attention U-Net stand out: a model with high computational cost using ConvNeXt as its backbone, which obtained a Dice score of 0.715, and a model with low computational cost using MobileNet as its backbone, which obtained a Dice score of 0.692.
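The Dice score used to report these results measures the overlap between a predicted mask P and a ground-truth mask T as 2|P ∩ T| / (|P| + |T|), ranging from 0 (no overlap) to 1 (perfect overlap). A minimal pure-Python sketch for binary masks follows; the function name `dice_score`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the dissertation.

```python
def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape.

    Masks are nested lists of 0/1 values (rows of pixels). By the convention
    assumed here, two empty masks count as perfect agreement (score 1.0).
    """
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    # Lesion pixels in the prediction plus lesion pixels in the ground truth.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    if total == 0:
        return 1.0
    return 2 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3) ≈ 0.667
```

The abstract does not state whether the reported scores average a per-image Dice over the test set or pool all pixels into a single computation; the two can differ noticeably when lesion sizes vary, so this sketch only covers the per-mask formula.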
Regarding transfer learning, it was concluded that transfer learning from the ISIC 2018 dataset yields worse performance than using no transfer learning at all, whereas the ImageNet and COCO datasets provide significant gains in performance. These latter two datasets produced highly similar results, and the hypothesis tests conducted could not establish that either was superior to the other. This opens the possibility of using the COCO dataset for quicker and less resource-intensive transfer learning. Finally, the paired comparison between the models and human performance demonstrated, with statistical significance, that humans outperform the models, indicating the need for further research and development on this task.
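The abstract does not name the statistical test behind the paired comparisons. As one possible illustration, the sketch below runs an exact two-sided sign-flip permutation test on per-image score differences, assuming that each image is scored by both a human and a model. This test is a stand-in chosen for the example (a paired t-test or a Wilcoxon signed-rank test are common alternatives), and the function name and the scores used are hypothetical, not results from the dissertation.

```python
import itertools


def paired_sign_flip_test(a, b, tol=1e-12):
    """Exact two-sided paired permutation (sign-flip) test on the mean of a - b.

    Under the null hypothesis of no difference, each per-image difference is
    equally likely to be positive or negative, so all 2**n sign assignments
    are enumerated. Exhaustive enumeration is only practical for small n.
    Returns (mean difference, p-value).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        # tol guards against floating-point ties with the observed statistic
        if stat >= observed - tol:
            extreme += 1
    return sum(diffs) / n, extreme / 2 ** n


# Hypothetical per-image Dice scores (illustrative values only).
human = [0.80, 0.75, 0.90, 0.70, 0.85]
model = [0.70, 0.72, 0.80, 0.71, 0.78]
mean_diff, p_value = paired_sign_flip_test(human, model)
print(f"mean difference = {mean_diff:.3f}, p = {p_value:.3f}")
# prints: mean difference = 0.058, p = 0.125
```

With only five images the smallest attainable two-sided p-value is 2/32 = 0.0625, which shows why a non-significant result, as reported for ImageNet versus COCO, does not by itself demonstrate that two pretraining sources are equivalent. The same function could compare two models' per-image scores.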