TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes

Bibliographic Details
Main Author: Barbosa, Felipe Manfio
Publication Date: 2025
Format: Master's thesis
Language: eng
Source: Biblioteca Digital de Teses e Dissertações da USP
Download full: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/
Summary: Visual perception plays a fundamental role in the correct localization and navigation of autonomous vehicles in urban environments. In this scenario, Semantic Segmentation is a valuable technique, usually derived from RGB images, for identifying elements and their relationships in a given scene. With the advent of Convolutional Neural Networks and Deep Learning, Semantic Segmentation methods reached a new level of accuracy (90% on the Cityscapes dataset). Nonetheless, this comes at the cost of heavy and complex network architectures that require high-cost embedded processing hardware. Additionally, adapting these models to new operation domains remains an unsolved issue, mainly because fine-tuning for all possible target domains is impractical given the high cost of data annotation. Although simulation engines can provide large amounts of labeled data, the domain shift between synthetic and real domains still hinders the direct transfer of models to application scenarios. In light of that, Unsupervised Domain Adaptation has emerged as a promising alternative that exploits unlabeled data and self-supervised learning as helper tools for Domain Adaptation. However, the majority of the literature builds upon single-frame, RGB-only methods. Taking this into account, we argue that a huge opportunity for leveraging Temporal and Depth data as auxiliary sources of supervision is being disregarded. While Depth data has strong structural/geometric value and tends to be more robust to domain shifts, Temporal data provides large amounts of unlabeled data and useful temporal cues that help the adaptation process. Therefore, in this work we propose a synthetic-to-real Unsupervised Domain Adaptation method that builds upon three cornerstones: efficiency, depth awareness, and temporal awareness. The results demonstrate that the proposed method improves both segmentation quality and temporal consistency in the target domain. Specifically, it achieves an overall improvement of approximately 250% in the mean Intersection over Union (mIoU) for critical classes (road, sidewalk, car, and person) compared to the baseline (non-adapted) model.
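To make the reported metric concrete, the following is a minimal sketch of how mIoU over a subset of classes and a percentage gain over a baseline could be computed. It assumes the 250% figure is a relative gain, which the abstract does not state explicitly. The function names, the class indices, and the scores 0.20 and 0.70 are hypothetical illustrations, not values or code from the thesis.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Return IoU per class for two integer label maps of equal shape.

    Classes absent from both maps get NaN so they can be skipped later.
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, target_c).sum() / union
    return ious


def mean_iou(ious, class_ids):
    """Average IoU over the selected classes, ignoring NaN entries."""
    return float(np.nanmean(ious[list(class_ids)]))


def relative_improvement(baseline, adapted):
    """Percentage gain of `adapted` over `baseline`."""
    return 100.0 * (adapted - baseline) / baseline


# Toy 2x2 label maps with two classes (hypothetical data).
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
ious = per_class_iou(pred, target, num_classes=2)
subset_miou = mean_iou(ious, class_ids=[0, 1])

# Hypothetical scores: a gain of about 250% means the adapted
# model scores roughly 3.5 times the baseline.
gain = relative_improvement(baseline=0.20, adapted=0.70)
```

Under this reading, a relative gain of 250% corresponds to an adapted score of about 3.5 times the baseline score, not 2.5 times.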
id USP_589f8889116f643f84fc9b63ce6bb85c
oai_identifier_str oai:teses.usp.br:tde-07082025-113420
network_acronym_str USP
network_name_str Biblioteca Digital de Teses e Dissertações da USP
repository_id_str 2721
dc.title.none.fl_str_mv TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes
TCDA: Adaptação de Domínio Não Supervisionada com Consciência de Profundidade Temporalmente Consistente para Segmentação Semântica em Cenas Urbanas
title TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes
author Barbosa, Felipe Manfio
author_facet Barbosa, Felipe Manfio
author_role author
dc.contributor.none.fl_str_mv Osório, Fernando Santos
dc.contributor.author.fl_str_mv Barbosa, Felipe Manfio
dc.subject.por.fl_str_mv Adaptação de domínio não supervisionada
ADAS
Aprendizado auto-supervisionado
Aprendizado profundo
Autonomous vehicles
Consistência temporal
Deep learning
Depth
Multimodal perception
Percepção multimodal
Profundidade
Segmentação semântica
Self-supervised learning
Semantic segmentation
Temporal consistency
Unsupervised domain adaptation
Veículos autônomos
publishDate 2025
dc.date.none.fl_str_mv 2025-06-02
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/
url https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv Release the content for public access.
info:eu-repo/semantics/openAccess
rights_invalid_str_mv Release the content for public access.
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv
reponame:Biblioteca Digital de Teses e Dissertações da USP
instname:Universidade de São Paulo (USP)
instacron:USP
instname_str Universidade de São Paulo (USP)
instacron_str USP
institution USP
reponame_str Biblioteca Digital de Teses e Dissertações da USP
collection Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv virginia@if.usp.br || atendimento@aguia.usp.br
_version_ 1844786147834724352