TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes

Barbosa, Felipe Manfio

Bibliographic Details
Main Author:	Barbosa, Felipe Manfio
Publication Date:	2025
Format:	Master thesis
Language:	eng
Source:	Biblioteca Digital de Teses e Dissertações da USP
Download full:	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/
Summary:	Visual perception plays a fundamental role in the correct localization and navigation of autonomous vehicles in urban environments. In this scenario, Semantic Segmentation is a valuable technique, usually derived from RGB images, for identifying elements and their relationships in a given scene. With the advent of Convolutional Neural Networks and Deep Learning, Semantic Segmentation methods reached a new level of accuracy (90% on the Cityscapes dataset). Nonetheless, this comes at the cost of heavy and complex network architectures that require costly embedded processing hardware. Additionally, adapting these models to new operation domains remains an unsolved issue, mainly because fine-tuning for all possible target domains is impractical given the high cost of data annotation. Although simulation engines can provide large amounts of labeled data, the domain shift between synthetic and real domains still hinders the direct use of models trained on synthetic data in application scenarios. In light of that, Unsupervised Domain Adaptation has emerged as a promising alternative that exploits unlabeled data and self-supervised learning as auxiliary tools for Domain Adaptation. However, the majority of the literature builds upon single-frame, RGB-only methods. Taking this into account, we argue that a major opportunity to leverage Temporal and Depth data as auxiliary sources of supervision is being disregarded. While Depth data has strong structural/geometric value and tends to be more robust to domain shifts, Temporal data provides large amounts of unlabeled data and useful temporal cues to support the adaptation process. Therefore, in this work we propose a synthetic-to-real Unsupervised Domain Adaptation method that builds upon three cornerstones: efficiency, depth awareness, and temporal awareness. The results demonstrate that the proposed method improves both segmentation quality and temporal consistency in the target domain. Specifically, it achieves an overall improvement of approximately 250% in the mean Intersection over Union (mIoU) for critical classes (road, sidewalk, car, and person) compared to the baseline (non-adapted) model.

Item metadata

id	USP_589f8889116f643f84fc9b63ce6bb85c
oai_identifier_str	oai:teses.usp.br:tde-07082025-113420
network_acronym_str	USP
network_name_str	Biblioteca Digital de Teses e Dissertações da USP
repository_id_str	2721
dc.title.none.fl_str_mv	TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes / TCDA: Adaptação de Domínio Não Supervisionada com Consciência de Profundidade Temporalmente Consistente para Segmentação Semântica em Cenas Urbanas
title	TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes
author	Barbosa, Felipe Manfio
author_facet	Barbosa, Felipe Manfio
author_role	author
dc.contributor.none.fl_str_mv	Osório, Fernando Santos
dc.contributor.author.fl_str_mv	Barbosa, Felipe Manfio
dc.subject.por.fl_str_mv	Adaptação de domínio não supervisionada; ADAS; Aprendizado auto-supervisionado; Aprendizado profundo; Autonomous vehicles; Consistência temporal; Deep learning; Depth; Multimodal perception; Percepção multimodal; Profundidade; Segmentação semântica; Self-supervised learning; Semantic segmentation; Temporal consistency; Unsupervised domain adaptation; Veículos autônomos
publishDate	2025
dc.date.none.fl_str_mv	2025-06-02
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/masterThesis
format	masterThesis
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/
url	https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv
dc.rights.driver.fl_str_mv	Release the content for public access. info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Release the content for public access.
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf
dc.coverage.none.fl_str_mv
dc.publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
publisher.none.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP
dc.source.none.fl_str_mv	reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP
instname_str	Universidade de São Paulo (USP)
instacron_str	USP
institution	USP
reponame_str	Biblioteca Digital de Teses e Dissertações da USP
collection	Biblioteca Digital de Teses e Dissertações da USP
repository.name.fl_str_mv	Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP)
repository.mail.fl_str_mv	virginia@if.usp.br || atendimento@aguia.usp.br
_version_	1844786147834724352

Similar Items