TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes
| Main Author: | Barbosa, Felipe Manfio |
|---|---|
| Publication Date: | 2025 |
| Format: | Master thesis |
| Language: | eng |
| Source: | Biblioteca Digital de Teses e Dissertações da USP |
| Download full: | https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/ |
| Summary: | Visual perception plays a fundamental role in the correct localization and navigation of autonomous vehicles in urban environments. In this scenario, Semantic Segmentation is a valuable technique, usually computed from RGB images, for identifying elements and their relationships in a given scene. With the advent of Convolutional Neural Networks and Deep Learning, Semantic Segmentation methods reached a new level in terms of accuracy (90% on the Cityscapes dataset). Nonetheless, this comes at the cost of heavy and complex network architectures that require high-cost embedded processing hardware. Additionally, adapting these models to new operation domains remains an unsolved issue, mainly because fine-tuning for all possible target domains is impractical given the high cost of data annotation. Although simulation engines can provide large amounts of labeled data, the domain shift between synthetic and real domains still hinders direct transfer to application scenarios. In light of that, Unsupervised Domain Adaptation has emerged as a promising alternative, exploiting unlabeled data and self-supervised learning as auxiliary tools for Domain Adaptation. However, the majority of the literature builds upon single-frame, RGB-only methods. Taking this into account, we argue that a significant opportunity to leverage Temporal and Depth data as auxiliary sources of supervision is being disregarded. While Depth data has strong structural/geometric value and tends to be more robust to domain shifts, Temporal data provides large amounts of unlabeled data and useful temporal cues that help the adaptation process. Therefore, in this work we propose a synthetic-to-real Unsupervised Domain Adaptation method that builds upon three cornerstones: efficiency, depth awareness, and temporal awareness. The results demonstrate that the proposed method improves both segmentation quality and temporal consistency in the target domain. Specifically, it achieves an overall improvement of approximately 250% in the mean Intersection over Union (mIoU) for critical classes (road, sidewalk, car, and person) compared to the baseline (non-adapted) model. |
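For context on the reported metric, the standard definition of class-wise Intersection over Union and its mean is given below (this is the conventional formulation, not a formula quoted from the thesis). With $TP_c$, $FP_c$, and $FN_c$ denoting the true-positive, false-positive, and false-negative pixel counts of class $c$ accumulated over the evaluation set, and $\mathcal{C}$ the set of classes being averaged (here, the critical classes road, sidewalk, car, and person):

$$
\mathrm{IoU}_c = \frac{TP_c}{TP_c + FP_c + FN_c},
\qquad
\mathrm{mIoU} = \frac{1}{\lvert \mathcal{C} \rvert} \sum_{c \in \mathcal{C}} \mathrm{IoU}_c .
$$

The reported ~250% figure is a relative gain over the non-adapted baseline; as a purely illustrative example, a baseline mIoU of 10% on those classes rising to roughly 35% after adaptation would correspond to an improvement of that magnitude (these two numbers are hypothetical, not values taken from the thesis).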
| id |
USP_589f8889116f643f84fc9b63ce6bb85c |
|---|---|
| oai_identifier_str |
oai:teses.usp.br:tde-07082025-113420 |
| network_acronym_str |
USP |
| network_name_str |
Biblioteca Digital de Teses e Dissertações da USP |
| repository_id_str |
2721 |
| dc.title.none.fl_str_mv |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes
TCDA: Adaptação de Domínio Não Supervisionada com Consciência de Profundidade Temporalmente Consistente para Segmentação Semântica em Cenas Urbanas
| title |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes |
| spellingShingle |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes
Barbosa, Felipe Manfio
Adaptação de domínio não supervisionada; ADAS; Aprendizado auto-supervisionado; Aprendizado profundo; Autonomous vehicles; Consistência temporal; Deep learning; Depth; Multimodal perception; Percepção multimodal; Profundidade; Segmentação semântica; Self-supervised learning; Semantic segmentation; Temporal consistency; Unsupervised domain adaptation; Veículos autônomos
| title_short |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes |
| title_full |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes |
| title_fullStr |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes |
| title_full_unstemmed |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes |
| title_sort |
TCDA: Temporally-Consistent Depth-Aware Unsupervised Domain Adaptation for Semantic Segmentation in Urban Scenes |
| author |
Barbosa, Felipe Manfio |
| author_facet |
Barbosa, Felipe Manfio |
| author_role |
author |
| dc.contributor.none.fl_str_mv |
Osório, Fernando Santos |
| dc.contributor.author.fl_str_mv |
Barbosa, Felipe Manfio |
| dc.subject.por.fl_str_mv |
Adaptação de domínio não supervisionada; ADAS; Aprendizado auto-supervisionado; Aprendizado profundo; Autonomous vehicles; Consistência temporal; Deep learning; Depth; Multimodal perception; Percepção multimodal; Profundidade; Segmentação semântica; Self-supervised learning; Semantic segmentation; Temporal consistency; Unsupervised domain adaptation; Veículos autônomos
| topic |
Adaptação de domínio não supervisionada; ADAS; Aprendizado auto-supervisionado; Aprendizado profundo; Autonomous vehicles; Consistência temporal; Deep learning; Depth; Multimodal perception; Percepção multimodal; Profundidade; Segmentação semântica; Self-supervised learning; Semantic segmentation; Temporal consistency; Unsupervised domain adaptation; Veículos autônomos
| description |
Visual perception plays a fundamental role in the correct localization and navigation of autonomous vehicles in urban environments. In this scenario, Semantic Segmentation is a valuable technique, usually computed from RGB images, for identifying elements and their relationships in a given scene. With the advent of Convolutional Neural Networks and Deep Learning, Semantic Segmentation methods reached a new level in terms of accuracy (90% on the Cityscapes dataset). Nonetheless, this comes at the cost of heavy and complex network architectures that require high-cost embedded processing hardware. Additionally, adapting these models to new operation domains remains an unsolved issue, mainly because fine-tuning for all possible target domains is impractical given the high cost of data annotation. Although simulation engines can provide large amounts of labeled data, the domain shift between synthetic and real domains still hinders direct transfer to application scenarios. In light of that, Unsupervised Domain Adaptation has emerged as a promising alternative, exploiting unlabeled data and self-supervised learning as auxiliary tools for Domain Adaptation. However, the majority of the literature builds upon single-frame, RGB-only methods. Taking this into account, we argue that a significant opportunity to leverage Temporal and Depth data as auxiliary sources of supervision is being disregarded. While Depth data has strong structural/geometric value and tends to be more robust to domain shifts, Temporal data provides large amounts of unlabeled data and useful temporal cues that help the adaptation process. Therefore, in this work we propose a synthetic-to-real Unsupervised Domain Adaptation method that builds upon three cornerstones: efficiency, depth awareness, and temporal awareness. The results demonstrate that the proposed method improves both segmentation quality and temporal consistency in the target domain. Specifically, it achieves an overall improvement of approximately 250% in the mean Intersection over Union (mIoU) for critical classes (road, sidewalk, car, and person) compared to the baseline (non-adapted) model.
| publishDate |
2025 |
| dc.date.none.fl_str_mv |
2025-06-02 |
| dc.type.status.fl_str_mv |
info:eu-repo/semantics/publishedVersion |
| dc.type.driver.fl_str_mv |
info:eu-repo/semantics/masterThesis |
| format |
masterThesis |
| status_str |
publishedVersion |
| dc.identifier.uri.fl_str_mv |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/ |
| url |
https://www.teses.usp.br/teses/disponiveis/55/55134/tde-07082025-113420/ |
| dc.language.iso.fl_str_mv |
eng |
| language |
eng |
| dc.relation.none.fl_str_mv |
|
| dc.rights.driver.fl_str_mv |
Release the content for public access. info:eu-repo/semantics/openAccess
| rights_invalid_str_mv |
Release the content for public access.
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.coverage.none.fl_str_mv |
|
| dc.publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP
| publisher.none.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP
| dc.source.none.fl_str_mv |
reponame:Biblioteca Digital de Teses e Dissertações da USP instname:Universidade de São Paulo (USP) instacron:USP |
| instname_str |
Universidade de São Paulo (USP) |
| instacron_str |
USP |
| institution |
USP |
| reponame_str |
Biblioteca Digital de Teses e Dissertações da USP |
| collection |
Biblioteca Digital de Teses e Dissertações da USP |
| repository.name.fl_str_mv |
Biblioteca Digital de Teses e Dissertações da USP - Universidade de São Paulo (USP) |
| repository.mail.fl_str_mv |
virginia@if.usp.br|| atendimento@aguia.usp.br||virginia@if.usp.br |
| _version_ |
1844786147834724352 |