Domain adaptation using randomized knowledge for monocular 6DoF pose estimation

Bibliographic details
Main author: CUNHA, Kelvin Batista da
Advisors: TEICHRIEB, Veronica; SIMÕES, Francisco Paulo Magalhães
Lattes CVs: http://lattes.cnpq.br/6273055129358941; http://lattes.cnpq.br/3355338790654065; http://lattes.cnpq.br/4321649532287831
Publication date: 2024 (issued 2024-06-06)
Document type: Doctoral thesis
Language: English (eng)
Format: application/pdf
Publisher: Universidade Federal de Pernambuco (UFPE), Brasil, Programa de Pos Graduacao em Ciencia da Computacao
Subjects: Pose estimation; Object detection; Domain randomization
Source title: Repositório Institucional da UFPE
dARK ID: ark:/64986/001300002868t
Full text: https://repositorio.ufpe.br/handle/123456789/58501
Rights: Open access; Attribution-NonCommercial-NoDerivs 3.0 Brazil (http://creativecommons.org/licenses/by-nc-nd/3.0/br/)
Citation: CUNHA, Kelvin Batista da. Domain adaptation using randomized knowledge for monocular 6DoF pose estimation. 2024. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2024.
Abstract: The 6DoF (six-degrees-of-freedom) pose of rigid objects is pivotal to many computer vision tasks, enabling seamless interaction between physical and virtual elements. Recent advances in vision-based pose estimation, particularly through deep learning (DL), have significantly improved accuracy. DL models excel at extracting intricate scene details, allowing them to discern and adapt to diverse scenarios efficiently. Moreover, DL methods are remarkably versatile and can assimilate various input types. Notably, they can distill object features from RGB data alone, yielding models that run in real time across a range of devices. This capability not only reduces computational requirements but also broadens the applicability of such models in real-world settings. However, DL typically requires extensive datasets tailored to specific target distributions. Acquiring, annotating, and maintaining such datasets is costly and time-consuming, prone to inaccuracies, and rarely captures the full application domain. Our initial studies analyzed the impact of distribution shifts on 6DoF pose estimation, revealing the models' reliance on training data and their susceptibility to real-world challenges (i.e., poor generalization on the test set). Variations rarely encountered during training, such as changes in object appearance (e.g., size, color, geometry), environmental conditions (e.g., illumination, motion speed, occlusion), and camera hardware (i.e., training with one camera but testing with a different one), can drastically degrade model accuracy. To address this challenge, we propose a pipeline that generates a diverse array of synthetic sequences from CAD models of the objects. By randomizing scene elements in each frame, even when the resulting conditions appear incoherent or surrealistic, we can train supervised models on simulated data, reducing the dependency on labeled real data and enabling adaptation to continuous transformations in the target distribution. Furthermore, we extend the pipeline with a novel strategy based on photo-realistic randomized synthetic generation that mitigates target-domain variations in monocular deep 6DoF pose estimation while preserving source features to reduce the domain gap. Leveraging a combination of NeRF (Neural Radiance Fields) reconstruction and domain randomization techniques, our approach demonstrates that accurate pose estimation models can be obtained with reduced reliance on real data. Finally, we propose a CAD-free 6DoF pose estimation pipeline that uses randomized frames for object tracking, seamlessly integrating object detection and optical flow. As an additional contribution, we introduce C3PO, a cross-device dataset organized, for each device, according to different pose estimation challenges. The dataset includes more than 100,000 full RGB images with pose annotations for three 3D-printed objects and three different cameras, covering occlusion, illumination changes, motion blur, color variation, and scale variation. Using C3PO, we can assess a method's performance under isolated challenges and analyze the impact of randomized data on each variation. Comprehensive experiments against state-of-the-art methods on publicly available datasets, including LINEMOD, LINEMOD-Occlusion, C3PO, and HomebrewedDB, indicate the validity of our approach. By emphasizing the impact of randomization in addressing domain variations such as changes in environmental lighting, motion blur, and object occlusion, these results underscore the significance of our contributions.
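The abstract describes generating synthetic training sequences by randomizing scene elements in every frame rendered from a CAD model. The Python sketch below is only a rough illustration of that idea, not the thesis's actual pipeline: every name and numeric range in it (FrameConfig, sample_frame_config, the lighting, camera, background, distractor, and blur ranges) is an assumption chosen for the example, and a real renderer (e.g., a game engine or ray tracer) would consume such per-frame parameters to produce the annotated images.

import random
from dataclasses import dataclass

@dataclass
class FrameConfig:
    """Hypothetical per-frame scene parameters handed to a synthetic renderer."""
    light_intensity: float      # relative brightness of the key light
    light_azimuth_deg: float    # light direction around the object
    object_hue_shift: float     # random recoloring applied to the CAD texture
    camera_distance_m: float    # distance from camera to object center
    camera_yaw_deg: float
    camera_pitch_deg: float
    background_id: int          # index into a pool of random background images
    num_distractors: int        # unrelated clutter objects added to the scene
    motion_blur_px: float       # simulated blur kernel length in pixels

def sample_frame_config(num_backgrounds: int = 500) -> FrameConfig:
    """Draw one independently randomized configuration; ranges are illustrative only."""
    return FrameConfig(
        light_intensity=random.uniform(0.2, 3.0),
        light_azimuth_deg=random.uniform(0.0, 360.0),
        object_hue_shift=random.uniform(-0.5, 0.5),
        camera_distance_m=random.uniform(0.3, 1.5),
        camera_yaw_deg=random.uniform(0.0, 360.0),
        camera_pitch_deg=random.uniform(-80.0, 80.0),
        background_id=random.randrange(num_backgrounds),
        num_distractors=random.randint(0, 8),
        motion_blur_px=random.uniform(0.0, 15.0),
    )

if __name__ == "__main__":
    # One configuration per synthetic frame; the ground-truth 6DoF pose is known
    # from the sampled camera/object placement, so labels come for free.
    sequence = [sample_frame_config() for _ in range(5)]
    for i, cfg in enumerate(sequence):
        print(i, cfg)

Because each frame is sampled independently, consecutive configurations can look incoherent or surrealistic; the abstract argues this is acceptable, since the breadth of variation, rather than per-frame realism, is what reduces the dependency on labeled real data.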