Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds

Bibliographic Details
Main Author: Marcon, Marlon
Publication Date: 2020
Format: Doctoral thesis
Language: eng
Source: Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
Download full text: http://repositorio.utfpr.edu.br/jspui/handle/1/32123
Summary: Deep-learning-based solutions are rapidly evolving and outperforming classical hand-crafted approaches in the Computer Vision (CV) field. Texture-based (2D) methodologies are mature technologies with proven efficiency in several application scenarios. Applying deep learning to 3D CV and graphics applications, however, is not straightforward. Several factors must be considered: finding a reliable representation of the data; annotating data with positive and negative examples for supervised training; and achieving invariance to the rotations induced during training. Real-time processing for 3D object recognition (3DOR) and pose estimation is also non-trivial, and standard pipelines focus on accuracy without providing this property. In this doctoral thesis, we present strategies to tackle these issues. The dissertation is split into two main topics: the first focuses on developing reliable techniques for generic feature-based CV applications, and the second proposes strategies to improve object recognition methods. We introduce LEAD, the first unsupervised rotation-equivariant 3D local feature descriptor learned from raw point cloud data. We also present Compass, the first end-to-end learning approach to define and extract the canonical orientation of 3D shapes. Building on both methods, we merge them into SOUND, the first unsupervised rotation-invariant 3D descriptor. Extensive experiments on standard surface registration datasets show that our proposals outperform existing unsupervised methods and achieve competitive results against supervised approaches. To update the traditional object recognition and pose estimation pipeline, we propose a boosted pipeline that uses saliency detection algorithms. Results confirm that the boosted pipeline can substantially speed up processing with little impact on, and in some cases even gains in, recognition accuracy. We also conducted a comprehensive study of 2D deep networks as off-the-shelf feature extractors and evaluated their performance in 3DOR. Finally, we propose a novel pipeline to detect objects and estimate their 6DoF pose: objects are identified in RGB-D images using visual features, and their poses are then refined with 3D local descriptors. This proposal unlocks real-time processing for pose estimation applications. We trust these achievements will help researchers develop 3D CV applications in the robotics, autonomous driving, and assistive technology fields.
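
The summary's core idea for rotation invariance, estimating a canonical orientation of a local point-cloud patch (the role Compass plays) and then computing a descriptor in that canonical frame (the role LEAD plays, yielding SOUND when composed), can be illustrated with a minimal sketch. The code below is not the thesis's learned method: it substitutes classical PCA for the orientation network and a toy voxel-occupancy histogram for the learned descriptor, and all function names are hypothetical; it only demonstrates the "orient, then describe" composition on raw point data.

import numpy as np

def canonical_frame(patch):
    # Repeatable local reference frame from PCA; a classical stand-in for a
    # learned orientation estimator such as the thesis's Compass network.
    centered = patch - patch.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt.copy()                         # rows are the principal directions
    for i in range(2):                       # resolve the sign ambiguity of PCA axes
        if np.sum((centered @ axes[i]) ** 3) < 0:
            axes[i] = -axes[i]
    axes[2] = np.cross(axes[0], axes[1])     # enforce a right-handed frame
    return axes                              # 3x3 rotation: world -> canonical

def voxel_descriptor(points, bins=4):
    # Toy rotation-sensitive descriptor: normalized occupancy histogram over a
    # bins^3 grid (stands in for a learned local descriptor such as LEAD).
    lo, hi = points.min(axis=0), points.max(axis=0)
    cells = np.clip(((points - lo) / (hi - lo + 1e-12) * bins).astype(int), 0, bins - 1)
    flat = np.ravel_multi_index(cells.T, (bins, bins, bins))
    hist = np.bincount(flat, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def rotation_invariant_descriptor(patch):
    # "Orient, then describe": expressing the patch in its canonical frame makes
    # the otherwise rotation-sensitive descriptor invariant to rigid rotations.
    centered = patch - patch.mean(axis=0)
    canon = centered @ canonical_frame(patch).T
    return voxel_descriptor(canon)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.standard_normal((256, 3)) * np.array([3.0, 2.0, 1.0])  # anisotropic patch
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))                   # random rotation
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    rotated = patch @ q.T
    same = np.allclose(rotation_invariant_descriptor(patch),
                       rotation_invariant_descriptor(rotated))
    print("descriptors match under rotation:", same)                   # expected: True

In the thesis, both stand-in components are instead networks trained without labels on raw point clouds, which is what makes the resulting descriptor (SOUND) unsupervised as well as rotation invariant.
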
id UTFPR-12_c00d91fe00f6957a698a7433b8bd940a
oai_identifier_str oai:repositorio.utfpr.edu.br:1/32123
network_acronym_str UTFPR-12
network_name_str Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository_id_str
spelling Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
Métodos de aprendizado profundo aplicados ao reconhecimento irrestrito de objetos e modelagem 3D a partir de nuvens de pontos
Redes neurais (Computação)
Inteligência computacional
Visão por computador
Neural networks (Computer science)
Computational intelligence
Computer vision
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
INFORMÁTICA (40001016034P5)
Deep-learning-based solutions have evolved rapidly and surpassed classical approaches in the Computer Vision (CV) field. Texture-based (also called 2D) methodologies are mature technologies with proven efficiency in several application scenarios. Working with deep learning for 3D CV and computer graphics applications is not trivial. Some factors must be considered: finding a reliable representation for the data; labeling positive and negative samples for supervised learning; and achieving invariance to the rotations induced during training. Real-time processing for object recognition and pose estimation applications is also challenging, and traditional methods focus mainly on accuracy and do not provide this property. This doctoral thesis presents strategies to handle these situations, organized into two main parts: the first focuses on developing effective techniques for feature-based CV applications in general, and the second proposes strategies to improve object recognition methods. We present the LEAD descriptor, the first rotation-equivariant local feature descriptor based on unsupervised learning from point clouds. This thesis also presents Compass, the first method to define and extract the canonical orientation of three-dimensional shapes using only deep learning. By combining the two previous proposals, we also present the first unsupervised rotation-invariant 3D local descriptor, named SOUND. The effectiveness of the proposals was evaluated experimentally on benchmark datasets for 3D surface registration, and the results show that they outperform other unsupervised techniques while remaining competitive with supervised ones. Regarding improvements to object recognition and pose estimation methods, this work proposes an approach that uses salient object detection and provides considerable improvements over traditional techniques. Results confirm that the saliency-boosted method can substantially speed up recognition with very little impact on, or even improvements in, accuracy. An extensive study was also conducted on the use of deep-learning architectures as standalone feature extractors and on their performance in three-dimensional object recognition. Finally, a method for object detection and six-degrees-of-freedom pose estimation is presented. It identifies objects in RGB-D images by extracting visual features and precisely estimates object poses using 3D local descriptors, enabling real-time processing in pose estimation applications. We believe the advances presented in this thesis will help researchers develop 3D CV applications in areas such as robotics, autonomous driving, and assistive technologies.
Universidade Federal do Paraná
Dois Vizinhos
Brasil
Pós-Graduação em Informática
UFPR
Silva, Luciano
http://orcid.org/0000-0001-6341-1323
http://lattes.cnpq.br/9578832375902806
Almeida Junior, Jurandy Gomes de
https://orcid.org/0000-0002-4998-6996
http://lattes.cnpq.br/4495269939725770
Bellon, Olga Regina Pereira
https://orcid.org/0000-0003-2683-9704
http://lattes.cnpq.br/5948590274082247
Silva, Luciano
http://orcid.org/0000-0001-6341-1323
http://lattes.cnpq.br/9578832375902806
Minetto, Rodrigo
https://orcid.org/0000-0003-2277-4632
http://lattes.cnpq.br/8366112479020867
Marcon, Marlon
2023-08-17T11:48:32Z
2023-08-17T11:48:32Z
2020-11-09
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/doctoralThesis
application/pdf
MARCON, Marlon. Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds. 2020. Tese (Doutorado em Informática) - Universidade Federal do Paraná, Curitiba, 2020.
http://repositorio.utfpr.edu.br/jspui/handle/1/32123
eng
https://hdl.handle.net/1884/69882
info:eu-repo/semantics/openAccess
reponame:Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
instname:Universidade Tecnológica Federal do Paraná (UTFPR)
instacron:UTFPR
2023-08-18T06:07:24Z
oai:repositorio.utfpr.edu.br:1/32123
Repositório Institucional
PUB
http://repositorio.utfpr.edu.br:8080/oai/request
riut@utfpr.edu.br || sibi@utfpr.edu.br
opendoar:
2023-08-18T06:07:24
Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) - Universidade Tecnológica Federal do Paraná (UTFPR)
false
dc.title.none.fl_str_mv Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
Métodos de aprendizado profundo aplicados ao reconhecimento irrestrito de objetos e modelagem 3D a partir de nuvens de pontos
title Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
spellingShingle Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
Marcon, Marlon
Redes neurais (Computação)
Inteligência computacional
Visão por computador
Neural networks (Computer science)
Computational intelligence
Computer vision
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
INFORMÁTICA (40001016034P5)
title_short Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
title_full Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
title_fullStr Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
title_full_unstemmed Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
title_sort Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds
author Marcon, Marlon
author_facet Marcon, Marlon
author_role author
dc.contributor.none.fl_str_mv Silva, Luciano
http://orcid.org/0000-0001-6341-1323
http://lattes.cnpq.br/9578832375902806
Almeida Junior, Jurandy Gomes de
https://orcid.org/0000-0002-4998-6996
http://lattes.cnpq.br/4495269939725770
Bellon, Olga Regina Pereira
https://orcid.org/0000-0003-2683-9704
http://lattes.cnpq.br/5948590274082247
Silva, Luciano
http://orcid.org/0000-0001-6341-1323
http://lattes.cnpq.br/9578832375902806
Minetto, Rodrigo
https://orcid.org/0000-0003-2277-4632
http://lattes.cnpq.br/8366112479020867
dc.contributor.author.fl_str_mv Marcon, Marlon
dc.subject.por.fl_str_mv Redes neurais (Computação)
Inteligência computacional
Visão por computador
Neural networks (Computer science)
Computational intelligence
Computer vision
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
INFORMÁTICA (40001016034P5)
topic Redes neurais (Computação)
Inteligência computacional
Visão por computador
Neural networks (Computer science)
Computational intelligence
Computer vision
CNPQ::CIENCIAS EXATAS E DA TERRA::CIENCIA DA COMPUTACAO
INFORMÁTICA (40001016034P5)
description Deep-learning-based solutions are rapidly evolving and outperforming classical hand-crafted approaches in the Computer Vision (CV) field. Texture-based (2D) methodologies are mature technologies with proven efficiency in several application scenarios. Applying deep learning to 3D CV and graphics applications, however, is not straightforward. Several factors must be considered: finding a reliable representation of the data; annotating data with positive and negative examples for supervised training; and achieving invariance to the rotations induced during training. Real-time processing for 3D object recognition (3DOR) and pose estimation is also non-trivial, and standard pipelines focus on accuracy without providing this property. In this doctoral thesis, we present strategies to tackle these issues. The dissertation is split into two main topics: the first focuses on developing reliable techniques for generic feature-based CV applications, and the second proposes strategies to improve object recognition methods. We introduce LEAD, the first unsupervised rotation-equivariant 3D local feature descriptor learned from raw point cloud data. We also present Compass, the first end-to-end learning approach to define and extract the canonical orientation of 3D shapes. Building on both methods, we merge them into SOUND, the first unsupervised rotation-invariant 3D descriptor. Extensive experiments on standard surface registration datasets show that our proposals outperform existing unsupervised methods and achieve competitive results against supervised approaches. To update the traditional object recognition and pose estimation pipeline, we propose a boosted pipeline that uses saliency detection algorithms. Results confirm that the boosted pipeline can substantially speed up processing with little impact on, and in some cases even gains in, recognition accuracy. We also conducted a comprehensive study of 2D deep networks as off-the-shelf feature extractors and evaluated their performance in 3DOR. Finally, we propose a novel pipeline to detect objects and estimate their 6DoF pose: objects are identified in RGB-D images using visual features, and their poses are then refined with 3D local descriptors. This proposal unlocks real-time processing for pose estimation applications. We trust these achievements will help researchers develop 3D CV applications in the robotics, autonomous driving, and assistive technology fields.
publishDate 2020
dc.date.none.fl_str_mv 2020-11-09
2023-08-17T11:48:32Z
2023-08-17T11:48:32Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/doctoralThesis
format doctoralThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv MARCON, Marlon. Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds. 2020. Tese (Doutorado em Informática) - Universidade Federal do Paraná, Curitiba, 2020.
http://repositorio.utfpr.edu.br/jspui/handle/1/32123
identifier_str_mv MARCON, Marlon. Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds. 2020. Tese (Doutorado em Informática) - Universidade Federal do Paraná, Curitiba, 2020.
url http://repositorio.utfpr.edu.br/jspui/handle/1/32123
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv https://hdl.handle.net/1884/69882
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidade Federal do Paraná
Dois Vizinhos
Brasil
Pós-Graduação em Informática
UFPR
publisher.none.fl_str_mv Universidade Federal do Paraná
Dois Vizinhos
Brasil
Pós-Graduação em Informática
UFPR
dc.source.none.fl_str_mv reponame:Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
instname:Universidade Tecnológica Federal do Paraná (UTFPR)
instacron:UTFPR
instname_str Universidade Tecnológica Federal do Paraná (UTFPR)
instacron_str UTFPR
institution UTFPR
reponame_str Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
collection Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT))
repository.name.fl_str_mv Repositório Institucional da UTFPR (da Universidade Tecnológica Federal do Paraná (RIUT)) - Universidade Tecnológica Federal do Paraná (UTFPR)
repository.mail.fl_str_mv riut@utfpr.edu.br || sibi@utfpr.edu.br
_version_ 1850498037619621888