Novel deep learning methods applied for unconstrained visual object recognition and 3D modeling from point clouds

Bibliographic details
Year of defense: 2020
Main author: Marcon, Marlon
Advisor: Not informed by the institution
Defense committee: Not informed by the institution
Document type: Doctoral thesis
Access type: Open access
Language: eng
Defense institution: Universidade Federal do Paraná, Dois Vizinhos, Brazil (Pós-Graduação em Informática, UFPR)
Graduate program: Not informed by the institution
Department: Not informed by the institution
Country: Not informed by the institution
Keywords in Portuguese:
Access link: http://repositorio.utfpr.edu.br/jspui/handle/1/32123
Abstract: Deep-learning-based solutions are rapidly evolving and outperforming classical hand-crafted approaches in the Computer Vision (CV) field. Texture-based (a.k.a. 2D) methodologies are mature technologies with proven efficiency in several application scenarios. Applying deep learning to 3D CV and graphics applications, however, is not straightforward. Several factors must be considered: finding a reliable representation of the data; annotating data with positive and negative examples for supervised training; and achieving invariance to rotations induced during training. Real-time processing for 3D object recognition (3DOR) and pose estimation applications is also nontrivial, since standard pipelines focus on accuracy and do not provide this property. In this doctoral thesis, we present strategies to tackle these issues. We split the dissertation into two main topics: the first focuses on developing reliable techniques for generic feature-based CV applications, and the second proposes strategies to improve object recognition methods. We introduce LEAD, the first unsupervised rotation-equivariant 3D local feature descriptor learned from raw point cloud data. We also present Compass, the first end-to-end learning approach to define and extract the canonical orientation of 3D shapes. Building on both of these methods, we merge them and propose SOUND, the first unsupervised rotation-invariant 3D descriptor. We evaluate our proposals through extensive experiments on standard surface registration datasets, where they outperform existing unsupervised methods and achieve competitive results against supervised approaches. To update the traditional pipeline for object recognition and pose estimation, we propose a boosted pipeline that uses saliency detection algorithms, and we found considerable improvements with this methodology.
Results confirm that the boosted pipeline can substantially speed up processing time with limited impact on, and sometimes even benefits to, recognition accuracy. We also conducted a comprehensive study of 2D deep networks as off-the-shelf feature extractors and evaluated their performance on 3DOR. Finally, we propose a novel pipeline to detect objects and estimate their 6DoF pose: we identify objects in RGB-D images using visual features and then refine each object's pose with 3D local descriptors. Our proposal unlocks real-time processing for pose estimation applications. We trust these achievements will help researchers develop 3D CV applications in the robotics, autonomous driving, and assistive technology fields.
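The rotation-invariance idea behind Compass and SOUND can be illustrated with a toy sketch: a hand-crafted PCA-based canonical orientation stands in for the learned orientation estimator, and a simple coordinate histogram stands in for the learned descriptor. All names here are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def canonical_orientation(points):
    """PCA axes with sign disambiguation: a hand-crafted stand-in
    for a learned orientation estimator such as Compass."""
    centered = points - points.mean(axis=0)
    _, vecs = np.linalg.eigh(centered.T @ centered)
    axes = vecs[:, ::-1]          # columns sorted by decreasing variance
    for i in range(2):            # fix axis signs via projection skewness
        if ((centered @ axes[:, i]) ** 3).sum() < 0:
            axes[:, i] *= -1
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])  # right-handed frame
    return axes

def toy_descriptor(points, bins=5):
    """Histogram of coordinates along the canonical first axis:
    invariant to rotations once the cloud is expressed in its
    canonical frame (a toy stand-in for a descriptor such as SOUND)."""
    centered = points - points.mean(axis=0)
    aligned = centered @ canonical_orientation(points)
    r = np.linalg.norm(centered, axis=1).max()  # rotation-invariant radius
    hist, _ = np.histogram(aligned[:, 0], bins=bins, range=(-r, r))
    return hist / hist.sum()
```

Because the canonical frame rotates with the input cloud, the descriptor computed for a rotated copy of the cloud matches the original, which is the property the learned pipeline targets.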
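The pose-refinement step of the proposed pipeline aligns 3D local feature correspondences with a rigid transform. A minimal sketch of that alignment, assuming noise-free point correspondences and using the standard SVD-based Kabsch solution (the thesis's actual refinement details may differ):

```python
import numpy as np

def estimate_rigid_pose(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t,
    computed with the SVD-based Kabsch algorithm: a stand-in for the
    fine pose-adjustment step driven by matched 3D local descriptors."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # avoid reflections (det = -1)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t
```

In a full system the correspondences would come from matching descriptors between the model and the RGB-D scene, with an outlier-rejection step (e.g., RANSAC) before this closed-form alignment.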