6D Pose Estimation and Object Recognition

Bibliographic Details
Main Author: Pereira, Nuno José Matos
Publication Date: 2024
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.6/14177
Summary: 6D pose estimation is a computer vision task whose objective is to estimate an object's three degrees of freedom of position (translation vector) and three degrees of freedom of orientation (rotation matrix). It is a hard problem because of scene clutter, illumination variability, object truncation, and the varied shapes, sizes, and textures of objects, as well as similarities between them. Nevertheless, 6D pose estimation methods are used in many contexts. In augmented reality, for example, virtual objects that are badly placed in the real world can break the experience. Augmented reality is also used in industry to train new workers, where virtual objects must be placed in the correct positions to look like real objects or to simulate their placement. In the context of Industry 4.0, robotic systems must adapt to handle unconstrained pick-and-place tasks, human-robot interaction and collaboration, and autonomous robot movement. These environments and tasks depend on methods that perform object detection, localization, segmentation, and pose estimation. Accurate robotic manipulation, unconstrained pick-and-place, and scene understanding require accurate object detection and 6D pose estimation methods. This thesis presents methods developed to tackle the 6D pose estimation problem, as well as implementations of the proposed pipelines in the real world. To use the proposed pipelines in the real world, a dataset had to be captured and annotated to train and test the methods. Robot control routines and interfaces were also developed to control a UR3 robot in the pipelines. The MaskedFusion method, proposed by us, achieves a pose estimation error below 6 mm on the LineMOD dataset and an AUC score of 93.3% on the challenging YCB-Video dataset. Despite a longer training time, MaskedFusion has a low inference time, making it suitable for real-time applications. A study was performed on the effectiveness of employing different color spaces and improved segmentation algorithms to enhance the accuracy of 6D pose estimation methods. Moreover, the proposed MPF6D outperforms other approaches, achieving an accuracy of 99.7% on the LineMOD dataset and 98.06% on the YCB-Video dataset. Additionally, the thesis presents object grasping methods. The first approach, comprising data capture, object detection, 6D pose estimation, grasping detection, robot planning, and motion execution, achieves a 90% success rate in tests in non-controlled environments. Using a diverse dataset with varying light conditions proved critical for accurate performance in real-world scenarios. An alternative method grasps objects accurately without relying on 6D pose estimation, offering faster execution and requiring less computational power. With 96% accuracy and an average execution time of 5.59 seconds on a laptop without an NVIDIA GPU, this method is efficient and practical for unconstrained pick-and-place tasks using a UR3 robot.
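Illustrative note: the summary describes a 6D pose as a rotation matrix plus a translation vector and reports errors in millimetres. The sketch below is not taken from the thesis. It shows, in plain Python, how such a pose maps model points into the camera frame and how an average point distance between a ground-truth and a predicted pose could be computed. The function names, the toy model points, and the choice of this particular distance measure are all assumptions made for illustration; the record does not say which metric the thesis uses.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(rotation, translation, point):
    """Map a model point into the camera frame: p' = R p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def average_distance(pose_true, pose_pred, model_points):
    """Mean Euclidean distance between model points under two poses."""
    total = 0.0
    for p in model_points:
        a = apply_pose(pose_true[0], pose_true[1], p)
        b = apply_pose(pose_pred[0], pose_pred[1], p)
        total += math.dist(a, b)
    return total / len(model_points)

# Hypothetical toy model: four points, coordinates in metres.
model = [(0.05, 0.0, 0.0), (0.0, 0.05, 0.0), (0.0, 0.0, 0.05), (0.0, 0.0, 0.0)]

# Each pose is (rotation matrix, translation vector): 3 + 3 degrees of freedom.
pose_true = (rotation_z(0.0), (0.10, 0.00, 0.500))
pose_pred = (rotation_z(0.0), (0.10, 0.00, 0.504))  # 4 mm offset along z

error_m = average_distance(pose_true, pose_pred, model)
print(f"average distance: {error_m * 1000:.1f} mm")
```

Because the two poses here share the same rotation, every model point is displaced by the same 4 mm translation offset, so the average distance equals that offset. With a rotation error as well, points farther from the object origin would contribute larger distances.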
Record ID: RCAP_b95cd44a99d96e61142e7b3779474c3d
OAI Identifier: oai:ubibliorum.ubi.pt:10400.6/14177
Repository ID: https://opendoar.ac.uk/repository/7160
Contributors: Alexandre, Luís Filipe Barbosa de Almeida; uBibliorum
Subjects: Artificial Intelligence; Object Pose; Deep Learning; Artificial Neural Networks; Convolutional Neural Networks; Image Segmentation; Robotics; Pick-and-place; Object Grasping
Document Type: doctoral thesis
Version: published version (info:eu-repo/semantics/publishedVersion)
Dates: 2024-01-29T11:18:36Z; 2024-01; 2024-01-01T00:00:00Z
Identifiers: http://hdl.handle.net/10400.6/14177; urn:tid:101652321
Rights: open access (info:eu-repo/semantics/openAccess)
Format: application/pdf
Institution: FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
Repository Contact: info@rcaap.pt