Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models

Magalhães, Sandro Costa; Santos, Filipe Neves; Machado, Pedro; Moreira, António Paulo; Dias, Jorge

Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models

Bibliographic Details
Main Author:	Magalhães, Sandro Costa
Publication Date:	2022
Other Authors:	Santos, Filipe Neves, Machado, Pedro, Moreira, António Paulo, Dias, Jorge
Format:	Article
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	https://hdl.handle.net/10316/114621 https://doi.org/10.1016/j.engappai.2022.105604
Summary:	Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to process the visual data in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real-time. This research benchmarks three architectures: embedded GPU -- Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU -- Tensor Processing Unit (such as Coral Dev Board TPU), and DPU -- Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Method: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. After the trained model was converted and compiled for target-specific hardware formats to improve the execution efficiency. Conclusions and Results: The platforms were assessed in terms of performance of the evaluation metrics and efficiency (time of inference). Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The efficiency of the Tensor Processing Unit (TPU) is irrelevant and similar to NVIDIA Jetson TX2. TPU and GPU are the most power-efficient, consuming about 5W. The performance differences, in the evaluation metrics, across devices are irrelevant and have an F1 of about 70 % and mean Average Precision (mAP) of about 60 %.

Item metadata

id	RCAP_0b6f72c36bfa9136e12aea6c455a247b
oai_identifier_str	oai:estudogeral.uc.pt:10316/114621
network_acronym_str	RCAP
network_name_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str	https://opendoar.ac.uk/repository/7160
spelling	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning ModelsEmbedded systemsHeterogeneous platformsObject detectionSSD resNetRetinaNet resNetPurpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to process the visual data in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real-time. This research benchmarks three architectures: embedded GPU -- Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU -- Tensor Processing Unit (such as Coral Dev Board TPU), and DPU -- Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Method: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. After the trained model was converted and compiled for target-specific hardware formats to improve the execution efficiency. Conclusions and Results: The platforms were assessed in terms of performance of the evaluation metrics and efficiency (time of inference). Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The efficiency of the Tensor Processing Unit (TPU) is irrelevant and similar to NVIDIA Jetson TX2. TPU and GPU are the most power-efficient, consuming about 5W. The performance differences, in the evaluation metrics, across devices are irrelevant and have an F1 of about 70 % and mean Average Precision (mAP) of about 60 %.Elsevier2022-11-21info:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articlehttps://hdl.handle.net/10316/114621https://hdl.handle.net/10316/114621https://doi.org/10.1016/j.engappai.2022.105604eng09521976Magalhães, Sandro CostaSantos, Filipe NevesMachado, PedroMoreira, António PauloDias, Jorgeinfo:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2024-04-03T08:26:26Zoai:estudogeral.uc.pt:10316/114621Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-29T06:07:47.955379Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
title	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
spellingShingle	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models Magalhães, Sandro Costa Embedded systems Heterogeneous platforms Object detection SSD resNet RetinaNet resNet
title_short	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
title_full	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
title_fullStr	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
title_full_unstemmed	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
title_sort	Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
author	Magalhães, Sandro Costa
author_facet	Magalhães, Sandro Costa Santos, Filipe Neves Machado, Pedro Moreira, António Paulo Dias, Jorge
author_role	author
author2	Santos, Filipe Neves Machado, Pedro Moreira, António Paulo Dias, Jorge
author2_role	author author author author
dc.contributor.author.fl_str_mv	Magalhães, Sandro Costa Santos, Filipe Neves Machado, Pedro Moreira, António Paulo Dias, Jorge
dc.subject.por.fl_str_mv	Embedded systems Heterogeneous platforms Object detection SSD resNet RetinaNet resNet
topic	Embedded systems Heterogeneous platforms Object detection SSD resNet RetinaNet resNet
description	Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to process the visual data in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for object detection in real-time. This research benchmarks three architectures: embedded GPU -- Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU -- Tensor Processing Unit (such as Coral Dev Board TPU), and DPU -- Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Method: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. After the trained model was converted and compiled for target-specific hardware formats to improve the execution efficiency. Conclusions and Results: The platforms were assessed in terms of performance of the evaluation metrics and efficiency (time of inference). Graphical Processing Units (GPUs) were the slowest devices, running at 3 FPS to 5 FPS, and Field Programmable Gate Arrays (FPGAs) were the fastest devices, running at 14 FPS to 25 FPS. The efficiency of the Tensor Processing Unit (TPU) is irrelevant and similar to NVIDIA Jetson TX2. TPU and GPU are the most power-efficient, consuming about 5W. The performance differences, in the evaluation metrics, across devices are irrelevant and have an F1 of about 70 % and mean Average Precision (mAP) of about 60 %.
publishDate	2022
dc.date.none.fl_str_mv	2022-11-21
dc.type.status.fl_str_mv	info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv	info:eu-repo/semantics/article
format	article
status_str	publishedVersion
dc.identifier.uri.fl_str_mv	https://hdl.handle.net/10316/114621 https://hdl.handle.net/10316/114621 https://doi.org/10.1016/j.engappai.2022.105604
url	https://hdl.handle.net/10316/114621 https://doi.org/10.1016/j.engappai.2022.105604
dc.language.iso.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	09521976
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
eu_rights_str_mv	openAccess
dc.publisher.none.fl_str_mv	Elsevier
publisher.none.fl_str_mv	Elsevier
dc.source.none.fl_str_mv	reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia instacron:RCAAP
instname_str	FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str	RCAAP
institution	RCAAP
reponame_str	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv	info@rcaap.pt
_version_	1833602585822494720

Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models

Similar Items