A Deep Learning Approach to Identify Not Suitable for Work Images

Bibliographic Details
Main Author: Bicho, Daniel
Publication Date: 2020
Other Authors: Ferreira, Artur; Datia, Nuno
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://doi.org/10.34629/ipl.isel.i-ETC.80
Summary: Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), keeping them available for future access. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable For Work (NSFW). This work proposes a methodology to classify NSFW images available at Arquivo.pt using deep neural network approaches. A large dataset of images is built from Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task using this dataset. The evaluation of these models reported accuracies of 93% and 72%, respectively. After a fine-tuning stage, the accuracies improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, enabling the filtering of problematic NSFW images. At the time of this writing, the proposed solution is in production at https://arquivo.pt/images.jsp
id RCAP_95f10f83d45607e216af7e96219ebb4b
oai_identifier_str oai:i-ETC.journals.isel.pt:article/80
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.contributor.author.fl_str_mv Bicho, Daniel
Ferreira, Artur
Datia, Nuno
dc.subject.por.fl_str_mv Computers; Informatics; Multimedia
Deep Learning; Deep Neural Networks; Image Classification; Not Suitable for Work Images; ResNet; SqueezeNet; Redis Message Queue
publishDate 2020
dc.date.none.fl_str_mv 2020-10-16T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv https://doi.org/10.34629/ipl.isel.i-ETC.80
oai:i-ETC.journals.isel.pt:article/80
url https://doi.org/10.34629/ipl.isel.i-ETC.80
identifier_str_mv oai:i-ETC.journals.isel.pt:article/80
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv http://journals.isel.pt/index.php/i-ETC/article/view/80
https://doi.org/10.34629/ipl.isel.i-ETC.80
http://journals.isel.pt/index.php/i-ETC/article/view/80/67
dc.rights.driver.fl_str_mv Copyright (c) 2020 Artur Ferreira, Daniel Bicho
info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv ISEL - High Institute of Engineering of Lisbon
dc.source.none.fl_str_mv i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 6, No 1 (2020): Volume 6; ID-3
2182-4010
reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833590615590305792