A Deep Learning Approach to Identify Not Suitable for Work Images
Main Author: | Bicho, Daniel |
---|---|
Publication Date: | 2020 |
Other Authors: | Ferreira, Artur; Datia, Nuno |
Format: | Article |
Language: | eng |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full text: | https://doi.org/10.34629/ipl.isel.i-ETC.80 |
Summary: | Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW), keeping them available for future access. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, which can be offensive to users and are thus Not Suitable for Work (NSFW). This work proposes a methodology to classify NSFW images available at Arquivo.pt using deep neural network approaches. A large dataset of images is built from Arquivo.pt data, and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated on this dataset and then improved for the NSFW classification task. The initial evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine-tuning stage, their accuracy improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, enabling problematic NSFW images to be filtered out. At the time of this writing, the proposed solution is in production at https://arquivo.pt/images.jsp |
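
The summary does not say how the classifier output is consumed by the Image Search System. As a minimal sketch, assuming each indexed image carries a single NSFW probability produced by one of the fine-tuned models and that results at or above a fixed threshold are hidden, the filtering step and the accuracy metric quoted in the summary could look like the following. The `ArchivedImage` record, the `filter_nsfw` and `accuracy` helpers, the 0.5 threshold, and the example URLs are all hypothetical; they are not taken from the article or from Arquivo.pt.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class ArchivedImage:
    """Hypothetical search-result record: an image URL plus a model score."""
    url: str
    # Assumed output of a fine-tuned NSFW classifier, in the range [0, 1].
    nsfw_probability: float


def filter_nsfw(results: Iterable[ArchivedImage], threshold: float = 0.5) -> List[ArchivedImage]:
    """Keep only results whose NSFW probability is strictly below the threshold.

    The 0.5 default is an illustrative assumption, not a value from the article.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [img for img in results if img.nsfw_probability < threshold]


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of images whose predicted NSFW label matches the ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Hypothetical scores for three archived images.
    results = [
        ArchivedImage("https://example.org/a.jpg", 0.02),
        ArchivedImage("https://example.org/b.jpg", 0.91),
        ArchivedImage("https://example.org/c.jpg", 0.47),
    ]
    safe = filter_nsfw(results)
    print([img.url for img in safe])  # a.jpg and c.jpg survive the filter

    # Turn scores into labels with the same threshold, then score them
    # against hypothetical ground-truth labels.
    predicted = [img.nsfw_probability >= 0.5 for img in results]
    print(accuracy(predicted, [False, True, True]))
```

A single global threshold keeps the search-time check cheap, since the expensive model inference can run once per image at indexing time; the threshold then trades hidden safe images against leaked NSFW ones.
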
id | RCAP_95f10f83d45607e216af7e96219ebb4b |
---|---|
oai_identifier_str | oai:i-ETC.journals.isel.pt:article/80 |
network_acronym_str | RCAP |
repository_id_str | https://opendoar.ac.uk/repository/7160 |
author | Bicho, Daniel |
author2 | Ferreira, Artur; Datia, Nuno |
dc.subject.por.fl_str_mv | Computers; Informatics; Multimedia; Deep Learning; Deep Neural Networks; Image Classification; Not Suitable for Work Images; ResNet; SqueezeNet; Redis Message Queue |
dc.date.none.fl_str_mv | 2020-10-16T00:00:00Z |
dc.type.status.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.driver.fl_str_mv | info:eu-repo/semantics/article |
dc.identifier.uri.fl_str_mv | https://doi.org/10.34629/ipl.isel.i-ETC.80; oai:i-ETC.journals.isel.pt:article/80 |
dc.relation.none.fl_str_mv | http://journals.isel.pt/index.php/i-ETC/article/view/80; https://doi.org/10.34629/ipl.isel.i-ETC.80; http://journals.isel.pt/index.php/i-ETC/article/view/80/67 |
dc.rights.driver.fl_str_mv | Copyright (c) 2020 Artur Ferreira, Daniel Bicho; info:eu-repo/semantics/openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | ISEL - High Institute of Engineering of Lisbon |
dc.source.none.fl_str_mv | i-ETC : ISEL Academic Journal of Electronics Telecommunications and Computers; Vol 6, No 1 (2020): Volume 6; ID-3; 2182-4010; reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP); instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia; instacron:RCAAP |
repository.name.fl_str_mv | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia |
repository.mail.fl_str_mv | info@rcaap.pt |