A deep learning approach to identify not suitable for work images

Bibliographic Details
Main Author: Bicho, Daniel
Publication Date: 2020
Other Authors: J. Ferreira, Artur, Datia, Nuno
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10400.21/12354
Summary: Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW) allowing their availability for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, that can be offensive for the users, and thus being Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt, with deep neural network approaches. A large dataset of images is built using Arquivo.pt data and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task, using the dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp.
id RCAP_1fb38df67481d6d87f7d2b6337b188d0
oai_identifier_str oai:repositorio.ipl.pt:10400.21/12354
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
spelling A deep learning approach to identify not suitable for work imagesDeep learningDeep neural networksImage classificationNot suitable for work imagesResNetSqueezeNetWeb Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW) allowing their availability for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, that can be offensive for the users, and thus being Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt, with deep neural network approaches. A large dataset of images is built using Arquivo.pt data and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task, using the dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp.ISELRCIPLBicho, DanielJ. Ferreira, ArturDatia, Nuno2020-11-06T12:05:51Z20202020-01-01T00:00:00Zinfo:eu-repo/semantics/publishedVersioninfo:eu-repo/semantics/articleapplication/pdfhttp://hdl.handle.net/10400.21/12354eng2182-4010info:eu-repo/semantics/openAccessreponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiainstacron:RCAAP2025-02-12T10:48:02Zoai:repositorio.ipl.pt:10400.21/12354Portal AgregadorONGhttps://www.rcaap.pt/oai/openaireinfo@rcaap.ptopendoar:https://opendoar.ac.uk/repository/71602025-05-28T20:08:10.940947Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologiafalse
dc.title.none.fl_str_mv A deep learning approach to identify not suitable for work images
title A deep learning approach to identify not suitable for work images
spellingShingle A deep learning approach to identify not suitable for work images
Bicho, Daniel
Deep learning
Deep neural networks
Image classification
Not suitable for work images
ResNet
SqueezeNet
title_short A deep learning approach to identify not suitable for work images
title_full A deep learning approach to identify not suitable for work images
title_fullStr A deep learning approach to identify not suitable for work images
title_full_unstemmed A deep learning approach to identify not suitable for work images
title_sort A deep learning approach to identify not suitable for work images
author Bicho, Daniel
author_facet Bicho, Daniel
J. Ferreira, Artur
Datia, Nuno
author_role author
author2 J. Ferreira, Artur
Datia, Nuno
author2_role author
author
dc.contributor.none.fl_str_mv RCIPL
dc.contributor.author.fl_str_mv Bicho, Daniel
J. Ferreira, Artur
Datia, Nuno
dc.subject.por.fl_str_mv Deep learning
Deep neural networks
Image classification
Not suitable for work images
ResNet
SqueezeNet
topic Deep learning
Deep neural networks
Image classification
Not suitable for work images
ResNet
SqueezeNet
description Web Archiving (WA) deals with the preservation of portions of the World Wide Web (WWW) allowing their availability for the future. Arquivo.pt is a WA initiative holding a huge amount of content, including image files. However, some of these images contain nudity and pornography, that can be offensive for the users, and thus being Not Suitable For Work (NSFW). This work proposes a solution to classify NSFW images found at Arquivo.pt, with deep neural network approaches. A large dataset of images is built using Arquivo.pt data and two pre-trained neural network models, namely ResNet and SqueezeNet, are evaluated and improved for the NSFW classification task, using the dataset. The evaluation of these models reported an accuracy of 93% and 72%, respectively. After a fine tuning stage, the accuracy of these models improved to 94% and 89%, respectively. The proposed solution is integrated into the Arquivo.pt Image Search System, available at https://arquivo.pt/images.jsp.
publishDate 2020
dc.date.none.fl_str_mv 2020-11-06T12:05:51Z
2020
2020-01-01T00:00:00Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
format article
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10400.21/12354
url http://hdl.handle.net/10400.21/12354
dc.language.iso.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv 2182-4010
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv ISEL
publisher.none.fl_str_mv ISEL
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833598509762215936