Violence detection in audio: evaluating the effectiveness of deep learning models and data augmentation

Durães, Dalila; Veloso, Bruno; Novais, Paulo

Bibliographic Details
Main Author:	Durães, Dalila
Publication Date:	2023
Other Authors:	Veloso, Bruno, Novais, Paulo
Format:	Article
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full text:	https://hdl.handle.net/1822/89936
Summary:	Human nature is inherently intertwined with violence, which affects the lives of many individuals. Various forms of violence pervade our society, with physical violence being the most prevalent in daily life. The study of human actions has gained significant attention in recent years, with audio (captured by microphones) and video (captured by cameras) being the primary means of recording instances of violence. While video requires substantial processing capacity and demanding hardware and software performance, audio presents itself as a viable alternative, offering several advantages beyond these technical considerations. When audio is used, it is crucial to represent the data in a form conducive to accurate classification. For violence inside a car, no dedicated datasets are readily available, so we created a custom dataset tailored to this scenario. The purpose of curating this dataset was to assess whether it could improve the detection of violence in car-related situations. Because the dataset is imbalanced, data augmentation techniques were applied. Existing literature shows that Deep Learning (DL) algorithms can classify audio effectively, and a commonly used approach is to convert the audio into a mel spectrogram image. On this dataset, the EfficientNetB1 neural network achieved the highest accuracy (95.06%) in detecting violence in audio, closely followed by EfficientNetB0 (94.19%). Conversely, MobileNetV2 proved less capable of classifying instances of violence.
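
The summary names two techniques without giving their parameters: converting audio into a mel spectrogram image for a CNN classifier, and using data augmentation to counter class imbalance. As a hedged illustration of those general techniques only, the sketch below computes a log-scaled mel spectrogram with NumPy and balances a toy dataset by adding augmented copies of minority-class clips. Everything specific here is an assumption, not the authors' method: the frame size (1024), hop length (256), 64 mel bands, the HTK mel formula, the two augmentations (random circular time shift and additive Gaussian noise), and the helper names `log_mel_spectrogram`, `augment`, and `balance_by_augmentation`, which are hypothetical.

```python
import numpy as np

# --- Log-mel spectrogram (illustrative parameters, not the paper's) ---

def hz_to_mel(f):
    """Hz -> mel using the HTK formula (an assumed convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft//2 + 1 rfft bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edges, equally spaced on the mel scale from 0 Hz to Nyquist
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=64):
    """Return an (n_mels, n_frames) dB-scaled mel spectrogram "image"."""
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T        # (n_frames, n_mels)
    return 10.0 * np.log10(mel.T + 1e-10)                    # small floor avoids log(0)

# --- Balancing an imbalanced dataset with augmented copies (hypothetical scheme) ---

def augment(y, rng, noise_level=0.005, max_shift=0.1):
    """Random circular time shift of up to max_shift * len(y) samples, plus Gaussian noise."""
    limit = int(max_shift * len(y))
    shifted = np.roll(y, int(rng.integers(-limit, limit + 1)))
    return shifted + noise_level * rng.standard_normal(len(y))

def balance_by_augmentation(clips, labels, rng):
    """Add augmented copies of minority-class clips until every class matches the largest."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    out_clips, out_labels = list(clips), labels.tolist()
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(labels == c)
        for k in range(counts.max() - n):
            out_clips.append(augment(clips[idx[k % len(idx)]], rng))
            out_labels.append(c.item())
    return out_clips, out_labels

if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr  # one second of audio per clip
    clips = [np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t), 0.1 * rng.standard_normal(sr)]
    labels = ["non-violent", "non-violent", "violent"]  # toy labels, not real data
    clips, labels = balance_by_augmentation(clips, labels, rng)
    specs = [log_mel_spectrogram(c, sr) for c in clips]
    print(len(clips), labels.count("violent"), specs[0].shape)  # 4 2 (64, 59)
```

A pretrained image network such as EfficientNetB1 or MobileNetV2 typically expects a fixed-size, three-channel input, so in practice the spectrogram would also be normalized, resized, and stacked to three channels; that step and the model itself are omitted here. Augmented copies are normally generated from the training split only, so that near-duplicates of test clips do not inflate the reported accuracy.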

Item metadata

id	RCAP_262477357dba33e522b9fa970033e690
oai_identifier_str	oai:repositorium.sdum.uminho.pt:1822/89936
funding	FCT - Fundação para a Ciência e a Tecnologia (UIDB/00319/2020)
dc.contributor.none.fl_str_mv	Universidade do Minho
dc.subject.por.fl_str_mv	Audio; Deep learning; Human action recognition; Machine learning; Transfer learning; Violence detection in a car
status_str	publishedVersion
dc.relation.none.fl_str_mv	ISSN 1989-1660; DOI 10.9781/ijimai.2023.08.007
dc.rights.driver.fl_str_mv	info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidad Internacional de La Rioja (UNIR)

Similar Items