Violence detection in audio: evaluating the effectiveness of deep learning models and data augmentation

Bibliographic Details
Main Author: Durães, Dalila
Publication Date: 2023
Other Authors: Veloso, Bruno; Novais, Paulo
Format: Article
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: https://hdl.handle.net/1822/89936
Summary: Human nature is inherently intertwined with violence, impacting the lives of numerous individuals. Various forms of violence pervade our society, with physical violence being the most prevalent in our daily lives. The study of human actions has gained significant attention in recent years, with audio (captured by microphones) and video (captured by cameras) being the primary means to record instances of violence. While video requires substantial processing capacity and hardware-software performance, audio presents itself as a viable alternative, offering several advantages beyond these technical considerations. Therefore, it is crucial to represent audio data in a manner conducive to accurate classification. In the context of violence in a car, specific datasets dedicated to this domain are not readily available. As a result, we had to create a custom dataset tailored to this particular scenario. The purpose of curating this dataset was to assess whether it could enhance the detection of violence in car-related situations. Due to the imbalanced nature of the dataset, data augmentation techniques were implemented. Existing literature reveals that Deep Learning (DL) algorithms can effectively classify audio, with a commonly used approach involving the conversion of audio into a mel spectrogram image. Based on the results obtained for that dataset, the EfficientNetB1 neural network demonstrated the highest accuracy (95.06%) in detecting violence in audios, closely followed by EfficientNetB0 (94.19%). Conversely, MobileNetV2 proved to be less capable in classifying instances of violence.
id RCAP_262477357dba33e522b9fa970033e690
oai_identifier_str oai:repositorium.sdum.uminho.pt:1822/89936
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
funding FCT - Fundação para a Ciência e a Tecnologia (UIDB/00319/2020)
dc.contributor.none.fl_str_mv Universidade do Minho
dc.subject.por.fl_str_mv Audio
Deep learning
Human action recognition
Machine learning
Transfer learning
Violence detection in a car
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/article
dc.relation.none.fl_str_mv ISSN 1989-1660
DOI 10.9781/ijimai.2023.08.007
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad Internacional de La Rioja (UNIR)
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833595063987339264