Towards Cyberbullying Detection: Building, Benchmarking and Longitudinal Analysis of Aggressiveness and Conflicts/Attacks Datasets from Twitter
Main Author: | Ferreira, Paula |
---|---|
Publication Date: | 2024 |
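Other Authors: | Salgado Pereira, Nádia; Rosa, Hugo; Oliveira, Sofia; Coheur, Luísa; Francisco, Sofia; Souza, Sidclay B.; Ribeiro, Ricardo; Carvalho, João P.; Paulino, Paula; Trancoso, Isabel; Veiga Simão, Ana |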
Format: | Article |
Language: | English |
Source: | Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) |
Download full: | http://hdl.handle.net/10400.5/98143 |
Summary: | Offense and hate speech are a source of online conflicts, which have become common on social media; as such, their study is a growing research topic in machine learning and natural language processing. This article presents two Portuguese-language, offense-related datasets that deepen the study of the subject: an Aggressiveness dataset and a Conflicts/Attacks dataset. While the former is similar to other offense-detection datasets, the latter is novel in that it uses the history of interactions between users. Several studies were carried out to construct the datasets and analyze their data. The first study gathered expressions of verbal aggression witnessed by adolescents, to guide data extraction for the datasets. The second study extracted Portuguese-language data from Twitter that matched the most frequent expressions, words, and sentences identified in the first study. The third study consisted of developing the Aggressiveness dataset, the Conflicts/Attacks dataset, and classification models. In the fourth study, we aimed to examine whether online aggression and conflicts/attacks showed any trend changes over time in a sample of 86 adolescents. In this study, we also aimed to investigate whether the number of tweets sent over a period of 273 days was related to online aggression and conflicts/attacks. Lastly, we analyzed the percentage of participants involved in aggression and/or conflicts/attacks. |
Repository: | Repositório da Universidade de Lisboa |
Keywords: | Aggression; Offense; Hate speech; Social networks; Natural language processing; Dataset |
Version: | Published version |
Citation: | Ferreira, P., Pereira, N., Rosa, H., Oliveira, S., Coheur, L., Francisco, S., Souza, S., Ribeiro, R., Carvalho, J. P., Paulino, P., Trancoso, I., & Veiga-Simão, A. M. (2024). Towards cyberbullying detection: Building, benchmarking and longitudinal analysis of aggressiveness and conflicts/attacks datasets from Twitter. IEEE Transactions on Affective Computing. https://doi.org/10.1109/TAFFC.2024.3518587 |
ISSN: | 1949-3045 |
Access: | Open access |
File format: | PDF |
Publisher: | IEEE |