Towards Cyberbullying Detection: Building, Benchmarking and Longitudinal Analysis of Aggressiveness and Conflicts/Attacks Datasets from Twitter

Ferreira, Paula; Salgado Pereira, Nádia; Rosa, Hugo; Oliveira, Sofia; Coheur, Luísa; Francisco, Sofia; Souza, Sidclay B.; Ribeiro, Ricardo; Carvalho, João P.; Paulino, Paula; Trancoso, Isabel; Veiga Simão, Ana

Bibliographic Details
Main Author:	Ferreira, Paula
Publication Date:	2024
Other Authors:	Salgado Pereira, Nádia; Rosa, Hugo; Oliveira, Sofia; Coheur, Luísa; Francisco, Sofia; Souza, Sidclay B.; Ribeiro, Ricardo; Carvalho, João P.; Paulino, Paula; Trancoso, Isabel; Veiga Simão, Ana
Format:	Article
Language:	eng
Source:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full:	http://hdl.handle.net/10400.5/98143
Summary:	Offense and hate speech are a source of online conflicts, which have become common on social media; as a result, their study is a growing research topic in machine learning and natural language processing. This article presents two Portuguese-language offense-related datasets that extend the study of the subject: an Aggressiveness dataset and a Conflicts/Attacks dataset. While the former is similar to other offense detection datasets, the latter is novel because it uses the history of the interaction between users. Several studies were carried out to construct and analyze the data in the datasets. The first study gathered expressions of verbal aggression witnessed by adolescents, in order to guide data extraction for the datasets. The second study extracted Portuguese-language data from Twitter that matched the most frequent expressions, words, and sentences identified in the first study. The third study consisted of developing the Aggressiveness dataset, the Conflicts/Attacks dataset, and classification models. In the fourth study, we aimed to examine whether online aggression and conflicts/attacks showed any change in trend over time in a sample of 86 adolescents. This study also aimed to investigate whether the number of tweets sent over a period of 273 days was related to online aggression and conflicts/attacks. Lastly, we analyzed the percentage of participants who took part in aggression and/or conflicts/attacks.

Item metadata

Institutional repository:	Repositório da Universidade de Lisboa
Keywords:	Aggression; Offense; Hate speech; Social networks; Natural language processing; Dataset
Date issued:	2024-01-01
Date added to repository:	2025-02-06T09:42:22Z
Type:	Article (published version)
Citation:	Ferreira, P., Pereira, N., Rosa, H., Oliveira, S., Coheur, L., Francisco, S., Souza, S., Ribeiro, R., Carvalho, J. P., Paulino, P., Trancoso, I., & Veiga-Simão, A. M. (2024). Towards cyberbullying detection: Building, benchmarking and longitudinal analysis of aggressiveness and conflicts/attacks datasets from Twitter. IEEE Transactions on Affective Computing. https://doi.org/10.1109/TAFFC.2024.3518587
ISSN:	1949-3045
Rights:	Open access
File format:	PDF
Publisher:	IEEE
Repository:	Repositórios Científicos de Acesso Aberto de Portugal (RCAAP), FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia (info@rcaap.pt)

Similar Items