Development, validation, and application of CYBER-BERT: using deep learning for large-scale identification and classification of cybersecurity disclosures in SEC filings

Pinheiro, José Ricardo Monteiro

Bibliographic details
Defense year:	2024
Main author:	Pinheiro, José Ricardo Monteiro
Advisor:	Caldas, Miguel Pinto
Examination committee:	Not provided by the institution
Document type:	Thesis
Access type:	Open access
Language:	eng
Defending institution:	Not provided by the institution
Graduate program:	Not provided by the institution
Department:	Not provided by the institution
Country:	Not provided by the institution
Keywords in Portuguese:	Cibersegurança; Divulgação de informações; Aprendizado de máquina; Processamento de linguagem natural; Incidente de segurança da informação
Keywords in English:	Cybersecurity; Disclosure; Machine learning; Natural language processing; NLP; Deep learning; BERT; Large language model; LLM; Cybersecurity breach; Text analysis
Access link:	https://hdl.handle.net/10438/35562
Abstract:	As cybersecurity events have emerged among the top global risks, it has become mandatory for firms to provide transparent and timely information about them. This has led to the emergence of a rich cybersecurity disclosure research stream. However, some gaps persist in the extant literature, including: (a) the use of small samples or short time spans; (b) binary classification (cybersecurity-related versus non-cybersecurity-related) rather than multiple disclosure categories; (c) the use of a dictionary approach instead of machine learning (ML) or superior Large Language Models (LLMs); (d) a scarcity of studies that include timely 8-K filings in addition to annual 10-K filings; and (e) the lack of cybersecurity disclosure studies thoroughly examining boilerplate patterns. This paper describes the development and validation of a deep learning model called CYBER-BERT and, to address these gaps, illustrates its application in locating and categorizing 2.5 million cybersecurity-related phrases contained in all 10-K and 8-K SEC filings over 18 years (2006–2023). As contributions of the study, besides the toolset (CYBER-BERT and a Cybersecurity BI), results from four illustrations of its use showed that (a) firms did not file cybersecurity disclosures as promptly as intended by the SEC: 95.5% of disclosures were filed via 10-Ks rather than the more timely 8-Ks; (b) cybersecurity disclosures have increased significantly: total disclosures grew 343% and breach disclosures grew 510%; (c) content-wise, two cybersecurity categories exhibited high (vulnerability) and medium (action) boilerplate use, and all categories had low readability on two independent measures; and (d) following a breach disclosure, vulnerability and action disclosure activities increased 97.4% and 77.4%, respectively, compared with the previous year. Implications for research and practice are also discussed.
