Bibliographic details
Year of defense: |
2023 |
Main author: |
Araújo, José Alan Firmiano |
Advisor: |
Not provided by the institution |
Defense committee: |
Not provided by the institution |
Document type: |
Master's thesis
|
Access type: |
Open access |
Language: |
por |
Institution of defense: |
Not provided by the institution
|
Graduate program: |
Not provided by the institution
|
Department: |
Not provided by the institution
|
Country: |
Not provided by the institution
|
Keywords in Portuguese: |
|
Access link: |
http://repositorio.ufc.br/handle/riufc/78192
|
Abstract: |
Crimes occur daily, and each investigation begins with a police report. In cities with high crime rates, it is challenging for the police to analyze every report in detail. However, incident reports may resemble one another when they describe the same modus operandi. Given an incident report, the main objective of this work is to find the reports most similar to it, including duplicates. A similar police report may be one with overlapping words or one that shares a similar modus operandi. One possible solution is to represent each police report as a feature vector and compare the vectors using a similarity function. Different methods can be employed to represent the narrative, including embedding vectors and count-based approaches such as TF-IDF. This research explores pre-trained embedding representations at both the word and sentence levels, such as Universal Sentence Encoder, Word2Vec, RoBERTa, and Doc2Vec, among others. By comparing different embedding models, we determine the most effective representation for capturing semantic and lexical similarities between police reports. Furthermore, we compare the effectiveness of available pre-trained embedding models with that of models trained specifically on a corpus of police reports. Developing these domain-specific embedding models is a further contribution of this work. |
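
The count-based approach mentioned in the abstract (represent each report as a vector, then compare vectors with a similarity function) can be illustrated with a minimal sketch. This is not the author's implementation: the whitespace tokenizer, the smoothed IDF formula, the choice of cosine similarity, the function names, and the three sample reports are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Assumption: smoothed IDF, so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_index, docs):
    """Return (index, score) of the document closest to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [
        (cosine(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    score, index = max(scored)
    return index, score


# Hypothetical reports, invented for this example.
reports = [
    "two men on a motorcycle stole a phone at the bus stop",
    "a phone was stolen at the bus stop by two men on a motorcycle",
    "a car window was broken in the parking lot overnight",
]

best_index, best_score = most_similar(0, reports)
print(best_index, round(best_score, 3))
```

A sketch like this only captures lexical overlap. Two reports that describe the same modus operandi in different words would score low, which is the gap that the embedding-based representations named in the abstract are meant to address.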