Contributions to automatic legal document summarization: Judgements from the Portuguese Supreme Court

Bibliographic Details
Main Author: Dias, Margarida Rebelo
Publication Date: 2024
Format: Master thesis
Language: eng
Source: Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
Download full: http://hdl.handle.net/10071/33681
Summary: As the volume of information grows exponentially, outpacing humans' capacity to process it, strategies that reduce the time spent reading and comprehending information become crucial. In the legal field, summarization serves this purpose, but it is still done manually by legal experts. This dissertation tests different summarization models to assess how effectively they can automate the summarization process, specifically for Portuguese legal documents from the Portuguese Supreme Court of Justice. Automatic summarization models have been developed in a variety of areas; the legal field, however, imposes additional constraints because of the length of the documents and their specialized vocabulary. We implemented three models: a sentence-level model, a summary-level model, and a hybrid approach, in order to evaluate summary generation with both extractive and abstractive summarization methods. For each experiment, we used two different input texts: the original documents and specific sections of the original documents. For evaluation, we used the ROUGE and BERTScore metrics, comparing the generated summaries with the reference summaries available for each document. The results led us to conclude that the extractive models are effective at reducing document length, particularly with the summary-level approach, and that abstractive techniques can improve summary fluency. Furthermore, the summary-level approach was found to have a significant effect on the summarization of Portuguese legal documents.
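Illustrative note (not part of the original record): the summary describes comparing generated summaries with reference summaries using ROUGE. The sketch below shows one common way an n-gram overlap score of this kind can be computed. It is a simplified, assumption-laden illustration, not the thesis's evaluation code: the function name `rouge_n`, the whitespace tokenization, the lowercasing, and the example sentences are all hypothetical choices made here, and the thesis may well have used an existing ROUGE package with different preprocessing.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Simplified ROUGE-N: clipped n-gram overlap between two texts.

    Assumptions (ours, not the thesis's): lowercase whitespace
    tokenization, no stemming, no stopword removal.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    cand_total, ref_total = sum(cand.values()), sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical generated summary and reference summary.
    generated = "the court dismissed the appeal"
    reference = "the court rejected the appeal"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

Recall measures how much of the reference summary is covered by the generated one, while precision penalizes generated text that is absent from the reference; BERTScore, the other metric named in the summary, compares contextual embeddings instead of exact n-grams and is not covered by this sketch.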
id RCAP_802b997fb16d741a1dd1ce3667e076aa
oai_identifier_str oai:repositorio.iscte-iul.pt:10071/33681
network_acronym_str RCAP
network_name_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository_id_str https://opendoar.ac.uk/repository/7160
dc.title.none.fl_str_mv Contributions to automatic legal document summarization: Judgements from the Portuguese Supreme Court
title Contributions to automatic legal document summarization: Judgements from the Portuguese Supreme Court
author Dias, Margarida Rebelo
author_facet Dias, Margarida Rebelo
author_role author
dc.contributor.author.fl_str_mv Dias, Margarida Rebelo
dc.subject.por.fl_str_mv Automatic text summarization
Legal document summarization
Extractive summarization
Abstractive summarization
European Portuguese
Sumarização de texto automática
Sumarização de documentos jurídicos
Sumarização extrativa
Sumarização abstrativa
Português Europeu
publishDate 2024
dc.date.none.fl_str_mv 2024-11-26T00:00:00Z
2024-11-26
2024-09
2025-03-10T16:17:55Z
dc.type.status.fl_str_mv info:eu-repo/semantics/publishedVersion
dc.type.driver.fl_str_mv info:eu-repo/semantics/masterThesis
format masterThesis
status_str publishedVersion
dc.identifier.uri.fl_str_mv http://hdl.handle.net/10071/33681
TID:203768434
url http://hdl.handle.net/10071/33681
identifier_str_mv TID:203768434
dc.language.iso.fl_str_mv eng
language eng
dc.rights.driver.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
instname:FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron:RCAAP
instname_str FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
instacron_str RCAAP
institution RCAAP
reponame_str Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
collection Repositórios Científicos de Acesso Aberto de Portugal (RCAAP)
repository.name.fl_str_mv Repositórios Científicos de Acesso Aberto de Portugal (RCAAP) - FCCN, serviços digitais da FCT – Fundação para a Ciência e a Tecnologia
repository.mail.fl_str_mv info@rcaap.pt
_version_ 1833602109252042752