Developing and evaluating a machine translation model for English-Brazilian Portuguese in the accounting domain
| Field | Value |
|---|---|
| Year of defense | 2024 |
| Main author | |
| Advisor | |
| Defense committee | |
| Document type | Master's thesis (Dissertação) |
| Access type | Open access |
| Language | eng (English) |
| Defending institution | Universidade Federal de Minas Gerais, Brasil; FALE, Faculdade de Letras; Programa de Pós-Graduação em Estudos Linguísticos, UFMG |
| Graduate program | Not informed by the institution |
| Department | Not informed by the institution |
| Country | Not informed by the institution |
| Keywords (Portuguese) | |
| Access links | http://hdl.handle.net/1843/75576, https://orcid.org/0000-0002-4725-0758 |
Abstract: This M.A. thesis reports on a machine translation (MT) study investigating the performance of neural machine translation (NMT) models fine-tuned with domain information, compared with large language models (LLMs). To this end, a corpus of International Financial Reporting Standards (IFRS) written in English was compiled, along with their human translation into Brazilian Portuguese (henceforth considered our gold standard), and used in experiments. Four experiments were carried out with this corpus: (1) translation by a generic MT model for Romance languages (opus-mt-en-romance), developed by the Language Technology Research Group at the University of Helsinki, fine-tuned with domain data; (2) translation by a generic MT model for Portuguese (opus-mt-tc-big-en-pt), also developed by the Language Technology Research Group at the University of Helsinki, fine-tuned with domain data; (3) translation by an LLM, namely GPT-3.5; and (4) translation by an LLM, namely GPT-4, carried out zero-shot with the same prompt as task 3. The translation output was evaluated with the BLEU metric (Papineni et al., 2002), using the gold-standard corpus of human translations as reference. A sample of the output from the two models with the best BLEU scores was evaluated manually following a taxonomy of MT errors (Caseli; Inácio, 2020). The BLEU results showed that translation by the generic en-pt (English to Portuguese) MT model fine-tuned with domain data performed best, with a BLEU score of 0.89. The second-best result was obtained by the other fine-tuned model (for Romance languages), with a BLEU score of 0.88. The third-best performance was achieved by the GPT-4 translation (BLEU score of 0.83), closely followed by the generic MT model for en-pt translation (BLEU score of 0.79).

The manual analysis of the output sample from the two best-performing models pointed out the categories of errors most frequently found (lexical and n-gram) when compared with the human gold standard. Overall, our study suggests that a generic MT model fine-tuned with domain data performs slightly better than an LLM, a finding that may influence decisions given the processing costs of LLMs.
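The abstract evaluates translation output with the BLEU metric (Papineni et al., 2002). As a rough illustration of what that metric computes, below is a minimal sentence-level BLEU sketch in pure Python. This is an assumption-laden simplification for illustration only: single reference, no smoothing, whitespace tokenization. The thesis would more plausibly have used a standard, corpus-level implementation such as sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty. Single reference,
    no smoothing, so any zero n-gram precision yields a score of 0."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        # Clipped counts: each hypothesis n-gram credited at most as
        # often as it appears in the reference.
        overlap = sum(min(count, ref_ngrams[gram])
                      for gram, count in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean
```

A perfect match scores 1.0; partially overlapping output scores between 0 and 1, which is how the fine-tuned models' 0.88-0.89 scores compare against GPT-4's 0.83 in the study.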