Abstract:
In recent years, there have been significant advances in machine translation technologies, raising questions about their effectiveness compared to human translation. In this master's dissertation, we explore this issue through a corpus-based and machine-learning approach. The compiled corpus consists of English texts from the financial domain, specifically from publicly listed companies. It contains both texts translated from Portuguese into English and texts written in English by native speakers. The corpus was divided into three subcorpora: a corpus of texts written by native English speakers (the comparable corpus), a human translation corpus, and a machine translation corpus (the latter two forming the parallel corpora). We used the Biber Tagger to examine the grammatical structures of the corpora and Weka to conduct a lexical analysis, identifying differences and similarities between machine translation, human translation, and texts written by native English speakers. This approach allowed us to build a probabilistic model that predicts, with 85% accuracy, whether a translation was produced by a machine or by a human translator. We conclude that machine translation can be distinguished from human translation at the lexical level. At the grammatical level, however, the two are nearly identical, and both are comparable to texts written by native English speakers.