Automated Multi-Language to English Machine Translation Using Generative Pre-Trained Transformers

Abstract

Accurate and efficient language translation is an extremely important information processing task, and fast, accurate automated translation is a topic of sustained interest in the machine learning and data science communities. In this study, we examine the use of local Generative Pretrained Transformer (GPT) models to perform automated, zero-shot, black-box, sentence-wise translation from multiple natural languages into English text. We benchmark 16 different open-source GPT models, with no custom fine-tuning, from the Huggingface LLM repository for translating 50 different non-English languages into English, using translated TED Talk transcripts as the reference dataset. All GPT model inference calls are performed strictly locally, on single Nvidia A100 GPUs. The reported benchmark metrics are translation accuracy, measured with the BLEU, GLEU, METEOR, and chrF text overlap measures, and the wall-clock time for each sentence translation. The best overall performing GPT model for translating into English text, by mean score across all tested languages, is ReMM-v2-L2-13B for the BLEU metric ($0.152$), ReMM-v2-L2-13B for the GLEU metric ($0.256$), Llama2-chat-AYT-13B for the chrF metric ($0.448$), and ReMM-v2-L2-13B for the METEOR metric ($0.438$).
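
As a rough illustration of the per-sentence measurements described above, the following minimal sketch computes a sentence-level GLEU score (the minimum of n-gram precision and recall over 1- to 4-grams) and the wall-clock time of one translation call. It is not the benchmark code used in this study: the whitespace tokenization, the function names, and the `placeholder_translate` stand-in for a local GPT inference call are all assumptions made for the example.

```python
import time
from collections import Counter


def ngram_counts(tokens, min_n=1, max_n=4):
    """Count every n-gram of order min_n..max_n in a token list."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference, hypothesis, min_n=1, max_n=4):
    """GLEU: the minimum of n-gram precision and recall, in [0, 1].

    Assumption: plain whitespace tokenization, for illustration only.
    """
    ref = ngram_counts(reference.split(), min_n, max_n)
    hyp = ngram_counts(hypothesis.split(), min_n, max_n)
    overlap = sum((ref & hyp).values())  # clipped n-gram matches
    total_ref = sum(ref.values())
    total_hyp = sum(hyp.values())
    if total_ref == 0 or total_hyp == 0:
        return 0.0
    return min(overlap / total_hyp, overlap / total_ref)


def timed_translation(translate, source_sentence):
    """Run one translation call and return (output, elapsed seconds)."""
    start = time.perf_counter()
    output = translate(source_sentence)
    return output, time.perf_counter() - start


def placeholder_translate(sentence):
    # Hypothetical stand-in for a local GPT model inference call.
    return "the cat sits on the mat"


reference = "the cat sat on the mat"
translation, seconds = timed_translation(placeholder_translate, "le chat est assis sur le tapis")
score = sentence_gleu(reference, translation)
print(f"GLEU = {score:.3f}, wall-clock = {seconds:.6f} s")
```

In a benchmark of this kind, such per-sentence scores and timings would then be averaged per language and per model to obtain summary figures like the mean scores quoted above.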
