Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study

Abstract

Machine Translation has made impressive progress in recent years, offering close to human-level performance on many languages, but studies have primarily focused on high-resource languages with broad online presence and resources. With the help of growing Large Language Models, more and more low-resource languages achieve better results through the presence of other languages. However, studies have shown that not all low-resource languages can benefit from multilingual systems, especially those with insufficient training and evaluation data. In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian. We investigate conditions of low-resource languages such as data scarcity and parameter sensitivity and focus on refined solutions that combat low-resource difficulties and creative solutions such as harnessing language similarity. Our experiment entails applying Back-translation and Transfer Learning to automatically generate more training data and achieve higher translation performance. We demonstrate noisiness in the data and present our approach to carry out text preprocessing extensively. Evaluation was conducted using combined metrics: BLEU, chrF and TER. Statistical significance results with Bonferroni correction show surprisingly high baseline systems, and that Back-translation leads to significant improvement. Furthermore, we present a qualitative analysis of translation errors and system limitations.
