Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation

Abstract

Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as the backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering both full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and that NLLB-200 can be competitive in some cases. We also underline the importance of tuning the learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.
