Abstract
Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as the backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases. We also underline the importance of tuning the learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.