Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

Abstract

Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning an mPLM with limited labeled multilingual data merely encapsulates the knowledge specific to the labeled data. Therefore, we introduce ALSACE, which leverages the knowledge learned from well-performing languages to guide under-performing ones within the same mPLM, eliminating the need for additional labeled multilingual data. Experiments show that ALSACE effectively mitigates language-level performance disparity across various mPLMs while showing competitive performance on different multilingual NLU tasks, ranging from full-resource to limited-resource settings. The code for our approach is available at https://github.com/pkunlp-icler/ALSACE.
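The idea of letting well-performing languages guide under-performing ones inside the same model can be illustrated with a generic cross-lingual self-distillation loss. The sketch below is an assumption for illustration only, not the ALSACE implementation: it supposes the same mPLM classifies parallel versions of one input in several teacher languages and in one student language, and it penalizes the KL divergence from each teacher-language prediction to the student-language prediction. All function names here are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same label set."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[float],
) -> float:
    """Average KL from each teacher-language prediction to the student-language one.

    Hypothetical setup (assumption, not from the paper):
      teacher_logits: logits the same mPLM produces for parallel inputs
                      written in the chosen teacher languages
      student_logits: logits for the parallel input in the under-performing language
    In a real training loop the teacher distributions would usually be treated
    as fixed targets (no gradient); plain Python has no autograd, so that is implicit here.
    """
    q = softmax(student_logits)
    losses = [kl_divergence(softmax(t), q) for t in teacher_logits]
    return sum(losses) / len(losses)
```

When the student-language prediction already matches every teacher-language prediction the loss is zero, and it grows as the student drifts away, so minimizing it pushes the under-performing language toward the behaviour the model already shows in its stronger languages without needing any labels.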
