Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation

  • 2024-04-12 15:19:16
  • Haozhe Zhao, Zefan Cai, Shuzheng Si, Liang Chen, Yufeng He, Kaikai An, Baobao Chang

Abstract

Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow these disparities by supervised fine-tuning of the mPLMs with multilingual data. However, obtaining labeled multilingual data is time-consuming, and fine-tuning an mPLM with limited labeled multilingual data merely encapsulates the knowledge specific to the labeled data. Therefore, we introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM, eliminating the need for additional labeled multilingual data. Experiments show that ALSACE effectively mitigates language-level performance disparity across various mPLMs while showing competitive performance on different multilingual NLU tasks, ranging from full-resource to limited-resource settings. The code for our approach is available at https://github.com/pkunlp-icler/ALSACE.

 
