MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

Abstract

Though reasoning abilities are considered language-agnostic, existing LLMs exhibit inconsistent reasoning abilities across different languages, e.g., reasoning in the dominant language like English is superior to other languages due to the imbalance of multilingual training data. To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages, which we adopt as the preference for optimization, e.g., Direct Preference Optimization (DPO) or Proximal Policy Optimization (PPO). Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models on all three benchmarks (MSVAMP +16.2%, MGSM +6.1%, and MNumGLUESub +13.3%), with improved reasoning consistency across languages.
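
As a rough illustration of the alignment-as-preference idea, the sketch below ranks sampled non-dominant-language answers by an alignment score and feeds the best and worst into the standard DPO loss. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the `(text, score)` candidate format, and the choice of highest versus lowest score as the pair are hypothetical, and the alignment score is assumed to come from some external translation model that is not shown here.

```python
import math


def build_preference_pair(candidates):
    """Pick a (chosen, rejected) pair from scored candidates.

    `candidates` is a list of (text, alignment_score) tuples. The score is
    assumed (hypothetically) to measure how consistent a non-dominant-language
    answer is with the dominant-language answer; higher means more consistent.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[0][0], ranked[-1][0]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin is the difference between the policy-vs-reference log-ratio
    of the chosen answer and that of the rejected answer.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x) = log(1 + exp(-x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical usage with made-up scores and log-probabilities.
chosen, rejected = build_preference_pair(
    [("answer A", 0.2), ("answer B", 0.9), ("answer C", 0.5)]
)
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

When the policy and the reference model agree on both answers the margin is zero and the loss equals log 2; the loss shrinks as the policy puts relatively more probability on the better-aligned answer.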
