Evaluating and Mitigating Linguistic Discrimination in Large Language Models

Abstract

By training on text in various languages, large language models (LLMs) typically support multiple languages and demonstrate remarkable capabilities in solving tasks described in different languages. However, LLMs can exhibit linguistic discrimination due to the uneven distribution of training data across languages. That is, LLMs struggle to respond consistently when the same task is posed in different languages. In this study, we first explore the consistency of LLM outputs in response to queries in various languages from two aspects: safety and quality. We conduct this analysis with two datasets (AdvBench and NQ) on four LLMs (Llama2-13b, Gemma-7b, GPT-3.5-turbo and Gemini-pro). The results show that LLMs exhibit stronger human alignment capabilities with queries in English, French, Russian, and Spanish (only 1.04\% of harmful queries successfully jailbreak on average) than with queries in Bengali, Georgian, Nepali and Maithili (27.7\% of harmful queries successfully jailbreak on average). Moreover, for queries in English, Danish, Czech and Slovenian, LLMs tend to produce higher-quality responses (an average $F_1$ score of 0.1494) than for queries in the other languages. Based on these findings, we propose LDFighter, a similarity-based voting method, to mitigate linguistic discrimination in LLMs. LDFighter ensures consistent service for speakers of different languages. We evaluate LDFighter with both benign and harmful queries. The results show that LDFighter not only significantly reduces the jailbreak success rate but also improves the response quality on average, demonstrating its effectiveness.
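
The abstract describes LDFighter only as "similarity-based voting" and gives no further detail, so the following is a minimal sketch of the general idea rather than the authors' method. It assumes that the same query has already been answered in several languages and that the answers have been translated into one common language. Token-level Jaccard similarity is used as a stand-in metric, and the names `jaccard` and `vote_by_similarity` are hypothetical.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for the real metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def vote_by_similarity(responses: List[str]) -> str:
    """Return the response with the highest mean similarity to all the others.

    Assumption: `responses` are answers to the same query asked in different
    languages, already translated into one common language.
    """
    if not responses:
        raise ValueError("need at least one response")
    if len(responses) == 1:
        return responses[0]

    def mean_similarity(i: int) -> float:
        sims = [jaccard(responses[i], r) for j, r in enumerate(responses) if j != i]
        return sum(sims) / len(sims)

    # Ties are broken in favour of the earliest response.
    best = max(range(len(responses)), key=mean_similarity)
    return responses[best]


if __name__ == "__main__":
    candidates = [
        "paris is the capital of france",
        "the capital of france is paris",
        "i cannot answer that",
    ]
    print(vote_by_similarity(candidates))
```

In this sketch an outlying answer, such as a refusal or an off-topic reply produced for only one language, scores low against the rest and is not selected. A real system would likely use a semantic similarity measure rather than token overlap.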
