Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in

Abstract

Ethical reasoning is a crucial skill for Large Language Models (LLMs). However, moral values are not universal, but rather influenced by language and culture. This paper explores how three prominent LLMs (GPT-4, ChatGPT, and Llama2-70B-Chat) perform ethical reasoning in different languages and whether their moral judgement depends on the language in which they are prompted. We extend the study of ethical reasoning of LLMs by Rao et al. (2023) to a multilingual setup, following their framework of probing LLMs with ethical dilemmas and policies from three branches of normative ethics: deontology, virtue, and consequentialism. We experiment with six languages: English, Spanish, Russian, Chinese, Hindi, and Swahili. We find that GPT-4 is the most consistent and unbiased ethical reasoner across languages, while ChatGPT and Llama2-70B-Chat show significant moral value bias when we move to languages other than English. Interestingly, the nature of this bias varies significantly across languages for all LLMs, including GPT-4.
