On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

Abstract

Current decoder-based pre-trained language models (PLMs) successfully demonstrate multilingual capabilities. However, it is unclear how these models handle multilingualism. We analyze the neuron-level internal behavior of multilingual decoder-based PLMs, specifically examining the existence of neurons that fire "uniquely for each language" within decoder-only multilingual PLMs. We analyze six languages: English, German, French, Spanish, Chinese, and Japanese, and show that language-specific neurons are unique, with a slight overlap (< 5%) between languages. These neurons are mainly distributed in the models' first and last few layers. This trend remains consistent across languages and models. Additionally, we tamper with less than 1% of the total neurons in each model during inference and demonstrate that tampering with a few language-specific neurons drastically changes the probability of target language occurrence in text generation.
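
The two operations mentioned in the abstract, measuring overlap between per-language neuron sets and overriding a small set of neurons at inference time, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`overlap_ratio`, `pairwise_overlap`, `override_activations`), the overlap definition (shared neurons divided by the smaller set), the fixed override value, and the toy neuron indices are all hypothetical. The abstract does not specify how neurons are identified or what values they are set to.

```python
from itertools import combinations


def overlap_ratio(a, b):
    """Share of neurons common to both sets, relative to the smaller set.

    This is one possible definition of overlap, assumed for illustration.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pairwise_overlap(neurons_by_lang):
    """Compute the overlap ratio for every unordered pair of languages."""
    return {
        (x, y): overlap_ratio(neurons_by_lang[x], neurons_by_lang[y])
        for x, y in combinations(sorted(neurons_by_lang), 2)
    }


def override_activations(activations, neuron_ids, value):
    """Return a copy of a flat activation list with the chosen neurons fixed.

    A stand-in for intervening on a model's hidden activations; a real
    model would need a hook into its forward pass instead.
    """
    patched = list(activations)
    for i in neuron_ids:
        patched[i] = value
    return patched


# Toy data: hypothetical neuron indices, not results from the paper.
neurons_by_lang = {
    "de": {0, 1, 2, 3},
    "fr": {3, 4, 5, 6},
    "ja": {7, 8, 9, 10},
}

overlaps = pairwise_overlap(neurons_by_lang)
patched = override_activations([0.0] * 12, neurons_by_lang["ja"], 1.0)
```

In this toy setup, `overlaps` maps each language pair to its overlap ratio, and `patched` holds the activation vector after the hypothetical Japanese-specific neurons have been overwritten. Applying the same idea to an actual PLM would require intercepting the forward pass, which this sketch does not attempt.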
