RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

  • 2024-04-22 18:56:26
  • Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, Davide Turcato, Oleksandr Vakhno, Judit Velcsov, Anna Vickers, Stéphanie Visser, Herdyan Widarmanto, Andrey Zaikin, Si-Qing Chen

Abstract

Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate seven S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when holistically judging the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g., microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.

 
