IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

  • 2024-04-25 18:57:36
  • Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, Partha Talukdar
  • 0

Abstract

As large language models (LLMs) see increasing adoption across the globe, itis imperative for LLMs to be representative of the linguistic diversity of theworld. India is a linguistically diverse country of 1.4 Billion people. Tofacilitate research on multilingual LLM evaluation, we release IndicGenBench -the largest benchmark for evaluating LLMs on user-facing generation tasksacross a diverse set 29 of Indic languages covering 13 scripts and 4 languagefamilies. IndicGenBench is composed of diverse generation tasks likecross-lingual summarization, machine translation, and cross-lingual questionanswering. IndicGenBench extends existing benchmarks to many Indic languagesthrough human curation providing multi-way parallel evaluation data for manyunder-represented Indic languages for the first time. We evaluate a wide rangeof proprietary and open-source LLMs including GPT-3.5, GPT-4, PaLM-2, mT5,Gemma, BLOOM and LLaMA on IndicGenBench in a variety of settings. The largestPaLM-2 models performs the best on most tasks, however, there is a significantperformance gap in all languages compared to English showing that furtherresearch is needed for the development of more inclusive multilingual languagemodels. IndicGenBench is released atwww.github.com/google-research-datasets/indic-gen-bench

 

Quick Read (beta)

loading the full paper ...