UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

  • 2024-04-03 15:12:36
  • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe

Abstract

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? We start by adapting a pre-trained automatic speech recognition model to additional tasks using single-token task specifiers. We enhance this approach through instruction tuning, i.e., finetuning by describing the task using natural language instructions followed by the list of label options. Our approach can generalize to new task descriptions for the seen tasks during inference, thereby enhancing its user-friendliness. We demonstrate the efficacy of our single multi-task learning model "UniverSLU" for 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages. On most tasks, UniverSLU achieves competitive performance and often even surpasses task-specific models. Additionally, we assess the zero-shot capabilities, finding that the model generalizes to new datasets and languages for seen task types.
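
As an illustration only, the two prompting schemes mentioned in the abstract (a single-token task specifier, and a natural language instruction followed by the list of label options) might be sketched as below. The function names, the angle-bracket token format, and the "Options:" wording are hypothetical choices made for this sketch; the abstract does not specify the actual prompt format used by UniverSLU.

```python
from typing import Sequence


def specifier_prompt(task: str) -> str:
    """Build a single-token task specifier (hypothetical '<task>' format)."""
    return f"<{task}>"


def instruction_prompt(description: str, labels: Sequence[str]) -> str:
    """Build a natural language instruction followed by the label options.

    The 'Options:' separator is an assumption for illustration only.
    """
    options = ", ".join(labels)
    return f"{description} Options: {options}."


if __name__ == "__main__":
    # Same task expressed in the two styles.
    print(specifier_prompt("emotion_recognition"))
    print(instruction_prompt("Classify the emotion of the speaker.",
                             ["happy", "sad", "angry", "neutral"]))
```

Under this reading, a reworded description passed to `instruction_prompt` at inference time corresponds to the "new task descriptions for the seen tasks" case, whereas the fixed specifier token offers no comparable flexibility.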

 
