Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning

Abstract

While Large Language Models (LLMs) exhibit remarkable capabilities in zero-shot and few-shot scenarios, they often require computationally prohibitive sizes. Conversely, smaller Masked Language Models (MLMs) like BERT and RoBERTa achieve state-of-the-art results through fine-tuning but struggle to extend to few-shot and zero-shot settings due to their architectural constraints. Hence, we propose Statement-Tuning, a technique that models discriminative tasks as a finite set of statements and trains an encoder model to discriminate between the candidate statements to determine the label. We apply Statement-Tuning across multiple tasks to enable cross-task generalization. Experimental results demonstrate that Statement-Tuning achieves performance competitive with state-of-the-art LLMs while using significantly fewer parameters. Moreover, the study investigates the impact of several design choices on few-shot and zero-shot generalization, revealing that Statement-Tuning can achieve sufficient performance with modest training data and that it benefits from task and statement diversity when generalizing to unseen tasks.
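To make the statement formulation concrete, the following is a minimal sketch of the inference step only, under stated assumptions. The templates, the function names, and `toy_score_true` are all hypothetical and introduced purely for illustration. In particular, `toy_score_true` is a keyword heuristic standing in for a statement-tuned encoder that would return the probability that a statement is true; it is not the method's actual model or training procedure.

```python
import math
from typing import Callable, Dict

# Hypothetical templates: one natural-language statement per candidate label.
TEMPLATES: Dict[str, str] = {
    "positive": "The review '{text}' expresses a positive sentiment.",
    "negative": "The review '{text}' expresses a negative sentiment.",
}


def build_statements(text: str, templates: Dict[str, str]) -> Dict[str, str]:
    """Turn one input into a finite set of statements, one per label."""
    return {label: template.format(text=text) for label, template in templates.items()}


def predict_label(
    text: str,
    templates: Dict[str, str],
    score_true: Callable[[str], float],
) -> str:
    """Pick the label whose statement the scorer rates as most likely true."""
    statements = build_statements(text, templates)
    return max(statements, key=lambda label: score_true(statements[label]))


def toy_score_true(statement: str) -> float:
    """Assumption: placeholder for an encoder's P(statement is true).

    Counts a few cue words and maps the signed count through a sigmoid.
    """
    lowered = statement.lower()
    polarity = sum(w in lowered for w in ("great", "love")) - sum(
        w in lowered for w in ("awful", "boring")
    )
    claims_positive = "a positive sentiment" in lowered
    signed = polarity if claims_positive else -polarity
    return 1.0 / (1.0 + math.exp(-signed))


if __name__ == "__main__":
    print(predict_label("I love this film, it is great.", TEMPLATES, toy_score_true))
    print(predict_label("An awful, boring movie.", TEMPLATES, toy_score_true))
```

Because the scorer only answers "is this statement true?", the same binary classifier could in principle be reused for a new task by supplying a different set of templates, which is the property the abstract relies on for zero-shot generalization.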
