RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning

Abstract

Recent developments in large pre-trained language models have enabled unprecedented performance on a variety of downstream tasks. Achieving best performance with these models often leverages in-context learning, where a model performs a (possibly new) task given one or more examples. However, recent work has shown that the choice of examples can have a large impact on task performance and that finding an optimal set of examples is non-trivial. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process and train an example retriever using reinforcement learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines. We also use case studies to show that RetICL implicitly learns representations of problem solving strategies.
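
To make the Markov decision process framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a state consists of the query plus the ordered examples chosen so far, and that an action picks one not-yet-selected example from the corpus. All names (`State`, `step`, `available_actions`, `rollout`, `random_policy`, `toy_reward`) are hypothetical. The reward here is a stand-in: the abstract does not specify it, and in practice it would presumably depend on the language model's output for the resulting prompt.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Assumed state: the query plus the ordered indices of chosen examples."""
    query: str
    selected: tuple


def step(state, action):
    """Transition: append the chosen example index, preserving order."""
    return State(state.query, state.selected + (action,))


def available_actions(state, corpus_size):
    """Actions are corpus indices that have not been selected yet."""
    return [i for i in range(corpus_size) if i not in state.selected]


def rollout(query, corpus_size, num_examples, policy, reward_fn):
    """Select examples one at a time; the reward is computed at the end."""
    state = State(query, ())
    for _ in range(num_examples):
        action = policy(state, available_actions(state, corpus_size))
        state = step(state, action)
    return state, reward_fn(state)


def random_policy(state, actions):
    """Placeholder policy; a learned retriever would score actions given the state."""
    return random.choice(actions)


def toy_reward(state):
    """Hypothetical stand-in reward: 1.0 if example 0 was selected, else 0.0."""
    return 1.0 if 0 in state.selected else 0.0


if __name__ == "__main__":
    final_state, reward = rollout("What is 2 + 3?", 5, 2, random_policy, toy_reward)
    print(final_state.selected, reward)
```

Because the policy sees the examples already chosen, its next choice can depend on both their identity and their order, which is the dependency that independent scoring ignores.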
