Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

Abstract

Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking. To address this, we propose permutation self-consistency, a form of self-consistency over ranking list outputs of black-box LLMs. Our key idea is to marginalize out different list orders in the prompt to produce an order-independent ranking with less positional bias. First, given some input prompt, we repeatedly shuffle the list in the prompt and pass it through the LLM while holding the instructions the same. Next, we aggregate the resulting sample of rankings by computing the central ranking closest in distance to all of them, marginalizing out prompt order biases in the process. Theoretically, we prove the robustness of our method, showing convergence to the true ranking in the presence of random perturbations. Empirically, on five list-ranking datasets in sorting and passage reranking, our approach improves scores from conventional inference by up to 7-18% for GPT-3.5 and 8-16% for LLaMA v2 (70B), surpassing the previous state of the art in passage reranking. Our code is at https://github.com/castorini/perm-sc.
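
The two steps described above (shuffle and query, then aggregate into a central ranking) can be illustrated with a minimal Python sketch. It is not the code from the linked repository. The abstract does not name the distance, so Kendall tau distance is assumed here, and the central ranking is found by brute force, which is only practical for very short lists. The names `rank_fn`, `num_shuffles`, and `seed` are hypothetical; `rank_fn` stands in for a black-box LLM call that receives the items in prompt order and returns them ranked.

```python
import itertools
import random
from typing import Callable, List, Sequence


def kendall_tau_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the item pairs that rankings a and b order differently."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for i, j in itertools.combinations(range(len(a)), 2)
        if pos_b[a[i]] > pos_b[a[j]]
    )


def central_ranking(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Return the permutation with the smallest total distance to all samples.

    Brute force over all permutations: an assumption made for clarity,
    feasible only for a handful of items.
    """
    items = sorted(rankings[0])
    best = min(
        itertools.permutations(items),
        key=lambda cand: sum(kendall_tau_distance(cand, r) for r in rankings),
    )
    return list(best)


def permutation_self_consistency(
    items: Sequence[str],
    rank_fn: Callable[[List[str]], List[str]],  # hypothetical LLM wrapper
    num_shuffles: int = 20,
    seed: int = 0,
) -> List[str]:
    """Shuffle the input list repeatedly, rank each shuffle, then aggregate."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_shuffles):
        shuffled = list(items)
        rng.shuffle(shuffled)  # only the list order changes between calls
        samples.append(rank_fn(shuffled))
    return central_ranking(samples)
```

In this sketch the instructions given to the model stay fixed inside `rank_fn`, and only the order of the list varies across calls, so order-dependent errors in individual outputs tend to cancel in the aggregate. For longer lists, the exhaustive search would need to be replaced by a more efficient aggregation procedure.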
