MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Abstract

Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of "fact-checking" are based on verifying each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many calls to LLMs to check a single response. In this work, we show how to build small models that have GPT-4-level performance but for 400x lower cost. We do this by constructing synthetic training data with GPT-4, which involves creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in the claim and recognize synthesis of information across sentences. For evaluation, we unify pre-existing datasets into a benchmark LLM-AggreFact, collected from recent work on fact-checking and grounding LLM generations. Our best system MiniCheck-FT5 (770M parameters) outperforms all systems of comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.
