Make Your LLM Fully Utilize the Context

  • 2024-04-25 18:55:14
  • Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

Abstract

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). GitHub link: https://github.com/microsoft/FILM.
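To make the data-synthesis idea concrete, here is a minimal sketch of how one training context might be assembled: a short key segment is placed at a random slot among filler segments until a target length is reached, so the answer-bearing text can land anywhere in the context. This is an illustration under stated assumptions, not the authors' pipeline: the function name `build_long_context` and its parameters are hypothetical, and whitespace-separated words stand in for tokens.

```python
import random


def build_long_context(key_segment, filler_segments, target_tokens, rng):
    """Place key_segment at a random slot among filler segments.

    Hypothetical helper for illustration only. "Tokens" are approximated
    by whitespace-separated words, which is an assumption of this sketch.
    Returns (context, index of the key segment among the segments).
    """
    # Reserve room for the key segment before adding any filler.
    budget = target_tokens - len(key_segment.split())
    pool = list(filler_segments)
    rng.shuffle(pool)

    chosen = []
    for seg in pool:
        n = len(seg.split())
        if n > budget:
            break  # stop once the next filler would exceed the target
        chosen.append(seg)
        budget -= n

    # Any slot is allowed, including the very start and the very end.
    position = rng.randint(0, len(chosen))
    chosen.insert(position, key_segment)
    return "\n".join(chosen), position


if __name__ == "__main__":
    rng = random.Random(0)
    fillers = [f"filler {i} " + "x " * 10 for i in range(100)]
    context, pos = build_long_context("the key fact", fillers, 200, rng)
    print(pos, len(context.split()))
```

Sampling the slot uniformly is what gives every position in the context a chance to hold the crucial information; a question-answer pair about the key segment would then be attached to the resulting context.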

 
