Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

  • 2024-04-14 14:19:40
  • Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu

Abstract

Reinforcement learning (RL) trains agents to accomplish complex tasks through environmental interaction data, but its capacity is also limited by the scope of the available data. To obtain a knowledgeable agent, a promising approach is to leverage the knowledge from large language models (LLMs). Despite previous studies combining LLMs with RL, seamless integration of the two components remains challenging due to their semantic gap. This paper introduces a novel method, Knowledgeable Agents from Language Model Rollouts (KALM), which extracts knowledge from LLMs in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods. The primary challenge of KALM lies in LLM grounding, as LLMs are inherently limited to textual data, whereas environmental data often comprise numerical vectors unseen to LLMs. To address this, KALM fine-tunes the LLM to perform various tasks based on environmental data, including bidirectional translation between natural language descriptions of skills and their corresponding rollout data. This grounding process enhances the LLM's comprehension of environmental dynamics, enabling it to generate diverse and meaningful imaginary rollouts that reflect novel skills. Initial empirical evaluations on the CLEVR-Robot environment demonstrate that KALM enables agents to complete complex rephrasings of task goals and extend their capabilities to novel tasks requiring unprecedented optimal behaviors. KALM achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods. Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.
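
The data flow described in the abstract (real rollouts plus LLM-generated imaginary rollouts, pooled for an offline RL learner) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Rollout` container, `stub_llm_generate`, and `build_offline_dataset` are hypothetical names, the goal strings are made up, and the "LLM" is a random stand-in rather than a fine-tuned model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rollout:
    goal: str                    # natural-language description of the skill
    states: List[List[float]]    # numerical state vectors, one per time step
    actions: List[int]           # discrete actions (an assumption of this sketch)
    imaginary: bool = False      # True if produced by the stand-in generator


def stub_llm_generate(goal: str, horizon: int = 4, state_dim: int = 3,
                      seed: int = 0) -> Rollout:
    """Hypothetical stand-in for a grounded LLM.

    It emits random vectors so the sketch runs end to end; a real system
    would decode a rollout conditioned on the goal text.
    """
    rng = random.Random(seed)
    states = [[rng.random() for _ in range(state_dim)]
              for _ in range(horizon + 1)]
    actions = [rng.randrange(4) for _ in range(horizon)]
    return Rollout(goal, states, actions, imaginary=True)


def build_offline_dataset(real: Sequence[Rollout],
                          novel_goals: Sequence[str],
                          generate: Callable[[str], Rollout]) -> List[Rollout]:
    """Pool real rollouts with one imaginary rollout per novel goal."""
    imagined = [generate(goal) for goal in novel_goals]
    return list(real) + imagined


# Illustrative usage with made-up goals and a single two-state real rollout.
real = [Rollout("goal seen in the collected data",
                [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]], [1])]
dataset = build_offline_dataset(real, ["novel goal without real data"],
                                stub_llm_generate)
print(len(dataset), sum(r.imaginary for r in dataset))
```

The pooled `dataset` is what an offline RL algorithm would then train on; which algorithm and how rollouts are encoded for the LLM are not specified in the abstract and are left out of this sketch.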

 
