Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems

Abstract

Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. Recommendation with a cache is a solution to this problem, where a user-wise result cache is used to provide recommendations when the recommender system cannot afford real-time computation. However, the cached recommendations are usually suboptimal compared to those from real-time computation, and it is challenging to determine the items in the cache for each user. In this paper, we provide a cache-aware reinforcement learning (CARL) method to jointly optimize the recommendation by real-time computation and by the cache. We formulate the problem as a Markov decision process with user states and a cache state, where the cache state represents whether the recommender system performs recommendations by real-time computation or by the cache. The computational load of the recommender system determines the cache state. We perform reinforcement learning based on such a model to improve user engagement over multiple requests. Moreover, we show that the cache will introduce a challenge called critic dependency, which deteriorates the performance of reinforcement learning. To tackle this challenge, we propose an eigenfunction learning (EL) method to learn independent critics for CARL. Experiments show that CARL can significantly improve users' engagement when considering the result cache. CARL has been fully launched in the Kwai app, serving over 100 million users.
