Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders

Abstract

Cross-encoder (CE) models which compute similarity by jointly encoding aquery-item pair perform better than embedding-based models (dual-encoders) atestimating query-item relevance. Existing approaches perform k-NN search withCE by approximating the CE similarity with a vector embedding space fit eitherwith dual-encoders (DE) or CUR matrix factorization. DE-basedretrieve-and-rerank approaches suffer from poor recall on new domains and theretrieval with DE is decoupled from the CE. While CUR-based approaches can bemore accurate than the DE-based approach, they require a prohibitively largenumber of CE calls to compute item embeddings, thus making it impractical fordeployment at scale. In this paper, we address these shortcomings with ourproposed sparse-matrix factorization based method that efficiently computeslatent query and item embeddings to approximate CE scores and performs k-NNsearch with the approximate CE similarity. We compute item embeddings offlineby factorizing a sparse matrix containing query-item CE scores for a set oftrain queries. Our method produces a high-quality approximation while requiringonly a fraction of CE calls as compared to CUR-based methods, and allows forleveraging DE to initialize the embedding space while avoiding compute- andresource-intensive finetuning of DE via distillation. At test time, the itemembeddings remain fixed and retrieval occurs over rounds, alternating betweena) estimating the test query embedding by minimizing error in approximating CEscores of items retrieved thus far, and b) using the updated test queryembedding for retrieving more items. Our k-NN search method improves recall byup to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, ourindexing approach achieves a speedup of up to 100x over CUR-based and 5x overDE distillation methods, while matching or improving k-NN search recall overbaselines.

Quick Read (beta)

loading the full paper ...