Stylus: Automatic Adapter Selection for Diffusion Models

Abstract

Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high-fidelity, custom images at reduced costs. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters, most of which are highly customized with insufficient descriptions. This paper explores the problem of matching the prompt to a set of relevant adapters, building on recent work that highlights the performance gains of composing adapters. We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords. Stylus outlines a three-stage approach that first summarizes adapters with improved descriptions and embeddings, then retrieves relevant adapters, and finally assembles adapters based on the prompt's keywords by checking how well they fit the prompt. To evaluate Stylus, we developed StylusDocs, a curated dataset featuring 75K adapters with pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion checkpoints, Stylus achieves greater CLIP-FID Pareto efficiency and is twice as preferred, with humans and multimodal models as evaluators, over the base model. See stylus-diffusion.github.io for more.
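The three stages named in the abstract (summarize and embed, retrieve, compose by keyword) can be illustrated with a toy sketch. This is not the Stylus implementation: the adapter names and descriptions, the bag-of-words `embed` function, and the exact-match keyword check in `compose` are all hypothetical stand-ins chosen so the example runs with only the standard library. A real system would presumably use a learned text encoder and a stronger fit check.

```python
import math
from collections import Counter

# Hypothetical adapter catalog: name -> improved description.
# Stands in for the output of the "summarize" stage.
ADAPTERS = {
    "watercolor_style": "soft watercolor painting style with pastel colors",
    "cyberpunk_city": "neon cyberpunk city street at night",
    "corgi_dog": "cute corgi dog with short legs",
}


def embed(text):
    """Toy bag-of-words embedding (assumption: replaces a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt, index, top_k=2):
    """Return up to top_k adapter names ranked by similarity to the prompt."""
    query = embed(prompt)
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop adapters with no overlap at all.
    return [name for score, name in scored[:top_k] if score > 0]


def compose(prompt, candidates, descriptions):
    """Map each prompt keyword to the retrieved adapters that mention it."""
    plan = {}
    for keyword in prompt.lower().split():
        matches = [n for n in candidates if keyword in descriptions[n].split()]
        if matches:
            plan[keyword] = matches
    return plan


# Pre-computed embeddings, loosely analogous to a stored adapter index.
INDEX = {name: embed(desc) for name, desc in ADAPTERS.items()}

if __name__ == "__main__":
    prompt = "a corgi dog in watercolor style"
    selected = retrieve(prompt, INDEX)
    print(selected)
    print(compose(prompt, selected, ADAPTERS))
```

Pre-computing `INDEX` once mirrors the idea of shipping adapter embeddings with the dataset, so only the prompt needs to be embedded at query time.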
