Learning general Gaussian mixtures with efficient score matching

Abstract

We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.We make no separation assumptions on the underlying mixture components: we onlyrequire that the covariance matrices have bounded condition number and that themeans and covariances lie in a ball of bounded radius. We give an algorithmthat draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture,runs in sample-polynomial time, and constructs a sampler whose outputdistribution is $\varepsilon$-far from the unknown mixture in total variation.Prior works for this problem either (i) required exponential runtime in thedimension $d$, (ii) placed strong assumptions on the instance (e.g., sphericalcovariances or clusterability), or (iii) had doubly exponential dependence onthe number of components $k$. Our approach departs from commonly used techniques for this problem like themethod of moments. Instead, we leverage a recently developed reduction, basedon diffusion models, from distribution learning to a supervised learning taskcalled score matching. We give an algorithm for the latter by proving astructural result showing that the score function of a Gaussian mixture can beapproximated by a piecewise-polynomial function, and there is an efficientalgorithm for finding it. To our knowledge, this is the first example ofdiffusion models achieving a state-of-the-art theoretical guarantee for anunsupervised learning task.

Quick Read (beta)

loading the full paper ...