A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions

  • 2024-05-06 18:41:13
  • Sharath Raghvendra, Pouyan Shirzadian, Kaiyi Zhang

Abstract

The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, that is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation, $p$-Wasserstein, and the Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy in comparison to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
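The outlier sensitivity described above can be seen in a small one-dimensional sketch. It is only an illustration of the motivating problem, not an implementation of $k$-RPW, whose definition is not given in the abstract. The function name `wasserstein_1d` and the sample values are hypothetical, and the code relies on the standard fact that, for two equally sized samples on the real line, the optimal coupling matches the sorted points in order.

```python
def wasserstein_1d(xs, ys, p):
    """Empirical p-Wasserstein distance between two equal-size 1D samples.

    Illustrative helper (not from the paper). Assumes uniform weights and
    len(xs) == len(ys), so the optimal coupling pairs sorted points in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    pairs = zip(sorted(xs), sorted(ys))
    # Average transport cost |x - y|^p over the matched pairs.
    cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return cost ** (1.0 / p)


n = 100
clean = [i / n for i in range(n)]   # 100 evenly spaced points in [0, 1)
noisy = clean[:-1] + [50.0]         # same sample, one point moved far away

w1 = wasserstein_1d(clean, noisy, 1)
w2 = wasserstein_1d(clean, noisy, 2)
print(f"1-Wasserstein: {w1:.4f}")
print(f"2-Wasserstein: {w2:.4f}")
```

Here a single outlier carrying 1% of the mass gives a $1$-Wasserstein distance of about 0.49 but a $2$-Wasserstein distance of about 4.9: the outlier contributes proportionally to its mass $\delta$ in the first case and to $\sqrt{\delta}$ in the second.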

 
