Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation

Abstract

Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and propose a self-guided spurious correlation mitigation framework. Our framework automatically constructs fine-grained training labels tailored for a classifier obtained with empirical risk minimization to improve its robustness against spurious correlations. The fine-grained training labels are formulated with different prediction behaviors of the classifier identified in a novel spuriousness embedding space. We construct the space with automatically detected conceptual attributes and a novel spuriousness metric which measures how likely a class-attribute correlation is exploited for predictions. We demonstrate that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori and outperforms prior methods on five real-world datasets.
