Cutting through buggy adversarial example defenses: fixing 1 line of code breaks Sabre

Abstract

Sabre is a defense to adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in the evaluation that point to clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response to this, the authors modify the defense and introduce a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels.
