Amodal Ground Truth and Completion in the Wild

Abstract

This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually obtained through manual annotation and thus is subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder, followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves a new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at https://www.robots.ox.ac.uk/~vgg/research/amodal/.
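As an illustration of the relationship between the masks described above, the following minimal sketch derives the occluded region of an object from its amodal (full) mask and its visible mask. It is not the paper's pipeline: the function names `occluded_region`, `mask_area`, and `occlusion_rate` are hypothetical, and masks are assumed to be equally sized 2D lists of 0/1 values.

```python
# Illustrative sketch only; names and mask representation are assumptions,
# not taken from the paper or its released code.

def occluded_region(amodal_mask, visible_mask):
    """Pixels that belong to the full object but are hidden in the image."""
    return [
        [bool(a) and not bool(v) for a, v in zip(amodal_row, visible_row)]
        for amodal_row, visible_row in zip(amodal_mask, visible_mask)
    ]

def mask_area(mask):
    """Number of foreground pixels in a mask."""
    return sum(1 for row in mask for pixel in row if pixel)

def occlusion_rate(amodal_mask, visible_mask):
    """Fraction of the full object that is occluded (0.0 for an empty mask)."""
    total = mask_area(amodal_mask)
    if total == 0:
        return 0.0
    return mask_area(occluded_region(amodal_mask, visible_mask)) / total

# Toy example: an 8-pixel object whose right half is hidden by an occluder.
amodal = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
visible = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

hidden = occluded_region(amodal, visible)
rate = occlusion_rate(amodal, visible)
```

In this toy case the amodal mask is the union of the visible part and the hidden part, which is the quantity an amodal completion model is asked to predict from the visible evidence alone.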
