ALOHa: A New Measure for Hallucination in Captioning Models

  • 2024-04-03 18:59:36
  • Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell

Abstract

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene. The existing prominent metric for object hallucination, CHAIR, is limited to a fixed set of MS COCO objects and synonyms. In this work, we propose a modernized open-vocabulary metric, ALOHa, which leverages large language models (LLMs) to measure object hallucinations. Specifically, we use an LLM to extract groundable objects from a candidate caption, measure their semantic similarity to reference objects from captions and object detections, and use Hungarian matching to produce a final hallucination score. We show that ALOHa correctly identifies 13.6% more hallucinated objects than CHAIR on HAT, a new gold-standard subset of MS COCO Captions annotated for hallucinations, and 30.8% more on nocaps, where objects extend beyond MS COCO categories. Our code is available at https://davidmchan.github.io/aloha/.
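
The matching step described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the object lists are hard-coded in place of LLM extraction, `similarity` is a hypothetical toy lookup standing in for an embedding-based semantic similarity, and both the caption-level aggregation (the minimum per-object score) and the 0.5 flagging threshold are illustrative assumptions. Hungarian matching pairs each candidate object with at most one reference object so that total similarity is maximized; a candidate left without a real partner receives a score of 0.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an embedding-based semantic similarity
# (e.g. cosine similarity of text embeddings); values are illustrative only.
TOY_SIMILARITY: Dict[frozenset, float] = {
    frozenset({"puppy", "dog"}): 0.9,
    frozenset({"couch", "sofa"}): 0.95,
}


def similarity(a: str, b: str) -> float:
    """Return 1.0 for identical names, a table value for known pairs, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def hungarian_min(cost: List[List[float]]) -> List[int]:
    """Solve a square assignment problem, minimizing total cost.

    Potentials-based O(n^3) Hungarian algorithm; returns assignment[row] = column.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (n + 1)   # column potentials (1-indexed)
    p = [0] * (n + 1)     # p[j] = row matched to column j; 0 means free
    way = [0] * (n + 1)   # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def aloha_style_scores(
    candidates: List[str], references: List[str], threshold: float = 0.5
) -> Tuple[List[Tuple[str, float]], float, List[str]]:
    """Match candidate objects to reference objects and score each candidate."""
    nc, nr = len(candidates), len(references)
    k = max(nc, nr)
    # Pad to a k x k matrix; padded cells have similarity 0.
    sim = [
        [similarity(candidates[i], references[j]) if i < nc and j < nr else 0.0
         for j in range(k)]
        for i in range(k)
    ]
    # Maximizing similarity is the same as minimizing negated similarity.
    assignment = hungarian_min([[-s for s in row] for row in sim])
    per_object = [(candidates[i], sim[i][assignment[i]]) for i in range(nc)]
    # Assumed aggregation: the caption is only as good as its worst object.
    caption_score = min((s for _, s in per_object), default=1.0)
    flagged = [name for name, s in per_object if s < threshold]
    return per_object, caption_score, flagged


if __name__ == "__main__":
    per_object, caption_score, flagged = aloha_style_scores(
        ["puppy", "couch", "cat"],   # objects an LLM might extract from a caption
        ["dog", "sofa", "pillow"],   # objects from reference captions and detections
    )
    print(per_object)     # [('puppy', 0.9), ('couch', 0.95), ('cat', 0.0)]
    print(caption_score)  # 0.0
    print(flagged)        # ['cat']
```

Matching one-to-one, rather than letting each candidate take its best reference independently, keeps two candidate objects from both claiming the same reference object. In practice a library solver such as SciPy's `linear_sum_assignment` could replace the hand-written routine.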
