Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models

  • 2024-04-16 03:28:48
  • Huixuan Zhang, Junzhe Zhang, Xiaojun Wan

Abstract

Large-scale vision-language models have demonstrated impressive skill in handling tasks that involve both vision and language. Nevertheless, these models frequently generate inaccurate information, a problem known as hallucination. In this study, we concentrate on a specific type of hallucination: number hallucination, in which a model incorrectly identifies the number of certain objects in a picture. We perform quantitative evaluations of number hallucination, showing it to be critical in major open-source large vision-language models. Furthermore, we use two related tasks to conduct an in-depth analysis of number hallucination, revealing severe inner and outer inconsistency across all tasks. Based on this examination, we devise a training approach aimed at improving consistency to reduce number hallucination, which leads to an 8% improvement in performance over direct finetuning methods. Our code and dataset will be released to the community.

 
