Hallucination of Multimodal Large Language Models: A Survey

Abstract

This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue. Additionally, we analyze the current challenges and limitations, formulating open questions that delineate potential pathways for future research. By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field. Through our thorough and in-depth review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing valuable insights and resources for researchers and practitioners alike. Resources are available at: https://github.com/showlab/Awesome-MLLM-Hallucination.
