OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

Abstract

Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information. Similar to popular LVLMs, OneChart incorporates an autoregressive main body. Uniquely, to enhance the reliability of the numerical parts of the output, we introduce an auxiliary token placed at the beginning of the token sequence, along with an additional decoder. The numerically optimized (auxiliary) token allows subsequent tokens for chart parsing to capture enhanced numerical features through causal attention. Furthermore, with the aid of the auxiliary token, we have devised a self-evaluation mechanism that enables the model to gauge the reliability of its chart parsing results by providing confidence scores for the generated content. Compared to current state-of-the-art (SOTA) chart parsing models, e.g., DePlot, ChartVLM, and ChartAst, OneChart achieves significantly higher Average Precision (AP) for chart structural extraction across multiple public benchmarks, despite having only 0.2 billion parameters. Moreover, as a chart parsing agent, it also brings 10%+ accuracy gains for the popular LVLM (LLaVA-1.6) in the downstream ChartQA benchmark.
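
The role of causal attention in the placement of the auxiliary token can be illustrated with a minimal sketch. It is not the OneChart implementation: the function name `causal_mask`, the constant `AUX_POSITION`, and the sequence length of 5 are assumptions chosen for illustration. It only shows the general property that, under a causal mask, a token at position 0 is visible to every later position, while it cannot itself attend to anything after it.

```python
def causal_mask(seq_len):
    """Return a boolean matrix where mask[i][j] is True when
    position i may attend to position j (that is, j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# Hypothetical layout: index 0 holds the auxiliary token,
# the remaining indices hold ordinary chart-parsing tokens.
AUX_POSITION = 0
mask = causal_mask(5)

# Every position (including the auxiliary token itself) can attend to index 0.
aux_visible_to_all = all(row[AUX_POSITION] for row in mask)

# The auxiliary token, being first, can attend only to itself.
aux_sees_only_itself = mask[AUX_POSITION] == [True, False, False, False, False]
```

Placing the token first is what makes its features available to all subsequent tokens; a token placed at the end would be visible to none of them under the same mask.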
