Pitfalls of Conversational LLMs on News Debiasing

Abstract

This paper addresses debiasing in news editing and evaluates the effectiveness of conversational Large Language Models (LLMs) in this task. We designed an evaluation checklist tailored to news editors' perspectives, obtained generated texts from three popular conversational models using a subset of a publicly available dataset on media bias, and evaluated the texts according to the designed checklist. Furthermore, we examined the models as evaluators for checking the quality of debiased model outputs. Our findings indicate that none of the LLMs are perfect in debiasing. Notably, some models, including ChatGPT, introduced unnecessary changes that may impact the author's style and create misinformation. Lastly, we show that the models do not perform as proficiently as domain experts in evaluating the quality of debiased outputs.
