V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

Abstract

AI-generated video has revolutionized short video production, filmmaking, and personalized media, making video local editing an essential tool. However, this progress also blurs the line between reality and fiction, posing challenges in multimedia forensics. To solve this urgent issue, V2A-Mark is proposed to address the limitations of current video tampering forensics, such as poor generalizability, singular function, and single modality focus. Combining the fragility of video-into-video steganography with deep robust watermarking, our method can embed invisible visual-audio localization watermarks and copyright watermarks into the original video frames and audio, enabling precise manipulation localization and copyright protection. We also design a temporal alignment and fusion module and degradation prompt learning to enhance the localization accuracy and decoding robustness. Meanwhile, we introduce a sample-level audio localization method and a cross-modal copyright extraction mechanism to couple the information of audio and video frames. The effectiveness of V2A-Mark has been verified on a visual-audio tampering dataset, emphasizing its superiority in localization precision and copyright accuracy, crucial for the sustainable development of video editing in the AIGC video era.
