VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping
Published in the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
Recommended citation: Haotian Dong, Ye Li, Rongwei Lu, Chen Tang, Shu-Tao Xia, Zhi Wang. "VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping." CVPR 2026. https://openaccess.thecvf.com/content/CVPR2026/html/Dong_VVS_Accelerating_Speculative_Decoding_for_Visual_Autoregressive_Generation_via_Partial_CVPR_2026_paper.html

Visual autoregressive (AR) generation models show strong potential for image generation, but their next-token-prediction paradigm incurs considerable inference latency. Although speculative decoding (SD) accelerates visual AR models, its “draft one step, then verify one step” paradigm prevents a direct reduction in the number of forward passes, limiting its acceleration potential.
Motivated by the interchangeability of visual tokens, VVS explores verification skipping in the SD process for the first time, explicitly cutting the number of target-model forward passes. Building on the observations that verification redundancy and stale-feature reusability are key to preserving quality during verification-free steps, VVS integrates three complementary modules:
- a verification-free token selector with dynamic truncation,
- token-level feature caching and reuse, and
- fine-grained skipped-step scheduling.
VVS reduces the number of target-model forward passes by a factor of 2.8 relative to vanilla AR decoding while maintaining competitive generation quality, offering a better speed-quality trade-off than conventional SD frameworks.
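
To make the idea of verification skipping concrete, the toy sketch below counts target-model forward passes when some decoding steps accept drafted tokens without verification. It is not the VVS implementation: the names `ToyStats`, `truncate_by_confidence`, and `decode`, the confidence threshold, and the assumption that the target model accepts every drafted token on verified steps are all hypothetical simplifications chosen for illustration. Feature caching is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStats:
    """Bookkeeping for the toy decoder (hypothetical, for illustration only)."""
    target_passes: int = 0
    tokens: list = field(default_factory=list)


def truncate_by_confidence(draft, threshold):
    """Keep the longest prefix of drafted tokens whose confidence >= threshold.

    `draft` is a list of (token, confidence) pairs. This stands in for a
    verification-free selector with truncation; the real criterion is an
    assumption here.
    """
    kept = []
    for token, confidence in draft:
        if confidence < threshold:
            break
        kept.append(token)
    return kept


def decode(draft_steps, skip_steps, threshold=0.5):
    """Run a toy draft/verify loop and count target-model forward passes.

    `skip_steps` is the set of step indices at which verification is skipped.
    """
    stats = ToyStats()
    for step, draft in enumerate(draft_steps):
        if step in skip_steps:
            # Verification-free step: no target forward pass is spent.
            accepted = truncate_by_confidence(draft, threshold)
        else:
            # Verified step: one target forward pass.
            # Toy assumption: the target accepts every drafted token.
            stats.target_passes += 1
            accepted = [token for token, _ in draft]
        stats.tokens.extend(accepted)
    return stats


if __name__ == "__main__":
    drafts = [
        [(1, 0.9), (2, 0.8)],
        [(3, 0.9), (4, 0.2), (5, 0.95)],
        [(6, 0.7)],
    ]
    result = decode(drafts, skip_steps={1})
    print(result.target_passes, result.tokens)
```

In this sketch, three draft steps cost only two target forward passes because step 1 is skipped, and the low-confidence token in that step cuts off the rest of its draft. Which steps to skip and how to truncate are exactly the design questions the scheduling and selector modules address.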