Research Projects

ECCV 2026

RoboStream: Weaving Spatio-Temporal Reasoning with Memory in Vision-Language Models for Robotics

Y. Huang, J. Wu, W. Bu, Z. Xiong, G. Jiang, Y. Li, K. Ji, S. Xie, Y. Huang, C. Wu, J. Jiang, Z. Wang

🧠 spatio-temporal memory · training-free · long-horizon manipulation

RoboStream is a training-free framework that gives VLM planners persistent spatio-temporal reasoning and memory for robust long-horizon robotic manipulation, using Spatio-Temporal Fusion Tokens and a Causal Spatio-Temporal Graph.

📄 PDF 🔗 Project

ICML 2026

Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning

K. Ji, J. Zhou, Y. Meng, Y. Li, H. Cui, Z. Wang

⚡ up to 4× generation speedup · no performance loss

Sparse ActionGen (SAG) accelerates Diffusion Policy to real time with a rollout-adaptive prune-then-reuse scheme: an observation-conditioned pruner identifies prunable computations on the fly, and a one-for-all strategy reuses activations across timesteps and blocks.

📄 PDF 💻 Code 🔗 Project

Preprint

ElegantVLA: Learning When to Think for Efficient Vision-Language-Action Models

Y. Li*, H. Liu, K. Ji, Y. Meng, J. Fan, Y. Wang, S. Qin, C. Wu, S.-T. Xia, Z. Wang

⚡ up to 2.55× (GR00T) · 3.77× (CogACT) · real-world (Franka) 13.8→26.3 Hz

ElegantVLA accelerates the full VLA pipeline end to end. By analyzing redundancy in both high-level semantics and action generation, it adaptively schedules computation across every module: the vision encoder, the LLM, and the action head.

📄 PDF 🔗 Project

ICLR 2026

Block-wise Adaptive Caching for Accelerating Diffusion Policy

K. Ji, Y. Meng, H. Cui, Y. Li, J. Zhou, S. Hua, L. Chen, Z. Wang

⚡ up to 3× inference speedup · training-free plugin · lossless

Block-wise Adaptive Caching (BAC) is a training-free plugin that accelerates Diffusion Policy by caching intermediate action features per transformer block. Each block gets its own optimal update schedule, and a Bubbling Union Algorithm stops cache errors from propagating across blocks.

📄 PDF 💻 Code 🔗 Project
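
For readers unfamiliar with per-block caching, the general idea behind the BAC entry above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: the function name `run_with_block_cache`, the scalar "features", and the hand-written schedules are all hypothetical, and the Bubbling Union Algorithm is not modeled here.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy stand-in for a transformer block: maps a hidden value to a residual update.
Block = Callable[[float], float]


def run_with_block_cache(
    blocks: List[Block],
    schedules: List[Set[int]],
    inputs_per_step: List[float],
) -> Tuple[List[float], List[int]]:
    """Run `blocks` over successive denoising steps with a per-block cache.

    Block i is recomputed only at the steps listed in schedules[i]
    (hypothetical, hand-picked here); at all other steps its cached
    residual update is reused. Returns the per-step outputs and the
    number of blocks actually recomputed at each step.
    """
    cache: Dict[int, float] = {}
    outputs: List[float] = []
    recomputed: List[int] = []
    for step, x in enumerate(inputs_per_step):
        h = x
        count = 0
        for i, block in enumerate(blocks):
            # Recompute when scheduled, or when nothing has been cached yet.
            if step in schedules[i] or i not in cache:
                cache[i] = block(h)
                count += 1
            h = h + cache[i]  # residual connection using fresh or cached features
        outputs.append(h)
        recomputed.append(count)
    return outputs, recomputed


if __name__ == "__main__":
    demo_blocks: List[Block] = [lambda h: 2.0 * h, lambda h: h + 1.0]
    demo_schedules = [{0, 2}, {0}]  # block 0 refreshes at steps 0 and 2; block 1 only at step 0
    outs, counts = run_with_block_cache(demo_blocks, demo_schedules, [1.0, 1.0, 1.0, 1.0])
    print(outs, counts)
```

The speedup in such a scheme comes from the steps where a block is skipped; how the schedules are chosen and how cross-block cache error is controlled are the substance of the actual method and are described in the paper.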