RoboStream: Weaving Spatio-Temporal Reasoning with Memory in Vision-Language Models for Robotics
Published in European Conference on Computer Vision (ECCV), 2026
Recommended citation: Yuzhi Huang, Jie Wu, Weijue Bu, Ziyi Xiong, Gaoyang Jiang, Ye Li, Kangye Ji, Shuzhao Xie, Yue Huang, Chenglei Wu, Jingyan Jiang, Zhi Wang. "RoboStream: Weaving Spatio-Temporal Reasoning with Memory in Vision-Language Models for Robotics." arXiv:2603.12939, 2026. https://arxiv.org/abs/2603.12939

RoboStream is a training-free framework for long-horizon robotic manipulation. It adds persistent spatio-temporal reasoning and memory to vision-language model (VLM) planners through two components: Spatio-Temporal Fusion Tokens and a Causal Spatio-Temporal Graph.