Video-trained world models hit robotics: DreamDojo and RynnBrain
Large video-trained world models are converging with open-source VLA stacks, pushing unified perception-to-action robotics closer to real-time use. NVIDIA researchers detail DreamDojo, a world model trained on 44,000 hours of egocentric human video that learns continuous latent actions, paired with a distillation pipeline that reaches roughly 10.8 FPS for planning and teleoperation [DreamDojo coverage](https://quantumzeitgeist.com/000-learning-robot-brains-boosted-hours-human/)[^1]. Alibaba open-sourced RynnBrain as a unified VLA control stack [RynnBrain overview](https://www.webpronews.com/alibabas-rynnbrain-gambit-how-a-chinese-tech-giant-is-betting-that-open-source-robotics-ai-will-reshape-the-physical-world/)[^2], while CineScene demonstrates a decoupled, 3D-aware scene representation relevant to controllable video and world modeling [CineScene summary](https://quantumzeitgeist.com/ai-virtual-film-sets-become-reality/)[^3].

[^1]: Summary of DreamDojo's dataset scale, continuous latent actions, FPS distillation, and planning/teleoperation use cases.
[^2]: Report on Alibaba releasing the open-source RynnBrain VLA model and its end-to-end control aims.
[^3]: Research summary of CineScene's decoupled, 3D-aware scene representation for consistent, camera-controlled video generation.
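The coverage does not describe DreamDojo's actual interfaces, but "planning over continuous latent actions" maps onto a familiar pattern: sample candidate latent-action sequences, roll a learned world model forward, score the imagined trajectories, and execute the first action. The sketch below is purely illustrative, using cross-entropy-method search with placeholder dynamics, reward, and dimensions; none of it is taken from the DreamDojo release.

```python
"""Illustrative sketch only: planning over continuous latent actions with a
learned world model. The dynamics model, reward, and dimensions are
placeholders, not DreamDojo's actual components."""
import numpy as np


def dynamics(state: np.ndarray, latent_action: np.ndarray) -> np.ndarray:
    """Stand-in for one step of a learned latent world model (placeholder math)."""
    return np.tanh(state + 0.1 * latent_action)


def reward(state: np.ndarray, goal: np.ndarray) -> float:
    """Stand-in task reward: negative distance to a goal latent state."""
    return -float(np.linalg.norm(state - goal))


def plan_cem(state, goal, horizon=8, candidates=64, iters=4, elite_frac=0.125,
             action_dim=16, seed=0):
    """Cross-entropy-method search over sequences of continuous latent actions."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    n_elite = max(1, int(candidates * elite_frac))
    for _ in range(iters):
        # Sample candidate latent-action sequences around the current mean.
        seqs = mean + std * rng.standard_normal((candidates, horizon, action_dim))
        returns = np.empty(candidates)
        for i, seq in enumerate(seqs):
            s, total = state, 0.0
            for a in seq:                      # roll the world model forward
                s = dynamics(s, a)
                total += reward(s, goal)
            returns[i] = total
        # Refit the sampling distribution to the best-scoring sequences.
        elites = seqs[np.argsort(returns)[-n_elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean[0]  # execute only the first latent action (MPC-style)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    state, goal = rng.standard_normal(16), rng.standard_normal(16)
    print("first planned latent action:", plan_cem(state, goal)[:4], "...")
```

In a real system the inner rollout loop is what distillation would accelerate: a distilled, cheaper world model lets this kind of search (or a policy trained against it) run at interactive rates such as the ~10.8 FPS cited above.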