BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Published: 17 June 2026
on channel: PY L

💥 Can we combine 2D VLA generalization with 3D policy efficiency?

Introducing BridgeVLA – a 3D vision-language-action (VLA) model that bridges pretrained VLM backbones and 3D VLAs. Reusing VLM weights alone isn’t enough – the inputs and outputs also have to be aligned with what the backbone was pretrained on.

🚀 Results:
· 1st on RLBench, COLOSSEUM, GemBench 🏆
· +32% real-world performance over baselines 🔧
· 96.8% success with 3 demo trajectories 😱

📦 Code, data, and models are open-source.

👉 https://bridgevla.github.io