💥 Can we combine 2D VLA generalization with 3D policy efficiency?
Introducing BridgeVLA, a 3D Vision-Language-Action model that bridges pretrained VLM backbones and 3D VLAs. Simply reusing VLM weights isn't enough; it takes smarter design.
🚀 Results:
· 1st on RLBench, COLOSSEUM, GemBench 🏆
· +32% real-world performance over baselines 🔧
· 96.8% success with 3 demo trajectories 😱
📦 Code, data, and models are open-source.
👉 https://bridgevla.github.io