August 9, 2024
Title: Geometric Learning for Manipulating Scenes and Objects
Abstract:
Despite recent advancements toward agents with high-level reasoning abilities, building robots that perform useful tasks in the real world requires further progress toward general-purpose geometric intelligence and spatial reasoning. This thesis presents research on (i) understanding components of geometric intelligence that are missing from current systems and (ii) proposing techniques to close some of these gaps. We show how the proposed insights and techniques enable new capabilities in robotic manipulation, focusing on 6-DoF rigid object rearrangement tasks in real unknown scenes.
First, we study what properties of a geometric object representation support both generalization and data efficiency in learning components of a planner that performs tasks like scene rearrangement and 6-DoF pick-and-place. We show how the learned features of an equivariant neural field, trained to perform category-level 3D reconstruction, can be repurposed as an object representation that enables data-efficient imitation with unseen object shapes in out-of-distribution poses. Next, we consider the more general category of relational rearrangement problems. Starting with scenes containing two unknown objects, we present applications of our neural descriptor fields for modeling pairwise relations between task-relevant object parts and using such relational models to plan multi-step manipulation tasks. We then show how rearrangement prediction in multi-object scenes introduces additional challenges, such as generalizing to diverse scene layouts and achieving good coverage over the multi-modal space of rearrangement solutions. Our study examines how predicting combined object-scene point clouds by denoising relative object poses with diffusion models naturally handles these challenges. Finally, we consider the complementary problem of closed-loop policy learning to improve the online robustness and reliability of rearrangement task execution. We propose a system-level perspective on combining paradigms like simulation-based reinforcement learning, 3D reconstruction, and imitation learning to facilitate such robust policy acquisition.