ORC IAP Seminar 2026, Talk 3: Daniel Russo

Published: 14 May 2026
on channel: OR Center

Daniel Russo
Philip H. Geier Jr. Associate Professor of Business
Columbia University

Title
Core Reinforcement Learning Primitives as Sequence Modeling: Two Case Studies

Abstract
Sequence models—and foundation models in particular—represent a unique convergence of massive engineering investment, multimodal capability, and the ability to transfer knowledge and procedures across tasks. Yet sequential decision-making demands more than next-step prediction: it requires planning, policy improvement, robustness to distribution shift, uncertainty quantification, and active exploration. How do we tackle these fundamental challenges while resting on sequence modeling as the technological backbone?

This talk presents two case studies showing that core reinforcement learning primitives admit precise reductions to sequence modeling. The first examines policy improvement via success conditioning—training models to imitate actions from successful trajectories, as in rejection sampling + SFT, goal-conditioned reinforcement learning, and Decision Transformers. We show that this widespread heuristic exactly solves a trust-region policy optimization problem, clarifying both its safety guarantees and its characteristic failure mode. The second reinterprets posterior sampling—the basis for principled exploration and uncertainty quantification—as autoregressive generation of missing data. In this setting, offline sequence prediction loss provably controls the quality of online uncertainty quantification and exploration. Together, these case studies point to a broader research agenda in which sequence modeling serves as a principled substrate for sequential decision-making.
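For concreteness, here is a minimal sketch (my own illustration, not the talk's implementation) of the success-conditioning heuristic in a toy contextual bandit: roll out a base policy, keep only the successful rollouts, and fit a new policy by imitating the retained actions, as in rejection sampling + SFT. The environment, policy tables, and sample sizes below are assumptions chosen for illustration; the point is that the imitated policy empirically recovers pi_old(a|s) * P(success | s, a), renormalized per state.

```python
# Hypothetical sketch of success conditioning in a toy contextual bandit.
# Sample actions from a base policy, keep only successful outcomes, and fit
# a new policy by imitating the retained actions (rejection sampling + SFT).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 4

# Illustrative base policy pi_old(a|s) and success probabilities P(success|s,a).
pi_old = rng.dirichlet(np.ones(n_actions), size=n_states)
p_success = rng.uniform(0.1, 0.9, size=(n_states, n_actions))

# Roll out the base policy and keep only the successful (state, action) pairs.
counts = np.zeros((n_states, n_actions))
for _ in range(200_000):
    s = rng.integers(n_states)
    a = rng.choice(n_actions, p=pi_old[s])
    if rng.random() < p_success[s, a]:   # rollout "succeeds"
        counts[s, a] += 1                # data retained for imitation (SFT)

# Supervised imitation of the retained actions = empirical action frequencies.
pi_new = counts / counts.sum(axis=1, keepdims=True)

# Compare with the success-conditioned policy pi_old(a|s) P(success|s,a) / Z(s).
target = pi_old * p_success
target /= target.sum(axis=1, keepdims=True)
print(np.abs(pi_new - target).max())     # small Monte Carlo error
```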
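And a second hypothetical sketch of posterior sampling as autoregressive generation of missing data, again not the talk's algorithm: in a Beta-Bernoulli bandit, rather than drawing an arm's mean from its posterior, we autoregressively generate a batch of missing future outcomes from the posterior predictive (a Polya urn) and act on the imputed data. With many imputed outcomes this coincides with Thompson sampling; the arm count, horizon, and imputation length are illustrative choices.

```python
# Hypothetical sketch: exploration via autoregressive imputation of missing data.
# For a Beta prior, generating future outcomes one at a time from the posterior
# predictive (a Polya urn) and averaging them approximates drawing the arm's
# mean from the posterior, i.e., Thompson sampling.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.7])   # illustrative arm means
n_arms, horizon, n_impute = len(true_means), 500, 500

successes = np.ones(n_arms)   # Beta(1, 1) prior pseudo-counts
failures = np.ones(n_arms)

def imputed_mean(s, f):
    """Autoregressively generate n_impute missing outcomes from the
    posterior predictive and return their empirical mean."""
    s, f = float(s), float(f)
    total = 0.0
    for _ in range(n_impute):
        y = rng.random() < s / (s + f)   # predictive P(next outcome = 1)
        s, f = s + y, f + (1 - y)        # condition on the generated outcome
        total += y
    return total / n_impute

for t in range(horizon):
    # Choose the arm whose imputed (generated) dataset looks best.
    arm = int(np.argmax([imputed_mean(successes[a], failures[a])
                         for a in range(n_arms)]))
    reward = rng.random() < true_means[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

print("pulls per arm:", (successes + failures - 2).astype(int))
```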

Bio
Daniel Russo is the Philip H. Geier Jr. Associate Professor in the Decision, Risk, and Operations division of Columbia Business School. His research lies at the intersection of machine learning and online decision making, mostly falling under the broad umbrella of reinforcement learning. Outside academia, Dan works as an Amazon Scholar, applying reinforcement learning to supply chain optimization. He previously spent five years working with Spotify to apply reinforcement learning and large language models to audio recommendations. Dan completed his undergraduate studies in Math and Economics at the University of Michigan and his doctoral studies at Stanford University under the supervision of Benjamin Van Roy, and then worked as a postdoctoral researcher at Microsoft Research New England. His research has been recognized by the Erlang Prize, the Frederick W. Lanchester Prize, a Junior Faculty Interest Group Best Paper Award, and first place in the George Nicholson Student Paper Competition.

https://orc.mit.edu/calendar_event/or...