Deep Exploration via Randomized Value Functions: Cartpole Swing Up

Published: 17 June 2026
Channel: Ian Osband

Classic cartpole swing-up task, except:
a small cost of -0.01 for moving the cart;
a sparse reward of +1 only when the pole is upright and steady.
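
The reward described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the video: the -0.01 move cost and the +1 reward come from the description, while the function name `swingup_reward`, the state variables, and the thresholds `ANGLE_TOL` and `VELOCITY_TOL` that define "upright and steady" are assumptions made up for this example.

```python
import math

MOVE_COST = -0.01      # from the description: small cost for moving the cart
UPRIGHT_REWARD = 1.0   # from the description: sparse reward when upright and steady

# Hypothetical thresholds: the description does not say what counts as
# "upright" or "steady", so these values are placeholders.
ANGLE_TOL = 0.1        # radians from vertical
VELOCITY_TOL = 1.0     # rad/s for the pole, m/s for the cart


def swingup_reward(theta, theta_dot, x_dot, action):
    """Reward for one step; theta = 0 means the pole points straight up.

    action is 0 for "do nothing" and nonzero for pushing the cart.
    """
    reward = 0.0
    if action != 0:
        reward += MOVE_COST  # every push costs a little
    upright = math.cos(theta) > math.cos(ANGLE_TOL)
    steady = abs(theta_dot) < VELOCITY_TOL and abs(x_dot) < VELOCITY_TOL
    if upright and steady:
        reward += UPRIGHT_REWARD  # the only positive signal in the task
    return reward
```

With the pole hanging down (theta = pi), every push returns -0.01 and doing nothing returns 0, so an agent that only sees immediate rewards is pushed toward standing still.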

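Deep exploration is crucial for learning a successful policy: the reward is sparse and moving the cart is costly, so undirected exploration is unlikely to ever reach the upright state.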

Accompanying video for: https://arxiv.org/abs/1703.07608