Classic cartpole swing-up task, except:
Small cost for moving the cart: a reward of -0.01.
Sparse reward: +1 only when the pole is upright and steady.
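The reward described above can be sketched roughly as follows. This is only an illustration, not the implementation used for the video: the -0.01 move cost and the +1 reward come from the description, while the function name, the action encoding (-1, 0, +1 for left, no-op, right), and the thresholds for "upright and steady" are assumptions made here for the example.

```python
import math

MOVE_COST = 0.01  # from the description: small cost for moving the cart

# Assumed thresholds: the description does not define "upright and steady".
UPRIGHT_COS_MIN = 0.95    # hypothetical: pole within about 18 degrees of vertical
MAX_ANGULAR_SPEED = 1.0   # hypothetical: pole angular speed limit in rad/s


def swingup_reward(theta, theta_dot, action):
    """Return the per-step reward.

    theta:     pole angle in radians, 0 meaning upright (assumed convention)
    theta_dot: pole angular velocity in rad/s
    action:    -1 (left), 0 (no-op), +1 (right) (assumed encoding)
    """
    reward = 0.0
    if action != 0:
        # Any cart movement is penalized.
        reward -= MOVE_COST
    upright = math.cos(theta) > UPRIGHT_COS_MIN
    steady = abs(theta_dot) < MAX_ANGULAR_SPEED
    if upright and steady:
        # The only positive reward in the task.
        reward += 1.0
    return reward
```

With the pole hanging down, every action other than the no-op yields a negative reward under this sketch, so an agent that only optimizes short-term reward would tend to stop moving.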
Because the only positive reward is sparse and moving the cart is penalized, deep exploration is crucial for learning a successful policy.
Accompanying video for: https://arxiv.org/abs/1703.07608