New learning environment demo with the Unity Machine Learning Agents Toolkit.
The agent is a rolling ball that receives a reward of 1.0 for reaching the target cube. It is trained with Proximal Policy Optimization (PPO) reinforcement learning, using the hyperparameters given in the example documentation.
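For readers unfamiliar with sparse rewards, the reward rule can be sketched roughly as below. This is a minimal Python illustration, not the actual Unity agent code. The distance threshold (REACH_THRESHOLD) and the rule that falling off the platform (y < 0) ends the episode without reward are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical threshold for "reached the target cube"; not taken from this demo.
REACH_THRESHOLD = 1.42


@dataclass
class StepResult:
    reward: float
    done: bool  # True when the episode ends


def rollerball_step_reward(agent_pos, target_pos, threshold=REACH_THRESHOLD):
    """Sparse reward: 1.0 and end of episode when the ball reaches the target cube."""
    if math.dist(agent_pos, target_pos) < threshold:
        return StepResult(reward=1.0, done=True)
    # Assumption: a ball that falls below the platform (y < 0) ends the episode with no reward.
    if agent_pos[1] < 0:
        return StepResult(reward=0.0, done=True)
    # Otherwise no reward this step and the episode continues.
    return StepResult(reward=0.0, done=False)
```

Because the only positive signal is the 1.0 for reaching the cube, PPO has to discover the target by exploration first, which fits the weak early play at 10 000 and 20 000 steps.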
The video shows the agent playing after 10 000, 20 000, 50 000, 100 000 and the final 500 000 learning steps, followed by TensorBoard summary statistics at the end.
Implemented as described in https://github.com/Unity-Technologies....
Conclusion: the agent already performs quite well after only 50 000 learning steps, and very well after 100 000 steps. After the final 500 000 learning steps it seems to become somewhat too eager to reach the target cube as fast as possible.
Unity version 2020.1.1f1
Training run in an Anaconda (Miniconda) environment on Windows, with the Unity-Technologies/ml-agents GitHub repository installed for development (including TensorFlow 2.3.0)
Agent play recorded with Windows Game Capture
Edited with Windows Photos Video editor
Music: Royalty Free Music from Bensound (www.bensound.com) - Buddy