- Paper title: Continuous Control with Deep Reinforcement Learning
What problem does it solve?
This paper applies the ideas of Deep Q-Learning to the Deterministic Policy Gradient (DPG) algorithm. If you already know DPG, then this paper essentially brings DQN's machinery (experience replay and target networks) into DPG to learn its action-value function. Doing so removes a key limitation of DQN: because DQN must find the action that maximizes the action-value, it can only be applied to discrete action spaces.
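To make the contrast concrete, here are the two TD targets side by side (notation follows the paper: $Q$ is the critic, $\mu$ the deterministic actor, and primed symbols denote target networks):

$$
y_t^{\text{DQN}} = r_t + \gamma \max_{a} Q(s_{t+1}, a \mid \theta^{Q}),
\qquad
y_t^{\text{DDPG}} = r_t + \gamma \, Q'\big(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)
$$

The $\max_a$ is intractable when $a$ is continuous, so DDPG replaces it with the output of a target actor. The actor itself is trained with the deterministic policy gradient:

$$
\nabla_{\theta^{\mu}} J \approx \mathbb{E}\Big[ \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{a = \mu(s)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \Big]
$$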
Background
It is essentially the combination of these two papers: DQN contributes experience replay and target networks, while DPG contributes the deterministic actor-critic formulation for continuous actions:
- 【5-Minute Paper】Playing Atari with Deep Reinforcement Learning
- 【5-Minute Paper】Deterministic Policy Gradient Algorithms
What method does it use?
I'm very familiar with DDPG, so I don't really want to write much more about it; I'll just attach the pseudocode (Algorithm 1 in the paper):
What results does it achieve?
Using a single set of hyperparameters and network structure, DDPG robustly solves more than 20 simulated physics tasks (including cartpole swing-up, dexterous manipulation, legged locomotion, and car driving), often matching a planning algorithm with full access to the dynamics, and for many tasks it learns policies end-to-end from raw pixels; see the result figures in the paper.
Publication info? Author info?
This paper was published at ICLR 2016. The first author, Timothy P. Lillicrap, is a Research Scientist at Google DeepMind.
> Research focuses on machine learning and statistics for optimal control and decision making, as well as using these mathematical frameworks to understand how the brain learns. In recent work, I've developed new algorithms and approaches for exploiting deep neural networks in the context of reinforcement learning, and new recurrent memory architectures for one-shot learning. Applications of this work include approaches for recognizing images from a single example, visual question answering, deep learning for robotics problems, and playing games such as Go and StarCraft. I'm also fascinated by the development of deep network models that might shed light on how robust feedback control laws are learned and employed by the central nervous system.