An Intuitive Explanation of Policy Gradient


Some of today’s most successful reinforcement learning algorithms, from A3C to TRPO to PPO, belong to the policy gradient family of algorithms, and often more specifically to the actor-critic family. Clearly, as an RL enthusiast, you owe it to yourself to have a good understanding of policy gradient methods, which is why so many tutorials out there attempt to describe them.

Yet, if you’ve ever tried to follow one of these tutorials, you were probably faced with complex formulas. Amazingly, there is actually a perfectly intuitive explanation of these formulas, which I attempt to lay out here.

View full post here

If you would like to cite this blog post, you can use this template:

  title={An Intuitive Explanation of Policy Gradient},
  author={Ecoffet, Adrien},
  url = {},