An Intuitive Explanation of Policy Gradient
Some of today’s most successful reinforcement learning algorithms, from A3C to TRPO to PPO, belong to the policy gradient family of algorithms, and often more specifically to the actor-critic family. Clearly, as an RL enthusiast, you owe it to yourself to have a good understanding of policy gradient methods, which is why so many tutorials out there attempt to describe them.
Yet, if you’ve ever tried to follow one of these tutorials, you were probably faced with a series of complex formulas. Amazingly, there actually is a perfectly intuitive explanation of these formulas, which I attempt to present here.
If you would like to cite this blog post, you can use the following template:
@electronic{ecoffet2018an_intuitive_explanation_of_policy_gradient,
  title={An Intuitive Explanation of Policy Gradient},
  author={Ecoffet, Adrien},
  url={https://medium.com/towards-data-science/an-intuitive-explanation-of-policy-gradient-part-1-reinforce-aa4392cbfd3c},
  year={2018}
}