Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients
Published in ICML, 2020
In this paper, we introduce a novel form of value function, Q(s, s’), that expresses the utility of transitioning from a state s to a neighboring state s’ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at http://sites.google.com/view/qss-paper.
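For intuition, below is a minimal sketch of the two core updates the abstract describes: a Q(s, s') critic trained with a Bellman backup over state transitions, and a deterministic next-state proposal model tau(s) trained to maximize Q, analogous to the actor update in DDPG. This is not the authors' released code; the network sizes, state dimension, and training interface are assumptions, and the full D3G algorithm additionally trains forward and inverse dynamics models (to keep proposed states reachable and to recover actions for acting in the environment), which are omitted here.

import torch
import torch.nn as nn

state_dim = 4  # assumed; depends on the environment

# Q(s, s'): value of transitioning from s to s', then acting optimally.
q_net = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
# tau(s): deterministic model proposing the next state to transition to.
tau = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))

q_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
tau_opt = torch.optim.Adam(tau.parameters(), lr=1e-3)
gamma = 0.99

def update(s, s_next, r, done):
    """One gradient step on a batch of (s, s', r, done) float tensors.

    Note that no actions appear anywhere: transitions can come from any
    policy, which is what allows learning from sub-optimal or random data.
    """
    # Critic target: r + gamma * Q(s', tau(s')), with tau proposing
    # the best next state from s'.
    with torch.no_grad():
        s2 = tau(s_next)
        target = r + gamma * (1.0 - done) * q_net(torch.cat([s_next, s2], dim=-1))
    q_loss = nn.functional.mse_loss(q_net(torch.cat([s, s_next], dim=-1)), target)
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()

    # Dynamics gradient: push tau(s) toward next states with higher
    # Q(s, .), mirroring the deterministic policy gradient in DDPG.
    tau_loss = -q_net(torch.cat([s, tau(s)], dim=-1)).mean()
    tau_opt.zero_grad()
    tau_loss.backward()
    tau_opt.step()

# Example: one update on a random batch of 32 transitions.
s = torch.randn(32, state_dim)
s_next = torch.randn(32, state_dim)
r = torch.randn(32, 1)
done = torch.zeros(32, 1)
update(s, s_next, r, done)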
You can cite this work with the following BibTeX entry from the ICML 2020 proceedings (PMLR):
@InProceedings{pmlr-v119-edwards20a,
  title     = {Estimating Q(s,s’) with Deep Deterministic Dynamics Gradients},
  author    = {Edwards, Ashley and Sahni, Himanshu and Liu, Rosanne and Hung, Jane and Jain, Ankit and Wang, Rui and Ecoffet, Adrien and Miconi, Thomas and Isbell, Charles and Yosinski, Jason},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2825--2835},
  year      = {2020},
  editor    = {Hal Daumé III and Aarti Singh},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  address   = {Virtual},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/edwards20a/edwards20a.pdf},
  url       = {http://proceedings.mlr.press/v119/edwards20a.html}
}