An Improved Deep Reinforcement Learning with Sparse Rewards

Maxwell Hwang; Kao-Shing Hwang; Lu-Cheng Chi

Published Nov 12, 2020

Abstract

This paper presents an improved deep reinforcement learning method that encourages an agent to explore unvisited states in an environment with sparse rewards. The method is based on an actor-critic approach. It uses otherwise neglected background observations as the target outputs of supervised learning, giving the agent denser training signals with which to bootstrap reinforcement learning. The prediction loss from this supervised learning, called the label reward, is also fed back to the agent as an exploration signal that encourages it to visit unvisited states. Finally, the method constructs multiple neural networks that learn a policy with the Asynchronous Advantage Actor-Critic (A3C) algorithm.
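The label-reward idea can be illustrated with a minimal sketch. It is not the authors' implementation: the linear predictor, the names `LabelPredictor` and `shaped_reward`, and the weighting factor `beta` are all assumptions made for illustration, and the A3C training loop is left out. The sketch only shows that a supervised prediction loss used as a bonus shrinks for a repeatedly visited state and stays large for an unseen one.

```python
import numpy as np


class LabelPredictor:
    """Hypothetical linear model that predicts an auxiliary label from a state."""

    def __init__(self, state_dim, label_dim, lr=0.1):
        self.W = np.zeros((label_dim, state_dim))
        self.lr = lr

    def loss(self, state, label):
        # Squared prediction error; this plays the role of the label reward.
        err = self.W @ state - label
        return 0.5 * float(err @ err)

    def update(self, state, label):
        # One gradient step of the supervised objective.
        err = self.W @ state - label
        self.W -= self.lr * np.outer(err, state)


def shaped_reward(predictor, state, label, extrinsic, beta=0.5):
    """Return (total reward, label reward); beta is an assumed weighting."""
    bonus = predictor.loss(state, label)
    predictor.update(state, label)
    return extrinsic + beta * bonus, bonus


predictor = LabelPredictor(state_dim=2, label_dim=1)
seen_state = np.array([1.0, 0.0])
label = np.array([1.0])

# Visit the same state five times with zero extrinsic (sparse) reward.
bonuses = [shaped_reward(predictor, seen_state, label, 0.0)[1] for _ in range(5)]

# Label reward the agent would receive in a state it has never visited.
novel_bonus = predictor.loss(np.array([0.0, 1.0]), label)
```

In this toy setting the entries of `bonuses` decrease with each visit, while `novel_bonus` stays at its initial value, so an agent maximizing the shaped reward is pushed toward states the predictor has not yet learned.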

Keywords

Reinforcement Learning, Actor-Critic algorithm, Asynchronous Advantage Actor-Critic algorithm, Supervised Learning, Sparse Rewards

Issue

Vol 3 No 3 (2020): iRobotics Journal

Section

Articles

Creative Commons CC BY 4.0
