This paper presents an improved deep reinforcement learning method that encourages an agent to explore unvisited states in an environment with sparse rewards. The method is based on an actor-critic approach. It uses observations that are usually neglected as background as the target output of supervised learning, providing the agent with denser training signals to bootstrap reinforcement learning. Moreover, the method feeds the prediction loss from this supervised learning back to the agent as an additional reward, called the label reward, to encourage the agent to explore unvisited states. Finally, the method trains multiple neural networks in parallel to learn a policy with the Asynchronous Advantage Actor-Critic (A3C) algorithm.
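The label-reward idea described above can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes a simple linear predictor trained by supervised learning to predict the next observation, whose prediction loss is scaled and added to the sparse environment reward as an exploration bonus (all names and hyperparameters here are hypothetical).

```python
import numpy as np

class LabelRewardModule:
    """Hypothetical sketch: turn a supervised prediction loss into an
    exploration bonus ("label reward") added to the sparse env reward."""

    def __init__(self, obs_dim, lr=1e-2, scale=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Linear model predicting the next observation from the current one
        self.W = rng.normal(0.0, 0.1, (obs_dim, obs_dim))
        self.lr = lr        # supervised-learning step size
        self.scale = scale  # weight of the label reward

    def step(self, obs, next_obs, extrinsic_reward):
        pred = self.W @ obs                       # supervised prediction
        err = pred - next_obs
        loss = float(np.mean(err ** 2))           # prediction (MSE) loss
        # One SGD step on the supervised objective
        self.W -= self.lr * (2.0 / obs.size) * np.outer(err, obs)
        # High loss in rarely visited states yields a larger bonus,
        # so the shaped reward encourages exploring unvisited states
        label_reward = self.scale * loss
        return extrinsic_reward + label_reward, loss
```

On a transition the predictor has seen many times, the loss (and hence the bonus) shrinks, so the shaped reward decays back toward the sparse extrinsic reward; novel states keep a larger bonus.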
Reinforcement Learning, Actor-Critic algorithm, Asynchronous Advantage Actor-Critic algorithm, Supervised Learning, Sparse Rewards