This paper from DeepMind introduced Deep Q-Networks (DQN), the first deep learning model to learn control policies directly from raw pixel input using reinforcement learning. By combining Q-learning with a convolutional neural network and experience replay, DQN outperformed previous approaches on six of the seven Atari 2600 games tested and surpassed an expert human player on three of them, with no handcrafted features or game-specific tuning. Its impact was profound: it showed that deep learning could master tasks with high-dimensional inputs and sparse, delayed rewards, catalyzing the modern wave of deep reinforcement learning research and paving the way for later breakthroughs such as AlphaGo.
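The experience-replay mechanism mentioned above is simple to state in code. Below is a minimal, illustrative sketch, not the paper's implementation; the name `ReplayBuffer` and the default capacity are assumptions chosen for the example. The idea is that transitions are stored in a bounded buffer and training minibatches are drawn uniformly at random, which breaks the temporal correlations in the agent's stream of experience.

```python
# Illustrative experience-replay buffer (not the paper's code).
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        # Oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition (s, a, r, s', terminal flag).
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        # Uniform random minibatch, as used for the Q-learning updates.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```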
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
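To make the abstract's description concrete, here is a hedged sketch in PyTorch of the kind of convolutional Q-network and Q-learning update it describes. The layer sizes follow the architecture reported in the paper (two convolutional layers and a 256-unit hidden layer over stacked 84x84 frames), but the function names, hyperparameters, and minibatch interface are assumptions for illustration, the preprocessing pipeline is omitted, and this is not the authors' original implementation.

```python
# A minimal sketch of a DQN-style update (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a stack of 4 preprocessed 84x84 frames to one Q-value per action."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        self.fc = nn.Linear(32 * 9 * 9, 256)
        self.head = nn.Linear(256, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(start_dim=1)))
        return self.head(x)  # Q(s, a) for every action a

def td_update(q_net, optimizer, batch, gamma: float = 0.99):
    """One gradient step on the squared TD error for a sampled minibatch.

    `batch` is a list of (state, action, reward, next_state, done) tuples,
    e.g. drawn from a replay buffer; states are float tensors of shape (4, 84, 84).
    """
    states, actions, rewards, next_states, dones = zip(*batch)
    states = torch.stack(states)
    next_states = torch.stack(next_states)
    actions = torch.tensor(actions, dtype=torch.int64)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Q-learning target: y = r + gamma * max_a' Q(s', a'), no bootstrap at terminals.
    with torch.no_grad():
        y = rewards + gamma * (1.0 - dones) * q_net(next_states).max(dim=1).values

    # Q-value of the action actually taken; squared TD error as the loss.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In use, an epsilon-greedy policy would select actions from the network's output, push each transition into a replay buffer like the one sketched earlier, and call `td_update` on a freshly sampled minibatch at every step.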