
  • Zambaldi, Vinicius
  • Raposo, David
  • Santoro, Adam
  • Bapst, Victor
  • Li, Yujia
  • Babuschkin, Igor
  • Tuyls, Karl
  • Reichert, David
  • Lillicrap, Timothy
  • Lockhart, Edward
  • Shanahan, Murray
  • Langston, Victoria
  • Pascanu, Razvan
  • Botvinick, Matthew
  • Vinyals, Oriol
  • Battaglia, Peter


We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games, surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
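To make the idea of iterative self-attention over entities concrete, the following is a minimal sketch of generic scaled dot-product self-attention applied for several rounds to a set of entity vectors. It is an illustration only, not the architecture evaluated in this work: the single attention head, the residual update, the sharing of weights across rounds, the random projection matrices, and all names (`self_attention`, `relational_reasoning`, `num_iterations`) are assumptions made for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(entities, w_q, w_k, w_v):
    """One round of scaled dot-product self-attention over entity vectors.

    Returns the updated entities and the (n x n) attention weights.
    """
    q = matmul(entities, w_q)
    k = matmul(entities, w_k)
    v = matmul(entities, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)               # pairwise entity-entity scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(weights, v)
    # Assumption: a residual update, so w_v must map d dimensions to d dimensions.
    updated = [[e + a for e, a in zip(e_row, a_row)]
               for e_row, a_row in zip(entities, attended)]
    return updated, weights


def relational_reasoning(entities, w_q, w_k, w_v, num_iterations=2):
    """Apply the same attention block repeatedly (weights shared across rounds)."""
    weights = None
    for _ in range(num_iterations):
        entities, weights = self_attention(entities, w_q, w_k, w_v)
    return entities, weights


# Toy usage: 4 entities with 3 features each; random values stand in for learned ones.
random.seed(0)
dim = 3
entities = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(4)]
w_q, w_k, w_v = ([[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
                 for _ in range(3))
out, attn = relational_reasoning(entities, w_q, w_k, w_v, num_iterations=2)
```

Each row of `attn` is a probability distribution over entities, which is what makes attention weights a natural object to inspect when asking which relations a model attends to.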



