Author(s):

  • Zambaldi, Vinicius
  • Raposo, David
  • Santoro, Adam
  • Bapst, Victor
  • Li, Yujia
  • Babuschkin, Igor
  • Tuyls, Karl
  • Reichert, David
  • Lillicrap, Timothy
  • Lockhart, Edward
  • Shanahan, Murray
  • Langston, Victoria
  • Pascanu, Razvan
  • Botvinick, Matthew
  • Vinyals, Oriol
  • Battaglia, Peter

Abstract:

We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games, surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
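The relational core the abstract describes — self-attention applied over a set of entity vectors so that each entity aggregates information along its pairwise relations — can be sketched in a few lines. This is a minimal NumPy illustration of scaled dot-product self-attention over entities, not the paper's actual architecture; the weight matrices, dimensions, and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relational_block(entities, Wq, Wk, Wv):
    """One round of self-attention over entity vectors.

    entities: (N, d) array, one row per scene entity.
    Each entity attends to every entity (itself included), so the
    output mixes information along all pairwise relations at once.
    """
    q = entities @ Wq                         # queries (N, dk)
    k = entities @ Wk                         # keys    (N, dk)
    v = entities @ Wv                         # values  (N, dv)
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise interaction terms (N, N)
    attn = softmax(scores, axis=-1)           # each row is a distribution over entities
    return attn @ v                           # relation-weighted summary per entity

rng = np.random.default_rng(0)
N, d, dk = 5, 8, 16                           # 5 entities, toy dimensions
E = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, dk)) for _ in range(3))
out = relational_block(E, Wq, Wk, Wv)
print(out.shape)  # (5, 16)
```

Stacking several such blocks gives the "iterative" relational reasoning mentioned above: each pass lets information propagate one more hop across the entity graph before the result is fed to the policy head.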

Document:

https://arxiv.org/abs/1806.01830
