Author(s):

  • Alber, Maximilian
  • Bello, Irwan
  • Zoph, Barret
  • Kindermans, Pieter-Jan
  • Ramachandran, Prajit
  • Le, Quoc

Abstract:

The back-propagation algorithm is the cornerstone of deep learning. Despite its importance, few variations of the algorithm have been attempted. This work presents an approach to discover new variations of the back-propagation equation. We use a domain-specific language to describe update equations as a list of primitive functions. An evolution-based method is used to discover new propagation rules that maximize the generalization performance after a few epochs of training. We find several update equations that train faster than standard back-propagation when training time is short, and perform similarly to standard back-propagation at convergence.
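The search described in the abstract can be illustrated with a minimal sketch. The primitive set, rule length, and tournament-style regularized-evolution loop below are illustrative placeholders, not the paper's actual search space or algorithm; the `fitness` argument stands in for the validation performance of a network trained for a few epochs with the candidate rule substituted into back-propagation.

```python
import random

# A candidate propagation rule is a small program: a list of primitive
# operations applied to the backward error signal. This primitive set is
# a placeholder, not the paper's actual operand/operation vocabulary.
PRIMITIVES = {
    "identity": lambda x: x,
    "negate":   lambda x: [-v for v in x],
    "clip":     lambda x: [max(-1.0, min(1.0, v)) for v in x],
    "scale2":   lambda x: [2.0 * v for v in x],
}

def apply_rule(rule, signal):
    """Evaluate a rule (a list of primitive names) on a backward signal."""
    for name in rule:
        signal = PRIMITIVES[name](signal)
    return signal

def mutate(rule):
    """Replace one randomly chosen primitive with another."""
    child = list(rule)
    child[random.randrange(len(child))] = random.choice(list(PRIMITIVES))
    return child

def evolve(fitness, pop_size=20, steps=200, rule_len=3):
    """Toy regularized-evolution loop. fitness(rule) should return the
    validation score of a network trained briefly with that rule."""
    population = [
        [random.choice(list(PRIMITIVES)) for _ in range(rule_len)]
        for _ in range(pop_size)
    ]
    scores = [fitness(r) for r in population]
    for _ in range(steps):
        # Tournament selection: mutate the better of two random parents.
        i, j = random.sample(range(pop_size), 2)
        parent = population[i] if scores[i] >= scores[j] else population[j]
        child = mutate(parent)
        # Regularized evolution: the oldest individual is removed.
        population.pop(0); scores.pop(0)
        population.append(child); scores.append(fitness(child))
    best = max(range(pop_size), key=lambda k: scores[k])
    return population[best], scores[best]

# Toy usage with a stand-in fitness: reward rules whose output on a probe
# signal matches its clipped negation (a real fitness would train a model).
probe = [3.0, -0.5, 2.0]
target = [-1.0, 0.5, -1.0]
toy_fitness = lambda r: -sum((a - b) ** 2 for a, b in zip(apply_rule(r, probe), target))
best_rule, best_score = evolve(toy_fitness)
```

In the paper's setting, the fitness call would train a small network using the candidate rule in place of the standard gradient signal and return its validation accuracy after a few epochs; the loop structure above only sketches that outer search.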

Document:

https://arxiv.org/abs/1808.02822
