Authors:
Higham, Catherine F.
Higham, Desmond J.
Abstract:
Multilayered artificial neural networks are becoming a pervasive tool in a host of application fields. At the heart of this deep learning revolution are familiar concepts from applied and computational mathematics, notably calculus, approximation theory, optimization, and linear algebra. This article provides a very brief introduction to the basic ideas that underlie deep learning from an applied mathematics perspective. Our target audience includes postgraduate and final-year undergraduate students in mathematics who are keen to learn about the area. The article may also be useful for instructors in mathematics who wish to enliven their classes with references to the application of deep learning techniques. We focus on three fundamental questions: What is a deep neural network? How is a network trained? What is the stochastic gradient method? We illustrate the ideas with a short MATLAB code that sets up and trains a network. We also show the use of state-of-the-art software on a large-scale image classification problem. We finish with references to the current literature.
Document:
https://arxiv.org/abs/1801.05894
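Illustration: the abstract mentions a short MATLAB code that sets up and trains a network with the stochastic gradient method. The following is a minimal sketch of that idea, not the article's own listing; the data, layer sizes, learning rate, and iteration count are illustrative assumptions.

% Minimal sketch (assumed data and parameters; not the article's code):
% a 2-3-2 fully connected network with logistic sigmoid activations,
% fit to ten labelled points in the plane by the stochastic gradient
% method with a quadratic cost.
rng(5000);                                   % reproducible initialization
x = [0.1 0.3 0.1 0.6 0.4 0.6 0.5 0.9 0.4 0.7;
     0.1 0.4 0.5 0.9 0.2 0.3 0.6 0.2 0.4 0.6];   % inputs, one per column
y = [ones(1,5), zeros(1,5);
     zeros(1,5), ones(1,5)];                 % one-hot class labels
W2 = 0.5*randn(3,2); b2 = 0.5*randn(3,1);    % weights/biases, hidden layer
W3 = 0.5*randn(2,3); b3 = 0.5*randn(2,1);    % weights/biases, output layer
sig = @(z) 1./(1+exp(-z));                   % logistic sigmoid
eta = 0.05;                                  % learning rate (assumed)
for iter = 1:1e5
    k = randi(10);                           % sample one training point
    a2 = sig(W2*x(:,k) + b2);                % forward pass, hidden layer
    a3 = sig(W3*a2 + b3);                    % forward pass, output layer
    d3 = a3.*(1-a3).*(a3 - y(:,k));          % output delta (quadratic cost)
    d2 = a2.*(1-a2).*(W3'*d3);               % backpropagated hidden delta
    W3 = W3 - eta*d3*a2';  b3 = b3 - eta*d3; % stochastic gradient step
    W2 = W2 - eta*d2*x(:,k)'; b2 = b2 - eta*d2;
end

After training, sig(W3*sig(W2*p + b2) + b3) gives the network's prediction for a new point p, the larger of the two output components indicating the class.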
References:
1. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, TensorFlow: A system for large-scale machine learning, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
2. R. Al-Rfou et al., Theano: A Python Framework for Fast Computation of Mathematical Expressions, preprint, https://arxiv.org/abs/1605.02688, 2016.
3. L. Bottou, F. E. Curtis, and J. Nocedal, Optimization methods for large-scale machine learning, SIAM Rev., 60 (2018), pp. 223–311, https://doi.org/10.1137/16M1080173.
4. T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, Adversarial Patch, preprint, https://arxiv.org/abs/1712.09665, 2017.
5. P. Caramazza, A. Boccolini, D. Buschek, M. Hullin, C. F. Higham, R. Henderson, R. Murray-Smith, and D. Faccio, Neural network identification of people hidden from view with a single-pixel, single-photon detector, Sci. Rep., 8 (2018), art. 11945.
6. F. Chollet et al., Keras, GitHub, 2015.
7. R. Collobert, K. Kavukcuoglu, and C. Farabet, Torch7: A Matlab-like environment for machine learning, in BigLearn, NIPS Workshop, 2011.
8. J. H. Davenport, The debate about algorithms, Math. Today, 53 (2017), pp. 162–165.
9. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, ImageNet: A large-scale hierarchical image database, in CVPR, IEEE Computer Society, 2009, pp. 248–255.
10. R. Fletcher, Practical Methods of Optimization, 2nd ed., Wiley, Chichester, UK, 1987.
11. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016.
12. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems 27, Montreal, Canada, 2014, pp. 2672–2680.
13. A. N. Gorban and I. Y. Tyukin, Stochastic separation theorems, Neural Networks, 94 (2017), pp. 255–259.
14. A. Griewank and A. Walther, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd ed., SIAM, Philadelphia, 2008, https://doi.org/10.1137/1.9780898717761.
15. P. Grindrod, Beyond privacy and exposure: Ethical issues within citizen-facing analytics, Phil. Trans. Roy. Soc. A, 374 (2016), art. 20160132.
16. M. Hardt, B. Recht, and Y. Singer, Train faster, generalize better: Stability of stochastic gradient descent, in Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 1225–1234.
17. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, Deep learning for real-time single-pixel video, Sci. Rep., 8 (2018), art. 2369.
18. D. J. Higham, Trust region algorithms and timestep selection, SIAM J. Numer. Anal., 37 (1999), pp. 194–210, https://doi.org/10.1137/S0036142998335972.
19. D. J. Higham and N. J. Higham, MATLAB Guide, 3rd ed., SIAM, Philadelphia, 2017.
20. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, in Proceedings of the 22nd ACM International Conference on Multimedia, ACM, New York, 2014, pp. 675–678.
21. A. Krizhevsky, Learning Multiple Layers of Features from Tiny Images, Tech. rep., University of Toronto, 2009.
22. A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds., 2012, pp. 1097–1105.
23. Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521 (2015), pp. 436–444.
24. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE, 86 (1998), pp. 2278–2324.
25. S. Mallat, Understanding deep convolutional networks, Phil. Trans. Roy. Soc. A, 374 (2016), art. 20150203.
26. G. Marcus, Deep Learning: A Critical Appraisal, preprint, https://arxiv.org/abs/1801.00631, 2018.
27. M. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
28. J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, 2006.
29. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986, pp. 318–362.
30. J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, 61 (2015), pp. 85–117.
31. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature, 529 (2016), pp. 484–489.
32. J. Sirignano and K. Spiliopoulos, Stochastic gradient descent in continuous time, SIAM J. Finan. Math., 8 (2017), pp. 933–961, https://doi.org/10.1137/17M1126825.
33. J. Su, D. V. Vargas, and K. Sakurai, One pixel attack for fooling deep neural networks, IEEE Trans. Evol. Comput., to appear.
34. A. Vedaldi and K. Lenc, MatConvNet: Convolutional neural networks for MATLAB, in ACM International Conference on Multimedia, Brisbane, 2015, pp. 689–692.
35. R. Vidal, R. Giryes, J. Bruna, and S. Soatto, Mathematics of deep learning, in Proc. Conf. Decision and Control (CDC), 2017.
36. H. Wang and B. Raj, On the Origin of Deep Learning, preprint, https://arxiv.org/abs/1702.07800, 2017.
37. C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, Understanding deep learning requires rethinking generalization, in 5th International Conference on Learning Representations, 2017.