Authors:
Wang, Haohan
Raj, Bhiksha
Abstract:
This paper reviews the evolutionary history of deep learning models. It covers the genesis of neural networks, when associationist modeling of the brain was first studied, through the models that have dominated the last decade of deep learning research, such as convolutional neural networks, deep belief networks, and recurrent neural networks. In addition to reviewing these models, this paper focuses primarily on their precedents, examining how the initial ideas were assembled to construct the early models and how these preliminary models developed into their current forms. Many of these evolutionary paths span more than half a century and branch in diverse directions. For example, CNNs are built on prior knowledge of the biological vision system; DBNs evolved from a trade-off between the modeling power and the computational complexity of graphical models; and many present-day models are neural counterparts of classical linear models. This paper traces these evolutionary paths, offers a concise account of how these models were developed, and aims to provide a thorough background for deep learning. More importantly, along the way it summarizes the key insight behind each milestone and proposes directions to guide future research in deep learning.
Document:
https://arxiv.org/abs/1702.07800
References:
Emile Aarts and Jan Korst. Simulated annealing and Boltzmann machines. 1988.
David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985.
James A Anderson and Edward Rosenfeld. Talking nets: An oral history of neural networks. MIT Press, 2000.
Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014.
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
Alexander Bain. Mind and Body: The Theories of Their Relation. Henry S. King & Company, 1873.
Pierre Baldi and Peter J Sadowski. Understanding dropout. In Advances in Neural Information Processing Systems, pages 2814–2822, 2013.
Peter L Bartlett and Wolfgang Maass. Vapnik-Chervonenkis dimension of neural nets. The Handbook of Brain Theory and Neural Networks, pages 1188–1192, 2003.
Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In European Conference on Computer Vision, pages 404–417. Springer, 2006.
Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
Yoshua Bengio and Olivier Delalleau. On the expressive power of deep architectures. In International Conference on Algorithmic Learning Theory, pages 18–36. Springer, 2011.
Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
Yoshua Bengio, Nicolas L Roux, Pascal Vincent, Olivier Delalleau, and Patrice Marcotte. Convex neural networks. In Advances in Neural Information Processing Systems, pages 123–130, 2005.
Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, 19:153, 2007.
KyungHyun Cho. Understanding dropout: Training multi-layer perceptrons with auxiliary independent stochastic neurons. In International Conference on Neural Information Processing, pages 474–481. Springer, 2013.
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
Kyunghyun Cho, Aaron Courville, and Yoshua Bengio. Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11):1875–1886, 2015.
Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. In AISTATS, 2015.
Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems, pages 577–585, 2015.
Avital Cnaan, NM Laird, and Peter Slasor. Tutorial in biostatistics: Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Stat Med, 16:2349–2380, 1997.
Alberto Colorni, Marco Dorigo, Vittorio Maniezzo, et al. Distributed optimization by ant colonies. In Proceedings of the First European Conference on Artificial Life, volume 142, pages 134–142, Paris, France, 1991.
David Daniel Cox and Thomas Dean. Neural networks and neuroscience-inspired computer vision. Current Biology, 24(18):R921–R929, 2014.
G Cybenko. Continuous valued neural networks with two hidden layers are sufficient. 1988.
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
Zihang Dai, Amjad Almahairi, Philip Bachman, Eduard Hovy, and Aaron Courville. Calibrating energy-based generative adversarial networks. ICLR submission, 2017.
Yann N Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, pages 2933–2941, 2014.
Bert De Brabandere, Xu Jia, Tinne Tuytelaars, and Luc Van Gool. Dynamic filter networks. In Neural Information Processing Systems (NIPS), 2016.
Vincent De Ladurantaye, Jacques Vanden-Abeele, and Jean Rouat. Models of information processing in the visual cortex. Citeseer, 2012.
Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231, 2012.
Olivier Delalleau and Yoshua Bengio. Shallow vs. deep sum-product networks. In Advances in Neural Information Processing Systems, pages 666–674, 2011.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
Misha Denil, Loris Bazzani, Hugo Larochelle, and Nando de Freitas. Learning where to attend with deep architectures for image tracking. Neural Computation, 24(8):2151–2184, 2012.
Jean-Pierre Didier and Emmanuel Bigand. Rethinking physical and rehabilitation medicine: New technologies induce new learning strategies. Springer Science & Business Media, 2011.
Carl Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016.
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
Angela Lee Duckworth, Eli Tsukayama, and Henry May. Establishing causality using longitudinal hierarchical linear modeling: An illustration predicting achievement from self-control. Social Psychological and Personality Science, 2010.
Samuel Frederick Edwards and Phil W Anderson. Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5):965, 1975.
Ronen Eldan and Ohad Shamir. The power of depth for feedforward neural networks. arXiv preprint arXiv:1512.03965, 2015.
Jeffrey L Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11(Feb):625–660, 2010.
Scott E Fahlman and Christian Lebiere. The cascade-correlation learning architecture. 1989.
Marcus Frean. The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2(2):198–209, 1990.
Kunihiko Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, 1980.
Leon A Gatys, Alexander S Ecker, and Matthias Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
Tom Germano. Self organizing maps. Available at http://davis.wpi.edu/~matt/courses/soms, 1999.
Felix A Gers and Jürgen Schmidhuber. Recurrent nets that time and count. In Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on, volume 3, pages 189–194. IEEE, 2000.
Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
Ian Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
Marco Gori and Alberto Tesi. On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):76–86, 1992.
Céline Gravelines. Deep Learning via Stacked Sparse Autoencoders for Automated Voxel-Wise Brain Parcellation Based on Functional Connectivity. PhD thesis, The University of Western Ontario, 1991.
Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. Hybrid speech recognition with deep bidirectional LSTM. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pages 273–278. IEEE, 2013.
Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. LSTM: A search space odyssey. arXiv preprint arXiv:1503.04069, 2015.
Richard Gregory and Patrick Cavanagh. The blind spot. Scholarpedia, 6(10):9618, 2011.
Stephen Grossberg. Recurrent neural networks. Scholarpedia, 8(2):1888, 2013.
Aman Gupta, Haohan Wang, and Madhavi Ganapathiraju. Learning structure in gene expression data using deep architectures, with an application to gene clustering. In Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on, pages 1328–1335. IEEE, 2015.
Kevin Gurney. An introduction to neural networks. CRC Press, 1997.
Shyam M Guthikonda. Kohonen self-organizing maps. Wittenberg University, 2005.
Bharath Hariharan, Pablo Arbeláez, Ross Girshick, and Jitendra Malik. Hypercolumns for object segmentation and fine-grained localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 447–456, 2015.
David Hartley. Observations on Man, volume 1. Cambridge University Press, 2013.
Johan Hastad. Almost optimal lower bounds for small depth circuits. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing, pages 6–20. ACM, 1986.
James V Haxby, Elizabeth A Hoffman, and M Ida Gobbini. The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6):223–233, 2000.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. arXiv preprint arXiv:1603.05027, 2016.
Donald Olding Hebb. The organization of behavior: A neuropsychological theory. Psychology Press, 1949.
Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Neural Networks, 1989. IJCNN., International Joint Conference on, pages 593–605. IEEE, 1989.
Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal. The "wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214):1158, 1995.
Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
John J Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric Xing. Harnessing deep neural networks with logic rules. arXiv preprint arXiv:1603.06318, 2016.
David H Hubel and Torsten N Wiesel. Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148(3):574–591, 1959.
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
Michael I Jordan. Serial order: A parallel distributed processing approach. Advances in Psychology, 121:471–495, 1986.
Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo Danihelka, Oriol Vinyals, Alex Graves, and Koray Kavukcuoglu. Video pixel networks. arXiv preprint arXiv:1610.00527, 2016.
Kiyoshi Kawaguchi. A multithreaded software model for backpropagation neural network applications. 2000.
AG Khachaturyan, SV Semenovskaya, and B Vainstein. A statistical-thermodynamic approach to determination of structure amplitude phases. Sov. Phys. Crystallogr, 24:519–524, 1979.
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pages 3581–3589, 2014.
Tinne Hoff Kjeldsen. John von Neumann's conception of the minimax theorem: A journey through different mathematical contexts. Archive for History of Exact Sciences, 56(1):39–68, 2001.
Teuvo Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990.
Bryan Kolb, Ian Q Whishaw, and G Campbell Teskey. An introduction to brain and behavior, volume 1273. 2014.
Mark L Krieg. A tutorial on Bayesian belief networks. 2001.
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
Tejas D Kulkarni, William F Whitney, Pushmeet Kohli, and Josh Tenenbaum. Deep convolutional inverse graphics network. In Advances in Neural Information Processing Systems, pages 2539–2547, 2015.
Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
Alan S Lapedes and Robert M Farber. How neural nets work. In Neural Information Processing Systems, pages 442–456, 1988.
Hugo Larochelle and Geoffrey E Hinton. Learning to combine foveal glimpses with a third-order Boltzmann machine. In Advances in Neural Information Processing Systems, pages 1243–1251, 2010.
B Boser Le Cun, John S Denker, D Henderson, Richard E Howard, W Hubbard, and Lawrence D Jackel. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems. Citeseer, 1990.
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998a.
Yann LeCun, Corinna Cortes, and Christopher JC Burges. The MNIST database of handwritten digits, 1998b.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
Zhaoping Li. A neural model of contour integration in the primary visual cortex. Neural Computation, 10(4):903–940, 1998.
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
Cheng-Yuan Liou and Shiao-Lin Lin. Finite memory loading in hairy neurons. Natural Computing, 5(1):15–42, 2006.
Cheng-Yuan Liou and Shao-Kuo Yuan. Error tolerant associative memory. Biological Cybernetics, 81(4):331–342, 1999.
Christoph Lippert, Jennifer Listgarten, Ying Liu, Carl M Kadie, Robert I Davidson, and David Heckerman. Fast linear mixed models for genome-wide association studies. Nature Methods, 8(10):833–835, 2011.
Zachary C Lipton, John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019, 2015.
David G Lowe. Object recognition from local scale-invariant features. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 1150–1157. IEEE, 1999.
Xuezhe Ma and Eduard Hovy. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354, 2016.
Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yaoliang Yu, Yuntian Deng, and Eduard Hovy. Dropout with expectation-linear regularization. arXiv preprint arXiv:1609.08017, 2016.
M Maschler, Eilon Solan, and Shmuel Zamir. Game theory. Translated from the Hebrew by Ziv Hellman and edited by Mike Borns, 2013.
Charles E McCulloch and John M Neuhaus. Generalized linear mixed models. Wiley Online Library, 2001.
Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.
Marc Mézard and Jean-P Nadal. Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A: Mathematical and General, 22(12):2191, 1989.
Marc Mézard, Giorgio Parisi, and Miguel-Angel Virasoro. Spin glass theory and beyond. 1990.
Marvin L Minsky and Seymour A Papert. Perceptrons: An introduction to computational geometry. MA: MIT Press, Cambridge, 1969.
Melanie Mitchell. An introduction to genetic algorithms. MIT Press, 1998.
Tom M Mitchell et al. Machine learning. WCB, 1997.
Jeffrey Moran and Robert Desimone. Selective attention gates visual processing in the extrastriate cortex. Frontiers in Cognitive Neuroscience, 229:342–345, 1985.
Michael C Mozer. A focused back-propagation algorithm for temporal pattern recognition. Complex Systems, 3(4):349–381, 1989.
Kevin P Murphy. Machine learning: A probabilistic perspective. MIT Press, 2012.
Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.
John Nash. Non-cooperative games. Annals of Mathematics, pages 286–295, 1951.
John F Nash et al. Equilibrium points in n-person games. Proc. Nat. Acad. Sci. USA, 36(1):48–49, 1950.
Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436. IEEE, 2015.
Danh V Nguyen, Damla Şentürk, and Raymond J Carroll. Covariate-adjusted linear mixed-effects model with an application to longitudinal data. Journal of Nonparametric Statistics, 20(6):459–481, 2008.
Erkki Oja. Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15(3):267–273, 1982.
Erkki Oja and Juha Karhunen. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications, 106(1):69–84, 1985.
Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
Keiichi Osako, Rita Singh, and Bhiksha Raj. Complex recurrent neural networks for denoising speech signals. In Applications of Signal Processing to Audio and Acoustics (WASPAA), 2015 IEEE Workshop on, pages 1–5. IEEE, 2015.
Rajesh G Parekh, Jihoon Yang, and Vasant Honavar. Constructive neural network learning algorithms for multi-category real-valued pattern classification. 1997.
Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026, 2013a.
Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. ICML (3), 28:1310–1318, 2013b.
Razvan Pascanu, Yann N Dauphin, Surya Ganguli, and Yoshua Bengio. On the saddle point problem for non-convex optimization. arXiv preprint arXiv:1405.4604, 2014.
Bryan A Plummer, Liwei Wang, Chris M Cervantes, Juan C Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision, pages 2641–2649, 2015.
Tomaso Poggio and Thomas Serre. Models of visual cortex. Scholarpedia, 8(4):3516, 2013.
Christopher Poultney, Sumit Chopra, Yann L Cun, et al. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems, pages 1137–1144, 2006.
Jose C Principe, Neil R Euliano, and W Curt Lefebvre. Neural and adaptive systems: Fundamentals through simulations with CD-ROM. John Wiley & Sons, Inc., 1999.
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
Martin Riedmiller and Heinrich Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Neural Networks, 1993., IEEE International Conference On, pages 586–591. IEEE, 1993.
AJ Robinson and Frank Fallside. The utility driven dynamic error propagation network. University of Cambridge Department of Engineering, 1987.
Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Technical report, DTIC Document, 1985.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
S Rasoul Safavian and David Landgrebe. A survey of decision tree classifier methodology. 1990.
Ruslan Salakhutdinov and Geoffrey E Hinton. Deep Boltzmann machines. In AISTATS, volume 1, page 3, 2009.
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.
Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
Thomas Serre, Lior Wolf, and Tomaso Poggio. Object recognition with features inspired by visual cortex. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 2, pages 994–1000. IEEE, 2005.
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Paul Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Technical report, DTIC Document, 1986.
Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491, 2015.
Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
Amos Storkey. Increasing the capacity of a Hopfield network without sacrificing functionality. In International Conference on Artificial Neural Networks, pages 451–456. Springer, 1997.
Richard S Sutton, David A McAllester, Satinder P Singh, Yishay Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pages 1057–1063, 1999.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
Aaron van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, pages 4790–4798, 2016.
Andreas Veit, Michael J Wilber, and Serge Belongie. Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems, pages 550–558, 2016.
Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
Martin J Wainwright, Michael I Jordan, et al. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2):1–305, 2008.
Brian A Wandell. Foundations of vision. Sinauer Associates, 1995.
Haohan Wang and Jingkang Yang. Multiple confounders correction with regularized linear mixed effect models, with application in biological processes. 2016.
Haohan Wang, Aaksha Meghawat, Louis-Philippe Morency, and Eric P Xing. Select-additive learning: Improving cross-individual generalization in multimodal sentiment analysis. arXiv preprint arXiv:1609.05244, 2016.
Paul J Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339–356, 1988.
Paul J Werbos. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
Bernard Widrow et al. Adaptive "adaline" neuron using chemical "memistors". 1960.