Author(s):

Agarwal, Shivang

Terrail, Jean Ogier Du

Jurie, Frédéric

Abstract:

Object detection, the computer vision task of detecting instances of objects of a certain class (e.g., ‘car’ or ‘plane’) in images, has attracted a lot of attention from the community during the last 5 years. This strong interest can be explained not only by the importance of this task for many applications but also by the phenomenal advances in this area since the arrival of deep convolutional neural networks (DCNNs). This article comprehensively reviews the recent literature on object detection with deep CNNs and provides an in-depth view of these recent advances. The survey not only covers the typical architectures (SSD, YOLO, Faster R-CNN) but also discusses the challenges currently faced by the community and goes on to show how the problem of object detection can be extended. It also reviews the public datasets and the associated state-of-the-art algorithms.

Document:

https://arxiv.org/abs/1809.03193

References:

[1] Takuya Akiba, Shuji Suzuki, and KeisukeFukuda. Extremely large minibatch SGD:training resnet-50 on imagenet in 15 min-utes.CoRR, abs/1711.04325, 2017. URLhttp://arxiv.org/abs/1711.04325.

[2] Bogdan Alexe, Thomas Deselaers, and Vit-torio Ferrari. What is an object?InTheTwenty-Third IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2010, San Francisco, CA, USA, 13-18 June2010, pages 73–80, 2010.

[3] Bogdan Alexe, Thomas Deselaers, and Vit-torio Ferrari. Measuring the objectness ofimage windows.IEEE Transactions on Pat-tern Analysis and Machine Intelligence, 34(11):2189–2202, 2012.

[4] HassanAbuAlhaija,SivaKarthikMustikovela, Lars M. Mescheder, AndreasGeiger, and Carsten Rother.Augmentedreality meets computer vision:Efficientdata generation for urban driving scenes.International Journal of Computer Vision(IJCV), 126(9):961–972, 2018.

[5] Phil Ammirato, Patrick Poirson, EunbyungPark, Jana Kosecka, and Alexander C. Berg.A dataset for developing and benchmarkingactive vision.IEEE International Conferenceon Robotics and Automation (ICRA), cs.CV,2017.

[6] Anelia Angelova, Alex Krizhevsky, VincentVanhoucke, Abhijit S Ogale, and Dave Fer-guson. Real-time pedestrian detection withdeep network cascades. InProceedings ofthe British Machine Vision Conference 2015,BMVC 2015, Swansea, UK, September 7-10,2015, volume 2, page 4, 2015.

[7] Antreas Antoniou, Amos J. Storkey, andHarrison Edwards.Data augmentationgenerative adversarial networks.CoRR,abs/1711.04340, 2017. URLhttp://arxiv.org/abs/1711.04340.

[8] Seung-Hwan Bae, Youngwan Lee, YoungjooJo, Yuseok Bae, and Joong-won Hwang.Rank of experts: Detection network ensem-ble.CoRR, abs/1712.00185, 2017.URLhttp://arxiv.org/abs/1712.00185.

[9] Yancheng Bai, Yongqiang Zhang, MingliDing, and Bernard Ghanem. SOD-MTGAN:Small Object Detection via Multi-Task Gen-erative Adversarial Network. InComputerVision – ECCV 2018 – 15th European Con-ference, Munich, Germany, September 8 -14, 2018, page 16, 2018.

[10] AnkanBansal,KaranSikka,GauravSharma,RamaChellappa,andAjayDivakaran.Zero-shot object detection.CoRR, abs/1804.04340,2018.URLhttp://arxiv.org/abs/1804.04340.

[11] Peter W. Battaglia, Jessica B. Hamrick,Victor Bapst, Alvaro Sanchez-Gonzalez,Vin ́ıcius Flores Zambaldi, Mateusz Mali-nowski, Andrea Tacchetti, David Raposo,Adam Santoro, Ryan Faulkner, C ̧ aglarG ̈ul ̧cehre, Francis Song, Andrew J. Bal-lard, Justin Gilmer, George E. Dahl, AshishVaswani, Kelsey Allen, Charles Nash, Vic-toria Langston, Chris Dyer, Nicolas Heess,Daan Wierstra, Pushmeet Kohli, MatthewBotvinick, Oriol Vinyals, Yujia Li, and Raz-van Pascanu.Relational inductive biases,deep learning, and graph networks.CoRR,abs/1806.01261, 2018. URLhttp://arxiv.org/abs/1806.01261.

[12] LorisBazzani,AlessandroBergamo,Dragomir Anguelov,and Lorenzo Tor-resani. Self-taught object localization withdeep networks. In2016 IEEE Winter Con-ference on Applications of Computer Vision,WACV 2016, Lake Placid, NY, USA, March7-10, 2016, pages 1–9, 2016. URLhttps://doi.org/10.1109/WACV.2016.7477688.

[13] Karsten Behrendt and Libor Novak. A DeepLearning Approach to Traffic Lights: De-tection, Tracking, and Classification.In Robotics and Automation (ICRA), 2017IEEE International Conference On, 2017.

[14] Sean Bell, C. Lawrence Zitnick, Kavita Bala,and Ross Girshick. Inside-Outside Net: De-tecting Objects in Context with Skip Pool-ing and Recurrent Neural Networks. In2016IEEE Conference on Computer Vision andPattern Recognition, CVPR 2016, Las Ve-gas,NV, USA, June 27-30, 2016, 2016.

[15] Jorge Beltr ́an,Carlos Guindel,Fran-cisco Miguel Moreno,Daniel Cruzado,FernandoGarc ́ıa,andArturodelaEscalera.Birdnet:a 3d object detec-tion framework from lidar information.CoRR, abs/1805.01195,2018.URLhttp://arxiv.org/abs/1805.01195.

[16] Rodrigo Benenson, Markus Mathias, RaduTimofte, and Luc Van Gool. Pedestrian de-tection at 100 frames per second. In2012IEEE Conference on Computer Vision andPattern Recognition, Providence, RI, USA,June 16-21, 2012, pages 2903–2910, 2012.

[17] Simone Bianco, Marco Buzzelli, DavideMazzini, and Raimondo Schettini.DeepLearning for Logo Recognition.Neurocom-puting, 245:23–30, July 2017.

[18] Hakan Bilen and Andrea Vedaldi. Weakly Su-pervised Deep Detection Networks. In2016IEEE Conference on Computer Vision andPattern Recognition, CVPR 2016, Las Ve-gas,NV, USA, June 27-30, 2016, 2016.

[19] Hakan Bilen, Marco Pedersoli, and TinneTuytelaars. Weakly supervised object detec-tion with convex clustering. InIEEE Confer-ence on Computer Vision and Pattern Recog-nition, CVPR 2015, Boston, MA, USA, June7-12, 2015, June 2015.

[20] Bin Yang, Junjie Yan, Zhen Lei, and Stan Z.Li. Fine-grained evaluation on face detectionin the wild. InAutomatic Face and GestureRecognition (FG), pages 1–7, 2015.

[21] Navaneeth Bodla, Bharat Singh, Rama Chel-lappa, and Larry S Davis.Soft-nms—improving object detection with one line ofcode. InIEEE International Conference onComputer Vision, ICCV 2017, Venice, Italy,October 22-29, 2017, pages 5562–5570, 2017.

[22] Lubomir Bourdev, Subhransu Maji, ThomasBrox, and Jitendra Malik. Detecting peopleusing mutually consistent poselet activations.InComputer Vision – ECCV 2010, 11th Eu-ropean Conference on Computer Vision, Her-aklion, Crete, Greece, September 5-11, 2010,pages 168–181, 2010.

[23] Konstantinos Bousmalis, Nathan Silberman,David Dohan, Dumitru Erhan, and Dilip Kr-ishnan. Unsupervised Pixel-Level DomainAdaptation with Generative Adversarial Net-works. In2017 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2017, Honolulu, HI, USA, July 21-26, 2017,pages 95–104, 2017.

[24] Samarth Brahmbhatt, Henrik I. Chris-tensen, and James Hays. StuffNet – Using'Stuff' to Improve Object Detec-tion. InIEEE Winter Conf. on Applicationsof Computer Vision (WACV), 2017.

[25] Markus Braun, Sebastian Krebs, FabianFlohr, and Dariu M. Gavrila. The eurocitypersons dataset: A novel benchmark for ob-ject detection.CoRR, abs/1805.07193, 2018.URLhttp://arxiv.org/abs/1805.07193.[

26] Michal Busta, Lukas Neumann, and JiriMatas.Deep textspotter: An end-to-endtrainable scene text localization and recogni-tion framework. InIEEE International Con-ference on Computer Vision, ICCV 2017,Venice, Italy, October 22-29, 2017, pages2223–2231. IEEE Computer Society, 2017.

[27] Zhaowei Cai and Nuno Vasconcelos. Cas-cade R-CNN: delving into high quality ob-ject detection. InComputer Vision and Pat-tern Recognition (CVPR), 2018 IEEE Con-55ference on, pages 6154–6162, 2018.doi:10.1109/CVPR.2018.00644.

[28] Zhaowei Cai, Quanfu Fan, Rogerio S Feris,and Nuno Vasconcelos.A unified multi-scale deep convolutional neural network forfast object detection. InComputer Vision- ECCV 2016 – 14th European Conference,Amsterdam, The Netherlands, October 11-14, 2016, pages 354–370, 2016.

[29] Guimei Cao, Xuemei Xie, Wenzhe Yang,Quan Liao, Guangming Shi, and Jinjian Wu.Feature-fused SSD: fast detection for smallobjects.CoRR, abs/1709.05054, 2017. URLhttp://arxiv.org/abs/1709.05054.

[30] Joao Carreira and Cristian Sminchisescu.Constrained parametric min-cuts for auto-matic object segmentation. InThe Twenty-Third IEEE Conference on Computer Visionand Pattern Recognition, CVPR 2010, SanFrancisco, CA, USA, 13-18 June 2010, pages3241–3248, 2010.

[31] Joao Carreira and Cristian Sminchisescu.Cpmc: Automatic object segmentation us-ing constrained parametric min-cuts.IEEETransactions on Pattern Analysis and Ma-chine Intelligence, 34(7):1312–1328, 2011.

[32] Llu ́ıs Castrej ́on, Kaustav Kundu, Raquel Ur-tasun, and Sanja Fidler.Annotating ob-ject instances with a polygon-rnn. In2017IEEE Conference on Computer Vision andPattern Recognition, CVPR 2017, Honolulu,HI, USA, July 21-26, 2017, pages 4485–4493,2017. doi: 10.1109/CVPR.2017.477.

[33] Francisco M. Castro, Manuel J. Mar ́ın-Jim ́enez, Nicol ́as Guil, Cordelia Schmid, andKarteek Alahari. End-to-End IncrementalLearning. InComputer Vision – ECCV 2018- 15th European Conference, Munich, Ger-many, September 8 – 14, 2018, 2018.

[34] FlorianChabot,MohamedChaouch,Jaonary Rabarisoa, C ́eline Teuli`ere, andThierry Chateau. Deep MANTA: A coarse-to-fine many-task network for joint 2d and3d vehicle analysis from monocular image.In2017 IEEE Conference on ComputerVision and Pattern Recognition, CVPR2017, Honolulu, HI, USA, July 21-26, 2017,pages 1827–1836, 2017.

[35] Karanbir Singh Chahal and Kuntal Dey. Asurvey of modern object detection literatureusing deep learning.CoRR, 2018.

[36] Chenyi Chen, Ming-Yu Liu 0001, OncelTuzel, and Jianxiong Xiao. R-CNN for SmallObject Detection.Computer Vision – ACCV2016 – 13th Asian Conference on ComputerVision, Taipei, Taiwan, November 20-24,2016, 10115:214–230, 2016.

[37] D. Chen, G. Hua, F. Wen, and J. Sun.Supervised transformer network for efficientface detection. InComputer Vision – ECCV2016 – 14th European Conference, Amster-dam, The Netherlands, October 11-14, 2016,2016.

[38] Guang Chen, Yuanyuan Ding, Jing Xiao, andTony X Han. Detection evolution with multi-order contextual co-occurrence.In2013IEEE Conference on Computer Vision andPattern Recognition, Portland, OR, USA,June 23-28, 2013, pages 1798–1805, 2013.

[39] Hao Chen, Yali Wang, Guoyou Wang,and Yu Qiao. LSTD: A low-shot transferdetector for object detection. In Sheila A.McIlraith and Kilian Q. Weinberger, ed-itors,Proceedings of the Thirty-SecondAAAI Conference on Artificial Intelligence,New Orleans,Louisiana,USA, Febru-ary 2-7, 2018. AAAI Press, 2018.URLhttps://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16778.

[40] Kai Chen, Hang Song, Chen Change Loy,and Dahua Lin. Discover and Learn NewObjects from Documentaries. In2017 IEEE56Conference on Computer Vision and Pat-tern Recognition, CVPR 2017, Honolulu, HI,USA, July 21-26, 2017, pages 1111–1120,July 2017.

[41] Kai Chen,Jiaqi Wang,Shuo Yang,XingchengZhang,YuanjunXiong,Chen Change Loy, and Dahua Lin. Optimiz-ing video object detection via a scale-timelattice.CoRR, abs/1804.05472, 2018. URLhttp://arxiv.org/abs/1804.05472.

[42] Liang-Chieh Chen, George Papandreou, Ia-sonas Kokkinos, Kevin Murphy, and Alan L.Yuille.Deeplab:Semantic image seg-mentation with deep convolutional nets,atrous convolution, and fully connectedcrfs.IEEE Transactions on Pattern Analy-sis and Machine Intelligence, 40(4):834–848,2018.URLhttps://doi.org/10.1109/TPAMI.2017.2699184.

[43] Shang-Tse Chen, Cory Cornelius, Jason Mar-tin, and Duen Horng Chau. Robust physicaladversarial attack on faster R-CNN objectdetector.CoRR, abs/1804.05810, 2018. URLhttp://arxiv.org/abs/1804.05810.

[44] X. Chen, K. Kundu, Z. Zhang, H. Ma, andS. Fidler. Monocular 3d object detection forautonomous driving. In 2016 IEEE Confer-ence on Computer Vision and Pattern Recog-nition, CVPR 2016, Las Vegas,NV, USA,June 27-30, 2016, 2016.

[45] Xiaozhi Chen, Kaustav Kundu, Yukun Zhu,Andrew G. Berneshawi, Huimin Ma, SanjaFidler, and Raquel Urtasun.3d objectproposals for accurate object class detec-tion. In Corinna Cortes, Neil D. Lawrence,Daniel D. Lee, Masashi Sugiyama, andRoman Garnett, editors,Advances in NeuralInformation Processing Systems 28:An-nual Conference on Neural InformationProcessing Systems 2015, December 7-12,2015, Montreal, Quebec, Canada, pages424–432, 2015. URLhttp://papers.nips.cc/paper/5644-3d-object-proposals-for-accurate-object-class-detection.

[46] Xiaozhi Chen, Huimin Ma, Xiang Wang, andZhichen Zhao.Improving object propos-als with multi-thresholding straddling expan-sion. InIEEE Conference on Computer Vi-sion and Pattern Recognition, CVPR 2015,Boston, MA, USA, June 7-12, 2015, 2015.

[47] Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li,and Tian Xia. Multi-view 3d object detec-tion network for autonomous driving. In2017IEEE Conference on Computer Vision andPattern Recognition, CVPR 2017, Honolulu,HI, USA, July 21-26, 2017, pages 6526–6534.IEEE Computer Society, 2017.

[48] Xinlei Chen and Abhinav Gupta. SpatialMemory for Context Reasoning in Object De-tection. In 2017 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2017, Honolulu, HI, USA, July 21-26, 2017,2017.

[49] Yuhua Chen, Wen Li, Christos Sakaridis,Dengxin Dai, and Luc Van Gool. Domainadaptive faster R-CNN for object detectionin the wild.CoRR, abs/1803.03243, 2018.URLhttp://arxiv.org/abs/1803.03243.

[50] Yunpeng Chen, Jianan Li, Huaxin Xiao, Xi-aojie Jin, Shuicheng Yan, and Jiashi Feng.Dual path networks. InAdvances in NeuralInformation Processing Systems 30: AnnualConference on Neural Information Process-ing Systems 2017, 4-9 December 2017, LongBeach, CA, USA, pages 4467–4475, 2017.

[51] Yunpeng Chen, Jianshu Li, Bin Zhou, JiashiFeng, and Shuicheng Yan. Weaving multi-scale context for single shot detector.CoRR,abs/1712.03149, 2017. URLhttp://arxiv.org/abs/1712.03149.

[52] G. Cheng, P. Zhou, and J. Han. RIFD-CNN:Rotation-Invariant and Fisher DiscriminativeConvolutional Neural Networks for Object57Detection.In2016 IEEE Conference onComputer Vision and Pattern Recognition,CVPR 2016, Las Vegas,NV, USA, June 27-30, 2016, 2016.

[53] Gong Cheng and Junwei Han. A Survey onObject Detection in Optical Remote SensingImages.ISPRS Journal of Photogrammetryand Remote Sensing, 117:11–28, 2016.

[54] Gong Cheng, Peicheng Zhou, and JunweiHan.Learning rotation-invariant convo-lutional neural networks for object detec-tion in vhr optical remote sensing images.IEEE Transactions on Geoscience and Re-mote Sensing, 54(12):7405–7415, 2016.

[55] Jianpeng Cheng, Li Dong, and Mirella Lap-ata. Long short-term memory-networks formachine reading. InProceedings of the 2016Conference on Empirical Methods in NaturalLanguage Processing, EMNLP 2016, Austin,Texas, USA, November 1-4, 2016, pages 551–561, 2016.

[56] Ming-Ming Cheng, Ziming Zhang, Wen-YanLin, and Philip Torr.Bing:Binarizednormed gradients for objectness estimationat 300fps.In2014 IEEE Conference onComputer Vision and Pattern Recognition,CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 3286–3293, 2014.

[57] Fran ̧cois Chollet.Xception: Deep learn-ing with depthwise separable convolutions.In2017 IEEE Conference on Computer Vi-sion and Pattern Recognition, CVPR 2017,Honolulu, HI, USA, July 21-26, 2017, pages1800–1807, 2017.

[58] Marius Cordts, Mohamed Omran, SebastianRamos, Timo Rehfeld, Markus Enzweiler,Rodrigo Benenson, Uwe Franke, Stefan Roth,and Bernt Schiele. The cityscapes datasetfor semantic urban scene understanding. InProc. of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR),2016.

[59] Gabriela Csurka.A comprehensive sur-vey on domain adaptation for visual appli-cations.In Gabriela Csurka, editor,Do-main Adaptation in Computer Vision Appli-cations., Advances in Computer Vision andPattern Recognition, pages 1–35. Springer,2017.URLhttps://doi.org/10.1007/978-3-319-58347-1_1.

[60] Ekin Dogus Cubuk, Barret Zoph, DandelionMan ́e, Vijay Vasudevan, and Quoc V. Le.Autoaugment: Learning augmentation poli-cies from data.CoRR, abs/1805.09501, 2018.URLhttp://arxiv.org/abs/1805.09501.

[61] Jifeng Dai, Kaiming He, and Jian Sun.Instance-aware semantic segmentation viamulti-task network cascades. In2016 IEEEConference on Computer Vision and PatternRecognition, CVPR 2016, Las Vegas,NV,USA, June 27-30, 2016, pages 3150–3158,2016.

[62] Jifeng Dai, Yi Li, Kaiming He, and JianSun. R-fcn: Object detection via region-based fully convolutional networks. InAd-vances in Neural Information Processing Sys-tems 29: Annual Conference on Neural In-formation Processing Systems 2016, Decem-ber 5-10, 2016, Barcelona, Spain, pages 379–387, 2016.

[63] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li,Guodong Zhang, Han Hu, and Yichen Wei.Deformable convolutional networks. InIEEEInternational Conference on Computer Vi-sion, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 764–773. IEEE Computer So-ciety, 2017.

[64] Navneet Dalal and Bill Triggs. Histogramsof oriented gradients for human detection.In2005 IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition(CVPR 2005), 20-26 June 2005, San Diego,CA, USA, volume 1, pages 886–893, 2005.58

[65] Manolis Delakis and Christophe Garcia. textdetection with convolutional neural net-works.InInternational Joint Conferenceon Computer Vision, Imaging and ComputerGraphics Theory and Applications (VISAP),pages 290–294, 2008.

[66] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet:A large-scale hierarchical image database.In2009 IEEE Computer Society Conferenceon Computer Vision and Pattern Recogni-tion (CVPR 2009), 20-25 June 2009, Miami,Florida, USA, pages 248–255, 2009.

[67] Zhipeng Deng, Hao Sun, Shilin Zhou, Juan-ping Zhao, and Huanxin Zou. Toward Fastand Accurate Vehicle Detection in Aerial Im-ages Using Coupled Region-Based Convolu-tional Neural Networks.IEEE Journal of Se-lected Topics in Applied Earth Observationsand Remote Sensing, 10:3652–3664, 2017.

[68] Zhuo Deng and Longin Jan Latecki. AmodalDetection of 3D Objects:Inferring 3DBounding Boxes from 2D Ones in RGB-Depth Images. In2017 IEEE Conference onComputer Vision and Pattern Recognition,CVPR 2017, Honolulu, HI, USA, July 21-26,2017, pages 398–406, 2017.

[69] Terrance Devries and Graham W. Tay-lor. Dataset augmentation in feature space.CoRR, abs/1702.05538, 2017. URLhttp://arxiv.org/abs/1702.05538.[70] Terrance Devries and Graham W. Tay-lor.Improved regularization of convolu-tional neural networks with cutout.CoRR,abs/1708.04552, 2017. URLhttp://arxiv.org/abs/1708.04552.

[71] Piotr Dollar, Christian Wojek, Bernt Schiele,and Pietro Perona. Pedestrian detection: Anevaluation of the state of the art.IEEETransactions on Pattern Analysis and Ma-chine Intelligence, 34(4):743–761, 2012.

[72] Piotr Doll ́ar, Ron Appel, Serge J. Belongie,and Pietro Perona. Fast feature pyramids forobject detection.IEEE Transactions on Pat-tern Analysis and Machine Intelligence, 36(8):1532–1545, 2014.

[73] Xuanyi Dong, Liang Zheng, Fan Ma,Yi Yang, and Deyu Meng.Few-shot ob-ject detection.CoRR, abs/1706.08249, 2017.URLhttp://arxiv.org/abs/1706.08249.

[74] Thibaut Durand,Nicolas Thome,andMatthieu Cord. MANTRA: Minimum Maxi-mum Latent Structural SVM for Image Clas-sification and Ranking.InIEEE Inter-national Conference on Computer Vision,ICCV 2015, Santiago, Chile, December 7-13,2015, 2015.

[75] Thibaut Durand,Nicolas Thome,andMatthieu Cord. Weldon: Weakly supervisedlearning of deep convolutional neural net-works. In2016 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2016, Las Vegas,NV, USA, June 27-30, 2016,2016.

[76] Thibaut Durand, Taylor Mordan, NicolasThome, and Matthieu Cord.WILDCAT:Weakly Supervised Learning of Deep Con-vNets for Image Classification, Pointwise Lo-calization and Segmentation. In2017 IEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2017, Honolulu, HI,USA, July 21-26, 2017, 2017.

[77] Nikita Dvornik, Julien Mairal, and CordeliaSchmid. Modeling Visual Context is Keyto Augmenting Object Detection Datasets.InComputer Vision – ECCV 2018 – 15thEuropean Conference, Munich, Germany,September 8 – 14, 2018, page 18, 2018.

[78] D. Dwibedi. Synthesizing scenes for instancedetection. Master’s thesis, Carnegie MellonUniversity, 2017.59

[79] Debidatta Dwibedi, Ishan Misra, and Mar-tial Hebert. Cut, paste and learn: Surpris-ingly easy synthesis for instance detection. InIEEE International Conference on ComputerVision, ICCV 2017, Venice, Italy, October22-29, 2017, pages 1310–1319. IEEE Com-puter Society, 2017.

[80] Christian Eggert, Dan Zecha, StephanBrehm, and Rainer Lienhart.Improvingsmall object proposals for company logo de-tection. InProceedings of the 2017 ACM onInternational Conference on Multimedia Re-trieval, pages 167–174, 2017.

[81] Ian Endres and Derek Hoiem. Category inde-pendent object proposals. InComputer Vi-sion – ECCV 2010, 11th European Confer-ence on Computer Vision, Heraklion, Crete,Greece, September 5-11, 2010, pages 575–588, 2010.

[82] Ian Endres and Derek Hoiem.Category-independent object proposals with diverseranking.IEEE Transactions on PatternAnalysis and Machine Intelligence, 36(2):222–234, 2014.

[83] Martin Engelcke,Dushyant Rao,Do-minic Zeng Wang, Chi Hay Tong, and IngmarPosner. Vote3Deep: Fast Object Detectionin 3D Point Clouds Using Efficient Convolu-tional Neural Networks. InIEEE Interna-tional Conference on Robotics and Automa-tion (ICRA), 2017.

[84] Markus Enzweiler and Dariu M Gavrila.Monocular pedestrian detection: Survey andexperiments.IEEE Transactions on PatternAnalysis and Machine Intelligence, 31(12):2179–2195, 2008.[85] Markus Enzweiler and Dariu M. Gavrila.A multilevel mixture-of-experts frameworkfor pedestrian classification.IEEE Trans-actions on Image Processing, 20(10):2967–2979, 2011.

[86] Dumitru Erhan, Christian Szegedy, Alexan-der Toshev, and Dragomir Anguelov. Scal-able Object Detection Using Deep NeuralNetworks.In2014 IEEE Conference onComputer Vision and Pattern Recognition,CVPR 2014, Columbus, OH, USA, June 23-28, 2014, 2014.

[87] Andreas Ess, Bastian Leibe, and LucVan Gool. Depth and appearance for mo-bile scene analysis.InIEEE 11th Inter-national Conference on Computer Vision,ICCV 2007, Rio de Janeiro, Brazil, October14-20, 2007, pages 1–8, 2007.

[88] Mark Everingham, Luc Van Gool, Christo-pher KI Williams, John Winn, and An-drew Zisserman. The pascal visual objectclasses (voc) challenge.International Journalof Computer Vision (IJCV), 88(2):303–338,2010.

[89] Christoph Feichtenhofer, Axel Pinz, and An-drew Zisserman. Detect to track and trackto detect. In2017 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2017, Honolulu, HI, USA, July 21-26, 2017,pages 3038–3046, 2017.

[90] Pedro F. Felzenszwalb,Ross B. Gir-shick, David A. McAllester, and Deva Ra-manan.Object detection with discrimi-natively trained part-based models.IEEETransactions on Pattern Analysis and Ma-chine Intelligence, 32(9):1627–1645, 2010.

[91] Ruth C. Fong and Andrea Vedaldi.In-terpretable explanations of black boxes bymeaningful perturbation.InIEEE Inter-national Conference on Computer Vision,ICCV 2017, Venice, Italy, October 22-29,2017, pages 3449–3457. IEEE Computer So-ciety, 2017.

[92] Cheng-Yang Fu, Wei Liu, Ananth Ranga,Ambrish Tyagi, and Alexander C. Berg.DSSD : Deconvolutional single shot detec-tor.CoRR, abs/1701.06659, 2017.URLhttp://arxiv.org/abs/1701.06659.60

[93] Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xi-angyang Xue, Leonid Sigal, and ShaogangGong. Recent advances in zero-shot recogni-tion: Toward data-efficient understanding ofvisual content.IEEE Signal Processing Mag-azine, 35(1):112–125, 2018.

[94] A Gaidon, Q Wang, Y Cabon, and E Vig.Virtual worlds as proxy for multi-objecttracking analysis. In2016 IEEE Conferenceon Computer Vision and Pattern Recogni-tion, CVPR 2016, Las Vegas,NV, USA, June27-30, 2016, 2016.

[95] Mingfei Gao, Ruichi Yu, Ang Li, Vlad I.Morariu, and Larry S. Davis. Dynamic zoom-in network for fast object detection in largeimages.CoRR, abs/1711.05187, 2017. URLhttp://arxiv.org/abs/1711.05187.

[96] Christophe Garcia and Manolis Delakis. Aneural architecture for fast and robust facedetection. InPattern Recognition, 2002. Pro-ceedings. 16th International Conference on,volume 2, pages 44–47, 2002.

[97] Weifeng Ge, Sibei Yang, and Yizhou Yu.Multi-evidence filtering and fusion for multi-label classification, object detection and se-mantic segmentation based on weakly super-vised learning. InComputer Vision and Pat-tern Recognition (CVPR), 2018 IEEE Con-ference on, June 2018.

[98] Andreas Geiger, Philip Lenz, and Raquel Ur-tasun. Are we ready for autonomous driving?the kitti vision benchmark suite. In2012IEEE Conference on Computer Vision andPattern Recognition, Providence, RI, USA,June 16-21, 2012, pages 3354–3361, 2012.

[99] Georgios Georgakis, Arsalan Mousavian,Alexander C. Berg, and Jana Kosecka. Syn-thesizing training data for object detectionin indoor scenes.In Nancy M. Amato,Siddhartha S. Srinivasa, Nora Ayanian,and Scott Kuindersma, editors,Robotics:Science and Systems XIII, MassachusettsInstitute of Technology, Cambridge, Mas-sachusetts, USA, July 12-16, 2017, 2017.URLhttp://www.roboticsproceedings.org/rss13/p43.html.

[100] David Ger ́onimo, Angel Domingo Sappa, An-tonio L ́opez, and Daniel Ponsa. Adaptiveimage sampling and windows classificationfor on-board pedestrian detection. InPro-ceedings of the 5th International Conferenceon Computer Vision Systems (ICVS 2007),2007.

[101] Spyridon Gidaris and Nikos Komodakis. At-tend Refine Repeat – Active Box ProposalGeneration via In-Out Localization.InProceedings of the British Machine VisionConference 2016, BMVC 2016, York, UK,September 19-22, 2016, 2016.

[102] Spyros Gidaris and Nikos Komodakis. Ob-ject detection via a multi-region and seman-tic segmentation-aware cnn model. InIEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2015, Boston, MA,USA, June 7-12, 2015, pages 1134–1142,2015.

[103] Spyros Gidaris and Nikos Komodakis. Loc-Net: Improving Localization Accuracy forObject Detection. In2016 IEEE Conferenceon Computer Vision and Pattern Recogni-tion, CVPR 2016, Las Vegas,NV, USA, June27-30, 2016, 2016.

[104] Ross Girshick.Fast r-cnn.InIEEE In-ternational Conference on Computer Vision,ICCV 2015, Santiago, Chile, December 7-13,2015, pages 1440–1448, 2015.

[105] Ross Girshick, Jeff Donahue, Trevor Darrell,and Jitendra Malik. Rich feature hierarchiesfor accurate object detection and semanticsegmentation. In2014 IEEE Conference onComputer Vision and Pattern Recognition,CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 580–587, 2014.61

[106] Ross B. Girshick, Forrest N. Iandola, TrevorDarrell, and Jitendra Malik. Deformable partmodels are convolutional neural networks. InIEEE Conference on Computer Vision andPattern Recognition, CVPR 2015, Boston,MA, USA, June 7-12, 2015, 2015.

[107] Georgia Gkioxari and Jitendra Malik. Find-ing action tubes. InIEEE Conference onComputer Vision and Pattern Recognition,CVPR 2015, Boston, MA, USA, June 7-12,2015, pages 759–768, 2015. doi: 10.1109/CVPR.2015.7298676.

[108] Abel Gonzalez-Garcia, Davide Modolo, andVittorio Ferrari.Objects as context forpart detection.CoRR, abs/1703.09529, 2017.URLhttp://arxiv.org/abs/1703.09529.

[109] Ian Goodfellow, Jean Pouget-Abadie, MehdiMirza, Bing Xu, David Warde-Farley, Sher-jil Ozair, Aaron Courville, and Yoshua Ben-gio.Generative adversarial nets.InAd-vances in Neural Information Processing Sys-tems 27: Annual Conference on Neural In-formation Processing Systems 2014, Decem-ber 8-13 2014, Montreal, Quebec, Canada,pages 2672–2680, 2014.

[110] Ian J. Goodfellow, David Warde-Farley,Mehdi Mirza, Aaron C. Courville, andYoshua Bengio.Maxout networks.InProceedingsofthe30thInternationalConference on Machine Learning, ICML2013,Atlanta,GA, USA, 16-21 June2013,pages 1319–1327,2013.URLhttp://jmlr.org/proceedings/papers/v28/goodfellow13.html.

[111] Priya Goyal, Piotr Doll ́ar, Ross B. Girshick,Pieter Noordhuis, Lukasz Wesolowski, AapoKyrola, Andrew Tulloch, Yangqing Jia, andKaiming He.Accurate, large minibatchSGD: training imagenet in 1 hour.CoRR,abs/1706.02677, 2017. URLhttp://arxiv.org/abs/1706.02677.

[112] Ankush Gupta, Andrea Vedaldi, and An-drew Zisserman. Synthetic Data for TextLocalisation in Natural Images.In2016IEEE Conference on Computer Vision andPattern Recognition, CVPR 2016, Las Ve-gas,NV, USA, June 27-30, 2016, pages 2315–2324, June 2016.

[113] Saurabh Gupta, Bharath Hariharan, and Ji-tendra Malik.Exploring person contextand local scene context for object detection.CoRR, abs/1511.08177, 2015. URLhttp://arxiv.org/abs/1511.08177.

[114] Song Han, Huizi Mao, and William J. Dally.Deep compression: Compressing deep neuralnetwork with pruning, trained quantizationand huffman coding.CoRR, abs/1510.00149,2015. URLhttp://arxiv.org/abs/1510.00149.

[115] Wei Han,Pooya Khorrami,Tom LePaine, Prajit Ramachandran, MohammadBabaeizadeh,Honghui Shi,Jianan Li,Shuicheng Yan, and Thomas S. Huang. Seq-nms for video object detection.CoRR,abs/1602.08465, 2016. URLhttp://arxiv.org/abs/1602.08465.

[116] Kaiming He, Xiangyu Zhang, Shaoqing Ren,and Jian Sun. Spatial pyramid pooling indeep convolutional networks for visual recog-nition.IEEE Transactions on Pattern Anal-ysis and Machine Intelligence, 37(9):1904–1916, 2015.

[117] Kaiming He, Xiangyu Zhang, Shaoqing Ren,and Jian Sun. Deep residual learning for im-age recognition. In2016 IEEE Conferenceon Computer Vision and Pattern Recogni-tion, CVPR 2016, Las Vegas,NV, USA, June27-30, 2016, pages 770–778, 2016.

[118] Kaiming He, Georgia Gkioxari, Piotr Doll ́ar,and Ross Girshick. Mask r-cnn. InIEEEInternational Conference on Computer Vi-sion, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2980–2988, 2017.62

[119] Tong He, Zhi Tian, Weilin Huang, ChunhuaShen, Yu Qiao, and Changming Sun. An end-to-end textspotter with explicit alignmentand attention. InComputer Vision and Pat-tern Recognition (CVPR), 2018 IEEE Con-ference on, 2018.

[120] Geremy Heitz and Daphne Koller. Learn-ing Spatial Context – Using Stuff to FindThings. InComputer Vision – ECCV 2008,10th European Conference on Computer Vi-sion, Marseille, France, October 12-18, 2008,Berlin, Heidelberg, 2008.

[121] Paul Henderson and Vittorio Ferrari. End-to-end training of object class detectors formean average precision. InComputer Vision- ACCV 2016 – 13th Asian Conference onComputer Vision, Taipei, Taiwan, November20-24, 2016, pages 198–213, 2016.

[122] Jo ̃ao F. Henriques and Andrea Vedaldi.Warped Convolutions – Efficient Invarianceto Spatial Transformations.InternationalConference on Machine Learning (ICML),2017.

[123] Congrui Hetang, Hongwei Qin, Shaohui Liu,and Junjie Yan. Impression network for videoobject detection.CoRR, abs/1712.05896,2017. URLhttp://arxiv.org/abs/1712.05896.

[124] MichaelHimmelsbach,AndreMueller,ThorstenL ̈uttel,andHans-JoachimW ̈unsche.Lidar-based 3d object per-ception. InProceedings of 1st internationalworkshop on cognition for technical systems,volume 1, 2008.

[125] Stefan Hinterstoisser, Vincent Lepetit, PaulWohlhart, and Kurt Konolige.On pre-trained image features and synthetic imagesfor deep learning.CoRR, abs/1710.10710,2017. URLhttp://arxiv.org/abs/1710.10710.

[126] Erik Hjelm ̊as and Boon Kee Low. Face De-tection: A Survey.Computer Vision and Im-age Understanding (CVIU), 83(3):236–274,September 2001.

[127] Judy Hoffman, Sergio Guadarrama, Eric STzeng, Ronghang Hu, Jeff Donahue, RossGirshick, Trevor Darrell, and Kate Saenko.Lsda: Large scale detection through adap-tation. InAdvances in Neural InformationProcessing Systems 27: Annual Conferenceon Neural Information Processing Systems2014, December 8-13 2014, Montreal, Que-bec, Canada, pages 3536–3544, 2014.

[128] Judy Hoffman, Deepak Pathak, Trevor Dar-rell, and Kate Saenko. Detector discovery inthe wild: Joint multiple instance and repre-sentation learning. InIEEE Conference onComputer Vision and Pattern Recognition,CVPR 2015, Boston, MA, USA, June 7-12,2015, pages 2883–2891, 2015.

[129] Derek Hoiem, Yodsawalai Chodpathumwan,and Qieyun Dai. Diagnosing error in ob-ject detectors. InComputer Vision – ECCV2012 – 12th European Conference on Com-puter Vision, Florence, Italy, October 7-13,2012, pages 340–353, 2012.

[130] Jan Hosang, Rodrigo Benenson, and BerntSchiele. A convnet for non-maximum sup-pression. InGerman Conference on PatternRecognition, pages 192–204, 2016.

[131] Jan Hendrik Hosang, Rodrigo Benenson, andBernt Schiele. How good are detection pro-posals, really?. InBritish Machine VisionConference, BMVC 2014, Nottingham, UK,September 1-5, 2014, 2014.

[132] Jan Hendrik Hosang, Rodrigo Benenson, and Bernt Schiele. Learning non-maximum suppression. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 6469–6477, 2017.

[133] Sebastian Houben, Johannes Stallkamp, JanSalmen, Marc Schlipsing, and Christian Igel.Detection of traffic signs in real-world images:The German Traffic Sign Detection Bench-mark. InInternational Joint Conference onNeural Networks, number 1288, 2013.

[134] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017. URL http://arxiv.org/abs/1704.04861.

[135] Han Hu, Jiayuan Gu, Zheng Zhang, JifengDai, and Yichen Wei. Relation networks forobject detection. InComputer Vision andPattern Recognition (CVPR), 2018 IEEEConference on, June 2018.

[136] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks.CoRR, abs/1709.01507,2017. URLhttp://arxiv.org/abs/1709.01507.

[137] Peiyun Hu and Deva Ramanan. Finding tinyfaces. In2017 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2017, Honolulu, HI, USA, July 21-26, 2017,pages 1522–1530. IEEE Computer Society,2017.

[138] Gao Huang, Shichen Liu, Laurens van derMaaten, and Kilian Q. Weinberger. Con-densenet: An efficient densenet using learnedgroup convolutions.CoRR, abs/1711.09224,2017. URLhttp://arxiv.org/abs/1711.09224.

[139] Gao Huang, Zhuang Liu, Laurens VanDer Maaten, and Kilian Q Weinberger.Densely connected convolutional networks.In2017 IEEE Conference on Computer Vi-sion and Pattern Recognition, CVPR 2017,Honolulu, HI, USA, July 21-26, 2017, vol-ume 1, page 3, 2017.

[140] Jonathan Huang, Vivek Rathod, ChenSun, Menglong Zhu, Anoop Korattikara,Alireza Fathi, Ian Fischer, Zbigniew Wo-jna, Yang Song, Sergio Guadarrama, et al.Speed/accuracy trade-offs for modern con-volutional object detectors. In2017 IEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2017, Honolulu, HI,USA, July 21-26, 2017, 2017.

[141] Qiangui Huang, Shaohua Kevin Zhou, SuyaYou, and Ulrich Neumann. Learning to prunefilters in convolutional neural networks. In2018 IEEE Winter Conference on Applica-tions of Computer Vision, WACV 2018, LakeTahoe, NV, USA, March 12-15, 2018, pages709–718. IEEE Computer Society, 2018.

[142] Xun Huang, Ming-Yu Liu, Serge J. Be-longie, and Jan Kautz. Multimodal unsu-pervised image-to-image translation.CoRR,abs/1804.04732, 2018. URLhttp://arxiv.org/abs/1804.04732.

[143] Itay Hubara, Matthieu Courbariaux, DanielSoudry, Ran El-Yaniv, and Yoshua Ben-gio. Binarized neural networks. InAdvancesin Neural Information Processing Systems29: Annual Conference on Neural Informa-tion Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, pages 4107–4115,2016.

[144] Itay Hubara, Matthieu Courbariaux, DanielSoudry, Ran El-Yaniv, and Yoshua Bengio.Quantized neural networks: Training neuralnetworks with low precision weights and ac-tivations.The Journal of Machine LearningResearch, 18(1):6869–6898, 2017.

[145] Ahmad Humayun, Fuxin Li, and James M Rehg. Rigor: Reusing inference in graph cuts for generating object regions. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 336–343, 2014.

[146] Brody Huval, Adam Coates, and Andrew Y.Ng. Deep learning for class-generic objectdetection.CoRR, abs/1312.6885, 2013. URLhttp://arxiv.org/abs/1312.6885.

[147] Forrest N. Iandola, Matthew W. Moskewicz,Sergey Karayev, Ross B. Girshick, TrevorDarrell, and Kurt Keutzer. Densenet: Im-plementing efficient convnet descriptor pyra-mids.CoRR, abs/1404.1869, 2014. URLhttp://arxiv.org/abs/1404.1869.

[148] Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5mb model size. CoRR, abs/1602.07360v3, 2016. URL http://arxiv.org/abs/1602.07360v3.

[149] Hiroshi Inoue. Data augmentation by pair-ing samples for images classification.CoRR,abs/1801.02929, 2018. URLhttp://arxiv.org/abs/1801.02929.

[150] Naoto Inoue, Ryosuke Furuta, Toshihiko Ya-masaki, and Kiyoharu Aizawa. Cross-domainweakly-supervised object detection throughprogressive domain adaptation.CoRR,abs/1803.11365, 2018. URLhttp://arxiv.org/abs/1803.11365.

[151] Sergey Ioffe. Batch renormalization: Towardsreducing minibatch dependence in batch-normalized models. InAdvances in NeuralInformation Processing Systems 30: AnnualConference on Neural Information Process-ing Systems 2017, 4-9 December 2017, LongBeach, CA, USA, pages 1942–1950, 2017.URLhttp://papers.nips.cc/paper/6790-batch-renormalization-towards-reducing-minibatch-dependence-in-batch-normalized-models.

[152] Sergey Ioffe and Christian Szegedy. Batchnormalization: Accelerating deep networktraining by reducing internal covariate shift.InProceedings of the 32nd InternationalConference on Machine Learning, ICML2015, Lille, France, 6-11 July 2015, pages448–456, 2015.URLhttp://jmlr.org/proceedings/papers/v37/ioffe15.html.

[153] Max Jaderberg, Karen Simonyan, AndreaVedaldi, and Andrew Zisserman.Syn-thetic data and artificial neural networksfor natural scene text recognition.CoRR,abs/1406.2227, 2014. URLhttp://arxiv.org/abs/1406.2227.

[154] Max Jaderberg, Karen Simonyan, and An-drew Zisserman.Spatial transformer net-works. InAdvances in Neural InformationProcessing Systems 28: Annual Conferenceon Neural Information Processing Systems2015, December 7-12, 2015, Montreal, Que-bec, Canada, 2015.

[155] Vidit Jain and Erik Learned-Miller.FDDB:A Benchmark for Face Detection in Uncon-strained Settings. UM-CS-2010-009, Univer-sity of Massachusetts Amherst, 2010.

[156] Jisoo Jeong, Hyojin Park, and Nojun Kwak.Enhancement of SSD by concatenating fea-ture maps for object detection.CoRR,abs/1705.09587, 2017. URLhttp://arxiv.org/abs/1705.09587.

[157] Saurav Jha, Nikhil Agarwal, and SuneetaAgarwal.Towards improved cartoon facedetection and recognition systems.CoRR,abs/1804.01753, 2018. URLhttp://arxiv.org/abs/1804.01753.

[158] Borui Jiang, Ruixuan Luo, Jiayuan Mao,Tete Xiao, and Yuning Jiang. Acquisitionof localization confidence for accurate ob-ject detection.CoRR, abs/1807.11590, 2018.URLhttp://arxiv.org/abs/1807.11590.

[159] Yingying Jiang, Xiangyu Zhu, Xiaobing Wang, Shuli Yang, Wei Li, Hua Wang, Pei Fu, and Zhenbo Luo. R2CNN: rotational region CNN for orientation robust scene text detection. CoRR, abs/1706.09579, 2017. URL http://arxiv.org/abs/1706.09579.

[160] Alexis Joly and Olivier Buisson. Logo re-trieval with a contrario visual query expan-sion. In Wen Gao, Yong Rui, Alan Hanjalic,Changsheng Xu, Eckehard G. Steinbach, Ab-dulmotaleb El-Saddik, and Michelle X. Zhou,editors,Proceedings of the 17th InternationalConference on Multimedia 2009, Vancouver,British Columbia, Canada, October 19-24,2009, pages 581–584. ACM, 2009.

[161] Kinjal A Joshi and Darshak G Thakore.A Survey on Moving Object Detection andTracking in Video Surveillance System.In-ternational Journal of Soft Computing andEngineering (IJSCE), 2(3):5, 2012.

[162] Hongwen Kang, Martial Hebert, Alexei AEfros, and Takeo Kanade. Data-driven ob-jectness.IEEE Transactions on PatternAnalysis and Machine Intelligence, (1):189–195, 2015.

[163] Kai Kang, Wanli Ouyang, Hongsheng Li, andXiaogang Wang. Object detection from videotubelets with convolutional neural networks.In2016 IEEE Conference on Computer Vi-sion and Pattern Recognition, CVPR 2016,Las Vegas,NV, USA, June 27-30, 2016, pages817–825, 2016.

[164] Kai Kang, Hongsheng Li, Junjie Yan, XingyuZeng, Bin Yang, Tong Xiao, Cong Zhang, ZheWang, Ruohui Wang, Xiaogang Wang, andWanli Ouyang. T-CNN: Tubelets with Con-volutional Neural Networks for Object De-tection from Videos.IEEE Transactions onCircuits and Systems for Video Technology,pages 1–1, 2017.

[165] D. Karatzas, L. Gomez-Bigorda, A. Nico-laou, S. Ghosh, A. Bagdanov, M. Iwamura,J. Matas, L. Neumann, V. R. Chandrasekhar,S. Lu, F. Shafait, S. Uchida, and E. Valveny.Icdar 2015 competition on robust reading. In2015 13th International Conference on Doc-ument Analysis and Recognition (ICDAR),pages 1156–1160, Aug 2015.

[166] Tero Karras, Timo Aila, Samuli Laine,and Jaakko Lehtinen.Progressive grow-ing of GANs for improved quality, stabil-ity, and variation.InInternational Con-ference on Learning Representations, 2018.URLhttps://openreview.net/forum?id=Hk99zCeAb.

[167] Harish Katti, Marius V. Peelen, and S. P.Arun. Object detection can be improved us-ing human-derived contextual expectations.CoRR, abs/1611.07218, 2016. URLhttp://arxiv.org/abs/1611.07218.

[168] Gil Keren, Maximilian Schmitt, Thomas Kehrenberg, and Björn W. Schuller. Weakly supervised one-shot detection with attention siamese networks. CoRR, abs/1801.03329, 2018. URL http://arxiv.org/abs/1801.03329.

[169] Aditya Khosla, Tinghui Zhou, Tomasz Mal-isiewicz, Alexei A Efros, and Antonio Tor-ralba. Undoing the damage of dataset bias.InComputer Vision – ECCV 2012 – 12th Eu-ropean Conference on Computer Vision, Flo-rence, Italy, October 7-13, 2012, pages 158–171, 2012.

[170] Kye-Hyeon Kim, Yeongjae Cheon, SanghoonHong, Byung-Seok Roh, and Minje Park.PVANET: deep but lightweight neural net-works for real-time object detection.CoRR,abs/1608.08021, 2016. URLhttp://arxiv.org/abs/1608.08021.

[171] Diederik P. Kingma and Jimmy Ba. Adam:A method for stochastic optimization.CoRR,abs/1412.6980, 2014. URLhttp://arxiv.org/abs/1412.6980.

[172] Brendan F Klare, Ben Klein, Emma Taborsky, Austin Blanton, Jordan Cheney, Kristen Allen, Patrick Grother, Alan Mah, and Anil K Jain. Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1931–1939, 2015.

[173] Iasonas Kokkinos. Ubernet: Training a uni-versal convolutional neural network for low-, mid-, and high-level vision using diversedatasets and limited memory. In2017 IEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2017, Honolulu, HI,USA, July 21-26, 2017, pages 5454–5463.IEEE Computer Society, 2017.

[174] Tao Kong, Anbang Yao, Yurong Chen, andFuchun Sun. HyperNet: Towards AccurateRegion Proposal Generation and Joint Ob-ject Detection. In2016 IEEE Conference onComputer Vision and Pattern Recognition,CVPR 2016, Las Vegas,NV, USA, June 27-30, 2016, April 2016.

[175] Tao Kong, Fuchun Sun, Anbang Yao, Huap-ing Liu, Ming Lu, and Yurong Chen. RON:reverse connection with objectness prior net-works for object detection. In2017 IEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2017, Honolulu, HI,USA, July 21-26, 2017, pages 5244–5252.IEEE Computer Society, 2017.

[176] Tao Kong, Fuchun Sun, Wen-bing Huang,and Huaping Liu. Deep feature pyramid re-configuration for object detection.CoRR,abs/1808.07993, 2018. URLhttp://arxiv.org/abs/1808.07993.

[177] Martin Kostinger, Paul Wohlhart, Peter M.Roth, and Horst Bischof.Annotated Fa-cial Landmarks in the Wild: A large-scale,real-world database for facial landmark local-ization. InFirst IEEE International Work-shop on Benchmarking Facial Image AnalysisTechnologies, pages 2144–2151, 2011.

[178] Ivan Krasin, Tom Duerig, Neil Alldrin, Vittorio Ferrari, Sami Abu-El-Haija, Alina Kuznetsova, Hassan Rom, Jasper Uijlings, Stefan Popov, Shahab Kamali, Matteo Malloci, Jordi Pont-Tuset, Andreas Veit, Serge Belongie, Victor Gomes, Abhinav Gupta, Chen Sun, Gal Chechik, David Cai, Zheyun Feng, Dhyanesh Narayanan, and Kevin Murphy. Openimages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html, 2017.

[179] Ranjay Krishna, Yuke Zhu, Oliver Groth,Justin Johnson, Kenji Hata, Joshua Kravitz,Stephanie Chen, Yannis Kalantidis, Li-JiaLi, David A. Shamma, Michael S. Bernstein,and Li Fei-Fei. Visual genome: Connect-ing language and vision using crowdsourceddense image annotations.International Jour-nal of Computer Vision (IJCV), 123(1):32–73, 2017.

[180] Alex Krizhevsky.One weird trick forparallelizing convolutional neural networks.CoRR, abs/1404.5997, 2014.URLhttp://arxiv.org/abs/1404.5997.

[181] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 1106–1114, 2012. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.

[182] Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven Lake Waslander. Joint 3d proposal generation and object detection from view aggregation. CoRR, abs/1712.02294, 2017. URL http://arxiv.org/abs/1712.02294.

[183] Krishna Kumar Singh, Fanyi Xiao, and YongJae Lee. Track and transfer: Watching videosto simulate strong human supervision forweakly-supervised object detection. In2016IEEE Conference on Computer Vision andPattern Recognition, CVPR 2016, Las Ve-gas,NV, USA, June 27-30, 2016, pages 3548–3556, 2016.

[184] Weicheng Kuo, Bharath Hariharan, and Ji-tendra Malik. Deepbox: Learning objectnesswith convolutional networks. InIEEE In-ternational Conference on Computer Vision,ICCV 2015, Santiago, Chile, December 7-13,2015, pages 2479–2487, 2015.

[185] John D. Lafferty, Andrew McCallum, andFernando C. N. Pereira. Conditional ran-dom fields: Probabilistic models for segment-ing and labeling sequence data.InPro-ceedings of the Eighteenth International Con-ference on Machine Learning, ICML ’01,pages 282–289, San Francisco, CA, USA,2001. URLhttp://dl.acm.org/citation.cfm?id=645530.655813.

[186] Darius Lam, Richard Kuzma, Kevin McGee,Samuel Dooley, Michael Laielli, MatthewKlaric, Yaroslav Bulatov, and Brendan Mc-Cord. xview: Objects in context in overheadimagery.CoRR, abs/1802.07856, 2018. URLhttp://arxiv.org/abs/1802.07856.

[187] Christoph H. Lampert, Matthew B. Blaschko, and Thomas Hofmann. Beyond sliding windows: Object localization by efficient subwindow search. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 24-26 June 2008, Anchorage, Alaska, USA, 2008.

[188] Dmitry Laptev, Nikolay Savinov, Joachim M.Buhmann,and Marc Pollefeys.TI-POOLING: transformation-invariant poolingfor feature learning in convolutional neuralnetworks. In2016 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2016, Las Vegas, NV, USA, June 27-30,2016, pages 289–297. IEEE Computer Soci-ety, 2016.

[189] Hei Law and Jia Deng. Cornernet: Detectingobjects as paired keypoints. InComputer Vi-sion – ECCV 2018 – 15th European Confer-ence, Munich, Germany, September 8 – 14,2018, 2018.

[190] Yann LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. Efficient backprop. In Grégoire Montavon, Genevieve B. Orr, and Klaus-Robert Müller, editors, Neural Networks: Tricks of the Trade - Second Edition, volume 7700 of Lecture Notes in Computer Science, pages 9–48. Springer, 2012. URL https://doi.org/10.1007/978-3-642-35289-8_3.

[191] Byungjae Lee, Enkhbayar Erdenee, Song-Guo Jin, Mi Young Nam, Young Giu Jung, and Phill-Kyu Rhee. Multi-class multi-object tracking using changing point detection. In Gang Hua and Hervé Jégou, editors, Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, volume 9914 of Lecture Notes in Computer Science, pages 68–83, 2016. URL https://doi.org/10.1007/978-3-319-48881-3_6.

[192] Kyoungmin Lee, Jaeseok Choi, Jisoo Jeong,and Nojun Kwak. Residual features and uni-fied prediction network for single stage de-tection.CoRR, abs/1707.05031, 2017. URLhttp://arxiv.org/abs/1707.05031.

[193] Youngwan Lee, Huieun Kim, Eunsoo Park,Xuenan Cui, and Hakil Kim. Wide-residual-inception networks for real-time object detec-tion. InIntelligent Vehicles Symposium (IV),2017 IEEE, pages 758–764, 2017.

[194] Joseph Lemley, Shabab Bazrafkan, and Peter Corcoran. Smart augmentation learning an optimal data augmentation strategy. IEEE Access, 5:5858–5869, 2017.

[195] Bo Li. 3D Fully Convolutional Network for Vehicle Detection in Point Cloud. In IROS, 2017.

[196] Bo Li, Tianfu Wu, Shuai Shao, Lun Zhang,and Rufeng Chu. Object detection via end-to-end integration of aspect ratio and con-text aware part-based models and fully con-volutional networks.CoRR, abs/1612.00534,2016. URLhttp://arxiv.org/abs/1612.00534.

[197] Bo Li, Tianlei Zhang, and Tian Xia.Vehicle detection from 3d lidar usingfully convolutional network.In DavidHsu, Nancy M. Amato, Spring Berman,and Sam Ade Jacobs, editors,Robotics:Science and Systems XII, University ofMichigan, Ann Arbor, Michigan, USA,June 18 – June 22, 2016, 2016.URLhttp://www.roboticsproceedings.org/rss12/p42.html.

[198] Haoxiang Li, Zhe Lin, Xiaohui Shen,Jonathan Brandt, and Gang Hua. A convolu-tional neural network cascade for face detec-tion. InIEEE Conference on Computer Vi-sion and Pattern Recognition, CVPR 2015,Boston, MA, USA, June 7-12, 2015, pages5325–5334, 2015.

[199] Hongyang Li, Yu Liu, Wanli Ouyang, andXiaogang Wang. Zoom out-and-in networkwith recursive training for object proposal.CoRR, abs/1702.05711, 2017. URLhttp://arxiv.org/abs/1702.05711.

[200] Jianan Li, Xiaodan Liang, ShengMei Shen,Tingfa Xu, Jiashi Feng, and Shuicheng Yan.Scale-aware fast r-cnn for pedestrian detec-tion.IEEE Transactions on Multimedia,2017.

[201] Jianan Li, Xiaodan Liang, Yunchao Wei,Tingfa Xu, Jiashi Feng, and Shuicheng Yan.Perceptual generative adversarial networksfor small object detection. In2017 IEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2017, Honolulu, HI,USA, July 21-26, 2017, pages 1951–1959.IEEE Computer Society, 2017.

[202] Xiaofei Li, Fabian Flohr, Yue Yang, HuiXiong, Markus Braun, Shuyue Pan, KeqiangLi, and Dariu M Gavrila. A new benchmarkfor vision-based cyclist detection. InIntelli-gent Vehicles Symposium (IV), 2016 IEEE,pages 1028–1033, 2016.

[203] Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, and Yichen Wei. Fully convolutional instance-aware semantic segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 4438–4446, 2017. doi: 10.1109/CVPR.2017.472. URL https://doi.org/10.1109/CVPR.2017.472.

[204] Yikang Li, Wanli Ouyang, Bolei Zhou, KunWang, and Xiaogang Wang. Scene graph gen-eration from objects, phrases and caption re-gions.CoRR, abs/1707.09700, 2017. URLhttp://arxiv.org/abs/1707.09700.

[205] Yuxi Li, Jiuwei Li, Weiyao Lin, and JianguoLi. Tiny-DSOD: Lightweight Object Detec-tion for Resource-Restricted Usages. InPro-ceedings of the British Machine Vision Con-ference 2018, BMVC 2018, Newcastle, UK,September 3-6, 2018, July 2018.

[206] Zeming Li, Chao Peng, Gang Yu, XiangyuZhang, Yangdong Deng, and Jian Sun. Light-head R-CNN: in defense of two-stage objectdetector.CoRR, abs/1711.07264, 2017. URLhttp://arxiv.org/abs/1711.07264.

[207] Zeming Li, Yilun Chen, Gang Yu, and Yang-dong Deng. R-FCN++: Towards AccurateRegion-Based Fully Convolutional Networksfor Object Detection. InAAAI, page 8, 2018.

[208] Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, and Jian Sun. Detnet: A backbone network for object detection. CoRR, abs/1804.06215, 2018. URL http://arxiv.org/abs/1804.06215.

[209] Zhizhong Li and Derek Hoiem.Learningwithout Forgetting.IEEE Transactions onPattern Analysis and Machine Intelligence,(to appear), 2018.

[210] Zuoxin Li and Fuqiang Zhou. FSSD: featurefusion single shot multibox detector.CoRR,abs/1712.00960, 2017. URLhttp://arxiv.org/abs/1712.00960.

[211] Minghui Liao, Baoguang Shi, and Xiang Bai.Textboxes++: A single-shot oriented scenetext detector.CoRR, abs/1801.02765, 2018.URLhttp://arxiv.org/abs/1801.02765.

[212] Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-Song Xia, and Xiang Bai. Rotation-sensitiveregression for oriented scene text detection.CoRR, abs/1803.05265, 2018. URLhttp://arxiv.org/abs/1803.05265.

[213] Yuan Liao, Xiaoqing Lu, Chengcui Zhang,Yongtao Wang, and Zhi Tang. Mutual En-hancement for Detection of Multiple Logos inSports Videos. InIEEE International Con-ference on Computer Vision, ICCV 2017,Venice, Italy, October 22-29, 2017, pages4856–4865, October 2017.

[214] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision – ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, pages 740–755, 2014.

[215] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, volume 1, page 4, 2017.

[216] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2999–3007. IEEE Computer Society, 2017.

[217] Zhe Lin, Larry S. Davis, David S. Doer-mann, and Daniel DeMenthon. Hierarchi-cal part-template matching for human detec-tion and segmentation. InIEEE 11th In-ternational Conference on Computer Vision,ICCV 2007, Rio de Janeiro, Brazil, October14-20, 2007, pages 1–8, 2007.

[218] Zhouhan Lin, Matthieu Courbariaux, RolandMemisevic, and Yoshua Bengio.Neuralnetworks with few multiplications.CoRR,abs/1510.03009, 2015. URLhttp://arxiv.org/abs/1510.03009.

[219] Kang Liu and Gellert Mattyus. Fast multi-class vehicle detection on aerial images.IEEEGeoscience and Remote Sensing Letters, 12(9):1938–1942, 2015.

[220] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. Deep learning for generic object detection: A survey. CoRR, abs/1809.02165, 2018. URL https://arxiv.org/abs/1809.02165.

[221] Wei Liu, Dragomir Anguelov, Dumitru Er-han, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Sin-gle shot multibox detector. InComputer Vi-sion – ECCV 2016 – 14th European Confer-ence, Amsterdam, The Netherlands, October11-14, 2016, pages 21–37, 2016.

[222] Yuliang Liu and Lianwen Jin. Deep matching prior network: Toward tighter multi-oriented text detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 3454–3461. IEEE Computer Society, 2017.

[223] Zikun Liu, Liu Yuan, Lubin Weng, and Yip-ing Yang. A high resolution optical satelliteimage dataset for ship recognition and somenew baselines. InICPRAM, pages 324–331,2017.

[224] David G Lowe. Object recognition from localscale-invariant features. InComputer vision,1999. The proceedings of the seventh IEEEinternational conference on, volume 2, pages1150–1157, 1999.

[225] David G Lowe. Distinctive image featuresfrom scale-invariant keypoints.InternationalJournal of Computer Vision (IJCV), 60(2):91–110, 2004.

[226] Jiajun Lu, Hussein Sibai, Evan Fabry, andDavid A. Forsyth. Standard detectors aren’t(currently) fooled by physical adversarialstop signs.CoRR, abs/1710.03337, 2017.URLhttp://arxiv.org/abs/1710.03337.

[227] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang,S. Wong, and R. Young. Icdar 2003 robustreading competitions. InSeventh Interna-tional Conference on Document Analysis andRecognition, 2003. Proceedings., pages 682–687, Aug 2003.

[228] Jianqi Ma, Weiyuan Shao, Hao Ye, Li Wang,Hong Wang, Yingbin Zheng, and XiangyangXue. Arbitrary-Oriented Scene Text Detec-tion via Rotation Proposals.IEEE Transac-tions on Multimedia, pages 1–1, 2018.

[229] Santiago Manen, Matthieu Guillaumin, andLuc Van Gool. Prime object proposals withrandomized prim’s algorithm. InIEEE In-ternational Conference on Computer Vision,ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 2536–2543, 2013.

[230] Kevis-Kokitsi Maninis, Sergi Caelles, JordiPont-Tuset, and Luc Van Gool. Deep ex-treme cut: From extreme points to objectsegmentation. InComputer Vision and Pat-tern Recognition (CVPR), 2018 IEEE Con-ference on, pages 616–625. IEEE ComputerSociety, 2018.doi: 10.1109/CVPR.2018.00071.

[231] Jiayuan Mao, Tete Xiao, Yuning Jiang, andZhimin Cao. What can help pedestrian de-tection?In2017 IEEE Conference onComputer Vision and Pattern Recognition(CVPR), pages 6034–6043, 2017.

[232] V.Y. Mariano, Junghye Min, Jin-HyeongPark, R. Kasturi, D. Mihalcik, Huiping Li,D. Doermann, and T. Drayer. Performanceevaluation of object detection algorithms. InInternational Conference on Pattern Recog-nition (ICPR), volume 3, pages 965–969,2002.

[233] Oded Maron and Tomás Lozano-Pérez. A framework for multiple-instance learning. In Michael I. Jordan, Michael J. Kearns, and Sara A. Solla, editors, Advances in Neural Information Processing Systems 10, [NIPS Conference, Denver, Colorado, USA, 1997], pages 570–576. The MIT Press, 1997. URL http://papers.nips.cc/paper/1346-a-framework-for-multiple-instance-learning.

[234] Marc Masana, Joost van de Weijer, andAndrew D. Bagdanov.On-the-fly net-work pruning for object detection.CoRR,abs/1605.03477, 2016. URLhttp://arxiv.org/abs/1605.03477.

[235] Marc Masana, Joost van de Weijer, Luis Herranz, Andrew D. Bagdanov, and Jose M. Álvarez. Domain-adaptive deep network compression. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 4299–4307. IEEE Computer Society, 2017.

[236] Francisco Massa, Bryan C. Russell, and Mathieu Aubry. Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 6024–6033, 2016.

[237] Ofer Matan, Henry S. Baird, Jane Bromley,Christopher J. C. Burges, John S. Denker,Lawrence D. Jackel, Yann Le Cun, Ed-win P. D. Pednault, William D Satterfield,Charles E. Stenard, et al. Reading handwrit-ten digits: A zip code recognition system.IEEE Computer, 25(7):59–63, 1992.

[238] Brianna Maze, Jocelyn Adams, James ADuncan, Nathan Kalka, Tim Miller, CharlesOtto, Anil K Jain, W Tyler Niggel, Janet An-derson, Jordan Cheney, and Patrick Grother.IARPA Janus Benchmark – C: Face Datasetand Protocol. InICB, page 8, 2018.

[239] John McCormac, Ankur Handa, StefanLeutenegger,and Andrew J. Davison.Scenenet RGB-D: can 5m synthetic imagesbeat generic imagenet pre-training on indoorsegmentation? InIEEE International Con-ference on Computer Vision, ICCV 2017,Venice, Italy, October 22-29, 2017, pages2697–2706. IEEE Computer Society, 2017.

[240] Kazuki Minemura, Hengfui Liau, AbrahamMonrroy, and Shinpei Kato. Lmnet: Real-time multiclass object detection on CPU us-ing 3d lidar.CoRR, abs/1805.04902, 2018.URLhttp://arxiv.org/abs/1805.04902.

[241] A. Mishra, S. Nandan Rai, A. Mishra, andC. V. Jawahar. IIIT-CFW: A BenchmarkDatabase of Cartoon Faces in the Wild. InVASE ECCVW, 2016.

[242] Anand Mishra,Karteek Alahari,andCV Jawahar. Scene text recognition usinghigher order language priors.InBritishMachine Vision Conference, BMVC 2012,Surrey, UK, September 3-7, 2012, 2012.

[243] I. Misra, A. Shrivastava, and M. Hebert.Watch and learn: Semi-supervised learningof object detectors from videos. InIEEEConference on Computer Vision and Pat-tern Recognition, CVPR 2015, Boston, MA,USA, June 7-12, 2015, pages 3593–3602,June 2015.

[244] Chaitanya Mitash, Kun Wang, Kostas EBekris, and Abdeslam Boularias. Physics-aware Self-supervised Training of CNNs forObject Detection.InIEEE InternationalConference on Robotics and Automation(ICRA), 2017.

[245] T M Mitchell. Never-Ending Learning. Commun. ACM, 61(5):103–115, 2018.

[246] A. Mogelmose, M. M. Trivedi, and T. B. Moeslund. Vision-Based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey. IEEE Transactions on Intelligent Transportation Systems, 13:1484–1497, November 2012.

[247] Taylor Mordan, Nicolas Thome, MatthieuCord, and Gilles Henaff. Deformable Part-based Fully Convolutional Network for Ob-ject Detection. InProceedings of the BritishMachine Vision Conference 2017, BMVC2017, London, UK, September 4-7, 2017,2017.

[248] Taylor Mordan, Nicolas Thome, GillesHenaff, and Matthieu Cord.End-to-EndLearning of Latent Deformable Part-BasedRepresentations for Object Detection.Inter-national Journal of Computer Vision, 2018.doi: 10.1007/s11263-018-1109-z.

[249] Arsalan Mousavian, Dragomir Anguelov,John Flynn, and Jana Kosecka. 3d bound-ing box estimation using deep learning andgeometry. In2017 IEEE Conference on Com-puter Vision and Pattern Recognition, CVPR2017, Honolulu, HI, USA, July 21-26, 2017,pages 5632–5640. IEEE Computer Society,2017.

[250] Damian Mrowca, Marcus Rohrbach, Judy Hoffman, Ronghang Hu, Kate Saenko, and Trevor Darrell. Spatial semantic regularisation for large scale object detection. In IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 2003–2011, 2015.

[251] Seongkyu Mun, Sangwook Park, David KHan, and Hanseok Ko. Generative adversar-ial network based acoustic scene training setaugmentation and selection using svm hyper-plane.Proc. DCASE, pages 93–97, 2017.

[252] T Nathan Mundhenk, Goran Konjevod, We-sam A Sakla, and Kofi Boakye. A large con-textual dataset for classification, detectionand counting of cars with deep learning. InComputer Vision – ECCV 2016 – 14th Eu-ropean Conference, Amsterdam, The Nether-lands, October 11-14, 2016, pages 785–800,2016.

[253] Hajime Nada,Vishwanath A. Sindagi,He Zhang, and Vishal M. Patel.Push-ing the limits of unconstrained face detec-tion: a challenge dataset and baseline re-sults.CoRR, abs/1804.10275, 2018. URLhttp://arxiv.org/abs/1804.10275.

[254] Mahyar Najibi, Mohammad Rastegari, andLarry S. Davis. G-CNN: An Iterative GridBased Object Detector. In2016 IEEE Con-ference on Computer Vision and PatternRecognition, CVPR 2016, Las Vegas,NV,USA, June 27-30, 2016, 2016.

[255] Mahyar Najibi, Pouya Samangouei, RamaChellappa, and Larry Davis. SSH: SingleStage Headless Face Detector. InIEEE In-ternational Conference on Computer Vision,ICCV 2017, Venice, Italy, October 22-29,2017, 2017.

[256] Alejandro Newell, Kaiyu Yang, and Jia Deng.Stacked hourglass networks for human poseestimation. InComputer Vision – ECCV2016 – 14th European Conference, Amster-dam, The Netherlands, October 11-14, 2016,pages 483–499, 2016.

[257] Mathias Niepert, Mohamed Ahmed, andKonstantin Kutzkov. Learning convolutionalneural networks for graphs. InInternationalconference on machine learning, pages 2014–2023, 2016.

[258] Steven J Nowlan and John C Platt. A convo-lutional neural network hand tracker. InAd-vances in Neural Information Processing Sys-tems 8, NIPS, Denver, CO, USA, November27-30, 1995, pages 901–908, 1995.

[259] Jean Ogier Du Terrail and Frédéric Jurie. ON THE USE OF DEEP NEURAL NETWORKS FOR THE DETECTION OF SMALL VEHICLES IN ORTHO-IMAGES. In IEEE International Conference on Image Processing, Beijing, China, September 2017. URL https://hal.archives-ouvertes.fr/hal-01527906.

[260] Kemal Oksuz, Baris Can Cam, Emre Ak-bas, and Sinan Kalkan. Localization RecallPrecision (LRP): A New Performance Metricfor Object Detection. InComputer Vision- ECCV 2018 – 15th European Conference,Munich, Germany, September 8 – 14, 2018,July 2018.

[261] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Weakly supervised object recognition with convolutional neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, 2014.

[262] Maxime Oquab, Léon Bottou, Ivan Laptev, and Josef Sivic. Is object localization for free? – weakly-supervised learning with convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 685–694, 2015.

[263] Margarita Osadchy, Yann Le Cun, and Matthew L Miller. Synergistic face detection and pose estimation with energy-based models. Journal of Machine Learning Research, 8(May):1197–1215, 2007.

[264] W. Ouyang, X. Wang, and C. Zhang. Factors in finetuning deep model for object detection with long-tail distribution. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016.

[265] Wanli Ouyang and Xiaogang Wang. Joint deep learning for pedestrian detection. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, 2013.

[266] Wanli Ouyang and Xiaogang Wang. Single-pedestrian detection aided by multi-pedestrian detection. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23-28, 2013, pages 3198–3205, 2013.

[267] Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, and Xiaoou Tang. DeepID-Net: Deformable deep convolutional neural networks for object detection. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, 2015.

[268] Wanli Ouyang, Ku Wang, Xin Zhu, and Xiaogang Wang. Learning chained deep features and classifiers for cascade in object detection. CoRR, abs/1702.07054, 2017. URL http://arxiv.org/abs/1702.07054.

[269] Xi Ouyang, Yu Cheng, Yifan Jiang, Chun-Liang Li, and Pan Zhou. Pedestrian-synthesis-gan: Generating pedestrian data in real scene and beyond. CoRR, abs/1804.02047, 2018. URL http://arxiv.org/abs/1804.02047.

[270] Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, and Vittorio Ferrari. We don’t need no bounding-boxes: Training object class detectors using only human verification. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, February 2016.

[271] Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, and Vittorio Ferrari. Training object class detectors with click supervision. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 180–189. IEEE Computer Society, 2017.

[272] Constantine Papageorgiou and Tomaso Poggio. A trainable system for object detection. International Journal of Computer Vision (IJCV), 38(1):15–33, 2000.

[273] Bo Peng, Wenming Tan, Zheyang Li, Shun Zhang, Di Xie, and Shiliang Pu. Extreme network compression via filter group approximation. CoRR, abs/1807.11254, 2018. URL http://arxiv.org/abs/1807.11254.

[274] Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, and Jian Sun. Megdet: A large mini-batch object detector. CoRR, abs/1711.07240, 2017. URL http://arxiv.org/abs/1711.07240.

[275] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, and Jian Sun. Large kernel matters - improve semantic segmentation by global convolutional network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 1743–1751, 2017.

[276] Xingchao Peng and Kate Saenko. Synthetic to real adaptation with generative correlation alignment networks. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, March 12-15, 2018, pages 1982–1991. IEEE Computer Society, 2018.

[277] Xingchao Peng, Baochen Sun, Karim Ali, and Kate Saenko. Learning Deep Object Detectors from 3D Models. In IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, 2015.

[278] Alex Pentland, Baback Moghaddam, and Thad Starner. View-based and modular eigenspaces for face recognition. In Conference on Computer Vision and Pattern Recognition, CVPR 1994, 21-23 June, 1994, Seattle, WA, USA, pages 84–91, 1994.

[279] Bojan Pepik, Rodrigo Benenson, Tobias Ritschel, and Bernt Schiele. What is holding back convnets for detection? In German Conference on Pattern Recognition, pages 517–528, 2015.

[280] Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. CoRR, abs/1712.04621, 2017. URL http://arxiv.org/abs/1712.04621.

[281] Phuoc Pham, Duy Nguyen, Tien Do, Thanh Duc Ngo, and Duy-Dinh Le. Evaluation of Deep Models for Real-Time Small Object Detection. ICONIP, 10636:516–526, 2017.

[282] Pedro H. O. Pinheiro, Ronan Collobert, and Piotr Dollár. Learning to segment object candidates. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 1990–1998, 2015. URL http://papers.nips.cc/paper/5852-learning-to-segment-object-candidates.

[283] Pedro O. Pinheiro and Ronan Collobert. From Image-level to Pixel-level Labeling with Convolutional Networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015.

[284] Pedro O Pinheiro, Tsung-Yi Lin, Ronan Collobert, and Piotr Dollár. Learning to refine object segments. In Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, pages 75–91, 2016.

[285] Alex D. Pon, Oles Andrienko, Ali Harakeh, and Steven L. Waslander. A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection. In IEEE Conference on Computer and Robot Vision, June 2018.

[286] Jordi Pont-Tuset, Pablo Arbelaez, Jonathan T Barron, Ferran Marques, and Jitendra Malik. Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(1):128–140, 2017.

[287] Fatih Murat Porikli. Integral histogram: A fast way to extract histograms in cartesian spaces. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA, pages 829–836, 2005.

[288] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, July 2017.

[289] Charles Ruizhongtai Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. Frustum pointnets for 3d object detection from RGB-D data. CoRR, abs/1711.08488, 2017.

[290] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 5105–5114, 2017. URL http://papers.nips.cc/paper/7095-pointnet-deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space.

[291] Weichao Qiu and Alan L. Yuille. Unrealcv: Connecting computer vision to unreal engine. In Gang Hua and Hervé Jégou, editors, Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, volume 9915 of Lecture Notes in Computer Science, pages 909–916, 2016. URL https://doi.org/10.1007/978-3-319-49409-8_75.

[292] Shafin Rahman, Salman Hameed Khan, and Fatih Porikli. Zero-shot object detection: Learning to simultaneously recognize and localize novel concepts. CoRR, abs/1803.06049, 2018. URL http://arxiv.org/abs/1803.06049.

[293] Esa Rahtu, Juho Kannala, and Matthew Blaschko. Learning a category independent object detection cascade. In IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011, pages 1052–1059, 2011.

[294] Anant Raj, Vinay P. Namboodiri, and Tinne Tuytelaars. Subspace Alignment Based Domain Adaptation for RCNN Detector. In Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK, September 7-10, 2015, pages 166.1–166.11, Swansea, 2015.

[295] Rakesh N. Rajaram, Eshed Ohn-Bar, and Mohan M. Trivedi. RefineNet: Iterative refinement for accurate object localization. In IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 1528–1533, November 2016.

[296] Param S. Rajpura, Ravi S. Hegde, and Hristo Bojinov. Object detection using deep cnns trained on synthetic images. CoRR, abs/1706.06782, 2017. URL http://arxiv.org/abs/1706.06782.

[297] Rajeev Ranjan, Vishal M. Patel, and Rama Chellappa. A deep pyramid deformable part model for face detection. In IEEE 7th International Conference on Biometrics Theory, Applications and Systems, BTAS 2015, Arlington, VA, USA, September 8-11, 2015, pages 1–8. IEEE, 2015.

[298] Pekka Rantalankila, Juho Kannala, and Esa Rahtu. Generating object segmentation proposals using global and local search. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, June 23-28, 2014, pages 2417–2424, 2014.

[299] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. Xnor-net: Imagenet classification using binary convolutional neural networks. In Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, pages 525–542, 2016.

[300] Alexander J Ratner, Henry Ehrenberg, Zeshan Hussain, Jared Dunnmon, and Christopher Ré. Learning to compose domain-specific transformations for data augmentation. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 3236–3246, 2017.

[301] Kumar S. Ray, Vijayan K. Asari, and Soma Chakraborty. Object detection by spatio-temporal analysis and tracking of the detected objects in a video with variable background. CoRR, abs/1705.02949, 2017. URL http://arxiv.org/abs/1705.02949.

[302] Sébastien Razakarivony and Frédéric Jurie. Vehicle detection in aerial imagery: A small target detection benchmark. Journal of Visual Communication and Image Representation, 34:187–203, 2016.

[303] Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, and Vincent Vanhoucke. Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 7464–7473. IEEE Computer Society, 2017.

[304] Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. On the convergence of adam and beyond. In International Conference on Learning Representations (ICLR), 2018.

[305] Joseph Redmon and Anelia Angelova. Real-time grasp detection using convolutional neural networks. In IEEE International Conference on Robotics and Automation (ICRA), 2015.

[306] Joseph Redmon and Ali Farhadi. YOLO9000: better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 6517–6525. IEEE Computer Society, 2017.

[307] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. CoRR, abs/1804.02767, 2018. URL http://arxiv.org/abs/1804.02767.

[308] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 779–788, 2016.

[309] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, pages 91–99, 2015.

[310] Shaoqing Ren, Kaiming He, Ross B. Girshick, Xiangyu Zhang, and Jian Sun. Object detection networks on convolutional feature maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(7):1476–1481, 2017. URL https://doi.org/10.1109/TPAMI.2016.2601099.

[311] M. Rochan and Yang Wang. Weakly supervised localization of novel objects using appearance transfer. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015.

[312] Mikel Rodriguez, Ivan Laptev, Josef Sivic, and Jean-Yves Audibert. Density-aware person detection and tracking in crowds. In IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011, pages 2423–2430, 2011.

[313] Stefan Romberg, Lluis Garcia Pueyo, Rainer Lienhart, and Roelof Van Zwol. Scalable logo recognition in real-world images. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, page 25, 2011.

[314] Amir Rosenfeld, Richard Zemel, and John K. Tsotsos. The elephant in the room. CoRR, abs/1808.03305, 2018. URL http://arxiv.org/abs/1808.03305.

[315] Rasmus Rothe, Matthieu Guillaumin, and Luc Van Gool. Non-maximum suppression for object detection by passing messages between windows. In Computer Vision – ACCV 2014 – 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, pages 290–306, 2014.

[316] Soumya Roy, Vinay P. Namboodiri, and Arijit Biswas. Active learning with version spaces for object detection. CoRR, abs/1611.07285, 2016. URL http://arxiv.org/abs/1611.07285.

[317] Sitapa Rujikietgumjorn and Robert T Collins. Optimized pedestrian detection for multiple and occluded people. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23-28, 2013, pages 3690–3697, 2013.

[318] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985.

[319] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

[320] Payam Sabzmeydani and Greg Mori. Detecting pedestrians by learning shapelet features. In 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 18-23 June 2007, Minneapolis, Minnesota, USA, 2007.

[321] Mohammad Amin Sadeghi and Ali Farhadi. Recognition using visual phrases. In The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, pages 1745–1752, 2011.

[322] Mohammad Amin Sadeghi and David A. Forsyth. 30hz object detection with DPM V5. In David J. Fleet, Tomás Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, volume 8689 of Lecture Notes in Computer Science, pages 65–79. Springer, 2014. URL https://doi.org/10.1007/978-3-319-10590-1_5.

[323] Wesam A. Sakla, Goran Konjevod, and T. Nathan Mundhenk. Deep multi-modal vehicle detection in aerial ISR imagery. In 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA, March 24-31, 2017, pages 916–923. IEEE, 2017.

[324] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, pages 4510–4520, 2018.

[325] P. A. Savalle and S. Tsogkas. Deformable part models with cnn features. In SAICSIT Conf., 2014.

[326] Henry Schneiderman and Takeo Kanade. Object detection using the statistics of parts. International Journal of Computer Vision (IJCV), 56(3):151–177, 2004.

[327] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. CoRR, abs/1312.6229, 2013. URL http://arxiv.org/abs/1312.6229.

[328] Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, and Yann LeCun. Pedestrian detection with unsupervised multi-stage feature learning. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23-28, 2013, pages 3626–3633, 2013.

[329] Mohammad Javad Shafiee, Brendan Chywl, Francis Li, and Alexander Wong. Fast YOLO: A fast you only look once system for real-time embedded object detection in video. CoRR, abs/1709.05943, 2017. URL http://arxiv.org/abs/1709.05943.

[330] Yunhan Shen, Rongrong Ji, Shengchuan Zhang, Wangmeng Zuo, and Yan Wang. Generative adversarial learning towards fast weakly supervised detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[331] Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, and Xiangyang Xue. Dsod: Learning deeply supervised object detectors from scratch. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, volume 3, page 7, 2017.

[332] Zhiqiang Shen, Honghui Shi, Rogério Schmidt Feris, Liangliang Cao, Shuicheng Yan, Ding Liu, Xinchao Wang, Xiangyang Xue, and Thomas S. Huang. Learning object detectors from scratch with gated recurrent feature pyramids. CoRR, abs/1712.00886, 2017. URL http://arxiv.org/abs/1712.00886.

[333] Baoguang Shi, Xiang Bai, and Serge J. Belongie. Detecting oriented text in natural images by linking segments. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 3482–3490. IEEE Computer Society, 2017.

[334] Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge J. Belongie, Shijian Lu, and Xiang Bai. ICDAR2017 competition on reading chinese text in the wild (RCTW-17). CoRR, abs/1708.09585, 2017. URL http://arxiv.org/abs/1708.09585.

[335] Xuepeng Shi, Shiguang Shan, Meina Kan, Shuzhe Wu, and Xilin Chen. Real-time rotation-invariant face detection with progressive calibration networks. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[336] Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. Incremental learning of object detectors without catastrophic forgetting. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 3420–3429, 2017.

[337] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 761–769, 2016.

[338] Abhinav Shrivastava, Rahul Sukthankar, Jitendra Malik, and Abhinav Gupta. Beyond skip connections: Top-down modulation for object detection. CoRR, abs/1612.06851, 2016. URL http://arxiv.org/abs/1612.06851.

[339] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Joshua Susskind, Wenda Wang, and Russell Webb. Learning from Simulated and Unsupervised Images through Adversarial Training. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 2242–2251, 2017.

[340] Shai Silberstein, Dan Levi, Victoria Kogan, and Ran Gazit. Vision-based pedestrian detection for rear-view cameras. In Intelligent Vehicles Symposium Proceedings, 2014 IEEE, pages 853–860, 2014.

[341] Daniel L Silver, Qiang Yang, and Lianghao Li. Lifelong Machine Learning Systems: Beyond Learning Algorithms. In 2013 AAAI Spring Symposium, page 7, 2013.

[342] Martin Simon, Stefan Milz, Karl Amende, and Horst-Michael Gross. Complex-yolo: Real-time 3d object detection on point clouds. CoRR, abs/1803.06199, 2018. URL http://arxiv.org/abs/1803.06199.

[343] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014. URL http://arxiv.org/abs/1409.1556.

[344] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. CoRR, abs/1312.6034, 2013. URL http://arxiv.org/abs/1312.6034.

[345] Bharat Singh and Larry S Davis. An analysis of scale invariance in object detection - snip. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, 2018.

[346] Bharat Singh, Hengduo Li, Abhishek Sharma, and Larry S. Davis. R-FCN-3000 at 30fps: Decoupling detection and classification. CoRR, abs/1712.01802, 2017. URL http://arxiv.org/abs/1712.01802.

[347] Bharat Singh, Mahyar Najibi, and Larry S. Davis. SNIPER: efficient multi-scale training. CoRR, abs/1805.09300, 2018. URL http://arxiv.org/abs/1805.09300.

[348] Leon Sixt, Benjamin Wild, and Tim Landgraf. Rendergan: Generating realistic labeled data. Front. Robotics and AI, 2018, 2018.

[349] Arnold W M Smeulders, Amarnath Gupta, and Ramesh Jain. Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):32, 2000.

[350] Lars W. Sommer, Tobias Schuchert, Jurgen Beyerer, Firooz A. Sadjadi, and Abhijit Mahalanobis. Deep learning based multi-category object detection in aerial images. In SPIE Defense + Security, May 2017.

[351] Lars Wilko Sommer, Tobias Schuchert, and Jürgen Beyerer. Fast deep vehicle detection in aerial images. In 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA, March 24-31, 2017, pages 311–319. IEEE, 2017.

[352] Lars Wilko Sommer, Arne Schumann, Tobias Schuchert, and Jürgen Beyerer. Multi feature deconvolutional faster R-CNN for precise vehicle detection in aerial imagery. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, March 12-15, 2018, pages 635–642. IEEE Computer Society, 2018.

[353] Hyun Oh Song, Ross B. Girshick, Stefanie Jegelka, Julien Mairal, Zaïd Harchaoui, and Trevor Darrell. On learning to localize objects with minimal supervision. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, volume 32 of JMLR Workshop and Conference Proceedings, pages 1611–1619. JMLR.org, 2014. URL http://jmlr.org/proceedings/papers/v32/songb14.html.

[354] Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, and Trevor Darrell. Weakly-supervised discovery of visual pattern configurations. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 1637–1645, 2014.

[355] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. Striving for simplicity: The all convolutional net. CoRR, abs/1412.6806, 2014. URL http://arxiv.org/abs/1412.6806.

[356] Siddharth Srivastava, Gaurav Sharma, and Brejesh Lall. Large scale novel object discovery in 3d. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, March 12-15, 2018, pages 179–188. IEEE Computer Society, 2018.

[357] Russell Stewart, Mykhaylo Andriluka, and Andrew Y Ng. End-to-end people detection in crowded scenes. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 2325–2333, 2016.

[358] Hang Su, Shaogang Gong, and Xiatian Zhu. WebLogo-2M: Scalable Logo Detection by Deep Learning from the Web. In ICCV Workshops, pages 270–279, October 2017.

[359] Hang Su, Xiatian Zhu, and Shaogang Gong. Deep Learning Logo Detection with Data Expansion by Synthesising Context. IEEE Winter Conf. on Applications of Computer Vision (WACV), pages 530–539, 2017.

[360] Hang Su, Xiatian Zhu, and Shaogang Gong. Open Logo Detection Challenge. In Proceedings of the British Machine Vision Conference 2018, BMVC 2018, Newcastle, UK, September 3-6, 2018, 2018.

[361] Baochen Sun and Kate Saenko. From virtual to reality: Fast adaptation of virtual object detectors to real domains. In British Machine Vision Conference, BMVC 2014, Nottingham, UK, September 1-5, 2014, volume 1, page 3, 2014.

[362] Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia, and Lubomir Bourdev. ProNet: Learning to Propose Object-Specific Boxes for Cascaded Neural Networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016.

[363] Christian Szegedy, Scott E. Reed, Dumitru Erhan, and Dragomir Anguelov. Scalable, high-quality object detection. CoRR, abs/1412.1441, 2014. URL http://arxiv.org/abs/1412.1441.

[364] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, et al. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1–9, 2015.

[365] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 2818–2826. IEEE Computer Society, 2016.

[366] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI, volume 4, page 12, 2017.

[367] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V. Le. Mnasnet: Platform-aware neural architecture search for mobile. CoRR, abs/1807.11626, 2018. URL http://arxiv.org/abs/1807.11626.

[368] Kevin D. Tang, Vignesh Ramanathan, Fei-Fei Li, and Daphne Koller. Shifting Weights: Adapting Object Detectors from Image to Video. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, 2012.

[369] Peng Tang, Xinggang Wang, Xiang Bai, and Wenyu Liu. Multiple instance detection network with online instance classifier refinement. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, 2017.

[370] Siyu Tang, Mykhaylo Andriluka, and Bernt Schiele. Detection and tracking of occluded people. International Journal of Computer Vision (IJCV), 110(1):58–69, 2014.

[371] Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, and Bernt Schiele. Subgraph decomposition for multi-target tracking. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 5033–5041, 2015.

[372] Tianyu Tang, Shilin Zhou, Zhipeng Deng, Lin Lei, and Huanxin Zou. Arbitrary-Oriented Vehicle Detection in Aerial Imagery with Single Convolutional Neural Networks. Remote Sensing, 9:1170–17, November 2017.

[373] Tianyu Tang, Shilin Zhou, Zhipeng Deng, Huanxin Zou, and Lin Lei. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors, 17:336–17, February 2017.

[374] Y. Tang, J. K. Wang, B. Gao, and E. Dellandréa. Large Scale Semi-supervised Object Detection using Visual and Semantic Knowledge Transfer. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016.

[375] Franklin Tanner, Brian Colder, Craig Pullen, David Heagy, Michael Eppolito, Veronica Carlan, Carsten Oertel, and Phil Sallee. Overhead imagery research data set - an annotated data library & tools to aid in the development of computer vision algorithms. In 2009 IEEE Applied Imagery Pattern Recognition Workshop (AIPR 2009), pages 1–8, 2009.

[376] Luke Taylor and Geoff Nitschke. Improving deep learning using generic data augmentation. CoRR, abs/1708.06020, 2017. URL http://arxiv.org/abs/1708.06020.

[377] Yonglin Tian, Xuan Li, Kunfeng Wang, and Fei-Yue Wang. Training and testing object detectors with virtual images. CoRR, abs/1712.08470, 2017. URL http://arxiv.org/abs/1712.08470.

[378] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26–31, 2012.

[379] Radu Timofte, Karel Zimmermann, and Luc Van Gool. Multi-view traffic sign detection, recognition, and 3d localisation. Machine vision and applications, 25(3):633–647, 2014.

[380] Tatiana Tommasi, Novi Patricia, Barbara Caputo, and Tinne Tuytelaars. A deeper look at dataset bias. In Gabriela Csurka, editor, Domain Adaptation in Computer Vision Applications, Advances in Computer Vision and Pattern Recognition, pages 37–55. Springer, 2017. URL https://doi.org/10.1007/978-3-319-58347-1_2.

[381] Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. In The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, pages 1521–1528, 2011.

[382] Toan Tran, Trung Pham, Gustavo Carneiro, Lyle Palmer, and Ian Reid. A bayesian data augmentation approach for learning deep models. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 2797–2806, 2017.

[383] Jonathan Tremblay, Aayush Prakash, David Acuna, Mark Brophy, Varun Jampani, Cem Anil, Thang To, Eric Cameracci, Shaad Boochoon, and Stan Birchfield. Training deep networks with synthetic data: Bridging the reality gap by domain randomization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2018.

[384] Jonathan Tremblay, Thang To, and Stan Birchfield. Falling things: A synthetic dataset for 3d object detection and pose estimation. CoRR, abs/1804.06534, 2018. URL http://arxiv.org/abs/1804.06534.

[385] Subarna Tripathi, Zachary C. Lipton, Serge J. Belongie, and Truong Q. Nguyen. Context matters: Refining object detection in video with recurrent neural networks. In Richard C. Wilson, Edwin R. Hancock, and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press, 2016. URL http://www.bmva.org/bmvc/2016/papers/paper044/index.html.

[386] Zhuowen Tu and Xiang Bai. Auto-context and its application to high-level vision tasks and 3d brain image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(10):1744–1757, 2010.

[387] Zhuowen Tu, Yi Ma, Wenyu Liu, Xiang Bai, and Cong Yao. Detecting texts of arbitrary orientations in natural images. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 1083–1090, 2012.

[388] Oncel Tuzel, Fatih Porikli, and Peter Meer. Pedestrian detection via classification on riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10):1713–1727, 2008.

[389] Andras Tüzkö, Christian Herrmann, Daniel Manger, and Jürgen Beyerer. Open set logo detection and retrieval. In Francisco H. Imai, Alain Trémeau, and José Braz, editors, Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018) – Volume 5: VISAPP, Funchal, Madeira, Portugal, January 27-29, 2018, pages 284–292. SciTePress, 2018.

[390] Lachlan Tychsen-Smith and Lars Petersson. Improving object localization with fitness NMS and bounded iou loss. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, pages 6877–6885, 2018. doi: 10.1109/CVPR.2018.00719.

[391] Jasper RR Uijlings, Koen EA Van De Sande, Theo Gevers, and Arnold WM Smeulders. Selective search for object recognition. International Journal of Computer Vision (IJCV), 104(2):154–171, 2013.

[392] Régis Vaillant, Christophe Monrocq, and Yann Le Cun. Original approach for the localisation of objects in images. IEE Proceedings - Vision, Image and Signal Processing, 141(4):245–250, 1994.

[393] Koen EA Van de Sande, Jasper RR Uijlings, Theo Gevers, and Arnold WM Smeulders. Segmentation as selective search for object recognition. In IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011, pages 1879–1886, 2011.

[394] Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The iNaturalist Species Classification and Detection Dataset. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

[395] Gül Varol, Javier Romero, Xavier Martin, Naureen Mahmood, Michael J. Black, Ivan Laptev, and Cordelia Schmid. Learning from synthetic humans. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 4627–4635. IEEE Computer Society, 2017.

[396] Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge J. Belongie. Coco-text: Dataset and benchmark for text detection and recognition in natural images. CoRR, abs/1601.07140, 2016. URL http://arxiv.org/abs/1601.07140.

[397] Alexander Vezhnevets and Vittorio Ferrari. Object localization in imagenet by looking out of the window. In Xianghua Xie, Mark W. Jones, and Gary K. L. Tam, editors, Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK, September 7-10, 2015, pages 27.1–27.12. BMVA Press, 2015.

[398] Paul A. Viola, Michael J. Jones, and Daniel Snow. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision (IJCV), 63(2):153–161, 2005.

[399] Stefan Walk, Nikodem Majer, Konrad Schindler, and Bernt Schiele. New features and insights for pedestrian detection. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pages 1030–1037, 2010.

[400] Fang Wan, Pengxu Wei, Jianbin Jiao, Zhenjun Han, and Qixiang Ye. Min-entropy latent model for weakly supervised object detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[401] Li Wan, David Eigen, and Rob Fergus. End-to-end integration of a convolutional network, deformable parts model and non-maximum suppression. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 851–859. IEEE Computer Society, 2015.

[402] Chong Wang, Weiqiang Ren, Kaiqi Huang, and Tieniu Tan. Weakly Supervised Object Localization with Latent Category Learning. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, 2014.

[403] Kai Wang and Serge Belongie. Word spotting in the wild. In Computer Vision – ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, pages 591–604, 2010.

[404] Li Wang, Yao Lu, Hong Wang, Yingbin Zheng, Hao Ye, and Xiangyang Xue. Evolving boxes for fast vehicle detection. ICME, pages 1135–1140, 2017.

[405] Robert J. Wang, Xiang Li, Shuang Ao, and Charles X. Ling. Pelee: A Real-Time Object Detection System on Mobile Devices. In International Conference on Learning Representations (ICLR), 2018.

[406] Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. CoRR, abs/1711.07971, 2017. URL http://arxiv.org/abs/1711.07971.

[407] Xiaolong Wang, Abhinav Shrivastava, and Abhinav Gupta. A-fast-rcnn: Hard positive generation via adversary for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 3039–3048. IEEE Computer Society, 2017.

[408] Xiaoyu Wang, Tony X. Han, and Shuicheng Yan. An HOG-LBP human detector with partial occlusion handling. In IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27 - October 4, 2009, pages 32–39, 2009.

[409] Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, and Chunhua Shen. Repulsion Loss: Detecting Pedestrians in a Crowd. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

[410] Maurice Weiler, Fred A. Hamprecht, and Martin Storath. Learning steerable filters for rotation equivariant cnns. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[411] Longyin Wen, Dawei Du, Zhaowei Cai, Zhen Lei, Ming-Ching Chang, Honggang Qi, Jongwoo Lim, Ming-Hsuan Yang, and Siwei Lyu. DETRAC: A new benchmark and protocol for multi-object tracking. CoRR, abs/1511.04136, 2015. URL http://arxiv.org/abs/1511.04136.

[412] Cameron Whitelam, Emma Taborsky, Austin Blanton, Brianna Maze, Jocelyn Adams, Tim Miller, Nathan Kalka, Anil K Jain, James A Duncan, Kristen Allen, et al. Iarpa janus benchmark-b face dataset. In CVPR Workshop on Biometrics, 2017.

[413] Christian Wojek, Gyuri Dorkó, André Schulz, and Bernt Schiele. Sliding-windows for rapid object class localization: A parallel technique. In Joint Pattern Recognition Symposium, pages 71–81, 2008.

[414] Christian Wojek, Stefan Walk, and Bernt Schiele. Multi-cue onboard pedestrian detection. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 794–801. IEEE Computer Society, 2009.

[415] Sanghyun Woo, Soonmin Hwang, and In So Kweon. Stairnet: Top-down semantic aggregation for accurate one shot detection. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, March 12-15, 2018, pages 1093–1102. IEEE Computer Society, 2018.

[416] Bichen Wu, Forrest N. Iandola, Peter H. Jin, and Kurt Keutzer. Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops, Honolulu, HI, USA, July 21-26, 2017, pages 446–454. IEEE Computer Society, 2017.

[417] Bo Wu and Ram Nevatia. Cluster boosted tree classifier for multi-view, multi-pose object detection. In IEEE 11th International Conference on Computer Vision, ICCV 2007, Rio de Janeiro, Brazil, October 14-20, 2007, pages 1–8, 2007.

[418] Bo Wu and Ramakant Nevatia. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In 10th IEEE International Conference on Computer Vision (ICCV 2005), 17-20 October 2005, Beijing, China, pages 90–97, 2005.

[419] Tianfu Wu, Bo Li, and Song-Chun Zhu. Learning and-or model to represent context and occlusion for car detection and viewpoint estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1829–1843, 2016.

[420] Yue Wu and Qiang Ji. Facial Landmark Detection: A Literature Survey. International Journal of Computer Vision (IJCV), To appear, May 2018.

[421] Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge J. Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, and Liangpei Zhang. DOTA: A large-scale dataset for object detection in aerial images. CoRR, abs/1711.10398, 2017. URL http://arxiv.org/abs/1711.10398.

[422] Wei Xiang, Dong-Qing Zhang, Heather Yu, and Vassilis Athitsos. Context-aware single-shot detector. pages 1784–1793, 2018. doi: 10.1109/WACV.2018.00198.

[423] Yu Xiang and S. Savarese. Estimating the aspect layout of object categories. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16-21, 2012, 2012.

[424] Yu Xiang, Wongun Choi, Yuanqing Lin, and Silvio Savarese. Data-driven 3d voxel patterns for object category recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1903–1911. IEEE Computer Society, 2015.

[425] Yao Xiao, Cewu Lu, E. Tsougenis, Yongyi Lu, and Chi-Keung Tang. Complexity-adaptive distance metric for object proposals generation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015.

[426] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 5987–5995, 2017.

[427] Hongyu Xu, Xutao Lv, Xiaoyu Wang, Zhou Ren, and Rama Chellappa. Deep regionlets for object detection. CoRR, abs/1712.02408, 2017. URL http://arxiv.org/abs/1712.02408.

[428] Jiaolong Xu, Sebastian Ramos, David Vázquez, and Antonio M López. Domain adaptation of deformable part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12):2367–2380, 2014.

[429] Zhaozhuo Xu, Xin Xu, Lei Wang, Rui Yang, and Fangling Pu. Deformable ConvNet with Aspect Ratio Constrained NMS for Object Detection in Remote Sensing Imagery. Remote Sensing, 9:1312–19, December 2017.

[430] Junjie Yan, Xuzong Zhang, Zhen Lei, and Stan Z. Li. Face detection by structural models. Image and Vision Computing, 32(10):790–799, October 2014.

[431] Fan Yang, Wongun Choi, and Yuanqing Lin. Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 2129–2137, 2016.

[432] Shuo Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. From facial parts responses to face detection: A deep learning approach. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 3676–3684. IEEE Computer Society, 2015.

[433] Shuo Yang, Ping Luo, Chen-Change Loy, and Xiaoou Tang. Wider face: A face detection benchmark. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 5525–5533, 2016.

[434] Zhenheng Yang and Ramakant Nevatia. A multi-scale cascade fully convolutional network face detector. In 23rd International Conference on Pattern Recognition, ICPR 2016, Cancún, Mexico, December 4-8, 2016, pages 633–638. IEEE, 2016.

[435] Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, and Zhimin Cao. Scene text detection via holistic, multi-channel prediction. CoRR, abs/1606.09002, 2016. URL http://arxiv.org/abs/1606.09002.

[436] Ryota Yoshihashi, Tu Tuan Trinh, Rei Kawakami, Shaodi You, Makoto Iida, and Takeshi Naemura. Learning multi-frame visual representation for joint detection and tracking of small objects. CoRR, abs/1709.04666, 2017. URL http://arxiv.org/abs/1709.04666.

[437] Yang You, Zhao Zhang, Cho-Jui Hsieh, James Demmel, and Kurt Keutzer. Imagenet training in minutes. In Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018, Eugene, OR, USA, August 13-16, 2018, pages 1:1–1:10. ACM, 2018.

[438] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. CoRR, abs/1511.07122, 2015. URL http://arxiv.org/abs/1511.07122.

[439] Fisher Yu, Vladlen Koltun, and Thomas A. Funkhouser. Dilated residual networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 636–644. IEEE Computer Society, 2017. doi: 10.1109/CVPR.2017.75.

[440] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, and Trevor Darrell. BDD100K: A diverse driving video database with scalable annotation tooling. CoRR, abs/1805.04687, 2018. URL http://arxiv.org/abs/1805.04687.

[441] Jiahui Yu, Yuning Jiang, Zhangyang Wang, Zhimin Cao, and Thomas S. Huang. Unitbox: An advanced object detection network. In Proceedings of the 2016 ACM Conference on Multimedia Conference, MM 2016, Amsterdam, The Netherlands, October 15-19, 2016, pages 516–520, 2016.

[442] Ruichi Yu, Xi Chen, Vlad I. Morariu, and Larry S. Davis. The Role of Context Selection in Object Detection. In Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016, September 2016.

[443] Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-Yan Yeung, and Abhinav Gupta. Temporal dynamic graph lstm for action-driven video object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, Oct 2017.

[444] Mehmet Kerim Yucel, Yunus Can Bilge, Oguzhan Oguz, Nazli Ikizler-Cinbis, Pinar Duygulu, and Ramazan Gokberk Cinbis. Wildest faces: Face detection and recognition in violent settings. CoRR, abs/1805.07566, 2018. URL http://arxiv.org/abs/1805.07566.

[445] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In Richard C. Wilson, Edwin R. Hancock, and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016. BMVA Press. URL http://www.bmva.org/bmvc/2016/papers/paper087/index.html.

[446] Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro Oliveira Pinheiro, Sam Gross, Soumith Chintala, and Piotr Dollár. A multipath network for object detection. In Richard C. Wilson, Edwin R. Hancock, and William A. P. Smith, editors, Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK, September 19-22, 2016, 2016. URL http://www.bmva.org/bmvc/2016/papers/paper015/index.html.

[447] Matthew D. Zeiler. ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701, 2012. URL http://arxiv.org/abs/1212.5701.

[448] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, pages 818–833, 2014. URL https://doi.org/10.1007/978-3-319-10590-1_53.

[449] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, pages 818–833, 2014.

[450] Xingyu Zeng, Wanli Ouyang, Bin Yang, Junjie Yan, and Xiaogang Wang. Gated Bi-directional CNN for Object Detection. In Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, October 2016.

[451] Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, et al. Crafting gbd-net for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

[452] Yao Zhai, Jingjing Fu, Yan Lu, and Houqiang Li. Feature selective networks for object detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[453] Cha Zhang and Zhengyou Zhang. A survey of recent advances in face detection. Technical report, Microsoft Research, 2010.

[454] Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, and Gang Hua. Lq-nets: Learned quantization for highly accurate and compact deep neural networks. CoRR, abs/1807.10029, 2018. URL http://arxiv.org/abs/1807.10029.

[455] Liliang Zhang, Liang Lin, Xiaodan Liang, and Kaiming He. Is faster R-CNN doing well for pedestrian detection? In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016 – 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, volume 9906 of Lecture Notes in Computer Science, pages 443–457. Springer, 2016. URL https://doi.org/10.1007/978-3-319-46475-6_28.

[456] Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele. Citypersons: A diverse dataset for pedestrian detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 4457–4465. IEEE Computer Society, 2017.

[457] Shanshan Zhang, Jian Yang, and Bernt Schiele. Occluded Pedestrian Detection Through Guided Attention in CNNs. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, page 9, 2018.

[458] Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, and Stan Z. Li. S$^3$FD: Single Shot Scale-invariant Face Detector. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, 2017.

[459] Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z. Li. Occlusion-aware R-CNN: detecting pedestrians in a crowd. CoRR, abs/1807.08407, 2018. URL http://arxiv.org/abs/1807.08407.

[460] Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z. Li. Single-shot refinement neural network for object detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, 2018.

[461] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. CoRR, abs/1707.01083, 2017. URL http://arxiv.org/abs/1707.01083.

[462] Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, and Thomas S. Huang. Adversarial complementary learning for weakly supervised object localization. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[463] Xiaopeng Zhang, Jiashi Feng, Hongkai Xiong, and Qi Tian. Zigzag learning for weakly supervised object detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[464] Yongqiang Zhang, Yancheng Bai, Mingli Ding, Yongqiang Li, and Bernard Ghanem. W2f: A weakly-supervised to fully-supervised framework for object detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[465] Yuting Zhang, Kihyuk Sohn, R. Villegas, Gang Pan, and Honglak Lee. Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015.

[466] Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, and Xiang Bai. Multi-oriented text detection with fully convolutional networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 4159–4167. IEEE Computer Society, 2016.

[467] Zhishuai Zhang, Siyuan Qiao, Cihang Xie, Wei Shen, Bo Wang, and Alan L. Yuille. Single-shot object detection with enriched semantics. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[468] Fan Zhao, Yao Yang, Hai-yan Zhang, Lin-lin Yang, and Lin Zhang. Sign text detection in street view images using an integrated feature. Multimedia Tools and Applications, April 2018.

[469] Xiangyun Zhao, Shuang Liang, and Yichen Wei. Pseudo mask augmented object detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, June 2018.

[470] Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, and Xindong Wu. Object detection with deep learning: A review. CoRR, abs/1807.05511, 2018. URL http://arxiv.org/abs/1807.05511.

[471] Liwen Zheng, Canmiao Fu, and Yong Zhao. Extend the shallow part of single shot multibox detector via convolutional neural network. CoRR, abs/1801.05918, 2018. URL http://arxiv.org/abs/1801.05918.

[472] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Object detectors emerge in deep scene cnns. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015.

[473] Bolei Zhou, Àgata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning deep features for scene recognition using places database. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, 2014.

[474] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 2921–2929. IEEE Computer Society, 2016.

[475] Peng Zhou, Bingbing Ni, Cong Geng, Jianguo Hu, and Yi Xu. Scale-Transferrable Object Detection. In Computer Vision and Pattern Recognition (CVPR), 2018 IEEE Conference on, page 10, 2018.

[476] Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. CoRR, abs/1606.06160, 2016. URL http://arxiv.org/abs/1606.06160.

[477] Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang. East: An efficient and accurate scene text detector. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, July 2017.

[478] Yin Zhou and Oncel Tuzel. Voxelnet: End-to-end learning for point cloud based 3d object detection. CoRR, abs/1711.06396, 2017. URL http://arxiv.org/abs/1711.06396.

[479] Haigang Zhu, Xiaogang Chen, Weiqun Dai, Kun Fu, Qixiang Ye, and Jianbin Jiao. Orientation robust object detection in aerial images using deep convolutional neural network. In Image Processing (ICIP), 2015 IEEE International Conference on, pages 3735–3739, 2015.

[480] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 2242–2251. IEEE Computer Society, 2017.

[481] Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Ling, and Qinghua Hu. Vision meets drones: A challenge. CoRR, abs/1804.07437, 2018. URL http://arxiv.org/abs/1804.07437.

[482] Pengkai Zhu, Hanxiao Wang, Tolga Bolukbasi, and Venkatesh Saligrama. Zero-shot detection. CoRR, abs/1803.07113, 2018. URL http://arxiv.org/abs/1803.07113.

[483] Xiangxin Zhu and Deva Ramanan. Face detection, pose estimation, and landmark localization in the wild. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16-21, 2012, pages 2879–2886. IEEE Computer Society, 2012.

[484] Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. Flow-guided feature aggregation for video object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 408–417. IEEE Computer Society, 2017.

[485] Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. Flow-guided feature aggregation for video object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 408–417, 2017. doi: 10.1109/ICCV.2017.52.

[486] Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, and Yichen Wei. Deep feature flow for video recognition. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, volume 2, page 7, 2017.

[487] Xizhou Zhu, Jifeng Dai, Xingchi Zhu, Yichen Wei, and Lu Yuan. Towards high performance video object detection for mobiles. CoRR, abs/1804.05830, 2018. URL http://arxiv.org/abs/1804.05830.

[488] Yousong Zhu, Chaoyang Zhao, Jinqiao Wang, Xu Zhao, Yi Wu, and Hanqing Lu. Couplenet: Coupling global structure with local parts for object detection. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 4126–4134, 2017. doi: 10.1109/ICCV.2017.444.

[489] Yukun Zhu, R. Urtasun, R. Salakhutdinov, and S. Fidler. segDeepM: Exploiting segmentation and context in deep neural networks for object detection. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, 2015.

[490] Zhe Zhu, Dun Liang, Songhai Zhang, Xiaolei Huang, Baoli Li, and Shimin Hu. Traffic-sign detection and classification in the wild. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 2110–2118, 2016.

[491] C. L. Zitnick and P. Dollar. Edge boxes: Locating object proposals from edges. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, 2014.

[492] Zhen Zuo, Bing Shuai, Gang Wang, Xiao Liu, Xingxing Wang, Bing Wang, and Yushi Chen. Learning Contextual Dependence With Convolutional Hierarchical Recurrent Neural Networks. IEEE Transactions on Image Processing, 2016.