Author(s):
- Bolaños, Marc
- Dimiccoli, Mariella
- Radeva, Petia
Abstract:
Visual lifelogging consists of acquiring images that capture the daily experiences of the user, who wears a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives; hence, they open up new opportunities for applications in fields including healthcare, security, leisure, and the quantified self. However, automatically building a story from a huge collection of unstructured egocentric data presents major challenges. This paper provides a thorough review of advances made so far in egocentric data analysis and, in view of the current state of the art, indicates new lines of research to move us toward storytelling from visual lifelogging.
Document:
https://doi.org/10.1109/THMS.2016.2616296
References:
1. M. Aghaei, M. Dimiccoli and P. Radeva, “Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams”, Comput. Vision Image Understanding, vol. 149, pp. 146-156, 2015.
2. M. Aghaei, M. Dimiccoli and P. Radeva, “Towards social interaction detection in egocentric photo streams”, Proc. Int. Conf. Mach. Vision, 2015.
3. M. Aghaei and P. Radeva, “Bag-of-tracklets for person tracking in life-logging data” in Artificial Intelligence Research and Development: Recent Advances and Applications, Amsterdam, The Netherlands:IOS Press, vol. 269, 2014.
4. O. Aghazadeh, J. Sullivan and S. Carlsson, “Novelty detection from an ego-centric perspective”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3297-3304, 2011.
5. S. Alletto, G. Serra, S. Calderara and R. Cucchiara, “Head pose estimation in first-person camera views”, Proc. 22nd Int. Conf. Pattern Recognit., pp. 4188-4193, 2014.
6. S. Alletto, G. Serra, S. Calderara, F. Solera and R. Cucchiara, “From ego to Nos-vision: Detecting social relationships in first-person views”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 594-599, 2014.
7. S. Bambach, S. Lee, D. Crandall and C. Yu, “Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions”, Proc. IEEE Int. Conf. Comput. Vision, pp. 1949-1957, 2015.
8. L. Baraldi, F. Paci, G. Serra, L. Benini and R. Cucchiara, “Gesture recognition in ego-centric videos using dense trajectories and hand segmentation”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 702-707, 2014.
9. L. Bazzani et al., “Social interactions by visual focus of attention in a three-dimensional environment”, Expert Syst., vol. 30, no. 2, pp. 115-127, 2013.
10. A. Behera, D. C. Hogg and A. G. Cohn, “Egocentric activity monitoring and recovery”, Proc. 11th Asian Conf. Comput. Vision, pp. 519-532, 2013.
11. A. Betancourt, P. Morerio, E. I. Barakova, L. Marcenaro, M. Rauterberg and C. S. Regazzoni, “A dynamic approach and a new dataset for hand-detection in first person vision” in Computer Analysis of Images and Patterns, Berlin, Germany:Springer, pp. 274-287, 2015.
12. A. Betancourt, P. Morerio, C. S. Regazzoni and M. Rauterberg, “The evolution of first person vision methods: A survey”, IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 5, pp. 744-760, May 2015.
13. V. Bettadapura, I. Essa and C. Pantofaru, “Egocentric field-of-view localization using first-person point-of-view devices”, Proc. IEEE Winter Conf. Appl. Comput. Vision, pp. 626-633, 2015.
14. V. Bettadapura, E. Thomaz, A. Parnami, G. D. Abowd and I. Essa, “Leveraging context to support automated food recognition in restaurants”, Proc. IEEE Winter Conf. Appl. Comput. Vision, pp. 580-587, 2015.
15. M. Bolaños, R. Mestre, E. Talavera, X. Giró-i Nieto and P. Radeva, “Visual summary of egocentric photostreams by representative keyframes”, Proc. IEEE Int. Conf. Multimedia Expo. Workshops, pp. 1-6, 2015.
16. M. Bolaños and P. Radeva, “Ego-object discovery”, arXiv preprint arXiv:1504.01639, 2015.
17. M. Bolaños, M. Garolera and P. Radeva, “Active labeling application applied to food-related object recognition”, Proc. ACM Int. Workshop Multimedia Cooking Eating Activities, pp. 45-50, 2013.
18. M. Bolaños, M. Garolera and P. Radeva, “Video segmentation of life-logging videos” in Articulated Motion and Deformable Objects, Berlin, Germany:Springer, pp. 1-9, 2014.
19. M. Bolaños, M. Garolera and P. Radeva, “Object discovery using CNN features in egocentric videos” in Pattern Recognition and Image Analysis, Berlin, Germany:Springer, pp. 67-74, 2015.
20. I. M. Bullock, T. Feix and A. M. Dollar, “The Yale human grasping dataset: Grasp, object, and task data in household and machine shop environments”, Int. J. Robot. Res., vol. 34, no. 3, pp. 251-255, 2015.
21. D. Byrne, A. R. Doherty, C. G. M. Snoek, G. J. F. Jones and A. F. Smeaton, “Everyday concept detection in visual lifelogs: Validation, relationships and trends”, Multimedia Tools Appl., vol. 49, no. 1, pp. 119-144, 2010.
22. M. Cai, K. M. Kitani and Y. Sato, “A scalable approach for understanding the visual structures of hand grasps”, Proc. IEEE Int. Conf. Robot. Autom., pp. 1360-1366, 2015.
23. D. Castro et al., “Predicting daily activities from egocentric images using deep learning”, Proc. ACM Int. Symp. Wearable Comput., pp. 75-82, 2015.
24. V. Chandrasekhar, C. Tan, W. Min, L. Li, X. Li and J.-H. Lim, “Incremental graph clustering for efficient retrieval from streaming egocentric video data”, Proc. IEEE Int. Conf. Pattern Recognit., pp. 2631-2636, 2014.
25. S. Chowdhury, P. J. McParlane, S. Ferdous and J. Jose, “My day in review: Visually summarising noisy lifelog data”, Proc. ACM Int. Conf. Multimedia Inf. Retrieval, pp. 607-610, 2015.
26. D. Damen, O. Haines, T. Leelasawassuk, A. Calway and W. Mayol-Cuevas, “Multi-user egocentric online system for unsupervised assistance on object usage”, Proc. Eur. Conf. Comput. Vision Workshops, pp. 481-492, 2014.
27. M. Dimiccoli and P. Radeva, “Visual lifelogging in the era of outstanding digitization”, Digit. Presentation Preservation Cultural Sci. Heritage, vol. V, pp. 59-64, 2015.
28. A. R. Doherty, C. Ó Conaire, M. Blighe, A. F. Smeaton and N. E. O’Connor, “Combining image descriptors to effectively retrieve events from visual lifelogs”, Proc. ACM Int. Conf. Multimedia Inf. Retrieval, pp. 10-17, 2008.
29. A. R. Doherty et al., “Experiences of aiding autobiographical memory using the SenseCam”, Human–Comput. Interact., vol. 27, no. 1/2, pp. 151-174, 2012.
30. A. R. Doherty and A. F. Smeaton, “Automatically segmenting lifelog data into events”, Proc. Int. Workshop Image Audio Anal. Multimedia Interactive Serv., pp. 20-23, 2008.
31. A. R. Doherty and A. F. Smeaton, “Combining face detection and novelty to identify important events in a visual lifelog”, Proc. IEEE Int. Conf. Comput. Inf. Technol. Workshops, pp. 348-353, 2008.
32. A. R. Doherty et al., “Wearable cameras in health: The state of the art and future possibilities”, Amer. J. Preventive Med., vol. 44, no. 3, pp. 320-323, 2013.
33. C. Farabet, C. Couprie, L. Najman and Y. LeCun, “Learning hierarchical features for scene labeling”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915-1929, Aug. 2013.
34. A. Fathi, A. Farhadi and J. M. Rehg, “Understanding egocentric activities”, Proc. IEEE Int. Conf. Comput. Vision, pp. 407-414, 2011.
35. A. Fathi, J. K. Hodgins and J. M. Rehg, “Social interactions: A first-person perspective”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 1226-1233, 2012.
36. A. Fathi, Y. Li and J. M. Rehg, “Learning to recognize daily actions using gaze”, Proc. Eur. Conf. Comput. Vision, pp. 314-327, 2012.
37. A. Fathi, X. Ren and J. M. Rehg, “Learning to recognize objects in egocentric activities”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3281-3288, 2011.
38. T. Feix, R. Pawlik, H.-B. Schmiedmayer, J. Romero and D. Kragic, “A comprehensive grasp taxonomy”, Proc. Robot. Sci. Syst. Workshop Understanding Human Hand Adv. Robot. Manipulation, pp. 2-3, 2009.
39. A. Furlan, S. Miller, D. G. Sorrenti, L. Fei-Fei and S. Savarese, “Free your camera: 3D indoor scene understanding from arbitrary camera motion”, Proc. Brit. Mach. Vision Conf., pp. 24.1-24.12, 2013.
40. J. Ghosh, Y. J. Lee and K. Grauman, “Discovering important people and objects for egocentric video summarization”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 1346-1353, 2012.
41. C. Gurrin, A. F. Smeaton and A. R. Doherty, “Lifelogging: Personal big data”, Found. Trends Inf. Retrieval, vol. 8, no. 1, pp. 1-125, 2014.
42. M. Harvey, M. Langheinrich and G. Ward, “Remembering through lifelogging: A survey of human memory augmentation”, Pervasive Mobile Comput., vol. 27, pp. 14-26, 2016.
43. D. S. Hayden et al., “The accuracy-obtrusiveness tradeoff for wearable vision platforms”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshop Egocentric Vision, 2012.
44. S. Hodges et al., “SenseCam: A retrospective memory aid”, Proc. 8th Int. Conf. Ubiquitous Comput., pp. 177-193, 2006.
45. F. Hopfgartner, Y. Yang, L. M. Zhou and C. Gurrin, “User interaction templates for the design of lifelogging systems”, Proc. Semantic Models Adaptive Interactive Syst., pp. 187-204, 2013.
46. H. Hung and B. Kröse, “Detecting f-formations as dominant sets”, Proc. ACM Int. Conf. Multimodal Interfaces, pp. 231-238, 2011.
47. P. Isola, J. Xiao, A. Torralba and A. Oliva, “What makes an image memorable?”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 145-152, 2011.
48. Y. Iwashita, A. Takamine, R. Kurazume and M. S. Ryoo, “First-person animal activity recognition from egocentric videos”, Proc. IEEE Int. Conf. Pattern Recognit., pp. 4310-4315, 2014.
49. A. Jinda-Apiraksa, J. Machajdik and R. Sablatnig, “A keyframe selection of lifelog image sequences”, 2012.
50. N. Jojic, A. Perina and V. Murino, “Structural epitome: A way to summarize one’s visual experience”, Proc. Adv. Neural Inf. Process. Syst. Conf., pp. 1027-1035, 2010.
51. T. Kanade and M. Hebert, “First-person vision”, Proc. IEEE, vol. 100, no. 8, pp. 2442-2453, Aug. 2012.
52. H. Kang, M. Hebert and T. Kanade, “Discovering object instances from scenes of daily living”, Proc. IEEE Int. Conf. Comput. Vision, pp. 762-769, 2011.
53. A. Kendon, Studies in the Behavior of Social Interaction, Atlantic Highlands, NJ, USA:Humanities Press Int., vol. 6, 1977.
54. A. Kendon, Conducting Interaction: Patterns of Behavior in Focused Encounters, Cambridge, U.K.:Cambridge Univ. Press, vol. 7, 1990.
55. B. Kikhia, A. Boytsov, J. Hallberg, H. Jonsson and K. Synnes, “Structuring and presenting lifelogs based on location data” in Pervasive Computing Paradigms for Mental Health, Berlin, Germany:Springer, pp. 133-144, 2014.
56. K. M. Kitani, T. Okabe, Y. Sato and A. Sugimoto, “Fast unsupervised ego-action learning for first-person sports videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3241-3248, 2011.
57. M. L. Lee and A. K. Dey, “Lifelogging memory appliance for people with episodic memory impairment”, Proc. 10th Int. Conf. Ubiquitous Comput., pp. 44-53, 2008.
58. S. Lee, S. Bambach, D. J. Crandall, J. M. Franchak and C. Yu, “This hand is my hand: A probabilistic approach to hand disambiguation in egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 557-564, 2014.
59. C. Li and K. M. Kitani, “Pixel-level hand detection in ego-centric videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3570-3577, 2013.
60. N. Li, M. Crane and H. J. Ruskin, “Automatically detecting “significant events” on SenseCam”, Int. J. Wavelets Multiresolution Inf. Process., vol. 11, no. 6, 2013.
61. A. Lidon, M. Bolaños, M. Dimiccoli, P. Radeva, M. Garolera and X. Giró-i Nieto, “Semantic summarization of egocentric photo stream events”, arXiv preprint arXiv:1511.00438, 2015.
62. W.-H. Lin and A. Hauptmann, “Structuring continuous video recordings of everyday life using time-constrained clustering”, Proc. SPIE, vol. 6073, 2006.
63. J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3431-3440, 2015.
64. Z. Lu and K. Grauman, “Story-driven summarization for egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2714-2721, 2013.
65. K. Matsuo, K. Yamada, S. Ueno and S. Naito, “An attention-based activity recognition for egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 565-570, 2014.
66. W. W. Mayol-Cuevas, B. J. Tordoff and D. W. Murray, “On the choice and placement of wearable vision sensors”, IEEE Trans. Syst. Man Cybern. A Syst. Humans, vol. 39, no. 2, pp. 414-425, Mar. 2009.
67. A. Mehrabian, “Significance of posture and position in the communication of attitude and status relationships”, Psychol. Bull., vol. 71, no. 5, pp. 359-372, 1969.
68. W. Min, X. Li, C. Tan, B. Mandal, L. Li and J.-H. Lim, “Efficient retrieval from large-scale egocentric visual data using a sparse graph representation”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 541-548, 2014.
69. M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina and A. Paepcke, “Context data in geo-referenced digital photo collections”, Proc. ACM Int. Conf. Multimedia, pp. 196-203, 2004.
70. S. Narayan, M. S. Kankanhalli and K. R. Ramakrishnan, “Action and interaction recognition in first-person videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 526-532, 2014.
71. H. Pirsiavash and D. Ramanan, “Detecting activities of daily living in first-person camera views”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2847-2854, 2012.
72. F. Poiesi and A. Cavallaro, “Predicting and recognizing human interactions in public spaces”, J. Real-Time Image Process., vol. 10, pp. 785-803, 2014.
73. Y. Poleg, C. Arora and S. Peleg, “Temporal segmentation of egocentric videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2537-2544, 2014.
74. X. Ren and C. Gu, “Figure-ground segmentation improves handled object recognition in egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3137-3144, 2010.
75. X. Ren and M. Philipose, “Egocentric recognition of handled objects: Benchmark and analysis”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 1-8, 2009.
76. G. Rogez, M. Khademi, J. S. Supančič, J.-M. M. Montiel and D. Ramanan, “3D hand pose detection in egocentric RGB-D images”, Proc. Eur. Conf. Comput. Vision Workshops, pp. 356-371, 2014.
77. G. Rogez, J. S. Supancic and D. Ramanan, “Egocentric pose recognition in four lines of code”, arXiv preprint arXiv:1412.0060, 2014.
78. R. J. Rummel, “Social behavior and interaction” in Understanding Conflict and War—The Conflict., New York, NY, USA:Wiley, 1976.
79. M. S. Ryoo and L. Matthies, “First-person activity recognition: What are they doing to me?”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2730-2737, 2013.
80. M. S. Ryoo, B. Rothrock and L. Matthies, “Pooled motion features for first-person videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 896-904, 2015.
81. M. Schinas, S. Papadopoulos, Y. Kompatsiaris and P. A. Mitkas, “Visual event summarization on social media using topic modelling and graph-based ranking algorithms”, Proc. 5th ACM Int. Conf. Multimedia Inf. Retrieval, pp. 203-210, 2015.
82. F. Setti, C. Russell, C. Bassetti and M. Cristani, “F-formation detection: Individuating free-standing conversational groups in images”, PLoS One, vol. 10, 2015.
83. A. F. Smeaton, P. Over and A. R. Doherty, “Video shot boundary detection: Seven years of TRECVid activity”, Comput. Vision Image Understanding, vol. 114, no. 4, pp. 411-418, 2010.
84. S. Song, V. Chandrasekhar, N.-M. Cheung, S. Narayan, L. Li and J.-H. Lim, “Activity recognition in egocentric life-logging videos”, Proc. Comput. Vision ACCV Workshops, pp. 445-458, 2014.
85. H. S. Park and J. Shi, “Social saliency prediction”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 4777-4785, 2015.
86. E. H. Spriggs, F. De La Torre and M. Hebert, “Temporal segmentation and activity classification from first-person sensing”, Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. Workshops, pp. 17-24, 2009.
87. S. Sundaram and W. W. Mayol-Cuevas, “Egocentric visual event classification with location-based priors”, Proc. 6th Int. Conf. Adv. Visual Comput., pp. 596-605, 2010.
88. E. Talavera, M. Dimiccoli, M. Bolaños, M. Aghaei and P. Radeva, “R-clustering for egocentric video segmentation” in Pattern Recognition and Image Analysis, Berlin, Germany:Springer, pp. 327-336, 2015.
89. C. Tan, H. Goh, V. Chandrasekhar, L. Li and J.-H. Lim, “Understanding the nature of first-person videos: Characterization and classification using low-level features”, Proc. Comput. Vision Pattern Recognit. Workshops, pp. 549-556, 2014.
90. B. T. Truong and S. Venkatesh, “Video abstraction: A systematic review and classification”, ACM Trans. Multimedia Comput. Commun. Appl., vol. 3, 2007.
91. P. Varini, G. Serra and R. Cucchiara, “Personalized egocentric video summarization for cultural experience”, Proc. ACM Int. Conf. Multimedia Inf. Retrieval, pp. 539-542, 2015.
92. S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell and K. Saenko, “Sequence to sequence – video to text”, Proc. IEEE Int. Conf. Comput. Vision, pp. 4534-4542, 2015.
93. P. Wang and A. F. Smeaton, “Semantics-based selection of everyday concepts in visual lifelogging”, Int. J. Multimedia Inf. Retrieval, vol. 1, no. 2, pp. 87-101, 2012.
94. Z. Wang, M. D. Hoffman, P. R. Cook and K. Li, “VFerret: Content-based similarity search tool for continuous archived video”, Proc. ACM Workshop Continuous Archival Retrieval Pers. Experiences, pp. 19-26, 2006.
95. H. Wannous, V. Dovgalecs, R. Mégret and M. Daoudi, Place Recognition Via 3D Modeling for Personal Activity Lifelog Using Wearable Camera, New York, NY, USA:Springer, 2012.
96. B. Xiong and K. Grauman, “Detecting snap points in egocentric video with a web photo prior”, Proc. Eur. Conf. Comput. Vision, pp. 282-298, 2014.
97. Y. Yan, E. Ricci, G. Liu and N. Sebe, “Recognizing daily activities from first-person videos with multi-task clustering”, Proc. 12th Asian Conf. Comput. Vision, pp. 522-537, 2014.
98. J. Yang, B. Price, S. Cohen and M.-H. Yang, “Context driven scene parsing with attention to rare classes”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3294-3301, 2014.
99. L. Yao et al., “Describing videos by exploiting temporal structure”, arXiv preprint arXiv:1502.08029, 2015.
100. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba and A. Oliva, “Learning deep features for scene recognition using Places database”, Proc. Adv. Neural Inf. Process. Syst. Conf., pp. 487-495, 2014.