Author(s):

  • Bolaños, Marc
  • Dimiccoli, Mariella
  • Radeva, Petia

Abstract:

Visual lifelogging consists of a user acquiring images that capture their daily experiences by wearing a camera over a long period of time. The pictures taken offer considerable potential for knowledge mining concerning how people live their lives; hence, they open up new opportunities for many potential applications in fields including healthcare, security, leisure, and the quantified self. However, automatically building a story from a huge collection of unstructured egocentric data presents major challenges. This paper provides a thorough review of advances made so far in egocentric data analysis and, in view of the current state of the art, indicates new lines of research to move us toward storytelling from visual lifelogging.

Document:

https://doi.org/10.1109/THMS.2016.2616296

References:

1. M. Aghaei, M. Dimiccoli and P. Radeva, “Multi-face tracking by extended bag-of-tracklets in egocentric photo-streams”, Comput. Vision Image Understanding, vol. 149, pp. 146-156, 2015.

2. M. Aghaei, M. Dimiccoli and P. Radeva, “Towards social interaction detection in egocentric photo streams”, Proc. Int. Conf. Mach. Vision, 2015.

3. M. Aghaei and P. Radeva, “Bag-of-tracklets for person tracking in life-logging data” in Artificial Intelligence Research and Development: Recent Advances and Applications, Amsterdam, The Netherlands: IOS Press, vol. 269, 2014.

4. O. Aghazadeh, J. Sullivan and S. Carlsson, “Novelty detection from an ego-centric perspective”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3297-3304, 2011.

5. S. Alletto, G. Serra, S. Calderara and R. Cucchiara, “Head pose estimation in first-person camera views”, Proc. 22nd Int. Conf. Pattern Recognit., pp. 4188-4193, 2014.

6. S. Alletto, G. Serra, S. Calderara, F. Solera and R. Cucchiara, “From ego to Nos-vision: Detecting social relationships in first-person views”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 594-599, 2014.

7. S. Bambach, S. Lee, D. Crandall and C. Yu, “Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions”, Proc. IEEE Int. Conf. Comput. Vision, pp. 1949-1957, 2015.

8. L. Baraldi, F. Paci, G. Serra, L. Benini and R. Cucchiara, “Gesture recognition in ego-centric videos using dense trajectories and hand segmentation”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 702-707, 2014.

9. L. Bazzani et al., “Social interactions by visual focus of attention in a three-dimensional environment”, Expert Syst., vol. 30, no. 2, pp. 115-127, 2013.

10. A. Behera, D. C. Hogg and A. G. Cohn, “Egocentric activity monitoring and recovery”, Proc. 11th Asian Conf. Comput. Vision, pp. 519-532, 2013.

11. A. Betancourt, P. Morerio, E. I. Barakova, L. Marcenaro, M. Rauterberg and C. S. Regazzoni, “A dynamic approach and a new dataset for hand-detection in first person vision” in Computer Analysis of Images and Patterns, Berlin, Germany: Springer, pp. 274-287, 2015.

12. A. Betancourt, P. Morerio, C. S. Regazzoni and M. Rauterberg, “The evolution of first person vision methods: A survey”, IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 5, pp. 744-760, May 2015.

13. V. Bettadapura, I. Essa and C. Pantofaru, “Egocentric field-of-view localization using first-person point-of-view devices”, Proc. IEEE Winter Conf. Appl. Comput. Vision, pp. 626-633, 2015.

14. V. Bettadapura, E. Thomaz, A. Parnami, G. D. Abowd and I. Essa, “Leveraging context to support automated food recognition in restaurants”, Proc. IEEE Winter Conf. Appl. Comput. Vision, pp. 580-587, 2015.

15. M. Bolaños, R. Mestre, E. Talavera, X. Giró-i Nieto and P. Radeva, “Visual summary of egocentric photostreams by representative keyframes”, Proc. IEEE Int. Conf. Multimedia Expo Workshops, pp. 1-6, 2015.

16. M. Bolaños and P. Radeva, “Ego-object discovery”, arXiv preprint arXiv:1504.01639, 2015.

17. M. Bolaños, M. Garolera and P. Radeva, “Active labeling application applied to food-related object recognition”, Proc. ACM Int. Workshop Multimedia Cooking Eating Activities, pp. 45-50, 2013.

18. M. Bolaños, M. Garolera and P. Radeva, “Video segmentation of life-logging videos” in Articulated Motion and Deformable Objects, Berlin, Germany: Springer, pp. 1-9, 2014.

19. M. Bolaños, M. Garolera and P. Radeva, “Object discovery using CNN features in egocentric videos” in Pattern Recognition and Image Analysis, Berlin, Germany: Springer, pp. 67-74, 2015.

20. I. M. Bullock, T. Feix and A. M. Dollar, “The Yale human grasping dataset: Grasp, object, and task data in household and machine shop environments”, Int. J. Robot. Res., vol. 34, no. 3, pp. 251-255, 2015.

21. D. Byrne, A. R. Doherty, C. G. M. Snoek, G. J. F. Jones and A. F. Smeaton, “Everyday concept detection in visual lifelogs: Validation, relationships and trends”, Multimedia Tools Appl., vol. 49, no. 1, pp. 119-144, 2010.

22. M. Cai, K. M. Kitani and Y. Sato, “A scalable approach for understanding the visual structures of hand grasps”, Proc. IEEE Int. Conf. Robot. Autom., pp. 1360-1366, 2015.

23. D. Castro et al., “Predicting daily activities from egocentric images using deep learning”, Proc. ACM Int. Symp. Wearable Comput., pp. 75-82, 2015.

24. V. Chandrasekhar, C. Tan, W. Min, L. Liyuan, L. Xiaoli and L. J. Hwee, “Incremental graph clustering for efficient retrieval from streaming egocentric video data”, Proc. IEEE Int. Conf. Pattern Recognit., pp. 2631-2636, 2014.

25. S. Chowdhury, P. J. McParlane, S. Ferdous and J. Jose, “My day in review: Visually summarising noisy lifelog data”, Proc. ACM Int. Conf. Multimedia Inf. Retrieval, pp. 607-610, 2015.

26. D. Damen, O. Haines, T. Leelasawassuk, A. Calway and W. Mayol-Cuevas, “Multi-user egocentric online system for unsupervised assistance on object usage”, Proc. Eur. Conf. Comput. Vision Workshops, pp. 481-492, 2014.

27. M. Dimiccoli and P. Radeva, “Visual lifelogging in the era of outstanding digitization”, Digit. Presentation Preservation Cultural Sci. Heritage, vol. V, pp. 59-64, 2015.

28. A. R. Doherty, C. Ó Conaire, M. Blighe, A. F. Smeaton and N. E. O’Connor, “Combining image descriptors to effectively retrieve events from visual lifelogs”, Proc. ACM Int. Conf. Multimedia Inf. Retrieval, pp. 10-17, 2008.

29. A. R. Doherty et al., “Experiences of aiding autobiographical memory using the SenseCam”, Human–Comput. Interact., vol. 27, no. 1/2, pp. 151-174, 2012.

30. A. R. Doherty and A. F. Smeaton, “Automatically segmenting lifelog data into events”, Proc. Int. Workshop Image Audio Anal. Multimedia Interactive Serv., pp. 20-23, 2008.

31. A. R. Doherty and A. F. Smeaton, “Combining face detection and novelty to identify important events in a visual lifelog”, Proc. IEEE Int. Conf. Comput. Inf. Technol. Workshops, pp. 348-353, 2008.

32. A. R. Doherty et al., “Wearable cameras in health: The state of the art and future possibilities”, Amer. J. Preventive Med., vol. 44, no. 3, pp. 320-323, 2013.

33. C. Farabet, C. Couprie, L. Najman and Y. LeCun, “Learning hierarchical features for scene labeling”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915-1929, Aug. 2013.

34. A. Fathi, A. Farhadi and J. M. Rehg, “Understanding egocentric activities”, Proc. IEEE Int. Conf. Comput. Vision, pp. 407-414, 2011.

35. A. Fathi, J. K. Hodgins and J. M. Rehg, “Social interactions: A first-person perspective”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 1226-1233, 2012.

36. A. Fathi, Y. Li and J. M. Rehg, “Learning to recognize daily actions using gaze”, Proc. Eur. Conf. Comput. Vision, pp. 314-327, 2012.

37. A. Fathi, X. Ren and J. M. Rehg, “Learning to recognize objects in egocentric activities”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3281-3288, 2011.

38. T. Feix, R. Pawlik, H.-B. Schmiedmayer, J. Romero and D. Kragic, “A comprehensive grasp taxonomy”, Proc. Robot. Sci. Syst. Workshop Understanding Human Hand Adv. Robot. Manipulation, pp. 2-3, 2009.

39. A. Furlan, S. Miller, D. G. Sorrenti, L. Fei-Fei and S. Savarese, “Free your camera: 3D indoor scene understanding from arbitrary camera motion”, Proc. Brit. Mach. Vision Conf., pp. 24.1-24.12, 2013.

40. Y. J. Lee, J. Ghosh and K. Grauman, “Discovering important people and objects for egocentric video summarization”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 1346-1353, 2012.

41. C. Gurrin, A. F. Smeaton and A. R. Doherty, “Lifelogging: Personal big data”, Found. Trends Inf. Retrieval, vol. 8, no. 1, pp. 1-125, 2014.

42. M. Harvey, M. Langheinrich and G. Ward, “Remembering through lifelogging: A survey of human memory augmentation”, Pervasive Mobile Comput., vol. 27, pp. 14-26, 2016.

43. D. S. Hayden et al., “The accuracy-obtrusiveness tradeoff for wearable vision platforms”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshop Egocentric Vision, 2012.

44. S. Hodges et al., “SenseCam: A retrospective memory aid”, Proc. 8th Int. Conf. Ubiquitous Comput., pp. 177-193, 2006.

45. F. Hopfgartner, Y. Yang, L. M. Zhou and C. Gurrin, “User interaction templates for the design of lifelogging systems”, Proc. Semantic Models Adaptive Interactive Syst., pp. 187-204, 2013.

46. H. Hung and B. Kröse, “Detecting f-formations as dominant sets”, Proc. ACM Int. Conf. Multimodal Interfaces, pp. 231-238, 2011.

47. P. Isola, J. Xiao, A. Torralba and A. Oliva, “What makes an image memorable?”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 145-152, 2011.

48. Y. Iwashita, A. Takamine, R. Kurazume and M. S. Ryoo, “First-person animal activity recognition from egocentric videos”, Proc. IEEE Int. Conf. Pattern Recognit., pp. 4310-4315, 2014.

49. A. Jinda-Apiraksa, J. Machajdik and R. Sablatnig, “A keyframe selection of lifelog image sequences”, 2012.

50. N. Jojic, A. Perina and V. Murino, “Structural epitome: A way to summarize one’s visual experience”, Proc. Adv. Neural Inf. Process. Syst. Conf., pp. 1027-1035, 2010.

51. T. Kanade and M. Hebert, “First-person vision”, Proc. IEEE, vol. 100, no. 8, pp. 2442-2453, Aug. 2012.

52. H. Kang, M. Hebert and T. Kanade, “Discovering object instances from scenes of daily living”, Proc. IEEE Int. Conf. Comput. Vision, pp. 762-769, 2011.

53. A. Kendon, Studies in the Behavior of Social Interaction, Atlantic Highlands, NJ, USA: Humanities Press Int., vol. 6, 1977.

54. A. Kendon, Conducting Interaction: Patterns of Behavior in Focused Encounters, Cambridge, U.K.: Cambridge Univ. Press, vol. 7, 1990.

55. B. Kikhia, A. Boytsov, J. Hallberg, H. Jonsson and K. Synnes, “Structuring and presenting lifelogs based on location data” in Pervasive Computing Paradigms for Mental Health, Berlin, Germany: Springer, pp. 133-144, 2014.

56. K. M. Kitani, T. Okabe, Y. Sato and A. Sugimoto, “Fast unsupervised ego-action learning for first-person sports videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3241-3248, 2011.

57. M. L. Lee and A. K. Dey, “Lifelogging memory appliance for people with episodic memory impairment”, Proc. 10th Int. Conf. Ubiquitous Comput., pp. 44-53, 2008.

58. S. Lee, S. Bambach, D. J. Crandall, J. M. Franchak and C. Yu, “This hand is my hand: A probabilistic approach to hand disambiguation in egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 557-564, 2014.

59. C. Li and K. M. Kitani, “Pixel-level hand detection in ego-centric videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3570-3577, 2013.

60. N. Li, M. Crane and H. J. Ruskin, “Automatically detecting ‘significant events’ on SenseCam”, Int. J. Wavelets Multiresolution Inf. Process., vol. 11, no. 6, 2013.

61. A. Lidon, M. Bolaños, M. Dimiccoli, P. Radeva, M. Garolera and X. Giró-i Nieto, “Semantic summarization of egocentric photo stream events”, arXiv preprint arXiv:1511.00438, 2015.

62. W.-H. Lin and A. Hauptmann, “Structuring continuous video recordings of everyday life using time-constrained clustering”, Proc. SPIE, vol. 6073, 2006.

63. J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3431-3440, 2015.

64. Z. Lu and K. Grauman, “Story-driven summarization for egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2714-2721, 2013.

65. K. Matsuo, K. Yamada, S. Ueno and S. Naito, “An attention-based activity recognition for egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 565-570, 2014.

66. W. W. Mayol-Cuevas, B. J. Tordoff and D. W. Murray, “On the choice and placement of wearable vision sensors”, IEEE Trans. Syst. Man Cybern. A Syst. Humans, vol. 39, no. 2, pp. 414-425, Mar. 2009.

67. A. Mehrabian, “Significance of posture and position in the communication of attitude and status relationships”, Psychol. Bull., vol. 71, no. 5, pp. 359-372, 1969.

68. W. Min, X. Li, C. Tan, B. Mandal, L. Li and J.-H. Lim, “Efficient retrieval from large-scale egocentric visual data using a sparse graph representation”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 541-548, 2014.

69. M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina and A. Paepcke, “Context data in geo-referenced digital photo collections”, Proc. ACM Int. Conf. Multimedia, pp. 196-203, 2004.

70. S. Narayan, M. S. Kankanhalli and K. R. Ramakrishnan, “Action and interaction recognition in first-person videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 526-532, 2014.

71. H. Pirsiavash and D. Ramanan, “Detecting activities of daily living in first-person camera views”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2847-2854, 2012.

72. F. Poiesi and A. Cavallaro, “Predicting and recognizing human interactions in public spaces”, J. Real-Time Image Process., vol. 10, pp. 785-803, 2014.

73. Y. Poleg, C. Arora and S. Peleg, “Temporal segmentation of egocentric videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2537-2544, 2014.

74. X. Ren and C. Gu, “Figure-ground segmentation improves handled object recognition in egocentric video”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3137-3144, 2010.

75. X. Ren and M. Philipose, “Egocentric recognition of handled objects: Benchmark and analysis”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 1-8, 2009.

76. G. Rogez, M. Khademi, J. S. Supančič, J.-M. M. Montiel and D. Ramanan, “3D hand pose detection in egocentric RGB-D images”, Proc. Eur. Conf. Comput. Vision Workshops, pp. 356-371, 2014.

77. G. Rogez, J. S. Supancic and D. Ramanan, “Egocentric pose recognition in four lines of code”, arXiv preprint arXiv:1412.0060, 2014.

78. R. J. Rummel, “Social behavior and interaction” in Understanding Conflict and War: The Conflict Helix, New York, NY, USA: Wiley, 1976.

79. M. S. Ryoo and L. Matthies, “First-person activity recognition: What are they doing to me?”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 2730-2737, 2013.

80. M. S. Ryoo, B. Rothrock and L. Matthies, “Pooled motion features for first-person videos”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 896-904, 2015.

81. M. Schinas, S. Papadopoulos, Y. Kompatsiaris and P. A. Mitkas, “Visual event summarization on social media using topic modelling and graph-based ranking algorithms”, Proc. 5th ACM Int. Conf. Multimedia Inf. Retrieval, pp. 203-210, 2015.

82. F. Setti, C. Russell, C. Bassetti and M. Cristani, “F-formation detection: Individuating free-standing conversational groups in images”, PLoS One, vol. 10, 2015.

83. A. F. Smeaton, P. Over and A. R. Doherty, “Video shot boundary detection: Seven years of TRECVid activity”, Comput. Vision Image Understanding, vol. 114, no. 4, pp. 411-418, 2010.

84. S. Song, V. Chandrasekhar, N.-M. Cheung, S. Narayan, L. Li and J.-H. Lim, “Activity recognition in egocentric life-logging videos”, Proc. Asian Conf. Comput. Vision Workshops, pp. 445-458, 2014.

85. H. S. Park and J. Shi, “Social saliency prediction”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 4777-4785, 2015.

86. E. H. Spriggs, F. De La Torre and M. Hebert, “Temporal segmentation and activity classification from first-person sensing”, Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. Workshops, pp. 17-24, 2009.

87. S. Sundaram and W. W. Mayol-Cuevas, “Egocentric visual event classification with location-based priors”, Proc. 6th Int. Conf. Adv. Visual Comput., pp. 596-605, 2010.

88. E. Talavera, M. Dimiccoli, M. Bolaños, M. Aghaei and P. Radeva, “R-clustering for egocentric video segmentation” in Pattern Recognition and Image Analysis, Berlin, Germany: Springer, pp. 327-336, 2015.

89. C. Tan, H. Goh, V. Chandrasekhar, L. Li and J.-H. Lim, “Understanding the nature of first-person videos: Characterization and classification using low-level features”, Proc. IEEE Conf. Comput. Vision Pattern Recognit. Workshops, pp. 549-556, 2014.

90. B. T. Truong and S. Venkatesh, “Video abstraction: A systematic review and classification”, ACM Trans. Multimedia Comput. Commun. Appl., vol. 3, 2007.

91. P. Varini, G. Serra and R. Cucchiara, “Personalized egocentric video summarization for cultural experience”, Proc. ACM Int. Conf. Multimedia Inf. Retrieval, pp. 539-542, 2015.

92. S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell and K. Saenko, “Sequence to sequence – video to text”, Proc. IEEE Int. Conf. Comput. Vision, pp. 4534-4542, 2015.

93. P. Wang and A. F. Smeaton, “Semantics-based selection of everyday concepts in visual lifelogging”, Int. J. Multimedia Inf. Retrieval, vol. 1, no. 2, pp. 87-101, 2012.

94. Z. Wang, M. D. Hoffman, P. R. Cook and K. Li, “Vferret: Content-based similarity search tool for continuous archived video”, Proc. ACM Workshop Continuous Archival Retrieval Pers. Experiences, pp. 19-26, 2006.

95. H. Wannous, V. Dovgalecs, R. Mégret and M. Daoudi, Place Recognition Via 3D Modeling for Personal Activity Lifelog Using Wearable Camera, New York, NY, USA: Springer, 2012.

96. B. Xiong and K. Grauman, “Detecting snap points in egocentric video with a web photo prior”, Proc. Eur. Conf. Comput. Vision, pp. 282-298, 2014.

97. Y. Yan, E. Ricci, G. Liu and N. Sebe, “Recognizing daily activities from first-person videos with multi-task clustering”, Proc. 12th Asian Conf. Comput. Vision, pp. 522-537, 2014.

98. J. Yang, B. Price, S. Cohen and M.-H. Yang, “Context driven scene parsing with attention to rare classes”, Proc. IEEE Conf. Comput. Vision Pattern Recognit., pp. 3294-3301, 2014.

99. L. Yao et al., “Describing videos by exploiting temporal structure”, Proc. IEEE Int. Conf. Comput. Vision, 2015.

100. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba and A. Oliva, “Learning deep features for scene recognition using Places database”, Proc. Adv. Neural Inf. Process. Syst. Conf., pp. 487-495, 2014.