In this paper, we propose to detect an action as soon as possible, ideally before it is fully completed. The objective is to support the monitoring of surveillance videos in order to prevent criminal or terrorist attacks. In such a scenario, it is important to achieve not only high detection and recognition rates but also low detection latency. Our solution is an adaptive sliding-window approach that operates online and efficiently rejects irrelevant data. Furthermore, we exploit both spatial and temporal information by constructing feature vectors from temporal blocks. For added efficiency, only partial action templates are considered for detection. The relationship between template size and latency is evaluated experimentally. We present promising preliminary results on Motion Capture data using a skeleton representation of the human body.
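To make the ideas in the abstract more concrete, here is a minimal Python sketch of two of its ingredients: block-based feature vectors that combine a spatial cue (mean pose per temporal block) with a temporal cue (mean frame-to-frame joint displacement per block), and online matching of a sliding window against only the first part of an action template. This is not the authors' implementation. The feature design, the Euclidean distance, the fixed window length (the paper's window is adaptive), the early-abandoning rejection rule, and all names (`block_features`, `detect_online`, `fraction`, `threshold`) are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical frame format: flattened (x, y, z) coordinates of all skeleton joints.
Frame = Sequence[float]


def block_features(frames: Sequence[Frame], block_size: int) -> List[List[float]]:
    """Split frames into consecutive non-overlapping temporal blocks and describe
    each block by its mean pose (spatial cue) concatenated with its mean
    frame-to-frame displacement (temporal cue)."""
    feats = []
    for start in range(0, len(frames) - block_size + 1, block_size):
        block = frames[start:start + block_size]
        dim = len(block[0])
        mean_pose = [sum(f[d] for f in block) / block_size for d in range(dim)]
        if block_size > 1:
            mean_motion = [
                sum(block[t + 1][d] - block[t][d] for t in range(block_size - 1)) / (block_size - 1)
                for d in range(dim)
            ]
        else:
            mean_motion = [0.0] * dim
        feats.append(mean_pose + mean_motion)
    return feats


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_online(stream: Sequence[Frame], template: Sequence[Frame],
                  fraction: float, block_size: int, threshold: float) -> Optional[int]:
    """Return the index of the frame at which the action is flagged, or None.

    Only the first `fraction` of the template (rounded down to whole blocks)
    is matched, so a detection can fire before the action has finished."""
    if len(template) < block_size:
        raise ValueError("template must contain at least one full block")
    n = max(block_size, int(len(template) * fraction) // block_size * block_size)
    tmpl = block_features(template[:n], block_size)
    budget = threshold * len(tmpl)  # mean block distance <= threshold  <=>  sum <= budget
    for end in range(n, len(stream) + 1):  # each new frame slides the window by one
        window = stream[end - n:end]
        total = 0.0
        for b, t in enumerate(tmpl):
            f = block_features(window[b * block_size:(b + 1) * block_size], block_size)[0]
            total += distance(f, t)
            if total > budget:  # early abandoning: this window can no longer match
                break
        else:
            return end - 1  # no break: every block matched within the budget
    return None
```

As a toy usage example, take a one-joint, one-dimensional "action" that moves linearly over 8 frames, preceded by 5 idle frames. With `template = [[float(i)] for i in range(8)]` and `stream = [[0.0]] * 5 + template`, calling `detect_online(stream, template, 0.5, 2, 0.1)` flags the action at frame 8, whereas matching the full template (`fraction=1.0`) only fires at frame 12, when the action is complete. This mirrors, in a very simplified form, the trade-off between template size and latency that the abstract says is evaluated experimentally.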
Disciplines :
Computer science
Author, co-author :
Baptista, Renato ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Antunes, Michel
Aouada, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Ottersten, Björn ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors :
yes
Language :
English
Title :
Anticipating Suspicious Actions using a Small Dataset of Action Templates
Publication date :
January 2018
Event name :
13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP)
Event organizer :
http://visapp.visigrapp.org/
Event place :
Madeira, Portugal
Event date :
from 27-01-2018 to 29-01-2018
Audience :
International
Main work title :
13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP)
Peer reviewed :
Peer reviewed
European Projects :
H2020 - 689947 - STARR - Decision SupporT and self-mAnagement system for stRoke survivoRs
FnR Project :
FNR10415355 - 3D Action Recognition Using Refinement And Invariance Strategies For Reliable Surveillance, 2015 (01/06/2016-31/05/2019) - Bjorn Ottersten