Reference : VIEW-INVARIANT ACTION RECOGNITION FROM RGB DATA VIA 3D POSE ESTIMATION
Scientific congresses, symposiums and conference proceedings : Paper published in a book
Engineering, computing & technology : Computer science
Security, Reliability and Trust
http://hdl.handle.net/10993/39033
VIEW-INVARIANT ACTION RECOGNITION FROM RGB DATA VIA 3D POSE ESTIMATION
English
Baptista, Renato [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)]
Ghorbel, Enjie [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)]
Papadopoulos, Konstantinos [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)]
Demisse, Girum
Aouada, Djamila [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)]
Ottersten, Björn [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)]
May-2019
IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019
Yes
International Conference on Acoustics, Speech and Signal Processing
12-17 May 2019
IEEE
Brighton
UK
[en] Pose Estimation ; View-Invariance ; LSTM
[en] In this paper, we propose a novel view-invariant action recognition method using a single monocular RGB camera. View-invariance remains a very challenging topic in 2D action recognition because RGB images lack 3D information. Most successful approaches rely on knowledge transfer, projecting 3D synthetic data to multiple viewpoints.
Instead of relying on knowledge transfer, we propose to augment the RGB data with a third dimension by estimating 3D skeletons from 2D images using a CNN-based pose estimator. To ensure view-invariance, an alignment pre-processing step is applied, followed by data expansion as a means of denoising. Finally, a Long Short-Term Memory (LSTM) architecture is used to model the temporal dependency between skeletons. The proposed network is trained to recognize actions directly from aligned 3D skeletons. Experiments on the challenging Northwestern-UCLA dataset show that our approach outperforms state-of-the-art methods.
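The abstract does not specify how the alignment step is carried out. As an illustration only, the following minimal sketch shows one common way to make estimated 3D skeletons independent of camera yaw and position: center each skeleton on a root joint and rotate it about the vertical axis so that the hip line points along +x. The joint indices, the choice of y as the vertical axis, and the function names are assumptions made for this example, not the authors' method.

```python
import math

# Hypothetical joint indices; real pose estimators define their own layout.
ROOT, LEFT_HIP, RIGHT_HIP = 0, 1, 2


def align_skeleton(joints):
    """Center a skeleton on its root joint and rotate it about the
    vertical (y) axis so the left-to-right hip vector lies along +x.

    `joints` is a list of (x, y, z) tuples; y is assumed to be vertical.
    """
    rx, ry, rz = joints[ROOT]
    centered = [(x - rx, y - ry, z - rz) for x, y, z in joints]

    # Hip vector projected onto the horizontal (x, z) plane.
    hx = centered[RIGHT_HIP][0] - centered[LEFT_HIP][0]
    hz = centered[RIGHT_HIP][2] - centered[LEFT_HIP][2]
    theta = math.atan2(hz, hx)
    c, s = math.cos(theta), math.sin(theta)

    # Rotate every joint by -theta in the (x, z) plane.
    return [(x * c + z * s, y, -x * s + z * c) for x, y, z in centered]


def align_sequence(frames):
    """Apply the per-frame alignment to a whole action sequence."""
    return [align_skeleton(frame) for frame in frames]
```

Under these assumptions, two recordings of the same pose that differ only by a translation and a rotation about the vertical axis map to the same aligned skeleton, which is the property a sequence model such as an LSTM would then rely on. Camera tilt is not handled by this sketch.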
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SIGCOM
Fonds National de la Recherche - FnR
FnR ; FNR10415355 > Bjorn Ottersten > 3D-ACT > 3D Action Recognition Using Refinement and Invariance Strategies for Reliable Surveillance > 01/06/2016 > 31/05/2019 > 2015

File(s) associated to this reference

Fulltext file(s):

ICASSP_Baptista_toappear.pdf (author postprint, 455.44 kB, open access)

