A survey on deep learning-based monocular spacecraft pose estimation: Current state, limitations and prospects. Pauly, Leo; Rharbaoui, Wassim; Shneider, Carl et al. In Acta Astronautica (2023), 212.
Estimating the pose of an uncooperative spacecraft is an important computer vision problem for enabling the deployment of automatic vision-based systems in orbit, with applications ranging from on-orbit servicing to space debris removal. Following the general trend in computer vision, more and more works have been focusing on leveraging Deep Learning (DL) methods to address this problem. However, despite promising research-stage results, major challenges preventing the use of such methods in real-life missions still stand in the way. In particular, the deployment of such computation-intensive algorithms is still under-investigated, while the performance drop observed when training on synthetic images and testing on real ones remains to be mitigated. The primary goal of this survey is to describe the current DL-based methods for spacecraft pose estimation in a comprehensive manner. The secondary goal is to help define the limitations towards the effective deployment of DL-based spacecraft pose estimation solutions for reliable autonomous vision-based applications. To this end, the survey first summarises the existing algorithms according to two approaches: hybrid modular pipelines and direct end-to-end regression methods. A comparison of algorithms is presented not only in terms of pose accuracy but also with a focus on network architectures and model sizes, keeping potential deployment in mind. Then, current monocular spacecraft pose estimation datasets used to train and test these methods are discussed.
The data generation methods (simulators and testbeds), the domain gap and the resulting performance drop between synthetically generated and lab/space-collected images, and the potential solutions are also discussed. Finally, the paper presents open research questions and future directions in the field, drawing parallels with other computer vision applications.

SHARP Challenge 2023: Solving CAD History and pArameters Recovery from Point clouds and 3D scans. Overview, Datasets, Metrics, and Baselines. Mallis, Dimitrios; Ali, Sk Aziz; Dupont, Elona et al. In International Conference on Computer Vision Workshops (2023, October 03).
Recent breakthroughs in geometric Deep Learning (DL) and the availability of large Computer-Aided Design (CAD) datasets have advanced the research on learning CAD modeling processes and relating them to real objects. In this context, 3D reverse engineering of CAD models from 3D scans is considered one of the most sought-after goals for the CAD industry. However, recent efforts assume multiple simplifications that limit applications in real-world settings. The SHARP Challenge 2023 aims at pushing the research a step closer to the real-world scenario of CAD reverse engineering from 3D scans through dedicated datasets and tracks. In this paper, we define the proposed SHARP 2023 tracks, describe the provided datasets, and propose a set of baseline methods along with suitable evaluation metrics to assess the performance of the track solutions. All proposed datasets, together with useful routines and the evaluation metrics, are publicly available.
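Evaluation of scan-to-CAD reconstructions typically builds on nearest-neighbour distances between a predicted surface and the ground-truth scan. The following is a minimal NumPy sketch of the symmetric Chamfer distance; it is an illustrative stand-in, not the official SHARP evaluation metric, and the array shapes are assumptions.

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between two point clouds.

    p, q: arrays of shape (N, 3) and (M, 3). For each point in one
    cloud, take the squared distance to its nearest neighbour in the
    other cloud; sum the two directional averages.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = p[:, None, :] - q[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Nearest-neighbour distance in each direction, then average.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Identical clouds have zero distance.
cloud = np.random.default_rng(0).normal(size=(100, 3))
print(chamfer_distance(cloud, cloud))  # 0.0
```

For large clouds, the (N, M) distance matrix becomes the bottleneck; practical implementations replace it with a KD-tree nearest-neighbour query.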
Multi-label Deepfake Classification. Singh, Inder Pal; Mejri, Nesryne; Nguyen, Van Dat et al. In IEEE Workshop on Multimedia Signal Processing (2023, September 27).
In this paper, we investigate the suitability of current multi-label classification approaches for deepfake detection. With the recent advances in generative modeling, new deepfake detection methods have been proposed. Nevertheless, they mostly formulate this topic as a binary classification problem, resulting in poor explainability capabilities. Indeed, a forged image might be induced by multi-step manipulations with different properties. For a better interpretability of the results, recognizing the nature of these stacked manipulations is highly relevant. For that reason, we propose to model deepfake detection as a multi-label classification task, where each label corresponds to a specific kind of manipulation. In this context, state-of-the-art multi-label image classification methods are considered. Extensive experiments are performed to assess the practical use case of deepfake detection.

Impact of Disentanglement on Pruning Neural Networks. Shneider, Carl; Rostami Abendansari, Peyman; Kacem, Anis et al. Scientific Conference (2023, July 19).
Deploying deep learning neural networks on edge devices, to accomplish task-specific objectives in the real world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression.
Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because they mainly retain task-specific information, discarding information that is useless for the task at hand. We make use of the Beta-VAE framework combined with a standard pruning criterion to investigate the impact of forcing the network to learn disentangled representations on the pruning process for the task of classification. In particular, we perform experiments on the MNIST and CIFAR10 datasets, examine disentanglement challenges, and propose a path forward for future works.

Impact of Disentanglement on Pruning Neural Networks. Shneider, Carl; Rostami Abendansari, Peyman; Kacem, Anis et al. Poster (2023, June 20).
Efficient model compression techniques are required to deploy deep neural networks (DNNs) on edge devices for task-specific objectives. A variational autoencoder (VAE) framework is combined with a pruning criterion to investigate the impact of having the network learn disentangled representations on the pruning process for the classification task.

Space Debris: Are Deep Learning-based Image Enhancements part of the Solution? Jamrozik, Michele Lynn; Gaudilliere, Vincent; Mohamed Ali, Mohamed Adel et al. In Proceedings of the International Symposium on Computational Sensing (2023).
The volume of space debris currently orbiting the Earth is reaching an unsustainable level at an accelerated pace.
The detection, tracking, identification, and differentiation between orbit-defined, registered spacecraft and rogue/inactive space “objects” is critical to asset protection. The primary objective of this work is to investigate the validity of Deep Neural Network (DNN) solutions to overcome the limitations and image artefacts most prevalent when images are captured with monocular cameras in the visible light spectrum. In this work, a hybrid UNet-ResNet34 Deep Learning (DL) architecture, pre-trained on the ImageNet dataset, is developed. Image degradations addressed include blurring, exposure issues, poor contrast, and noise. The shortage of space-generated data suitable for supervised DL is also addressed. A visual comparison between the URes34P model developed in this work and the existing state of the art in deep learning image enhancement methods relevant to images captured in space is presented. Based upon visual inspection, it is determined that our UNet model is capable of correcting for space-related image degradations and merits further investigation to reduce its computational complexity.

Compression of Deep Neural Networks for Space Autonomous Systems. Shneider, Carl; Sinha, Nilotpal; Jamrozik, Michele Lynn et al. Poster (2023, April 19).
Efficient compression techniques are required to deploy deep neural networks (DNNs) on edge devices for space resource utilization tasks. Two approaches are investigated.

You Can Dance! Generating Music-Conditioned Dances on Real 3D Scans. Dupont, Elona; Singh, Inder Pal; et al. Scientific Conference (2023).

UNTAG: Learning Generic Features for Unsupervised Type-Agnostic Deepfake Detection. Mejri, Nesryne; Ghorbel, Enjie; Aouada, Djamila et al. In IEEE International Conference on Acoustics, Speech and Signal Processing.
Proceedings (2023).
This paper introduces a novel framework for unsupervised type-agnostic deepfake detection called UNTAG. Existing methods are generally trained in a supervised manner at the classification level, focusing on detecting at most two types of forgeries, thus limiting their generalization capability across different deepfake types. To handle that, we reformulate the deepfake detection problem as a one-class classification supported by a self-supervision mechanism. Our intuition is that by estimating the distribution of real data in a discriminative feature space, deepfakes can be detected as outliers regardless of their type. UNTAG involves two sequential steps. First, deep representations are learned based on a self-supervised pretext task focusing on manipulated regions. Second, a one-class classifier fitted on authentic image embeddings is used to detect deepfakes. The results reported on several datasets show the effectiveness of UNTAG and the relevance of the proposed new paradigm. The code is publicly available.

CalcGraph: taming the high costs of deep learning using models. Lorentz, Joe; Hartmann, Thomas; Moawad, Assaad et al. In Software and Systems Modeling (2022).
Models based on differential programming, like deep neural networks, are well established in research and able to outperform manually coded counterparts in many applications. Today, there is rising interest in introducing this flexible modeling to solve real-world problems.
A major challenge when moving from research to application is the strict constraints on computational resources (memory and time). It is difficult to determine and contain the resource requirements of differential models, especially during the early training and hyperparameter exploration stages. In this article, we address this challenge by introducing CalcGraph, a model abstraction of differentiable programming layers. CalcGraph allows modeling the computational resources that should be used, and CalcGraph’s model interpreter can then automatically schedule the execution respecting the specifications made. We propose a novel way to efficiently switch models from storage to preallocated memory zones and vice versa in order to maximize the number of model executions given the available resources. We demonstrate the efficiency of our approach by showing that it consumes fewer resources than state-of-the-art frameworks like TensorFlow and PyTorch for both single-model and multi-model execution.

Pose Estimation of a Known Texture-Less Space Target using Convolutional Neural Networks. Rathinam, Arunkumar; Gaudilliere, Vincent; Pauly, Leo et al. In 73rd International Astronautical Congress, Paris, 18-22 September 2022 (2022, September).
Orbital debris removal and On-orbit Servicing, Assembly and Manufacturing (OSAM) are the main areas for future robotic space missions. To achieve intelligence and autonomy in these missions and to carry out robot operations, it is essential to have autonomous guidance and navigation, especially vision-based navigation.
With recent advances in machine learning, state-of-the-art Deep Learning (DL) approaches for object detection and camera pose estimation have advanced to be on par with classical approaches and can be used for target pose estimation during relative navigation scenarios. State-of-the-art DL-based spacecraft pose estimation approaches are suitable for any known target with significant surface textures. However, they are less applicable in scenarios where the target is a texture-less and symmetric object, like a rocket nozzle. This paper investigates a novel ellipsoid-based approach combined with convolutional neural networks for texture-less space object pose estimation. This paper also presents a dataset for a new texture-less space target, an apogee kick motor, which is used for the study. It includes synthetic images generated from a simulator developed for rendering synthetic space imagery.

Profiling the real world potential of neural network compression. Lorentz, Joe et al. In 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS), Barcelona, 1-3 August 2022 (2022, August 01).
Many real-world computer vision applications are required to run on hardware with limited computing power, often referred to as “edge devices”. The state of the art in computer vision continues towards ever bigger and deeper neural networks with equally rising computational requirements. Model compression methods promise to substantially reduce the computation time and memory demands with little to no impact on model robustness. However, evaluation of the compression is mostly based on theoretic speedups in terms of required floating-point operations.
This work offers a tool to profile the actual speedup offered by several compression algorithms. Our results show a significant discrepancy between the theoretical and actual speedup on various hardware setups. Furthermore, we show the potential of model compression and highlight the importance of selecting the right compression algorithm for a given target task and hardware. The code to reproduce our experiments is available at https://hub.datathings.com/papers/2022-coins.

Disentangled Face Identity Representations for Joint 3D Face Recognition and Neutralisation. Kacem, Anis; Aouada, Djamila et al. In 2022 8th International Conference on Virtual Reality (2022).
In this paper, we propose a new deep learning-based approach for disentangling face identity representations from expressive 3D faces. Given a 3D face, our approach not only extracts a disentangled identity representation, but also generates a realistic 3D face with a neutral expression while predicting its identity. The proposed network consists of three components: (1) a Graph Convolutional Autoencoder (GCA) to encode the 3D faces into latent representations, (2) a Generative Adversarial Network (GAN) that translates the latent representations of expressive faces into those of neutral faces, and (3) an identity recognition sub-network taking advantage of the neutralized latent representations for 3D face recognition. The whole network is trained in an end-to-end manner. Experiments are conducted on three publicly available datasets, showing the effectiveness of the proposed approach.
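The latent translation step in the entry above (expressive latent to neutral latent) can be illustrated in its simplest possible form: fitting a map between paired latent codes. This NumPy sketch uses a linear least-squares fit as a toy stand-in for the paper's GAN translator; the latent dimension and the synthetic paired data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
latent_dim = 8

# Synthetic paired latents: z_neutral is a fixed (unknown) linear
# transform of z_expressive, standing in for real encoder outputs.
true_map = rng.normal(size=(latent_dim, latent_dim))
z_expressive = rng.normal(size=(500, latent_dim))
z_neutral = z_expressive @ true_map

# Fit the translator by least squares (toy replacement for the GAN).
learned_map, *_ = np.linalg.lstsq(z_expressive, z_neutral, rcond=None)

# Translate an unseen expressive latent to its neutral counterpart.
z_test = rng.normal(size=(1, latent_dim))
error = np.max(np.abs(z_test @ learned_map - z_test @ true_map))
print(f"max translation error: {error:.2e}")
```

A linear map can only recover linear relations between the two latent spaces, which is precisely why the paper resorts to an adversarially trained translator; the sketch only shows where such a module sits in the pipeline.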
Face-GCN: A Graph Convolutional Network for 3D Dynamic Face Recognition. Kacem, Anis et al. In 2022 8th International Conference on Virtual Reality (2022).
Face recognition has significantly advanced over the past years. However, most of the proposed approaches rely on static RGB frames and on neutral facial expressions. This has two disadvantages. First, important facial shape cues are ignored. Second, facial deformations due to expressions can have an impact on the performance of such methods. In this paper, we propose a novel framework for dynamic 3D face recognition based on facial keypoints. Each dynamic sequence of facial expressions is represented as a spatio-temporal graph, which is constructed using 3D facial landmarks. Each graph node contains local shape and texture features that are extracted from its neighborhood. For the classification of face videos, a Spatio-temporal Graph Convolutional Network (ST-GCN) is used. Finally, we evaluate our approach on a challenging dynamic 3D facial expression dataset.

A Closer Look at Autoencoders for Unsupervised Anomaly Detection. Oyedotun, Oyebade; Aouada, Djamila. Poster (2022, May 22).

IML-GCN: Improved Multi-Label Graph Convolutional Network for Efficient yet Precise Image Classification. Singh, Inder Pal; Oyedotun, Oyebade; Ghorbel, Enjie et al. In AAAI-22 Workshop Program, Deep Learning on Graphs: Methods and Applications (2022, February).
In this paper, we propose the Improved Multi-Label Graph Convolutional Network (IML-GCN) as a precise and efficient framework for multi-label image classification. Although previous approaches have shown great performance, they usually make use of very large architectures. To handle this, we propose to combine the small version of a newly introduced network called TResNet with an extended version of the Multi-Label Graph Convolution Network (ML-GCN), thereby ensuring the learning of label correlations while reducing the size of the overall network. The proposed approach considers a novel image feature embedding instead of using word embeddings. In fact, the latter are learned from words and not images, making them inadequate for the task of multi-label image classification. Experimental results show that our framework competes with the state of the art on two multi-label image benchmarks in terms of both precision and memory requirements.

Multi-Label Image Classification Using Adaptive Graph Convolutional Networks (ML-AGCN). Singh, Inder Pal; Ghorbel, Enjie; Oyedotun, Oyebade et al. In IEEE International Conference on Image Processing (2022).
In this paper, a novel graph-based approach for multi-label image classification called Multi-Label Adaptive Graph Convolutional Network (ML-AGCN) is introduced. Graph-based methods have shown great potential in the field of multi-label classification. However, these approaches heuristically fix the graph topology for modeling label dependencies, which might not be optimal. To handle that, we propose to learn the topology in an end-to-end manner.
Specifically, we incorporate an attention-based mechanism for estimating the pairwise importance between graph nodes and a similarity-based mechanism for conserving the feature similarity between different nodes. This offers a more flexible way of adaptively modeling the graph. Experimental results are reported on two well-known datasets, namely MS-COCO and VG-500. Results show that ML-AGCN outperforms state-of-the-art methods while reducing the number of model parameters.

Leveraging Equivariant Features for Absolute Pose Regression. Mohamed Ali, Mohamed Adel; Gaudilliere, Vincent; Ortiz Del Castillo, Miguel et al. In IEEE Conference on Computer Vision and Pattern Recognition (2022).
Pose estimation enables vision-based systems to refer to their environment, supporting activities ranging from scene navigation to object manipulation. However, end-to-end approaches, which have achieved state-of-the-art performance in many perception tasks, are still unable to compete with 3D geometry-based methods in pose estimation. Indeed, absolute pose regression has been proven to be more related to image retrieval than to 3D structure. Our assumption is that statistical features learned by classical convolutional neural networks do not carry enough geometrical information to reliably solve this task. This paper studies the use of deep equivariant features for end-to-end pose regression. We further propose a translation- and rotation-equivariant Convolutional Neural Network whose architecture directly induces representations of camera motions into the feature space.
In the context of absolute pose regression, this geometric property allows for implicitly augmenting the training data under a whole group of image-plane-preserving transformations. Therefore, directly learning equivariant features efficiently compensates for learning intermediate representations that are indirectly equivariant yet data-intensive. Extensive experimental validation demonstrates that our lightweight model outperforms existing ones on standard datasets.

CubeSat-CDT: A Cross-Domain Dataset for 6-DoF Trajectory Estimation of a Symmetric Spacecraft. Mohamed Ali, Mohamed Adel; Rathinam, Arunkumar; Gaudilliere, Vincent et al. In Proceedings of the 17th European Conference on Computer Vision Workshops (ECCVW 2022) (2022).
This paper introduces a new cross-domain dataset, CubeSat-CDT, that includes 21 trajectories of a real CubeSat acquired in a laboratory setup, combined with 65 trajectories generated using two rendering engines, i.e. Unity and Blender. The three data sources incorporate the same 1U CubeSat and share the same camera intrinsic parameters. In addition, we conduct experiments to show the characteristics of the dataset using a novel and efficient spacecraft trajectory estimation method that leverages the information provided by the three data domains. Given a video input of a target spacecraft, the proposed end-to-end approach relies on a Temporal Convolutional Network that enforces the inter-frame coherence of the estimated 6-Degree-of-Freedom spacecraft poses. The pipeline is decomposed into two stages: first, spatial features are extracted from each frame in parallel; second, these features are lifted to the space of camera poses while preserving temporal information.
Our results highlight the importance of addressing the domain gap problem in order to propose reliable solutions for close-range autonomous relative navigation between spacecraft. Since the nature of the data used during training directly impacts the performance of the final solution, the CubeSat-CDT dataset is provided to advance research in this direction.

TSCom-Net: Coarse-to-Fine 3D Textured Shape Completion Network. Karadeniz, Ahmet Serdar; Ali, Sk Aziz; Kacem, Anis et al. In Karadeniz, Ahmet Serdar; Ali, Sk Aziz; Kacem, Anis (Eds.) et al., TSCom-Net: Coarse-to-Fine 3D Textured Shape Completion Network (2022).
Reconstructing 3D human body shapes from 3D partial textured scans remains a fundamental task for many computer vision and graphics applications, e.g. body animation and virtual dressing. We propose a new neural network architecture for 3D body shape and high-resolution texture completion, TSCom-Net, that can reconstruct the full geometry from mid-level to high-level partial input scans. We decompose the overall reconstruction task into two stages. First, a joint implicit learning network (SCom-Net and TCom-Net) takes a voxelized scan and its occupancy grid as input to reconstruct the full body shape and predict vertex textures. Second, a high-resolution texture completion network utilizes the predicted coarse vertex textures to inpaint the missing parts of the partial ‘texture atlas’. A thorough experimental evaluation on the 3DBodyTex.V2 dataset shows that our method achieves competitive results with respect to the state of the art while generalizing to different types and levels of partial shapes.
The proposed method also ranked second in Track 1 of the SHApe Recovery from Partial textured 3D scans (SHARP [37, 2]) 2022 challenge.
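The first stage of TSCom-Net consumes a voxelized scan together with its occupancy grid. As a minimal illustration of that input representation (not the paper's actual preprocessing code; the grid resolution and bounding-box normalization are assumptions), a point cloud can be converted into a binary occupancy grid as follows:

```python
import numpy as np

def occupancy_grid(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Points are normalized into the unit cube spanned by their
    axis-aligned bounding box, then binned into resolution^3 voxels;
    a voxel is 1 if at least one point falls inside it.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    # Per-axis extent; guard against degenerate (flat) dimensions.
    scale = np.where(hi > lo, hi - lo, 1.0)
    idx = ((points - lo) / scale * (resolution - 1)).astype(int)
    grid = np.zeros((resolution, resolution, resolution), dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

# A cube's 8 corners occupy 8 distinct voxels at any resolution >= 2.
corners = np.array([[x, y, z] for x in (0.0, 1.0)
                    for y in (0.0, 1.0) for z in (0.0, 1.0)])
print(occupancy_grid(corners, resolution=4).sum())  # 8
```

Implicit-surface networks of this kind then query the learned occupancy function at arbitrary 3D coordinates, so the grid only serves as the coarse conditioning input.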