Evaluating the Robustness of Test Selection Methods for Deep Neural Networks
Hu, Qiang; et al. Preprint (2023).

Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability
Gubri, Martin; Cordy, Maxime; Le Traon, Yves. E-print/Working paper (2023).
Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability increases substantially when the training of the surrogate model is stopped early. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit; hence, an early-stopped model is more robust (and thus a better surrogate) than a fully trained one. We demonstrate that the reasons why early stopping improves transferability lie in its side effects on the learning dynamics of the model. We first show that early stopping benefits transferability even for models that learn from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also when the sharpness of the loss drops significantly. This leads us to propose RFN, a new approach that minimizes loss sharpness during training in order to maximize transferability.
We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 percentage points of transferability rate) and is competitive with, if not better than, strong state-of-the-art baselines.

On the empirical effectiveness of unrealistic adversarial hardening against realistic adversarial attacks
Dyrmishi, Salijona; Ghamizi, Salah; Simonetto, Thibault Jean Angel; et al. In: Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP) (2023).
While the literature on security attacks and defenses of Machine Learning (ML) systems mostly focuses on unrealistic adversarial examples, recent research has raised concern about the under-explored field of realistic adversarial attacks and their implications for the robustness of real-world systems. Our paper paves the way for a better understanding of adversarial robustness against realistic attacks and makes two major contributions. First, we conduct a study on three real-world use cases (text classification, botnet detection, malware detection) and five datasets in order to evaluate whether unrealistic adversarial examples can be used to protect models against realistic examples. Our results reveal discrepancies across the use cases, where unrealistic examples can either be as effective as the realistic ones or offer only limited improvement. Second, to explain these results, we analyze the latent representations of the adversarial examples generated with realistic and unrealistic attacks. We shed light on the patterns that discriminate which unrealistic examples can be used for effective hardening.
We release our code, datasets and models to support future research in exploring how to reduce the gap between unrealistic and realistic adversarial attacks.

How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
Dyrmishi, Salijona; Ghamizi, Salah; Cordy, Maxime. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (2023).
Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions. However, evaluations of these attacks ignore the property of imperceptibility or study it under limited settings. This implies that the adversarial perturbations would not pass any human quality gate and do not represent real threats to human-checked NLP systems. To bypass this limitation and enable proper assessment (and later, improvement) of NLP model robustness, we surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods. Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved. This contrasts with previous smaller-scale human studies, which reported overly optimistic conclusions regarding attack success. Through our work, we hope to position human perceptibility as a first-class success criterion for text attacks, and to provide guidance for research to build effective attack algorithms and, in turn, design appropriate defence mechanisms.
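Word-based attacks of the kind evaluated above typically perform greedy synonym substitution against a victim model. As a rough illustration only, the sketch below runs such a greedy search against a toy scoring function; the synonym table, the "classifier", and all thresholds are invented for this example and are not the attacks studied in the paper.

```python
# Illustrative greedy word-substitution attack. The classifier, synonym
# table, and scoring below are hypothetical stand-ins for a real NLP model.

SYNONYMS = {
    "good": ["fine", "decent"],
    "movie": ["film", "picture"],
    "terrible": ["awful", "dreadful"],
}

def toy_score(words):
    """Hypothetical sentiment score: fraction of 'positive' lexicon words."""
    positive = {"good", "fine", "great"}
    return sum(w in positive for w in words) / max(len(words), 1)

def greedy_word_swap(words, target_below=0.2, max_swaps=2):
    """Swap one word at a time, keeping the swap that lowers the score most."""
    words = list(words)
    for _ in range(max_swaps):
        best = (toy_score(words), None, None)
        for i, w in enumerate(words):
            for cand in SYNONYMS.get(w, []):
                trial = words[:i] + [cand] + words[i + 1:]
                s = toy_score(trial)
                if s < best[0]:
                    best = (s, i, cand)
        if best[1] is None:          # no swap improves the attack objective
            break
        words[best[1]] = best[2]
        if best[0] < target_below:   # score pushed below the decision point
            break
    return words
```

On the toy data, `greedy_word_swap(["good", "movie"])` replaces "good" with "decent", dropping the score from 0.5 to 0.0 with a single word change, which is exactly the kind of minimal perturbation whose human perceptibility the paper measures.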
Faster and Cheaper Energy Demand Forecasting at Scale
Bernier, Fabien; Jimenez, Matthieu; Cordy, Maxime; et al. In: Has it Trained Yet? Workshop at the Conference on Neural Information Processing Systems (2022, December 02).
Energy demand forecasting is one of the most challenging tasks for grid operators. Many approaches have been suggested over the years to tackle it. Yet, these still remain too expensive to train in terms of both time and computational resources, hindering their adoption as customer behaviors continuously evolve. We introduce Transplit, a new lightweight transformer-based model, which significantly decreases this cost by exploiting the seasonality property and learning typical days of power demand. We show that Transplit can be run efficiently on CPU and is several hundred times faster than state-of-the-art predictive models, while performing as well.

What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness
Haben, Guillaume; Sohn, Jeongju; et al. (2022, October).
Flaky tests are defined as tests that manifest non-deterministic behaviour by passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and break their trust in regression testing.
To mitigate the effects of flakiness, both researchers and industry experts have proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed, as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism to preserve code quality, i.e., to avoid introducing technical debt linked to non-deterministic behaviour, and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions in 26% and 47% of the cases, respectively. Overall, we successfully reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective for major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.
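The SBFL technique named in the abstract scores components by how strongly their coverage correlates with failing runs. The sketch below implements one common suspiciousness formula (Ochiai) on made-up coverage data; it is a minimal illustration, not the paper's full pipeline, which additionally folds in change history and static code metrics.

```python
import math

# Minimal Spectrum-Based Fault Localisation sketch using the Ochiai formula.
# The coverage matrix and test outcomes are toy data invented for this example.

def ochiai(ef, ep, total_failed):
    """Suspiciousness = ef / sqrt(total_failed * (ef + ep))."""
    if ef == 0:
        return 0.0
    return ef / math.sqrt(total_failed * (ef + ep))

def rank_components(coverage, outcomes):
    """coverage[test] = set of covered components; outcomes[test] = 'pass'/'fail'.

    Returns components sorted by decreasing suspiciousness."""
    total_failed = sum(1 for o in outcomes.values() if o == "fail")
    components = set().union(*coverage.values())
    scores = {}
    for c in components:
        ef = sum(1 for t, cov in coverage.items()
                 if c in cov and outcomes[t] == "fail")   # covered by failing runs
        ep = sum(1 for t, cov in coverage.items()
                 if c in cov and outcomes[t] == "pass")   # covered by passing runs
        scores[c] = ochiai(ef, ep, total_failed)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For flaky tests, "fail" would be the intermittent failing executions of the same test; the class covered only by those executions rises to the top of the ranking.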
Variability-Driven Design Configurator of Space Systems to Support Decision-Makers
Rana, Loveneesh; Lazreg, Sami; et al. Scientific Conference (2022, October).

Learning from what we know: How to perform vulnerability prediction using noisy historical data
Garg, Aayush; Degiovanni, Renzo Gaston; Jimenez, Matthieu; et al. In: Empirical Software Engineering (2022).

GraphCode2Vec: generic code embedding via lexical and program dependence analyses
Ma, Wei; Soremekun, Ezekiel; et al. In: Proceedings of the 19th International Conference on Mining Software Repositories (2022, May 22).
Code embedding is a keystone in the application of machine learning to several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is generic. To this end, we propose the first self-supervised pre-training approach (called GraphCode2Vec) which produces task-agnostic embeddings of lexical and program dependence features. GraphCode2Vec achieves this via a synergistic combination of code analysis and Graph Neural Networks. GraphCode2Vec is generic, it allows pre-training, and it is applicable to several SE downstream tasks. We evaluate the effectiveness of GraphCode2Vec on four tasks (method name prediction, solution classification, mutation testing and overfitted patch classification), and compare it with four similarly generic code embedding baselines (Code2Seq, Code2Vec, CodeBERT, GraphCodeBERT) and seven task-specific, learning-based methods. In particular, GraphCode2Vec is more effective than both the generic and the task-specific learning-based baselines.
It is also complementary to and comparable with GraphCodeBERT (a larger and more complex model). We also demonstrate, through a probing and ablation study, that GraphCode2Vec learns lexical and program dependence features and that self-supervised pre-training improves effectiveness.

iBiR: Bug Report driven Fault Injection
Khanfir, Ahmed; Papadakis, Mike; et al. In: ACM Transactions on Software Engineering and Methodology (2022).

A Qualitative Study on the Sources, Impacts, and Mitigation Strategies of Flaky Tests
Habchi, Sarra; Haben, Guillaume; Papadakis, Mike; et al. (2022, April).
Test flakiness is a major testing concern. Flaky tests manifest non-deterministic outcomes that cripple continuous integration and lead developers to investigate false alerts. Industrial reports indicate that, at a large scale, the accrual of flaky tests breaks trust in test suites and entails significant computational cost. To alleviate this, practitioners are constrained to identify flaky tests and investigate their impact. To shed light on such mitigation mechanisms, we interview 14 practitioners with the aim of identifying (i) the sources of flakiness within the testing ecosystem, (ii) the impacts of flakiness, (iii) the measures adopted by practitioners when addressing flakiness, and (iv) the automation opportunities for these measures. Our analysis shows that, besides the tests and code, flakiness stems from interactions between the system components, the testing infrastructure, and external factors.
We also highlight the impact of flakiness on testing practices and product quality, and show that the adoption of guidelines together with a stable infrastructure are key measures for mitigating the problem.

Towards Generalizable Machine Learning for Chest X-ray Diagnosis with Multi-task Learning
Ghamizi, Salah; Garcia Santa Cruz, Beatriz; et al. E-print/Working paper (2022).
Clinicians use chest radiography (CXR) to diagnose common pathologies. Automated classification of these diseases can expedite analysis workflows, scale to growing numbers of patients and reduce healthcare costs. While research has produced classification models that perform well on a given dataset, the same models lack generalization to different datasets. This reduces confidence that these models can be reliably deployed across various clinical settings. We propose an approach based on multi-task learning to improve model generalization. We demonstrate that learning a (main) pathology together with an auxiliary pathology can significantly impact generalization performance (between -10% and +15% AUC-ROC). A careful choice of auxiliary pathology even yields performance competitive with state-of-the-art models that rely on fine-tuning or ensemble learning, using between 6% and 34% of the training data that those models required. We further provide a method to determine the best auxiliary task to choose without access to the target dataset. Ultimately, our work is a significant step towards the creation of CXR diagnosis models applicable in the real world, through the evidence that multi-task learning can drastically improve generalization.
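The idea of training a main task jointly with an auxiliary task can be pictured with a tiny shared encoder and two prediction heads trained on a weighted joint loss. Everything below (the data, dimensions, and the auxiliary weight) is invented for illustration; the paper works with deep CXR classifiers, not this toy logistic model.

```python
import numpy as np

# Toy multi-task training sketch: one shared linear representation with two
# logistic heads (main + auxiliary label), optimized on a joint loss
# L = L_main + aux_weight * L_aux. All data and settings are hypothetical.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_multitask(X, y_main, y_aux, aux_weight=0.5, lr=0.1, epochs=200):
    n, d = X.shape
    w_shared = np.zeros(d)            # parameters shared by both tasks
    b_main, b_aux = 0.0, 0.0          # task-specific heads (biases only here)
    for _ in range(epochs):
        z = X @ w_shared
        p_main = sigmoid(z + b_main)
        p_aux = sigmoid(z + b_aux)
        # gradients of the weighted sum of binary cross-entropy losses
        g_main = p_main - y_main
        g_aux = aux_weight * (p_aux - y_aux)
        w_shared -= lr * (X.T @ (g_main + g_aux)) / n
        b_main -= lr * g_main.mean()
        b_aux -= lr * g_aux.mean()
    return w_shared, b_main

# Hypothetical data where the auxiliary label correlates with the main one,
# mimicking a related auxiliary pathology.
X = rng.normal(size=(200, 5))
y_main = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)
y_aux = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(float)
w, b = train_multitask(X, y_main, y_aux)
acc = ((sigmoid(X @ w + b) > 0.5) == (y_main > 0.5)).mean()
```

The auxiliary gradient pushes the shared weights toward features useful for both labels, which is the mechanism the paper leverages for generalization.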
Variability-Aware Design of Space Systems: Variability Modelling, Configuration Workflow and Research Directions
Lazreg, Sami; Rana, Loveneesh; et al. In: Proceedings of VAMOS 22 (2022, February).

Efficient and Transferable Adversarial Examples from Bayesian Neural Networks
Gubri, Martin; Cordy, Maxime; Papadakis, Mike; et al. In: The 38th Conference on Uncertainty in Artificial Intelligence (2022).
An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (by up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach a 94% success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves higher transferability than three test-time techniques designed for this purpose 87.5% of the time. Our work demonstrates that the way a surrogate is trained has been overlooked, although it is an important element of transfer-based attacks.
We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work.

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
Gubri, Martin; Cordy, Maxime; Papadakis, Mike; et al. In: Computer Vision -- ECCV 2022 (2022).
We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble within this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space in explaining the transferability of adversarial examples.

A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space
Simonetto, Thibault Jean Angel; Dyrmishi, Salijona; Ghamizi, Salah; et al. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22 (2022).
The generation of feasible adversarial examples is necessary for properly assessing models that work in a constrained feature space.
However, it remains challenging to enforce constraints in attacks that were designed for computer vision. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework can handle both linear and non-linear constraints. We instantiate our framework into two algorithms: a gradient-based attack that introduces constraints in the loss function to maximize, and a multi-objective search algorithm that aims for misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective in four different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose introducing engineered non-convex constraints to improve model adversarial robustness. We demonstrate that this new defense is as effective as adversarial retraining. Our framework forms the starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can exploit.
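A minimal way to picture constraint-aware attacks is to follow each perturbation step with a repair step that restores feasibility. The sketch below projects an FGSM-style step back onto a single linear constraint; the constraint, the gradient, and the step size are all hypothetical, and the paper's framework additionally handles non-linear constraints and a multi-objective search.

```python
import numpy as np

# Illustrative "perturb then repair" step for a constrained feature space.
# Here feasibility means one linear constraint a.x <= b; everything numeric
# below is invented for this example.

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {z : a.z <= b}."""
    viol = a @ x - b
    if viol <= 0:
        return x                        # already feasible
    return x - viol * a / (a @ a)       # move back along the constraint normal

def constrained_step(x, grad, a, b, step=0.1):
    """Take a gradient-sign step that increases the attack loss, then repair."""
    x_adv = x + step * np.sign(grad)    # FGSM-style step (assumed loss gradient)
    return project_halfspace(x_adv, a, b)

# Toy domain constraint: the feature vector must satisfy x0 + x1 <= 1.
a = np.array([1.0, 1.0])
b = 1.0
x = np.array([0.4, 0.4])
grad = np.array([1.0, 1.0])             # hypothetical gradient of the loss
x_adv = constrained_step(x, grad, a, b, step=0.3)
feasible = a @ x_adv <= b + 1e-9
```

The unconstrained step would land at [0.7, 0.7], violating the constraint; the projection returns [0.5, 0.5], which still moves in the adversarial direction but stays feasible, the core requirement stated in the abstract.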
On Evaluating Adversarial Robustness of Chest X-ray Classification: Pitfalls and Best Practices
Ghamizi, Salah; Cordy, Maxime; Papadakis, Mike; et al. In: The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23), SafeAI Workshop, Washington, D.C., Feb 13-14, 2023.

An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement
Hu, Qiang; Guo, Yuejun; Cordy, Maxime; et al. In: ACM Transactions on Software Engineering and Methodology (2022).
Similar to traditional software that is constantly under evolution, deep neural networks (DNNs) need to evolve upon the rapid growth of test data for continuous enhancement, e.g., adapting to distribution shift in a new environment for deployment. However, it is labor-intensive to manually label all the collected test data. Test selection solves this problem by strategically choosing a small set to label. By retraining with the selected set, DNNs can achieve competitive accuracy. Unfortunately, existing selection metrics suffer from three main limitations: 1) they rely on different retraining processes; 2) they ignore data distribution shifts; 3) they are insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Based on our findings, we then propose a novel distribution-aware test (DAT) selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. None of the selection metrics performs best under all data distributions.
By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to 5 times and up to 30.09% accuracy improvement for model enhancement on simulated and in-the-wild distribution shift scenarios, respectively.

Adversarial Robustness in Multi-Task Learning: Promises and Illusions
Ghamizi, Salah; Cordy, Maxime; Papadakis, Mike; et al. In: Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) (2022).
Vulnerability to adversarial attacks is a well-known weakness of deep neural networks. While most studies focus on single-task neural networks with computer vision datasets, very little research has considered the complex multi-task models that are common in real applications. In this paper, we evaluate the design choices that impact the robustness of multi-task deep learning networks. We provide evidence that blindly adding auxiliary tasks, or weighing the tasks, provides a false sense of robustness. Thereby, we tone down the claim made by previous research and study the different factors that may affect robustness. In particular, we show that the choice of the tasks to incorporate in the loss function is an important factor that can be leveraged to yield more robust models.

Multi-agent deep reinforcement learning based Predictive Maintenance on parallel machines
Ruiz Rodriguez, Marcelo Luis; Kubler, Sylvain; et al. In: Robotics and Computer-Integrated Manufacturing (2022).
In the context of Industry 4.0, companies understand the advantages of performing Predictive Maintenance (PdM). However, when moving towards PdM, several considerations must be carefully examined. First, they need a sufficient number of production machines and related fault data to generate maintenance predictions. Second, they need to adopt the right maintenance approach, which, ideally, should self-adapt to the machinery, the priorities of the organization and the technicians' skills, and should be able to deal with uncertainty. Reinforcement learning (RL) is envisioned as a key technique in this regard due to its inherent ability to learn by interacting through trial and error, but very few RL-based maintenance frameworks have been proposed so far in the literature, and those that exist are limited in several respects. This paper proposes a new multi-agent approach that learns a maintenance policy performed by technicians under the uncertainty of multiple machine failures. The approach comprises RL agents that partially observe the state of each machine to coordinate decision-making in maintenance scheduling, resulting in the dynamic assignment of maintenance tasks to technicians (with different skills) over a set of machines. Experimental evaluation shows that our RL-based maintenance policy outperforms traditional maintenance policies (including corrective and preventive ones) in terms of failure prevention and downtime, improving the overall performance by approximately 75%.
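The RL formulation can be pictured with a toy single-machine example: tabular Q-learning over wear levels, trading production reward against maintenance and downtime costs. All states, costs, and hyperparameters below are invented for illustration; the paper's approach uses multiple deep RL agents coordinating technicians with different skills over parallel machines.

```python
import random

# Toy tabular Q-learning sketch for maintenance scheduling on one machine.
# States are wear levels 0..3 (3 = failed); actions are "run" or "maintain".
# Costs and transition rules are hypothetical stand-ins for the real problem.

random.seed(0)
ACTIONS = ("run", "maintain")

def step(state, action):
    """Return (next_state, reward). Maintaining resets wear; running wears."""
    if action == "maintain":
        return 0, -2.0                      # maintenance cost
    if state >= 3:
        return 3, -10.0                     # failed machine: heavy downtime cost
    nxt = min(state + (1 if random.random() < 0.7 else 0), 3)
    return nxt, 1.0                         # production reward while running

def q_learning(episodes=2000, alpha=0.2, gamma=0.9, eps=0.1):
    Q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(20):                 # finite horizon per episode
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda act: Q[(s, act)]))
            s2, r = step(s, a)
            best_next = max(Q[(s2, act)] for act in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)}
```

The learned greedy policy keeps a fresh machine producing and maintains a failed one, the same failure-prevention versus downtime trade-off the paper optimizes at the scale of technicians and machine fleets.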