ProPILE: Probing Privacy Leakage in Large Language Models
et al. in Advances in Neural Information Processing Systems 36 (NeurIPS 2023) (2023, December)
The rapid advancement and widespread use of large language models (LLMs) have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to make data subjects, that is, the owners of the PII, aware of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood that their PII, if included in the Pile dataset, is revealed. ProPILE can also be leveraged by LLM service providers to evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards giving data subjects awareness of, and control over, their own data on the web.

What Matters in Model Training to Transfer Adversarial Examples
Gubri, Martin
Doctoral thesis (2023)
Despite state-of-the-art performance on natural data, Deep Neural Networks (DNNs) are highly vulnerable to adversarial examples, i.e., imperceptible, carefully crafted perturbations of inputs applied at test time. Adversarial examples can transfer: an adversarial example against one model is likely to be adversarial against another independently trained model. This dissertation investigates the characteristics of the surrogate weight space that lead to the transferability of adversarial examples. Our research covers three complementary aspects of the weight space exploration: the multimodal exploration to obtain multiple models from different vicinities, the local exploration to obtain multiple models in the same vicinity, and the point selection to obtain a single transferable representation. First, from a probabilistic perspective, we argue that transferability is fundamentally related to uncertainty. The unknown weights of the target DNN can be treated as random variables. Under a specified threat model, a deep ensemble can produce a surrogate by sampling from the distribution of the target model. Unfortunately, deep ensembles are computationally expensive. We propose an efficient alternative by approximately sampling surrogate models from the posterior distribution using cSGLD, a state-of-the-art Bayesian deep learning technique. Our extensive experiments show that our approach significantly improves and complements four attacks, three transferability techniques, and five more training methods on ImageNet, CIFAR-10, and MNIST (up to 83.2 percentage points), while reducing training computations from 11.6 to 2.4 exaflops compared to a deep ensemble on ImageNet. Second, we propose transferability from Large Geometric Vicinity (LGV), a new technique based on the local exploration of the weight space. LGV starts from a pretrained model and collects multiple weights in a few additional training epochs with a constant and high learning rate.
LGV exploits two geometric properties that we relate to transferability. First, we show that LGV explores a flatter region of the weight space and generates flatter adversarial examples in the input space. We present the surrogate-target misalignment hypothesis to explain why flatness could increase transferability. Second, we show that the LGV weights span a dense weight subspace whose geometry is intrinsically connected to transferability. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established transferability techniques by 1.8 to 59.9 percentage points. Third, we investigate how to train a transferable representation, that is, a single model for transferability. First, we refute a common hypothesis from previous research to explain why early stopping improves transferability. We then establish links between transferability and the exploration dynamics of the weight space, in which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach to transferability that minimises the sharpness of the loss during training. We show that by searching for large flat neighbourhoods, RFN always improves over early stopping (by up to 47 points of success rate) and is competitive with (if not better than) strong state-of-the-art baselines. Overall, our three complementary techniques provide an extensive and practical method to obtain highly transferable adversarial examples from the multimodal and local exploration of flatter vicinities in the weight space. Our probabilistic and geometric approaches demonstrate that how the surrogate model is trained has been overlooked, although both the training noise and the flatness of the loss landscape are important elements of transfer-based attacks.
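The weight collection that this abstract attributes to LGV (a few extra epochs from a pretrained model, at a constant and high learning rate) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the thesis's implementation: the logistic-regression model, the hypothetical function names `collect_lgv_weights` and `ensemble_sign_step`, the snapshot schedule, and the choice to average input gradients over the collected weights when crafting a perturbation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def collect_lgv_weights(w_pretrained, data, lr=0.5, epochs=4, per_epoch=2, seed=0):
    """Continue SGD from pretrained weights with a constant, high learning
    rate and save a few weight snapshots per epoch (hypothetical schedule)."""
    rng = random.Random(seed)
    w = list(w_pretrained)
    snapshots = []
    save_every = max(1, len(data) // per_epoch)
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for step, (x, y) in enumerate(order, start=1):
            err = predict(w, x) - y  # d(cross-entropy)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            if step % save_every == 0:
                snapshots.append(list(w))
    return snapshots


def ensemble_sign_step(snapshots, x, y, eps=0.1):
    """One sign-gradient step on the input, using the cross-entropy input
    gradient averaged over all snapshots (an illustrative choice)."""
    grad = [0.0] * len(x)
    for w in snapshots:
        err = predict(w, x) - y
        grad = [g + err * wi / len(snapshots) for g, wi in zip(grad, w)]
    sign = lambda g: 1.0 if g > 0 else -1.0 if g < 0 else 0.0
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

In this sketch the collected snapshots act together as the surrogate; the perturbed input would then be evaluated against a separate target model, which is omitted here.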

Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability
Gubri, Martin; Cordy, Maxime; Le Traon, Yves
E-print/Working paper (2023)
Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability is substantially increased when the training of the surrogate model is stopped early. A common hypothesis to explain this is that the later training epochs are when models learn the non-robust features that adversarial attacks exploit. Under this hypothesis, an early-stopped model is more robust (and thus a better surrogate) than a fully trained one. We demonstrate that the reasons why early stopping improves transferability lie in the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also the time at which the sharpness of the loss significantly drops. This leads us to propose RFN, a new approach for transferability that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 points of transferability rate) and is competitive with (if not better than) strong state-of-the-art baselines.
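The idea of minimizing loss sharpness during training, which this abstract gives as the principle behind RFN, can be illustrated with a generic SAM-style (sharpness-aware minimization) update. This is a swapped-in, well-known technique used only as an example; the abstract does not state that RFN uses this exact update. The quadratic toy loss and the names `toy_loss`, `toy_grad`, `sam_step`, `sharpness`, `lr`, and `rho` are all assumptions.

```python
import math


def toy_loss(w, curvature=(2.0, 1.0)):
    """Toy quadratic loss: 0.5 * sum(a_i * w_i^2)."""
    return 0.5 * sum(a * wi * wi for a, wi in zip(curvature, w))


def toy_grad(w, curvature=(2.0, 1.0)):
    """Gradient of the toy quadratic loss."""
    return [a * wi for a, wi in zip(curvature, w)]


def _ascent_point(w, grad_fn, rho):
    """Move a distance rho along the normalized gradient (first-order
    approximation of the worst-case point in a ball of radius rho)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)
    return [wi + rho * gi / norm for wi, gi in zip(w, g)]


def sharpness(w, loss_fn, grad_fn, rho=0.05):
    """Sharpness proxy: loss increase at the perturbed point."""
    return loss_fn(_ascent_point(w, grad_fn, rho)) - loss_fn(w)


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """SAM-style update: take the gradient at the perturbed point,
    then apply it to the original weights."""
    g_adv = grad_fn(_ascent_point(w, grad_fn, rho))
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

A larger `rho` makes the update look at a wider neighborhood around the current weights, which is the knob most closely related to the "large flat neighborhoods" wording of the abstract.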

LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity
Gubri, Martin; Cordy, Maxime; Papadakis, Mike; et al.
in Computer Vision -- ECCV 2022 (2022)
We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space to explain the transferability of adversarial examples.

Efficient and Transferable Adversarial Examples from Bayesian Neural Networks
Gubri, Martin; Cordy, Maxime; Papadakis, Mike; et al.
in The 38th Conference on Uncertainty in Artificial Intelligence (2022)
An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty.
Based on a state-of-the-art Bayesian Deep Learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach a 94% success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves higher transferability 87.5% of the time than three test-time techniques designed for this purpose. Our work demonstrates that the way to train a surrogate has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work.

Search-based adversarial testing and improvement of constrained credit scoring systems
Ghamizi, Salah; Cordy, Maxime; Gubri, Martin; et al.
in ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20), November 8-13, 2020 (2020)