11,103 Hits in 4.8 sec

A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement [article]

Yuting Yang, Pei Huang, Juan Cao, Jintao Li, Yun Lin, Jin Song Dong, Feifei Ma, Jian Zhang
2022 arXiv   pre-print
In this paper, we propose a novel prompt-based adversarial attack to compromise NLP models and a robustness enhancement technique.  ...  Furthermore, we design a prompt-based adversarial training method to improve the robustness of PLMs.  ...  Conclusion: In this paper, we design prompting-based approaches for adversarial example generation and robustness enhancement.  ...
arXiv:2203.10714v1 fatcat:kmp53fxw7rb3dazsk2gv7v7wza

On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [article]

Terry Yue Zhuo, Zhuang Li, Yujin Huang, Fatemeh Shiri, Weiqing Wang, Gholamreza Haffari, Yuan-Fang Li
2023 arXiv   pre-print
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.  ...  While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios  ...  Prompt-based semantic parsers learn to solve a new task by in-context learning, instructing the parsers to generate correct LFs by constructing the prompt with a few demonstration examples.  ...
arXiv:2301.12868v3 fatcat:sc4wrzawlrfgbl6dcylhjo2wk4
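The snippet above describes the in-context learning setup: the parser is conditioned on a prompt built from a few (utterance, logical form) demonstrations followed by the test utterance. Below is a minimal sketch of that prompt-construction step only; the field labels, demonstration pairs, and the build_icl_prompt helper are hypothetical illustrations, not the exact format used in the Codex study.

# Minimal sketch of few-shot prompt construction for in-context learning.
def build_icl_prompt(demonstrations, query):
    """Concatenate (utterance, logical form) demonstrations, then the query."""
    parts = []
    for utterance, logical_form in demonstrations:
        parts.append(f"Utterance: {utterance}\nLogical form: {logical_form}\n")
    parts.append(f"Utterance: {query}\nLogical form:")
    return "\n".join(parts)

demos = [
    ("list flights from boston to denver",
     "SELECT * FROM flights WHERE origin = 'BOS' AND dest = 'DEN'"),
    ("how many airlines serve dallas",
     "SELECT COUNT(DISTINCT airline) FROM flights WHERE dest = 'DFW'"),
]
prompt = build_icl_prompt(demos, "show me morning flights to atlanta")
# The assembled prompt is sent to the language model, which is expected to
# continue it with the logical form for the final utterance.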

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield [article]

Jinhwa Kim, Ali Derakhshan, Ian G. Harris
2023 arXiv   pre-print
These datasets are designed to fortify the safety classifier's robustness, and we investigate the consequences of incorporating adversarial examples into the training process.  ...  In response, our study introduces the Adversarial Prompt Shield (APS), a lightweight model that excels in detection accuracy and demonstrates resilience against adversarial prompts.  ...  While incorporating all possible adversarial suffixes in the training data can potentially enhance the robustness  ...
arXiv:2311.00172v1 fatcat:pummn75u6zc3dolwkwevuph3hm

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [article]

Arijit Ghosh Chowdhury, Md Mofijul Islam, Vaibhav Kumar, Faysal Hossain Shezan, Vaibhav Kumar, Vinija Jain, Aman Chadha
2024 arXiv   pre-print
Our objective is to offer a nuanced understanding of LLM attacks, foster awareness within the AI community, and inspire robust solutions to mitigate these risks in future developments.  ...  We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.  ...  Automated Adversarial Attacks Automated adversarial attacks use algorithms to generate and deploy adversarial examples, offering scalability without human expertise.  ... 
arXiv:2403.04786v2 fatcat:cnsgaz42cfevtlcarleyr6t4km

Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models [article]

Yunpeng Huang, Yaonan Gu, Jingwei Xu, Zhihong Zhu, Zhaorun Chen, Xiaoxing Ma
2024 arXiv   pre-print
We sincerely hope this paper can provide valuable insights for researchers and practitioners endeavoring to build safe and dependable FMs and foster a stable and consistent ICL environment, thereby unlocking  ...  As foundation models (FMs) continue to shape the landscape of AI, the in-context learning (ICL) paradigm thrives but also encounters issues such as toxicity, hallucination, disparity, adversarial vulnerability  ...  ., 2019) architecture to train a model that generates contextualized prompts based on a given (subject, relation) pair in knowledge probing tasks.  ... 
arXiv:2402.17671v1 fatcat:5kluxu5aavcrpnskcqpq5jlrpu

Whispers in the Machine: Confidentiality in LLM-integrated Systems [article]

Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer
2024 arXiv   pre-print
Building on this analysis, we propose a method for robustness fine-tuning, inspired by adversarial training.  ...  This enables us to compare the vulnerability of a model against confidentiality attacks and also the effectiveness of different defense strategies.  ...  Acknowledgments We would like to thank Avital Shafran and David Pape for their valuable feedback.  ... 
arXiv:2402.06922v1 fatcat:2pveirppararjbqebzaxegowku

Few-Shot Adversarial Prompt Learning on Vision-Language Models [article]

Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu
2024 arXiv   pre-print
We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples.  ...  In this paper, to address these issues, we propose a few-shot adversarial prompt framework where adapting input sequences with limited data yields significant adversarial robustness improvements.  ...  It enhances cross-modal consistency between natural and adversarial examples to avoid potential robustness generalization failures, while encouraging uni-modal divergence to introduce an adversarial-aware  ...
arXiv:2403.14774v1 fatcat:75oip73rzvcchgnnrtxnsnfq3m
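The objective described in this snippet combines two terms: cross-modal consistency that keeps natural and adversarial image features aligned with the text (prompt) features, and a uni-modal term that keeps natural and adversarial image features differentiated. The sketch below is one plausible reading of such a loss; the function name, hinge margin, and weighting are assumptions for illustration, not the paper's actual objective.

# Illustrative sketch only: cross-modal consistency plus uni-modal divergence,
# assuming precomputed feature tensors.
import torch.nn.functional as F

def consistency_divergence_loss(img_nat, img_adv, txt, margin=0.5, alpha=1.0):
    # img_nat / img_adv: image features for natural / adversarial inputs,
    # txt: text (prompt) features; all of shape (batch, dim).
    # Cross-modal consistency: keep both natural and adversarial image
    # features aligned with the text features.
    consistency = (1 - F.cosine_similarity(img_nat, txt, dim=-1)).mean() + \
                  (1 - F.cosine_similarity(img_adv, txt, dim=-1)).mean()
    # Uni-modal divergence: hinge that discourages natural and adversarial
    # image features from collapsing onto each other.
    divergence = F.relu(F.cosine_similarity(img_nat, img_adv, dim=-1) - margin).mean()
    return consistency + alpha * divergence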

DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks [article]

Caixin Kang, Yinpeng Dong, Zhengyi Wang, Shouwei Ruan, Yubo Chen, Hang Su, Xingxing Wei
2023 arXiv   pre-print
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models.  ...  Moreover, we propose a few-shot prompt-tuning algorithm to fine-tune the diffusion model, enabling the pre-trained diffusion model to adapt to the defense task easily.  ...  Based on these insights, we propose DIFFender, a defense approach that leverages a diffusion model to locate and restore patch attacks. The pipeline of DIFFender is illustrated in Fig. 2.  ...
arXiv:2306.09124v3 fatcat:zynbimxzuvaehdxrzj5bmwu6e4

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts [article]

Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie
2023 arXiv   pre-print
Our study generates 4788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts.  ...  In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts.  ...  We categorize the robustness enhancement (i.e., defenses to adversarial prompts) approaches into three main axes: strategies in the training phase, input preprocessing, and downstream fine-tuning.  ... 
arXiv:2306.04528v4 fatcat:fwiaszq76reatcin4yyasxjvfm

How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective

Guang Yang, Yu Zhou, Wenhua Yang, Tao Yue, Xiang Chen, Taolue Chen
2023 ACM Transactions on Software Engineering and Methodology  
In this paper, we study and demonstrate the potential of benefiting from method names to enhance the performance of PCGMs, from a model robustness perspective.  ...  The former attacks a PCGM by generating adversarial method names as part of the input, which are semantically and visually similar to the original input, but may trick the PCGM into generating completely unrelated  ...  In the next step, we resort to a retrieval-enhanced prompt training approach.  ...
doi:10.1145/3630010 fatcat:utd6inpemva2tpodytxsercqny
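The attack sketched in this entry perturbs method names so they stay visually similar to the original yet mislead the code generation model. The toy sketch below illustrates only the visual-similarity half of that idea; the homoglyph table and the perturb_method_name helper are hypothetical assumptions, not the paper's actual attack.

# Toy sketch: swap one character of a method name for a look-alike glyph.
# The substitution table and strategy are illustrative assumptions only.
HOMOGLYPHS = {"l": "1", "O": "0", "o": "0", "I": "l"}

def perturb_method_name(name: str) -> str:
    """Replace the first character that has a visually similar substitute."""
    for i, ch in enumerate(name):
        if ch in HOMOGLYPHS:
            return name[:i] + HOMOGLYPHS[ch] + name[i + 1:]
    return name

print(perturb_method_name("calculateTotal"))  # -> "ca1culateTotal"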

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [article]

Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
2024 arXiv   pre-print
model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation.  ...  By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based  ...  To address the above challenges, we propose a novel energy-based data generation approach to improve the quality of the embeddings of the limited training data by generating new examples for each harmful  ... 
arXiv:2403.13031v1 fatcat:bxygaofvd5bjpbp24dy3ttfahm

Visual Prompting for Adversarial Robustness [article]

Aochuan Chen, Peter Lorenz, Yuguang Yao, Pin-Yu Chen, Sijia Liu
2023 arXiv   pre-print
We investigate this problem and show that the vanilla VP approach is not effective in adversarial defense since a universal input prompt lacks the capacity for robust learning against sample-specific adversarial  ...  To circumvent it, we propose a new VP method, termed Class-wise Adversarial Visual Prompting (C-AVP), to generate class-wise visual prompts so as to not only leverage the strengths of ensemble prompts  ...  parameter to strike a balance between generalization and adversarial robustness [2].  ...
arXiv:2210.06284v2 fatcat:ufaluh6jbfdevawjil5liirbua

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems [article]

Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong (+4 others)
2024 arXiv   pre-print
and deployment, and an output module for exporting LLM-generated content.  ...  Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community.  ...  It includes 583,884 adversarial examples and covers a wide range of text-based attacks.  ... 
arXiv:2401.05778v1 fatcat:xmoqvjuo5fhxdjhxqqwnofl34y

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now [article]

Yimeng Zhang, Jinghan Jia, Xin Chen, Aochuan Chen, Yihua Zhang, Jiancheng Liu, Ke Ding, Sijia Liu
2024 arXiv   pre-print
We develop an effective and efficient adversarial prompt generation approach for DMs, termed UnlearnDiffAtk.  ...  Our results demonstrate the effectiveness and efficiency merits of UnlearnDiffAtk over the state-of-the-art adversarial prompt generation method and reveal the lack of robustness of current safety-driven  ...  ., 2023) to penalize the distance between an adversarially generated image (under the adversarial prompt) and a normally generated image.  ... 
arXiv:2310.11868v2 fatcat:rmozzv2pgvfh5ofwqbmdwtmqhi

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [article]

Sibo Wang, Jie Zhang, Zheng Yuan, Shiguang Shan
2024 arXiv   pre-print
to enhance the model's zero-shot adversarial robustness.  ...  to imperceptible adversarial examples.  ...  As adversarial examples are generated based on a specific dataset, the robustness features acquired by the target model are confined and may excessively specialize to a particular downstream task dataset  ... 
arXiv:2401.04350v3 fatcat:mvh3zzoc65cy3ch2evgmnhkpsa
Showing results 1 — 15 out of 11,103 results