A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; the original URL can also be visited. The file type is application/pdf.
A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement
[article]
2022
arXiv
pre-print
In this paper, we propose a novel prompt-based adversarial attack to compromise NLP models and a robustness enhancement technique. ...
Furthermore, we design a prompt-based adversarial training method to improve the robustness of PLMs. ...
Conclusion In this paper, we design prompting-based approaches for adversarial example generation and robustness enhancement. ...
arXiv:2203.10714v1
fatcat:kmp53fxw7rb3dazsk2gv7v7wza
On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex
[article]
2023
arXiv
pre-print
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex. ...
While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios ...
Prompt-based semantic parsers learn to solve a new task by in-context learning, instructing the parsers to generate correct LFs by constructing the prompt with a few demonstration examples. ...
arXiv:2301.12868v3
fatcat:sc4wrzawlrfgbl6dcylhjo2wk4
Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
[article]
2023
arXiv
pre-print
These datasets are designed to fortify the safety classifier's robustness, and we investigate the consequences of incorporating adversarial examples into the training process. ...
In response, our study introduces the Adversarial Prompt Shield (APS), a lightweight model that excels in detection accuracy and demonstrates resilience against adversarial prompts. ...
While incorporating all possible adversarial suffixes in the training data can potentially enhance the robustness ... (Table III: Examples of Bot Adversarial Noisy Data Generation.) ...
arXiv:2311.00172v1
fatcat:pummn75u6zc3dolwkwevuph3hm
Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
[article]
2024
arXiv
pre-print
Our objective is to offer a nuanced understanding of LLM attacks, foster awareness within the AI community, and inspire robust solutions to mitigate these risks in future developments. ...
We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation. ...
Automated Adversarial Attacks Automated adversarial attacks use algorithms to generate and deploy adversarial examples, offering scalability without human expertise. ...
arXiv:2403.04786v2
fatcat:cnsgaz42cfevtlcarleyr6t4km
Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models
[article]
2024
arXiv
pre-print
We sincerely hope this paper can provide valuable insights for researchers and practitioners endeavoring to build safe and dependable FMs and foster a stable and consistent ICL environment, thereby unlocking ...
As foundation models (FMs) continue to shape the landscape of AI, the in-context learning (ICL) paradigm thrives but also encounters issues such as toxicity, hallucination, disparity, adversarial vulnerability ...
... (…, 2019) architecture to train a model that generates contextualized prompts based on a given (subject, relation) pair in knowledge probing tasks. ...
arXiv:2402.17671v1
fatcat:5kluxu5aavcrpnskcqpq5jlrpu
Whispers in the Machine: Confidentiality in LLM-integrated Systems
[article]
2024
arXiv
pre-print
Building on this analysis, we propose a method for robustness fine-tuning, inspired by adversarial training. ...
This enables us to compare the vulnerability of a model against confidentiality attacks and also the effectiveness of different defense strategies. ...
Acknowledgments We would like to thank Avital Shafran and David Pape for their valuable feedback. ...
arXiv:2402.06922v1
fatcat:2pveirppararjbqebzaxegowku
Few-Shot Adversarial Prompt Learning on Vision-Language Models
[article]
2024
arXiv
pre-print
We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples. ...
In this paper, to address these issues, we propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields significant adversarial robustness improvements. ...
It enhances cross-modal consistency between natural and adversarial examples to avoid potential robustness generalization failures, while encouraging uni-modal divergence to introduce an adversarial aware ...
arXiv:2403.14774v1
fatcat:75oip73rzvcchgnnrtxnsnfq3m
DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks
[article]
2023
arXiv
pre-print
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. ...
Moreover, we propose a few-shot prompt-tuning algorithm to fine-tune the diffusion model, enabling the pre-trained diffusion model to adapt to the defense task easily. ...
Based on these insights, we propose DIFFender, a defense approach that leverages a diffusion model to locate and restore patch attacks. The pipeline of DIFFender is illustrated in Fig. 2. ...
arXiv:2306.09124v3
fatcat:zynbimxzuvaehdxrzj5bmwu6e4
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts
[article]
2023
arXiv
pre-print
Our study generates 4788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. ...
In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. ...
We categorize the robustness enhancement (i.e., defenses to adversarial prompts) approaches into three main axes: strategies in the training phase, input preprocessing, and downstream fine-tuning. ...
arXiv:2306.04528v4
fatcat:fwiaszq76reatcin4yyasxjvfm
How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective
2023
ACM Transactions on Software Engineering and Methodology
In this paper, we study and demonstrate the potential of benefiting from method names to enhance the performance of PCGMs, from a model robustness perspective. ...
The former attacks a PCGM by generating adversarial method names as part of the input, which are semantically and visually similar to the original input, but may trick the PCGM to generate completely unrelated ...
In the next step, we resort to a retrieval-enhanced prompt training approach. ...
doi:10.1145/3630010
fatcat:utd6inpemva2tpodytxsercqny
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
[article]
2024
arXiv
pre-print
... model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation. ...
By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based ...
To address the above challenges, we propose a novel energy-based data generation approach to improve the quality of the embeddings of the limited training data by generating new examples for each harmful ...
arXiv:2403.13031v1
fatcat:bxygaofvd5bjpbp24dy3ttfahm
Visual Prompting for Adversarial Robustness
[article]
2023
arXiv
pre-print
We investigate this problem and show that the vanilla VP approach is not effective in adversarial defense since a universal input prompt lacks the capacity for robust learning against sample-specific adversarial ...
To circumvent it, we propose a new VP method, termed Class-wise Adversarial Visual Prompting (C-AVP), to generate class-wise visual prompts so as to not only leverage the strengths of ensemble prompts ...
... parameter to strike a balance between generalization and adversarial robustness [2]. ...
arXiv:2210.06284v2
fatcat:ufaluh6jbfdevawjil5liirbua
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
[article]
2024
arXiv
pre-print
... and deployment, and an output module for exporting LLM-generated content. ...
Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community. ...
It includes 583,884 adversarial examples and covers a wide range of text-based attacks. ...
arXiv:2401.05778v1
fatcat:xmoqvjuo5fhxdjhxqqwnofl34y
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
[article]
2024
arXiv
pre-print
We develop an effective and efficient adversarial prompt generation approach for DMs, termed UnlearnDiffAtk. ...
Our results demonstrate the effectiveness and efficiency merits of UnlearnDiffAtk over the state-of-the-art adversarial prompt generation method and reveal the lack of robustness of current safety-driven ...
... (…, 2023) to penalize the distance between an adversarially generated image (under the adversarial prompt) and a normally generated image. ...
arXiv:2310.11868v2
fatcat:rmozzv2pgvfh5ofwqbmdwtmqhi
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
[article]
2024
arXiv
pre-print
... to enhance the model's zero-shot adversarial robustness. ...
... to imperceptible adversarial examples. ...
As adversarial examples are generated based on a specific dataset, the robustness features acquired by the target model are confined and may excessively specialize to a particular downstream task dataset ...
arXiv:2401.04350v3
fatcat:mvh3zzoc65cy3ch2evgmnhkpsa
Showing results 1 — 15 out of 11,103 results