11,103 Hits in 4.8 sec

A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement [article]

Yuting Yang, Pei Huang, Juan Cao, Jintao Li, Yun Lin, Jin Song Dong, Feifei Ma, Jian Zhang
2022 arXiv   pre-print
In this paper, we propose a novel prompt-based adversarial attack to compromise NLP models and a robustness enhancement technique.  ...  Furthermore, we design a prompt-based adversarial training method to improve the robustness of PLMs.  ...  Conclusion: In this paper, we design prompting-based approaches for adversarial example generation and robustness enhancement.  ...
arXiv:2203.10714v1 fatcat:kmp53fxw7rb3dazsk2gv7v7wza

On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [article]

Terry Yue Zhuo, Zhuang Li, Yujin Huang, Fatemeh Shiri, Weiqing Wang, Gholamreza Haffari, Yuan-Fang Li
2023 arXiv   pre-print
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.  ...  While it has been established that the robustness of smaller semantic parsers can be enhanced through adversarial training, this approach is not feasible for large language models in real-world scenarios  ...  Prompt-based semantic parsers learn to solve a new task by in-context learning, instructing the parsers to generate correct LFs by constructing the prompt with a few demonstration examples.  ...
arXiv:2301.12868v3 fatcat:sc4wrzawlrfgbl6dcylhjo2wk4
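The snippet above describes the in-context learning setup: the parser is conditioned on a prompt built from a few (utterance, logical form) demonstrations followed by the test utterance. Below is a minimal sketch of that prompt-construction step only; the field labels, demonstration pairs, and the build_icl_prompt helper are hypothetical illustrations, not the exact format used in the Codex study.

# Minimal sketch of few-shot prompt construction for in-context learning.
def build_icl_prompt(demonstrations, query):
    """Concatenate (utterance, logical form) demonstrations, then the query."""
    parts = []
    for utterance, logical_form in demonstrations:
        parts.append(f"Utterance: {utterance}\nLogical form: {logical_form}\n")
    parts.append(f"Utterance: {query}\nLogical form:")
    return "\n".join(parts)

demos = [
    ("list flights from boston to denver",
     "SELECT * FROM flights WHERE origin = 'BOS' AND dest = 'DEN'"),
    ("how many airlines serve dallas",
     "SELECT COUNT(DISTINCT airline) FROM flights WHERE dest = 'DFW'"),
]
prompt = build_icl_prompt(demos, "show me morning flights to atlanta")
# The assembled prompt is sent to the language model, which is expected to
# continue it with the logical form for the final utterance.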

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield [article]

Jinhwa Kim, Ali Derakhshan, Ian G. Harris
2023 arXiv   pre-print
These datasets are designed to fortify the safety classifier's robustness, and we investigate the consequences of incorporating adversarial examples into the training process.  ...  In response, our study introduces the Adversarial Prompt Shield (APS), a lightweight model that excels in detection accuracy and demonstrates resilience against adversarial prompts.  ...  While incorporating all possible adversarial suffixes in the training data can potentially enhance the robustness  ...
arXiv:2311.00172v1 fatcat:pummn75u6zc3dolwkwevuph3hm

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [article]

Arijit Ghosh Chowdhury, Md Mofijul Islam, Vaibhav Kumar, Faysal Hossain Shezan, Vaibhav Kumar, Vinija Jain, Aman Chadha
2024 arXiv   pre-print
Our objective is to offer a nuanced understanding of LLM attacks, foster awareness within the AI community, and inspire robust solutions to mitigate these risks in future developments.  ...  We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.  ...  Automated Adversarial Attacks Automated adversarial attacks use algorithms to generate and deploy adversarial examples, offering scalability without human expertise.  ... 
arXiv:2403.04786v2 fatcat:cnsgaz42cfevtlcarleyr6t4km

Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models [article]

Yunpeng Huang, Yaonan Gu, Jingwei Xu, Zhihong Zhu, Zhaorun Chen, Xiaoxing Ma
2024 arXiv   pre-print
We sincerely hope this paper can provide valuable insights for researchers and practitioners endeavoring to build safe and dependable FMs and foster a stable and consistent ICL environment, thereby unlocking  ...  As foundation models (FMs) continue to shape the landscape of AI, the in-context learning (ICL) paradigm thrives but also encounters issues such as toxicity, hallucination, disparity, adversarial vulnerability  ...  ., 2019) architecture to train a model that generates contextualized prompts based on a given (subject, relation) pair in knowledge probing tasks.  ... 
arXiv:2402.17671v1 fatcat:5kluxu5aavcrpnskcqpq5jlrpu

Whispers in the Machine: Confidentiality in LLM-integrated Systems [article]

Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer
2024 arXiv   pre-print
Building on this analysis, we propose a method for robustness fine-tuning, inspired by adversarial training.  ...  This enables us to compare the vulnerability of a model against confidentiality attacks and also the effectiveness of different defense strategies.  ...  Acknowledgments We would like to thank Avital Shafran and David Pape for their valuable feedback.  ... 
arXiv:2402.06922v1 fatcat:2pveirppararjbqebzaxegowku

Few-Shot Adversarial Prompt Learning on Vision-Language Models [article]

Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu
2024 arXiv   pre-print
We also propose a novel training objective that enhances the consistency of multi-modal features while encouraging differentiated uni-modal features between natural and adversarial examples.  ...  In this paper, to address these issues, we propose a few-shot adversarial prompt framework where adapting input sequences with limited data yields significant adversarial robustness improvements.  ...  It enhances cross-modal consistency between natural and adversarial examples to avoid potential robustness generalization failures, while encouraging uni-modal divergence to introduce an adversarial-aware  ...
arXiv:2403.14774v1 fatcat:75oip73rzvcchgnnrtxnsnfq3m
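The objective described in this snippet combines two terms: cross-modal consistency that keeps natural and adversarial image features aligned with the text (prompt) features, and a uni-modal term that keeps natural and adversarial image features differentiated. The sketch below is one plausible reading of such a loss; the function name, hinge margin, and weighting are assumptions for illustration, not the paper's actual objective.

# Illustrative sketch only: cross-modal consistency plus uni-modal divergence,
# assuming precomputed feature tensors.
import torch.nn.functional as F

def consistency_divergence_loss(img_nat, img_adv, txt, margin=0.5, alpha=1.0):
    # img_nat / img_adv: image features for natural / adversarial inputs,
    # txt: text (prompt) features; all of shape (batch, dim).
    # Cross-modal consistency: keep both natural and adversarial image
    # features aligned with the text features.
    consistency = (1 - F.cosine_similarity(img_nat, txt, dim=-1)).mean() + \
                  (1 - F.cosine_similarity(img_adv, txt, dim=-1)).mean()
    # Uni-modal divergence: hinge that discourages natural and adversarial
    # image features from collapsing onto each other.
    divergence = F.relu(F.cosine_similarity(img_nat, img_adv, dim=-1) - margin).mean()
    return consistency + alpha * divergence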

DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks [article]

Caixin Kang, Yinpeng Dong, Zhengyi Wang, Shouwei Ruan, Yubo Chen, Hang Su, Xingxing Wei
2023 arXiv   pre-print
Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models.  ...  Moreover, we propose a few-shot prompt-tuning algorithm to fine-tune the diffusion model, enabling the pre-trained diffusion model to adapt to the defense task easily.  ...  Based on these insights, we propose DIFFender, a defense approach that leverages a diffusion model to locate and restore patch attacks. The pipeline of DIFFender is illustrated in Fig. 2.  ...
arXiv:2306.09124v3 fatcat:zynbimxzuvaehdxrzj5bmwu6e4

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts [article]

Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Zhenqiang Gong, Xing Xie
2023 arXiv   pre-print
Our study generates 4788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts.  ...  In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts.  ...  We categorize the robustness enhancement (i.e., defenses to adversarial prompts) approaches into three main axes: strategies in the training phase, input preprocessing, and downstream fine-tuning.  ... 
arXiv:2306.04528v4 fatcat:fwiaszq76reatcin4yyasxjvfm

How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective

Guang Yang, Yu Zhou, Wenhua Yang, Tao Yue, Xiang Chen, Taolue Chen
2023 ACM Transactions on Software Engineering and Methodology  
In this paper, we study and demonstrate the potential of benefiting from method names to enhance the performance of PCGMs, from a model robustness perspective.  ...  The former attacks a PCGM by generating adversarial method names as part of the input, which are semantically and visually similar to the original input, but may trick the PCGM into generating completely unrelated  ...  In the next step, we resort to a retrieval-enhanced prompt training approach.  ...
doi:10.1145/3630010 fatcat:utd6inpemva2tpodytxsercqny
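The attack sketched in this entry perturbs method names so they stay visually similar to the original yet mislead the code generation model. The toy sketch below illustrates only the visual-similarity half of that idea; the homoglyph table and the perturb_method_name helper are hypothetical assumptions, not the paper's actual attack.

# Toy sketch: swap one character of a method name for a look-alike glyph.
# The substitution table and strategy are illustrative assumptions only.
HOMOGLYPHS = {"l": "1", "O": "0", "o": "0", "I": "l"}

def perturb_method_name(name: str) -> str:
    """Replace the first character that has a visually similar substitute."""
    for i, ch in enumerate(name):
        if ch in HOMOGLYPHS:
            return name[:i] + HOMOGLYPHS[ch] + name[i + 1:]
    return name

print(perturb_method_name("calculateTotal"))  # -> "ca1culateTotal"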

RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [article]

Zhuowen Yuan, Zidi Xiong, Yi Zeng, Ning Yu, Ruoxi Jia, Dawn Song, Bo Li
2024 arXiv   pre-print
model combining robust KNN with LLMs based on our data augmentation, RigorLLM offers a robust solution to harmful content moderation.  ...  By employing a multi-faceted approach that includes energy-based training data augmentation through Langevin dynamics, optimizing a safe suffix for inputs via minimax optimization, and integrating a fusion-based  ...  To address the above challenges, we propose a novel energy-based data generation approach to improve the quality of the embeddings of the limited training data by generating new examples for each harmful  ... 
arXiv:2403.13031v1 fatcat:bxygaofvd5bjpbp24dy3ttfahm

Visual Prompting for Adversarial Robustness [article]

Aochuan Chen, Peter Lorenz, Yuguang Yao, Pin-Yu Chen, Sijia Liu
2023 arXiv   pre-print
We investigate this problem and show that the vanilla VP approach is not effective in adversarial defense since a universal input prompt lacks the capacity for robust learning against sample-specific adversarial  ...  To circumvent it, we propose a new VP method, termed Class-wise Adversarial Visual Prompting (C-AVP), to generate class-wise visual prompts so as to not only leverage the strengths of ensemble prompts  ...  parameter to strike a balance between generalization and adversarial robustness [2].  ...
arXiv:2210.06284v2 fatcat:ufaluh6jbfdevawjil5liirbua

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems [article]

Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong (+4 others)
2024 arXiv   pre-print
and deployment, and an output module for exporting LLM-generated content.  ...  Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community.  ...  It includes 583,884 adversarial examples and covers a wide range of text-based attacks.  ... 
arXiv:2401.05778v1 fatcat:xmoqvjuo5fhxdjhxqqwnofl34y

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now [article]

Yimeng Zhang, Jinghan Jia, Xin Chen, Aochuan Chen, Yihua Zhang, Jiancheng Liu, Ke Ding, Sijia Liu
2024 arXiv   pre-print
We develop an effective and efficient adversarial prompt generation approach for DMs, termed UnlearnDiffAtk.  ...  Our results demonstrate the effectiveness and efficiency merits of UnlearnDiffAtk over the state-of-the-art adversarial prompt generation method and reveal the lack of robustness of current safety-driven  ...  ., 2023) to penalize the distance between an adversarially generated image (under the adversarial prompt) and a normally generated image.  ... 
arXiv:2310.11868v2 fatcat:rmozzv2pgvfh5ofwqbmdwtmqhi

Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [article]

Sibo Wang, Jie Zhang, Zheng Yuan, Shiguang Shan
2024 arXiv   pre-print
to enhance the model's zero-shot adversarial robustness.  ...  to imperceptible adversarial examples.  ...  As adversarial examples are generated based on a specific dataset, the robustness features acquired by the target model are confined and may excessively specialize to a particular downstream task dataset  ... 
arXiv:2401.04350v3 fatcat:mvh3zzoc65cy3ch2evgmnhkpsa
Showing results 1 — 15 out of 11,103 results