PromptAttack: Prompt-Based Attack for Language Models via Gradient Search

Shi, Yundi; Li, Piji; Yin, Changchun; Han, Zhaoyang; Zhou, Lu; Liu, Zhe

doi:10.1007/978-3-031-17120-8_53

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

As pre-trained language models (PLMs) continue to grow, so do the hardware and data requirements for fine-tuning them. Researchers have therefore proposed a lighter-weight alternative called prompt learning. In our investigations, however, we observe that prompt learning methods are vulnerable: maliciously constructed prompts can easily attack them, causing classification errors and serious security problems for PLMs. Most current research ignores the security of prompt-based methods. In this paper, we therefore propose PromptAttack, a method for constructing malicious prompt templates, to probe the security of PLMs. We investigate several adversarial template construction approaches that guide the model to misclassify its inputs. Extensive experiments on three datasets and three PLMs demonstrate the effectiveness of PromptAttack. We also conduct experiments to verify that the method is applicable in few-shot scenarios.

This research is supported by the National Natural Science Foundation of China (No. 62106105) and the National Key R&D Program of China (No. 2021YFB3100700).

Author information

Authors and Affiliations

Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
Yundi Shi, Piji Li, Changchun Yin, Zhaoyang Han, Lu Zhou & Zhe Liu

Corresponding author

Correspondence to Piji Li.

Editor information

Editors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Wei Lu
Nanjing University, Nanjing, China
Shujian Huang
Soochow University, Suzhou, China
Yu Hong
Soochow University, Soochow, China
Xiabing Zhou

Cite this paper

Shi, Y., Li, P., Yin, C., Han, Z., Zhou, L., Liu, Z. (2022). PromptAttack: Prompt-Based Attack for Language Models via Gradient Search. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science, vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_53

DOI: https://doi.org/10.1007/978-3-031-17120-8_53
Published: 24 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17119-2
Online ISBN: 978-3-031-17120-8
eBook Packages: Computer Science, Computer Science (R0)
