
PromptAttack: Prompt-Based Attack for Language Models via Gradient Search

  • Conference paper
  • In: Natural Language Processing and Chinese Computing (NLPCC 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13551)

Abstract

As pre-trained language models (PLMs) continue to grow, so do the hardware and data requirements for fine-tuning them. Researchers have therefore proposed a lighter alternative called prompt learning. However, in our investigations we observe that prompt learning methods are vulnerable: they can easily be attacked by maliciously constructed prompts, causing classification errors and posing serious security problems for PLMs. Most current research ignores the security of prompt-based methods. In this paper, we therefore propose PromptAttack, a malicious prompt template construction method, to probe the security of PLMs. We investigate several adversarial template construction approaches that guide the model into misclassifying the task. Extensive experiments on three datasets and three PLMs demonstrate the effectiveness of PromptAttack, and further experiments verify that our method also works in few-shot scenarios.

This research is supported by the National Natural Science Foundation of China (No. 62106105) and the National Key R&D Program of China (No. 2021YFB3100700).
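
To make the abstract's idea concrete, the sketch below shows one way a malicious prompt template can be built by gradient search: a HotFlip-style token search in the spirit of AutoPrompt, which greedily rewrites a few trigger slots in a cloze template so that the masked-language-model loss of the gold verbalizer word grows as large as possible. Everything specific here is an illustrative assumption rather than the authors' exact PromptAttack procedure: the model (bert-base-uncased), the SST-2-style example sentence, the template "it was [MASK]", the verbalizer "great", and all hyperparameters.

import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()
for p in model.parameters():          # weights stay frozen; we only need
    p.requires_grad_(False)           # gradients w.r.t. the input embeddings
emb = model.get_input_embeddings().weight               # (vocab_size, hidden)

sentence = "the film is a delight from start to finish" # assumed example input
label_id = tok.convert_tokens_to_ids("great")           # assumed gold verbalizer

text_ids = tok.encode(sentence, add_special_tokens=False)
tail_ids = tok.encode("it was", add_special_tokens=False)
n_trig = 3
trigger = [tok.mask_token_id] * n_trig   # trigger slots, initialized to [MASK]
trig_start = len(text_ids) + 2           # first trigger index in the template

def build(trig):
    # Cloze template: [CLS] sentence [SEP] <t1..tm> it was [MASK] [SEP]
    return torch.tensor([tok.cls_token_id] + text_ids + [tok.sep_token_id]
                        + list(trig) + tail_ids
                        + [tok.mask_token_id, tok.sep_token_id])

def gold_loss(ids, x=None):
    # Cross-entropy of the gold verbalizer at the [MASK] position; the
    # attacker wants this loss as LARGE as possible (misclassification).
    inputs = x if x is not None else emb[ids].unsqueeze(0)
    logits = model(inputs_embeds=inputs).logits
    mask_pos = ids.shape[0] - 2
    return F.cross_entropy(logits[0, mask_pos:mask_pos + 1],
                           torch.tensor([label_id]))

for step in range(3 * n_trig):           # a few greedy coordinate passes
    slot = step % n_trig
    ids = build(trigger)
    x = emb[ids].detach().clone().requires_grad_(True)  # leaf embedding copy
    gold_loss(ids, x.unsqueeze(0)).backward()
    g = x.grad[trig_start + slot]
    # HotFlip first-order estimate: swapping in token e changes the loss by
    # roughly (e - e_old)^T g, so the most damaging candidates maximize e^T g.
    candidates = torch.topk(emb @ g, k=10).indices.tolist()
    with torch.no_grad():                # exact re-check of each candidate
        best_tok, best_loss = trigger[slot], gold_loss(ids).item()
        for c in candidates:
            trial = list(trigger)
            trial[slot] = c
            loss = gold_loss(build(trial)).item()
            if loss > best_loss:
                best_tok, best_loss = c, loss
    trigger[slot] = best_tok

print("searched trigger tokens:", tok.convert_ids_to_tokens(trigger))

Greedy per-slot search with exact re-evaluation of the top-k candidates is the usual guard against the first-order approximation overshooting. Under this threat model, tokens found this way would then be planted in an otherwise ordinary-looking template, so that inputs wrapped in it tend to be misclassified.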



Author information


Corresponding author

Correspondence to Piji Li.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Shi, Y., Li, P., Yin, C., Han, Z., Zhou, L., Liu, Z. (2022). PromptAttack: Prompt-Based Attack for Language Models via Gradient Search. In: Lu, W., Huang, S., Hong, Y., Zhou, X. (eds) Natural Language Processing and Chinese Computing. NLPCC 2022. Lecture Notes in Computer Science (LNAI), vol 13551. Springer, Cham. https://doi.org/10.1007/978-3-031-17120-8_53


  • DOI: https://doi.org/10.1007/978-3-031-17120-8_53


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17119-2

  • Online ISBN: 978-3-031-17120-8

  • eBook Packages: Computer Science, Computer Science (R0)
