DOI: 10.1145/3270101.3270103
Research article

All You Need is "Love": Evading Hate Speech Detection

Published: 15 January 2018

ABSTRACT

With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective, a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features.
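The three evasion transformations named in the abstract (inserting typos, changing word boundaries, adding innocuous words) can be illustrated with a minimal Python sketch. All function names here are hypothetical, and the details are assumptions rather than the paper's exact procedure: the typo is modeled as a swap of two adjacent inner characters, the word-boundary change as whitespace removal, and the innocuous word defaults to "love" as suggested by the title.

```python
import random


def insert_typo(word, rng):
    """Swap two adjacent inner characters; first and last letters stay fixed.

    Assumption: the abstract does not say which kind of typo is used.
    """
    if len(word) < 4:
        return word  # too short to have two inner characters
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both inner positions
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def add_typos(text, rate=0.5, seed=0):
    """Apply insert_typo to each whitespace-separated word with probability `rate`."""
    rng = random.Random(seed)
    words = text.split()
    return " ".join(insert_typo(w, rng) if rng.random() < rate else w for w in words)


def remove_word_boundaries(text):
    """Change word boundaries by deleting all whitespace (one possible variant)."""
    return "".join(text.split())


def append_innocuous_word(text, word="love"):
    """Append a benign word to the input (the default word is an assumption)."""
    return f"{text} {word}"


if __name__ == "__main__":
    sample = "this is a placeholder sentence"
    print(add_typos(sample, rate=1.0))
    print(remove_word_boundaries(sample))
    print(append_innocuous_word(remove_word_boundaries(sample)))
```

The last call chains two transformations, reflecting the abstract's remark that a combination of methods is also effective. Word-level models see such inputs as unknown or benign tokens, whereas character-level features still overlap with the original text, which is consistent with the reported difference in attack resistance.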

Published in

AISec '18: Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security
October 2018
103 pages
ISBN: 9781450360043
DOI: 10.1145/3270101

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

AISec '18 paper acceptance rate: 9 of 32 submissions, 28%
Overall acceptance rate: 94 of 231 submissions, 41%
