All You Need is "Love": Evading Hate Speech Detection

ABSTRACT
With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries, or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective, a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and that using character-level features makes the models systematically more attack-resistant than using word-level features.
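The three attack classes described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's actual attack implementation: the function names and the choice of "love" as a benign word (echoing the paper's title) are illustrative only.

```python
import random

def insert_typo(word, rng=random.Random(0)):
    """Simulate a typo by swapping two adjacent characters.

    Word-level models that rely on an exact vocabulary lookup
    typically map the misspelled token to an unknown-word embedding.
    """
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def remove_word_boundaries(text):
    """Join all words together, defeating whitespace tokenization."""
    return text.replace(" ", "")

def add_innocuous_words(text, benign_words=("love",)):
    """Append benign words to dilute the classifier's toxicity score."""
    return text + " " + " ".join(benign_words)

# Example: applying the boundary and benign-word attacks to a sentence.
sample = "some offensive sentence"
print(remove_word_boundaries(sample))   # "someoffensivesentence"
print(add_innocuous_words(sample))      # "some offensive sentence love"
```

Character-level models are harder to evade with these perturbations because the attacked text still shares most character n-grams with the original, which matches the abstract's finding that character-level features are systematically more attack-resistant.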