ABSTRACT
Using statistical machine learning to make security decisions introduces new vulnerabilities in large-scale systems. This paper shows how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary's access is limited to only 1% of the training messages. We further demonstrate a new class of focused attacks that successfully prevent victims from receiving specific email messages. Finally, we introduce two new types of defenses against these attacks.
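The poisoning idea the abstract describes can be illustrated with a toy model. The sketch below is not SpamBayes itself: the `ToyFilter` class, its Graham-style per-token score, and the 0.7 decision threshold are simplified assumptions chosen only to show how attacker-injected training spam stuffed with ordinary words can cause legitimate mail to be filtered.

```python
# Toy illustration of training-data poisoning against a token-based
# spam filter. NOT the SpamBayes implementation; scores and threshold
# are simplified assumptions.
from collections import Counter

class ToyFilter:
    def __init__(self):
        self.spam = Counter()  # per-token spam training counts
        self.ham = Counter()   # per-token ham training counts

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(set(tokens))

    def token_score(self, w):
        # Fraction of this token's training occurrences that were spam;
        # 0.5 for tokens never seen in training.
        s, h = self.spam[w], self.ham[w]
        return s / (s + h) if s + h else 0.5

    def classify_as_spam(self, tokens, threshold=0.7):
        scores = [self.token_score(w) for w in tokens]
        return sum(scores) / len(scores) > threshold

f = ToyFilter()
f.train(["meeting", "budget", "report"], is_spam=False)
f.train(["viagra", "free", "winner"], is_spam=True)

victim = ["meeting", "budget", "report"]
assert not f.classify_as_spam(victim)  # clean filter: ham passes

# An attacker controlling a small slice of the training stream injects
# spam messages that also contain ordinary English words, driving up
# those words' spam scores.
for _ in range(5):
    f.train(["meeting", "budget", "report", "free"], is_spam=True)

assert f.classify_as_spam(victim)      # poisoned filter: ham is blocked
```

After five injected messages, each token in the victim's mail has been seen in spam five times and in ham once, so its score rises to roughly 0.83 and the message crosses the threshold; the same mechanism, scaled up, underlies both the dictionary attack and the focused attack against specific messages.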
REFERENCES
- Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. Can machine learning be secure? In Proceedings of the ACM Symposium on Information, Computer, and Communications Security (ASIACCS '06), March 2006.
- Simon P. Chung and Aloysius K. Mok. Allergy attack against automatic signature generation. In Recent Advances in Intrusion Detection (RAID), pages 61-80, 2006.
- Simon P. Chung and Aloysius K. Mok. Advanced allergy attacks: Does a corpus really help? In Recent Advances in Intrusion Detection (RAID), pages 236-255, 2007.
- Gordon Cormack and Thomas Lynam. Spam corpus creation for TREC. In Proceedings of the Second Conference on Email and Anti-Spam (CEAS 2005), July 2005.
- Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial classification. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 99-108, Seattle, WA, 2004. ACM Press.
- Ronald A. Fisher. Question 14: Combining independent tests of significance. American Statistician, 2(5):30-30J, 1948.
- Paul Graham. A plan for spam. http://www.paulgraham.com/spam.html, August 2002.
- Christoph Karlberger, Günther Bayler, Christopher Kruegel, and Engin Kirda. Exploiting redundancy in natural language to penetrate Bayesian spam filters. In WOOT '07: Proceedings of the First USENIX Workshop on Offensive Technologies, 2007.
- Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807-837, 1993.
- Hyang-Ah Kim and Brad Karp. Autograph: Toward automated, distributed worm signature detection. In USENIX Security Symposium, August 2004.
- Bryan Klimt and Yiming Yang. Introducing the Enron corpus. In Proceedings of the First Conference on Email and Anti-Spam (CEAS), July 2004.
- Daniel Lowd and Christopher Meek. Adversarial learning. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 641-647, 2005.
- Daniel Lowd and Christopher Meek. Good word attacks on statistical spam filters. In Proceedings of the Second Conference on Email and Anti-Spam (CEAS), 2005.
- Tony Meyer and Brendon Whateley. SpamBayes: Effective open-source, Bayesian based, email classification system. In Proceedings of the First Conference on Email and Anti-Spam (CEAS), July 2004.
- James Newsome, Brad Karp, and Dawn Song. Polygraph: Automatically generating signatures for polymorphic worms. In Proceedings of the IEEE Symposium on Security and Privacy, pages 226-241, May 2005.
- James Newsome, Brad Karp, and Dawn Song. Paragraph: Thwarting signature learning by training maliciously. In Proceedings of the 9th International Symposium on Recent Advances in Intrusion Detection (RAID 2006), September 2006.
- Gary Robinson. A statistical approach to the spam problem. Linux Journal, March 2003.
- Cyrus Shaoul and Chris Westbury. A USENET corpus (2005-2007), October 2007. http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html.
- Gregory L. Wittel and S. Felix Wu. On attacking statistical spam filters. In Proceedings of the First Conference on Email and Anti-Spam (CEAS), 2004.