DOI: 10.5555/1387709.1387716
Article

Exploiting machine learning to subvert your spam filter

Published: 15 April 2008

ABSTRACT

Using statistical machine learning to make security decisions introduces new vulnerabilities in large-scale systems. This paper shows how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render the filter useless, even if the adversary's access is limited to only 1% of the training messages. We further demonstrate a new class of focused attacks that successfully prevent victims from receiving specific email messages. Finally, we introduce two new types of defenses against these attacks.
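To make the idea of training-set poisoning concrete, here is a toy sketch in Python. It is not the paper's attack and not SpamBayes' scoring rule: the smoothed per-token score, the averaging step, the sample messages, and every function name are assumptions chosen for illustration only. It shows one thing: if attack messages containing ordinary words are trained as spam, the spam score of a legitimate message using those words goes up. The toy uses far more than 1% attack messages and says nothing about the effort required against a real filter.

```python
from collections import Counter

def train(messages):
    """Count, per class, how many messages contain each token.

    `messages` is a list of (tokens, is_spam) pairs (hypothetical format).
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for tokens, is_spam in messages:
        if is_spam:
            spam_counts.update(set(tokens))
            n_spam += 1
        else:
            ham_counts.update(set(tokens))
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham

def token_score(token, model):
    """Smoothed estimate of how spam-like a token is (0 = ham, 1 = spam)."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)

def message_score(tokens, model):
    """Average the token scores; a deliberately simple stand-in for a real combiner."""
    scores = [token_score(t, model) for t in set(tokens)]
    return sum(scores) / len(scores)

# Made-up training data: three legitimate messages and three spam messages.
clean_training = [
    ({"meeting", "budget", "report"}, False),
    ({"meeting", "lunch", "friday"}, False),
    ({"budget", "review", "friday"}, False),
    ({"cheap", "pills", "offer"}, True),
    ({"cheap", "loans", "offer"}, True),
    ({"win", "prize", "offer"}, True),
]

# Attack messages: full of ordinary words, and they end up trained as spam.
attack_message = ({"meeting", "budget", "friday", "report", "lunch", "review"}, True)
poisoned_training = clean_training + [attack_message] * 6

legitimate = {"meeting", "budget", "friday"}
clean_score = message_score(legitimate, train(clean_training))
poisoned_score = message_score(legitimate, train(poisoned_training))

print(f"score before poisoning: {clean_score:.3f}")
print(f"score after poisoning:  {poisoned_score:.3f}")
```

Comparing the two printed scores shows the shift caused by the injected messages. A focused attack, in the abstract's sense, would presumably target the words of one specific message rather than common words in general, but that reading is an assumption here as well.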

Published in

LEET'08: Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats
April 2008
96 pages

Publisher

USENIX Association, United States

Qualifiers

Article