Exploiting machine learning to subvert your spam filter

Authors:
Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, Kai Xia

University of California, Berkeley (all authors)

LEET'08: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, April 2008, Article No. 7, Pages 1–9

Published: 15 April 2008

ABSTRACT

Using statistical machine learning to make security decisions introduces new vulnerabilities in large-scale systems. This paper shows how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary's access is limited to only 1% of the training messages. We further demonstrate a new class of focused attacks that successfully prevent victims from receiving specific email messages. Finally, we introduce two new types of defenses against these attacks.
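
The abstract does not spell out the attack mechanics, so the following is only a rough sketch of how poisoning a filter's training data might block a specific message. It assumes a toy naive-Bayes-style token scorer, not the SpamBayes algorithm, and assumes the attacker can get spam-labelled messages containing the target's likely words into the training set. All names, the toy corpus, and the number of attack messages are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def train(messages):
    """Count, per class, how many training messages contain each token.

    messages: iterable of (tokens, is_spam) pairs.
    Toy model only; this is not the SpamBayes learner.
    """
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for tokens, is_spam in messages:
        if is_spam:
            spam_counts.update(set(tokens))
            n_spam += 1
        else:
            ham_counts.update(set(tokens))
            n_ham += 1
    return spam_counts, ham_counts, n_spam, n_ham

def spam_score(tokens, model):
    """Return a spam probability in (0, 1) from smoothed log-odds."""
    spam_counts, ham_counts, n_spam, n_ham = model
    # Class prior with add-one smoothing.
    log_odds = math.log((n_spam + 1) / (n_ham + 1))
    for token in set(tokens):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical clean training set: 50 ham and 50 spam messages.
ham = [({"meeting", "budget", "report"}, False)] * 50
spam = [({"winner", "prize", "free"}, True)] * 50

# The legitimate message the attacker wants the victim to miss.
target = {"meeting", "budget", "report"}

# Attack messages reuse the target's words and end up labelled as spam.
# The count of 200 is an arbitrary toy value, not a figure from the paper.
poison = [(set(target), True)] * 200

before = spam_score(target, train(ham + spam))
after = spam_score(target, train(ham + spam + poison))
print(f"score before poisoning: {before:.6f}")
print(f"score after poisoning:  {after:.6f}")
```

In this toy setting the target's score moves from near zero to above one half once the poisoned messages are included. The amount of poison needed here says nothing about the 1% figure in the abstract, which refers to the paper's own experiments on SpamBayes.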


Published in
LEET'08: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, April 2008, 96 pages
Editor: Fabian Monrose, Johns Hopkins University
Publisher: USENIX Association, United States