ABSTRACT
Phishing is a ubiquitous and increasingly sophisticated online threat. To evade mitigations, phishers try to "cloak" malicious content from defenders to delay its appearance on blacklists, while still presenting the phishing payload to victims. This cat-and-mouse game is variable and fast-moving, with many distinct cloaking methods. We construct a dataset identifying 2,933 real-world phishing kits that implement cloaking mechanisms. These kits use information from the host, browser, and HTTP request to classify traffic as coming from either an anti-phishing entity or a potential victim, and change their behavior accordingly.
In this work, we present SPARTACUS, a technique that subverts the phishing status quo by disguising user traffic as traffic from anti-phishing entities. These intentional false positives trigger cloaking behavior in phishing kits, thus hiding the malicious payload and protecting the user without disrupting benign sites.
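The interaction between a cloaking kit and disguised user traffic can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the marker lists, the `Request` type, and the `kit_serves_payload` and `disguise` helpers are hypothetical, and the abstract does not say which request signals SPARTACUS actually alters. The sketch assumes only that the User-Agent header is one of the signals a kit inspects.

```python
from dataclasses import dataclass

# Hypothetical marker substrings; real kits ship their own, much longer lists.
CRAWLER_UA_MARKERS = ("googlebot", "bingbot", "phishtank", "crawler")
BLOCKED_HOST_MARKERS = ("google", "amazonaws", "phishtank")


@dataclass(frozen=True)
class Request:
    user_agent: str   # value of the User-Agent HTTP header
    remote_host: str  # reverse-DNS name of the client, as seen by the kit


def kit_serves_payload(req: Request) -> bool:
    """Toy server-side cloaking check: hide the payload from suspected defenders."""
    ua = req.user_agent.lower()
    host = req.remote_host.lower()
    if any(marker in ua for marker in CRAWLER_UA_MARKERS):
        return False  # looks like a crawler, so serve benign content
    if any(marker in host for marker in BLOCKED_HOST_MARKERS):
        return False  # looks like a security vendor or cloud host
    return True       # looks like a victim, so serve the phishing page


def disguise(req: Request) -> Request:
    """Assumed disguise step: present a crawler-like User-Agent for the user."""
    crawler_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    return Request(user_agent=crawler_ua, remote_host=req.remote_host)


victim = Request(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/90.0 Safari/537.36",
    remote_host="host-1.example-isp.net",
)
print(kit_serves_payload(victim))            # True: the kit shows the payload
print(kit_serves_payload(disguise(victim)))  # False: the kit cloaks itself
```

In this toy model the disguised request is a deliberate false positive for the kit's classifier: the kit hides its own payload from the user it meant to target.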
To evaluate the effectiveness of this approach, we deployed SPARTACUS as a browser extension from November 2020 to July 2021. During that time, SPARTACUS browsers visited 160,728 reported phishing URLs in the wild. Of these, SPARTACUS protected against 132,274 sites (82.3%). The phishing kits that showed malicious content to SPARTACUS typically did so because of ineffective cloaking: the majority (98.4%) of the remainder were detected by conventional anti-phishing systems such as Google Safe Browsing or VirusTotal and would be blacklisted regardless. We further evaluate SPARTACUS against benign websites sampled from the Alexa Top One Million list for impacts on latency, accessibility, layout, and CPU overhead, finding minimal performance penalties and no loss in functionality.
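The reported percentages can be checked with a short calculation. The two URL counts and the 98.4% figure come from the abstract; the size of the remainder and the approximate number of sites caught by conventional systems are derived here and are not figures stated by the authors.

```python
visited = 160_728    # reported phishing URLs visited (from the abstract)
protected = 132_274  # sites SPARTACUS protected against (from the abstract)

protection_rate = protected / visited
remainder = visited - protected  # sites that still showed malicious content

# 98.4% of the remainder were detected by conventional anti-phishing systems.
# The rounded count is an approximation derived from the rounded percentage.
caught_by_conventional = round(0.984 * remainder)

print(f"{protection_rate:.1%}")  # 82.3%
print(remainder)                 # 28454
print(caught_by_conventional)    # 27999
```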
Index Terms
- I'm SPARTACUS, No, I'm SPARTACUS: Proactively Protecting Users from Phishing by Intentionally Triggering Cloaking Behavior