URLdeepDetect: A Deep Learning Approach for Detecting Malicious URLs Using Semantic Vector Models

Afzal, Sara; Asim, Muhammad; Javed, Abdul Rehman; Beg, Mirza Omer; Baker, Thar

doi:10.1007/s10922-021-09587-8

Published: 04 March 2021

Volume 29, article number 21, (2021)

Journal of Network and Systems Management

Abstract

Malicious Uniform Resource Locators (URLs) embedded in emails or Twitter posts are used to lure susceptible Internet users into executing malicious content, leading to compromised systems, scams, and a multitude of cyber-attacks. These attacks can cause damage ranging from fraud to massive data breaches, resulting in huge financial losses. This paper proposes a hybrid deep-learning approach named URLdeepDetect for time-of-click URL analysis and classification to detect malicious URLs. URLdeepDetect analyzes semantic and lexical features of a URL by applying various techniques, including semantic vector models and URL encryption, to classify a given URL as either malicious or benign. URLdeepDetect uses both a supervised and an unsupervised mechanism for URL classification: LSTM (Long Short-Term Memory) and k-means clustering, respectively. URLdeepDetect achieves an accuracy of 98.3% with LSTM and 99.7% with k-means clustering.
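To make the notion of "lexical features of a URL" concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the preview does not list the features URLdeepDetect uses, so the feature names, the `lexical_features` and `tokenize` helpers, and the example URL are all assumptions chosen for illustration. Tokens like those produced by `tokenize` are the kind of input a semantic vector model could embed before a classifier such as an LSTM or k-means is applied.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Return a few simple lexical features of a URL.

    The feature set is hypothetical; it is not taken from the paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(not ch.isalnum() for ch in url),
        # A host written as a dotted IPv4 address rather than a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
    }


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (illustrative only)."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


if __name__ == "__main__":
    example = "http://192.168.0.1/login.php?id=7"  # made-up URL
    print(lexical_features(example))
    print(tokenize(example))
```

Hand-crafted counts like these are cheap to compute at click time, which is presumably why lexical features are attractive for time-of-click analysis; whether the paper combines them with learned embeddings in this way is not stated in the preview.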

Author information

Authors and Affiliations

National University of Computer and Emerging Sciences, Islamabad, 44000, Pakistan
Sara Afzal & Muhammad Asim
Department of Cyber Security, Air University, Islamabad, Pakistan
Abdul Rehman Javed
National University of Computer and Emerging Sciences, Islamabad, 44000, Pakistan
Mirza Omer Beg
Department of Computer Science, University of Sharjah, Sharjah, 27272, UAE
Thar Baker

Corresponding author

Correspondence to Muhammad Asim.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Afzal, S., Asim, M., Javed, A.R. et al. URLdeepDetect: A Deep Learning Approach for Detecting Malicious URLs Using Semantic Vector Models. J Netw Syst Manage 29, 21 (2021). https://doi.org/10.1007/s10922-021-09587-8

Received: 01 June 2020
Revised: 17 January 2021
Accepted: 03 February 2021
Published: 04 March 2021
DOI: https://doi.org/10.1007/s10922-021-09587-8
