URLdeepDetect: A Deep Learning Approach for Detecting Malicious URLs Using Semantic Vector Models

Journal of Network and Systems Management

Abstract

Malicious Uniform Resource Locators (URLs) embedded in emails or Twitter posts have been used as weapons for luring susceptible Internet users into executing malicious content, leading to compromised systems, scams, and a multitude of cyber-attacks. These attacks can cause damage ranging from fraud to massive data breaches, resulting in huge financial losses. This paper proposes a hybrid deep-learning approach named URLdeepDetect for time-of-click URL analysis and classification to detect malicious URLs. URLdeepDetect analyzes semantic and lexical features of a URL by applying various techniques, including semantic vector models and URL encryption, to classify a given URL as either malicious or benign. URLdeepDetect uses supervised and unsupervised mechanisms, in the form of LSTM (Long Short-Term Memory) and k-means clustering respectively, for URL classification. URLdeepDetect achieves accuracy of 98.3% and 99.7% with LSTM and k-means clustering, respectively.
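To make the idea of lexical URL features more concrete, the following is a minimal sketch in Python of how a URL might be split into tokens and turned into simple numeric features before being passed to a classifier or a clustering step. It is not the authors' implementation: the function names (`tokenize_url`, `lexical_features`, `bag_of_tokens`) and the particular features chosen are assumptions made purely for illustration, and the semantic vector model, LSTM, and k-means stages described in the abstract are not reproduced here.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase tokens on any non-alphanumeric character."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


def lexical_features(url: str) -> dict[str, int]:
    """Compute a few illustrative lexical features (an assumed feature set)."""
    host = urlsplit(url).hostname or ""
    return {
        "length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots_in_host": host.count("."),
        "num_tokens": len(tokenize_url(url)),
        "has_at_symbol": int("@" in url),
    }


def bag_of_tokens(urls: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a shared vocabulary and one token-count vector per URL."""
    vocab = sorted({tok for url in urls for tok in tokenize_url(url)})
    rows = []
    for url in urls:
        counts = Counter(tokenize_url(url))
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows
```

Count vectors of this kind are only a crude stand-in for the semantic vector models the paper refers to, but they show the general shape of the input that a supervised model or a clustering algorithm would consume.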

Author information

Corresponding author

Correspondence to Muhammad Asim.

Ethics declarations

Conflicts of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Afzal, S., Asim, M., Javed, A.R. et al. URLdeepDetect: A Deep Learning Approach for Detecting Malicious URLs Using Semantic Vector Models. J Netw Syst Manage 29, 21 (2021). https://doi.org/10.1007/s10922-021-09587-8

  • DOI: https://doi.org/10.1007/s10922-021-09587-8
