Predicting Android malware combining permissions and API call sequences

Software Quality Journal

Abstract

Malware detection is an important task in software maintenance, because it protects user information from attacks by malicious developers. Existing studies mainly leverage permission information and API call information to identify malware. However, many of them consider API calls without considering the role of API call sequences. In this study, we propose a new method that combines permission information with API call sequence information to distinguish malicious applications from benign ones. First, we extract permission features and API call sequence features with a decompiling tool. Then, we represent the permission feature of each application with one-hot encoding and its API call sequence feature with Word2Vec. On top of these representations, we train a permission-based classifier with Random Forest (RF) and an API call sequence-based classifier with Convolutional Neural Networks (CNN). Finally, we design a linear strategy that combines the outputs of the two classifiers to predict the labels of newly arrived applications. In an evaluation on 15,198 malicious applications and 15,129 benign applications, our approach achieves, on average, 98.84% precision, 98.17% recall, 98.50% F1-score, and 98.52% accuracy, outperforming the state-of-the-art method MalScan by 2.12%, 0.27%, 1.20%, and 1.24%, respectively. In addition, we demonstrate that the method combining the two features performs better than methods based on a single feature.
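The pipeline described in the abstract can be illustrated with a minimal sketch. It shows one-hot encoding of permissions, a linear combination of two classifier outputs, and the four reported metrics. The abstract does not give the exact form of the linear strategy, so the weighted average, the `weight` parameter, the 0.5 decision threshold, and all scores and labels below are assumptions made for illustration only; the RF and CNN classifiers themselves are replaced by made-up probabilities.

```python
from typing import Dict, List, Sequence


def one_hot_permissions(requested: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Encode an app's requested permissions as a 0/1 vector over a fixed vocabulary."""
    requested_set = set(requested)
    return [1 if perm in requested_set else 0 for perm in vocabulary]


def combine_scores(p_permission: float, p_sequence: float, weight: float = 0.5) -> float:
    """Blend two malware probabilities linearly.

    `weight` is a hypothetical parameter; the source only says the strategy is linear.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_permission + (1.0 - weight) * p_sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1-score, and accuracy with malware (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Illustrative permission vocabulary (real Android permission names, arbitrary selection).
vocabulary = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]
vector = one_hot_permissions(
    ["android.permission.SEND_SMS", "android.permission.INTERNET"], vocabulary
)

# Made-up outputs standing in for the permission-based and sequence-based classifiers.
permission_scores = [0.9, 0.2, 0.6, 0.1]
sequence_scores = [0.8, 0.4, 0.3, 0.3]
combined = [combine_scores(a, b, weight=0.5) for a, b in zip(permission_scores, sequence_scores)]

# Assumed decision rule: label an app as malware when the combined score reaches 0.5.
predictions = [1 if score >= 0.5 else 0 for score in combined]
y_true = [1, 0, 1, 0]  # made-up ground-truth labels
metrics = evaluate(y_true, predictions)
print(vector, predictions, metrics)
```

In this toy run the third application is malicious according to the made-up labels but falls below the threshold after blending, which lowers recall while precision stays perfect. A different `weight` would shift that trade-off.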

Funding

This work was supported in part by the Natural Science Foundation of Zhejiang Province under Grant LY21F020020, in part by the National Natural Science Foundation of China under Grant 61902096, and in part by Key Project of Science and Technology of Zhejiang Province under Grant 2020C01165.

Author information

Corresponding author

Correspondence to Dongjin Yu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Yu, H., Yu, D. et al. Predicting Android malware combining permissions and API call sequences. Software Qual J 31, 655–685 (2023). https://doi.org/10.1007/s11219-022-09602-4

  • DOI: https://doi.org/10.1007/s11219-022-09602-4