Predicting Android malware combining permissions and API call sequences

Chen, Xin; Yu, Haihua; Yu, Dongjin; Chen, Jie; Sun, Xiaoxiao

doi:10.1007/s11219-022-09602-4

Published: 18 November 2022

Software Quality Journal, Volume 31, pages 655–685 (2023)

ORCID (Dongjin Yu): orcid.org/0000-0001-8919-1613

Abstract

Malware detection is an important task in software maintenance. It can effectively protect user information from attacks by malicious developers. Existing studies mainly leverage permission information and API call information to identify malware. However, many of them consider individual API calls without accounting for the role of API call sequences. In this study, we propose a new method that combines permission information and API call sequence information to distinguish malicious applications from benign ones. First, we extract permission and API call sequence features with a decompiling tool. Then, one-hot encoding and Word2Vec are adopted to represent the permission feature and the API call sequence feature of each application, respectively. On this basis, we leverage Random Forest (RF) and Convolutional Neural Networks (CNN) to train a permission-based classifier and an API call sequence-based classifier, respectively. Finally, we design a linear strategy to combine the outputs of these two classifiers to predict the labels of newly arrived applications. In an evaluation with 15,198 malicious applications and 15,129 benign applications, our approach achieves on average 98.84% precision, 98.17% recall, 98.50% F1-score, and 98.52% accuracy, and outperforms the state-of-the-art method MalScan by 2.12%, 0.27%, 1.20%, and 1.24%, respectively. In addition, we demonstrate that combining the two features achieves better performance than methods based on a single feature.
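As a rough illustration of two steps named in the abstract, the sketch below encodes an application's requested permissions as a binary vector over a fixed vocabulary and linearly combines the malware probabilities of two classifiers. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the vocabulary, the form of the linear strategy, or its parameters, so `PERMISSION_VOCAB`, `weight`, `threshold`, and all function names here are hypothetical. The two probabilities stand in for the outputs of the permission-based (RF) and sequence-based (CNN) classifiers, which are not trained here.

```python
from typing import Iterable, List, Sequence

# Hypothetical vocabulary; in practice it would be collected from the
# permissions observed across the training applications.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def encode_permissions(
    requested: Iterable[str], vocab: Sequence[str] = PERMISSION_VOCAB
) -> List[int]:
    """Return a binary vector: 1 where the app requests the permission, else 0."""
    requested_set = set(requested)
    return [1 if permission in requested_set else 0 for permission in vocab]


def combine_scores(p_permission: float, p_sequence: float, weight: float = 0.5) -> float:
    """Weighted sum of two malware probabilities.

    `weight` is an assumed hyperparameter; the abstract only says the
    combination is linear.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_permission + (1.0 - weight) * p_sequence


def predict_label(
    p_permission: float, p_sequence: float, weight: float = 0.5, threshold: float = 0.5
) -> str:
    """Label an app from the combined score; `threshold` is also an assumption."""
    score = combine_scores(p_permission, p_sequence, weight)
    return "malicious" if score >= threshold else "benign"


if __name__ == "__main__":
    app_permissions = ["android.permission.SEND_SMS", "android.permission.INTERNET"]
    print(encode_permissions(app_permissions))
    print(predict_label(p_permission=0.9, p_sequence=0.3))
```

With an equal weight, a confident permission-based score can outvote a hesitant sequence-based one; in practice the weight would be tuned on held-out data.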

Funding

This work was supported in part by the Natural Science Foundation of Zhejiang Province under Grant LY21F020020, in part by the National Natural Science Foundation of China under Grant 61902096, and in part by Key Project of Science and Technology of Zhejiang Province under Grant 2020C01165.

Author information

Authors and Affiliations

School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, 310018, China
Xin Chen, Haihua Yu, Dongjin Yu, Jie Chen & Xiaoxiao Sun

Corresponding author

Correspondence to Dongjin Yu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, X., Yu, H., Yu, D. et al. Predicting Android malware combining permissions and API call sequences. Software Qual J 31, 655–685 (2023). https://doi.org/10.1007/s11219-022-09602-4

Accepted: 26 September 2022
Published: 18 November 2022
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11219-022-09602-4
