Abstract
Traditional intrusion detection methods struggle to distinguish intrusion types that are highly similar to one another. These methods characterize each attribute with a single value and mine the relationships among attributes at the feature extraction stage. However, feature extraction at this granularity is not sufficient to distinguish intrusions whose network flow characteristics are similar. To address this problem, we establish an intrusion detection model based on Latent Dirichlet Allocation (ID-LDA) and propose a novel topic reconstruction method to extract distinctive features. We mine the value distribution of each attribute and the associations among multiple attributes to extract more implicit semantic features, which are more useful for identifying slight differences between kinds of intrusions. However, with current LDA models it is difficult to determine the optimal number of topics, and recent methods ignore the selection among multiple topics. These problems make it difficult to generate a high-quality Document-Topic Distribution (DTD) and lower the detection accuracy. We therefore propose a topic overlap degree and a dispersion degree to quantitatively assess the quality of the DTD. With these measures, we obtain the optimal number of topics and select the best topics. Experiments on the public NSL-KDD dataset verify the validity of ID-LDA: its results outperform many state-of-the-art intrusion detection methods in terms of accuracy.
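The abstract does not define the topic overlap degree or the dispersion degree, so the following is only an illustrative sketch of how two such scores could drive the choice of topic number, not the paper's formulas. It assumes that overlap can be approximated by the mean pairwise Jensen-Shannon similarity between topic-word distributions, and dispersion by the mean normalized entropy of each document's topic mixture. The function names `topic_overlap`, `dispersion`, and `pick_topic_number`, and the rule of minimizing the sum of the two scores, are all hypothetical.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range [0, 1]) of two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_overlap(topic_word):
    """Assumed proxy: 1 - mean pairwise JS divergence between topics.

    Needs at least two topics; 0 means disjoint topics, 1 means identical ones.
    """
    pairs = list(combinations(topic_word, 2))
    return 1.0 - sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


def dispersion(doc_topic):
    """Assumed proxy: mean normalized entropy of the rows of the DTD.

    0 means every document maps to a single topic, 1 means uniform mixtures.
    """
    k = len(doc_topic[0])

    def entropy(row):
        return -sum(x * math.log2(x) for x in row if x > 0) / math.log2(k)

    return sum(entropy(row) for row in doc_topic) / len(doc_topic)


def pick_topic_number(candidates):
    """candidates maps k -> (topic_word, doc_topic) from an LDA run with k topics.

    Hypothetical selection rule: prefer the k with the lowest overlap + dispersion.
    """
    return min(
        candidates,
        key=lambda k: topic_overlap(candidates[k][0]) + dispersion(candidates[k][1]),
    )
```

In use, one would fit LDA for several candidate values of k, pass each run's topic-word and document-topic matrices to `pick_topic_number`, and keep the k it returns. Any weighting between the two scores is a design choice left open here.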
Acknowledgment
This work is supported by the National Natural Science Foundation of China (U1636208, F020605, No. 61902013).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lei, S., Xia, C., Wang, T., Wang, S. (2020). Topic Reconstruction: A Novel Method Based on LDA Oriented to Intrusion Detection. In: Wen, S., Zomaya, A., Yang, L. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11944. Springer, Cham. https://doi.org/10.1007/978-3-030-38991-8_38
DOI: https://doi.org/10.1007/978-3-030-38991-8_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38990-1
Online ISBN: 978-3-030-38991-8
eBook Packages: Mathematics and Statistics (R0)