Topic Reconstruction: A Novel Method Based on LDA Oriented to Intrusion Detection

Lei, Shengwei; Xia, Chunhe; Wang, Tianbo; Wang, Shizhao

doi:10.1007/978-3-030-38991-8_38

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11944)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Traditional intrusion detection methods struggle to distinguish between highly similar types of intrusion. These methods characterize each attribute with a single value and mine the relationships among attributes at the feature extraction stage. However, this granularity of feature extraction is not sufficient to distinguish intrusions whose network flow characteristics are similar. To address this problem, we establish an intrusion detection model based on Latent Dirichlet Allocation (ID-LDA) and propose a novel topic reconstruction method to extract distinctive features. We mine the value distribution of each attribute and the associations among multiple attributes to extract more implicit semantic features, which are more useful for identifying slight differences between kinds of intrusion. However, with current LDA models it is difficult to determine the optimal number of topics, and recent methods ignore the selection of multiple topics. These problems make it difficult to generate an ideal Document-Topic Distribution (DTD) and lower the detection accuracy. We therefore propose a topic overlap degree and a dispersion degree to quantitatively assess the quality of the DTD. Finally, we obtain the optimal number of topics and select the best topics. Experiments on the public NSL-KDD dataset verify the validity of ID-LDA, and the results outperform many state-of-the-art intrusion detection methods in terms of accuracy.
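
The abstract does not define the topic overlap degree or the dispersion degree, so the following is only a minimal sketch of the general idea of scoring a DTD quantitatively. It assumes, purely for illustration, that overlap between two intrusion classes can be approximated by one minus the Jensen-Shannon divergence between their class-averaged topic distributions. The function names, the toy DTD matrices, and the class labels are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions; lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute 0 by convention.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def class_mean_topics(dtd, labels):
    """Average the document-topic rows that belong to each class."""
    sums, counts = {}, {}
    for row, label in zip(dtd, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def mean_pairwise_overlap(dtd, labels):
    """Assumed stand-in score: mean of (1 - JSD) over all pairs of classes."""
    means = class_mean_topics(dtd, labels)
    pairs = list(combinations(sorted(means), 2))
    if not pairs:
        raise ValueError("need at least two classes to measure overlap")
    return sum(1.0 - js_divergence(means[a], means[b]) for a, b in pairs) / len(pairs)


# Hypothetical toy DTDs: four documents (flows), two topics, two classes.
labels = ["dos", "dos", "probe", "probe"]
dtd_separated = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
dtd_mixed = [[0.5, 0.5], [0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]

overlap_separated = mean_pairwise_overlap(dtd_separated, labels)
overlap_mixed = mean_pairwise_overlap(dtd_mixed, labels)
print(overlap_separated, overlap_mixed)
```

Under this assumed score, a DTD in which the classes concentrate on different topics yields a lower overlap than one in which every class spreads evenly across topics. In principle, a score of this kind could be computed for each candidate topic number and used to compare them, although the paper's actual criteria may differ.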

Acknowledgment

This work is supported by the National Natural Science Foundation of China (U1636208, F020605, No. 61902013).

Author information

Authors and Affiliations

Key Laboratory of Beijing Network Technology, Beihang University, Beijing, China
Shengwei Lei, Chunhe Xia, Tianbo Wang & Shizhao Wang
School of Cyber Science and Technology, Beihang University, Beijing, China
Tianbo Wang

Corresponding author

Correspondence to Tianbo Wang.

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, Melbourne, VIC, Australia
Sheng Wen
School of Computer Science, The University of Sydney, Camperdown, NSW, Australia
Albert Zomaya
Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada
Laurence T. Yang

About this paper

Cite this paper

Lei, S., Xia, C., Wang, T., Wang, S. (2020). Topic Reconstruction: A Novel Method Based on LDA Oriented to Intrusion Detection. In: Wen, S., Zomaya, A., Yang, L. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2019. Lecture Notes in Computer Science, vol 11944. Springer, Cham. https://doi.org/10.1007/978-3-030-38991-8_38

DOI: https://doi.org/10.1007/978-3-030-38991-8_38
Published: 22 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38990-1
Online ISBN: 978-3-030-38991-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
