Sensitive Data Recognition and Filtering Model of Webpage Content Based on Decision Tree Algorithm

Ye, Sheng; Cheng, Yong; Yang, Yonggang; Guo, Qian

doi:10.1007/978-981-16-3150-4_42


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1415)

Included in the following conference series:

International Conference on Big Data and Security


Abstract

In recent years, privacy-preserving data mining has attracted widespread attention because sensitive and confidential data must be protected from unauthorized attacks. The purpose of this study is to develop a privacy-preserving anonymization algorithm using decision tree classification. The paper focuses on k-anonymity, a technique that can prevent identity leakage and that achieves data anonymity through generalization and suppression. The privacy level and mining quality of the anonymized data sets are then evaluated using decision tree classification and compared with two other data mining techniques, logistic regression and support vector machines. The results show that, compared with these other techniques, the decision tree approach yields better privacy level and data quality.
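The abstract names generalization and suppression as the two operations used to reach k-anonymity, and decision tree classification as the way the anonymized data is evaluated. A minimal Python sketch of both ideas follows. It is not the authors' implementation: the toy records, the quasi-identifier columns `age` and `zip`, the sensitive column `disease`, the generalization rules (10-year age ranges, 3-digit ZIP prefixes) and all helper names are hypothetical, and the ID3-style information-gain score is only a stand-in for whatever decision tree classifier the paper actually uses.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical toy table (illustrative data, not from the paper):
# "age" and "zip" are quasi-identifiers, "disease" is the sensitive label.
RECORDS: List[Record] = [
    {"age": 23, "zip": "47677", "disease": "flu"},
    {"age": 27, "zip": "47602", "disease": "cancer"},
    {"age": 25, "zip": "47678", "disease": "flu"},
    {"age": 41, "zip": "47905", "disease": "asthma"},
    {"age": 45, "zip": "47909", "disease": "flu"},
    {"age": 48, "zip": "47906", "disease": "cancer"},
    {"age": 62, "zip": "10001", "disease": "asthma"},  # lone outlier
]

QUASI_IDENTIFIERS = ("age", "zip")


def generalize(record: Record, age_width: int = 10, zip_digits: int = 3) -> Record:
    """Generalization: replace exact quasi-identifier values with coarser ones."""
    low = (record["age"] // age_width) * age_width
    hidden = len(record["zip"]) - zip_digits
    return {
        "age": f"{low}-{low + age_width - 1}",  # e.g. 23 -> "20-29"
        "zip": record["zip"][:zip_digits] + "*" * hidden,  # e.g. "47677" -> "476**"
        "disease": record["disease"],
    }


def qi_key(record: Record) -> Tuple[Any, ...]:
    """Quasi-identifier tuple that defines a record's equivalence class."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def suppress(records: List[Record], k: int) -> List[Record]:
    """Suppression: drop records whose equivalence class has fewer than k members."""
    sizes = Counter(qi_key(r) for r in records)
    return [r for r in records if sizes[qi_key(r)] >= k]


def is_k_anonymous(records: List[Record], k: int) -> bool:
    """True if every equivalence class contains at least k records."""
    sizes = Counter(qi_key(r) for r in records)
    return all(size >= k for size in sizes.values())


def k_anonymize(records: List[Record], k: int) -> List[Record]:
    """Generalize first, then suppress whatever still violates k-anonymity."""
    return suppress([generalize(r) for r in records], k)


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(records: List[Record], attr: str, target: str = "disease") -> float:
    """ID3-style gain from splitting the (anonymized) records on `attr`."""
    remainder = 0.0
    for value in {r[attr] for r in records}:
        subset = [r[target] for r in records if r[attr] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return entropy([r[target] for r in records]) - remainder


if __name__ == "__main__":
    anonymized = k_anonymize(RECORDS, k=3)
    for row in anonymized:
        print(row)
    print("3-anonymous:", is_k_anonymous(anonymized, 3))
    print("gain on generalized age:", round(information_gain(anonymized, "age"), 3))
```

Raising `k`, widening `age_width`, or shortening `zip_digits` strengthens anonymity but suppresses or blurs more information, which reduces what a decision tree can learn from the table; that tension is the privacy-level versus mining-quality trade-off the abstract describes measuring.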



Acknowledgement

This paper is supported by the science and technology project of State Grid Corporation of China: "Research and Application of Key Technology of Data Sharing and Distribution Security for Data Center" (Grant No. 5700-202090192A-0–0-00).

Author information

Authors and Affiliations

State Grid ZheJiang Electric Power Company Ltd, Hangzhou, 310000, China
Sheng Ye
State Grid ShanXi Electric Power Company, Xi’an, 710000, China
Yong Cheng & Yonggang Yang
Global Energy Interconnection Research Institute Co., Ltd, Nanjing, 210000, China
Qian Guo
State Grid Key Laboratory of Information & Network Security, Nanjing, 210003, China
Qian Guo


Corresponding author

Correspondence to Qian Guo.

Editor information

Editors and Affiliations

Nanjing Institute of Technology, Nanjing, China
Yuan Tian
Nanjing University of Information Science and Technology, Nanjing, China
Tinghuai Ma
King Saud University, Riyadh, Saudi Arabia
Muhammad Khurram Khan


About this paper

Cite this paper

Ye, S., Cheng, Y., Yang, Y., Guo, Q. (2021). Sensitive Data Recognition and Filtering Model of Webpage Content Based on Decision Tree Algorithm. In: Tian, Y., Ma, T., Khan, M.K. (eds) Big Data and Security. ICBDS 2020. Communications in Computer and Information Science, vol 1415. Springer, Singapore. https://doi.org/10.1007/978-981-16-3150-4_42


DOI: https://doi.org/10.1007/978-981-16-3150-4_42
Published: 22 June 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3149-8
Online ISBN: 978-981-16-3150-4
eBook Packages: Computer Science, Computer Science (R0)
