Abstract
In recent years, privacy-preserving data mining has attracted widespread attention because sensitive and confidential data must be protected from unauthorized access. The purpose of this study is to develop a privacy-preserving anonymization algorithm using decision tree classification. The paper focuses on k-anonymity, which prevents identity disclosure by applying generalization and suppression to the data. The privacy level and mining quality of the anonymized data sets are then tested with decision tree classification and compared with two other data mining techniques, logistic regression and support vector machines. The results indicate that decision tree classification yields a better privacy level and data quality than the other techniques.
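As a rough illustration of the generalization and suppression steps mentioned in the abstract, the following minimal Python sketch coarsens two quasi-identifiers and then drops any record whose quasi-identifier combination still appears fewer than k times. It is not the authors' algorithm: the field names (age, zip, label), the helper functions, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical quasi-identifiers: "age" and "zip". These names are assumptions
# made for this sketch and do not come from the paper.

def generalize_age(age, width=10):
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def generalize_zip(zip_code, keep=3):
    """Generalization: mask the trailing digits of a ZIP code."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)

def quasi_id_counts(records):
    """Count how often each (age, zip) combination occurs."""
    return Counter((r["age"], r["zip"]) for r in records)

def anonymize(records, k):
    """Generalize the quasi-identifiers, then suppress (drop) every record
    whose generalized combination still occurs fewer than k times."""
    generalized = [
        {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
        for r in records
    ]
    counts = quasi_id_counts(generalized)
    return [r for r in generalized if counts[(r["age"], r["zip"])] >= k]

def is_k_anonymous(records, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(c >= k for c in quasi_id_counts(records).values())

# Made-up sample records, for illustration only.
records = [
    {"age": 31, "zip": "21001", "label": "sensitive"},
    {"age": 35, "zip": "21004", "label": "normal"},
    {"age": 38, "zip": "21009", "label": "sensitive"},
    {"age": 52, "zip": "30512", "label": "normal"},
]
anonymous = anonymize(records, k=2)
```

In this sketch the class label is left untouched, so the anonymized records could still be passed to a classifier such as a decision tree to measure how much mining quality is lost to generalization and suppression.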
Acknowledgement
This paper is supported by the science and technology project of State Grid Corporation of China: “Research and Application of Key Technology of Data Sharing and Distribution Security for Data Center” (Grant No. 5700-202090192A-0-0-00).
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ye, S., Cheng, Y., Yang, Y., Guo, Q. (2021). Sensitive Data Recognition and Filtering Model of Webpage Content Based on Decision Tree Algorithm. In: Tian, Y., Ma, T., Khan, M.K. (eds) Big Data and Security. ICBDS 2020. Communications in Computer and Information Science, vol 1415. Springer, Singapore. https://doi.org/10.1007/978-981-16-3150-4_42
DOI: https://doi.org/10.1007/978-981-16-3150-4_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3149-8
Online ISBN: 978-981-16-3150-4
eBook Packages: Computer Science, Computer Science (R0)