ABSTRACT
Recent concerns about privacy have motivated data mining researchers to develop methods for performing data mining while preserving the privacy of individuals. However, current techniques for privacy-preserving data mining suffer from high communication and computation overheads, which are prohibitive even for a modest database size. Furthermore, the proposed techniques make strict assumptions about the involved parties, which need to be relaxed to reflect real-world requirements. In this paper we concentrate on a distributed scenario where the data is partitioned vertically over multiple sites and the involved sites would like to perform clustering without revealing their local databases. For this setting, we propose a new protocol for privacy-preserving k-means clustering based on additive secret sharing. We show that the new protocol is more secure than the state of the art. Experiments conducted on real and synthetic data sets show that, in realistic scenarios, the communication and computation cost of our protocol is considerably lower than that of the state of the art, which is crucial for data mining applications.
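To make the building block named in the abstract concrete, the following is a minimal sketch of additive secret sharing used to sum values held by different sites. It is an illustration under stated assumptions, not the protocol proposed in the paper: the modulus, the function names (`share`, `reconstruct`, `secure_sum`), and the example distances are all hypothetical, and the sketch simulates every site in one process. In a vertically partitioned setting, each site could compute the squared distance from a record to a candidate cluster centre over its own attributes only; the sum of those local distances is the full distance.

```python
import secrets

# Public modulus agreed on by all sites (an assumption for this sketch).
# All private values must be non-negative integers smaller than MODULUS.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recover the secret by adding all shares mod modulus."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=MODULUS):
    """Simulate n sites summing their private values via additive shares.

    Row i of `distributed` holds the shares produced by site i;
    column j holds the shares that site j receives.
    """
    n = len(private_values)
    distributed = [share(v, n, modulus) for v in private_values]
    # Each site adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(row[j] for row in distributed) % modulus for j in range(n)]
    return reconstruct(partial_sums, modulus)


# Hypothetical example: three sites, each holding some attributes of a record,
# have computed local squared distances to one candidate cluster centre.
local_sq_distances = [17, 4, 29]
total = secure_sum(local_sq_distances)
print(total)  # 50
```

No single share or published partial sum reveals a site's local value in this sketch, since each is masked by uniformly random terms. A complete clustering protocol would additionally need to compare distances across clusters without revealing them and to state which collusions it tolerates; the sketch does not address either point.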
Index Terms
- Distributed privacy preserving k-means clustering with additive secret sharing