DOI: 10.1145/1835804.1835879
research-article

Clustering by synchronization

Published: 25 July 2010

ABSTRACT

Synchronization is a powerful basic concept in nature that regulates a wide variety of complex processes, ranging from metabolism in the cell to social behavior in groups of individuals. Synchronization phenomena have therefore been studied extensively, and models that robustly capture the dynamical synchronization process have been proposed, e.g., the Extensive Kuramoto Model. Inspired by the powerful concept of synchronization, we propose Sync, a novel approach to clustering. The basic idea is to view each data object as a phase oscillator and to simulate the interaction behavior of the objects over time. As time evolves, similar objects naturally synchronize together and form distinct clusters. Inherited from synchronization, Sync has several desirable properties: the clusters revealed by dynamic synchronization truly reflect the intrinsic structure of the data set, and Sync does not rely on any distribution assumption and allows detecting clusters of arbitrary number, shape, and size. Moreover, the concept of synchronization allows natural outlier handling, since outliers do not synchronize with cluster objects. For fully automatic clustering, we propose to combine Sync with the Minimum Description Length principle. Extensive experiments on synthetic and real-world data demonstrate the effectiveness and efficiency of our approach.
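
The idea of treating data objects as interacting oscillators can be illustrated with a small sketch. The abstract does not state the update rule, so the code below assumes a Kuramoto-style rule in which each object is pulled toward the objects within a radius `eps` through a sine coupling, applied per dimension. The names `sync_step`, `sync_cluster`, `eps`, `max_iter`, and `tol` are hypothetical, and the Minimum Description Length step for choosing parameters automatically is not shown. This is a minimal illustration under those assumptions, not the authors' implementation.

```python
import math

def sync_step(points, eps):
    """One synchronous update (assumed Kuramoto-style rule, not taken from the abstract)."""
    updated = []
    for p in points:
        # Neighbors within Euclidean distance eps. The point itself is included
        # and contributes sin(0) = 0 to the coupling sum.
        nbrs = [q for q in points if math.dist(p, q) <= eps]
        updated.append(tuple(
            p[d] + sum(math.sin(q[d] - p[d]) for q in nbrs) / len(nbrs)
            for d in range(len(p))
        ))
    return updated

def sync_cluster(points, eps, max_iter=100, tol=1e-6):
    """Iterate until the objects stop moving, then group objects that coincide."""
    pts = [tuple(float(v) for v in p) for p in points]
    for _ in range(max_iter):
        new_pts = sync_step(pts, eps)
        shift = max(math.dist(a, b) for a, b in zip(pts, new_pts))
        pts = new_pts
        if shift < tol:
            break
    # Objects that ended up at (almost) the same position share a label.
    labels, reps = [], []
    for p in pts:
        for k, r in enumerate(reps):
            if math.dist(p, r) < 1e-3:
                labels.append(k)
                break
        else:
            reps.append(p)
            labels.append(len(reps) - 1)
    return labels, pts

# Two tight groups and one isolated object (made-up illustrative data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (2.0, 2.0), (2.1, 2.0), (2.0, 2.1),
        (5.0, -3.0)]
labels, final_positions = sync_cluster(data, eps=0.5)
print(labels)
```

In this sketch the isolated object has no neighbors other than itself, so it never moves and ends up with a label of its own, which mirrors the abstract's remark that outliers do not synchronize with cluster objects.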

Supplemental Material

kdd2010_shao_cs_01.mov (mov, 110.4 MB)


    Published in

      KDD '10: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      July 2010
      1240 pages
      ISBN: 9781450300551
      DOI: 10.1145/1835804

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
