ABSTRACT
Synchronization is a powerful basic concept in nature, regulating a large variety of complex processes ranging from the metabolism in the cell to social behavior in groups of individuals. Synchronization phenomena have therefore been studied extensively, and models that robustly capture the dynamical synchronization process have been proposed, e.g., the Extensive Kuramoto Model. Inspired by this concept, we propose Sync, a novel approach to clustering. The basic idea is to view each data object as a phase oscillator and to simulate the interaction behavior of the objects over time. As time evolves, similar objects naturally synchronize and form distinct clusters. Owing to its basis in synchronization, Sync has several desirable properties: the clusters revealed by dynamic synchronization reflect the intrinsic structure of the data set; Sync does not rely on any distribution assumption; and it can detect clusters of arbitrary number, shape and size. Moreover, synchronization allows natural outlier handling, since outliers do not synchronize with cluster objects. For fully automatic clustering, we propose to combine Sync with the Minimum Description Length principle. Extensive experiments on synthetic and real-world data demonstrate the effectiveness and efficiency of our approach.
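The abstract describes Sync only at a high level, so the following is a minimal sketch of the idea under stated assumptions, not the paper's implementation. It assumes an ε-neighbourhood interaction in which each coordinate of an object x is shifted by the mean of sin(y_i - x_i) over its neighbours y (a Kuramoto-style coupling). It also assumes that the process stops once an order parameter, the average of exp(-||y - x||) over neighbour pairs, approaches 1, and that objects ending up in singleton groups are outliers. The function `sync_cluster` and its parameters `eps`, `tol`, `merge_tol` and `min_size` are hypothetical. The MDL-based automatic selection mentioned above is not included, so `eps` is chosen by hand here.

```python
import numpy as np


def sync_cluster(X, eps, max_iter=100, tol=1e-4, merge_tol=1e-3, min_size=2):
    """Hypothetical sketch of synchronization-based clustering.

    Assumption: each object moves by the mean of sin(y - x) over its
    eps-neighbours y, coordinate-wise. Features should be scaled so that
    eps is well below pi, because sin is periodic.
    Returns (labels, final_positions); label -1 marks outliers.
    """
    x = np.asarray(X, dtype=float).copy()
    n = len(x)
    for _ in range(max_iter):
        # Pairwise Euclidean distances between current positions.
        dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        nbr = dist <= eps              # eps-neighbourhood, includes self
        counts = nbr.sum(axis=1)       # always >= 1
        # Order parameter: close to 1 once every neighbourhood has synchronized.
        r = np.mean((np.exp(-dist) * nbr).sum(axis=1) / counts)
        if 1.0 - r < tol:
            break
        # diff[i, j] = x_j - x_i; average the sine coupling over neighbours.
        diff = x[None, :, :] - x[:, None, :]
        coupling = (np.sin(diff) * nbr[:, :, None]).sum(axis=1) / counts[:, None]
        x = x + coupling

    # Group objects whose final positions (nearly) coincide.
    groups = -np.ones(n, dtype=int)
    g = 0
    for i in range(n):
        if groups[i] < 0:
            close = np.linalg.norm(x - x[i], axis=1) <= merge_tol
            groups[(groups < 0) & close] = g
            g += 1

    # Groups smaller than min_size did not synchronize with others: outliers.
    labels = -np.ones(n, dtype=int)
    next_label = 0
    for k in range(g):
        members = groups == k
        if members.sum() >= min_size:
            labels[members] = next_label
            next_label += 1
    return labels, x


if __name__ == "__main__":
    # Two tight groups and one isolated object (made-up toy data).
    X = [[0.0, 0.0], [0.02, 0.01], [0.01, 0.03],
         [1.0, 1.0], [1.02, 0.99], [0.99, 1.01],
         [0.5, 0.0]]
    labels, final = sync_cluster(X, eps=0.1)
    print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

For small differences, sin(d) is close to d, so each step moves an object almost to the mean of its neighbourhood and each group collapses onto a common synchronization point within a few iterations. The isolated object has no neighbours other than itself. Its coupling term is therefore sin(0) = 0, so it keeps its position and is reported as an outlier, which mirrors the outlier behaviour described in the abstract.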
Index Terms
- Clustering by synchronization