research-article

Clustering by synchronization

Authors:
Christian Böhm, University of Munich, Munich, Germany
Claudia Plant, Florida State University, Tallahassee, FL, USA
Junming Shao, University of Munich, Munich, Germany
Qinli Yang, University of Edinburgh, Edinburgh, United Kingdom

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, July 2010, Pages 583–592. https://doi.org/10.1145/1835804.1835879

Published: 25 July 2010

ABSTRACT

Synchronization is a powerful basic concept in nature that regulates a large variety of complex processes, ranging from the metabolism in the cell to social behavior in groups of individuals. Synchronization phenomena have therefore been studied extensively, and models that robustly capture the dynamical synchronization process have been proposed, e.g. the Extensive Kuramoto Model. Inspired by this concept, we propose Sync, a novel approach to clustering. The basic idea is to view each data object as a phase oscillator and simulate the interaction behavior of the objects over time. As time evolves, similar objects naturally synchronize together and form distinct clusters. Sync inherits several desirable properties from synchronization: the clusters revealed by dynamic synchronization reflect the intrinsic structure of the data set; Sync does not rely on any distribution assumption; and it allows detecting clusters of arbitrary number, shape and size. Moreover, the concept of synchronization allows natural outlier handling, since outliers do not synchronize with cluster objects. For fully automatic clustering, we propose to combine Sync with the Minimum Description Length principle. Extensive experiments on synthetic and real-world data demonstrate the effectiveness and efficiency of our approach.
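
The abstract describes the idea only in words: each object acts as a phase oscillator, and similar objects drift together over time. The Python sketch below illustrates that idea under assumptions that are not stated on this page: objects interact only within a fixed radius `eps`, each coordinate is pulled toward its neighbors by a Kuramoto-style sine coupling, and objects that end at nearly the same position are reported as one cluster. The names `sync_step`, `sync_cluster`, `eps`, `tol`, and `merge_tol` are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np


def sync_step(points, eps):
    """One synchronous update (assumed rule): sine coupling within radius eps."""
    # diffs[i, j] = points[j] - points[i]
    diffs = points[None, :, :] - points[:, None, :]
    dists = np.linalg.norm(diffs, axis=2)
    neighbors = dists <= eps                       # each point is its own neighbor
    counts = neighbors.sum(axis=1, keepdims=True)  # therefore always >= 1
    # Sum the sine of coordinate differences over neighbors only.
    pull = (np.sin(diffs) * neighbors[:, :, None]).sum(axis=1)
    return points + pull / counts


def sync_cluster(points, eps, max_iter=50, tol=1e-6, merge_tol=1e-3):
    """Iterate until positions stop moving, then group coinciding points."""
    x = np.asarray(points, dtype=float).copy()
    for _ in range(max_iter):
        new_x = sync_step(x, eps)
        moved = np.max(np.abs(new_x - x))
        x = new_x
        if moved < tol:
            break
    # Assumed grouping rule: points closer than merge_tol share a label.
    labels = np.full(len(x), -1)
    next_label = 0
    for i in range(len(x)):
        if labels[i] != -1:
            continue
        close = np.linalg.norm(x - x[i], axis=1) < merge_tol
        labels[close & (labels == -1)] = next_label
        next_label += 1
    return labels, x


if __name__ == "__main__":
    data = np.array([
        [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # first tight group
        [2.0, 2.0], [2.1, 2.0], [2.0, 2.1],   # second tight group
        [5.0, 5.0],                           # isolated point
    ])
    labels, final_positions = sync_cluster(data, eps=0.5)
    print(labels)
```

In this sketch the isolated point has no neighbors other than itself, so it never moves and ends up alone, which mirrors the abstract's remark that outliers do not synchronize with cluster objects. The radius `eps` is chosen by hand here; the abstract states that the authors combine Sync with the Minimum Description Length principle to make clustering fully automatic, and that step is not sketched.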

Supplemental Material

kdd2010_shao_cs_01.mov (mov, 110.4 MB)

Index Terms

Information systems → Information systems applications → Data mining

Published in
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
July 2010
1240 pages
ISBN: 9781450300551
DOI: 10.1145/1835804
General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)
Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
kuramoto model
synchronization
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
- Total Citations: 54
- Total Downloads: 1,238
- Downloads (Last 12 months): 38
- Downloads (Last 6 weeks): 3