Summarization-based mining bipartite graphs

Authors:
Jing Feng, University of Munich, Munich, Germany
Xiao He, University of Munich, Munich, Germany
Bettina Konte, University of Munich, Munich, Germany
Christian Böhm, University of Munich, Munich, Germany
Claudia Plant, Florida State University, Tallahassee, FL, USA

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2012, Pages 1249–1257
https://doi.org/10.1145/2339530.2339725

Published: 12 August 2012

ABSTRACT

How can the truly relevant information be extracted from a large relational data set? This paper's answer is a technique that integrates graph summarization, graph clustering, link prediction and the discovery of hidden structure on the basis of data compression. Our novel algorithm SCMiner (for Summarization-Compression Miner) reduces a large bipartite input graph to a highly compact representation that is useful for several data mining tasks: 1) Clustering: The compact summary graph contains the truly relevant clusters of both types of nodes of a bipartite graph. 2) Link prediction: The compression scheme of SCMiner reveals suspicious edges, which are probably erroneous, as well as missing edges, i.e., pairs of nodes that should be connected by an edge. 3) Discovery of the hidden structure: Unlike traditional co-clustering methods, the result of SCMiner is not limited to row- and column-clusters. Besides the clusters, the summary graph also contains the essential relationships between both types of clusters and thus reveals the hidden structure of the data. Extensive experiments on synthetic and real data demonstrate that SCMiner outperforms state-of-the-art techniques for clustering and link prediction. Moreover, SCMiner discovers the hidden structure and reports it to the user in an interpretable way. Because it is based on data compression, our technique does not rely on any input parameters that are difficult to estimate.
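
To make the three outputs named in the abstract concrete, the following is a minimal Python sketch of how a summary graph relates to missing and suspicious edges. It is not SCMiner itself: the abstract does not specify the coding scheme, and SCMiner finds the clusters on its own by compression. Here the cluster labels are given as input, and a simple majority rule (an assumption) stands in for the compression-based decision. The function name `summarize` and all node and cluster names are hypothetical.

```python
from itertools import product


def summarize(edges, row_cluster, col_cluster):
    """Build a summary graph plus edge corrections from fixed cluster labels.

    edges: set of (u, v) pairs, u from the first node type, v from the second.
    row_cluster, col_cluster: dicts mapping each node to a cluster label.
    """
    # Group the nodes of each type by their cluster label.
    rows, cols = {}, {}
    for u, c in row_cluster.items():
        rows.setdefault(c, []).append(u)
    for v, c in col_cluster.items():
        cols.setdefault(c, []).append(v)

    super_edges, missing, suspicious = set(), set(), set()
    for (rc, us), (cc, vs) in product(rows.items(), cols.items()):
        block = set(product(us, vs))   # all possible edges between the two clusters
        present = block & edges        # edges that actually exist in the input
        # Majority rule (assumption, not the paper's coding cost): link the two
        # clusters when more than half of the possible edges are present.
        if 2 * len(present) > len(block):
            super_edges.add((rc, cc))
            missing |= block - present   # edges the summary implies but the data lacks
        else:
            suspicious |= present        # edges the summary does not explain
    return super_edges, missing, suspicious


# Toy input: two row clusters (A, B) and two column clusters (X, Y).
edges = {("a1", "x1"), ("a1", "x2"), ("a2", "x1"), ("b1", "y1"), ("b1", "x1")}
row_cluster = {"a1": "A", "a2": "A", "b1": "B"}
col_cluster = {"x1": "X", "x2": "X", "y1": "Y"}

super_edges, missing, suspicious = summarize(edges, row_cluster, col_cluster)
```

In this toy input, the summary graph links cluster A to X and B to Y, the pair (a2, x2) is reported as a missing edge, and (b1, x1) as a suspicious one. A compression-based method would instead choose the clustering and the super-edges that minimize the total description length of summary plus corrections.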

Supplemental Material

307_w_talk_4.mp4 (MP4, 292.8 MB)

Index Terms

Information systems > Information systems applications > Data mining

Published in
KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012, 1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530
General Chair: Qiang Yang (Hong Kong University of Science and Technology)
Program Chairs: Deepak Agarwal (LinkedIn), Jian Pei (Simon Fraser University)
Copyright © 2012 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
bipartite graph
clustering
link prediction
summarization
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%