Summarization-based mining bipartite graphs

Research article, KDD '12 Conference Proceedings
DOI: 10.1145/2339530.2339725
Published: 12 August 2012

ABSTRACT

How can the truly relevant information be extracted from a large relational data set? This paper answers with a technique that integrates graph summarization, graph clustering, link prediction, and the discovery of hidden structure on the basis of data compression. Our novel algorithm SCMiner (Summarization-Compression Miner) reduces a large bipartite input graph to a highly compact representation that is useful for several data mining tasks: 1) Clustering: the compact summary graph contains the truly relevant clusters of both types of nodes of a bipartite graph. 2) Link prediction: the compression scheme of SCMiner reveals suspicious edges, which are probably erroneous, as well as missing edges, i.e., pairs of nodes that should be connected by an edge. 3) Discovery of the hidden structure: unlike traditional co-clustering methods, SCMiner's result is not limited to row and column clusters. Besides the clusters, the summary graph also contains the essential relationships between both types of clusters and thus reveals the hidden structure of the data. Extensive experiments on synthetic and real data demonstrate that SCMiner outperforms state-of-the-art techniques for clustering and link prediction. Moreover, SCMiner discovers the hidden structure and reports it to the user in an interpretable way. Because it is based on data compression, SCMiner does not rely on any input parameters that are difficult to estimate.
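To make the idea of compressing a bipartite graph into a summary graph more concrete, the following Python sketch shows one simplified way a summary with corrections can expose both missing and suspicious edges. It is not SCMiner's actual coding scheme or search procedure. It assumes the row and column clusters are already given (SCMiner itself searches for them via compression), and the helper names `summarize` and `description_length`, the majority rule, and the bit costs are hypothetical choices made only for illustration.

```python
import math
from itertools import product

Edge = tuple[str, str]  # (row node, column node)


def summarize(edges: set[Edge],
              row_clusters: dict[str, list[str]],
              col_clusters: dict[str, list[str]]
              ) -> tuple[set[tuple[str, str]], set[Edge], set[Edge]]:
    """Hypothetical helper: build a summary graph from a GIVEN co-clustering.

    For each (row cluster, column cluster) block, keeping a super edge costs one
    correction per absent cell, and dropping it costs one per present cell, so
    the majority choice needs fewer corrections for that block.
    """
    super_edges: set[tuple[str, str]] = set()
    missing: set[Edge] = set()     # absent cells in dense blocks: candidate new links
    suspicious: set[Edge] = set()  # present cells in sparse blocks: possibly erroneous
    for (a, rows), (b, cols) in product(row_clusters.items(), col_clusters.items()):
        block = [(u, v) for u in rows for v in cols]
        present = [cell for cell in block if cell in edges]
        if 2 * len(present) > len(block):
            super_edges.add((a, b))
            missing.update(cell for cell in block if cell not in edges)
        else:
            suspicious.update(present)
    return super_edges, missing, suspicious


def description_length(n_rows: int, n_cols: int, k_rows: int, k_cols: int,
                       n_super_edges: int, n_corrections: int) -> float:
    """Toy code length in bits (an assumption, not SCMiner's coding scheme)."""
    cell_bits = math.log2(n_rows * n_cols)  # index of one adjacency-matrix cell
    assign_bits = n_rows * math.log2(k_rows) + n_cols * math.log2(k_cols)  # cluster labels
    super_bits = n_super_edges * math.log2(k_rows * k_cols)  # one block index per super edge
    return assign_bits + super_bits + n_corrections * cell_bits


# Toy input: one dense block with a hole, one singleton block, one noisy edge.
rows = {"A": ["r1", "r2", "r3"], "B": ["r4"]}
cols = {"X": ["c1", "c2", "c3"], "Y": ["c4"]}
edges = {(u, v) for u in rows["A"] for v in cols["X"]} - {("r2", "c2")}
edges |= {("r4", "c4"), ("r1", "c4")}

sup, miss, susp = summarize(edges, rows, cols)
print(sorted(sup))   # [('A', 'X'), ('B', 'Y')]
print(miss, susp)    # {('r2', 'c2')} {('r1', 'c4')}

summary = description_length(4, 4, 2, 2, len(sup), len(miss) + len(susp))
raw = len(edges) * math.log2(4 * 4)  # list every edge by its cell index
print(summary, raw)  # 20.0 40.0
```

In this toy example the summary costs 20 bits versus 40 bits for listing the ten edges directly, which is the kind of comparison a compression-based method can use to prefer one summary over another. The absent cell (r2, c2) inside the dense block surfaces as a link candidate, and the isolated edge (r1, c4) is flagged as possibly erroneous.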

Supplemental Material

307_w_talk_4.mp4 (MP4, 292.8 MB)

Published in

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530

      Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%