DOI: 10.1145/2808719.2812595
research-article

Scalable multipartite subgraph enumeration for integrative analysis of heterogeneous experimental functional genomics data

Published: 09 September 2015

ABSTRACT

Functional genomics, the effort to understand the role of genomic elements in biological processes, has led to an avalanche of diverse experimental and semantic information defining associations between genes and various biological concepts across species and experimental paradigms. Integrating this rapidly expanding wealth of heterogeneous data, and finding consensus among so many diverse sources for specific research questions, require highly sophisticated big data structures and algorithms for harmonization and scalable analysis. In this context, multipartite graphs can often serve as useful structures for representing questions about the role of genes in multiple, frequently occurring disease processes. The main focus of this paper is on finding and analyzing efficient algorithms for dense subgraph enumeration in such graphs. An $O(3^{n/3})$-time procedure was devised to enumerate all maximal k-partite cliques in a k-partite graph, where k ≥ 3. The maximum number of such cliques is also shown to obey this bound, and thus this procedure obtains the best possible asymptotic performance. Empirical testing on both real and synthetic data is conducted. Concrete applications to biological data are described, as are scalability issues in the context of big data analysis.
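
For orientation, the bound quoted in the abstract has the same form as the classical bound for ordinary maximal cliques in general graphs: an n-vertex graph has at most 3^(n/3) maximal cliques, and when n is divisible by 3 this is attained by the complete multipartite graph with n/3 parts of three vertices each. The Python sketch below is not the paper's procedure for k-partite cliques. It is a plain Bron-Kerbosch enumeration with pivoting, used here only to count maximal cliques on that extremal graph. The function names `moon_moser_graph` and `bron_kerbosch` are illustrative and do not come from the paper.

```python
from itertools import combinations


def moon_moser_graph(parts):
    """Complete multipartite graph with `parts` parts of 3 vertices each.

    Vertices are 0..3*parts-1; vertex v belongs to part v // 3.
    Returns an adjacency dict mapping each vertex to its neighbor set.
    """
    n = 3 * parts
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if u // 3 != v // 3:  # adjacent iff in different parts
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bron_kerbosch(adj, r=frozenset(), p=None, x=None):
    """Yield every maximal clique of `adj` (Bron-Kerbosch with pivoting).

    r: current clique, p: candidates that extend r, x: excluded vertices.
    """
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield r  # nothing can extend r, so it is maximal
        return
    # Pivot on the vertex with the most neighbors among the candidates.
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)


if __name__ == "__main__":
    for parts in (1, 2, 3, 4):
        n = 3 * parts
        count = sum(1 for _ in bron_kerbosch(moon_moser_graph(parts)))
        print(n, count, 3 ** (n // 3))
```

In each printed row the enumerated count should equal 3^(n/3), which is why no enumeration procedure for general graphs can run asymptotically faster than this bound in the worst case: the output alone can be that large.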

              Published in

              BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
              September 2015
              683 pages
              ISBN: 9781450338530
              DOI: 10.1145/2808719

              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              BCB '15 Paper Acceptance Rate: 48 of 141 submissions, 34%
              Overall Acceptance Rate: 254 of 885 submissions, 29%
