research-article

Scalable multipartite subgraph enumeration for integrative analysis of heterogeneous experimental functional genomics data

Authors:
Charles A. Phillips, University of Tennessee, Knoxville, TN
Kai Wang, University of Tennessee, Knoxville, TN
Jason Bubier, The Jackson Laboratory, Bar Harbor, ME
Erich J. Baker, Baylor University, Waco, TX
Elissa J. Chesler, The Jackson Laboratory, Bar Harbor, ME
Michael A. Langston, University of Tennessee, Knoxville, TN

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, September 2015, Pages 626–633, https://doi.org/10.1145/2808719.2812595

Published: 09 September 2015

ABSTRACT

Functional genomics, the effort to understand the role of genomic elements in biological processes, has led to an avalanche of diverse experimental and semantic information defining associations between genes and various biological concepts across species and experimental paradigms. Integrating this rapidly expanding wealth of heterogeneous data, and finding consensus among so many diverse sources for specific research questions, require highly sophisticated big data structures and algorithms for harmonization and scalable analysis. In this context, multipartite graphs can often serve as useful structures for representing questions about the role of genes in multiple, frequently occurring disease processes. The main focus of this paper is on finding and analyzing efficient algorithms for dense subgraph enumeration in such graphs. An O(3^(n/3))-time procedure is devised to enumerate all maximal k-partite cliques in a k-partite graph, where k ≥ 3. The maximum number of such cliques is also shown to obey this bound, and thus the procedure achieves the best possible asymptotic performance. Empirical testing on both real and synthetic data is conducted. Concrete applications to biological data are described, as are scalability issues in the context of big data analysis.
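
As an illustration of the enumeration task described in the abstract, the following Python sketch lists the maximal k-partite cliques of a small graph. It is not the procedure from the paper. It assumes that a k-partite clique takes at least one vertex from every partite set, with every pair of vertices from different parts adjacent. Under that assumption, vertices in the same part can be treated as mutually compatible, and the classic Bron–Kerbosch search with pivoting can be run over the resulting compatibility relation, keeping only results that touch all k parts. The function name, input format, and example graph are hypothetical.

```python
def maximal_kpartite_cliques(parts, edges):
    """Enumerate maximal k-partite cliques (illustrative sketch only).

    parts: list of k disjoint sets of vertices (the partite sets).
    edges: iterable of (u, v) pairs joining vertices in different parts.
    Assumption: a k-partite clique uses at least one vertex from each part.
    """
    part_of = {v: i for i, p in enumerate(parts) for v in p}
    vertices = set(part_of)
    k = len(parts)

    # Compatibility: same part (never adjacent, but allowed together)
    # or different parts joined by an edge.
    compat = {v: set(parts[part_of[v]]) - {v} for v in vertices}
    for u, v in edges:
        if part_of[u] != part_of[v]:
            compat[u].add(v)
            compat[v].add(u)

    found = []

    def expand(r, p, x):
        # r: current set, p: candidates, x: already-processed vertices
        if not p and not x:
            # r is maximal; keep it only if it touches every part.
            if len({part_of[v] for v in r}) == k:
                found.append(frozenset(r))
            return
        # Pivot on the vertex with the most compatible candidates.
        pivot = max(p | x, key=lambda v: len(compat[v] & p))
        for v in list(p - compat[pivot]):
            expand(r | {v}, p & compat[v], x & compat[v])
            p = p - {v}
            x = x | {v}

    expand(set(), vertices, set())
    return found


# Hypothetical tripartite example: a2 is adjacent to b1 but not to c1.
parts = [{"a1", "a2"}, {"b1"}, {"c1"}]
edges = [("a1", "b1"), ("a1", "c1"), ("b1", "c1"), ("a2", "b1")]
cliques = maximal_kpartite_cliques(parts, edges)
print([sorted(c) for c in cliques])  # [['a1', 'b1', 'c1']]
```

The set {a1, a2, b1} is maximal under the compatibility relation but misses the third part, so the filter discards it. The pivoting step is the standard device behind the O(3^(n/3)) worst-case bound for maximal clique enumeration; whether this sketch matches the paper's procedure in its details is not claimed here.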

Index Terms

Scalable multipartite subgraph enumeration for integrative analysis of heterogeneous experimental functional genomics data
1. Applied computing
  1. Life and medical sciences
  2. Physical sciences and engineering
    1. Mathematics and statistics
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms


Published in

BCB '15: Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics
September 2015
683 pages
ISBN:9781450338530
DOI:10.1145/2808719

Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
big data analytics
dense subgraph enumeration
life science applications
multipartite graphs
Acceptance Rates
BCB '15 Paper Acceptance Rate: 48 of 141 submissions, 34%
Overall Acceptance Rate: 254 of 885 submissions, 29%