High-Efficiency Triangle Counting on the GPU.

We implement exact triangle counting in graphs on the GPU using three different methodologies: subgraph matching to a triangle pattern; programmable graph analytics, with a set-intersection approach; and ... efficient filtering steps to remove unnecessary work and its high-performance set-intersection core. ... We appreciate the funding support of DARPA XDATA under grants US Army award W911QX-12-C-0059, DARPA STTR awards D14PC00023 and D15PC00010 as well as the National Science Foundation under grants CCF-1017399 ...

doi:10.1145/2915516.2915521 dblp:conf/hpdc/WangWYO16 fatcat:tqpllepszvgujet2mfgirfrfwa

Second, we propose a novel GPU out-of-core approach that adopts a frame-to-frame coherence scheme in order to minimize the high communication cost between CPU and GPU. ... First, we present a simple and efficient mesh simplification algorithm towards GPU architecture. ... We also thank Dave Kasik of Boeing for providing the 3D model of Boeing 777 airplane. ...

doi:10.1111/j.1467-8659.2012.03018.x fatcat:wyho23qmznbfpnqpv5pkgckdtu

In this chapter, we present improved GPU programming techniques for implementing the algorithm more efficiently on current GPUs. ... The kd-tree is one of the most commonly used spatial data structures for a variety of graphics applications because of its reliably high acceleration performance. ... For efficient parallel implementation on a GPU, all triangles in each large node are grouped into chunks of fixed size (i.e., 256), parallelizing the computation over the triangles in the chunks. ...

doi:10.1007/978-981-287-134-3_13 fatcat:lk4bk4j66rfudo2yydmhry2zue

In this paper, we propose a novel method to compute triangle counting on GPUs. ... This work specifically focuses on leveraging our implementation on the triangle counting problem for the Subgraph Isomorphism Graph Challenge 2019, demonstrating a geometric mean speedup over the 2018 ... The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. ...

doi:10.1109/hpec.2019.8916434 dblp:conf/hpec/WangO19 fatcat:5t2ziaqdkrexbkouaaylozk4jy

By using frame-to-frame coherence, the overhead of data transferring is significantly reduced on each GPU. ... However, the low bandwidth in CPU-GPU communication is still the major bottleneck that prevents users from achieving high-performance rendering of massive 3D models on a single-GPU system. ... Here, we use either the vertex count or the triangle count to represent the complexity (the level of details) of each object. ...

doi:10.1109/sc.companion.2012.37 dblp:conf/sc/PengMC12 fatcat:dqmffkfwvjh3lgcvuwxnoudrr4

We propose a technique that, with a modest amount of preprocessing, efficiently distributes isosurfacing load to GPU compute resources within a cluster. ... Increasing sizes of volumes over which isosurfacing is to be applied combined with increasingly hierarchical parallel architectures present challenges for efficiently distributing isosurfacing work loads ... The triangle counting algorithm is local to each GPU, with one GPU operating on one block at a time. ...

doi:10.2312/egpgv/egpgv10/091-100 fatcat:bhpoywqtyvdmjcnshfzethss2e

Our experimental results for the GPU implementation show at least 10 times speedup for triangle counting over the CPU counterpart. ... Given a graph G = (V, E), we provide an algorithm to count the number of triangles in G, while storing the adjacency information on the global memory. ... Therefore, the triangle counting problem achieves high speed-up from being solved on the GPU as compared to the CPU, and additionally the efficiency of the naïve implementation is further improved by using ...

doi:10.1109/ipdpsw.2013.235 dblp:conf/ipps/ChatterjeeRA13 fatcat:sss4acsdbzet7ox7sm5btvvbfy

We explore the problem of real time ray casting of large deformable models (over a million triangles) on large displays (a million pixels) on an off-the-shelf GPU in this paper. ... The GPUs pack high computation power and a restricted architecture into easily available hardware today. ... We describe how such a data structure can be built efficiently on the SIMD architecture of the GPU. ...

doi:10.1109/icvgip.2008.92 dblp:conf/icvgip/PatidarN08 fatcat:euezxgtfk5gepo45vf7m2owhce

Triangle counting is a fundamental building block in graph algorithms. ... The current state-of-the-art in triangle enumeration processes the Friendster graph in 2.1 seconds, not including data copy time between CPU and GPU. ... ., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. ...

arXiv:2009.12457v1 fatcat:ttilsz73kjhq5l6afslof753eu

G2Miner uses pattern-aware, input-aware and architecture-aware search strategies to achieve high efficiency on GPUs. ... We also show that G2Miner on a V100 GPU is 48.3x and 15.2x faster than the state-of-the-art CPU-based system, Peregrine and GraphZero, on a 56-core CPU machine. ... Triangle counting on Tw4. Listing 4-cycle on Fr. 3-motif counting on Tw2. ...

arXiv:2112.09761v3 fatcat:pd3dklsyircptoj36vthvou22i

In this paper, we propose a novel method to compute triangle counting on GPUs. ... This work specifically focuses on leveraging our implementation on the triangle counting problem for the Subgraph Isomorphism Graph Challenge 2019, demonstrating a geometric mean speedup over the 2018 ... We believe our optimizations are not limited to only triangle counting on the GPU. ...

arXiv:1909.02127v1 fatcat:idmflbcqhbe3pia7rp6kpsba6u

Abstract: We describe CPU and GPU implementations of parallel triangle-counting and k-truss identification in the Galois and IrGL systems. ... Both systems are based on a graph-centric abstraction called the operator formulation of algorithms. ... INTRODUCTION This paper describes high-performance CPU and GPU implementations of triangle counting and k-truss identification in graphs. ...

doi:10.1109/hpec.2017.8091037 dblp:conf/hpec/VoegeleLPP17 fatcat:4nkrif36xjaobajmkecsp2dq64

On the contrary, we advocate that i) hashing can help the key operations for scalable triangle counting on Graphics Processing Units (GPUs), i.e., list intersection and graph partitioning, ii) vertex-centric ... To the best of our knowledge, TRUST is the first work that achieves over one trillion Traversed Edges Per Second (TEPS) rate for triangle counting. ... ACKNOWLEDGEMENT We thank the anonymous reviewers for their helpful suggestions and feedback. This research is supported in part by the National Science Foundation CRII award No. ...

doi:10.1109/tpds.2021.3064892 fatcat:dzueeea3g5f47gty4vsbr7pkfi

We also demonstrate that the compact mesh construction scheme can easily be modified for also producing a time- and space-efficient GPU implementation of the marching cubes algorithm. ... Unlike previous approaches, the presented method extracts a smooth silhouette contour on the fly from each binary image, which markedly reduces the bumpy artifacts on the visual hull surface due to a simple ... To the best of our knowledge, our computation scheme is the first parallel algorithm that, fully run on the GPU, generates smooth high-resolution visual hull meshes in compact form, based on a refined ...

doi:10.1007/s00371-013-0796-2 fatcat:bnw2qbwbofcojenw5lbvvsfdau

In this work, we propose a parallel load balancing algorithm based on a screen partitioning strategy to dynamically balance the amount of vertices and triangles rendered by each GPU. ... Each GPU renders a screen region whose size may be different from the screen regions of other GPUs, but the amounts of vertices and triangles in those screen regions are balanced. ... We thank Nvidia for donating the GPU device that has been used in this work to run our approach and produce experimental results. ...

doi:10.2312/pgv.20191111 dblp:conf/egpgv/DongP19 fatcat:qr3bqcgzvzcbbj2wm4y4ckmkuu

A Comparative Study on Exact Triangle Counting Algorithms on the GPU

A GPU-based Approach for Massive Model Rendering with Frame-to-Frame Coherence

On the Efficient Implementation of a Real-Time Kd-Tree Construction Algorithm [chapter]

Fast BFS-Based Triangle Counting on GPUs

Load Balanced Parallel GPU Out-of-Core for Continuous LOD Model Visualization

Load-Balanced Isosurfacing on Multi-GPU Clusters [article]

On Analyzing Large Graphs Using GPUs

Ray Casting Deformable Models on the GPU

A Block-Based Triangle Counting Algorithm on Heterogeneous Environments [article]

Efficient and Scalable Graph Pattern Mining on GPUs [article]

Fast BFS-Based Triangle Counting on GPUs [article]

Parallel triangle counting and k-truss identification using graph-centric methods

TRUST: Triangle Counting Reloaded on GPUs

GPU-based parallel construction of compact visual hull meshes

Screen Partitioning Load Balancing for Parallel Rendering on a Multi-GPU Multi-Display Workstation
