A Hybrid Approach for Parallel Transistor-Level Full-Chip Circuit Simulation
[chapter]
2015
Lecture Notes in Computer Science
Hybrid versions of two iterative linear solver strategies are presented: one takes advantage of block triangular form structure, while the other uses a Schur complement technique. ...
Results indicate up to a 27x improvement in total simulation time on 256 cores. ...
The triangular solve in this particular case has multiple right-hand sides where the right-hand sides are themselves sparse columns of C. ...
doi:10.1007/978-3-319-17353-5_9
fatcat:5jrocvi2b5cltcfgjwkc6gs6by
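The snippet above mentions a triangular solve whose right-hand sides are sparse columns of a matrix C. Here is a minimal sketch of that operation. It assumes L is lower triangular in CSR form (`indptr`, `indices`, `data`, with the diagonal stored in each row). Each column of C is represented as a hypothetical `{row: value}` dict, and the sketch solves one column at a time by forward substitution. It is not the paper's solver: it scatters each sparse column into a dense vector, whereas a tuned code would skip rows that the column's nonzeros cannot reach.

```python
from typing import Dict, List


def lower_solve_csr(n: int, indptr: List[int], indices: List[int],
                    data: List[float], b: List[float]) -> List[float]:
    """Forward substitution for L x = b, with L lower triangular in CSR."""
    x = list(b)
    for i in range(n):
        s = x[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                s -= data[k] * x[j]  # subtract contributions of solved unknowns
            elif j == i:
                diag = data[k]
        if diag is None:
            raise ZeroDivisionError(f"row {i} has no stored diagonal")
        x[i] = s / diag
    return x


def lower_solve_multi(n: int, indptr: List[int], indices: List[int],
                      data: List[float],
                      rhs_columns: List[Dict[int, float]]) -> List[List[float]]:
    """Solve L X = C column by column; each column of C is a sparse dict."""
    solutions = []
    for col in rhs_columns:
        b = [0.0] * n
        for row, val in col.items():
            b[row] = val  # scatter the sparse column into a dense vector
        solutions.append(lower_solve_csr(n, indptr, indices, data, b))
    return solutions


# L = [[2, 0, 0], [1, 4, 0], [0, 3, 5]] in CSR form
X = lower_solve_multi(3, [0, 1, 3, 5], [0, 0, 1, 1, 2],
                      [2.0, 1.0, 4.0, 3.0, 5.0], [{0: 2.0}, {2: 10.0}])
print(X[1])  # [0.0, 0.0, 2.0]
```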
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures
[article]
2020
arXiv
pre-print
This is particularly the case for Sparse Triangular Solver (SpTRSV) which introduces additional two-dimensional computation dependencies among subsequent computation steps. ...
Dependency information is exchanged and shared among GPUs, which calls for efficient memory allocation, data partitioning, and workload distribution, as well as fine-grained communication and synchronization ...
heap and relying on the one-sided communication primitives in NVSHMEM for inter-GPU communication. ...
arXiv:2012.06959v1
fatcat:am7guw7i5fchxafrkp34plwvky
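These snippets describe the dependencies between rows in SpTRSV and the exchange of dependency information among GPUs. Below is a minimal, single-threaded sketch of the dependency-driven idea, assuming CSR storage with the diagonal in each row. Every row keeps a counter of the unsolved rows it depends on and becomes ready when that counter reaches zero. In a multi-GPU code these counter updates would travel between devices (the paper relies on NVSHMEM for inter-GPU communication). Here they are plain list updates, and all names are illustrative only.

```python
from collections import deque
from typing import List


def dependency_driven_solve(n: int, indptr: List[int], indices: List[int],
                            data: List[float], b: List[float]) -> List[float]:
    """Solve L x = b (L lower triangular, CSR) by releasing each row as soon
    as every row it depends on is solved, instead of level by level."""
    dependents = [[] for _ in range(n)]  # column j -> [(row i, L[i][j]), ...]
    remaining = [0] * n                  # unresolved dependencies per row
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                dependents[j].append((i, data[k]))
                remaining[i] += 1
            elif j == i:
                diag[i] = data[k]

    partial = list(b)  # b_i minus the contributions received so far
    x = [0.0] * n
    ready = deque(i for i in range(n) if remaining[i] == 0)
    while ready:
        j = ready.popleft()
        x[j] = partial[j] / diag[j]
        # "Send" x[j] to every row that needs it and decrement its counter
        for i, lij in dependents[j]:
            partial[i] -= lij * x[j]
            remaining[i] -= 1
            if remaining[i] == 0:
                ready.append(i)
    return x


# L = [[2, 0, 0], [1, 4, 0], [0, 3, 5]] in CSR form
print(dependency_driven_solve(3, [0, 1, 3, 5], [0, 0, 1, 1, 2],
                              [2.0, 1.0, 4.0, 3.0, 5.0], [4.0, 6.0, 13.0]))
# [2.0, 1.0, 2.0]
```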
Preparing sparse solvers for exascale computing
2020
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Sparse solvers provide essential functionality for a wide variety of scientific applications. ...
Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. ...
The second technique leverages the one-sided MPI communication functions to implement a synchronization-free task queue, allowing more overlap of communication and computation, leading to an additional 2× ...
doi:10.1098/rsta.2019.0053
pmid:31955673
fatcat:bqw6xqixbrabddmxglmtcbw2wa
Accelerating advanced preconditioning methods on hybrid architectures
2021
CLEI Electronic Journal
In particular, we study ILUPACK, a package for the solution of sparse linear systems via Krylov subspace methods that relies on a modern inverse-based multilevel ILU (incomplete LU) preconditioning technique ...
We present new data-parallel versions of the preconditioner and the most important solvers contained in the package that significantly improve its performance without affecting its accuracy. ...
[34] proposed a new GPU solver for sparse triangular systems, for matrices stored in the CSC format, based on the self-scheduled strategy. ...
doi:10.19153/cleiej.24.1.6
doaj:cf900516b6334e27afbe4102fa203079
fatcat:ohhmcgyrl5hgfaib7hb2rdyhom
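The last snippet mentions a GPU triangular solver for matrices stored in CSC format. Column storage leads naturally to a column-oriented forward substitution: once x_j is known, its contribution is pushed to every row below the diagonal in column j. The sequential sketch below assumes the diagonal is stored in each column. It does not reproduce the self-scheduled GPU strategy.

```python
from typing import List


def lower_solve_csc(n: int, colptr: List[int], rowind: List[int],
                    data: List[float], b: List[float]) -> List[float]:
    """Column-oriented forward substitution for L x = b, L in CSC format."""
    x = list(b)
    for j in range(n):
        start, end = colptr[j], colptr[j + 1]
        for k in range(start, end):
            if rowind[k] == j:
                x[j] /= data[k]  # x[j] is final once divided by L[j][j]
                break
        else:
            raise ZeroDivisionError(f"column {j} has no stored diagonal")
        for k in range(start, end):
            i = rowind[k]
            if i > j:
                x[i] -= data[k] * x[j]  # push the update down column j
    return x


# L = [[2, 0, 0], [1, 4, 0], [0, 3, 5]] in CSC form
print(lower_solve_csc(3, [0, 2, 4, 5], [0, 1, 1, 2, 2],
                      [2.0, 1.0, 4.0, 3.0, 5.0], [4.0, 6.0, 13.0]))
# [2.0, 1.0, 2.0]
```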
Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver
[chapter]
2014
Lecture Notes in Computer Science
Sparse triangular solver is one such kernel and is the focus of this paper. ...
As a result, on a 12-core Intel® Xeon® processor, our approach improves the performance of the sparse triangular solver by 1.6x, compared to the conventional level-scheduling with barrier synchronization ...
Heroux for his insights related to the implementation and performance of conjugate gradient, and Kiran Pamnany for sharing his dissemination barrier implementation.
Bibliography ...
doi:10.1007/978-3-319-07518-1_8
fatcat:z3lzntgb6bh27i3lo2lzmojox4
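The snippet measures performance against conventional level scheduling with barrier synchronization. That baseline can be sketched as follows, assuming CSR storage. Rows are grouped into levels so that each row depends only on rows in earlier levels. Rows within a level are solved concurrently, and all workers wait at a barrier before moving to the next level. Python threads are used only to show the structure, since the GIL prevents real speedup here. The paper's sparsified synchronization is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def level_sets(n: int, indptr: List[int], indices: List[int]) -> List[List[int]]:
    """Group rows of a CSR lower-triangular matrix into dependency levels."""
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # row i must wait for row j
                level[i] = max(level[i], level[j] + 1)
    levels: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def level_scheduled_solve(n: int, indptr: List[int], indices: List[int],
                          data: List[float], b: List[float],
                          workers: int = 4) -> List[float]:
    """Solve L x = b level by level, with a barrier after each level."""
    x = [0.0] * n

    def solve_row(i: int) -> None:
        s, diag = b[i], None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                s -= data[k] * x[j]  # x[j] was finished in an earlier level
            elif j == i:
                diag = data[k]
        x[i] = s / diag

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rows in level_sets(n, indptr, indices):
            # Rows in one level are independent; list() waits for all of
            # them, which plays the role of the barrier between levels.
            list(pool.map(solve_row, rows))
    return x


# L rows: [2,0,0,0], [0,1,0,0], [1,0,1,0], [0,1,2,4]
indptr, indices = [0, 1, 2, 4, 7], [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0]
print(level_sets(4, indptr, indices))  # [[0, 1], [2], [3]]
print(level_scheduled_solve(4, indptr, indices, data, [2.0, 2.0, 4.0, 24.0]))
# [1.0, 2.0, 3.0, 4.0]
```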
Towards a scalable hybrid sparse solver
2000
Concurrency Practice and Experience
When the sparse matrix is symmetric and positive definite, direct methods use Cholesky factorization while iterative methods rely on Conjugate Gradients. ...
Our goal is to develop a scalable and memory-efficient hybrid of the two methods that can be implemented with high efficiency on both serial and parallel computers and be suitable for a wide range of problems ...
The key is to leverage technology that has been developed for sparse direct methods. ...
doi:10.1002/(sici)1096-9128(200002/03)12:2/3<53::aid-cpe473>3.3.co;2-2
fatcat:7bqxv5ygpbbvneb6yfxshyrske
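The snippet contrasts Cholesky-based direct methods with Conjugate Gradients for symmetric positive definite systems. For reference, a plain (unpreconditioned) CG iteration is sketched below. The matrix is stored as a dense list of lists only to keep the sketch short; a sparse code would replace `matvec` with a sparse matrix-vector product. This is the textbook method, not the paper's hybrid.

```python
import math
from typing import List


def conjugate_gradient(A: List[List[float]], b: List[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve A x = b for symmetric positive definite A by Conjugate Gradients."""
    n = len(b)

    def matvec(v: List[float]) -> List[float]:
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u: List[float], v: List[float]) -> float:
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)   # residual b - A x for the initial guess x = 0
    p = list(r)   # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs) <= tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs               # keeps search directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
# approximately [1/11, 7/11]
```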
Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning
2017
2017 46th International Conference on Parallel Processing (ICPP)
sparse linear systems. ...
We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different size, and the subsequent triangular solves. ...
linear solvers" (d65). ...
doi:10.1109/icpp.2017.18
dblp:conf/icpp/AnztDFQ17
fatcat:sryw4eagnnf3zbuzllaxhm3ebm
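The paper factors many small, independent diagonal blocks of different sizes with batched LU kernels and uses the factors as a block-Jacobi preconditioner. The sequential sketch below performs the same computation on a dense-stored matrix. It assumes a hypothetical list `block_sizes` that partitions the rows. Each diagonal block gets an LU factorization with partial pivoting, and applying the preconditioner means solving with every block independently. On a GPU, the loop over blocks is the part a batched kernel would run in parallel. Nothing here reproduces the authors' CUDA kernels.

```python
from typing import List, Tuple

Factor = Tuple[List[List[float]], List[int]]


def lu_factor(block: List[List[float]]) -> Factor:
    """LU with partial pivoting of a small dense block: P A = L U.
    L (unit lower) and U share one array; perm[k] is the original row at k."""
    m = len(block)
    lu = [row[:] for row in block]
    perm = list(range(m))
    for k in range(m):
        p = max(range(k, m), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("singular diagonal block")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, m):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, m):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(factor: Factor, rhs: List[float]) -> List[float]:
    lu, perm = factor
    m = len(lu)
    y = [rhs[perm[i]] for i in range(m)]  # apply the row permutation
    for i in range(m):                    # forward solve with unit L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(m)):          # backward solve with U
        for j in range(i + 1, m):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def block_jacobi_setup(A: List[List[float]],
                       block_sizes: List[int]) -> List[Tuple[int, Factor]]:
    """Factor each diagonal block; the blocks are independent of each other."""
    factors, start = [], 0
    for size in block_sizes:
        block = [A[i][start:start + size] for i in range(start, start + size)]
        factors.append((start, lu_factor(block)))
        start += size
    return factors


def block_jacobi_apply(factors: List[Tuple[int, Factor]],
                       r: List[float]) -> List[float]:
    """z = M^{-1} r, where M keeps only the diagonal blocks of A."""
    z = [0.0] * len(r)
    for start, factor in factors:
        m = len(factor[0])
        z[start:start + m] = lu_solve(factor, r[start:start + m])
    return z


A = [[0.0, 2.0, 9.0],
     [1.0, 1.0, 9.0],
     [9.0, 9.0, 5.0]]
M = block_jacobi_setup(A, [2, 1])  # a 2x2 block (needs pivoting) and a 1x1 block
print(block_jacobi_apply(M, [4.0, 3.0, 10.0]))  # [1.0, 2.0, 2.0]
```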
Towards a scalable hybrid sparse solver
2000
Concurrency Practice and Experience
When the sparse matrix is symmetric and positive definite, direct methods use Cholesky factorization while iterative methods rely on Conjugate Gradients. ...
Our goal is to develop a scalable and memory-efficient hybrid of the two methods that can be implemented with high efficiency on both serial and parallel computers and be suitable for a wide range of problems ...
The key is to leverage technology that has been developed for sparse direct methods. ...
doi:10.1002/(sici)1096-9128(200002/03)12:2/3<53::aid-cpe473>3.0.co;2-b
fatcat:kv7m4mb5lnh4jgigz6ygpmrkqy
Sparse matrix factorization in the implicit finite element method on petascale architecture
2016
Computer Methods in Applied Mechanics and Engineering
The performance of the massively parallel direct multifrontal solver Watson Sparse Matrix Package (WSMP) for solving large sparse systems of linear equations arising in implicit finite element method on ...
unstructured (free) meshes in solid mechanics was evaluated on one of the most powerful supercomputers currently available to the open science community, the sustained petascale high performance computing ...
Acknowledgments: The authors would like to thank the Private Sector Program and the Blue Waters sustained-petascale computing project at the National Center for Supercomputing Applications (NCSA). ...
doi:10.1016/j.cma.2016.01.011
fatcat:7fvypjoinrcmnigv3hldkqfcxe
Communication in task-parallel ILU-preconditioned CG solvers using MPI + OmpSs
2017
Concurrency and Computation
For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 16 nodes, with 16 cores each. ...
We target the parallel solution of sparse linear systems via iterative Krylov subspace-based methods enhanced with ILU-type preconditioners on clusters of multicore processors. ...
On the positive side, it diminishes the amount of messages (though not the total volume of communication), and it does not change the numerical properties of the solver. ...
doi:10.1002/cpe.4280
fatcat:m7hsdjutjnhfhmxxl4q5izo6ka
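The snippets concern Krylov solvers with ILU-type preconditioners. The sketch below shows the simplest member of that family, ILU(0), on a small dense-stored matrix. Gaussian elimination is performed, but any update that would create fill outside A's original nonzero pattern is dropped. Applying the preconditioner is a forward solve with the unit lower factor followed by a backward solve with the upper factor. The sketch says nothing about the task-parallel MPI + OmpSs communication the paper studies.

```python
from typing import List


def ilu0(A: List[List[float]]) -> List[List[float]]:
    """ILU(0): elimination restricted to the nonzero pattern of A.
    Returns L (unit lower, strictly below the diagonal) and U in one array."""
    n = len(A)
    LU = [row[:] for row in A]
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                if pattern[i][j]:  # fill outside the pattern is dropped
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def ilu0_apply(LU: List[List[float]], r: List[float]) -> List[float]:
    """z = (L U)^{-1} r via one forward and one backward triangular solve."""
    n = len(LU)
    z = list(r)
    for i in range(n):
        for j in range(i):
            z[i] -= LU[i][j] * z[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            z[i] -= LU[i][j] * z[j]
        z[i] /= LU[i][i]
    return z


A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
LU = ilu0(A)
print(LU[1][2], LU[2][1])  # 0.0 0.0 (an exact LU would fill these positions)
```

For a tridiagonal matrix no fill occurs, so ILU(0) coincides with the exact LU factorization and applying it solves the system exactly (up to rounding).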
Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction
[chapter]
2014
Lecture Notes in Computer Science
Our experiments with an nVidia S2070 GPU report speed-ups up to 6× for the hybrid band solver based on the LU factorization over analogous CPU-only routines in Intel's MKL. ...
As a practical demonstration of these benefits, we plug the new CPU-GPU codes into a sparse matrix Lyapunov equation solver, showing a 3× acceleration on the solution of a large-scale benchmark arising ...
The advantages of the hybrid band routines carry over to sparse Lyapunov equation solvers, with an acceleration factor of around 2-3× with respect to the analogous solver based on MKL. ...
doi:10.1007/978-3-319-09153-2_29
fatcat:vf37puekijcphjxs6vqmmw7wim
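The chapter accelerates band LU factorization. The sketch below shows why band structure pays off. It assumes a diagonally dominant matrix, so pivoting can be skipped, with lower bandwidth `kl` and upper bandwidth `ku`. Elimination then touches only entries inside the band, so the work is O(n·kl·ku) rather than O(n³). LAPACK's band LU routines (as provided by MKL) do pivot, which widens U to `kl + ku` superdiagonals. For readability, the matrix is also kept dense here rather than in packed band storage.

```python
from typing import List


def band_lu_solve(A: List[List[float]], kl: int, ku: int,
                  b: List[float]) -> List[float]:
    """Solve A x = b by LU without pivoting, touching only the band of A.
    kl / ku are the numbers of sub- and superdiagonals."""
    n = len(A)
    a = [row[:] for row in A]
    x = list(b)
    for k in range(n):
        if a[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, min(k + kl + 1, n)):      # rows inside the band
            a[i][k] /= a[k][k]                          # multiplier L[i][k]
            for j in range(k + 1, min(k + ku + 1, n)):  # columns inside the band
                a[i][j] -= a[i][k] * a[k][j]
            x[i] -= a[i][k] * x[k]  # forward substitution fused with elimination
    for i in reversed(range(n)):    # back substitution with banded U
        for j in range(i + 1, min(i + ku + 1, n)):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x


A = [[5.0, 1.0, 0.0, 0.0],
     [2.0, 5.0, 1.0, 0.0],
     [1.0, 2.0, 5.0, 1.0],
     [0.0, 1.0, 2.0, 5.0]]
print(band_lu_solve(A, kl=2, ku=1, b=[6.0, 8.0, 9.0, 8.0]))  # close to [1, 1, 1, 1]
```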
An Asynchronous Task-based Fan-Both Sparse Cholesky Solver
[article]
2016
arXiv
pre-print
In this paper, we investigate the use of an asynchronous task paradigm, one-sided communication and dynamic scheduling in implementing sparse Cholesky factorization (symPACK) on large-scale distributed ...
Our solver symPACK relies on efficient and flexible communication primitives provided by the UPC++ library. ...
Another very important characteristic of communication protocols is whether a communication primitive is two-sided or one-sided. ...
arXiv:1608.00044v2
fatcat:wfxqlgser5e2rmgwxjcum23m6i
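symPACK's fan-both scheme concerns how the column updates of sparse Cholesky are distributed and communicated across processes. The arithmetic being distributed is the ordinary Cholesky factorization A = L L^T. For reference, here is a sequential, dense, left-looking version, in which each column is computed from the columns to its left. It is not symPACK's algorithm and ignores sparsity entirely.

```python
import math
from typing import List


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Left-looking dense Cholesky: returns lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares of row j's already-computed entries
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal use columns 0..j-1 (the "left" columns)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


print(cholesky([[4.0, 2.0], [2.0, 3.0]]))  # [[2.0, 0.0], [1.0, 1.414...]]
```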
Sympiler
2017
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17
The computation patterns in sparse numerical methods are guided by the input sparsity structure and the sparse algorithm itself. ...
Sympiler is a domain-specific code generator that optimizes sparse matrix computations by decoupling the symbolic analysis phase from the numerical manipulation stage in sparse codes. ...
Motivating Scenario: Sparse triangular solve takes a lower triangular matrix L and a right-hand side (RHS) vector b and solves the linear equation Lx = b for x. ...
doi:10.1145/3126908.3126936
dblp:conf/sc/CheshmiKSD17
fatcat:joe4jxi2lraelbjwo65l3sarpa
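The Sympiler snippets describe splitting sparse codes into a symbolic phase, which depends only on the sparsity structure, and a numeric phase. For Lx = b with a sparse b, a classic symbolic step is the Gilbert-Peierls reach set. A depth-first search over the column graph of L finds exactly which columns can take part in the solve, in a valid order, and the numeric phase then visits only those columns. The sketch assumes L is stored in CSC with the diagonal present and that b is a hypothetical `{row: value}` dict. It illustrates the decoupling and is not code generated by Sympiler.

```python
from typing import Dict, List


def reach_set(n: int, colptr: List[int], rowind: List[int],
              b_nonzeros: List[int]) -> List[int]:
    """Symbolic phase: columns of L reachable from the nonzeros of b,
    returned in topological order (reverse DFS postorder)."""
    visited = [False] * n
    postorder: List[int] = []
    for root in b_nonzeros:
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, colptr[root])]  # (column, next position to scan)
        while stack:
            j, k = stack[-1]
            # Skip the diagonal and rows that were already visited
            while k < colptr[j + 1] and (rowind[k] <= j or visited[rowind[k]]):
                k += 1
            if k < colptr[j + 1]:
                child = rowind[k]
                stack[-1] = (j, k + 1)
                visited[child] = True
                stack.append((child, colptr[child]))
            else:
                stack.pop()
                postorder.append(j)  # all descendants of j are finished
    return postorder[::-1]


def sparse_lower_solve(colptr: List[int], rowind: List[int], data: List[float],
                       b: Dict[int, float], reach: List[int]) -> Dict[int, float]:
    """Numeric phase: touches only the columns in the precomputed reach set."""
    x = dict(b)
    for j in reach:
        start, end = colptr[j], colptr[j + 1]
        for k in range(start, end):
            if rowind[k] == j:
                x[j] = x.get(j, 0.0) / data[k]
                break
        else:
            raise ZeroDivisionError(f"column {j} has no stored diagonal")
        for k in range(start, end):
            i = rowind[k]
            if i > j:
                x[i] = x.get(i, 0.0) - data[k] * x[j]
    return x


# L rows: [2,0,0,0], [0,1,0,0], [1,0,1,0], [0,1,2,4], stored in CSC
colptr, rowind = [0, 2, 4, 6, 7], [0, 2, 1, 3, 2, 3, 3]
data = [2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 4.0]
reach = reach_set(4, colptr, rowind, [0])  # depends only on the patterns of L and b
print(reach)                                # [0, 2, 3]; column 1 is never touched
print(sparse_lower_solve(colptr, rowind, data, {0: 2.0}, reach))
# {0: 1.0, 2: -1.0, 3: 0.5}
```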
An O(n log n) solution algorithm for spectral element methods
[chapter]
2003
Computational Fluid and Solid Mechanics 2003
To leverage significant software development effort, general purpose unstructured codes are often used in structured or semi-structured applications. ...
We show that O(n log n) computational complexities, competitive with classic Fourier methods, are achievable for some classes of semi-structured spectral element applications. ...
For this application, we use SI-2, the SI scheme with a two-way data mapping [12] implemented in our solver package [9] . ...
doi:10.1016/b978-008044046-0.50500-5
fatcat:ymhmr2vxujdktmi4dsu3xbtq3y
Improving the energy efficiency of sparse linear system solvers on multicore and manycore systems
2014
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
improve the energy performance of sparse linear system solvers, without negatively impacting their performance. ...
One contribution of 14 to a Theme Issue 'Stochastic modelling and energy-efficient computing for weather and climate prediction'. ...
(c) Leveraging the CPU states on manycore systems. The results in §3b illustrate that GPUs are among the most energy-efficient hardware architectures for sparse linear algebra. ...
doi:10.1098/rsta.2013.0279
pmid:24842036
fatcat:kw7cnmvzrff6pmihhqenl53uwm