1,040 Hits in 4.9 sec

A Hybrid Approach for Parallel Transistor-Level Full-Chip Circuit Simulation [chapter]

Heidi K. Thornquist, Sivasankaran Rajamanickam
2015 Lecture Notes in Computer Science  
Hybrid versions of two iterative linear solver strategies are presented, one takes advantage of block triangular form structure while the other uses a Schur complement technique.  ...  Results indicate up to a 27x improvement in total simulation time on 256 cores.  ...  The triangular solve in this particular case has multiple right-hand sides where the right-hand sides are themselves sparse columns of C.  ... 
doi:10.1007/978-3-319-17353-5_9 fatcat:5jrocvi2b5cltcfgjwkc6gs6by
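
The Schur complement technique named in this abstract reduces a 2-by-2 block system to a smaller system on the second block of unknowns. The sketch below is the generic textbook form using dense Python lists. It assumes a nonsingular leading block A. The block names A, B, C, D and the helpers are our own illustration and need not match the paper's notation or its sparse, parallel implementation.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (dense, small)."""
    n = len(M)
    a = [row[:] + [b[i]] for i, row in enumerate(M)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (a[i][n] - sum(a[i][j] * z[j] for j in range(i + 1, n))) / a[i][i]
    return z

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def schur_solve(A, B, C, D, f, g):
    """Solve [[A, B], [C, D]] [x; y] = [f; g] via S = D - C A^{-1} B."""
    n, m = len(A), len(D)
    # One solve with A per column of B (a multiple-right-hand-side solve).
    AinvB_cols = [solve(A, [row[j] for row in B]) for j in range(m)]
    S = [[D[i][j] - sum(C[i][k] * AinvB_cols[j][k] for k in range(n))
          for j in range(m)] for i in range(m)]
    Ainv_f = solve(A, f)
    y = solve(S, [gi - ci for gi, ci in zip(g, matvec(C, Ainv_f))])  # reduced system
    x = solve(A, [fi - bi for fi, bi in zip(f, matvec(B, y))])       # back out x
    return x, y
```

Forming A⁻¹B needs one solve with A per column of B, which is why solves with many right-hand sides become a significant cost in sparse implementations of this approach.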

Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures [article]

Chenhao Xie, Jieyang Chen, Jesun S Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li
2020 arXiv   pre-print
This is particularly the case for Sparse Triangular Solver (SpTRSV) which introduces additional two-dimensional computation dependencies among subsequent computation steps.  ...  Dependency information is exchanged and shared among GPUs, thus warrant for efficient memory allocation, data partitioning, and workload distribution as well as fine-grained communication and synchronization  ...  heap and relying on the one-sided communication primitives in NVSHMEM for inter-GPU communication.  ... 
arXiv:2012.06959v1 fatcat:am7guw7i5fchxafrkp34plwvky
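
The "computation dependencies" in SpTRSV can be tracked with a per-row counter of unresolved dependencies: a row becomes solvable once every row it depends on has been solved. The sketch below is a serial Python stand-in for that counter-based, barrier-free style of scheduling, assuming a lower-triangular L in CSR form with every diagonal entry stored. The function and argument names are hypothetical. A multi-GPU code would use atomic counters and inter-GPU communication instead of a queue, and this is not the paper's algorithm.

```python
from collections import deque

def sptrsv_dependency_driven(n, rowptr, colidx, vals, b):
    """Solve L x = b (lower-triangular L in CSR); a row is solved as soon as
    all rows it depends on are solved. Returns (x, order of solved rows)."""
    pending = [0] * n                    # unresolved dependencies per row
    dependents = [[] for _ in range(n)]  # dependents[j] = [(i, L[i][j]), ...]
    diag = [0.0] * n
    for i in range(n):
        for p in range(rowptr[i], rowptr[i + 1]):
            j = colidx[p]
            if j == i:
                diag[i] = vals[p]
            else:
                pending[i] += 1
                dependents[j].append((i, vals[p]))
    partial = list(b)                    # b[i] minus contributions received so far
    x = [0.0] * n
    order = []
    ready = deque(i for i in range(n) if pending[i] == 0)
    while ready:
        j = ready.popleft()
        x[j] = partial[j] / diag[j]
        order.append(j)
        # "Publish" x[j]: push its contribution and release rows waiting on it.
        for i, lij in dependents[j]:
            partial[i] -= lij * x[j]
            pending[i] -= 1
            if pending[i] == 0:
                ready.append(i)
    return x, order
```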

Preparing sparse solvers for exascale computing

Hartwig Anzt, Erik Boman, Rob Falgout, Pieter Ghysels, Michael Heroux, Xiaoye Li, Lois Curfman McInnes, Richard Tran Mills, Sivasankaran Rajamanickam, Karl Rupp, Barry Smith, Ichitaro Yamazaki (+1 others)
2020 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
Sparse solvers provide essential functionality for a wide variety of scientific applications.  ...  Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms.  ...  The second technique leverages the one-sided MPI communication functions to implement a synchronization-free task queue, allowing more overlap of communication and computation, leading to additional 2×  ... 
doi:10.1098/rsta.2019.0053 pmid:31955673 fatcat:bqw6xqixbrabddmxglmtcbw2wa

Accelerating advanced preconditioning methods on hybrid architectures

Ernesto Dufrechou
2021 CLEI Electronic Journal  
In particular, we study ILUPACK, a package for the solution of sparse linear systems via Krylov subspace methods that relies on a modern inverse-based multilevel ILU (incomplete LU) preconditioning technique  ...  We present new data-parallel versions of the preconditioner and the most important solvers contained in the package that significantly improve its performance without affecting its accuracy.  ...  [34] proposed a new GPU solver for sparse triangular systems, for matrices stored in the CSC format, based on the self-scheduled strategy.  ... 
doi:10.19153/cleiej.24.1.6 doaj:cf900516b6334e27afbe4102fa203079 fatcat:ohhmcgyrl5hgfaib7hb2rdyhom

Sparsifying Synchronization for High-Performance Shared-Memory Sparse Triangular Solver [chapter]

Jongsoo Park, Mikhail Smelyanskiy, Narayanan Sundaram, Pradeep Dubey
2014 Lecture Notes in Computer Science  
Sparse triangular solver is one such kernel and is the focus of this paper.  ...  As a result, on a 12-core Intel® Xeon® processor, our approach improves the performance of sparse triangular solver by 1.6x, compared to the conventional level-scheduling with barrier synchronization  ...  Heroux for his insights related to the implementation and performance of conjugate gradient, and Kiran Pamnany for sharing his dissemination barrier implementation. Bibliography  ... 
doi:10.1007/978-3-319-07518-1_8 fatcat:z3lzntgb6bh27i3lo2lzmojox4
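
The baseline this abstract compares against, level scheduling with barrier synchronization, groups the rows of L into levels whose rows are mutually independent. The sketch below shows only that conventional baseline, not the paper's sparsified synchronization. It assumes CSR storage with every diagonal entry present, and the function names are hypothetical.

```python
def compute_levels(n, rowptr, colidx):
    """level[i] = 1 + max level of the rows that row i depends on (0 if none)."""
    level = [0] * n
    for i in range(n):  # dependencies j < i already have their final level
        for p in range(rowptr[i], rowptr[i + 1]):
            j = colidx[p]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    buckets = [[] for _ in range(max(level) + 1 if n else 0)]
    for i in range(n):
        buckets[level[i]].append(i)
    return buckets

def level_scheduled_sptrsv(n, rowptr, colidx, vals, b):
    """Solve L x = b level by level (lower-triangular L in CSR)."""
    x = [0.0] * n
    for rows in compute_levels(n, rowptr, colidx):
        # Rows within one level are independent; a parallel code would split
        # them across threads and place a barrier after this loop.
        for i in rows:
            s, diag = b[i], None
            for p in range(rowptr[i], rowptr[i + 1]):
                j = colidx[p]
                if j == i:
                    diag = vals[p]
                else:
                    s -= vals[p] * x[j]
            x[i] = s / diag  # assumes the diagonal entry is stored
    return x
```

The number of levels fixes the number of barriers, which is the synchronization cost the paper sets out to reduce.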

Towards a scalable hybrid sparse solver

Esmond G. Ng, Padma Raghavan
2000 Concurrency Practice and Experience  
When the sparse matrix is symmetric and positive definite, direct methods use Cholesky factorization while iterative methods rely on Conjugate Gradients.  ...  Our goal is to develop a scalable and memory-efficient hybrid of the two methods that can be implemented with high-efficiency on both serial and parallel computers and be suitable for a wide-range of problems  ...  The key is to leverage technology that has been developed for sparse direct methods.  ... 
doi:10.1002/(sici)1096-9128(200002/03)12:2/3<53::aid-cpe473>3.3.co;2-2 fatcat:7bqxv5ygpbbvneb6yfxshyrske
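
For readers unfamiliar with the iterative side of this comparison, the sketch below is textbook unpreconditioned Conjugate Gradients for a symmetric positive definite matrix stored as dense Python lists. It uses an absolute residual tolerance, and the default iteration cap is an arbitrary assumption. It does not reproduce the hybrid direct/iterative method the paper proposes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Unpreconditioned CG for symmetric positive definite A (dense lists)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n                 # arbitrary safety cap
    x = [0.0] * n
    r = list(b)                           # residual b - A x for x = 0
    p = list(r)                           # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:          # absolute residual test
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old            # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```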

Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning

Hartwig Anzt, Jack Dongarra, Goran Flegar, Enrique S. Quintana-Orti
2017 2017 46th International Conference on Parallel Processing (ICPP)  
sparse linear systems.  ...  We present a set of new batched CUDA kernels for the LU factorization of a large collection of independent problems of different size, and the subsequent triangular solves.  ...  linear solvers" (d65).  ... 
doi:10.1109/icpp.2017.18 dblp:conf/icpp/AnztDFQ17 fatcat:sryw4eagnnf3zbuzllaxhm3ebm
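
A block-Jacobi preconditioner built from LU factorizations of variable-size diagonal blocks can be sketched serially as follows: factor each block once, then apply M⁻¹ by independent triangular solves per block. The paper does this with batched CUDA kernels. This Python version only illustrates the arithmetic, assuming partial pivoting within each block and a caller-supplied, hypothetical `block_sizes` list.

```python
def lu_factor(M):
    """LU with partial pivoting: P M = L U, L (unit lower) and U packed together."""
    n = len(M)
    a = [row[:] for row in M]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a, perm

def lu_solve(lu, perm, b):
    n = len(b)
    y = [b[perm[i]] for i in range(n)]    # apply the row permutation
    for i in range(n):                    # forward substitution with L
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):        # backward substitution with U
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y

def block_jacobi_setup(A, block_sizes):
    """Factor each diagonal block; block_sizes is a hypothetical partition."""
    factors, start = [], 0
    for s in block_sizes:
        block = [row[start:start + s] for row in A[start:start + s]]
        factors.append((start, s) + lu_factor(block))
        start += s
    return factors

def block_jacobi_apply(factors, r):
    """z = M^{-1} r with M = blockdiag(A); the blocks are independent."""
    z = [0.0] * len(r)
    for start, s, lu, perm in factors:
        z[start:start + s] = lu_solve(lu, perm, r[start:start + s])
    return z
```

Because the per-block solves share no data, they map naturally onto a batched kernel that handles one small problem per thread group.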

Towards a scalable hybrid sparse solver

Esmond G. Ng, Padma Raghavan
2000 Concurrency Practice and Experience  
When the sparse matrix is symmetric and positive definite, direct methods use Cholesky factorization while iterative methods rely on Conjugate Gradients.  ...  Our goal is to develop a scalable and memory-efficient hybrid of the two methods that can be implemented with high-efficiency on both serial and parallel computers and be suitable for a wide-range of problems  ...  The key is to leverage technology that has been developed for sparse direct methods.  ... 
doi:10.1002/(sici)1096-9128(200002/03)12:2/3<53::aid-cpe473>3.0.co;2-b fatcat:kv7m4mb5lnh4jgigz6ygpmrkqy

Sparse matrix factorization in the implicit finite element method on petascale architecture

Seid Koric, Anshul Gupta
2016 Computer Methods in Applied Mechanics and Engineering  
The performance of the massively parallel direct multifrontal solver Watson Sparse Matrix Package (WSMP) for solving large sparse systems of linear equations arising in implicit finite element method on  ...  unstructured (free) meshes in solid mechanics was evaluated on one of the most powerful supercomputers currently available to the open science community-the sustained petascale high performance computing  ...  Acknowledgments The authors would like to thank the Private Sector Program and the Blue Waters sustained-petascale computing project at the National Center for Supercomputing Applications (NCSA).  ... 
doi:10.1016/j.cma.2016.01.011 fatcat:7fvypjoinrcmnigv3hldkqfcxe

Communication in task-parallel ILU-preconditioned CG solvers using MPI + OmpSs

José I. Aliaga, María Barreda, Goran Flegar, Matthias Bollhöfer, Enrique S. Quintana-Ortí
2017 Concurrency and Computation  
For all these implementations, we analyze the communication patterns and perform a comparative analysis of their performance and scalability on a cluster consisting of 16 nodes, with 16 cores each.  ...  We target the parallel solution of sparse linear systems via iterative Krylov subspace-based methods enhanced with ILU-type preconditioners on clusters of multicore processors.  ...  On the positive side, it diminishes the amount of messages (though not the total volume of communication), and it does not change the numerical properties of the solver.  ... 
doi:10.1002/cpe.4280 fatcat:m7hsdjutjnhfhmxxl4q5izo6ka
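
The snippet does not say which ILU variant is used, so the sketch below picks ILU(0), the simplest member of the family. ILU(0) performs Gaussian elimination but only updates entries that are already nonzero in A. This is a dense-storage, sequential illustration under that assumption, with hypothetical function names. It is not the task-parallel MPI + OmpSs code the paper analyzes.

```python
def ilu0(A):
    """ILU(0): elimination restricted to the nonzero pattern of A.
    Returns packed factors: strict lower part = L (unit diagonal), rest = U."""
    n = len(A)
    a = [row[:] for row in A]
    nz = [[A[i][j] != 0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not nz[i][k]:
                continue
            a[i][k] /= a[k][k]                 # multiplier L[i][k]
            for j in range(k + 1, n):
                if nz[i][j]:                   # fill-in outside the pattern is dropped
                    a[i][j] -= a[i][k] * a[k][j]
    return a

def ilu0_apply(lu, r):
    """Apply the preconditioner: solve L U z = r with the packed factors."""
    n = len(r)
    z = list(r)
    for i in range(n):
        z[i] -= sum(lu[i][j] * z[j] for j in range(i))
    for i in range(n - 1, -1, -1):
        z[i] = (z[i] - sum(lu[i][j] * z[j] for j in range(i + 1, n))) / lu[i][i]
    return z
```

For a tridiagonal matrix no fill occurs, so ILU(0) coincides with the exact LU factorization. In general it is only an approximation of A, used to accelerate the Krylov iteration.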

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction [chapter]

Peter Benner, Ernesto Dufrechou, Pablo Ezzatti, Pablo Igounet, Enrique S. Quintana-Ortí, Alfredo Remón
2014 Lecture Notes in Computer Science  
Our experiments with an nVidia S2070 GPU report speed-ups up to 6× for the hybrid band solver based on the LU factorization over analogous CPU-only routines in Intel's MKL.  ...  As a practical demonstration of these benefits, we plug the new CPU-GPU codes into a sparse matrix Lyapunov equation solver, showing a 3× acceleration on the solution of a large-scale benchmark arising  ...  The advantages of the hybrid band routines carry over to the solution of sparse Lyapunov solvers, with an acceleration factor around 2-3× with respect to the analogous solver based on MKL.  ... 
doi:10.1007/978-3-319-09153-2_29 fatcat:vf37puekijcphjxs6vqmmw7wim
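
To make "band solver based on the LU factorization" concrete, the sketch below handles the narrowest nontrivial band, a tridiagonal matrix, using LU without pivoting, often called the Thomas algorithm. It assumes no pivoting is needed (for example, a diagonally dominant matrix). General bandwidths, pivoting, and the CPU-GPU split evaluated in the paper are not modeled, and the argument names are our own.

```python
def tridiagonal_lu_solve(lower, diag, upper, b):
    """Solve A x = b for tridiagonal A by LU without pivoting.
    lower[i] = A[i+1][i], diag[i] = A[i][i], upper[i] = A[i][i+1]."""
    n = len(diag)
    u = list(diag)          # diagonal of U (its superdiagonal equals `upper`)
    l = [0.0] * (n - 1)     # subdiagonal of the unit-lower factor L
    for i in range(1, n):   # factorization: O(n) work for a band of width 1
        l[i - 1] = lower[i - 1] / u[i - 1]
        u[i] = diag[i] - l[i - 1] * upper[i - 1]
    y = list(b)
    for i in range(1, n):   # forward substitution with L
        y[i] -= l[i - 1] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / u[n - 1]
    for i in range(n - 2, -1, -1):  # backward substitution with U
        x[i] = (y[i] - upper[i] * x[i + 1]) / u[i]
    return x
```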

An Asynchronous Task-based Fan-Both Sparse Cholesky Solver [article]

Mathias Jacquelin, Yili Zheng, Esmond Ng, Katherine Yelick
2016 arXiv   pre-print
In this paper, we investigate the use of an asynchronous task paradigm, one-sided communication and dynamic scheduling in implementing sparse Cholesky factorization (symPACK) on large-scale distributed  ...  Our solver symPACK relies on efficient and flexible communication primitives provided by the UPC++ library.  ...  Another very important characteristic of communication protocols is whether a communication primitive is two-sided or one-sided.  ... 
arXiv:1608.00044v2 fatcat:wfxqlgser5e2rmgwxjcum23m6i
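
For reference, the arithmetic that sparse Cholesky solvers such as symPACK organize into tasks is the factorization A = LLᵀ. The sketch below is a dense, serial, left-looking version in Python, assuming a symmetric positive definite input. It shows none of the sparsity exploitation, asynchronous tasking, or UPC++ communication described in the abstract.

```python
import math

def cholesky(A):
    """Dense left-looking Cholesky: lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Column j uses only the already-computed columns 0..j-1 ("left-looking").
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```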

Sympiler

Kazem Cheshmi, Shoaib Kamil, Michelle Mills Strout, Maryam Mehri Dehnavi
2017 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '17  
The computation patterns in sparse numerical methods are guided by the input sparsity structure and the sparse algorithm itself.  ...  Sympiler is a domain-specific code generator that optimizes sparse matrix computations by decoupling the symbolic analysis phase from the numerical manipulation stage in sparse codes.  ...  Motivating Scenario Sparse triangular solve takes a lower triangular matrix L and a right-hand side (RHS) vector b and solves the linear equation Lx = b for x.  ... 
doi:10.1145/3126908.3126936 dblp:conf/sc/CheshmiKSD17 fatcat:joe4jxi2lraelbjwo65l3sarpa
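
One example of separating symbolic analysis from numerical work in a sparse triangular solve: when b is sparse, the columns of L that actually need processing are those reachable from the nonzeros of b in the graph of L (a Gilbert-Peierls-style reach computation). The sketch below computes that set and then runs a column-oriented (CSC) forward substitution over it. It assumes the diagonal entry is stored first in each column, and the function names are hypothetical. Sympiler generates specialized code ahead of time, whereas this runs everything at run time in plain Python.

```python
def reach(n, colptr, rowidx, b_nonzeros):
    """Columns j where x[j] may be nonzero: nodes reachable from b's nonzeros
    via edges j -> i for every stored L[i][j] with i > j."""
    seen = set()
    stack = list(b_nonzeros)
    while stack:
        j = stack.pop()
        if j in seen:
            continue
        seen.add(j)
        for p in range(colptr[j], colptr[j + 1]):
            i = rowidx[p]
            if i > j and i not in seen:
                stack.append(i)
    # Edges of a lower-triangular L always go to higher indices, so ascending
    # order is a valid topological order for the solve.
    return sorted(seen)

def sptrsv_csc(n, colptr, rowidx, vals, b):
    """Column-oriented forward substitution visiting only reachable columns."""
    x = list(b)
    nz = [j for j in range(n) if b[j] != 0]   # symbolic input: pattern of b
    for j in reach(n, colptr, rowidx, nz):
        x[j] /= vals[colptr[j]]               # diagonal L[j][j], stored first
        for p in range(colptr[j] + 1, colptr[j + 1]):
            x[rowidx[p]] -= vals[p] * x[j]    # scatter update to later rows
    return x
```

Since the reach set depends only on the sparsity patterns of L and b, it can be computed once and reused whenever the numerical values change but the pattern does not.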

An O(n log n) solution algorithm for spectral element methods [chapter]

I. Lee, P. Raghavan, S. Schofield, P. Fischer
2003 Computational Fluid and Solid Mechanics 2003  
To leverage significant software development effort, general purpose unstructured codes are often used in structured or semi-structured applications.  ...  We show that O(n log n) computational complexities, competitive with classic Fourier methods, are achievable for some classes of semi-structured spectral element applications.  ...  For this application, we use SI-2, the SI scheme with a two-way data mapping [12] implemented in our solver package [9] .  ... 
doi:10.1016/b978-008044046-0.50500-5 fatcat:ymhmr2vxujdktmi4dsu3xbtq3y

Improving the energy efficiency of sparse linear system solvers on multicore and manycore systems

H. Anzt, E. S. Quintana-Orti
2014 Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences  
improve the energy performance of sparse linear system solvers, without negatively impacting their performance.  ...  One contribution of 14 to a Theme Issue 'Stochastic modelling and energy-efficient computing for weather and climate prediction' .  ...  (c) Leveraging the CPU states on manycore systems The results in §3b illustrate that GPUs are among the most energy-efficient hardware architectures for sparse linear algebra.  ... 
doi:10.1098/rsta.2013.0279 pmid:24842036 fatcat:kw7cnmvzrff6pmihhqenl53uwm
Showing results 1–15 of 1,040.