Irregularity mitigation and portability abstractions for accelerated sparse matrix factorization
release_jhttfyetqnbmvigpltkkt624km
by
Daniel Thürck
2021
Abstract
In this thesis, we investigate new ways to mitigate the inherent irregularity in sparse matrix factorizations and to decompose the resulting computation into simple kernels that are portable across a diverse set of compute accelerator architectures through our novel compiler borG. Be it weather prediction, climate models, personalized medicine, genetic analysis, or autonomous driving: some of today's central challenges require processing vast amounts of data, feeding large-scale simulations or AI models. As the scale of these problems outpaces the available processing power and storage capacity, it becomes crucial to exploit their inherent sparsity. Such sparse topologies, i.e., graph topologies where most of the nodes are not directly connected, are often the source of sparse linear systems of equations whose solution poses a major computational challenge. At the same time, we are witnessing a shift in hardware in the high-performance computing field: as hardware designers try to avoid the quadratically increasing energy consumption that comes with higher clock frequencies, compute setups increase parallelism and specialization instead. Notably, most of the accelerators in use today are optimized for massive parallelism on regular and dense data structures. Processing sparse workloads efficiently on novel, heterogeneous architectures presents a challenge that demands systemic solutions. In this thesis, we investigate strategies and systems focusing on an important building block for computational sciences: sparse numerical (matrix) factorizations. Factorizations exhibit irregularity in two aspects. First, the sparse data structures complicate workload distribution on accelerators geared towards regular grids. Second, numerically mandated pivoting introduces irregularity into the control flow, which leads to expensive synchronization points and requires costly rebuilding of data structures.
We propose two building blocks that help mitigate these problems for accelerators. First, a generalization of s [...]
Archived Files and Locations
application/pdf, 9.4 MB
file_rbsicn63bjg7jh7uqantz23nwi
tuprints.ulb.tu-darmstadt.de (publisher), web.archive.org (webarchive)
article-journal
Stage published
Year 2021