Efficient data layouts for a three-dimensional electrostatic Particle-in-Cell code
2018
Journal of Computational Science
To solve realistic problems accurately, the method requires trillions of particles, so there is strong demand for high-performance code on modern architectures. ...
The Particle-in-Cell (PIC) method is a widely used tool in plasma physics. ...
We thus explored several space-filling curves for the data layout of E and ρ, and different loop transformations for the particle loops, with the aim of improving the cache reuse and achieving efficient ...
doi:10.1016/j.jocs.2018.06.004
fatcat:eztadpep6vd6llqy2r6v4cfibe
Enhancing locality for recursive traversals of recursive structures
2011
SIGPLAN notices
We develop a novel optimization called point blocking, inspired by the classic tiling loop transformation, and show that it can substantially enhance temporal locality in traversal codes. ...
In this paper, we argue that, for a class of irregular applications we call traversal codes, there exists substantial data reuse and hence opportunity for locality exploitation. ...
We would also like to thank the anonymous referees for providing insightful comments. This research was supported in part with funding from Intel and a grant from the Purdue Research Foundation. ...
doi:10.1145/2076021.2048104
fatcat:qaflx5pc7nbbtmhc3hurh2v2dq
Enhancing locality for recursive traversals of recursive structures
2011
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications - OOPSLA '11
We develop a novel optimization called point blocking, inspired by the classic tiling loop transformation, and show that it can substantially enhance temporal locality in traversal codes. ...
In this paper, we argue that, for a class of irregular applications we call traversal codes, there exists substantial data reuse and hence opportunity for locality exploitation. ...
We would also like to thank the anonymous referees for providing insightful comments. This research was supported in part with funding from Intel and a grant from the Purdue Research Foundation. ...
doi:10.1145/2048066.2048104
dblp:conf/oopsla/JoK11
fatcat:6cudnrjtffbdfkcfc7w3s5iaee
Mesh Layouts for Block-Based Caches
2006
IEEE Transactions on Visualization and Computer Graphics
Index Terms: Mesh and graph layouts, cache-aware and cache-oblivious layouts, metrics for cache coherence, data locality. The authors are with the Lawrence ...
In addition to guiding the layout process, our metrics can be used to quantify the quality of a layout, e.g. for comparing different layouts of the same mesh and for determining whether a given layout ...
This work was supported by the LOCAL LLNL LDRD project (05-ERD-018) and was performed under the auspices of the U.S. ...
doi:10.1109/tvcg.2006.162
pmid:17080854
fatcat:zevggpjkw5dqhcyxljiahj3jhq
Optimizing an MPI weather forecasting model via processor virtualization
2010
2010 International Conference on High Performance Computing
These models are typically executed on parallel machines, and a major obstacle to their scalability is load imbalance. ...
In this paper, we demonstrate the effectiveness of processor virtualization for dynamically balancing the load in BRAMS, a mesoscale weather forecasting model based on MPI parallelization. ...
Because the Hilbert curve preserves spatial locality, threads corresponding to sub-domains that are close in space are likely to be assigned to the same processor. ...
doi:10.1109/hipc.2010.5713171
dblp:conf/hipc/RodriguesNPMK10
fatcat:m2mmh3iiu5dolgy6yalavutoee
Assignment as a location-based service in outsourced databases
2017
Turkish Journal of Electrical Engineering and Computer Sciences
A recent work [12] proposes to use Moore curves and asymmetric cryptography together for KNN queries. ...
Assignment query processing needs global data, since a small local change could totally change the result. ...
Conclusions: In this paper, we propose a novel method to protect location privacy not only for static local queries but also for dynamic global queries such as assignment queries. ...
doi:10.3906/elk-1511-190
fatcat:tan4xx6v3ve5xavifyfbuiieuu
KOLAM: a cross-platform architecture for scalable visualization and tracking in wide-area imagery
2013
Geospatial InfoFusion III
KOLAM is an open, cross-platform, interoperable, scalable and extensible framework supporting a novel multiscale spatiotemporal dual-cache data structure for big data visualization and visual analytics ...
The KOLAM software architecture was extended to support airborne wide-area motion imagery by organizing spatiotemporal tiles in very large format video frames using a temporal cache of tiled pyramid cached ...
Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. ...
doi:10.1117/12.2018162
fatcat:ux7qxsmlqjgxtiosglmoubvs2i
Efficient query processing on unstructured tetrahedral meshes
2006
Proceedings of the 2006 ACM SIGMOD international conference on Management of data - SIGMOD '06
Finally, we present a new data layout approach for tetrahedral mesh datasets that provides better performance compared to the traditional space filling curves. ...
We develop Directed Local Search (DLS), an efficient indexing algorithm based on mesh topology information that is practically insensitive to the geometric properties of meshes. ...
, an IBM faculty partnership award and a NASA AISR fund. ...
doi:10.1145/1142473.1142535
dblp:conf/sigmod/PapadomanolakisALTOH06
fatcat:2cyruhfwznh65nho4tsuzk5jia
Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays
[article]
2024
arXiv
pre-print
To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. ...
The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. ...
Many of the experimental results shown in this paper were gathered on the Advanced School for Computing and Imaging (ASCI) DAS-6 compute cluster [11]. ...
arXiv:2309.07002v2
fatcat:igplhgez5rhrnn5gf2i7pqbnoe
Efficient band approximation of Gram matrices for large scale kernel methods on GPUs
2009
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09
Our method relies on the locality preserving properties of space filling curves, and the special structure of Gram matrices. Our approach has several important merits. ...
We introduce a novel method to approximate a Gram matrix with a band matrix. ...
However, this locality-preserving property varies from one type of curve to another. The family of Hilbert curves [3] is known to have good locality-preserving properties. ...
doi:10.1145/1654059.1654091
dblp:conf/sc/HusseinA09
fatcat:ujfljhpjazbb3ns3y2bkdl7gwe
Advanced optimization strategies in the Rice dHPF compiler
2002
Concurrency and Computation
For loop nests with complex data dependences, such as the example of Figure 4, we have developed an algorithm to eliminate inner-loop communication without excessive loss of cache reuse. ...
Communication is vectorized out of any loop as long as doing so will not cause any loop-carried or loop-independent data dependence to be violated. The compiler can coalesce messages for arbitrary affine ...
For example, adaptive distributions based on space-filling (for instance, Hilbert) curves [25] would enable many irregular applications to be implemented efficiently in HPF. ...
doi:10.1002/cpe.647
fatcat:taze6xqwpzhw3hw27yglleqnei
A divide-and-conquer/cellular-decomposition framework for million-to-billion atom simulations of chemical reactions
2007
Computational materials science
The EDC and THCD frameworks expose maximal data localities, and consequently the isogranular parallel efficiency on 1920 processors is as high as 0.953. ...
Cells are traversed along either a Morton curve (Fig. 3) or a Hilbert curve, instead of the traditional raster-scan order. ...
doi:10.1016/j.commatsci.2006.04.012
fatcat:muocll6lfvazrj4qmy2geazy6m
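The Morton-curve cell traversal mentioned in the snippet above can be illustrated with a minimal sketch. This is not the code from the cited paper: the function names `morton3d` and `cells_in_morton_order`, the 10-bit coordinate width, and the cubic grid are all assumptions chosen for illustration. The sketch interleaves the bits of the (x, y, z) cell coordinates to form a Morton (Z-order) index and then visits cells in increasing index order rather than in raster-scan order.

```python
from itertools import product


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton index.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1,
    and bit i of z at 3*i + 2. The bit width is an assumption.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def cells_in_morton_order(n: int) -> list:
    """Return all cells of an n x n x n grid sorted by Morton index."""
    cells = product(range(n), repeat=3)
    return sorted(cells, key=lambda c: morton3d(*c))


if __name__ == "__main__":
    # Visit a 2 x 2 x 2 block of cells along the Morton curve.
    for cell in cells_in_morton_order(2):
        print(cell, morton3d(*cell))
```

Sorting by the interleaved index keeps each 2 x 2 x 2 block of cells contiguous in the visit order, which is the locality property that motivates replacing raster-scan order.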
2PCP: Two-phase CP decomposition for billion-scale dense tensors
2016
2016 IEEE 32nd International Conference on Data Engineering (ICDE)
In this paper, we introduce 2PCP, a two-phase, block-based CP decomposition system with intelligent buffer sensitive task scheduling and buffer management mechanisms. 2PCP aims to reduce I/O costs in the ...
Tensors are multi-dimensional arrays; consequently, tensor decomposition operations (CP and Tucker) are the bases for many high-dimensional data analysis tasks, from clustering, trend detection, anomaly ...
As we see here, Hilbert traversal relies on "U" shaped curve-segments (as opposed to the "Z" shaped curve-segments of the Z-order traversal) and this helps better preserve the adjacency property (i.e., ...
doi:10.1109/icde.2016.7498294
dblp:conf/icde/LiHCS16
fatcat:gbltoqe3bnhhfpijqkyfrzjknu
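The adjacency remark in the snippet above (Hilbert's "U" shaped segments versus Z-order's "Z" shaped segments) can be checked numerically with a small sketch. This is an illustration in 2D, not the 2PCP implementation: `hilbert_d2xy` follows the widely published iterative index-to-coordinate conversion for a Hilbert curve, while `zorder_d2xy`, `max_step`, and the 8 x 8 grid size are assumptions made for the example.

```python
def hilbert_d2xy(n: int, d: int) -> tuple:
    """Map index d to (x, y) on an n x n Hilbert curve (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the sub-square before transposing it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def zorder_d2xy(d: int, bits: int = 16) -> tuple:
    """Map index d to (x, y) on a Z-order curve by de-interleaving bits."""
    x = y = 0
    for i in range(bits):
        x |= ((d >> (2 * i)) & 1) << i
        y |= ((d >> (2 * i + 1)) & 1) << i
    return x, y


def max_step(points: list) -> int:
    """Largest Manhattan distance between consecutive points on a curve."""
    return max(abs(ax - bx) + abs(ay - by)
               for (ax, ay), (bx, by) in zip(points, points[1:]))


if __name__ == "__main__":
    n = 8  # grid side, an assumption for the demonstration
    hilbert = [hilbert_d2xy(n, d) for d in range(n * n)]
    zorder = [zorder_d2xy(d) for d in range(n * n)]
    print("Hilbert max step:", max_step(hilbert))
    print("Z-order max step:", max_step(zorder))
```

On the Hilbert curve every consecutive pair of cells is a direct grid neighbor, whereas the Z-order curve makes longer jumps whenever it crosses a block boundary.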
High-Performance and Scalable Agent-Based Simulation with BioDynaMo
[article]
2023
arXiv
pre-print
To overcome this limitation, we present a novel high-performance simulation engine. We identify three key challenges for which we present the following solutions. ...
Second, we reduce the memory access latency with a NUMA-aware agent iterator, agent sorting with a space-filling curve, and a custom heap memory allocator. ...
Acknowledgments: We want to thank the CERN Knowledge Transfer office (https://kt.cern/) for supporting this work. ...
arXiv:2301.06984v1
fatcat:vrwqse7vxfg2hfk2tzqpaanjay
Cost-Based Predictive Spatiotemporal Join
2009
IEEE Transactions on Knowledge and Data Engineering
In this paper we present CoPST, the first and foremost algorithm for such a join using two spatio-temporal indexes. ...
CoPST adapts gracefully to large scale databases, by dynamically switching between main-memory buffering and disk-based buffering, through a novel probabilistic cost model. ...
It is not possible to show the results for a varying cache size (in the same manner as in Figures 13 to 19) because the cache size is fixed for a given PC hardware. ...
doi:10.1109/tkde.2008.159
fatcat:lbbdro7rhzcttb6kxrxbdmgkwy