Enabling Partial-Cache Line Prefetching through Data Compression
[chapter]
2006
High-Performance Computing
These prefetched values are held in vacant space created in the data cache by storing values in compressed form. ...
In comparison to a baseline cache that does not support prefetching, on average, our cache design reduces the memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution ...
Experimental Setup: We implemented the compression-enabled partial cache line prefetching scheme using SimpleScalar 3.0 [4]. ...
doi:10.1002/0471732710.ch9
fatcat:dr4bqfk33fepbdnkezrp7tluvy
Enabling partial cache line prefetching through data compression
2003
2003 International Conference on Parallel Processing, 2003. Proceedings.
These prefetched values are held in vacant space created in the data cache by storing values in compressed form. ...
In comparison to a baseline cache that does not support prefetching, on average, our cache design reduces the memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution ...
Experimental Setup: We implemented the compression-enabled partial cache line prefetching scheme using SimpleScalar 3.0 [4]. ...
doi:10.1109/icpp.2003.1240590
dblp:conf/icpp/ZhangG03
fatcat:2cpjkmz46radnha56rh3ebh26y
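The two records above are the conference paper and its book-chapter reprint. The idea in their snippets reduces to a width test: a 32-bit word whose upper bits are pure sign extension can be stored in 16 bits, and the half-slots freed inside a cache line can hold compressed words prefetched from the neighboring line. A minimal C sketch of that test follows; the line geometry and the 16-bit threshold are illustrative choices, not figures taken from the paper.

    #include <stdint.h>
    #include <stdio.h>

    #define WORDS_PER_LINE 8  /* 32-byte line of 32-bit words (assumed geometry) */

    /* A word is stored compressed when bits 31..15 are all zero or all one,
     * i.e. it survives truncation to 16 bits followed by sign extension. */
    static int compressible(uint32_t w)
    {
        uint32_t upper = w >> 15;            /* top 17 bits */
        return upper == 0u || upper == 0x1FFFFu;
    }

    /* Each compressible word frees a 16-bit slot in the line; each freed
     * slot can hold one compressed word prefetched from the adjacent line. */
    static int prefetch_slots(const uint32_t line[WORDS_PER_LINE])
    {
        int slots = 0;
        for (int i = 0; i < WORDS_PER_LINE; i++)
            if (compressible(line[i]))
                slots++;
        return slots;
    }

    int main(void)
    {
        uint32_t line[WORDS_PER_LINE] =
            { 0, 1, 0xFFFFFFFEu /* -2 */, 300, 0x12345678u, 7, 8, 9 };
        printf("%d compressed-word slots free for prefetching\n",
               prefetch_slots(line));
        return 0;
    }

In the paper's design the counting happens in hardware at line-fill time; here it is ordinary code purely to make the invariant concrete.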
Interactions Between Compression and Prefetching in Chip Multiprocessors
2007
2007 IEEE 13th International Symposium on High Performance Computer Architecture
Second, we propose a simple adaptive prefetching mechanism that uses cache compression's extra tags to detect useless and harmful prefetches. ...
On an 8-processor CMP with no prefetching, compression improves performance by up to 18% for commercial workloads. ...
We also thank Brad Beckmann for his help with the prefetching implementation. ...
doi:10.1109/hpca.2007.346200
dblp:conf/hpca/AlameldeenW07
fatcat:ekcsuohp4ffhpppztvrptaua3a
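The snippet above describes using the compressed cache's surplus tags as a prefetch post-mortem: if an evicted prefetched block's tag lingers and shows the block was never referenced, the prefetch was useless. A hedged sketch of that feedback loop, with an assumed 5-bit counter and threshold (the paper's actual parameters are not given in the snippet):

    #include <stdbool.h>
    #include <stdio.h>

    static int bias = 16;   /* 5-bit saturating counter, starts at midpoint */

    /* The compressed cache's surplus tags reveal, after eviction, whether a
     * prefetched block was ever referenced; that verdict is fed back here. */
    static void on_prefetch_outcome(bool was_useful)
    {
        if (was_useful) { if (bias < 31) bias++; }
        else            { if (bias > 0)  bias--; }
    }

    /* Gate new prefetches on the running usefulness estimate. */
    static bool should_prefetch(void) { return bias >= 8; }

    int main(void)
    {
        for (int i = 0; i < 12; i++)   /* a run of useless/harmful prefetches */
            on_prefetch_outcome(false);
        printf("prefetching %s\n", should_prefetch() ? "on" : "throttled");
        return 0;
    }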
A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps
[article]
2016
arXiv
pre-print
CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. ...
We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. ...
This research was partially supported by NSF (grants 0953246, 1065112, 1205618, 1212962, 1213052, 1302225 [115]. ...
arXiv:1602.01348v1
fatcat:qbzuknzcyncrticap55x4i5dhi
Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures
[chapter]
1999
Lecture Notes in Computer Science
Virtual cache lines increase the spatial locality of the given data structure resulting in better locality of references. ...
Once such a data structure goes through a sequence of updates (inserts and deletes), it may become scattered all over memory, yielding poor spatial locality, which in turn introduces many cache misses. ...
Prefetching offers only a partial remedy: it is the data layout itself that should be optimized for better spatial locality. ...
doi:10.1007/978-3-540-49051-7_18
fatcat:qgcbracqqvhkva2fdkz7pvl274
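The snippet argues that layout, not prefetching, is the real lever for recursive data structures. The sketch below illustrates the underlying principle with a plain arena allocator; it is not the paper's virtual-cache-line machinery, just a demonstration of why keeping successor nodes physically adjacent restores spatial locality.

    #include <stddef.h>
    #include <stdio.h>

    typedef struct Node { int key; struct Node *next; } Node;

    #define ARENA_NODES 1024
    static Node arena[ARENA_NODES];       /* contiguous backing store */
    static size_t next_free = 0;

    /* Bump allocation keeps consecutively created nodes adjacent in memory,
     * so a traversal touches consecutive cache lines instead of scattered
     * heap chunks. */
    static Node *alloc_node(int key, Node *next)
    {
        if (next_free == ARENA_NODES) return NULL;   /* arena exhausted */
        Node *n = &arena[next_free++];
        n->key = key;
        n->next = next;
        return n;
    }

    int main(void)
    {
        Node *head = NULL;
        for (int k = 9; k >= 0; k--)      /* build a 10-node list, front insert */
            head = alloc_node(k, head);
        for (Node *p = head; p; p = p->next)
            printf("%d ", p->key);        /* walks the arena back to front */
        printf("\n");
        return 0;
    }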
In-cache query co-processing on coupled CPU-GPU architectures
2014
Proceedings of the VLDB Endowment
In this paper, we propose a novel in-cache query co-processing paradigm for main memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. ...
Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. ...
Besides, the CPU and the GPU share the L2 cache in this study, which enables the possibility of data reuse between them. ...
doi:10.14778/2735496.2735497
fatcat:2f53sczrrnh3rbs4xwnsm6f4re
Dependence based prefetching for linked data structures
1998
ACM SIGOPS Operating Systems Review
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. ...
To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program. ...
Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access data on pointer-load cache lines, may be partially hidden. ...
doi:10.1145/384265.291034
fatcat:4lkgda5hpbck3hbt6npl65jsoi
Dependence based prefetching for linked data structures
1998
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. ...
To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program. ...
Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access data on pointer-load cache lines, may be partially hidden. ...
doi:10.1145/291069.291034
dblp:conf/asplos/RothMS98
fatcat:jul62swkkjbr5f24jf5qjciwoa
Dependence based prefetching for linked data structures
1998
SIGPLAN notices
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. ...
To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program. ...
Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access data on pointer-load cache lines, may be partially hidden. ...
doi:10.1145/291006.291034
fatcat:fblqhoxlrvafpjqgjz6gcq3csi
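The three records above are the same ASPLOS paper as republished in SIGOPS Operating Systems Review and SIGPLAN Notices. Its snippets describe correlating a producer load (which fetches a pointer) with the consumer load that dereferences it. A simplified sketch of such a dependence table, with the hashing and sizing chosen for illustration rather than taken from the paper:

    #include <stdint.h>
    #include <stdio.h>

    #define TABLE_SIZE 256

    /* One entry: the value loaded at producer_pc, plus offset, becomes the
     * address of a dependent load. */
    typedef struct { uint64_t producer_pc; int64_t offset; int valid; } DepEntry;
    static DepEntry deps[TABLE_SIZE];

    static unsigned idx(uint64_t pc) { return (unsigned)(pc >> 2) % TABLE_SIZE; }

    /* Training: a consumer's address was derived from a producer's value. */
    static void train(uint64_t producer_pc, uint64_t loaded_value,
                      uint64_t consumer_addr)
    {
        DepEntry *e = &deps[idx(producer_pc)];
        e->producer_pc = producer_pc;
        e->offset = (int64_t)(consumer_addr - loaded_value);
        e->valid = 1;
    }

    /* On each load, run one step ahead of the program and prefetch. */
    static void on_load(uint64_t pc, uint64_t loaded_value)
    {
        DepEntry *e = &deps[idx(pc)];
        if (e->valid && e->producer_pc == pc)
            printf("prefetch 0x%llx\n",
                   (unsigned long long)(loaded_value + (uint64_t)e->offset));
    }

    int main(void)
    {
        train(0x400100, 0x2000, 0x2008);  /* next-pointer lives at offset 8 */
        on_load(0x400100, 0x3000);        /* issues: prefetch 0x3008 */
        return 0;
    }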
C-Pack: A High-Performance Microprocessor Cache Compression Algorithm
2010
IEEE Transactions on Very Large Scale Integration (vlsi) Systems
In this work, we present a lossless compression algorithm that has been designed for fast on-line data compression, and cache compression in particular. ...
The algorithm has a number of novel features tailored for this application, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words while using ...
Alameldeen at Intel Corporation for his help understanding his cache compression research results. ...
doi:10.1109/tvlsi.2009.2020989
fatcat:vl2hhmdxfzealpkjnd6z2a6z7e
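The C-Pack snippet mentions pattern matching and pairing compressed lines. Below is a much-simplified sketch of the word-level classification step (frequent patterns plus a small dictionary of recently seen words); the pattern set and dictionary policy here are illustrative, not the paper's exact encoding, which also supports partial dictionary matches.

    #include <stdint.h>
    #include <stdio.h>

    #define DICT_WORDS 16
    static uint32_t dict[DICT_WORDS];     /* recently seen words (FIFO) */
    static int dict_head = 0;

    typedef enum { P_ZERO, P_SEXT8, P_SEXT16, P_DICT, P_RAW } Pattern;

    static Pattern classify(uint32_t w)
    {
        if (w == 0) return P_ZERO;                       /* all-zero word */
        if ((int32_t)w == (int8_t)w)  return P_SEXT8;    /* sign-extended byte */
        if ((int32_t)w == (int16_t)w) return P_SEXT16;   /* sign-extended half */
        for (int i = 0; i < DICT_WORDS; i++)
            if (dict[i] == w) return P_DICT;             /* dictionary hit */
        dict[dict_head] = w;                             /* learn the new word */
        dict_head = (dict_head + 1) % DICT_WORDS;
        return P_RAW;                                    /* stored uncompressed */
    }

    int main(void)
    {
        printf("%d %d %d %d\n", classify(0), classify(0xFFFFFFF0u),
               classify(0x1234u), classify(0xDEADBEEFu));
        printf("%d\n", classify(0xDEADBEEFu));  /* now a dictionary hit */
        return 0;
    }

Each pattern code implies an encoded size, which is what lets two sufficiently compressible lines share one physical cache line.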
Understanding Cache Compression
2021
ACM Transactions on Architecture and Code Optimization (TACO)
This study sheds light on the challenges of adopting compression in cache design, from the shrinking of the data to its physical placement. ...
Hardware cache compression derives from software-compression research; yet, its implementation is not a straightforward translation, since it must abide by multiple restrictions to comply with area, power ...
For instance, Prefetched Blocks Compaction (PBC) [39] explores the similarity of prefetched lines to co-allocate them through inter-line compression and maximize use of the effective cache capacity. ...
doi:10.1145/3457207
fatcat:2jsbv7d3qfd53kpyiv44cpcfne
Capturing dynamic memory reference behavior with adaptive cache topology
1998
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced. ...
Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory. ...
doi:10.1145/291069.291053
dblp:conf/asplos/PeirLH98
fatcat:hmfosgnqwzap5fg7oihs4xniee
Capturing dynamic memory reference behavior with adaptive cache topology
1998
SIGPLAN notices
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced. ...
Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory. ...
doi:10.1145/291006.291053
fatcat:5mmburn4qfhuthucow5j22jfxi
Capturing dynamic memory reference behavior with adaptive cache topology
1998
ACM SIGOPS Operating Systems Review
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced. ...
Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory. ...
doi:10.1145/384265.291053
fatcat:q553xtnbmjf3fhh6b4mcrhxiee
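The three records above are again one paper in its conference and journal-reprint forms. Their snippets describe a lookup path in which a line missing from its home set can still be found through an OUT directory that supplies the set-ID of its actual location, with the data read in the following cycle. A minimal sketch under assumed structure sizes (the SHT and replacement machinery are omitted):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define SETS 256
    #define OUT_ENTRIES 64

    typedef struct { uint64_t tag; bool valid; } Line;
    static Line cache[SETS];              /* direct-mapped purely for brevity */

    /* The OUT directory remembers where displaced lines actually live. */
    typedef struct { uint64_t tag; uint16_t set_id; bool valid; } OutEntry;
    static OutEntry out_dir[OUT_ENTRIES];

    static bool lookup(uint64_t addr, uint16_t *set_out)
    {
        uint64_t tag  = addr >> 6;                /* 64-byte lines assumed */
        uint16_t home = (uint16_t)(tag % SETS);
        if (cache[home].valid && cache[home].tag == tag) {
            *set_out = home;                      /* hit in the home set */
            return true;
        }
        for (int i = 0; i < OUT_ENTRIES; i++)     /* out-of-position hit: */
            if (out_dir[i].valid && out_dir[i].tag == tag) {
                *set_out = out_dir[i].set_id;     /* data is read next cycle */
                return true;                      /* from this set-ID */
            }
        return false;                             /* genuine miss */
    }

    int main(void)
    {
        uint64_t addr = 0x12340;
        out_dir[0] = (OutEntry){ .tag = addr >> 6, .set_id = 7, .valid = true };
        cache[7]   = (Line){ .tag = addr >> 6, .valid = true };
        uint16_t set;
        if (lookup(addr, &set)) printf("hit in set %u\n", set);
        return 0;
    }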
Temporal instruction fetch streaming
2008
2008 41st IEEE/ACM International Symposium on Microarchitecture
Rather than explore a program's control flow graph, TIFS predicts future instruction-cache misses directly, through recording and replaying recurring L1 instruction miss sequences. ...
Then, we describe a practical mechanism to record these recurring sequences in the L2 cache and leverage them for instruction-cache prefetching. ...
Like similar studies of repetitive streams in L1 data accesses [6], off-chip data misses [35, 36], and program paths [16], we use the SEQUITUR [10] hierarchical data compression algorithm to identify ...
doi:10.1109/micro.2008.4771774
dblp:conf/micro/FerdmanWAFM08
fatcat:ffdk7ljp6jbi5hqj2qrtfhrljm
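The TIFS snippet describes recording recurring L1 instruction-miss sequences and replaying them as prefetches. A compact sketch of that record-and-replay loop follows, with the log size, index size, and lookahead depth all assumed for illustration:

    #include <stdint.h>
    #include <stdio.h>

    #define LOG_SIZE   4096
    #define INDEX_SIZE 1024
    #define LOOKAHEAD  4

    static uint64_t miss_log[LOG_SIZE];   /* circular history of miss addresses */
    static int log_tail = 0;

    /* Index: most recent log position of each miss address. */
    typedef struct { uint64_t addr; int pos; int valid; } IndexEntry;
    static IndexEntry index_[INDEX_SIZE];

    static unsigned ihash(uint64_t a) { return (unsigned)(a >> 6) % INDEX_SIZE; }

    static void on_imiss(uint64_t addr)
    {
        IndexEntry *e = &index_[ihash(addr)];
        if (e->valid && e->addr == addr)          /* seen before: replay the */
            for (int i = 1; i <= LOOKAHEAD; i++)  /* stream that followed it */
                printf("prefetch 0x%llx\n",
                       (unsigned long long)miss_log[(e->pos + i) % LOG_SIZE]);
        miss_log[log_tail] = addr;                /* record this miss */
        e->addr = addr; e->pos = log_tail; e->valid = 1;
        log_tail = (log_tail + 1) % LOG_SIZE;
    }

    int main(void)
    {
        uint64_t trace[] = { 0x100, 0x140, 0x180, 0x1C0, 0x200 };
        for (int pass = 0; pass < 2; pass++)      /* second pass replays */
            for (int i = 0; i < 5; i++)
                on_imiss(trace[i]);
        return 0;
    }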
Showing results 1–15 of 1,670