1,670 Hits in 4.9 sec

Enabling Partial-Cache Line Prefetching through Data Compression [chapter]

Youtao Zhang, Rajiv Gupta
2006 High-Performance Computing  
These prefetched values are held in vacant space created in the data cache by storing values in compressed form.  ...  In comparison to a baseline cache that does not support prefetching, on average, our cache design reduces memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution  ...  Experimental Setup: We implemented the compression-enabled partial cache line prefetching scheme using SimpleScalar 3.0 [4].  ...
doi:10.1002/0471732710.ch9 fatcat:dr4bqfk33fepbdnkezrp7tluvy
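
The mechanism this entry describes is concrete enough to sketch. Below is a toy C illustration of the core idea: when the words of a line compress well, the freed bytes can hold a prefetched fragment of the next line. The 16-bit "small value" test is a stand-in for the paper's frequent-value-style compression, and all identifiers (LINE_WORDS, compressed_size) are invented for the example.

```c
/* Toy illustration: compressed words free line space that can hold a
 * *partial* prefetch of the adjacent line.  Not the paper's encoding. */
#include <stdio.h>
#include <stdint.h>

#define LINE_WORDS 8            /* 32-byte line of 4-byte words */

/* A word "compresses" here if it fits in 16 bits (sign-extended). */
static int compressed_size(int32_t w) {
    return (w >= INT16_MIN && w <= INT16_MAX) ? 2 : 4;
}

int main(void) {
    int32_t line[LINE_WORDS] = { 0, 1, -3, 70000, 12, 0, 900, 5 };
    int used = 0;
    for (int i = 0; i < LINE_WORDS; i++)
        used += compressed_size(line[i]);

    int vacant = LINE_WORDS * 4 - used;
    printf("compressed line occupies %d/%d bytes\n", used, LINE_WORDS * 4);
    printf("%d vacant bytes -> room for %d prefetched words of next line\n",
           vacant, vacant / 4);
    return 0;
}
```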

Enabling partial cache line prefetching through data compression

Y. Zhang, R. Gupta
2003 2003 International Conference on Parallel Processing, 2003. Proceedings.  
These prefetched values are held in vacant space created in the data cache by storing values in compressed form.  ...  In comparison to a baseline cache that does not support prefetching, on average, our cache design reduces memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution  ...  Experimental Setup: We implemented the compression-enabled partial cache line prefetching scheme using SimpleScalar 3.0 [4].  ...
doi:10.1109/icpp.2003.1240590 dblp:conf/icpp/ZhangG03 fatcat:2cpjkmz46radnha56rh3ebh26y

Interactions Between Compression and Prefetching in Chip Multiprocessors

Alaa R. Alameldeen, David A. Wood
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Second, we propose a simple adaptive prefetching mechanism that uses cache compression's extra tags to detect useless and harmful prefetches.  ...  On an 8-processor CMP with no prefetching, compression improves performance by up to 18% for commercial workloads.  ...  We also thank Brad Beckmann for his help with the prefetching implementation.  ...
doi:10.1109/hpca.2007.346200 dblp:conf/hpca/AlameldeenW07 fatcat:ekcsuohp4ffhpppztvrptaua3a
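
The adaptive mechanism in the snippet can be approximated with a saturating counter. A minimal sketch, assuming an explicit was_prefetched/was_used flag per evicted block in place of the compression tags the paper actually uses:

```c
/* Throttle the prefetcher based on whether prefetched blocks are used
 * before eviction.  The detection mechanism here is an explicit flag,
 * not the paper's compression-tag trick. */
#include <stdio.h>
#include <stdbool.h>

static int aggressiveness = 4;          /* 0 = off ... 7 = max degree */

/* Called when a block leaves the cache. */
static void on_evict(bool was_prefetched, bool was_used) {
    if (!was_prefetched) return;
    if (was_used) {
        if (aggressiveness < 7) aggressiveness++;   /* prefetch helped  */
    } else {
        if (aggressiveness > 0) aggressiveness--;   /* useless prefetch */
    }
}

int main(void) {
    /* A run of useless prefetches drives the degree toward zero. */
    for (int i = 0; i < 6; i++) {
        on_evict(true, false);
        printf("after useless prefetch %d: degree = %d\n", i + 1, aggressiveness);
    }
    on_evict(true, true);
    printf("after one useful prefetch: degree = %d\n", aggressiveness);
    return 0;
}
```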

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps [article]

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, Onur Mutlu
2016 arXiv   pre-print
CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory.  ...  We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck.  ...  This research was partially supported by NSF (grants 0953246, 1065112, 1205618, 1212962, 1213052, 1302225) [115].  ...
arXiv:1602.01348v1 fatcat:qbzuknzcyncrticap55x4i5dhi

Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures [chapter]

Shai Rubin, David Bernstein, Michael Rodeh
1999 Lecture Notes in Computer Science  
Virtual cache lines increase the spatial locality of the given data structure, resulting in better locality of reference.  ...  Once such a data structure goes through a sequence of updates (inserts and deletes), it may get scattered all over memory, yielding poor spatial locality, which in turn introduces many cache misses.  ...  Prefetching offers only a partial remedy; it is the data layout itself that should be optimized for better spatial locality.  ...
doi:10.1007/978-3-540-49051-7_18 fatcat:qgcbracqqvhkva2fdkz7pvl274
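
The locality problem this snippet describes invites a simple software analogue: allocate list nodes from contiguous line-sized chunks so that list-adjacent nodes tend to share a cache line. This is a generic arena-allocation sketch, not the paper's virtual-cache-line mechanism; identifiers are invented, and alignment to real line boundaries is omitted for brevity.

```c
/* Cluster consecutively inserted list nodes into line-sized chunks so
 * that nodes adjacent in the list tend to share a cache line. */
#include <stdio.h>
#include <stdlib.h>

#define LINE_BYTES 64

typedef struct Node { int key; struct Node *next; } Node;

static char *arena = NULL;
static size_t arena_off = LINE_BYTES;   /* force a fresh chunk first */

static Node *node_alloc(void) {
    if (arena_off + sizeof(Node) > LINE_BYTES) {      /* chunk exhausted  */
        arena = malloc(LINE_BYTES);                   /* one line's worth */
        arena_off = 0;
    }
    Node *n = (Node *)(arena + arena_off);
    arena_off += sizeof(Node);
    return n;                   /* chunks are never freed in this toy */
}

int main(void) {
    Node *head = NULL;
    for (int i = 0; i < 8; i++) {       /* consecutive inserts cluster */
        Node *n = node_alloc();
        n->key = i; n->next = head; head = n;
    }
    for (Node *p = head; p; p = p->next)
        printf("key %d at %p\n", p->key, (void *)p);
    return 0;
}
```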

In-cache query co-processing on coupled CPU-GPU architectures

Jiong He, Shuhao Zhang, Bingsheng He
2014 Proceedings of the VLDB Endowment  
In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures.  ...  Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance.  ...  In addition, the CPU and the GPU share the L2 cache in this study, which enables data reuse between them.  ...
doi:10.14778/2735496.2735497 fatcat:2f53sczrrnh3rbs4xwnsm6f4re
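
The "CPU-assisted prefetching" idea has a familiar software form: a scan loop that issues prefetches a fixed distance ahead so data arrives in the shared cache before it is consumed. A minimal sketch using the GCC/Clang __builtin_prefetch builtin; the helper-thread structure and GPU side of the paper are not reproduced.

```c
/* Software prefetching during a column scan: fetch DIST elements ahead
 * so the data is resident by the time the "query" work reaches it.
 * Requires GCC or Clang for __builtin_prefetch. */
#include <stdio.h>

#define N    1024
#define DIST 16                 /* prefetch distance in elements */

int main(void) {
    static int col[N];
    long sum = 0;
    for (int i = 0; i < N; i++) col[i] = i;

    for (int i = 0; i < N; i++) {
        if (i + DIST < N)
            __builtin_prefetch(&col[i + DIST], 0 /*read*/, 1 /*low reuse*/);
        sum += col[i];          /* query work on the current element */
    }
    printf("sum = %ld\n", sum);
    return 0;
}
```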

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 ACM SIGOPS Operating Systems Review  
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access pointer-load cache lines, may be partially hidden.  ...
doi:10.1145/384265.291034 fatcat:4lkgda5hpbck3hbt6npl65jsoi
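
The "prefetch engine that speculatively traverses a representation" can be sketched in software: once the pointer-field offset of a node type has been learned, an engine can walk a few links ahead of the demand stream and issue prefetch requests. The structures and names below are illustrative, not the paper's hardware.

```c
/* Once the hardware learns that a load produces the address of the next
 * node (a pointer-load dependence), a small engine can run ahead of the
 * program and prefetch upcoming nodes. */
#include <stdio.h>
#include <stddef.h>

typedef struct Node { int data; struct Node *next; } Node;

/* Learned dependence: the next-node pointer lives at this offset. */
static const size_t next_offset = offsetof(Node, next);

/* Stand-in for a hardware prefetch request. */
static void prefetch(const void *addr) {
    printf("prefetch %p\n", addr);
}

/* Speculatively traverse up to `depth` links ahead of the demand stream. */
static void prefetch_engine(const Node *n, int depth) {
    for (int i = 0; i < depth && n != NULL; i++) {
        n = *(Node *const *)((const char *)n + next_offset);
        if (n) prefetch(n);
    }
}

int main(void) {
    Node c = {3, NULL}, b = {2, &c}, a = {1, &b};
    /* Demand access touches `a`; the engine runs ahead to b and c. */
    prefetch_engine(&a, 2);
    return 0;
}
```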

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access pointer-load cache lines, may be partially hidden.  ...
doi:10.1145/291069.291034 dblp:conf/asplos/RothMS98 fatcat:jul62swkkjbr5f24jf5qjciwoa

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 SIGPLAN notices  
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access pointer-load cache lines, may be partially hidden.  ...
doi:10.1145/291006.291034 fatcat:fblqhoxlrvafpjqgjz6gcq3csi

C-Pack: A High-Performance Microprocessor Cache Compression Algorithm

Xi Chen, Lei Yang, Robert P. Dick, Li Shang, Haris Lekatsas
2010 IEEE Transactions on Very Large Scale Integration (vlsi) Systems  
In this work, we present a lossless compression algorithm that has been designed for fast on-line data compression, and cache compression in particular.  ...  The algorithm has a number of novel features tailored for this application, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words while using  ...  Alameldeen at Intel Corporation for his help understanding his cache compression research results.  ... 
doi:10.1109/tvlsi.2009.2020989 fatcat:vl2hhmdxfzealpkjnd6z2a6z7e
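
A simplified flavor of the pattern-based word compression this entry describes: classify each 32-bit word as all-zero, a sign-extended byte, or a match against a small dictionary of recent words. The code lengths and dictionary policy below are invented for illustration and differ from the real C-Pack algorithm.

```c
/* Pattern-match each word and estimate its encoded size in bits.
 * Simplified; not C-Pack's actual codes or dictionary management. */
#include <stdio.h>
#include <stdint.h>

#define DICT_SIZE 4

static uint32_t dict[DICT_SIZE];
static int dict_pos = 0;

/* Return an estimated encoded size in bits for one word. */
static int encode_word(uint32_t w) {
    if (w == 0) return 2;                         /* zero pattern      */
    if ((int32_t)w >= -128 && (int32_t)w < 128)
        return 2 + 8;                             /* sign-ext. byte    */
    for (int i = 0; i < DICT_SIZE; i++)
        if (dict[i] == w) return 2 + 2;           /* full dict match   */
    dict[dict_pos] = w;                           /* FIFO insert       */
    dict_pos = (dict_pos + 1) % DICT_SIZE;
    return 2 + 32;                                /* uncompressed      */
}

int main(void) {
    uint32_t line[8] = {0, 0, 42, 0xDEADBEEF, 0xDEADBEEF, 7, 0, 0xDEADBEEF};
    int bits = 0;
    for (int i = 0; i < 8; i++) bits += encode_word(line[i]);
    printf("8 words: %d bits compressed vs %d uncompressed\n", bits, 8 * 32);
    return 0;
}
```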

Understanding Cache Compression

Daniel Rodrigues Carvalho, André Seznec
2021 ACM Transactions on Architecture and Code Optimization (TACO)  
This study sheds light on the challenges of adopting compression in cache design, from the shrinking of the data to its physical placement.  ...  Hardware cache compression derives from software-compression research; yet its implementation is not a straightforward translation, since it must abide by multiple restrictions to comply with area, power  ...  For instance, Prefetched Blocks Compaction (PBC) [39] explores the similarity of prefetched lines to co-allocate them through inter-line compression and maximize use of the effective cache capacity.  ...
doi:10.1145/3457207 fatcat:2jsbv7d3qfd53kpyiv44cpcfne
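
The PBC-style co-allocation the snippet mentions reduces, at its simplest, to a size test: two neighboring prefetched lines can share one physical data entry when their compressed sizes fit together. A trivial sketch, with the compressed sizes supplied as placeholders for a real compressor:

```c
/* Co-allocation check: two compressed lines share one data entry (each
 * keeping its own tag) if their sizes fit in one physical line. */
#include <stdio.h>
#include <stdbool.h>

#define LINE_BYTES 64

static bool can_coallocate(int size_a, int size_b) {
    return size_a + size_b <= LINE_BYTES;
}

int main(void) {
    int a = 24, b = 30;     /* compressed sizes of two prefetched lines */
    printf("lines of %d and %d bytes %s share one %d-byte entry\n",
           a, b, can_coallocate(a, b) ? "can" : "cannot", LINE_BYTES);
    return 0;
}
```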

Capturing dynamic memory reference behavior with adaptive cache topology

Jih-Kwon Peir, Yongjoon Lee, Windsor W. Hsu
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced.  ...  Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory.  ... 
doi:10.1145/291069.291053 dblp:conf/asplos/PeirLH98 fatcat:hmfosgnqwzap5fg7oihs4xniee
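
The out-of-position hit path described in the snippet can be sketched directly: on a miss in the home set, consult a small OUT directory for the set where the line actually resides, at the cost of one extra cycle. Table sizes, the SHT, and replacement are simplified away; all identifiers are invented.

```c
/* Toy out-of-position lookup: home set first, then the OUT directory. */
#include <stdio.h>
#include <stdbool.h>

#define SETS 4
#define OUT_ENTRIES 4

typedef struct { unsigned tag; int set; bool valid; } OutEntry;

static unsigned cache_tag[SETS];        /* one line per set in this toy */
static OutEntry out_dir[OUT_ENTRIES];

/* Returns true on hit; *extra reports the additional cycle an
 * out-of-position hit costs. */
static bool lookup(unsigned addr, int *extra) {
    unsigned set = addr % SETS, tag = addr / SETS;
    *extra = 0;
    if (cache_tag[set] == tag) return true;            /* home-set hit */
    for (int i = 0; i < OUT_ENTRIES; i++)
        if (out_dir[i].valid && out_dir[i].tag == tag) {
            *extra = 1;                                /* next cycle   */
            return cache_tag[out_dir[i].set] == tag;
        }
    return false;
}

int main(void) {
    unsigned addr = 21;            /* set 1, tag 5 in this toy mapping    */
    cache_tag[3] = 5;              /* ...but the line was placed in set 3 */
    out_dir[0] = (OutEntry){5, 3, true};

    int extra;
    bool hit = lookup(addr, &extra);
    printf("hit=%d, extra cycles=%d\n", hit, extra);
    return 0;
}
```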

Capturing dynamic memory reference behavior with adaptive cache topology

Jih-Kwon Peir, Yongjoon Lee, Windsor W. Hsu
1998 SIGPLAN notices  
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced.  ...  Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory.  ... 
doi:10.1145/291006.291053 fatcat:5mmburn4qfhuthucow5j22jfxi

Capturing dynamic memory reference behavior with adaptive cache topology

Jih-Kwon Peir, Yongjoon Lee, Windsor W. Hsu
1998 ACM SIGOPS Operating Systems Review  
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced.  ...  Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory.  ... 
doi:10.1145/384265.291053 fatcat:q553xtnbmjf3fhh6b4mcrhxiee

Temporal instruction fetch streaming

Michael Ferdman, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2008 2008 41st IEEE/ACM International Symposium on Microarchitecture  
Rather than explore a program's control flow graph, TIFS predicts future instruction-cache misses directly, through recording and replaying recurring L1 instruction miss sequences.  ...  Then, we describe a practical mechanism to record these recurring sequences in the L2 cache and leverage them for instruction-cache prefetching.  ...  Like similar studies of repetitive streams in L1 data accesses [6] , off-chip data misses [35, 36] , and program paths [16] , we use the SEQUITUR [10] hierarchical data compression algorithm to identify  ... 
doi:10.1109/micro.2008.4771774 dblp:conf/micro/FerdmanWAFM08 fatcat:ffdk7ljp6jbi5hqj2qrtfhrljm
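
The record-and-replay mechanism this entry describes has a compact software analogue: log the sequence of instruction-miss addresses, and when a later miss matches a logged address, stream out the addresses that followed it last time. The real design keeps these logs in the L2 cache; this toy keeps them in arrays, and all names are invented.

```c
/* Record L1-I miss sequences; on a recurring miss, replay the recorded
 * successors as prefetches. */
#include <stdio.h>

#define LOG_MAX 64
#define STREAM  3              /* how far ahead to replay */

static unsigned log_addr[LOG_MAX];
static int log_len = 0;

static void record_miss(unsigned addr) {
    if (log_len < LOG_MAX) log_addr[log_len++] = addr;
}

/* On a new miss, look for the same address in the log and stream ahead. */
static void replay(unsigned addr) {
    for (int i = 0; i < log_len; i++) {
        if (log_addr[i] == addr) {
            for (int j = 1; j <= STREAM && i + j < log_len; j++)
                printf("prefetch 0x%x\n", log_addr[i + j]);
            return;
        }
    }
}

int main(void) {
    unsigned first_run[] = {0x400, 0x7c0, 0x440, 0x980, 0x4c0};
    for (int i = 0; i < 5; i++) record_miss(first_run[i]);
    /* The loop recurs: a miss on 0x400 replays the recorded successors. */
    replay(0x400);
    return 0;
}
```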
Showing results 1 — 15 out of 1,670 results