1,670 Hits in 4.9 sec

Enabling Partial-Cache Line Prefetching through Data Compression [chapter]

Youtao Zhang, Rajiv Gupta
2006 High-Performance Computing  
These prefetched values are held in vacant space created in the data cache by storing values in compressed form.  ...  In comparison to a baseline cache that does not support prefetching, on average, our cache design reduces memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution  ...  Experimental Setup: We implemented the compression-enabled partial cache line prefetching scheme using SimpleScalar 3.0 [4].  ...
doi:10.1002/0471732710.ch9 fatcat:dr4bqfk33fepbdnkezrp7tluvy
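
The mechanism this entry describes is concrete enough to sketch. Below is a toy C illustration of the core idea: when the words of a line compress well, the freed bytes can hold a prefetched fragment of the next line. The 16-bit "small value" test is a stand-in for the paper's frequent-value-style compression, and all identifiers (LINE_WORDS, compressed_size) are invented for the example.

```c
/* Toy illustration: compressed words free line space that can hold a
 * *partial* prefetch of the adjacent line.  Not the paper's encoding. */
#include <stdio.h>
#include <stdint.h>

#define LINE_WORDS 8            /* 32-byte line of 4-byte words */

/* A word "compresses" here if it fits in 16 bits (sign-extended). */
static int compressed_size(int32_t w) {
    return (w >= INT16_MIN && w <= INT16_MAX) ? 2 : 4;
}

int main(void) {
    int32_t line[LINE_WORDS] = { 0, 1, -3, 70000, 12, 0, 900, 5 };
    int used = 0;
    for (int i = 0; i < LINE_WORDS; i++)
        used += compressed_size(line[i]);

    int vacant = LINE_WORDS * 4 - used;
    printf("compressed line occupies %d/%d bytes\n", used, LINE_WORDS * 4);
    printf("%d vacant bytes -> room for %d prefetched words of next line\n",
           vacant, vacant / 4);
    return 0;
}
```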

Enabling partial cache line prefetching through data compression

Y. Zhang, R. Gupta
2003 2003 International Conference on Parallel Processing, 2003. Proceedings.  
These prefetched values are held in vacant space created in the data cache by storing values in compressed form.  ...  In comparison to a baseline cache that does not support prefetching, on average, our cache design reduces memory traffic by 10%, reduces the data cache miss rate by 14%, and speeds up program execution  ...  Experimental Setup: We implemented the compression-enabled partial cache line prefetching scheme using SimpleScalar 3.0 [4].  ...
doi:10.1109/icpp.2003.1240590 dblp:conf/icpp/ZhangG03 fatcat:2cpjkmz46radnha56rh3ebh26y

Interactions Between Compression and Prefetching in Chip Multiprocessors

Alaa R. Alameldeen, David A. Wood
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Second, we propose a simple adaptive prefetching mechanism that uses cache compression's extra tags to detect useless and harmful prefetches.  ...  On an 8-processor CMP with no prefetching, compression improves performance by up to 18% for commercial workloads.  ...  We also thank Brad Beckmann for his help with the prefetching implementation.  ...
doi:10.1109/hpca.2007.346200 dblp:conf/hpca/AlameldeenW07 fatcat:ekcsuohp4ffhpppztvrptaua3a
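
The adaptive mechanism in the snippet can be approximated with a saturating counter. A minimal sketch, assuming an explicit was_prefetched/was_used flag per evicted block in place of the compression tags the paper actually uses:

```c
/* Throttle the prefetcher based on whether prefetched blocks are used
 * before eviction.  The detection mechanism here is an explicit flag,
 * not the paper's compression-tag trick. */
#include <stdio.h>
#include <stdbool.h>

static int aggressiveness = 4;          /* 0 = off ... 7 = max degree */

/* Called when a block leaves the cache. */
static void on_evict(bool was_prefetched, bool was_used) {
    if (!was_prefetched) return;
    if (was_used) {
        if (aggressiveness < 7) aggressiveness++;   /* prefetch helped  */
    } else {
        if (aggressiveness > 0) aggressiveness--;   /* useless prefetch */
    }
}

int main(void) {
    /* A run of useless prefetches drives the degree toward zero. */
    for (int i = 0; i < 6; i++) {
        on_evict(true, false);
        printf("after useless prefetch %d: degree = %d\n", i + 1, aggressiveness);
    }
    on_evict(true, true);
    printf("after one useful prefetch: degree = %d\n", aggressiveness);
    return 0;
}
```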

A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps [article]

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, Onur Mutlu
2016 arXiv   pre-print
CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory.  ...  We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck.  ...  This research was partially supported by NSF (grants 0953246, 1065112, 1205618, 1212962, 1213052, 1302225) [115].  ...
arXiv:1602.01348v1 fatcat:qbzuknzcyncrticap55x4i5dhi

Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures [chapter]

Shai Rubin, David Bernstein, Michael Rodeh
1999 Lecture Notes in Computer Science  
Virtual cache lines increase the spatial locality of the given data structure, resulting in better locality of reference.  ...  Once such a data structure goes through a sequence of updates (inserts and deletes), it may get scattered all over memory, yielding poor spatial locality, which in turn introduces many cache misses.  ...  Prefetching offers only a partial remedy; it is the data layout itself that should be optimized for better spatial locality.  ...
doi:10.1007/978-3-540-49051-7_18 fatcat:qgcbracqqvhkva2fdkz7pvl274
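
The locality problem this snippet describes invites a simple software analogue: allocate list nodes from contiguous line-sized chunks so that list-adjacent nodes tend to share a cache line. This is a generic arena-allocation sketch, not the paper's virtual-cache-line mechanism; identifiers are invented, and alignment to real line boundaries is omitted for brevity.

```c
/* Cluster consecutively inserted list nodes into line-sized chunks so
 * that nodes adjacent in the list tend to share a cache line. */
#include <stdio.h>
#include <stdlib.h>

#define LINE_BYTES 64

typedef struct Node { int key; struct Node *next; } Node;

static char *arena = NULL;
static size_t arena_off = LINE_BYTES;   /* force a fresh chunk first */

static Node *node_alloc(void) {
    if (arena_off + sizeof(Node) > LINE_BYTES) {      /* chunk exhausted  */
        arena = malloc(LINE_BYTES);                   /* one line's worth */
        arena_off = 0;
    }
    Node *n = (Node *)(arena + arena_off);
    arena_off += sizeof(Node);
    return n;                   /* chunks are never freed in this toy */
}

int main(void) {
    Node *head = NULL;
    for (int i = 0; i < 8; i++) {       /* consecutive inserts cluster */
        Node *n = node_alloc();
        n->key = i; n->next = head; head = n;
    }
    for (Node *p = head; p; p = p->next)
        printf("key %d at %p\n", p->key, (void *)p);
    return 0;
}
```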

In-cache query co-processing on coupled CPU-GPU architectures

Jiong He, Shuhao Zhang, Bingsheng He
2014 Proceedings of the VLDB Endowment  
In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures.  ...  Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance.  ...  In addition, the CPU and the GPU share the L2 cache in this study, which enables data reuse between them.  ...
doi:10.14778/2735496.2735497 fatcat:2f53sczrrnh3rbs4xwnsm6f4re
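
The "CPU-assisted prefetching" idea has a familiar software form: a scan loop that issues prefetches a fixed distance ahead so data arrives in the shared cache before it is consumed. A minimal sketch using the GCC/Clang __builtin_prefetch builtin; the helper-thread structure and GPU side of the paper are not reproduced.

```c
/* Software prefetching during a column scan: fetch DIST elements ahead
 * so the data is resident by the time the "query" work reaches it.
 * Requires GCC or Clang for __builtin_prefetch. */
#include <stdio.h>

#define N    1024
#define DIST 16                 /* prefetch distance in elements */

int main(void) {
    static int col[N];
    long sum = 0;
    for (int i = 0; i < N; i++) col[i] = i;

    for (int i = 0; i < N; i++) {
        if (i + DIST < N)
            __builtin_prefetch(&col[i + DIST], 0 /*read*/, 1 /*low reuse*/);
        sum += col[i];          /* query work on the current element */
    }
    printf("sum = %ld\n", sum);
    return 0;
}
```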

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 ACM SIGOPS Operating Systems Review  
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access pointer-load cache lines, may be partially hidden.  ...
doi:10.1145/384265.291034 fatcat:4lkgda5hpbck3hbt6npl65jsoi
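
The "prefetch engine that speculatively traverses a representation" can be sketched in software: once the pointer-field offset of a node type has been learned, an engine can walk a few links ahead of the demand stream and issue prefetch requests. The structures and names below are illustrative, not the paper's hardware.

```c
/* Once the hardware learns that a load produces the address of the next
 * node (a pointer-load dependence), a small engine can run ahead of the
 * program and prefetch upcoming nodes. */
#include <stdio.h>
#include <stddef.h>

typedef struct Node { int data; struct Node *next; } Node;

/* Learned dependence: the next-node pointer lives at this offset. */
static const size_t next_offset = offsetof(Node, next);

/* Stand-in for a hardware prefetch request. */
static void prefetch(const void *addr) {
    printf("prefetch %p\n", addr);
}

/* Speculatively traverse up to `depth` links ahead of the demand stream. */
static void prefetch_engine(const Node *n, int depth) {
    for (int i = 0; i < depth && n != NULL; i++) {
        n = *(Node *const *)((const char *)n + next_offset);
        if (n) prefetch(n);
    }
}

int main(void) {
    Node c = {3, NULL}, b = {2, &c}, a = {1, &b};
    /* Demand access touches `a`; the engine runs ahead to b and c. */
    prefetch_engine(&a, 2);
    return 0;
}
```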

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access pointer-load cache lines, may be partially hidden.  ...
doi:10.1145/291069.291034 dblp:conf/asplos/RothMS98 fatcat:jul62swkkjbr5f24jf5qjciwoa

Dependence based prefetching for linked data structures

Amir Roth, Andreas Moshovos, Gurindar S. Sohi
1998 SIGPLAN notices  
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy.  ...  To achieve a prefetching effect, a small prefetch engine speculatively traverses this representation ahead of the executing program.  ...  Here, data cache miss rates do not tell the whole story, since the latency of many pointer loads, as well as other loads that access pointer-load cache lines, may be partially hidden.  ...
doi:10.1145/291006.291034 fatcat:fblqhoxlrvafpjqgjz6gcq3csi

C-Pack: A High-Performance Microprocessor Cache Compression Algorithm

Xi Chen, Lei Yang, Robert P. Dick, Li Shang, Haris Lekatsas
2010 IEEE Transactions on Very Large Scale Integration (vlsi) Systems  
In this work, we present a lossless compression algorithm that has been designed for fast on-line data compression, and cache compression in particular.  ...  The algorithm has a number of novel features tailored for this application, including combining pairs of compressed lines into one cache line and allowing parallel compression of multiple words while using  ...  Alameldeen at Intel Corporation for his help understanding his cache compression research results.  ... 
doi:10.1109/tvlsi.2009.2020989 fatcat:vl2hhmdxfzealpkjnd6z2a6z7e
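
A simplified flavor of the pattern-based word compression this entry describes: classify each 32-bit word as all-zero, a sign-extended byte, or a match against a small dictionary of recent words. The code lengths and dictionary policy below are invented for illustration and differ from the real C-Pack algorithm.

```c
/* Pattern-match each word and estimate its encoded size in bits.
 * Simplified; not C-Pack's actual codes or dictionary management. */
#include <stdio.h>
#include <stdint.h>

#define DICT_SIZE 4

static uint32_t dict[DICT_SIZE];
static int dict_pos = 0;

/* Return an estimated encoded size in bits for one word. */
static int encode_word(uint32_t w) {
    if (w == 0) return 2;                         /* zero pattern      */
    if ((int32_t)w >= -128 && (int32_t)w < 128)
        return 2 + 8;                             /* sign-ext. byte    */
    for (int i = 0; i < DICT_SIZE; i++)
        if (dict[i] == w) return 2 + 2;           /* full dict match   */
    dict[dict_pos] = w;                           /* FIFO insert       */
    dict_pos = (dict_pos + 1) % DICT_SIZE;
    return 2 + 32;                                /* uncompressed      */
}

int main(void) {
    uint32_t line[8] = {0, 0, 42, 0xDEADBEEF, 0xDEADBEEF, 7, 0, 0xDEADBEEF};
    int bits = 0;
    for (int i = 0; i < 8; i++) bits += encode_word(line[i]);
    printf("8 words: %d bits compressed vs %d uncompressed\n", bits, 8 * 32);
    return 0;
}
```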

Understanding Cache Compression

Daniel Rodrigues Carvalho, André Seznec
2021 ACM Transactions on Architecture and Code Optimization (TACO)  
This study sheds light on the challenges of adopting compression in cache design, from the shrinking of the data to its physical placement.  ...  Hardware cache compression derives from software-compression research; yet its implementation is not a straightforward translation, since it must abide by multiple restrictions to comply with area, power  ...  For instance, Prefetched Blocks Compaction (PBC) [39] explores the similarity of prefetched lines to co-allocate them through inter-line compression and maximize use of the effective cache capacity.  ...
doi:10.1145/3457207 fatcat:2jsbv7d3qfd53kpyiv44cpcfne
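
The PBC-style co-allocation the snippet mentions reduces, at its simplest, to a size test: two neighboring prefetched lines can share one physical data entry when their compressed sizes fit together. A trivial sketch, with the compressed sizes supplied as placeholders for a real compressor:

```c
/* Co-allocation check: two compressed lines share one data entry (each
 * keeping its own tag) if their sizes fit in one physical line. */
#include <stdio.h>
#include <stdbool.h>

#define LINE_BYTES 64

static bool can_coallocate(int size_a, int size_b) {
    return size_a + size_b <= LINE_BYTES;
}

int main(void) {
    int a = 24, b = 30;     /* compressed sizes of two prefetched lines */
    printf("lines of %d and %d bytes %s share one %d-byte entry\n",
           a, b, can_coallocate(a, b) ? "can" : "cannot", LINE_BYTES);
    return 0;
}
```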

Capturing dynamic memory reference behavior with adaptive cache topology

Jih-Kwon Peir, Yongjoon Lee, Windsor W. Hsu
1998 Proceedings of the eighth international conference on Architectural support for programming languages and operating systems - ASPLOS-VIII  
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced.  ...  Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory.  ... 
doi:10.1145/291069.291053 dblp:conf/asplos/PeirLH98 fatcat:hmfosgnqwzap5fg7oihs4xniee
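
The out-of-position hit path described in the snippet can be sketched directly: on a miss in the home set, consult a small OUT directory for the set where the line actually resides, at the cost of one extra cycle. Table sizes, the SHT, and replacement are simplified away; all identifiers are invented.

```c
/* Toy out-of-position lookup: home set first, then the OUT directory. */
#include <stdio.h>
#include <stdbool.h>

#define SETS 4
#define OUT_ENTRIES 4

typedef struct { unsigned tag; int set; bool valid; } OutEntry;

static unsigned cache_tag[SETS];        /* one line per set in this toy */
static OutEntry out_dir[OUT_ENTRIES];

/* Returns true on hit; *extra reports the additional cycle an
 * out-of-position hit costs. */
static bool lookup(unsigned addr, int *extra) {
    unsigned set = addr % SETS, tag = addr / SETS;
    *extra = 0;
    if (cache_tag[set] == tag) return true;            /* home-set hit */
    for (int i = 0; i < OUT_ENTRIES; i++)
        if (out_dir[i].valid && out_dir[i].tag == tag) {
            *extra = 1;                                /* next cycle   */
            return cache_tag[out_dir[i].set] == tag;
        }
    return false;
}

int main(void) {
    unsigned addr = 21;            /* set 1, tag 5 in this toy mapping    */
    cache_tag[3] = 5;              /* ...but the line was placed in set 3 */
    out_dir[0] = (OutEntry){5, 3, true};

    int extra;
    bool hit = lookup(addr, &extra);
    printf("hit=%d, extra cycles=%d\n", hit, extra);
    return 0;
}
```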

Capturing dynamic memory reference behavior with adaptive cache topology

Jih-Kwon Peir, Yongjoon Lee, Windsor W. Hsu
1998 SIGPLAN notices  
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced.  ...  Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory.  ... 
doi:10.1145/291006.291053 fatcat:5mmburn4qfhuthucow5j22jfxi

Capturing dynamic memory reference behavior with adaptive cache topology

Jih-Kwon Peir, Yongjoon Lee, Windsor W. Hsu
1998 ACM SIGOPS Operating Systems Review  
In this case, a larger cache and a bigger SHT/OUT directory will enable the prefetched lines to be kept longer so that they are more likely to be referenced.  ...  Hit in Out-Of-Position Lines. If the line is found through the OUT directory, the data is accessed in the next cycle using the set-ID fetched from the OUT directory.  ... 
doi:10.1145/384265.291053 fatcat:q553xtnbmjf3fhh6b4mcrhxiee

Temporal instruction fetch streaming

Michael Ferdman, Thomas F. Wenisch, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2008 2008 41st IEEE/ACM International Symposium on Microarchitecture  
Rather than explore a program's control flow graph, TIFS predicts future instruction-cache misses directly, through recording and replaying recurring L1 instruction miss sequences.  ...  Then, we describe a practical mechanism to record these recurring sequences in the L2 cache and leverage them for instruction-cache prefetching.  ...  Like similar studies of repetitive streams in L1 data accesses [6] , off-chip data misses [35, 36] , and program paths [16] , we use the SEQUITUR [10] hierarchical data compression algorithm to identify  ... 
doi:10.1109/micro.2008.4771774 dblp:conf/micro/FerdmanWAFM08 fatcat:ffdk7ljp6jbi5hqj2qrtfhrljm
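
The record-and-replay mechanism this entry describes has a compact software analogue: log the sequence of instruction-miss addresses, and when a later miss matches a logged address, stream out the addresses that followed it last time. The real design keeps these logs in the L2 cache; this toy keeps them in arrays, and all names are invented.

```c
/* Record L1-I miss sequences; on a recurring miss, replay the recorded
 * successors as prefetches. */
#include <stdio.h>

#define LOG_MAX 64
#define STREAM  3              /* how far ahead to replay */

static unsigned log_addr[LOG_MAX];
static int log_len = 0;

static void record_miss(unsigned addr) {
    if (log_len < LOG_MAX) log_addr[log_len++] = addr;
}

/* On a new miss, look for the same address in the log and stream ahead. */
static void replay(unsigned addr) {
    for (int i = 0; i < log_len; i++) {
        if (log_addr[i] == addr) {
            for (int j = 1; j <= STREAM && i + j < log_len; j++)
                printf("prefetch 0x%x\n", log_addr[i + j]);
            return;
        }
    }
}

int main(void) {
    unsigned first_run[] = {0x400, 0x7c0, 0x440, 0x980, 0x4c0};
    for (int i = 0; i < 5; i++) record_miss(first_run[i]);
    /* The loop recurs: a miss on 0x400 replays the recorded successors. */
    replay(0x400);
    return 0;
}
```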
Showing results 1 — 15 out of 1,670 results