Restore truncation for performance improvement in future DRAM systems

Xianwei Zhang, Youtao Zhang, Bruce R. Childers, Jun Yang
2016 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)  
In this paper, we propose restore truncation (RT), a low-cost restore strategy to improve performance of DRAM modules that adopt relaxed restore timing.  ...  Future DRAM chips are likely to suffer from significant variations and degraded timings, such as taking much more time to restore cell data after read and write access.  ...  ACKNOWLEDGMENTS We thank the anonymous referees for their valuable comments and suggestions.  ... 
doi:10.1109/hpca.2016.7446093 dblp:conf/hpca/ZhangZCY16 fatcat:dckypmnbl5cwrnomkxpzuisdei

FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration

Haitao Du, Yuhan Qin, Song Chen, Yi Kang
2024 ACM Transactions on Architecture and Code Optimization (TACO)  
Our evaluation shows that FASA-DRAM improves the average performance by 19.9% and reduces average DRAM energy consumption by 18.1% over DDR4 DRAM for four-core workloads, with less than 3.4% extra area  ...  DRAM memory is a performance bottleneck for many applications, due to its high access latency.  ...  We show that it significantly improves the performance and energy efficiency of a system with DDR4 DRAM and outperforms state-of-the-art in-DRAM caching mechanisms.  ... 
doi:10.1145/3649135 fatcat:kadrxrhfnnfmzbujd27vc43iqu

RAIDR: Retention-aware intelligent DRAM refresh

Jamie Liu, Ben Jaiyen, Richard Veras, Onur Mutlu
2012 2012 39th Annual International Symposium on Computer Architecture (ISCA)  
In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at  ...  Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer.  ...  Acknowledgments We thank the anonymous reviewers and members of the SAFARI research group for their feedback.  ... 
doi:10.1109/isca.2012.6237001 dblp:conf/isca/LiuJVM12 fatcat:gizsbubpona57kuuksgkjme3ny

RAIDR

Jamie Liu, Ben Jaiyen, Richard Veras, Onur Mutlu
2012 SIGARCH Computer Architecture News  
In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at  ...  Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer.  ...  Acknowledgments We thank the anonymous reviewers and members of the SAFARI research group for their feedback.  ... 
doi:10.1145/2366231.2337161 fatcat:254j7q3mufaldpbtqv4znkihhi

Whole-system persistence

Dushyanth Narayanan, Orion Hodson
2012 SIGPLAN notices  
Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply.  ...  However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end.  ...  Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM.  ... 
doi:10.1145/2248487.2151018 fatcat:o3q4u3urpbhlre54y363dfixem

Whole-system persistence

Dushyanth Narayanan, Orion Hodson
2012 Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12  
Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply.  ...  However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end.  ...  Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM.  ... 
doi:10.1145/2150976.2151018 dblp:conf/asplos/NarayananH12 fatcat:odrrvunri5a77n26hehrwau4ki

Whole-system persistence

Dushyanth Narayanan, Orion Hodson
2012 SIGARCH Computer Architecture News  
Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply.  ...  However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end.  ...  Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM.  ... 
doi:10.1145/2189750.2151018 fatcat:bms3fufut5bejdqq7ksagsck4m

No compromises

Aleksandar Dragojević, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, Miguel Castro
2015 Proceedings of the 25th Symposium on Operating Systems Principles - SOSP '15  
Transactions with strong consistency and high availability simplify building and reasoning about distributed systems. However, previous implementations performed poorly.  ...  In this paper, we show that there is no need to compromise in modern data centers.  ...  We would also like to thank Richard Black for his help in performance debugging, Andy Slowey and Oleg Losinets for keeping the test cluster running, and Chiranjeeb Buragohain, Sam Chandrashekar, Arlie  ... 
doi:10.1145/2815400.2815425 dblp:conf/sosp/DragojevicNNRSB15 fatcat:y2lqlswwcnb3xfv2gdiaqjykpa

FPB: Fine-grained Power Budgeting to Improve Write Throughput of Multi-level Cell Phase Change Memory

Lei Jiang, Youtao Zhang, Bruce R. Childers, Jun Yang
2012 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture  
Our experimental results show that these techniques achieve significant improvement on write throughput and system performance.  ...  In this paper, we propose Fine-grained write Power Budgeting (FPB) for MLC PCM.  ...  Acknowledgments We thank the anonymous reviewers for their constructive suggestions, and Prof. Moinuddin K. Qureshi for shepherding the paper.  ... 
doi:10.1109/micro.2012.10 dblp:conf/micro/JiangZC012 fatcat:flp2d77pzrcmdctiuogk6gxhza

AVR

Albin Eldstål-Damlin, Pedro Trancoso, Ioannis Sourdis
2019 Proceedings of the 48th International Conference on Parallel Processing - ICPP 2019  
Thereby, it utilizes more efficiently the available off-chip bandwidth improving significantly system performance and energy efficiency.  ...  For applications that tolerate aggressive approximation in large fractions of their data, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing  ...  In the past, the performance of memory subsystems has been improved for approximation-tolerant applications.  ... 
doi:10.1145/3337821.3337824 dblp:conf/icpp/Eldstal-DamlinT19 fatcat:cflqkugcazcqtfwfprjtinkhsy

eCNN

Chao-Tsung Huang, Yu-Chun Ding, Huan-Ching Wang, Chi-Wen Weng, Kai-Ping Lin, Li-Wei Wang, Li-De Chen
2019 Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture - MICRO '52  
In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality.  ...  However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption.  ...  We first propose a block-based truncated-pyramid inference flow which can eliminate all the DRAM bandwidth for feature maps by storing them in on-chip block buffers.  ... 
doi:10.1145/3352460.3358263 dblp:conf/micro/HuangDWWLWC19 fatcat:u3n4eq42orazrpehal6swwxu4y

Assise: Performance and Availability via NVM Colocation in a Distributed File System [article]

Thomas E. Anderson, Marco Canini, Jongyul Kim, Dejan Kostić, Youngjin Kwon, Simon Peter, Waleed Reda, Henry N. Schuh, Emmett Witchel
2020 arXiv   pre-print
To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol for managing a set of server-colocated PMMs as a fast, crash-recoverable cache between  ...  Unlike disaggregated file systems, Assise maximizes locality for all file IO by carrying out IO on colocated PMM whenever possible and minimizes coherence overhead by maintaining consistency at IO operation  ...  RAMcloud maintains data in DRAM for performance, using SSDs for asynchronous persistence.  ... 
arXiv:1910.05106v2 fatcat:3sjpue3tqzd3haqnh4ka72fezi

NVthreads

Terry Ching-Hsiang Hsu, Helge Brügner, Indrajit Roy, Kimberly Keeton, Patrick Eugster
2017 Proceedings of the Twelfth European Conference on Computer Systems - EuroSys '17  
NVthreads' page level mechanisms result in good performance: applications that use NVthreads can be more than 2× faster than state-of-the-art systems that favor fine-grained tracking of writes.  ...  NVthreads is a drop-in replacement for the pthreads library and requires only tens of lines of program changes to leverage non-volatile memory.  ...  We also thank Haris Volos, Dhruva Chakrabarti, and Hideaki Kimura for assisting us in evaluating NVthreads. This work was supported by Hewlett Packard Labs, NSF TC-1117065, and NSF TWC-1421910. P.  ... 
doi:10.1145/3064176.3064204 dblp:conf/eurosys/HsuBRKE17 fatcat:euoxx7bsz5hcpeh5tpwuffdtzi

Write-behind logging

Joy Arulraj, Matthew Perron, Andrew Pavlo
2016 Proceedings of the VLDB Endowment  
The design of the logging and recovery components of database management systems (DBMSs) has always been influenced by the difference in the performance characteristics of volatile (DRAM) and non-volatile  ...  This paper explores the changes that are required in a DBMS to leverage the unique properties of NVM in systems that still include volatile DRAM.  ...  We then measure the amount of time for the system to restore the database to a consistent state.  ... 
doi:10.14778/3025111.3025116 fatcat:vspt7chlcjd4rjm4kwotju4n4m

A Survey of Near-Data Processing Architectures for Neural Networks

Mehdi Hassanpour, Marc Riera, Antonio González
2022 Machine Learning and Knowledge Extraction  
Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms.  ...  In this paper, we present a survey of techniques for designing NDP architectures for NN.  ...  Overall, the results show that TETRIS significantly improves the performance and reduces the energy consumption over DNN accelerators with conventional, low-power DRAM memory systems such as Eyeriss as  ... 
doi:10.3390/make4010004 fatcat:5frcwe57drgihbgygiecoqqnvy