Restore truncation for performance improvement in future DRAM systems
2016
2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
In this paper, we propose restore truncation (RT), a low-cost restore strategy to improve performance of DRAM modules that adopt relaxed restore timing. ...
Future DRAM chips are likely to suffer from significant variations and degraded timings, such as taking much more time to restore cell data after read and write access. ...
ACKNOWLEDGMENTS We thank the anonymous referees for their valuable comments and suggestions. ...
doi:10.1109/hpca.2016.7446093
dblp:conf/hpca/ZhangZCY16
fatcat:dckypmnbl5cwrnomkxpzuisdei
FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration
2024
ACM Transactions on Architecture and Code Optimization (TACO)
Our evaluation shows that FASA-DRAM improves the average performance by 19.9% and reduces average DRAM energy consumption by 18.1% over DDR4 DRAM for four-core workloads, with less than 3.4% extra area ...
DRAM memory is a performance bottleneck for many applications, due to its high access latency. ...
We show that it significantly improves the performance and energy efficiency of a system with DDR4 DRAM and outperforms state-of-the-art in-DRAM caching mechanisms. ...
doi:10.1145/3649135
fatcat:kadrxrhfnnfmzbujd27vc43iqu
RAIDR: Retention-aware intelligent DRAM refresh
2012
2012 39th Annual International Symposium on Computer Architecture (ISCA)
In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at ...
Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer. ...
Acknowledgments We thank the anonymous reviewers and members of the SAFARI research group for their feedback. ...
doi:10.1109/isca.2012.6237001
dblp:conf/isca/LiuJVM12
fatcat:gizsbubpona57kuuksgkjme3ny
RAIDR
2012
SIGARCH Computer Architecture News
In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at ...
Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer. ...
Acknowledgments We thank the anonymous reviewers and members of the SAFARI research group for their feedback. ...
doi:10.1145/2366231.2337161
fatcat:254j7q3mufaldpbtqv4znkihhi
Whole-system persistence
2012
SIGPLAN notices
Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply. ...
However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end. ...
Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM. ...
doi:10.1145/2248487.2151018
fatcat:o3q4u3urpbhlre54y363dfixem
Whole-system persistence
2012
Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS '12
Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply. ...
However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end. ...
Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM. ...
doi:10.1145/2150976.2151018
dblp:conf/asplos/NarayananH12
fatcat:odrrvunri5a77n26hehrwau4ki
Whole-system persistence
2012
SIGARCH Computer Architecture News
Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply. ...
However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end. ...
Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM. ...
doi:10.1145/2189750.2151018
fatcat:bms3fufut5bejdqq7ksagsck4m
No compromises
2015
Proceedings of the 25th Symposium on Operating Systems Principles - SOSP '15
Transactions with strong consistency and high availability simplify building and reasoning about distributed systems. However, previous implementations performed poorly. ...
In this paper, we show that there is no need to compromise in modern data centers. ...
We would also like to thank Richard Black for his help in performance debugging, Andy Slowey and Oleg Losinets for keeping the test cluster running, and Chiranjeeb Buragohain, Sam Chandrashekar, Arlie ...
doi:10.1145/2815400.2815425
dblp:conf/sosp/DragojevicNNRSB15
fatcat:y2lqlswwcnb3xfv2gdiaqjykpa
FPB: Fine-grained Power Budgeting to Improve Write Throughput of Multi-level Cell Phase Change Memory
2012
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Our experimental results show that these techniques achieve significant improvement on write throughput and system performance. ...
In this paper, we propose Fine-grained write Power Budgeting (FPB) for MLC PCM. ...
Acknowledgments We thank the anonymous reviewers for their constructive suggestions, and Prof. Moinuddin K. Qureshi for shepherding the paper. ...
doi:10.1109/micro.2012.10
dblp:conf/micro/JiangZC012
fatcat:flp2d77pzrcmdctiuogk6gxhza
AVR: Reducing Memory Traffic with Approximate Value Reconstruction
2019
Proceedings of the 48th International Conference on Parallel Processing - ICPP 2019
Thereby, it utilizes the available off-chip bandwidth more efficiently, significantly improving system performance and energy efficiency. ...
For applications that tolerate aggressive approximation in large fractions of their data, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing ...
In the past, the performance of memory subsystems has been improved for approximation-tolerant applications. ...
doi:10.1145/3337821.3337824
dblp:conf/icpp/Eldstal-DamlinT19
fatcat:cflqkugcazcqtfwfprjtinkhsy
eCNN
2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture - MICRO '52
In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality. ...
However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption. ...
We first propose a block-based truncated-pyramid inference flow which can eliminate all the DRAM bandwidth for feature maps by storing them in on-chip block buffers. ...
doi:10.1145/3352460.3358263
dblp:conf/micro/HuangDWWLWC19
fatcat:u3n4eq42orazrpehal6swwxu4y
Assise: Performance and Availability via NVM Colocation in a Distributed File System
[article]
2020
arXiv
pre-print
To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol for managing a set of server-colocated PMMs as a fast, crash-recoverable cache between ...
Unlike disaggregated file systems, Assise maximizes locality for all file IO by carrying out IO on colocated PMM whenever possible and minimizes coherence overhead by maintaining consistency at IO operation ...
RAMCloud maintains data in DRAM for performance, using SSDs for asynchronous persistence. ...
arXiv:1910.05106v2
fatcat:3sjpue3tqzd3haqnh4ka72fezi
NVthreads: Practical Persistence for Multi-threaded Applications
2017
Proceedings of the Twelfth European Conference on Computer Systems - EuroSys '17
NVthreads' page-level mechanisms result in good performance: applications that use NVthreads can be more than 2× faster than state-of-the-art systems that favor fine-grained tracking of writes. ...
NVthreads is a drop-in replacement for the pthreads library and requires only tens of lines of program changes to leverage non-volatile memory. ...
We also thank Haris Volos, Dhruva Chakrabarti, and Hideaki Kimura for assisting us in evaluating NVthreads. This work was supported by Hewlett Packard Labs, NSF TC-1117065, and NSF TWC-1421910. P. ...
doi:10.1145/3064176.3064204
dblp:conf/eurosys/HsuBRKE17
fatcat:euoxx7bsz5hcpeh5tpwuffdtzi
Write-behind logging
2016
Proceedings of the VLDB Endowment
The design of the logging and recovery components of database management systems (DBMSs) has always been influenced by the difference in the performance characteristics of volatile (DRAM) and non-volatile ...
This paper explores the changes that are required in a DBMS to leverage the unique properties of NVM in systems that still include volatile DRAM. ...
We then measure the amount of time for the system to restore the database to a consistent state. ...
doi:10.14778/3025111.3025116
fatcat:vspt7chlcjd4rjm4kwotju4n4m
A Survey of Near-Data Processing Architectures for Neural Networks
2022
Machine Learning and Knowledge Extraction
Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms. ...
In this paper, we present a survey of techniques for designing NDP architectures for NN. ...
Overall, the results show that TETRIS significantly improves the performance and reduces the energy consumption over DNN accelerators with conventional, low-power DRAM memory systems such as Eyeriss as ...
doi:10.3390/make4010004
fatcat:5frcwe57drgihbgygiecoqqnvy