Restore truncation for performance improvement in future DRAM systems.

In this paper, we propose restore truncation (RT), a low-cost restore strategy to improve performance of DRAM modules that adopt relaxed restore timing. ... Future DRAM chips are likely to suffer from significant variations and degraded timings, such as taking much more time to restore cell data after read and write access. ... ACKNOWLEDGMENTS We thank the anonymous referees for their valuable comments and suggestions. ...

doi:10.1109/hpca.2016.7446093 dblp:conf/hpca/ZhangZCY16 fatcat:dckypmnbl5cwrnomkxpzuisdei

Our evaluation shows that FASA-DRAM improves the average performance by 19.9% and reduces average DRAM energy consumption by 18.1% over DDR4 DRAM for four-core workloads, with less than 3.4% extra area ... DRAM memory is a performance bottleneck for many applications, due to its high access latency. ... We show that it significantly improves the performance and energy efficiency of a system with DDR4 DRAM and outperforms state-of-the-art in-DRAM caching mechanisms. ...

doi:10.1145/3649135 fatcat:kadrxrhfnnfmzbujd27vc43iqu

In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at ... Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer. ... Acknowledgments We thank the anonymous reviewers and members of the SAFARI research group for their feedback. ...

doi:10.1109/isca.2012.6237001 dblp:conf/isca/LiuJVM12 fatcat:gizsbubpona57kuuksgkjme3ny

In an 8-core system with 32 GB DRAM, RAIDR achieves a 74.6% refresh reduction, an average DRAM power reduction of 16.1%, and an average system performance improvement of 8.6% over existing systems, at ... Existing DRAM devices refresh all cells at a rate determined by the leakiest cell in the device. However, most DRAM cells can retain data for significantly longer. ... Acknowledgments We thank the anonymous reviewers and members of the SAFARI research group for their feedback. ...

doi:10.1145/2366231.2337161 fatcat:254j7q3mufaldpbtqv4znkihhi

Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply. ... However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end. ... Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM. ...

doi:10.1145/2248487.2151018 fatcat:o3q4u3urpbhlre54y363dfixem

Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply. ... However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end. ... Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM. ...

doi:10.1145/2150976.2151018 dblp:conf/asplos/NarayananH12 fatcat:odrrvunri5a77n26hehrwau4ki

Runtime overheads are eliminated by using "flush on fail": transient state in processor registers and caches is flushed to NVRAM only on failure, using the residual energy from the system power supply. ... However, a storage back end is still required for recovery from failures. Recovery can last for minutes for a single server or hours for a whole cluster, causing heavy load on the back end. ... Hybrid systems With SCMs, there is also the potential for hybrid DRAM-SCM systems, with a small fast DRAM alongside a larger slower SCM. ...

doi:10.1145/2189750.2151018 fatcat:bms3fufut5bejdqq7ksagsck4m

Transactions with strong consistency and high availability simplify building and reasoning about distributed systems. However, previous implementations performed poorly. ... In this paper, we show that there is no need to compromise in modern data centers. ... We would also like to thank Richard Black for his help in performance debugging, Andy Slowey and Oleg Losinets for keeping the test cluster running, and Chiranjeeb Buragohain, Sam Chandrashekar, Arlie ...

doi:10.1145/2815400.2815425 dblp:conf/sosp/DragojevicNNRSB15 fatcat:y2lqlswwcnb3xfv2gdiaqjykpa

Our experimental results show that these techniques achieve significant improvement on write throughput and system performance. ... In this paper, we propose Fine-grained write Power Budgeting (FPB) for MLC PCM. ... Acknowledgments We thank the anonymous reviewers for their constructive suggestions, and Prof. Moinuddin K. Qureshi for shepherding the paper. ...

doi:10.1109/micro.2012.10 dblp:conf/micro/JiangZC012 fatcat:flp2d77pzrcmdctiuogk6gxhza

Thereby, it utilizes more efficiently the available off-chip bandwidth improving significantly system performance and energy efficiency. ... For applications that tolerate aggressive approximation in large fractions of their data, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing ... In the past, the performance of memory subsystems has been improved for approximation-tolerant applications. ...

doi:10.1145/3337821.3337824 dblp:conf/icpp/Eldstal-DamlinT19 fatcat:cflqkugcazcqtfwfprjtinkhsy

In this paper, we approach this goal by considering the inference flow, network model, instruction set, and processor design jointly to optimize hardware performance and image quality. ... However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge due to their considerable DRAM bandwidth and power consumption. ... We first propose a block-based truncated-pyramid inference flow which can eliminate all the DRAM bandwidth for feature maps by storing them in on-chip block buffers. ...

doi:10.1145/3352460.3358263 dblp:conf/micro/HuangDWWLWC19 fatcat:u3n4eq42orazrpehal6swwxu4y

To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol for managing a set of server-colocated PMMs as a fast, crash-recoverable cache between ... Unlike disaggregated file systems, Assise maximizes locality for all file IO by carrying out IO on colocated PMM whenever possible and minimizes coherence overhead by maintaining consistency at IO operation ... RAMcloud maintains data in DRAM for performance, using SSDs for asynchronous persistence. ...

arXiv:1910.05106v2 fatcat:3sjpue3tqzd3haqnh4ka72fezi

NVthreads' page level mechanisms result in good performance: applications that use NVthreads can be more than 2× faster than state-of-the-art systems that favor fine-grained tracking of writes. ... NVthreads is a drop-in replacement for the pthreads library and requires only tens of lines of program changes to leverage non-volatile memory. ... We also thank Haris Volos, Dhruva Chakrabarti, and Hideaki Kimura for assisting us in evaluating NVthreads. This work was supported by Hewlett Packard Labs, NSF TC-1117065, and NSF TWC-1421910. P. ...

doi:10.1145/3064176.3064204 dblp:conf/eurosys/HsuBRKE17 fatcat:euoxx7bsz5hcpeh5tpwuffdtzi

The design of the logging and recovery components of database management systems (DBMSs) has always been influenced by the difference in the performance characteristics of volatile (DRAM) and non-volatile ... This paper explores the changes that are required in a DBMS to leverage the unique properties of NVM in systems that still include volatile DRAM. ... We then measure the amount of time for the system to restore the database to a consistent state. ...

doi:10.14778/3025111.3025116 fatcat:vspt7chlcjd4rjm4kwotju4n4m

Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms. ... In this paper, we present a survey of techniques for designing NDP architectures for NN. ... Overall, the results show that TETRIS significantly improves the performance and reduces the energy consumption over DNN accelerators with conventional, low-power DRAM memory systems such as Eyeriss as ...

doi:10.3390/make4010004 fatcat:5frcwe57drgihbgygiecoqqnvy

Restore truncation for performance improvement in future DRAM systems

FASA-DRAM: Reducing DRAM Latency with Destructive Activation and Delayed Restoration

RAIDR: Retention-aware intelligent DRAM refresh

RAIDR

Whole-system persistence

Whole-system persistence

Whole-system persistence

No compromises

FPB: Fine-grained Power Budgeting to Improve Write Throughput of Multi-level Cell Phase Change Memory

AVR

eCNN

Assise: Performance and Availability via NVM Colocation in a Distributed File System

NVthreads

Write-behind logging

A Survey of Near-Data Processing Architectures for Neural Networks