ABSTRACT
Main memory in clusters may dominate total system power. The resulting energy consumption increases system operating cost, and the heat produced reduces reliability. Emerging memory technology will let servers dynamically turn on (online) and turn off (offline) memory devices at runtime. This technology, coupled with slack in memory demand, offers the potential for significant energy savings in clusters of servers. Enabling power-aware memory and conserving energy in clusters are non-trivial. First, power-aware memory techniques must scale to thousands of devices. Second, techniques must not degrade the performance of parallel scientific applications. Third, techniques must be transparent to the user to be practical. We propose a Memory Management Infra-Structure for Energy Reduction (Memory MISER). Memory MISER is transparent, performance-neutral, and scalable. It consists of a prototype Linux kernel that manages memory at device granularity and a userspace daemon that monitors system-wide memory demand to control devices and implement energy- and performance-constrained policies. The daemon uses a PID controller to offline memory conservatively and online memory aggressively at runtime. Experiments on an 8-node cluster show that the control daemon reduces memory energy by up to 56.8% with <1% performance degradation for several classes of parallel scientific codes. For multi-user workloads, where memory demand often spikes dramatically, Memory MISER can save up to 67.94% of memory energy with <1% performance degradation. Current IBM eServer systems support up to 2 terabytes of SDRAM and 16 processors per node. For a server-based cluster with eight 90-watt processors and 32 GB of SDRAM per processor, Memory MISER can save about 30% of total system energy for multi-user parallel workloads.
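The abstract gives only the outline of the control policy: a PID controller that offlines memory conservatively and onlines it aggressively. The following is a minimal sketch of one way such an asymmetric controller could look, not the authors' implementation. The class name `AsymmetricPID`, the gains, the 1024 MB device size, the 10% headroom, and the one-device-per-interval offlining rule are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AsymmetricPID:
    """Hypothetical controller; every default below is an assumed value."""
    kp: float = 1.0          # proportional gain (assumed)
    ki: float = 0.1          # integral gain (assumed)
    kd: float = 0.5          # derivative gain (assumed)
    device_mb: int = 1024    # capacity of one memory device in MB (assumed)
    headroom: float = 0.10   # spare fraction kept online above demand (assumed)
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, used_mb: float, online: int, total: int) -> int:
        """Return how many devices should be online for the next interval."""
        target_mb = used_mb * (1.0 + self.headroom)
        error = target_mb - online * self.device_mb  # > 0 means a shortfall
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        output_mb = (self.kp * error
                     + self.ki * self.integral
                     + self.kd * derivative)

        if error > 0:
            # Aggressive onlining: cover at least the whole shortfall at once.
            delta = math.ceil(max(error, output_mb) / self.device_mb)
            self.integral = 0.0  # drop accumulated slack after a spike
        elif error <= -self.device_mb and output_mb <= -self.device_mb:
            # Conservative offlining: at most one device per interval, and
            # only when a full device of slack remains beyond the headroom.
            delta = -1
        else:
            delta = 0
        return max(1, min(total, online + delta))


if __name__ == "__main__":
    pid = AsymmetricPID()
    online = 4
    for used_mb in (6000, 1000, 7000):  # demand rises, drops, then spikes
        online = pid.step(used_mb, online, total=8)
        print(used_mb, online)
```

The asymmetry lives in the two branches: a shortfall is closed in a single step, while slack is released one device at a time, so a sudden spike in demand never has to wait for several intervals of onlining.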
Index Terms
- Memory-miser: a performance-constrained runtime system for power-scalable clusters