ABSTRACT
Rising microprocessor power dissipation calls for new architectural approaches that save energy by better matching on-chip resources to application requirements. Selective cache ways provides the ability to disable a subset of the ways in a set-associative cache during periods of modest cache activity, while the full cache remains available for more cache-intensive periods. Because this approach leverages the subarray partitioning that is already present for performance reasons, only minor changes to a conventional cache are required, and full-speed cache operation can therefore be maintained. Furthermore, the tradeoff between performance and energy is flexible and can be dynamically tailored to meet changing application and machine environmental conditions. We show that, with this approach, trading a small performance degradation for energy savings can produce a significant reduction in cache energy dissipation.
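The idea of disabling ways can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the paper's implementation: the class `SelectiveWayCache`, the helper `run_trace`, the LRU replacement policy, and the energy proxy (one "way probe" per enabled way per access) are all hypothetical choices made for illustration. The sketch also simply drops lines that no longer fit when ways are disabled; the abstract does not describe how a real design handles data held in a disabled way.

```python
from collections import OrderedDict


class SelectiveWayCache:
    """Toy set-associative cache whose ways can be enabled or disabled.

    Hypothetical model for illustration only: LRU replacement, no dirty
    data, and energy approximated by the number of ways probed per access.
    """

    def __init__(self, num_sets=4, num_ways=4, line_bytes=32):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_bytes = line_bytes
        self.enabled_ways = num_ways
        # One LRU-ordered dict of resident tags per set (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0
        self.way_probes = 0  # assumed energy proxy, not a real energy model

    def set_enabled_ways(self, n):
        """Enable only n ways; lines that no longer fit are dropped (assumption)."""
        if not 1 <= n <= self.num_ways:
            raise ValueError("enabled ways must be between 1 and num_ways")
        self.enabled_ways = n
        for resident in self.sets:
            while len(resident) > n:
                resident.popitem(last=False)  # evict least recently used

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        resident = self.sets[index]
        # Only the enabled ways are probed on each access.
        self.way_probes += self.enabled_ways
        if tag in resident:
            resident.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(resident) >= self.enabled_ways:
            resident.popitem(last=False)
        resident[tag] = True
        return False


def run_trace(trace, enabled_ways):
    """Replay a list of addresses and return (misses, way_probes)."""
    cache = SelectiveWayCache()
    cache.set_enabled_ways(enabled_ways)
    for addr in trace:
        cache.access(addr)
    return cache.misses, cache.way_probes


# A small working set (one line per set) and a larger one (two lines per set).
small_trace = [0, 32, 64, 96] * 10
large_trace = [i * 32 for i in range(8)] * 10
```

In this toy model, replaying `small_trace` with one way enabled incurs the same number of misses as with all four ways while probing a quarter as many ways, which mirrors the "modest cache activity" case. Replaying `large_trace` with one way enabled misses on every access, whereas two ways suffice to hold it, which is the kind of performance versus energy tradeoff the abstract describes.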
Index Terms
- Selective cache ways: on-demand cache resource allocation