Lab for High Performance Computing SERC, Indian Institute of Science

A Comprehensive Analytical Performance Model of DRAM Caches

Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, ICPE 2015
Austin, TX, USA, February 1-4, 2015


  1. Nagendra Gulur, Supercomputer Education and Research Centre; Texas Instruments
  2. R. Govindarajan, Supercomputer Education and Research Centre; Department of Computer Science and Automation


Stacked DRAM promises to offer unprecedented capacity and bandwidth to multi-core processors at moderately lower latency than off-chip DRAM. A typical use of this abundant DRAM is as a large last-level cache. Prior work is divided on how to organize this cache, and the proposed organizations fall into one of two categories: (i) a Tags-In-DRAM organization, with the cache organized as small blocks (typically 64B) and metadata (tag, valid, dirty, recency and coherence bits) stored in DRAM, and (ii) a Tags-In-SRAM organization, with the cache organized as larger blocks (typically 512B or larger) and metadata stored in SRAM. Tags-In-DRAM organizations tend to incur higher latency but conserve off-chip bandwidth, while Tags-In-SRAM organizations incur lower latency at the cost of some additional bandwidth. In this work, we develop a unified performance model of the DRAM cache that captures both organizational styles. The model is validated against detailed architecture simulations and shown to have average latency estimation errors of 10.7% and 8.8% on 4-core and 8-core processors, respectively. We also explore two insights from the model: (i) the need to achieve very high hit rates in the metadata cache/predictor (commonly employed in Tags-In-DRAM designs) in order to reduce latency, and (ii) opportunities for reducing latency by load-balancing between the DRAM cache and main memory.
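To give some intuition for the first insight, the sketch below uses a deliberately simple serial latency model. It is NOT the analytical model from the paper: the class name TagsInDramParams, the function avg_latency, and all latency and hit-rate values are hypothetical, and the model ignores queuing delay and bandwidth contention. It assumes that every access first probes an SRAM metadata cache, that a metadata miss costs one extra DRAM-cache access to read the tags, and that the data then comes from either the DRAM cache (hit) or main memory (miss).

```python
from dataclasses import dataclass


@dataclass
class TagsInDramParams:
    """Hypothetical latencies (ns) and hit rate for a toy Tags-In-DRAM cache."""
    t_meta: float = 2.0        # SRAM metadata cache/predictor lookup (assumed)
    t_tag_probe: float = 40.0  # extra DRAM-cache access to read tags on a metadata miss (assumed)
    t_dc_data: float = 40.0    # DRAM-cache data access on a DRAM-cache hit (assumed)
    t_mm: float = 80.0         # off-chip main-memory access on a DRAM-cache miss (assumed)
    h_dc: float = 0.7          # DRAM-cache hit rate (assumed)


def avg_latency(p: TagsInDramParams, h_meta: float) -> float:
    """Expected access latency under a toy serial model (not the paper's model).

    Each access probes the metadata cache; a metadata miss adds one DRAM tag
    probe before the hit/miss outcome is known. The data is then read from the
    DRAM cache on a hit or from main memory on a miss.
    """
    if not 0.0 <= h_meta <= 1.0:
        raise ValueError("h_meta must be in [0, 1]")
    lookup = p.t_meta + (1.0 - h_meta) * p.t_tag_probe
    service = p.h_dc * p.t_dc_data + (1.0 - p.h_dc) * p.t_mm
    return lookup + service


if __name__ == "__main__":
    params = TagsInDramParams()
    for h in (0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"metadata hit rate {h:.2f}: {avg_latency(params, h):.1f} ns")
```

With these assumed numbers, each percentage point of metadata miss rate adds 0.4 ns (one hundredth of t_tag_probe) to every access, so a 90% metadata hit rate still leaves 4 ns on the table compared with a perfect predictor. In a real system the extra tag probes also consume DRAM-cache bandwidth and add queuing delay, which this toy model leaves out.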

