Lab for High Performance Computing SERC, Indian Institute of Science
Home | People | Research | Awards/Honours | Publications | Lab Resources | Gallery | Contact Info | Sponsored Research
Tech. Reports | Conferences / Journals | Theses / Project Reports

Two-level Mapping based Cache Index Selection for Packet Forwarding Engines

Proceedings of the Fifteenth International Conference on Parallel Architectures and Compilation Techniques (PACT-06)
Seattle, Washington, USA, September 16--20, 2006

Authors

  1. Kaushik Rajan, Supercomputer Education and Research Center
  2. R. Govindarajan, Department of Computer Science and Automation; Supercomputer Education and Research Center;

Abstract

Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. The efficiency of a cache for this application critically depends on the placement function to reduce conflict misses. Traditional placement functions use a one-level mapping that naively partitions trie-nodes into cache sets. However, as a significant percentage of trie nodes are not useful, these schemes suffer from a non-uniform distribution of useful nodes to sets. This in turn results in increased conflict misses. Newer organizations such as variable associativity caches achieve flexibility in placement at the expense of increased hit-latency. This makes them unsuitable for L1 caches. We propose a novel two-level mapping framework that retains the hit-latency of one-level mapping yet incurs fewer conflict misses. This is achieved by introducing a secondlevel mapping which reorganizes the nodes in the naive initial partitions into refined partitions with near-uniform distribution of nodes. Further as this remapping is accomplished by simply adapting the index bits to a given routing table the hit-latency is not affected. We propose three new schemes which result in upto 16% reduction in the number of misses and 13% speedup in memory access time. In comparison, an XOR-based placement scheme known to perform extremely well for general purpose architectures, can obtain upto 2% speedup in memory access time.

Download

Full Text