Proceedings of the International Conference on Supercomputing (ICS-05)
Cambridge, Massachussets, USA, June 20--22, 2005
As network traffic continues to increase and with the requirement to process packets at line rates, high performance routers need to forward millions of packets every second. Even with an efficient lookup algorithm like the LC-trie, each packet needs upto 5 memory accesses. Earlier work shows that a single cache for the nodes of an LC-trie can reduce the number of external memory accesses.
We observe that the locality characteristics of the level-one nodes of an LC-trie are signicantly different from those of lower-level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) which uses separate caches for level-one and lower-level nodes each with carefully chosen sizes. We further improve the hit rate of the level-one nodes cache by introducing a weight-based replacement policy and an intelligent index bit selection scheme. To evaluate our cache scheme with realistic traces, we propose a synthetic trace generation method which emulates real traces and can generate traces with varying locality characteristics. The base HSCA scheme gives us upto 16% reduction in misses over the unified scheme. The optimizations further enhance this improvement to upto 25% for core router traces.