Proceedings of the 4th International Conference on the Quantitative
Evaluation of SysTems (QEST) 2007
Edinburgh, Scotland, September 16--19, 2007
Previous studies have shown that buffering packets in DRAM is a performance bottleneck. In order to understand the impediments in accessing the DRAM, we developed a detailed Petri net model of IP forwarding application on IXP2400 that models the different levels of the memory hierarchy. The cell based interface used to receive and transmit packets in a network processor, leads to some small size DRAM accesses. Such narrow accesses to the DRAM expose the bank access latency, reducing the bandwidth that can be realized. With real traces up to 30% of the accesses are smaller than the cell size, resulting in 7.7% reduction in DRAM bandwidth. To overcome this problem, we propose buffering these small chunks of data in the on chip scratchpad memory. This scheme also exploits greater degree of parallelism between different levels of the memory hierarchy. Using real traces from the internet, we show that the transmit rate can be improved by an average of 21% over the base scheme without the use of additional hardware. Further, the impact of different traffic patterns on the network processor resources is studied. Under real traffic conditions, we show that the data bus which connects the off-chip packet buffer to the microengines, is the obstacle in achieving higher throughput.