MSc(Engg) Thesis, Supercomputer Education and Research Centre,
Indian Institute of Science, Bangalore, India, July 2006.
In recent years there has been an exponential growth in Internet traffic resulting in increased network bandwidth requirements which, in turn, has led to stringent processing requirements on network layer devices like routers. Present backbone routers on OC 48 links (2.5Gbps) have to process four million minimum-sized packets per second. Further, the functionality supported in the network devices is also on the increase leading to programmable processors, such as Intel's IXP, Motorola's C5 and IBM's NP. These processors support multiple processors and multiple threads to exploit packet-level-parallelism inherent in network workloads.
This thesis studies the performance of network processors. We develop a Petri Net model for a commercial network processors (Intel IXP 2400,2850) for three different applications viz., IPv4 forwarding, Network Address Translation and IP security protocols. A salient feature of the Petri net model is its ability to model the application, architecture and their interaction in great detail. The model is validated using the Intel proprietary tool (SDK 3.51 for IXP architecture) over a range of configurations. Our Performance evaluation results indicate that the IXP processor is able to support a throughput of 2.5 Gbps for all modeled applications. Packet buffer memory (DRAM) is the bottleneck resource in a network processor and even multithreading is ineffective beyond a total of 16 threads in case of header processing applications and beyond 32 threads for payload processing applications. Since DRAM is the bottleneck resource we explore the benefits of increasing the DRAM banks and other software schemes like offloading the packet header to SRAM.
The second part of the thesis studies the impact of parallel processing in network processor on packet reordering and retransmission. Our results indicate that the concurrent processing of packets in a network processor and buffer allocation schemes in TFIFO leads to a significant packet reordering, (61%), on a 10-hop network (with packet sizes of 64 B) which in turn leads to a 76% retransmission under the TCP fast-restransmission algorithm. We explore different transmit buffer allocation schemes namely, contiguous, strided, local, and global for transmit buffer which reduces the packet retransmission to 24%. Our performance results also indicate that limiting the number of microengines can reduce the extent of packet reordering while providing the same throughput. We propose an alternative scheme, Packetsort, which guarantees complete packet ordering while achieving a throughput of 2.5 Gbps. Further, we observe that Packetsort outperforms, by up to 35%, the in-built schemes in the IXP processor namely, Inter Thread Signaling (ITS) and Asynchronous Insert and Synchronous Remove (AISR).
The final part of this thesis investigates the performance of the network processor in a bursty traffic scenario. We model bursty traffic using a Pareto distribution. We consider a parallel and pipelined buffering schemes and their impact on packet drop under bursty traffic. Our results indicate that the pipelined buffering scheme outperforms the parallel scheme.