Poster session at the Fourteenth International Conference on
High Performance Computing (HiPC 2007)
Goa, India, December 18--21, 2007
A new breed of processors, such as the Cell Broadband Engine, the Imagine stream processor, and various GPUs, emphasizes data-level parallelism (DLP) and thread-level parallelism (TLP) rather than traditional instruction-level parallelism (ILP). This allows them to achieve order-of-magnitude improvements over conventional superscalar processors for many workloads. However, it is unclear how much parallelism of these types exists in current programs. Most earlier studies have concentrated on the amount of ILP in a program, without separating out DLP or TLP.
In this study, we investigate the extent of data-level parallelism available in programs in the MediaBench suite. By packing instructions in a SIMD fashion, we observe reductions of up to 91% (84% on average) in the number of dynamic instructions, indicating a very high degree of DLP in several applications.
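
As a rough illustration of how SIMD packing can shrink a dynamic instruction count, the Python sketch below groups the instructions of a toy trace by opcode and dataflow depth, then packs each group into SIMD instructions of a fixed width. The trace format, the names `packed_count` and `reduction`, the grouping rule, and the assumption of perfect register renaming (only true dependences constrain packing) are all illustrative assumptions. This is not the packing method or the MediaBench measurement setup used in the study.

```python
from collections import Counter
from math import ceil


def packed_count(trace, width):
    """Count instructions after packing a trace into SIMD groups.

    trace: list of (opcode, dest, sources) tuples in execution order
           (hypothetical format chosen for this sketch).
    width: number of scalar operations one SIMD instruction can hold.
    """
    depth_of = {}        # value name -> dataflow depth of its latest producer
    groups = Counter()   # (opcode, depth) -> number of scalar instructions
    for opcode, dest, sources in trace:
        # An instruction sits one level below its deepest producer.
        depth = 1 + max((depth_of.get(s, 0) for s in sources), default=0)
        depth_of[dest] = depth
        groups[(opcode, depth)] += 1
    # Instructions with the same opcode at the same depth are mutually
    # independent, so each group needs ceil(n / width) SIMD instructions.
    return sum(ceil(n / width) for n in groups.values())


def reduction(trace, width):
    """Fractional reduction in dynamic instruction count after packing."""
    return 1 - packed_count(trace, width) / len(trace)


# Toy trace: eight independent element-wise additions z[i] = x[i] + y[i].
trace = []
for i in range(8):
    trace.append(("load", f"x{i}", []))
    trace.append(("load", f"y{i}", []))
    trace.append(("add", f"z{i}", [f"x{i}", f"y{i}"]))

print(len(trace), packed_count(trace, 4), reduction(trace, 4))
```

On this toy trace, a width of 4 packs the 24 scalar instructions into 6 SIMD instructions, a 75% reduction. Real traces would also have to account for memory alignment, control flow, and data layout, all of which this sketch ignores.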