Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS-03)
Nice, France, April 2003
The available Instruction Level Parallelism in Java bytecode (Java-ILP) is not readily exploitable using traditional in-order or out-of-order issue mechanisms due to dependencies involving stack operands. The sequentialization due to stack dependency can be overcome by identifying bytecode traces, which are sequences of bytecode instructions that when executed leave the operand stack in the same state as it was at the beginning of the sequence. Instructions from different bytecode traces have no stack-operand dependency and hence can be executed in parallel on multiple operand stacks. In this paper, we propose a simultaneous multitrace instruction issue (SMTI) architecture for a processor that can issue instructions from multiple bytecode traces to exploit Java-ILP. The proposed architecture can easily take advantage of nested folding to further increase the ILP. Extraction of bytecode traces and nested bytecode folding are done in software, during the method verification stage, and add little run-time overhead. We carried out our experiments in our SMTI simulation environment with SPECjvm98, Scimark benchmarks and the Linpack workload. Simultaneous multi-trace issue combined with nested folding resulted in an average ILP speedup gain of 54% over the base inorder single-issue Java processor.