MSc(Engg) Thesis, Department of Computer Science and Automation,
Indian Institute of Science, Bangalore, India, April 1999.
Traditional software Distributed Shared Memory (DSM) systems rely on virtual memory management mechanisms to detect accesses to shared memory locations and maintain their consistency. This is achieved through the segmentation violation signal (segv) and an associated segv handler. While the steps taken by the segv handler itself are unavoidable, the involvement of the OS (kernel) and its significant associated overhead can be avoided through careful compile-time analysis and code instrumentation.
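The following is a minimal sketch, not drawn from the thesis, of the conventional mechanism referred to above: shared pages are protected with mprotect(), the kernel delivers a SIGSEGV on the first access, and the handler performs the protocol action before re-enabling access. The function dsm_fetch_page() is a hypothetical placeholder for that protocol action.

    #include <signal.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <stdint.h>

    static size_t page_size;

    /* Handler invoked by the kernel on access to a protected shared page. */
    static void segv_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;

        /* Round the faulting address down to its page boundary. */
        uintptr_t addr = (uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1);

        /* dsm_fetch_page((void *)addr);  -- placeholder: fetch the up-to-date
           copy of the page and record the access for the consistency protocol. */

        /* Re-enable access so the faulting instruction can be restarted. */
        mprotect((void *)addr, page_size, PROT_READ | PROT_WRITE);
    }

    void dsm_install_handler(void)
    {
        struct sigaction sa = {0};
        page_size = (size_t)sysconf(_SC_PAGESIZE);
        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
    }

Every such fault traps into the kernel, which is the overhead the compile-time approach seeks to eliminate.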
In this thesis, we propose CAS-DSM, an implementation in which the page fault overhead is avoided by instrumenting the application code at the source level. The overhead caused by the execution of the instrumented code is reduced through aggressive compile-time optimizations. Finally, we also address the issue of reducing the communication overheads. We used SUIF, a public domain compiler tool, to implement the compile-time analysis, instrumentation, and optimizations.
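A hedged illustration of what such source-level instrumentation can look like (not the actual code generated by our SUIF passes): an explicit consistency check is inserted before each potentially shared access, so no page fault is needed. Here dsm_check() is an assumed runtime entry point that validates or fetches the page containing the given address.

    extern void dsm_check(void *addr);      /* assumed runtime entry point */

    void scale(double *shared_a, int n, double k)
    {
        for (int i = 0; i < n; i++) {
            dsm_check(&shared_a[i]);        /* inserted check: basic scheme */
            shared_a[i] = shared_a[i] * k;  /* original shared access */
        }
    }

In this basic scheme a check is executed on every iteration, which motivates the aggregation and hoisting optimizations described next.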
In our implementation, CAS-DSM, we rely on linear array index analysis to detect shared memory accesses that could potentially raise a segv. To improve performance, instead of introducing the consistency check code immediately before each shared access, as in our basic implementation, we aggregate the inserted code for all shared accesses in a loop and hoist it above the loop; a sketch of this transformation follows. This aggregation and hoisting of the inserted code can be extended to outer loops as well. Taking a more aggressive approach, we also propose to discard some of the inserted code using a simple heuristic. We further study the effect of inter-procedural analysis and optimizations. We evaluated CAS-DSM on the Splash/Splash2 parallel application benchmarks, run on an IBM-SP2, a distributed memory machine, with the number of processors ranging from 1 to 8. Our method achieves a performance improvement of 10% to 15% for most of the applications compared to the original CVM implementation. Incorporating the communication overhead reduction increases the performance further.
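The sketch below, under the same assumptions as the previous example, shows the aggregated and hoisted form of the instrumentation: because linear array index analysis bounds the accessed region to shared_a[0..n-1], the per-iteration checks can be replaced by a single range check above the loop. The aggregated entry point dsm_check_range() is hypothetical.

    extern void dsm_check_range(void *lo, void *hi);  /* hypothetical aggregated check */

    void scale_hoisted(double *shared_a, int n, double k)
    {
        if (n > 0)
            dsm_check_range(&shared_a[0], &shared_a[n - 1]);  /* hoisted check */

        for (int i = 0; i < n; i++)
            shared_a[i] = shared_a[i] * k;  /* loop body is now check-free */
    }

The same idea applies at outer loop levels when the index expressions of the enclosed accesses remain linear in the outer loop variables.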