Lab for High Performance Computing SERC, Indian Institute of Science

Performance Enhancement and Evaluation of DSM-SP2: A Distributed Shared Memory on IBM SP2

M.E. (Int.) Thesis, Department of Computer Science and Automation,
Indian Institute of Science, Bangalore, India, July 1997.


  1. Lakshmi, Department of Computer Science and Automation


Multiprocessor systems, which cater to the ever-increasing demand for high-performance computing, are either distributed memory systems or shared memory systems. Distributed Shared Memory (DSM), a fairly new concept, combines the advantages of the two: it logically implements a shared memory model on a distributed memory machine.

DSM-SP2 [Ramesh97] is a software distributed shared memory system built on the IBM SP2 machine. It uses the Lazy Release Consistency model to keep data consistent across processors, and it supports a Multiple Writers Protocol to reduce the effects of false sharing.
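
As an illustration of how a multiple-writer protocol can tolerate false sharing, the sketch below shows the general twin/diff idea: each writer keeps an unmodified copy (a twin) of the page, later encodes only the bytes it changed (a diff), and the diffs from different writers are merged. This is a minimal sketch and not the DSM-SP2 code; the function names `make_twin`, `make_diff` and `apply_diff`, the 8-byte page and the byte-level diff encoding are all assumptions made for illustration.

```python
from typing import List, Tuple

Diff = List[Tuple[int, int]]  # (offset, new byte value); hypothetical encoding


def make_twin(page: bytearray) -> bytes:
    """Save an unmodified copy of the page before the first local write."""
    return bytes(page)


def make_diff(twin: bytes, page: bytearray) -> Diff:
    """Record every byte that differs between the twin and the current page."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff(page: bytearray, diff: Diff) -> None:
    """Patch a copy of the page with another writer's modifications."""
    for offset, value in diff:
        page[offset] = value


# Two writers update disjoint bytes of the same page (false sharing).
PAGE_SIZE = 8  # illustrative only; real pages are much larger
home = bytearray(PAGE_SIZE)

copy_a = bytearray(home)
twin_a = make_twin(copy_a)
copy_b = bytearray(home)
twin_b = make_twin(copy_b)

copy_a[0] = 1  # writer A touches the first byte
copy_b[7] = 9  # writer B touches the last byte

diff_a = make_diff(twin_a, copy_a)
diff_b = make_diff(twin_b, copy_b)

# Merging both diffs yields a page that contains both updates.
merged = bytearray(home)
apply_diff(merged, diff_a)
apply_diff(merged, diff_b)
```

Because each diff carries only the bytes its writer changed, neither writer overwrites the other's update, even though both worked on the same page at the same time.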

In this project, we have made several optimizations and enhancements to DSM-SP2. Our optimizations include selective invalidation at a barrier, sending an early wakeup message to the child process, and using yield and semaphores to avoid busy waiting. Taken together, the optimizations improve the performance of DSM-SP2 by almost an order of magnitude.
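
The idea of replacing busy waiting with a semaphore can be sketched as follows. This is only an analogy using Python threads; the abstract does not say how the mechanism is implemented in DSM-SP2, so the names `reply_ready`, `mailbox`, `communication_handler` and `compute_process` are hypothetical. The waiting side blocks on the semaphore instead of polling a flag in a loop, and the side that receives the reply wakes it up.

```python
import threading

reply_ready = threading.Semaphore(0)  # starts at 0, so acquire() blocks
mailbox = []                          # hypothetical buffer for an incoming reply
result = []


def communication_handler() -> None:
    """Stands in for the side that receives a reply from a remote node."""
    mailbox.append("page contents")
    reply_ready.release()  # wake the waiter; no polling needed


def compute_process() -> None:
    """Stands in for the side that needs the reply before it can continue."""
    reply_ready.acquire()  # sleeps until release() is called, without spinning
    result.append(mailbox.pop())


waiter = threading.Thread(target=compute_process)
handler = threading.Thread(target=communication_handler)
waiter.start()
handler.start()
waiter.join()
handler.join()
```

While the waiter is blocked it consumes no processor time, so that time is available to other work on the node.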

We have also added a new synchronization primitive, called Conditional Locks, to the existing implementation. Another addition is support for multiple processes on a single node, to overlap computation and communication. Further, we have experimented with blocking and nonblocking sends. Finally, we study the effect of various coherence protocols on both the original and the enhanced DSM-SP2. In each of the above cases, we have used three benchmarks (jacobi, tomcatv and water) for performance evaluation.

We have measured the performance of DSM-SP2 in terms of the execution times of the processes on each node. We also report the breakdown of execution time into computation time, polling time, DSM overheads, lock overheads and barrier overheads. In addition, we have studied other performance metrics, such as the number of page, diff and twin requests and the time taken to service them. We have evaluated the effect of each of the above-mentioned optimizations individually on the existing implementation of DSM-SP2, and we have also measured the performance improvement due to combinations of these. Finally, we present a comparative study of the performance under three different coherence policies.

