Lab for High Performance Computing SERC, Indian Institute of Science

Granularity Study and Evaluation of Performance Metrics for Shared Memory Accesses in Distributed Shared Memory Architectures

M.E. (Reg.) Thesis, Department of Computer Science and Automation,
Indian Institute of Science, Bangalore, India, January 1999.


  1. P. S. Udaya Shankara, Department of Computer Science and Automation


A Distributed Shared Memory (DSM) system facilitates parallel program development by providing the abstraction of shared memory. Different DSM systems, viz. Cache Coherent Non Uniform Memory Access (CC-NUMA), Cache Only Memory Architecture (COMA), Simple COMA and Distributed Virtual Shared Memory (DVSM), support shared data at different levels. Similarly, parallel programs exhibit data sharing at different granularities; this property is known as the granularity of sharing. A mismatch between the granularity of sharing of an application and the granularity supported by an architecture leads to performance degradation. Therefore, it is important to identify the granularity of sharing of an application.

In this work, we classify five parallel applications based on their granularity of sharing. This characterisation study is done in two ways: the first is based on the type of memory access to shared locations, and the second is based on the access pattern. The former identifies the type of access, such as READ or WRITE, and the extent of sharing, i.e., whether a location is accessed by one or many processors. The latter classifies the access patterns between synchronisation points into PRODUCER-CONSUMER, MIGRATORY, etc. Depending upon these characteristics, we classify each application as fine grain or coarse grain.
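The first kind of characterisation, by access type and extent of sharing, can be illustrated with a minimal sketch. It is not the tool used in the thesis: the trace format (processor, operation, address), the function name classify_accesses and the label strings are all assumptions made for illustration.

```python
from collections import defaultdict


def classify_accesses(trace):
    """Label each shared address by access type and extent of sharing.

    trace: iterable of (processor_id, op, address) tuples, where op is
    "R" or "W". This trace format is hypothetical, not the thesis's.
    Returns a dict mapping address -> (access_type, extent).
    """
    readers = defaultdict(set)  # address -> processors that read it
    writers = defaultdict(set)  # address -> processors that wrote it
    for proc, op, addr in trace:
        if op == "R":
            readers[addr].add(proc)
        elif op == "W":
            writers[addr].add(proc)
        else:
            raise ValueError(f"unknown operation: {op!r}")

    labels = {}
    for addr in set(readers) | set(writers):
        r = readers.get(addr, set())
        w = writers.get(addr, set())
        # Access type: which kinds of operation touched this address.
        if r and w:
            access_type = "READ-WRITE"
        elif w:
            access_type = "WRITE-ONLY"
        else:
            access_type = "READ-ONLY"
        # Extent of sharing: accessed by one processor or by many.
        extent = "one" if len(r | w) == 1 else "many"
        labels[addr] = (access_type, extent)
    return labels


# Illustrative trace: three addresses touched by up to three processors.
example_trace = [
    (0, "R", 0x10), (1, "R", 0x10),   # read by two processors
    (0, "W", 0x20), (0, "R", 0x20),   # read and written by one processor
    (1, "W", 0x30), (2, "R", 0x30),   # written by one, read by another
]
example_labels = classify_accesses(example_trace)
```

Counting how many addresses fall under each label gives the kind of percentage breakdown the study reports. The pattern-based characterisation would additionally need the positions of synchronisation points in the trace, which this sketch does not model.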

The characterisation study gives the percentage of the various types of accesses and patterns in an application. We relate this information to the performance of the application on a given architecture. We have used the hit ratio at the cache/local memory, the effective memory access time for shared data, and the network traffic as metrics for comparison. We have also studied the bursty nature of network traffic. Lastly, we have studied how the introduction of sequential prefetching in these architectures affects their performance in terms of the above metrics.
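As a rough illustration of the first two metrics, the sketch below computes a hit ratio and an effective memory access time under a simple two-level model, in which every shared access is served either locally (cache or local memory) or remotely. The two-level model, the function names and the latency values are assumptions for illustration; the thesis may define the metrics in more detail.

```python
def hit_ratio(hits, accesses):
    """Fraction of shared accesses satisfied at the cache/local memory."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return hits / accesses


def effective_access_time(h, t_local, t_remote):
    """Average access time under an assumed two-level model.

    h        : hit ratio at the cache/local memory (0.0 to 1.0)
    t_local  : latency of a local hit (hypothetical units, e.g. cycles)
    t_remote : latency of a miss served remotely (same units)
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must lie in [0, 1]")
    return h * t_local + (1.0 - h) * t_remote


# Illustrative numbers only: 90 local hits out of 100 shared accesses,
# 10 cycles for a local hit and 100 cycles for a remote access.
h = hit_ratio(90, 100)
emat = effective_access_time(h, t_local=10, t_remote=100)
```

Because the remote latency is usually much larger than the local one, a small drop in hit ratio raises the effective access time sharply under this model, which is why a granularity mismatch that causes extra misses is costly.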

