M.Tech. Thesis, Supercomputer Education and Research Centre,
Indian Institute of Science, Bangalore, India, January 2002.
The OpenMP Application Programming Interface is an emerging standard for parallel programming on shared-memory multiprocessors. It defines a set of compiler directives and runtime routines for parallelization and supports an incremental approach to parallel programming. In this work we propose an efficient implementation of OpenMP on a cluster of symmetric multiprocessors (SMPs), a potentially powerful platform for executing parallel applications. Our implementation comprises a software Distributed Shared Memory (DSM) system on the cluster, an OpenMP compiler, and an OpenMP runtime system. We refer to the combined system as DOMP (Distributed OpenMP on a cluster of SMPs).
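As an illustration of the incremental approach, the following minimal sketch parallelizes an ordinary serial loop by adding a single standard OpenMP directive. The function name `dot` and its arguments are hypothetical and are not taken from the thesis. A compiler without OpenMP support ignores the pragma and the code stays a correct serial program.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n.
 * The only change from the serial version is the pragma line:
 * iterations are divided among threads, and each thread's partial
 * sum is combined into `sum` by the reduction clause. */
double dot(const double *a, const double *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

For example, calling `dot` on the vectors (1, 2, 3) and (4, 5, 6) gives 32 whether or not the directive is honoured.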
To implement OpenMP efficiently on a software DSM, we propose lazy weak consistency, a novel memory consistency model. Our DSM implementation, Flush Consistent Distributed Shared Memory (FCDSM), is based on lazy weak consistency and uses POSIX threads to exploit parallelism within an SMP node. It operates at page granularity and allows multiple writers to modify a page concurrently.
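The abstract does not say how FCDSM supports multiple writers. A common technique in page-based software DSMs is to keep a "twin" (a pristine copy of the page taken before the first local write) and later propagate only the bytes that differ from it. The sketch below illustrates that general idea under stated assumptions: the names `make_twin` and `merge_diff` and the 4096-byte page size are hypothetical, and nothing here should be read as FCDSM's actual mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumed page size for this sketch */

/* Snapshot a page before the first local write to it. */
static void make_twin(unsigned char *twin, const unsigned char *page)
{
    assert(twin != NULL && page != NULL);
    memcpy(twin, page, PAGE_SIZE);
}

/* Merge one writer's modifications into the home copy of the page.
 * Only bytes that differ between the writer's working copy and its
 * twin are copied, so writers that touched disjoint bytes do not
 * overwrite each other. Returns the number of bytes merged. */
static size_t merge_diff(unsigned char *home,
                         const unsigned char *twin,
                         const unsigned char *working)
{
    assert(home != NULL && twin != NULL && working != NULL);

    size_t changed = 0;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (working[i] != twin[i]) {
            home[i] = working[i];
            changed++;
        }
    }
    return changed;
}
```

If two writers modify different bytes of the same page, merging both diffs into the home copy preserves both updates. Concurrent writes to the same byte are a data race that this scheme does not resolve.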
OpenMP programs are compiled using a modified Omni OpenMP compiler. The compiler translates the OpenMP directives into calls to our OpenMP runtime system, which in turn calls FCDSM library routines. To compare performance with FCDSM, we have modified Coherent Virtual Machine (CVM), a public-domain software DSM. Our initial results show that FCDSM performs better than CVM on the SOR application. We have also compared the performance of the OpenMP NAS parallel benchmark programs on DOMP with that of their MPI versions.
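The translation of a directive into runtime calls can be pictured roughly as follows: the body of a parallel loop is outlined into a separate function, and the directive is replaced by a call that runs that function on a team of threads. This is a single-node sketch only. The names `rt_parallel`, `scale_region`, and `scale` are hypothetical, not the Omni or DOMP interfaces, and the FCDSM calls that the real runtime would make are omitted.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16  /* hypothetical team-size limit for this sketch */

/* An outlined parallel region: shared data, thread id, team size. */
typedef void (*region_fn)(void *shared, int tid, int nthreads);

struct launch { region_fn fn; void *shared; int tid; int nthreads; };

static void *trampoline(void *p)
{
    struct launch *l = p;
    l->fn(l->shared, l->tid, l->nthreads);
    return NULL;
}

/* Hypothetical runtime entry point: run fn on a team of POSIX threads
 * and wait for all of them (the implicit barrier at region end).
 * Returns 0 on success, -1 if a thread could not be created. */
static int rt_parallel(region_fn fn, void *shared, int nthreads)
{
    pthread_t th[MAX_THREADS];
    struct launch l[MAX_THREADS];
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        l[t] = (struct launch){ fn, shared, t, nthreads };
        if (pthread_create(&th[t], NULL, trampoline, &l[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(th[t], NULL);
    }
    return rc;
}

/* Source form of the loop being translated:
 *     #pragma omp parallel for
 *     for (i = 0; i < n; i++) a[i] = 2 * a[i];
 */
struct scale_args { int *a; int n; };

/* Outlined loop body: each thread handles one contiguous block. */
static void scale_region(void *shared, int tid, int nthreads)
{
    struct scale_args *s = shared;
    int chunk = (s->n + nthreads - 1) / nthreads;  /* ceiling division */
    int lo = tid * chunk;
    int hi = (lo + chunk < s->n) ? lo + chunk : s->n;
    for (int i = lo; i < hi; i++) {
        s->a[i] = 2 * s->a[i];
    }
}

/* What the translated code would call in place of the directive. */
static int scale(int *a, int n, int nthreads)
{
    struct scale_args s = { a, n };
    return rt_parallel(scale_region, &s, nthreads);
}
```

The block partition used here corresponds to a static schedule. On a cluster, the runtime would additionally have to start threads on remote nodes and keep shared data coherent through the DSM layer.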