M.E. (Reg.) Thesis, Department of Computer Science and Automation,
Indian Institute of Science, Bangalore, India, July 2015.
The increasing use of General-Purpose computing on Graphics Processing Units (GPGPU) has drawn researchers' attention to several issues with the current architecture built around discrete Graphics Processing Units (GPUs), in particular data transfer overhead and memory consistency between the CPU and GPU. The newer Heterogeneous System Architecture (HSA) has been proposed to overcome these issues of the traditional architecture for GPGPU computing. However, the switch to heterogeneous systems has raised new challenges in caching mechanisms, memory controller design, power management, and memory bandwidth management. While the solutions proposed so far work well for architectures with discrete GPUs, they remain ill-suited to the challenges of HSA systems. The goal of this project is to explore the HSA memory subsystem components in detail, to understand the challenges involved, and to devise mechanisms to overcome them. Critical memory subsystem components such as the Last Level Cache (LLC), memory scheduling, and the Dynamic Random-Access Memory (DRAM) controller need careful attention to achieve sustained performance comparable to traditional Chip Level Multiprocessor (CMP) architectures. In this work, we evaluate the memory subsystem for HSA. We propose a constrained shared LLC and evaluate its performance impact in such systems. Finally, we propose a 2-level memory access scheduling algorithm, which reduces the effective CPU memory access latency by up to 86% in heavily loaded HSA systems.