General-Purpose computing on Graphics Processing Units (GPGPU) is widely used in many application fields thanks to its versatility and energy efficiency. Over the past decade, several solutions have been proposed for implementing an embedded (soft) Graphics Processing Unit (GPU) on a Field-Programmable Gate Array (FPGA) for GPGPU purposes, combining the performance and ease of programming of GPU architectures with the flexibility and reconfigurability of FPGA platforms. This paper describes improvements to the hardware architecture of the soft GPU used in the ICU4SAT project: an embedded GPU core for FPGA platforms that is configurable, scalable, portable, and designed specifically for GPGPU purposes. The improvements increase the soft GPU's compatibility with the OpenCL standard through two major upgrades: 1) the addition of a local memory space to the memory hierarchy and 2) the addition of a barrier mechanism for synchronizing threads. Each new feature is accompanied by an extension of the Instruction Set Architecture (ISA). The new features provide enhanced functionality at the cost of a relatively small increase in resource utilization, without affecting the critical path.
Keywords: GPGPU, soft GPU, FPGA, SIMT, OpenCL