(pre-print on TechRxiv, will be published soon)
The rapid development of Artificial Intelligence (AI) algorithms has created a need for resource-optimised hardware accelerators. Among the various platforms, Coarse-Grained Reconfigurable Arrays (CGRAs) have gained importance as on-edge accelerators. They comprise a heterogeneous matrix of Processing Elements (PEs), which allows for high flexibility and parallelisation of calculations, and they are mainly used to speed up Data Flow Graph (DFG) execution. We aim to provide a general-purpose, highly parameterised, and flexible architecture for on-edge AI data crunching. We propose a CGRA with a vector extension that allows the precision of calculation to be adjusted dynamically while maintaining a desired performance-power-area optimisation. It targets 4-bit integer (INT4) and 8-bit integer (INT8) quantisation for fast and efficient Neural Network (NN) processing. In this paper, we examine the hardware costs required to support the vector extension functionality. We synthesised the design in TSMC's 40 nm standard-cell technology. The results show that the proposed extension achieves, on average, a 28.2% decrease in power consumption and a 21.6% decrease in area compared to a reference design with the same computational power.
Keywords: CGRA, vector extension, flexible, precision, efficient, AI, on-edge.