We propose a CMOS analog vector-matrix multiplier for deep neural networks, implemented in a standard single-poly 180 nm CMOS technology. The learned weights are stored in analog floating-gate memory cells embedded in current mirrors that implement the multiplication operations. We experimentally verify the analog storage capability of the designed single-poly floating-gate cells, the multiplication accuracy of the proposed tunable current mirrors, and the effective number of bits of the analog operation. We perform system-level simulations to show that an analog deep neural network based on the proposed vector-matrix multiplier can achieve an inference accuracy comparable to digital solutions, with an energy efficiency of 26.4 TOPs/J, a layer latency close to 100 µs, and an intrinsically high degree of parallelism. The proposed design also offers a cost advantage, since it can be implemented in a standard single-poly CMOS process flow.
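To make the system-level evaluation concrete, the sketch below is a minimal, illustrative model (not the authors' simulation framework) of a vector-matrix multiply whose weights are realized as tunable current-mirror gains stored on floating-gate cells. The function name `analog_vmm`, the symmetric weight quantization, and the Gaussian mismatch term sized to the weight LSB are assumptions used to emulate a finite effective number of bits; the reported energy and latency figures are not modeled here.

```python
# Minimal sketch (assumptions noted above): ideal y = x @ W degraded by
# limited analog storage resolution and mirror mismatch, summarized by an
# effective-number-of-bits (ENOB) parameter.
import numpy as np

def analog_vmm(x, W, enob=8, rng=None):
    """Return x @ W_stored, where W_stored models analog weight storage."""
    rng = np.random.default_rng() if rng is None else rng
    w_max = np.max(np.abs(W))
    lsb = 2 * w_max / (2 ** enob)                   # weight LSB over a symmetric range
    W_stored = np.round(W / lsb) * lsb              # finite analog storage resolution
    W_stored += rng.normal(0.0, lsb / 2, W.shape)   # mirror mismatch / retention drift
    return x @ W_stored                             # output currents summed per column

# Example: deviation of the analog result from the ideal product for one layer.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=64)                      # input activations (input currents)
W = rng.normal(0, 0.1, size=(64, 10))               # learned weights mapped to mirror gains
print(np.max(np.abs(analog_vmm(x, W, enob=8, rng=rng) - x @ W)))
```

Sweeping `enob` in such a model is one way to relate the measured precision of the analog operation to end-to-end inference accuracy.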