REDUCT: Keep it Close, Keep it Cool!


Efficient scaling of DNN inference on multi-core CPUs with near-cache compute

Summary — "Deep Neural Networks (DNNs) are used in a wide variety of applications and services. As DNNs evolve, the race to develop optimal hardware for them, both in the data center and at the edge, continues. General-purpose multi-core CPUs offer uniquely attractive advantages for DNN inference in both data centers [60] and at the edge [71]. Much of the design complexity of the CPU pipeline is aimed at optimizing general-purpose single-threaded performance, and is excessive for the comparatively simpler, but still very important, data-parallel DNN inference workloads. Efficiently addressing this disparity can enable both raw performance scaling and overall performance/Watt improvements for multi-core CPU DNN inference.

We present REDUCT, in which we develop innovative solutions that bypass the traditional CPU resources that constrain DNN inference and limit its performance. Fundamentally, REDUCT's "Keep it close" policy enables successive pieces of work to be carried out close to one another: REDUCT brings instruction delivery/decode close to execution, and instruction execution close to data."

The technical paper is available here.

The paper was presented at the 48th Annual ACM/IEEE International Symposium on Computer Architecture (ISCA 2021).

Anant Nori (Intel Labs); Rahul Bera (ETH Zurich); Shankar Balachandran, Joydeep Rakshit, Om J Omer (Intel Labs); Avishaii Abuhatzera, Belliappa Kuttanna (Intel); Sreenivas Subramoney (Intel Labs)

