Highly efficient new neuromorphic chip for AI on the edge


A team of international researchers designed, manufactured and tested the NeuRRAM chip. Credit: David Baillot/University of California San Diego

The NeuRRAM chip is the first compute-in-memory chip to demonstrate a wide range of AI applications while consuming only a small fraction of the power other platforms consume, with accuracy on par with those platforms.

NeuRRAM, a new chip that performs calculations directly in memory and can run a variety of AI applications, was designed and built by an international team of researchers. What sets it apart is that it does all of this at a fraction of the power used by computing platforms for general-purpose AI computing.

The NeuRRAM neuromorphic chip brings AI one step closer to running on a wide range of edge devices disconnected from the cloud. This means such devices can perform demanding cognitive tasks anywhere, anytime, without relying on a network connection to a central server. Applications for this device can be found in every corner of the world and in every facet of our lives, ranging from smartwatches and VR headsets to smart earphones, smart sensors in factories, and rovers for space exploration.

The NeuRRAM chip is not only twice as energy-efficient as state-of-the-art “compute-in-memory” chips, an innovative class of hybrid chips that perform calculations in memory; it also delivers results that are just as accurate as conventional digital chips. Conventional AI platforms are much bulkier and are typically limited to large data servers running in the cloud.

NeuRRAM neuromorphic chip up close

A closeup of the NeuRRAM chip. Credit: David Baillot/University of California San Diego

In addition, the NeuRRAM chip is extremely versatile, supporting many different neural network models and architectures. This allows the chip to be used for many different applications, including image recognition and reconstruction, and speech recognition.

“The conventional wisdom is that compute-in-memory’s higher efficiency comes at the expense of versatility, but our NeuRRAM chip achieves efficiency without sacrificing versatility,” said Weier Wan, the paper’s first corresponding author and a recent Stanford University PhD graduate who worked on the chip at UC San Diego, where he was co-advised by Gert Cauwenberghs in the Department of Bioengineering.

The research team, co-led by bioengineers at the University of California San Diego (UCSD), presented their findings in the Aug. 17 issue of Nature.

NeuRRAM Neuromorphic chip layers

The NeuRRAM chip uses an innovative architecture that has been co-optimized for the entire stack. Credit: David Baillot/University of California San Diego

Currently, AI computing is both power hungry and computationally intensive. Most AI applications on edge devices involve moving data from the devices to the cloud, where the AI processes and analyzes it. The results are then transmitted back to the device. This is necessary because most edge devices are battery powered and therefore have a limited amount of power that can be reserved for computing.

By reducing the power consumption required for AI inference at the edge, this NeuRRAM chip could lead to more resilient, intelligent, and accessible edge devices and smarter manufacturing. It could also lead to better data protection as transferring data from devices to the cloud comes with increased security risks.

On AI chips, moving data from memory to compute is a major bottleneck.

“That’s the equivalent of an eight-hour commute for a two-hour workday,” Wan said.

To solve this data transfer problem, the researchers used something called resistive random access memory (RRAM). This type of non-volatile memory allows computation to be performed directly in memory rather than in separate processing units. RRAM and other emerging memory technologies used as synapse arrays for neuromorphic computing were developed in the lab of Philip Wong, Wan’s advisor at Stanford and one of the primary contributors to this work. Although computation with RRAM chips is not necessarily new, it generally comes with a loss of accuracy in the computations performed on the chip and a lack of flexibility in the chip’s architecture.
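To see why computing in an RRAM array avoids data movement, consider how such an array performs a matrix-vector multiply: weights are stored as cell conductances, inputs are applied as row voltages, and by Ohm’s and Kirchhoff’s laws each column’s summed current is a dot product. The sketch below is a purely numerical illustration of that principle, with made-up values, not a model of NeuRRAM’s actual circuitry.

```python
import numpy as np

# Illustrative analog matrix-vector multiply in an RRAM crossbar:
# weights live in the array as conductances G (siemens), inputs are
# applied as row voltages V (volts), and each column's summed current
# I = G^T @ V is a dot product -- computed in place, with no weight
# movement between memory and a separate processor. Values are made up.

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))   # 4x3 array of cell conductances
V = np.array([0.2, 0.0, 0.1, 0.3])         # input voltages on the 4 rows

I = G.T @ V  # column currents: one analog multiply-accumulate per column

# Cross-check against an explicit per-cell summation (Kirchhoff's law)
I_check = np.array([sum(G[r, c] * V[r] for r in range(4)) for c in range(3)])
assert np.allclose(I, I_check)
```

All columns are computed in the same step, which is the source of compute-in-memory’s efficiency and parallelism.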

“Compute-in-memory has been standard practice in neuromorphic engineering since its inception more than 30 years ago,” said Cauwenberghs. “What’s new with NeuRRAM is that extreme efficiency is now accompanied by great flexibility for various AI applications, with almost no loss of accuracy compared to standard general-purpose digital computing platforms.”

A carefully crafted methodology was key to working with multiple layers of “co-optimization” across the hardware and software abstraction layers, from the design of the chip to its configuration to perform various AI tasks. In addition, the team made sure to address various constraints ranging from storage device physics to circuitry and network architecture.

“This chip now gives us a platform to address these issues across the stack, from devices and circuits to algorithms,” said Siddharth Joshi, an assistant professor of computer science and engineering at the University of Notre Dame who began work on the project as a Ph.D. student and postdoctoral fellow in Cauwenberghs’ lab at UCSD.

Chip performance

The researchers measured the chip’s power efficiency using a metric known as the energy-delay product, or EDP. EDP combines both the amount of energy consumed per operation and the time it takes to complete the operation. On this metric, the NeuRRAM chip achieves 1.6 to 2.3 times lower EDP (lower is better) and 7 to 13 times higher computational density than state-of-the-art chips.
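The arithmetic behind the EDP comparison is simple enough to sketch. The energy and delay figures below are hypothetical placeholders chosen only to make the ratio land inside the reported range; they are not measurements from the paper.

```python
# Energy-delay product (EDP): energy per operation times time per
# operation. Lower is better, since it rewards chips that are both
# frugal and fast. All numbers below are hypothetical illustrations.

def edp(energy_j: float, delay_s: float) -> float:
    return energy_j * delay_s

baseline = edp(energy_j=2.0e-12, delay_s=10e-9)  # hypothetical digital chip
neurram  = edp(energy_j=1.0e-12, delay_s=10e-9)  # hypothetical NeuRRAM figures

improvement = baseline / neurram
assert improvement == 2.0  # inside the 1.6x-2.3x range reported in the paper
```

Because EDP multiplies the two quantities, halving energy at equal speed (as above) and halving delay at equal energy both double the score.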

Engineers ran various AI tasks on the chip. It achieved 99% accuracy on a handwritten digit recognition task, 85.7% on an image classification task, and 84.7% on a Google speech recognition task. The chip also achieved a 70% reduction in image reconstruction error on an image recovery task. These results are comparable to existing digital chips that perform computation at the same bit precision, but with dramatic power savings.

An important contribution of the paper, the researchers point out, is that all the results presented are obtained directly on the hardware. In much previous work on compute-in-memory chips, AI benchmark results were often achieved in part through software simulation.

Next steps include improving architectures and circuits and scaling the design to more advanced technology nodes. The engineers also plan to tackle other applications, such as spiking neural networks.

“We can get better at the device level, improve circuit design to implement additional features, and address diverse applications with our dynamic NeuRRAM platform,” said Rajkumar Kubendran, an assistant professor at the University of Pittsburgh who began work on the project while a Ph.D. student in Cauwenberghs’ research group at UCSD.

In addition, Wan is a founding member of a startup working on the production of compute-in-memory technology. “As a researcher and engineer, my ambition is to bring research innovations from labs to practical applications,” said Wan.

New architecture

The key to NeuRRAM’s power efficiency is an innovative method of sensing output directly in memory. Conventional approaches apply voltage as the input and measure current as the result, but this requires more complex and power-hungry circuits. In NeuRRAM, the team developed a neuron circuit that senses voltage and performs analog-to-digital conversion in an energy-efficient manner. This voltage-mode sensing can activate all rows and all columns of an RRAM array in a single computational cycle, enabling higher parallelism.
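A back-of-the-envelope sketch shows why single-cycle, full-array activation matters. The function and array sizes below are hypothetical, not taken from the paper: they simply illustrate that a readout scheme limited to a subset of columns per cycle needs proportionally more cycles to cover the array.

```python
import math

# Hypothetical cycle-count comparison: a readout scheme that senses only
# `cols_per_cycle` columns at a time needs ceil(n_cols / cols_per_cycle)
# cycles per matrix-vector product, whereas voltage-mode sensing that
# activates all rows and columns at once finishes in a single cycle.

def readout_cycles(n_cols: int, cols_per_cycle: int) -> int:
    return math.ceil(n_cols / cols_per_cycle)

partial = readout_cycles(256, 16)   # partially parallel readout: 16 cycles
full = readout_cycles(256, 256)     # full-array, single-cycle readout

assert partial == 16
assert full == 1
```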

In the NeuRRAM architecture, CMOS neuron circuits are physically interleaved with RRAM weights. It differs from traditional designs where CMOS circuitry is typically located at the periphery of RRAM weights. The neuron’s connections to the RRAM array can be configured to serve as either the input or the output of the neuron. This enables neural network inference in different data flow directions without incurring overheads in terms of area or power consumption. This in turn facilitates the reconfiguration of the architecture.

To ensure that the accuracy of AI calculations can be maintained across different neural network architectures, engineers have developed a number of techniques to co-optimize hardware algorithms. The techniques have been verified on various neural networks, including convolutional neural networks, long short-term memory, and restricted Boltzmann machines.

As a neuromorphic AI chip, NeuRRAM performs parallel distributed processing across 48 neurosynaptic cores. To achieve high versatility and high efficiency at the same time, NeuRRAM supports data parallelism by mapping a layer of the neural network model onto multiple cores for parallel inference on multiple data. NeuRRAM also provides model parallelism by mapping different layers of a model onto different cores and performing inference in a pipelined fashion.
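The two mapping strategies above can be sketched as simple assignments of layers to cores. The layer and core names below are invented for illustration; only the 48-core count comes from the article.

```python
# Sketch of the two parallelism strategies described above, with
# made-up layer and core names. Data parallelism replicates one layer
# across cores so each core processes a different input; model
# parallelism places different layers on different cores and pipelines
# inputs through them.

NUM_CORES = 48  # NeuRRAM has 48 neurosynaptic cores

def map_data_parallel(layer, inputs, cores):
    # Each core holds a copy of `layer` and handles one input.
    return {core: (layer, x) for core, x in zip(cores, inputs)}

def map_model_parallel(layers, cores):
    # Each core holds a different layer; activations flow core-to-core.
    return {core: layer for core, layer in zip(cores, layers)}

dp = map_data_parallel("conv1", ["img0", "img1", "img2"],
                       ["core0", "core1", "core2"])
mp = map_model_parallel(["conv1", "conv2", "fc"],
                        ["core0", "core1", "core2"])

assert dp["core1"] == ("conv1", "img1")  # same layer, different data
assert mp["core2"] == "fc"               # different layers, pipelined
```

In practice a mapper would combine both strategies, replicating small layers for throughput while spreading large models across cores.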

An international research team

The work is the result of an international team of researchers.

The UCSD team designed the CMOS circuits that implement the neural functions interfacing with the RRAM arrays to support the synaptic functions in the chip architecture, for high efficiency and versatility. Wan, working closely with the entire team, implemented the design, characterized the chip, trained the AI models, and ran the experiments. Wan also developed a software toolchain that maps AI applications onto the chip.

The RRAM synapse array and its operating conditions have been extensively characterized and optimized at Stanford University.

The RRAM array was manufactured at Tsinghua University and integrated into CMOS.

The Notre Dame team contributed to the design and architecture of the chip, as well as the subsequent design and training of the machine learning model.

The research began as part of the National Science Foundation-funded Expeditions in Computing project on Visual Cortex on Silicon at Penn State University, with continued funding from the Office of Naval Research Science of AI program, the Semiconductor Research Corporation, the DARPA JUMP program, and Western Digital Corporation.

Reference: “A compute-in-memory chip based on resistive random-access memory” by Weier Wan, Rajkumar Kubendran, Clemens Schaefer, Sukru Burc Eryilmaz, Wenqiang Zhang, Dabin Wu, Stephen Deiss, Priyanka Raina, He Qian, Bin Gao, Siddharth Joshi, Huaqiang Wu, H.-S. Philip Wong and Gert Cauwenberghs, 17 August 2022, Nature.
DOI: 10.1038/s41586-022-04992-8

Published open access in Nature, August 17, 2022.

Weier Wan, Rajkumar Kubendran, Stephen Deiss, Siddharth Joshi, Gert Cauwenberghs, University of California San Diego

Weier Wan, S Burc Eryilmaz, Priyanka Raina, HS Philip Wong, Stanford University

Clemens Schaefer, Siddharth Joshi, University of Notre Dame

Rajkumar Kubendran, University of Pittsburgh

Wenqiang Zhang, Dabin Wu, He Qian, Bin Gao, Huaqiang Wu, Tsinghua University

Corresponding authors: Wan, Gao, Joshi, Wu, Wong and Cauwenberghs
