AI researchers publish theory to explain how deep learning actually works



Artificial intelligence researchers from Facebook Inc., Princeton University and the Massachusetts Institute of Technology have teamed up to publish a new manuscript that they believe provides, for the first time, a theoretical framework describing how deep neural networks actually work.

In a blog post, Facebook AI researcher Sho Yaida said that deep neural networks, or DNNs, are one of the most important ingredients in modern AI research. But for many people, including most AI researchers, they are also considered too complicated to understand from first principles, he said.

This is a problem because although many of the advances in AI have been made through experimentation and trial and error, it means researchers remain ignorant of many of the key features of DNNs that make them so incredibly useful. If researchers understood those features, Yaida said, it would likely lead to dramatic advances and the development of much more powerful AI models.

Yaida drew a comparison between AI today and the steam engine at the beginning of the industrial revolution. He said that although the steam engine changed manufacturing forever, it wasn’t until the following century that scientists developed the laws of thermodynamics and the principles of statistical mechanics that could fully explain, on a theoretical level, how and why it worked.

That lack of understanding did not prevent the steam engine from being improved, he said, but many of the improvements were the result of trial and error. Once scientists understood the principles of the heat engine, the pace of improvement accelerated dramatically.

“When scientists finally understood statistical mechanics, the implications went way beyond building better, more efficient engines,” Yaida wrote. “Statistical mechanics led to the realization that matter is made up of atoms, anticipated the evolution of quantum mechanics and (if you look at it holistically) even led to the transistor that powers the computer you use today.”

According to Yaida, AI is at a similar point today, treating DNNs as a black box too complicated to understand from first principles. As a result, AI models are refined through trial and error, much as early engineers improved the steam engine.

Of course, trial and error isn’t necessarily a bad thing, Yaida said, and it can be done intelligently based on years of experience. But trial and error is just a substitute for a unified theoretical language that describes DNNs and how they actually work.

The manuscript, entitled “The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks,” is an attempt to fill this knowledge gap. A collaboration between Yaida, Dan Roberts of MIT and Salesforce, and Boris Hanin of Princeton, it is the first real attempt to provide a theoretical framework for understanding DNNs from first principles.

“For AI practitioners, this understanding could significantly reduce the trial and error required to train these DNNs,” said Yaida. “For example, it could reveal the optimal hyperparameters for any given model without having to perform the time-consuming and computationally intensive experiments required today.”
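To make the contrast concrete, here is a toy sketch (ours, not from the manuscript) of what trial-and-error hyperparameter tuning looks like. A known result that the authors’ effective-theory approach formalizes is criticality: in a deep ReLU network, signals stay well-behaved only when the weight variance is tuned to 2/fan-in (He initialization); other choices make activations explode or vanish exponentially with depth, which practitioners would otherwise discover only by sweeping values experimentally. The network, depths and widths below are illustrative assumptions.

```python
import numpy as np

def forward_variance(depth, width, weight_var, n_trials=20, seed=0):
    """Push a random input through a deep ReLU MLP with weights of
    variance weight_var/width and return the mean squared activation
    at the final layer, averaged over n_trials random networks."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_trials):
        x = rng.standard_normal(width)
        for _ in range(depth):
            W = rng.standard_normal((width, width)) * np.sqrt(weight_var / width)
            x = np.maximum(W @ x, 0.0)  # ReLU activation
        results.append(np.mean(x ** 2))
    return float(np.mean(results))

# Trial and error: sweep the weight variance and watch signals
# vanish (1.0), stay stable (2.0, the critical value), or explode (4.0).
for c_w in (1.0, 2.0, 4.0):
    print(c_w, forward_variance(depth=50, width=256, weight_var=c_w))
```

Each ReLU layer multiplies the mean squared activation by roughly weight_var/2, so a 50-layer network amplifies any mistuning by a factor of (weight_var/2)^50. A theory that predicts the critical value directly replaces that entire sweep.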

The theory itself is not for the faint of heart and requires a fairly sophisticated understanding of physics. For most people, it is the ramifications that will matter, allowing AI theorists to push for a deeper and more complete understanding of neural networks, Yaida said.

“Much remains to be calculated, but this work may bring the field closer to understanding the unique properties of these models that enable them to work intelligently,” he said.

“The Principles of Deep Learning Theory” is now available for download on arXiv and will be published by Cambridge University Press in early 2022.

Image: Facebook
