Teaching a computer to think like a human


In this series, NUS News profiles the university’s Presidential Young Professors, who are at the forefront of their fields of research, transforming creative ideas into important innovations that make the world a better place.

What is easier to teach: a computer or a child? Maybe a computer, because it can execute commands perfectly. Maybe a child, because most of us understand how to talk to another human, while we might not understand how to write complex code. Assistant Professor You Yang, a Presidential Young Professor in computer science at NUS, says that it depends on the task.

We are at a stage of technological development where it is easy to teach a computer basic tasks such as elementary image recognition. However, computers still have difficulty learning concepts that require higher-level thinking.

The human brain is a spectacularly complicated organ. What comes naturally to humans is difficult to teach a computer. Take language as an example. By six months, a baby can babble “ma-ma” and repeat the sounds he hears. And by the age of two to three years the child begins to use prepositions indicating the location of an object (“the cat is in the box”), to use pronouns (“you”, “I”, “she”) and can answer simple questions.

These actions require a high level of cognition. Language is made up of grammar, context, tone and the dictionary meaning of individual words. In fact, the human brain is so advanced that most children can hold a simple conversation within their first five years, with only some study and observation. The same is true of a person’s ability to make sense of the images he sees in the world. So how do we teach a computer to see the world the way a human does?

This is the scope of Asst Prof. You’s work: developing and improving artificial intelligence (AI) and machine learning. This research area is part of NUS’ research direction of developing capabilities to realize Singapore’s vision of becoming a smart nation, and it supports the Smart Nation and Digital Economy domain of Singapore’s Research, Innovation and Enterprise 2025 Plan.

A fascination with AI

When Asst Prof. You was 15, he saw the explosive growth of Google and Facebook on the news, and the young teenager knew that computers would be the future. Later, while studying computer science at Tsinghua University, he attended a lecture on the future of AI by Andrew Ng, Baidu’s chief scientist at the time. It became apparent to him that AI would dominate technology over the next century and be used in everyday applications such as Google Translate and YouTube’s recommendation algorithm. AI would also form the basis for future innovations.

“Tesla needs a supercomputer to train an AI system for their self-driving cars,” Asst Prof. You said, “but the future could be even crazier — we could have individual self-flying machines.”

To achieve these major technological feats, it is essential that scientists are able to train AI models accurately and efficiently – an area where Asst Prof. You has made headlines. In 2017, his team broke the ImageNet training speed record. Two years later, he broke the BERT training speed record. His contributions earned him a place on Forbes’ 30 Under 30 Asia list in 2021, the same year he was awarded the Presidential Young Professorship at NUS.

Asst Prof. You says that training a neural network is similar to teaching a child to read by giving them books – but instead of books, we give the AI data to learn from. Training takes time, and speed often comes at the cost of accuracy. For example, if we want a child to read a total of 1,000 books, we can give them either 10 books a day for 100 days or 100 books a day for 10 days. However, when a child has too many books to read at once, we cannot be sure that the child will learn the right content. In other words, increasing the “batch size” often decreases accuracy.
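The arithmetic behind the book analogy can be written out in a few lines of Python. This is only an illustrative sketch: the function name `steps_per_pass` is made up for this example, and “books” stand in for training examples.

```python
import math


def steps_per_pass(total_examples: int, batch_size: int) -> int:
    """Number of learning steps needed to see every example once."""
    return math.ceil(total_examples / batch_size)


# 1,000 books read 10 at a time versus 100 at a time.
small_batches = steps_per_pass(1000, 10)    # 100 steps ("days")
large_batches = steps_per_pass(1000, 100)   # 10 steps ("days")

print(small_batches, large_batches)  # prints: 100 10
```

The larger batch finishes in a tenth of the steps, but the learner also gets a tenth as many chances to correct itself along the way, which is one common explanation for why naively increasing the batch size can hurt accuracy.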

Asst Prof. You helps solve this problem by developing optimizers that allow computer scientists to train neural networks quickly without sacrificing performance.

Training computers faster and more accurately

So what is an optimizer? Let’s take the example of a neural network learning how to translate from English to French. Suppose you want to translate the phrase “I like basketball” (a sport Asst Prof. You says he enjoys watching in his free time). You can easily translate each word into its French equivalent – “I” is “je”, “like” is “aimer” and “basketball” is just “basketball”. However, French speakers know that the phrase “Je aimer basketball” is grammatically incorrect. The correct French translation is actually “J’aime le basketball”.

Hard-coding all the small and unique rules of grammar and context would take ages and probably would not produce good results. It is much better if we can somehow teach a computer French and English. Using neural networks, scientists feed the AI system a huge database of English sentences and their correct French translations. These sentences are converted into “vectors” – sets of numbers that the computer can understand and work with. The computer then maps the English-language vectors to the French-language vectors via a mathematical function (i.e. an equation). The function is the computer’s way of translating English into French. A “decoder” then converts these numbers back into words and sentences.
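To make the encode, map and decode steps concrete, here is a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the three-word vocabularies, the helper names and the hand-written weight table `W`. A real translation system learns a far more complex function from data rather than having it typed in.

```python
# Toy vocabularies (hypothetical, three words each).
EN_VOCAB = ["i", "like", "basketball"]
FR_VOCAB = ["je", "aimer", "basketball"]


def encode(word, vocab):
    """Turn a word into a vector: 1.0 at the word's position, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in vocab]


# Hand-written weights standing in for a learned function.
# Row i holds the scores that produce French word i.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def apply_function(weights, vector):
    """Map an English vector to a French vector (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def decode(vector, vocab):
    """Turn a vector back into the word with the highest score."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return vocab[best]


def translate(sentence):
    words = sentence.lower().split()
    return " ".join(
        decode(apply_function(W, encode(w, EN_VOCAB)), FR_VOCAB) for w in words
    )


print(translate("I like basketball"))  # prints: je aimer basketball
```

The output is the same ungrammatical word-by-word result described earlier, which is the point: a function this simple cannot capture grammar, so the function has to be learned and refined from many example sentences.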

Of course, these functions are rarely simple: they might work perfectly for some examples but fail for others. So we need a way to maximize their accuracy. Accuracy is measured by a “loss function” – the smaller the loss, the better the result. But it’s hard for a computer to minimize the loss. How does a computer know whether its current loss is the smallest possible? How does it know that there isn’t a better function that produces a more accurate translation? The process by which the AI model adjusts its parameters to reduce the loss until it is as small as possible is governed by an optimizer.
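A minimal sketch of that loop, assuming the simplest possible setting: a model with a single adjustable number `w`, a mean-squared-error loss and plain gradient descent as the optimizer. The data and the names below are invented for the example and are not taken from Asst Prof. You’s work.

```python
# Toy data generated by the rule y = 2 * x; the model must discover the 2.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def loss(w):
    """Mean squared error of the model y = w * x on the toy data."""
    return sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)


def gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in DATA) / len(DATA)


w = 0.0               # initial guess
learning_rate = 0.05  # size of each adjustment
losses = []

for step in range(100):
    losses.append(loss(w))
    # The optimizer's job: nudge w in the direction that lowers the loss.
    w -= learning_rate * gradient(w)

print(round(w, 4))             # prints: 2.0
print(losses[0] > losses[-1])  # prints: True
```

The computer never knows the best value in advance. It simply keeps making adjustments that lower the loss until further adjustments stop helping.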

Computers have to adjust the things they learn when they encounter anomalies, but these adjustments shouldn’t be too small or too large. A good optimizer ensures that a neural network makes appropriately sized adjustments to the things it learns.
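The effect of adjustment size can be seen on a toy problem. The sketch below assumes a one-parameter loss, (w - 3)^2, whose best value is w = 3, and runs 20 steps of plain gradient descent with three different step sizes. The function name and the specific numbers are chosen only for illustration.

```python
def final_error(learning_rate, steps=20, start=0.0, target=3.0):
    """Distance from the best value after gradient descent on (w - target)**2."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - target)      # derivative of (w - target)**2
        w -= learning_rate * grad    # one adjustment
    return abs(w - target)


too_small = final_error(0.001)  # barely moves from the starting point
about_right = final_error(0.4)  # lands almost exactly on the target
too_large = final_error(1.1)    # overshoots further on every step

print(too_small > 2.5, about_right < 1e-6, too_large > 100)
# prints: True True True
```

With steps that are too small the model has hardly learned anything after 20 updates, and with steps that are too large each correction overshoots and the error grows instead of shrinking.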

Asst Prof. You has developed two optimization techniques: Layer-wise Adaptive Rate Scaling (LARS) and Layer-wise Adaptive Moments Optimization for Batch Training (LAMB). Both speed up neural network training by allowing the use of larger batch sizes without sacrificing performance. The training time for BERT (a natural language model) has been reduced from three days to just 76 minutes. Likewise, the training time on ImageNet (a dataset used for image recognition) has been reduced from 14 days to 14 minutes. Scientists from Google, Microsoft and NVIDIA have leveraged Asst Prof. You’s techniques as they continue to improve training speeds for BERT and ImageNet.
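The core idea of layer-wise scaling can be sketched briefly. The code below is a simplified illustration of a LARS-style update, not the reference implementation: it leaves out momentum, the names `lars_step`, `trust_coeff` and the default values are assumptions for this example, and falling back to a scale of 1.0 when a norm is zero is a choice made for this sketch. Each layer’s step is scaled by the ratio of the size of its weights to the size of its gradient, so no single layer receives an adjustment that is out of proportion to its weights. LAMB applies a similar layer-wise scaling on top of Adam-style updates.

```python
import math


def l2_norm(values):
    """Euclidean length of a list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def lars_step(layers, grads, global_lr=0.1, trust_coeff=0.001, weight_decay=0.0):
    """One simplified LARS-style update (no momentum) over a list of layers."""
    new_layers = []
    for weights, grad in zip(layers, grads):
        w_norm = l2_norm(weights)
        g_norm = l2_norm(grad)
        denom = g_norm + weight_decay * w_norm
        if w_norm > 0 and denom > 0:
            # Layer-wise scale: large gradients relative to weights get damped.
            local_lr = trust_coeff * w_norm / denom
        else:
            local_lr = 1.0  # fallback chosen for this sketch
        new_layers.append([
            w - global_lr * local_lr * (g + weight_decay * w)
            for w, g in zip(weights, grad)
        ])
    return new_layers


# Two layers with identical weights but gradients that differ by 1000x.
layers = [[3.0, 4.0], [3.0, 4.0]]
grads = [[0.6, 0.8], [600.0, 800.0]]
updated = lars_step(layers, grads)
print([[round(w, 4) for w in layer] for layer in updated])
# prints: [[2.9997, 3.9996], [2.9997, 3.9996]]
```

Even though the second layer’s gradient is a thousand times larger, both layers move by the same modest amount. That stability is what makes very large batch sizes practical in this kind of scheme.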
