Tesla supports a vision-only approach to autonomy with powerful supercomputers – TechCrunch



Tesla CEO Elon Musk has been teasing a neural network training computer called “Dojo” since at least 2019. While Dojo itself is still in development, Tesla today unveiled a new supercomputer that will serve as a development prototype version of what Dojo will ultimately offer.

At the 2021 Conference on Computer Vision and Pattern Recognition (CVPR) on Monday, Andrej Karpathy, Tesla’s head of AI, presented the company’s new supercomputer, which the automaker is using to move away from radar and lidar sensors in self-driving cars in favor of high-quality optical cameras. During his workshop on autonomous driving, Karpathy explained that training the company’s neural-network-based autonomous driving technology requires a huge dataset and a hugely powerful supercomputer, so that a computer can learn to respond to a new environment the way a human would. Hence the development of this predecessor to Dojo.

According to Karpathy, Tesla’s latest-generation supercomputer has 10 petabytes of “hot” NVMe storage running at 1.6 terabytes per second. At 1.8 EFLOPS, he said it could be the fifth most powerful supercomputer in the world, though he later admitted that his team has not yet run the specific benchmark required for inclusion in the TOP500 supercomputing rankings.

“That said, if you take the total number of FLOPS, it would actually be somewhere around fifth,” Karpathy told TechCrunch. “Fifth place is currently occupied by NVIDIA with their Selene cluster, which has a very comparable architecture and a similar number of GPUs (4480 vs. 5760 for us, so a little less).”
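The quoted figures can be roughly sanity-checked. As an assumption (Tesla did not name the GPU model or the precision behind the 1.8 EFLOPS figure), suppose the cluster uses NVIDIA A100-class GPUs, which deliver about 312 teraFLOPS of dense FP16/BF16 compute each:

```python
# Back-of-envelope check of the quoted compute figures.
# ASSUMPTION: A100-class GPUs at ~312 TFLOPS dense FP16/BF16 each;
# the article does not specify the GPU model or precision.
PER_GPU_TFLOPS = 312          # assumed per-GPU throughput
tesla_gpus = 5760             # GPU count quoted by Karpathy
selene_gpus = 4480            # NVIDIA Selene GPU count, per the same quote

tesla_eflops = tesla_gpus * PER_GPU_TFLOPS / 1e6   # TFLOPS -> EFLOPS
print(f"Tesla cluster: ~{tesla_eflops:.2f} EFLOPS")  # ~1.80, matching the article
```

Under that assumption, 5,760 GPUs land almost exactly on the 1.8 EFLOPS Karpathy cited, which makes his comparison with Selene’s 4,480 GPUs plausible.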

Musk has been advocating a vision-only approach to autonomy for some time, largely because cameras are faster than radar or lidar. Starting in May, the Tesla Model Y and Model 3 in North America are being built without radar, relying on cameras and machine learning to support Autopilot, the vehicles’ advanced driver-assistance system.

Many autonomous driving companies use lidar and high definition maps, which means they need incredibly detailed maps of the places they travel, including all of the lanes and their connections, traffic lights, and more.

“The approach we are pursuing is vision-based and primarily uses neural networks that can in principle work anywhere on earth,” Karpathy said in his workshop.

Replacing a “meat computer,” or rather a human, with a silicon computer results in lower latencies (better response time), 360-degree situational awareness, and a fully attentive driver who never checks their Instagram, Karpathy said.

Karpathy shared a few scenarios showing how Tesla’s supercomputer uses computer vision to correct bad driver behavior, including an emergency-braking scenario in which the computer’s object detection kicks in to prevent a collision with a pedestrian, and a traffic-control warning that detects a yellow light in the distance and alerts a driver who has not yet begun to slow down.

Tesla vehicles also already offer a feature called Pedal Misapplication Mitigation, in which the car detects pedestrians in its path, or the absence of a driving path altogether, and reacts if the driver accidentally accelerates instead of braking, potentially sparing pedestrians in front of the vehicle or preventing the driver from accelerating into a river.

Tesla’s supercomputer collects video from eight cameras surrounding the vehicle at 36 frames per second, which provides an incredible amount of information about the area around the car, Karpathy explained.
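To put that camera rig in perspective, a quick calculation from the quoted figures (frame counts only; the article does not specify resolution or encoding):

```python
# Raw frame rate of the described camera rig: 8 cameras at 36 fps.
cameras = 8
fps = 36

frames_per_second = cameras * fps
print(frames_per_second)          # 288 frames per car per second
print(frames_per_second * 10)     # 2880 frames in a 10-second clip
```

Every second of driving thus yields 288 frames across the eight views, which is why the training pipeline needs storage and bandwidth on the scale Karpathy described.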

While the vision-only approach is more scalable than collecting, creating, and maintaining high-resolution maps everywhere in the world, it is also much more challenging, because the neural networks doing the object detection and driving have to collect and process huge amounts of data at speeds that match the depth- and velocity-sensing capabilities of a human.

After years of research, Karpathy said, he believes the challenge can be met by treating it as a supervised learning problem. Engineers testing the technology found they could drive in sparsely populated areas without intervention, Karpathy said, but the system “definitely struggles a lot more in very adversarial environments like San Francisco.” For the system to work really well and reduce the need for high-resolution maps and additional sensors, it will have to get much better at handling densely populated areas.

One of the Tesla AI team’s game changers was auto-labeling, which allows things like road hazards and other objects to be labeled automatically across the millions of videos captured by vehicles with Tesla cameras. Large AI datasets have often required extensive manual labeling, which is time-consuming, especially when trying to assemble the kind of clean dataset a supervised learning system needs to perform well on a neural network.

With this latest supercomputer, Tesla has accumulated 1 million videos of around 10 seconds each and labeled 6 billion objects with depth, velocity, and acceleration. All of it takes up a whopping 1.5 petabytes of storage. That seems like a tremendous amount, but it will take far more before the company achieves the reliability it needs from an automated driving system that relies entirely on vision, hence Tesla’s pursuit of ever more powerful compute.
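The quoted dataset figures imply some striking averages (a rough sketch; the article does not say how the 1.5 petabytes break down between video and labels, and each clip presumably spans multiple camera views):

```python
# Averages implied by the quoted dataset figures:
# 1M clips of ~10 s each, 6B labeled objects, 1.5 PB total.
videos = 1_000_000
labels = 6_000_000_000
storage_pb = 1.5

hours = videos * 10 / 3600               # total footage in hours
gb_per_clip = storage_pb * 1e6 / videos  # PB -> GB per clip
labels_per_clip = labels / videos

print(f"{hours:.0f} hours of video")             # ~2778 hours
print(f"~{gb_per_clip:.1f} GB per clip")         # ~1.5 GB per 10-second clip
print(f"{labels_per_clip:.0f} labels per clip")  # 6000
```

Roughly 2,800 hours of footage averaging 1.5 GB and 6,000 labeled objects per 10-second clip, which underlines why auto-labeling, rather than manual annotation, was the practical path.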


