Here’s why deep learning might not be enough for celebrity facial recognition


Deep learning is a technology that continues to excite both industry experts and people outside the field. The idea of something learning all by itself is intriguing enough to make people wonder if there’s a limit to its abilities.

Unfortunately, as with all things, a closer look shows that deep learning cannot be the solution to every problem.

So if you are a streaming platform, TV station or other media company trying to automate celebrity facial recognition in video content, I’m here to tell you there are better ways to do that than relying on deep learning alone.

Celebrity facial recognition application examples

First things first: why would you automate celebrity facial recognition at all?

Companies involved in the production and distribution of video content can derive a lot of value here.

For example, celebrity face metadata can help video companies categorize and manage all of their content more conveniently. Software that automates celebrity facial recognition makes it easy to find any video that features a specific celebrity.

A single keyword search is all it takes.

There are benefits for audiences too – celebrity facial recognition allows for a better viewing experience.

With this data, a streaming service or any other video platform can filter its content by specific celebrities or actors. This makes it easier for viewers to discover new content featuring well-known performers.

Another important feature enabled by celebrity facial recognition technology is on-screen performer data. A viewer presses pause and sees who is on the screen at that moment: the names of the actors, the names of the characters they play and any other useful information.
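To make these two features a little more tangible, here is a minimal Python sketch of how face metadata could back both the keyword search and the pause lookup. Everything in it is an assumption made for illustration: the FaceTag record, its fields, the sample titles and the two helper functions are hypothetical and not taken from any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceTag:
    """One recognized appearance of a performer (hypothetical record layout)."""
    title: str
    actor: str
    character: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video


# Made-up sample metadata, purely for illustration
TAGS = [
    FaceTag("Movie One", "Actor A", "Captain", 0.0, 95.0),
    FaceTag("Movie One", "Actor B", "Pilot", 60.0, 120.0),
    FaceTag("Movie Two", "Actor A", "Detective", 10.0, 50.0),
]


def titles_featuring(tags, actor):
    """Keyword search: every title in which the given actor appears."""
    wanted = actor.lower()
    return sorted({t.title for t in tags if t.actor.lower() == wanted})


def on_screen_at(tags, title, second):
    """Pause lookup: (actor, character) pairs visible at the given moment."""
    return [
        (t.actor, t.character)
        for t in tags
        if t.title == title and t.start <= second <= t.end
    ]
```

Calling titles_featuring(TAGS, "actor a") lists both sample titles, and on_screen_at(TAGS, "Movie One", 70.0) returns both performers visible at that second. A real library would keep this data in a search index or database rather than a Python list.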

With that settled, let’s talk about how you can automate the process of celebrity face recognition.

Traditional facial recognition with deep learning

When you run into the problem of facial recognition, deep learning seems to be a pretty good fit.

After all, neural networks have proven to be better than humans at recognizing faces in images and videos. The difference between the capabilities of machines and humans becomes even clearer on a larger scale.

A neural network does this by describing each detected face as a vector and using that data to recognize the same face in other scenes of the footage.

Let’s look at the process of face recognition using a neural network.

As I mentioned earlier, it all starts with a neural network that analyzes the video content and represents each captured face as a descriptor vector. The descriptor carries the data about the unique features of that face, which the network can use to recognize it later.
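To make the matching step concrete, here is a minimal Python sketch of comparing descriptors. It assumes the network has already turned each face into a vector. The tiny three-number vectors, the names in GALLERY, the choice of cosine similarity and the 0.8 threshold are all illustrative assumptions, not details of any particular system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector resembles nothing
    return dot / (norm_a * norm_b)


def identify(descriptor, gallery, threshold=0.8):
    """Return the best-matching name in the gallery, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = cosine_similarity(descriptor, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy reference descriptors (assumed values); real networks emit far longer vectors
GALLERY = {
    "Actor A": [0.9, 0.1, 0.2],
    "Actor B": [0.1, 0.8, 0.5],
}
```

With these toy values, identify([0.85, 0.15, 0.25], GALLERY) returns "Actor A", while a descriptor that resembles nobody in the gallery returns None.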

And that’s it.

But we need more.

Deep learning drops the ball in celebrity facial recognition

The process I just described works great for finding out who is who in an image or video.

But when we talk about practical video content analysis, in most cases we don’t analyze simple videos. For many companies, this content includes movies, sports, TV series, news programs, talk shows, and so on.

This is where deep learning stops being the best-fitting solution.

The thing is, all of these types of content contain many faces; too many faces, to be honest. There are extras, spectators and other participants who are not important.

The importance of a particular face is what deep learning cannot judge. Deciding between the main and supporting characters in a film or TV show is beyond its abilities.

But I think we can all agree that this information is crucial. “Movies starring Chris Hemsworth” is a better defined category than “Movies with Extra #3”.

But how can you ensure that the technology understands the context of the analyzed video content to identify major and minor characters?

Celebrity facial recognition: adding the decision factor

So, more sophisticated analysis of video content requires celebrity facial recognition that goes beyond the narrow approach used by deep learning: “look at that thing – remember that thing – recognize that thing”.

After all, we are striving to automate the entire process of video analysis.

So we need the software to make decisions: identify all the faces in the footage and distinguish between major and minor characters.

To achieve this, we add some technical magic, the decision factor, so that celebrity facial recognition can be performed in context.

This magic involves mathematical algorithms and machine learning and does the following:

  1. Tags and identifies all characters appearing in the video;
  2. Analyzes the plot to distinguish major, minor and other characters;
  3. Marks the frame that represents each major, minor, and other character;
  4. Finds every single scene the characters appear in;
  5. Chooses the most representative scene for a given character;
  6. Optionally generates film posters featuring a selected celebrity, aimed at different target groups.
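
As a rough illustration of steps 2, 4 and 5, here is a minimal Python sketch. Note that it swaps in a much simpler stand-in for plot analysis: it ranks characters by total screen time relative to the most visible character. The Appearance record, the summarize function and the 0.5 and 0.1 cut-offs are hypothetical and chosen purely for illustration; a real decision layer would weigh far more context than screen time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Appearance:
    """One recognized face in one scene (hypothetical record layout)."""
    character: str  # identity label produced by the recognition step
    scene: int      # index of the scene in the video
    seconds: float  # how long the face is visible in that scene


def summarize(appearances, major_share=0.5, minor_share=0.1):
    """Assign a role, a scene list and a representative scene to every character."""
    # Screen time per character per scene
    per_scene = defaultdict(lambda: defaultdict(float))
    for a in appearances:
        per_scene[a.character][a.scene] += a.seconds
    if not per_scene:
        return {}

    totals = {c: sum(scenes.values()) for c, scenes in per_scene.items()}
    top = max(totals.values())

    summary = {}
    for character, scenes in per_scene.items():
        # Share of screen time relative to the most visible character
        share = totals[character] / top if top > 0 else 0.0
        if share >= major_share:
            role = "major"
        elif share >= minor_share:
            role = "minor"
        else:
            role = "other"
        summary[character] = {
            "role": role,
            "scenes": sorted(scenes),  # every scene the character appears in
            # Heuristic: the scene with the most screen time represents the character
            "representative_scene": max(scenes, key=scenes.get),
        }
    return summary


# Made-up sample footage, purely for illustration
FOOTAGE = [
    Appearance("Lead", 1, 40.0),
    Appearance("Lead", 2, 90.0),
    Appearance("Lead", 3, 30.0),
    Appearance("Sidekick", 2, 25.0),
    Appearance("Sidekick", 3, 10.0),
    Appearance("Extra #3", 1, 4.0),
]
```

On this sample, summarize(FOOTAGE) labels "Lead" as major, "Sidekick" as minor and "Extra #3" as other, and picks scene 2 as the representative scene for the lead.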

And that’s it. Here, deep learning can only handle the narrow part: face detection and recognition.

The decision part – like identifying characters relevant to the story, choosing representative scenes, and so on – can only be performed by more complex technology.

Bottom line

Celebrity facial recognition technology offers significant benefits to media companies that deal with a lot of video content.

It can make content library management easier by allowing categorization based on performers, and even give audiences a better viewing experience by allowing them to easily explore content featuring their favorite celebrity.

And of course, it helps them recall the name of that actor they can’t quite remember.

I don’t think we need to justify automating such a process: manually tagging each piece of content within a vast library would take an unfathomable amount of time.

So we turn to technology to do the heavy lifting for us. And deep learning is the first thing that springs to mind for many people.

But the thing is that deep learning is unable to understand what it recognizes. Therefore, it cannot fully automate the process of celebrity face recognition because it cannot tell the difference between main and secondary characters. Labeling every face would be inefficient.

That’s why you need a more complex cocktail of technologies to handle celebrity facial recognition. Let the neural networks do what they do best, and add more sophisticated algorithms to delve deeper into the context of the analyzed footage and get better results.

So you get “Movies with Chris Hemsworth” instead of “Movies with Extra #3”.

