When researchers don’t have the proteins they need, they can trick AI into “hallucinating” new structures

(The Conversation is an independent, non-profit source of news, analysis, and commentary from academic experts.)

(THE CONVERSATION) All living organisms use proteins, a vast and varied group of complex molecules. They perform a wide range of functions, from allowing plants to use solar energy to produce oxygen, to helping your immune system fight off pathogens, to enabling your muscles to do physical work. Many drugs are also based on proteins.

For many areas of biomedical research and drug development, however, there are no natural proteins that can serve as suitable starting points for the construction of new proteins. Researchers developing new drugs to prevent COVID-19 infection, or developing proteins that can turn genes on or off, or turn cells into computers, have had to create new proteins from scratch.

It can be difficult to get this de novo protein design process right. Protein engineers like me have been trying to find ways to make new proteins with the properties we need more efficiently and accurately.


Fortunately, a form of artificial intelligence called deep learning may provide an elegant way to create proteins that have never existed before: hallucination.

Designing proteins from scratch

Proteins are made up of hundreds to thousands of smaller building blocks called amino acids. These amino acids are linked together in long chains that fold into a protein. The order in which these amino acids are linked determines the unique structure and function of each protein.

The main challenge protein engineers face when designing new proteins is coming up with a protein structure that performs a desired function. To get around this problem, researchers typically create design templates based on naturally occurring proteins with a similar function. These templates provide instructions on how to create the unique folds of each protein. However, because a template has to be created for each individual fold, this strategy is time-consuming, labor-intensive, and limited by the proteins available in nature.

In recent years, various research groups, including the lab I work in, have developed a number of dedicated deep neural networks, computer programs that use multiple layers of processing to “learn” from input data in order to make predictions about a desired output.

When the desired output is a new protein, millions of parameters describing different facets of a protein are put into the network. The network is then given a randomly chosen sequence of amino acids and predicts the most likely 3D structure for that sequence.

Network predictions for a random amino acid sequence are fuzzy, meaning that the predicted structure is poorly defined. Predictions for both naturally occurring proteins and proteins designed from scratch, by contrast, show much more sharply defined structures.
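
One way to put a number on “fuzzy” versus “well-defined” is sketched below. It rests on an assumption the article does not spell out: that the network's prediction for each pair of amino acids is a probability distribution over a few distance bins. Under that assumption, a flat distribution (every distance equally likely) is fuzzy and a peaked one is sharp, and average Shannon entropy is one simple measure of the difference. The function name and the example numbers are hypothetical.

```python
import math

def mean_entropy(distributions):
    """Average Shannon entropy, in bits, over a list of probability distributions.

    Each distribution stands for a hypothetical prediction of how far apart
    one pair of amino acids is. Lower entropy means a sharper prediction.
    """
    total = 0.0
    for probs in distributions:
        # Zero-probability bins contribute nothing to the entropy.
        total -= sum(p * math.log2(p) for p in probs if p > 0)
    return total / len(distributions)

# Made-up predictions for three residue pairs over four distance bins.
fuzzy_prediction = [[0.25, 0.25, 0.25, 0.25]] * 3   # all distances equally likely
sharp_prediction = [[0.97, 0.01, 0.01, 0.01]] * 3   # one distance clearly favored

print(mean_entropy(fuzzy_prediction))
print(mean_entropy(sharp_prediction))
```

The fuzzy example scores higher (more uncertainty) than the sharp one. A real design pipeline may well use a different measure; entropy is used here only because it is easy to compute and interpret.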

Hallucinating new proteins

These observations suggest one way in which new proteins can be generated from scratch – by optimizing random inputs into the network until predictions reveal a well-defined structure.

The protein generation method developed by my colleagues and me is conceptually similar to computer vision methods such as Google’s DeepDream, which finds and enhances patterns in images.

These methods work by inverting networks trained to recognize human faces or other patterns in images, such as the shape of an animal or object, so that they learn to find those patterns where they do not exist. In DeepDream, for example, the network is given arbitrary input images, which are adjusted until the network recognizes a face or another shape in them. While the final image might not look much like a face to a person, it would to the neural network.

The products of this technique are often referred to as hallucinations, and that is what we call our designed proteins.

Our method begins by passing a random amino acid sequence through a deep neural network. The resulting predictions are initially fuzzy, with unclear structures, as is to be expected for random sequences. Next, we introduce a mutation that changes one amino acid in the chain to another and pass the new sequence through the network again. If this change gives the protein a more defined structure, we keep the new amino acid and introduce another mutation into the sequence.

With each iteration of this process, the predicted proteins come closer to the shape they would actually take if made in nature. It takes thousands of repetitions to create a brand new protein.
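
The mutate-and-keep loop described above can be sketched in a few lines of Python. This is a toy illustration, not our lab's implementation: the function `toy_sharpness` is a hypothetical stand-in for the neural network and has no biological meaning (it simply rewards an arbitrary alternating pattern), and the sequence length, step count, and all names are assumptions chosen for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
HYDROPHOBIC = set("AILMFVW")          # arbitrary subset used only by the toy score

def toy_sharpness(seq):
    """Hypothetical stand-in for the network's 'how well-defined is it' score.

    Returns the fraction of positions that fit an arbitrary alternating
    pattern (hydrophobic at even positions, anything else at odd ones).
    A real pipeline would call a structure-prediction network here.
    """
    hits = sum((aa in HYDROPHOBIC) == (i % 2 == 0) for i, aa in enumerate(seq))
    return hits / len(seq)

def hallucinate(length=30, steps=2000, seed=0):
    """Start from a random sequence and keep only mutations that raise the score."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    start_score = best_score = toy_sharpness(seq)
    for _ in range(steps):
        pos = rng.randrange(length)          # pick one position in the chain
        previous = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)   # mutate it to a random amino acid
        score = toy_sharpness(seq)
        if score > best_score:
            best_score = score               # sharper: keep the mutation
        else:
            seq[pos] = previous              # not sharper: undo the mutation
    return "".join(seq), start_score, best_score

sequence, start_score, final_score = hallucinate()
print(sequence, start_score, final_score)
```

Because a mutation is kept only when the score improves, the final score can never be lower than the starting one. Swapping `toy_sharpness` for a real network's sharpness measure is the only conceptual change needed, though each evaluation then becomes far more expensive.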

Using this procedure, we generated 2,000 new protein sequences that were predicted to fold into well-defined structures. Of these, we selected over 100 that were the most distinct in shape to physically recreate in the laboratory. Finally, we selected three of the top candidates for detailed analysis and confirmed that they came very close to the shapes predicted by our hallucinated models.

Why hallucinate new proteins?

Our hallucination approach greatly simplifies the protein design pipeline. By eliminating the need for templates, researchers can focus directly on creating a protein based on its desired functions and let the network do the work of determining the structure for them.

Our work opens up several avenues for researchers to explore. Our laboratory is currently investigating how this hallucination approach can best be used to achieve even more specificity in the function of designed proteins. Our approach can also be easily extended to design new proteins using other recently developed deep neural networks.

The potential uses of de novo proteins are enormous. With deep neural networks, researchers will be able to create even more proteins that can break down plastics to reduce pollution, identify and respond to unhealthy cells, and improve vaccines against existing and new pathogens – to name just a few.

This article is republished from The Conversation under a Creative Commons license. Read the original article here: https://theconversation.com/when-researchers-dont-have-the-proteins-they-need-they-can-get-ai-to-hallucinate-new-structures-173209.

