AlphaFold 2, Open Source AI for Protein Structure Prediction Technology

On July 15, 2021, a team of scientists published a Nature article entitled “Highly Accurate Protein Structure Prediction with AlphaFold”.1 The article describes how the neural network model developed by Google’s DeepMind can predict protein structures “with atomic accuracy, even if no similar structure is known”.2 In addition, DeepMind has now made the code for AlphaFold 2 available as open source, which enables further collaboration toward even more accurate protein structure prediction.

A protein acquires a highly complex 3-D structure through a process called protein folding, and the task of predicting that structure has been “an important open research problem for more than 50 years.”3 Last year, DeepMind took part in the CASP14 (14th Critical Assessment of Protein Structure Prediction) research competition and, in December 2020, won it with AlphaFold 2, a revised version of AlphaFold. The CASP competitions, known as the “Protein Folding Olympics”,4 have taken place every two years since 1994, and with the development of AlphaFold 2, some believe that the protein folding problem is essentially solved. DeepMind has successfully improved prediction accuracy “by integrating novel neural network architectures and training techniques based on the evolutionary, physical, and geometric constraints of protein structure.”5

AlphaFold inspired other research efforts, which resulted in the publication of another article on July 15, “Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network”.6 The article, by academic researchers, describes how their RoseTTAFold model predicts protein structures with an accuracy close to that of AlphaFold. The model has a three-track network in which “information on the 1D sequence level, 2D distance map level and 3D coordinate plane are successively transformed and integrated.” With this technology, “RoseTTAFold enables solutions to sophisticated X-ray crystallography and cryo-EM modeling problems, provides insight into protein function without experimentally determined structures, and quickly creates accurate models of protein-protein complexes.”

Protein misfolding can lead to various diseases and disorders, so computational tools that provide insight into protein folding are important to drug discovery and development. Predictive models, together with experimental techniques, are intended to help researchers better understand the causes of diseases and develop compounds that could treat them effectively.

In terms of patent protection, London-based DeepMind filed three international (PCT) applications with the same title, Machine Learning for Determining Protein Structures, on September 16, 2019, each claiming priority to the same three U.S. provisional applications, which were filed in September and November 2018.

U.S. provisional applications:

No. 62/734,757, filed September 21, 2018

No. 62/734,773, filed September 21, 2018

No. 62/770,490, filed November 21, 2018

WO2020/058174 includes claims directed to a prediction method, a system and computer storage media. Claim 1 reads as follows.

1. A method performed by one or more data processing devices to determine a final predicted structure of a given protein, the given protein comprising a sequence of amino acids, wherein a predicted structure of the given protein is defined by values of a plurality of structure parameters, the method comprising:

generating a plurality of predicted structures of the given protein, wherein generating a predicted structure of the given protein comprises:

obtaining initial values of the plurality of structure parameters that define the predicted structure;

updating the initial values of the plurality of structure parameters, including, at each of a plurality of update iterations:

determining a quality score that characterizes a quality of the predicted structure that is defined by current values of the structure parameters, the quality score being based on respective outputs of one or more scoring neural networks each configured to process: (i) the current values of the structure parameters, (ii) a representation of the amino acid sequence of the given protein, or (iii) both; and

for one or more of the plurality of structure parameters:

determining a gradient of the quality score with respect to the current value of the structure parameter; and

updating the current value of the structure parameter using the gradient of the quality score with respect to the current value of the structure parameter; and

determining the predicted structure of the given protein to be defined by the current values of the plurality of structure parameters after a final update iteration of the plurality of update iterations; and

selecting a particular predicted structure of the given protein as the final predicted structure of the given protein.

The prediction method of claim 1 generates several predicted structures of a given protein and then selects one of them as the final predicted structure. Generating each predicted structure involves obtaining initial values of the structure parameters that define it and iteratively updating those values. Each update iteration includes the following determination step, which uses neural networks:

“Determining a quality score that characterizes a quality of the predicted structure defined by current values of the structure parameters, the quality score being based on respective outputs of one or more scoring neural networks each configured to process: (i) the current values of the structure parameters, (ii) a representation of the amino acid sequence of the given protein, or (iii) both”
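The procedure recited in claim 1 can be illustrated with a short Python sketch. This is only a toy illustration under stated assumptions, not DeepMind's implementation: the names (TARGET, quality_score, score_gradient, predict_structure, final_predicted_structure), the step size, the iteration counts and the random starting range are all hypothetical, and a simple quadratic function stands in for the scoring neural networks. Picking the highest-scoring candidate is likewise an assumption, since the claim only requires that a particular predicted structure be selected.

```python
import random

# Hypothetical stand-in for a trained scoring neural network. The score is a
# concave function that peaks at TARGET, chosen only so the example runs.
TARGET = [0.5, -1.0, 2.0]  # made-up "ideal" structure parameters


def quality_score(params):
    """Higher is better; the maximum (0.0) is reached when params == TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def score_gradient(params):
    """Analytic gradient of quality_score with respect to each parameter."""
    return [-2.0 * (p - t) for p, t in zip(params, TARGET)]


def predict_structure(rng, n_iters=200, step=0.05):
    """Generate one predicted structure by iterative gradient updates."""
    # Obtain initial values of the structure parameters.
    params = [rng.uniform(-3.0, 3.0) for _ in TARGET]
    for _ in range(n_iters):
        # Gradient of the quality score w.r.t. the current parameter values.
        grad = score_gradient(params)
        # Move each parameter uphill (gradient ascent on the score).
        params = [p + step * g for p, g in zip(params, grad)]
    # The structure is defined by the values after the final iteration.
    return params


def final_predicted_structure(n_candidates=5, seed=0):
    """Generate several predicted structures and select one as final."""
    rng = random.Random(seed)
    candidates = [predict_structure(rng) for _ in range(n_candidates)]
    # Assumption: select the candidate with the highest quality score.
    return max(candidates, key=quality_score)


if __name__ == "__main__":
    best = final_predicted_structure()
    print("final parameters:", best)
    print("quality score:", quality_score(best))
```

In a real system the gradient would typically be obtained by backpropagating through the scoring networks rather than from a closed-form expression, but the control flow (initialize, score, take a gradient step, repeat, then select among candidates) follows the steps listed in the claim.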

Claim 1 therefore recites the general functions of the neural networks, but not specific neural network architectures. As in Ed Garlepp’s discussion of the unique disclosure problems with AI, the claim treats the neural network more like a “black box”, although DeepMind is believed to have been developing novel network architectures. This claim is a good example of the balance that patent practitioners need to strike when drafting claims that involve a neural network.

We note that the PCT application was filed long before DeepMind conducted its more extensive studies for CASP14, in which it took on the challenge of modeling various unknown protein structures that were provided from May to August 2020. During the pandemic, the team worked on predicting the structure of SARS-CoV-2 Orf8, one of the coronavirus proteins. Given the grave circumstances, DeepMind released the results as soon as they were obtained. DeepMind’s patent strategy may have shifted towards an open strategy as a result of such work, leading to the recent release of the details of its technology, with the source code made available under an open source license.

We look forward to following the prosecution of this patent application and the general development of this technology.

Footnotes

1 Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature https://doi.org/10.1038/s41586-021-03819-2 (2021).

2 Id., Abstract.

3 Id.

4 DeepMind (2020). AlphaFold: A Scientific Breakthrough [Video]. YouTube. https://www.youtube.com/watch?v=gg7WjuFs8F4

5 Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature https://doi.org/10.1038/s41586-021-03819-2 (2021).

6 Baek, M. et al. Science 10.1126/science.abj8754 (2021).

The content of this article is intended to provide general guidance on the subject. Expert advice should be sought regarding your specific circumstances.

