London, 30 November 2020 - In a major scientific advance, the latest version of DeepMind's AI system AlphaFold has been recognised as a solution to the 50-year-old grand challenge of protein structure prediction, often referred to as the 'protein folding problem', according to a rigorous independent assessment. This breakthrough could significantly accelerate biological research over the long term, unlocking new possibilities in disease understanding and drug discovery among other fields.

Today, results from CASP14 show that DeepMind's latest AlphaFold system achieves unparalleled levels of accuracy in structure prediction. The system is able to determine highly accurate structures in a matter of days. CASP, the Critical Assessment of protein Structure Prediction, is a biennial community-run assessment started in 1994, and the gold standard for assessing predictive techniques. Participants must blindly predict the structures of proteins that have only recently - or in some cases not yet - been experimentally determined, and wait for their predictions to be compared to experimental data.

CASP uses the "Global Distance Test (GDT)" metric to assess accuracy, on a scale from 0 to 100. The new AlphaFold system achieves a median score of 92.4 GDT overall across all targets. The system's average error is approximately 1.6 Angstroms - about the width of an atom. According to Professor John Moult, Co-founder and Chair of CASP, a score of around 90 GDT is informally considered to be competitive with results obtained from experimental methods.
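To give a feel for what a GDT score measures, here is a minimal Python sketch. It rests on assumptions not stated in this release: it uses the commonly cited GDT_TS form (the average, over distance cutoffs of 1, 2, 4 and 8 Angstroms, of the percentage of residues whose predicted position lies within the cutoff of the experimental position), and it assumes the two structures have already been superimposed. The official CASP evaluation also searches over many superpositions, which this sketch omits. The function name and the example coordinates are hypothetical.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed lists of (x, y, z) positions.

    Assumption: one position per residue (for example the alpha carbon),
    listed in the same order in both structures, in Angstroms.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")

    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # Fraction of residues within each cutoff, averaged and scaled to 0-100.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical four-residue example: errors of 0.5, 1.5, 3.0 and 9.0 Angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
score = gdt_ts(predicted, experimental)
```

A perfect prediction scores 100 under this definition, and a prediction with every residue more than 8 Angstroms away scores 0.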

We have been stuck on this one problem - how do proteins fold up - for nearly 50 years. To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts wondering if we'd ever get there, is a very special moment.
Professor John Moult, Co-Founder and Chair of CASP, University of Maryland


Why protein structure prediction matters

Proteins are essential to life and their shapes are closely linked with their functions. The ability to predict protein structures accurately enables a better understanding of what they do and how they work. There are currently over 200 million proteins in the main database and only a fraction of their 3D structures have been mapped out.

A major challenge is the astronomical number of ways a protein could theoretically fold before settling into its final 3D structure. Many of the greatest challenges facing society, like developing treatments for diseases or finding enzymes that break down industrial waste, are fundamentally tied to proteins and the role they play. Determining protein shapes and functions is a major field of scientific research, primarily using experimental techniques that can take years of painstaking and laborious work per structure, and require the use of multi-million dollar specialised equipment.

DeepMind's approach to the protein folding problem

This breakthrough builds on DeepMind's first entry at CASP13 in 2018, where the initial version of AlphaFold achieved the highest level of accuracy among all participants. Now, DeepMind has developed new deep learning architectures for CASP14, drawing inspiration from the fields of biology, physics, and machine learning, as well as the work of many scientists in the protein folding field over the past half-century.

A folded protein can be thought of as a "spatial graph", where residues are the nodes and edges connect the residues in close proximity. This graph is important for understanding the physical interactions within proteins, as well as their evolutionary history. For the latest version of AlphaFold used at CASP14, DeepMind created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it's building. It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph.

By iterating this process, the system develops strong predictions of the underlying physical structure of the protein. Additionally, AlphaFold can predict which parts of each predicted protein structure are reliable using an internal confidence measure.
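The "spatial graph" idea described above can be illustrated with a short Python sketch. This is not AlphaFold's implementation, whose details are not given in this release. It only shows the data structure: residues as nodes, with an edge between any two residues that lie close together in 3D. The 8 Angstrom cutoff, the use of one coordinate per residue, and the function name are assumptions made for illustration.

```python
import math
from itertools import combinations

def build_spatial_graph(coords, cutoff=8.0):
    """Return an adjacency map {residue index: set of neighbouring indices}.

    coords: list of (x, y, z) positions in Angstroms, one per residue.
    cutoff: hypothetical distance threshold below which two residues
            are treated as being in close proximity.
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            # Edges are undirected, so record both directions.
            graph[i].add(j)
            graph[j].add(i)
    return graph

# Hypothetical example: three residues close together and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
graph = build_spatial_graph(coords)
```

In this toy example the first three residues are all connected to one another, while the fourth has no neighbours within the cutoff.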

The system was trained on publicly available data consisting of ~170,000 protein structures from the Protein Data Bank, using a relatively modest amount of compute by modern machine learning standards - approximately 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks.

Potential for real world impact

DeepMind is excited to collaborate with others to learn more about AlphaFold's potential, and the AlphaFold team is looking into how protein structure predictions could contribute to understanding of certain diseases with a few specialist groups.

There are also signs that protein structure prediction could be useful in future pandemic response efforts, as one of many tools developed by the scientific community. Earlier this year, DeepMind predicted several protein structures of the SARS-CoV-2 virus, and impressively quick work by experimentalists has now confirmed that AlphaFold achieved a high degree of accuracy on its predictions.

AlphaFold is one of DeepMind's most significant advances to date. But as with all scientific research, there's still much to be done, including figuring out how multiple proteins form complexes, how they interact with DNA, RNA, or small molecules, and how to determine the precise location of all amino acid side chains.

As with its earlier CASP13 AlphaFold system, DeepMind is planning to submit a paper detailing the workings of this system to a peer-reviewed journal in due course, and is simultaneously exploring how best to provide broader access to the system in a scalable way.

AlphaFold breaks new ground in demonstrating the stunning potential for AI as a tool to aid fundamental scientific discovery. DeepMind looks forward to collaborating with others to unlock that potential.

 

This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research.
Professor Venki Ramakrishnan, Nobel Laureate and President of the Royal Society

 

The ultimate vision behind DeepMind has always been to build AI and then use it to help further our knowledge about the world around us by accelerating the pace of scientific discovery. For us AlphaFold represents a first proof point for that thesis. This advance is our first major breakthrough in a long-standing grand challenge in science, which we hope will have a big real-world impact on disease understanding and drug discovery.
Demis Hassabis, PhD, Founder and CEO, DeepMind

 

This is an incredible AI-powered breakthrough in protein folding, which will help us better understand one of life's most fundamental building blocks. This huge leap forward from DeepMind has immediate practical implications, enabling researchers to tackle new and difficult problems, from future pandemic response to environmental sustainability.
Sundar Pichai, CEO, Google and Alphabet

 



Media contact

press@deepmind.com

 

About DeepMind

DeepMind is a multidisciplinary team of scientists, engineers, machine learning experts and more, working together to research and build safe AI systems that learn how to solve problems and advance scientific discovery for all.

Best known for developing AlphaGo, the first program to beat a world champion at the complex game of Go, DeepMind has published over 1000 research papers - including more than a dozen in Nature and Science - and achieved breakthrough results in many challenging AI domains from StarCraft II to protein folding.

DeepMind was founded in London in 2010, and joined forces with Google in 2014 to accelerate its work. Since then, its community has expanded to include teams in Alberta, Montreal, Paris, and Mountain View in California.


EurekAlert