At their heart, proteins are much like any other polymers: flexible linear chains of amino-acid monomers drawn from a library of just 20 or so building blocks. But unlike synthetic polymers, which tend to flop around stochastically, proteins reliably fold into characteristic three-dimensional shapes. The diversity of those shapes gives rise to the complexity of the biological world.
Uncovering the relationship between amino-acid sequence and folded structure has been a grand challenge of the past half century, with connections to cell biology, chemistry, biophysics, and medicine. To date, more than 180 000 protein structures have been made available to the world in the Protein Data Bank (PDB). But even that enormous resource barely makes a dent in the tens of millions of proteins known to be encoded by genes across all living species.
In November 2020, as part of the Critical Assessment of Structure Prediction (CASP) project, researchers at DeepMind in London showed that their AlphaFold2 model had made astonishing progress. Given a protein’s amino-acid sequence, AlphaFold2 could often predict its structure with most atomic positions correct to within an angstrom, less than the length of a chemical bond.[1] The team has now released its own database of predicted protein structures, including the complete human proteome and many nonhuman proteins, such as the one in figure 1, whose structures experimenters have yet to resolve.