Traditionally, determining protein structures has been a highly experimental and laborious scientific discipline. It involved a significant amount of trial-and-error laboratory work and complex experimental methods such as X-ray crystallography, cryo-EM or nuclear magnetic resonance (NMR) spectroscopy. Very often, years of work went into resolving just one protein structure.
Has AlphaFold made all of that work obsolete? Well… partly, but not completely. A quick read through the many success stories of AlphaFold could create the impression that experimental structure determination is now a thing of the past, but that is not the case. AlphaFold represents a gigantic step forward for structural biology, yet it will not fully replace the experimental side of this discipline. In most cases, AlphaFold provides reliable structure predictions based on the amino acid sequence of a protein, but analyses such as X-ray crystallography, cryo-EM or NMR will remain essential to gain insight into the actual working mechanisms and the interactions of proteins.
AlphaFold works from the amino acid sequence alone. This means that the tool is not capable of predicting metal ions, cofactors or ligands, although they are of great importance to the folding of many proteins (including haemoglobin, the oxygen-binding protein in red blood cells).
Post-translational modifications (PTMs) of proteins, such as phosphorylation or glycosylation, can significantly impact protein structures. But again, AlphaFold deals with amino acids, nothing else – so PTMs are not considered in AlphaFold’s structure predictions.
Proteins are found in both stable, low-energy states and unstable, high-energy states. The latter states usually only exist for fractions of a second, but they can have key biological roles. AlphaFold only predicts one structure of a protein, and there is no information about the state that it corresponds to.
For amino acid sequences that show very little similarity with sequences in the databases that were used for training AlphaFold, the algorithm predicts ‘random loops’ in the protein structure.
For protein regions that are natively unfolded, the same issue of random loops occurs.
The multimer version of AlphaFold still has limitations when it comes to larger sequences or multimers with more than two protein domains. The accuracy of these predictions is highly variable, so whether a predicted structure is useful has to be evaluated case by case.
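One practical way to do that evaluation is to look at the per-residue confidence score (pLDDT, on a scale from 0 to 100) that AlphaFold writes into the B-factor column of its PDB output. The sketch below is only an illustration, not an official AlphaFold tool: the function name `low_confidence_regions` is made up for this post, and the cut-off of 50 is an assumption based on the commonly used "very low confidence" band. It scans the alpha-carbon records of a PDB file and reports stretches of consecutive residues that fall below the cut-off, which is where the random loops described above tend to show up.

```python
def low_confidence_regions(pdb_text, threshold=50.0):
    """Return (chain, first_residue, last_residue) runs with CA pLDDT < threshold.

    Assumes a standard fixed-column PDB file in which the B-factor field
    (columns 61-66) holds the pLDDT score, as in AlphaFold output.
    The threshold of 50 is an illustrative assumption.
    """
    regions = []
    current = None  # [chain, first_residue, last_residue] of the open run

    for line in pdb_text.splitlines():
        # Use one atom per residue: the alpha carbon (CA).
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue

        chain = line[21]
        residue_number = int(line[22:26])
        plddt = float(line[60:66])

        if plddt < threshold:
            extends_run = (
                current is not None
                and current[0] == chain
                and residue_number == current[2] + 1
            )
            if extends_run:
                current[2] = residue_number
            else:
                if current is not None:
                    regions.append(tuple(current))
                current = [chain, residue_number, residue_number]
        elif current is not None:
            # A confident residue closes the open low-confidence run.
            regions.append(tuple(current))
            current = None

    if current is not None:
        regions.append(tuple(current))
    return regions
```

Calling `low_confidence_regions(open("model.pdb").read())` on a downloaded prediction (the file name is a placeholder) returns a list of residue ranges worth a closer look. A low score does not say why a region is uncertain: it may be natively unfolded, or it may simply lack similar sequences in the training data.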
Although AI-driven structure prediction is a massive leap forward in generating structures very quickly, there are still significant experimental efforts required to truly understand all the mechanistic details of a protein’s functionality. And these details are fundamental to understanding biology as well as disease and drug mechanisms.