Proteins are the molecules that carry out many of the most important functions inside cells, such as signaling, accelerating chemical reactions, and transporting and storing other molecules. They are chains of amino acids that start folding while the chain is still being produced, settling into the shape with the lowest energy.
The big problem is that predicting the structure of a protein from its amino acid sequence is extremely hard. By one estimate, a typical protein could in principle adopt about 10¹⁴³ possible conformations. This is the core of what is called Levinthal's paradox: a protein that tried these conformations one by one would never fold in any reasonable time, yet real proteins fold quickly.
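Where does a number like 10¹⁴³ come from? The short sketch below reproduces it with a back-of-the-envelope count, under assumptions that are not from the article: a 150-residue protein, two backbone torsion angles per residue, and three stable states per angle, which gives 3³⁰⁰ ≈ 10¹⁴³ conformations.

```python
import math

# Hypothetical assumptions for a Levinthal-style estimate (not from the article):
N_RESIDUES = 150          # length of a "typical" small protein
ANGLES_PER_RESIDUE = 2    # backbone torsion angles (phi and psi) per residue
STATES_PER_ANGLE = 3      # stable states assumed for each torsion angle


def log10_conformations(n_residues: int,
                        angles_per_residue: int = ANGLES_PER_RESIDUE,
                        states_per_angle: int = STATES_PER_ANGLE) -> float:
    """Return log10 of states_per_angle ** (n_residues * angles_per_residue)."""
    n_angles = n_residues * angles_per_residue
    # Work in log space so the result is directly the exponent of 10.
    return n_angles * math.log10(states_per_angle)


exponent = log10_conformations(N_RESIDUES)
print(f"about 10^{exponent:.0f} conformations")  # about 10^143 conformations
```

Changing any of the three assumptions moves the exponent a lot, so the figure is best read as an order-of-magnitude illustration rather than a precise count.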
The challenge
Every two years there is a challenge called CASP (Critical Assessment of protein Structure Prediction), in which many research groups compete to predict, as accurately as possible, the structures of a set of proteins from their amino acid sequences.
The main metric used to compare the algorithms is called GDT_TS, which ranges from 0 to 100. I will not explain it in detail here, but in a recent blog post [7], Mohammed AlQuraishi explained that the scores can be roughly interpreted as follows:
- 20: corresponds to a random prediction
- 50: the general topology is right
- 70: the topology is accurate
- 90: the details are mostly right
Based on the previous edition, CASP13 in 2018, AlQuraishi predicted that we would reach a score of around 80 within 4 years, while he thought a score of 90 would only be possible in about 10 years.
Well, just 2 years after this prediction, AlphaFold2, a deep learning algorithm developed by DeepMind, reached a median score of 92.4!
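For readers curious about what lies behind these GDT_TS numbers: the score averages, over the cutoffs 1, 2, 4 and 8 Å, the percentage of a protein's Cα atoms that lie within that distance of their position in the experimentally determined structure. The sketch below is a simplified version that assumes the predicted and experimental structures are already superimposed and list the same residues in the same order; the official scoring instead searches over many superpositions to maximise each percentage. The function and variable names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# A point is an (x, y, z) C-alpha coordinate in angstroms.
Point = Tuple[float, float, float]

# Distance cutoffs (in angstroms) that define GDT_TS.
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Simplified GDT_TS score in [0, 100].

    Assumes `predicted` and `reference` list the C-alpha atoms of the same
    residues in the same order and are ALREADY superimposed. The official
    score instead searches over many superpositions to maximise each fraction.
    """
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("expected two non-empty coordinate lists of equal length")
    # Distance between each predicted atom and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in CUTOFFS_ANGSTROM]
    # GDT_TS is the average of the four fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(fractions)


reference = [(0.0, 0.0, 0.0)] * 4
predicted = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

Because each cutoff contributes a quarter of the score, a residue that is 3 Å away from its true position still earns half credit, while one that is more than 8 Å away earns none.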
Too good to be true?
As any sane data scientist would, my first thought was that the result was probably overhyped, and that the news simply failed to mention some caveat showing that the results were not as good as they seemed.
However, while a considerable part of the research community is usually skeptical of such announcements, in this case I could not find a single negative opinion from anyone in the field of structural biology.
For example, could the headline score hide very uneven results, with some proteins predicted perfectly and others very poorly? As Mohammed AlQuraishi shows, AlphaFold2 outperformed all the other competitors by a large margin on almost every protein in the challenge.
Another natural question is whether this particular edition of CASP was easier than the others. The organizers showed that it was actually one of the hardest and, in any case, targets that were easy for AlphaFold2 would have been easy for the other groups too, yet AlphaFold2 still beat them by a wide margin.
Is protein folding solved?
DeepMind and some of the CASP organizers announced that protein folding is basically solved. Is this true? Well, it depends on how you define the term "solved".
If we are strict about definitions, we are nowhere near solving protein folding. The breakthrough concerns predicting a protein's final structure, but how the protein actually reaches that structure through the folding process is still an open problem.
Even structure prediction is not perfect: there are many corner cases in which AlphaFold2 is not accurate, and it will take years to cover all of them. But, as Mohammed AlQuraishi argues, the problem can be considered solved because it has turned from a research problem, where you don't know whether a solution exists, into an engineering problem, where you know that a solution exists but have not fully reached it yet with the current resources.
It's too early to predict exactly what this achievement will bring, but it is clear that, depending on DeepMind's next moves, it will greatly accelerate progress in the field. Now that this part of the foundations is in place, we can start building the higher floors of the structural biology tower and ask deeper and more complex questions, which will likely further increase the field's impact on our everyday lives.
We have covered only four of the most popular events of 2020, spanning a wide range of applications, from NLP to structural biology. Choosing them was hard, because there were many other interesting breakthroughs this year. Many of them are having a massive impact on our lives, showing the maturity of the AI field and the opportunities it offers.
Let me know which other topics shook your 2020!