A transformer-based model that achieves state-of-the-art performance on unsupervised protein structure learning is making waves, with esteemed AI researcher Yann LeCun and others in the machine learning and biology communities celebrating the new study.
Protein engineering, the development of new protein biomolecules, requires a holistic understanding of protein structure. Because sequence variation within a protein family conveys information about the protein's structure, approaches to learning protein structure have tended to fit a separate model to each family of sequences. A prominent example is the Potts model, which is fit independently to the sequences of a single family to model its energy landscape; the more recent Neural Potts Model instead trains a single model with shared parameters to explicitly model energy landscapes across multiple protein families.
More recently, thanks to the availability of large databases of unlabelled protein sequences drawn from across many organisms, a new approach has emerged. Protein language modelling fits large neural networks with shared parameters across millions of diverse sequences, presenting a promising unsupervised approach for distilling the fundamental features of a protein.
Although unsupervised protein language models show strong performance, they take only a single sequence as input at inference time, so the covariation patterns of each family must be stored in the model's parameters, which is why these models require so many of them. Potts models have the advantage here, as they take the full alignment as input and can extract covariation signals from it directly.
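To make the covariation idea concrete, here is a toy Python sketch (illustrative only, and not the Potts-model fitting procedure used in the literature) that measures how strongly two alignment columns co-vary via mutual information over residue co-occurrence frequencies; the function name and the tiny alignment are made up for the example.

```python
# Toy illustration of reading covariation directly from an MSA:
# mutual information between two alignment columns, computed from
# residue co-occurrence frequencies. Not a fitted Potts model.
import numpy as np

def column_mutual_information(msa, i, j, alphabet="ACDEFGHIKLMNPQRSTVWY-"):
    """msa: list of equal-length aligned sequences; i, j: column indices."""
    index = {a: k for k, a in enumerate(alphabet)}
    joint = np.zeros((len(alphabet), len(alphabet)))
    for seq in msa:
        joint[index[seq[i]], index[seq[j]]] += 1     # count residue pairs
    joint /= joint.sum()                             # joint distribution
    pi, pj = joint.sum(axis=1), joint.sum(axis=0)    # marginals
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / np.outer(pi, pj)[nz])).sum())

msa = ["MKLV-", "MRLI-", "MKLV-", "MRLIA"]   # tiny made-up alignment
print(column_mutual_information(msa, 1, 3))  # columns 1 and 3 co-vary (K/V vs R/I)
```

In this toy alignment, a K in column 1 always co-occurs with a V in column 3 (and R with I), so the score is high; a single-sequence model has no access to this signal at inference time and must instead encode it in its weights.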
The proposed Multiple Sequence Alignment (MSA) Transformer combines the two paradigms. Introduced by researchers from UC Berkeley, Facebook AI Research and New York University, the model takes sets of aligned sequences as input but shares parameters across many diverse sequence families.
Generally speaking, the MSA Transformer extends transformer pretraining to operate on MSAs, the algorithmic alignment of related biological sequences such as proteins. Transformers are powerful sequence models because they construct a pairwise interaction map between all positions in a sequence, making them a natural fit for modelling residue-residue contacts. Standard transformers, however, take only a single sequence as input and so cannot draw on an alignment at inference time. To overcome this, the MSA Transformer architecture interleaves attention across the rows and columns of the alignment, as in axial attention, enabling the model to extract covariation information directly from the alignment during inference and improving parameter efficiency.
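As a rough illustration of interleaving row and column attention over an alignment, below is a minimal PyTorch sketch. It is a generic axial-attention toy, not the paper's tied-attention implementation; the tensor layout, module names and hyperparameters are all assumptions made for the example.

```python
# Minimal sketch of axial (row/column) attention over a single MSA,
# represented as a tensor of shape (num_rows, num_cols, embed_dim):
# rows index aligned sequences, columns index alignment positions.
import torch
import torch.nn as nn

class AxialMSABlock(nn.Module):
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        # Row attention: each sequence attends across alignment positions.
        self.row_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Column attention: each position attends across aligned sequences.
        self.col_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (num_rows, num_cols, embed_dim)
        h = self.norm1(x)
        row_out, _ = self.row_attn(h, h, h)          # rows act as the batch
        x = x + row_out
        h = self.norm2(x).transpose(0, 1)            # (num_cols, num_rows, embed_dim)
        col_out, _ = self.col_attn(h, h, h)          # columns act as the batch
        x = x + col_out.transpose(0, 1)
        return x

msa = torch.randn(16, 128, 64)    # 16 aligned sequences of length 128
out = AxialMSABlock()(msa)
print(out.shape)                  # torch.Size([16, 128, 64])
```

Alternating the two attention directions keeps the cost linear in the number of rows plus columns per position, rather than attending over every (sequence, position) pair at once, which is what makes taking a whole alignment as input tractable.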
To assess performance, the researchers trained an MSA Transformer with 100M parameters on a large dataset (4.3 TB) of 26 million MSAs, with an average of 1192 sequences per MSA.
On the task of unsupervised contact prediction, the MSA Transformer model outperformed state-of-the-art transformer protein language models ESM-1b (Rives et al., 2020) and Potts models across all MSA depths by a wide margin.
Turing Award honoree Yann LeCun tweeted that the research represents “huge progress” in protein contact prediction using transformer architectures.
The paper MSA Transformer is on bioRxiv.