Whether you're a seasoned molecular biologist or a researcher in an adjacent field, you've likely come across AlphaFold2 - a game-changing deep learning model for predicting protein structures. In this three-part blog series, we delve into the ins and outs of AlphaFold2, from its fundamental workings to expert-level tips and tricks that can help you unlock its full potential. By the end of this series, you'll have a comprehensive understanding of this powerful tool and be well-equipped to utilize its capabilities for your own research.
Comparison between the experimental and AlphaFold2-predicted structures of an antifreeze protein from Choristoneura fumiferana (spruce budworm).
A brief history of protein structure prediction
Accurately solving protein structures has always been one of the most challenging tasks in biology. Structures reveal vital information about a protein's function, properties, and interactions with other molecules. The problem is that the experiments traditionally used to solve them are time-consuming, expensive, and limited in the kinds of proteins they can handle.
Commonly used methods like X-ray crystallography have been around since 1912 but suffer from critical drawbacks that can limit their application for many researchers. Another common technique is nuclear magnetic resonance (NMR) spectroscopy, which places the sample in a strong magnetic field and measures how atomic nuclei respond to radio-frequency pulses, rather than measuring how a crystal diffracts X-rays.
Graphical representation of the SCXRD technique (Figure: Nea Möttönen) For more details on X-ray crystallography visit this page.
Pros of X-ray Crystallography
- Can yield high atomic resolution.
- Produces two-dimensional diffraction patterns from which the three-dimensional structure of the material is reconstructed.
Cons of X-ray Crystallography
- Very expensive. A single protein sample typically costs several thousand dollars (and this is for ideal cases).
- Doesn't guarantee results.
- Requires specialized personnel.
- The sample must be crystallizable.
- Struggles with proteins containing trans-membrane domains.
- Large molecules can be difficult to crystallize due to their molecular weight and relatively poor solubility.
- An organized single crystal must be obtained to produce the desired diffraction.
- Static method: because the sample must be prepared and crystallized, only a static three-dimensional snapshot is produced.
- X-ray Crystallography has limited applications for studies of biological samples due to the aforementioned issues.
Pros of NMR Spectroscopy
- Dynamic technique.
- Non-destructive and non-invasive.
- Three-dimensional structures in their natural state can be measured directly in solution.
- Can provide unique insights into dynamics and intramolecular interactions.
- Three-dimensional structures of macromolecules can be resolved down to the sub-nanometer scale.
Cons of NMR Spectroscopy
- Also very expensive. Refurbished NMR spectrometers can set you back $35,000 USD, while high-end machines like the Bruker Ascend 1.2 GHz NMR cost around $17.8 million.
- Doesn't guarantee results.
- Requires specialized personnel.
- Spectra of biomolecules with large molecular weights are complex and difficult to interpret, which limits the use of NMR for analyzing large biomolecules.
- Large amounts of pure samples are needed to achieve an acceptable signal-to-noise level.
- Highly sensitive to motion which can lead to signal distortions.
- The strong magnetic field can cause problems with other equipment in a laboratory. Therefore, extra precautions may be needed, especially if space is limited.
While both methods have their own pros and cons, they share similar drawbacks: high cost, long turnaround times, and no guarantee of results. These drawbacks alone make solving protein structures a daunting task, and they are why many researchers end up being limited by this crucial step. This is where models like AlphaFold2 come in.
What is AlphaFold2?
AlphaFold2 is a deep learning model for protein structure prediction developed by Google's DeepMind and publicly released in 2021. Compared to experimental methods, AlphaFold2 is far cheaper, less time-consuming, and in many cases able to produce more accurate structures. It also avoids some of the limitations of traditional methods: it can predict the structures of proteins containing trans-membrane domains, and even of complexes.
Simplified diagram of the AlphaFold2 pipeline architecture. Don't worry too much about this right now; we're going to cover all the important stuff later. For a deeper understanding of AlphaFold2's inner workings, check out this excellent blog post by Justas Dauparas & Fabian Fuchs.
Additionally, while AlphaFold2 has its own drawbacks, one key advantage is that the results always come with confidence metrics that can help us infer whether or not the prediction is accurate. It's also perfect as a complementary method for the aforementioned techniques. For example, if you have a large protein or complex, you can partially solve the structure using a method like X-ray Crystallography and then solve or validate the problematic regions with AlphaFold2.
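To make the idea of confidence metrics concrete, here is a minimal Python sketch of how you might read per-residue confidence out of a prediction. It relies on the fact that AlphaFold2 writes its per-residue confidence score (pLDDT, on a 0 to 100 scale) into the B-factor column of the PDB files it outputs. The function names and the cutoff of 70 are our own illustrative choices, not part of AlphaFold2 itself.

```python
def plddt_per_residue(pdb_text):
    """Return {(chain_id, residue_number): pLDDT} from PDB-format text.

    Assumes an AlphaFold2-style PDB file, where the B-factor column
    (columns 61-66) holds the pLDDT score instead of a temperature factor.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Use the C-alpha atom as the single representative of each residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def summarize_confidence(scores, threshold=70.0):
    """Return (mean pLDDT, sorted list of residues scoring below threshold).

    The default threshold of 70 is an illustrative cutoff (an assumption);
    adjust it to suit your own analysis.
    """
    if not scores:
        raise ValueError("no C-alpha atoms found in the input")
    mean_plddt = sum(scores.values()) / len(scores)
    low_confidence = sorted(key for key, value in scores.items() if value < threshold)
    return mean_plddt, low_confidence
```

You could call it with something like `summarize_confidence(plddt_per_residue(open("prediction.pdb").read()))`, where `prediction.pdb` is a placeholder for your own output file. Residues that land in the low-confidence list are the ones worth validating experimentally or treating with caution.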
A small protein binder designed with AlphaFold2 to inhibit human PDCD1 as a means of treating certain types of cancer.
Pros of AlphaFold2
- Very fast (only takes minutes to hours for a single prediction compared to the months to years of traditional techniques).
- Pretty easy to use.
- Highly configurable.
- Only requires a protein's primary (amino acid) sequence.
- Doesn't require a wet lab.
- Can sometimes be used to sample alternative conformations (more on this later).
- Pretty good at predicting intrinsically disordered regions as well as trans-membrane domains.
- Provides global as well as per-residue confidence metrics.
- Can solve the structure of protein complexes/multimers.
- Complementary to experimental techniques.
Cons of AlphaFold2
- Requires expensive hardware such as high-end graphics cards with lots of VRAM (unless you use Neurosnap).
- Requires technical expertise to properly set up on a computer (unless you use Neurosnap).
- Can be annoying to maintain (you get the idea by now).
- Struggles with mutated proteins as well as some de novo proteins.
- Struggles with proteins that have very shallow MSAs / low sequence homology.
Amazingly, most of the drawbacks presented here can be entirely circumvented by using our platform or reading the next few blog posts.
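The shallow-MSA drawback in the list above is one you can check for before running a prediction. The sketch below counts the distinct sequences in an alignment stored as A3M or FASTA-style text (header lines starting with `>`). The function names are hypothetical, and the default cutoff of 30 sequences is only an illustrative assumption for what counts as "shallow", so tune it to your own experience.

```python
def msa_depth(alignment_text):
    """Count the distinct sequences in A3M/FASTA-style alignment text."""
    sequences = []
    current = None
    for raw_line in alignment_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and "#" comment lines that some tools prepend.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                sequences.append(current)
            current = ""
        elif current is not None:
            current += line
    if current is not None:
        sequences.append(current)
    # Identical rows add no new evolutionary signal, so count them once.
    return len(set(sequences))


def is_shallow(alignment_text, min_depth=30):
    """Return True if the alignment has fewer than min_depth distinct sequences.

    min_depth=30 is an illustrative default (an assumption), not a hard rule.
    """
    return msa_depth(alignment_text) < min_depth
```

If `is_shallow` returns `True` for your alignment, treat the resulting prediction and its confidence scores with extra care.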
How researchers are using AlphaFold2 today
A recent AlphaFold2 success story is its contribution to helping solve a ten-year-old molecular biology problem: predicting the structure of the nuclear pore complex. This massive complex contains more than 1,000 proteins, with hundreds present in each of your very own cells. The nuclear pore complex is responsible for regulating the flow of molecules in and out of the cell nucleus, so a better understanding of its mechanisms is critical to developing new potential therapeutics.
A top-down view of the human nuclear pore complex, the largest molecular machine in human cells. Credit: Agnieszka Obarska-Kosinska
We hope you found this overview helpful in understanding the potential of this powerful tool for protein structure prediction. In our next blog post, we'll dive deeper into the different inputs and settings AlphaFold2 has, and provide some useful tips and tricks to help you get the most out of it. Don't miss out on this opportunity to elevate your bioinformatics skills!