How to Use AlphaFold2 as a Wet Lab Biologist (Pt 2)

Written by Keaun Amani

Published 2023-3-27


In our previous post, we discussed the significance of AlphaFold2 in protein structure prediction, its relevance to biotechnology, and how it is poised to transform the way we understand protein folding. In this post, we'll dive deeper into AlphaFold2: we provide a comprehensive guide to using it effectively, explain the meaning behind each of its settings, and share some insider tips and tricks to help you create more accurate protein structure predictions with ease.

How to use AlphaFold2 on Neurosnap

Neurosnap is a cutting-edge platform that offers a comprehensive range of tools to simplify biological research. Our AlphaFold2 implementation, which is based on ColabFold, is a prime example and was designed with a focus on ease of use. In this post, we describe the input parameters and configuration options used on our platform so that you can set up your jobs smoothly.

Run your AlphaFold2 jobs on Neurosnap here!

Model Inputs

Target Sequence(s)

These are the amino acid sequences whose structures AlphaFold2 will predict. On Neurosnap, all you need to do is create a FASTA file containing the amino acid sequences you want to predict, and we handle the rest. To predict the structure of a complex, delimit the different chains with the : character. For homo-oligomers such as homodimers, simply repeat the same chain sequence once per copy, separated by the : character.

Example of an input FASTA file containing a single-chain protein, a two-chain complex, and a homodimer, respectively.

>single_chain_protein
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY

>two_chain_complex
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG:
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRS

>homodimer
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV:
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
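If you generate many inputs programmatically, a small helper can assemble records in the format described above. This is a minimal Python sketch, not a Neurosnap tool: the function names `complex_record` and `homo_oligomer_record` are hypothetical, and it assumes that writing each record's sequence on a single line is acceptable (the example above wraps long sequences, which FASTA readers normally tolerate as well).

```python
def complex_record(name: str, chains: list[str]) -> str:
    """Return one FASTA record whose chains are separated by ':' (hypothetical helper)."""
    cleaned = [chain.strip().upper() for chain in chains]
    if not cleaned or any(not chain for chain in cleaned):
        raise ValueError("every chain must be a non-empty sequence")
    # The whole complex is a single record; ':' marks each chain boundary.
    return f">{name}\n{':'.join(cleaned)}\n"


def homo_oligomer_record(name: str, chain: str, copies: int) -> str:
    """Repeat one chain `copies` times, e.g. copies=2 for a homodimer."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return complex_record(name, [chain] * copies)


print(complex_record("two_chain_complex", ["MKTAYIAK", "GSHMLE"]), end="")
print(homo_oligomer_record("homodimer", "MKTAYIAK", 2), end="")
```

Writing the records returned by these helpers into one file gives a multi-entry FASTA like the example above.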

Custom Template

This optional input lets you provide a template structure that AlphaFold2 will use to help predict the structure. Think of it as a reference structure: AlphaFold2 will bias its prediction to resemble the structure you provide.

Custom MSA

AlphaFold2 uses coevolutionary information, in the form of a multiple sequence alignment (MSA), to improve its predictions. Providing an MSA is optional; if you don't upload one, we automatically generate a suitable one for you. While MSAs are technically optional, we strongly discourage disabling them, as AlphaFold2 depends heavily on MSA data.
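Before uploading your own MSA, it can help to check how deep it is and that every row lines up with your query. The sketch below assumes the MSA is in A3M format (the format ColabFold works with), where the first record is the query, lowercase letters mark insertions relative to the query, and `-` marks gaps. The function names are hypothetical, and Neurosnap may accept other formats.

```python
def read_a3m(text: str) -> list[tuple[str, str]]:
    """Parse A3M text into (name, sequence) pairs, query first."""
    records: list[tuple[str, str]] = []
    name = None
    parts: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' metadata lines some tools write
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:], []
        else:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def aligned_columns(sequence: str) -> str:
    """Drop lowercase insertions so the row has one character per query position."""
    return "".join(ch for ch in sequence if not ch.islower())


def msa_depth(text: str) -> int:
    """Return the number of sequences, checking that each row matches the query length."""
    records = read_a3m(text)
    if not records:
        raise ValueError("MSA contains no sequences")
    query_length = len(aligned_columns(records[0][1]))
    for name, sequence in records:
        if len(aligned_columns(sequence)) != query_length:
            raise ValueError(f"row {name!r} does not match the query length")
    return len(records)
```

A depth of 1 means the alignment holds only the query and therefore carries no coevolutionary signal.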

Core Settings

Use Amber

If you want to use the Amber force field to "relax" the predicted structure, enable this option. Amber relaxation removes artifacts such as steric clashes and other stereochemical violations without sacrificing accuracy. While AlphaFold2 is usually good at predicting the overall backbone of a protein, it can sometimes struggle with the placement of side chains. If side-chain positions matter for your work, we recommend enabling this option.
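One way to see what relaxation does is to count steric clashes in a model before and after enabling it. The sketch below is a rough, hypothetical check, not Amber itself: it reads ATOM/HETATM records from a PDB-format file using the standard fixed columns, ignores hydrogens, skips atom pairs in the same or sequence-adjacent residues (to avoid flagging covalent bonds), and counts heavy-atom pairs closer than an arbitrary 2.0 Å cutoff.

```python
import math
from collections import defaultdict
from itertools import product

Atom = tuple[str, int, tuple[float, float, float]]  # (chain, residue number, xyz)


def heavy_atoms(pdb_text: str) -> list[Atom]:
    """Read non-hydrogen ATOM/HETATM records using the fixed PDB column layout."""
    atoms: list[Atom] = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        element = line[76:78].strip() or line[12:16].strip()[:1]
        if element.upper() == "H":
            continue  # relaxed models usually include hydrogens; ignore them
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def count_clashes(pdb_text: str, cutoff: float = 2.0) -> int:
    """Count heavy-atom pairs closer than `cutoff` angstroms, skipping bonded neighbours."""
    atoms = heavy_atoms(pdb_text)

    def cell_of(xyz):
        return tuple(math.floor(c / cutoff) for c in xyz)

    # Spatial hashing: only atoms in the same or adjacent cells can be within the cutoff.
    grid = defaultdict(list)
    for index, (_, _, xyz) in enumerate(atoms):
        grid[cell_of(xyz)].append(index)
    clashes = 0
    for i, (chain_i, res_i, xyz_i) in enumerate(atoms):
        cx, cy, cz = cell_of(xyz_i)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j <= i:
                    continue  # count each pair once
                chain_j, res_j, xyz_j = atoms[j]
                if chain_i == chain_j and abs(res_i - res_j) <= 1:
                    continue  # same or adjacent residue: likely covalently bonded
                if math.dist(xyz_i, xyz_j) < cutoff:
                    clashes += 1
    return clashes
```

For example, you could compare the counts for an unrelaxed and a relaxed model of the same prediction; the file names you pass in are up to you, and the cutoff is a tunable assumption rather than a standard.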

Model Type

This option lets you choose which model type to use. To predict the structure of a single chain, we recommend AlphaFold2-ptm; to predict the structure of a complex, use AlphaFold-multimer-v2. In some cases it is worth trying AlphaFold2-ptm on a complex as well, for example when the complex has three chains but you only have the sequences for two of them. AlphaFold-multimer-v2 assumes that all the chains have been provided, whereas AlphaFold2-ptm is more forgiving: it does not make that assumption and may leave space for additional chains. We recommend trying both and comparing the results.
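The guidance above can be summarized as a small decision helper. This is only a sketch of the rule of thumb in this section, not Neurosnap's logic: the function and its `complex_may_be_incomplete` flag are hypothetical, and it infers the number of chains from the `:` delimiters described under Target Sequence(s).

```python
def suggest_model_types(sequence: str, complex_may_be_incomplete: bool = False) -> list[str]:
    """Return model types to try, most suitable first, following the guidance above."""
    chains = [chain for chain in sequence.split(":") if chain.strip()]
    if not chains:
        raise ValueError("sequence is empty")
    if len(chains) == 1:
        return ["AlphaFold2-ptm"]
    if complex_may_be_incomplete:
        # Some chains are missing: AlphaFold2-ptm does not assume the complex is complete.
        return ["AlphaFold2-ptm", "AlphaFold-multimer-v2"]
    return ["AlphaFold-multimer-v2", "AlphaFold2-ptm"]
```

For complexes it always returns both model types, reflecting the advice to run both and compare the results.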

Advanced Settings

Template Mode

Choose whether to use a template structure. If set to none, no template is used; if set to pdb70, a suitable template is automatically found in the pdb70 dataset and used as input. Uploading a custom template overrides this option. In many cases, using a template from the PDB can noticeably improve prediction quality.
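The precedence rule described above (a custom template takes priority over Template Mode) can be written as a tiny resolver. This is an illustrative sketch, not Neurosnap's code; the function name and the returned labels are hypothetical.

```python
from typing import Optional


def resolve_template_source(template_mode: str, custom_template: Optional[str] = None) -> str:
    """Describe which template AlphaFold2 would use (hypothetical labels)."""
    if custom_template is not None:
        return f"custom:{custom_template}"  # an uploaded template overrides Template Mode
    if template_mode == "none":
        return "no template"
    if template_mode == "pdb70":
        return "automatic search of pdb70"
    raise ValueError(f"unknown template mode: {template_mode!r}")
```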

MSA Mode

This option specifies which database(s) to search when building the MSA. The most comprehensive option, MMseqs2 (UniRef+Environmental), uses MMseqs2 to search both UniRef and environmental sequence databases, which tends to produce more diverse MSAs and better predictions. If you don't want to use an MSA at all, set this option to single_sequence. Uploading a custom MSA overrides this option.
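The sketch below shows how the MSA source could be resolved and what single_sequence amounts to conceptually: an "alignment" that contains only the query. It assumes A3M-formatted text as in the Custom MSA section; the function name, the `>query` header, and the returned labels are hypothetical, and any mode other than single_sequence is treated as a database search that still has to be run.

```python
from typing import Optional


def resolve_msa(
    msa_mode: str, query: str, custom_msa: Optional[str] = None
) -> tuple[str, Optional[str]]:
    """Return (source, a3m_text); a3m_text is None when a database search is still needed."""
    if custom_msa is not None:
        return "custom", custom_msa  # an uploaded MSA overrides MSA Mode
    if msa_mode == "single_sequence":
        # No search: the "MSA" is just the query, so there is no coevolutionary signal.
        return "single_sequence", f">query\n{query}\n"
    return msa_mode, None  # e.g. an MMseqs2 search that has yet to be run
```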

Pair Mode

This option controls MSA pairing: sequences from different chains that likely form a complex with each other are concatenated into the same MSA row. Pairing tends to improve prediction accuracy for complexes, especially when orthologous genes are paired with each other.
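To make pairing concrete, the sketch below builds a paired MSA for a two-chain complex. It assumes, for simplicity, that each chain's MSA has already been reduced to at most one aligned hit per species, keyed by a species label; real pipelines such as ColabFold use database identifiers and more careful matching rules. In this sketch, hits found for only one chain are kept as unpaired rows padded with gaps, and all names are hypothetical.

```python
def pair_msas(
    query_a: str, query_b: str, hits_a: dict[str, str], hits_b: dict[str, str]
) -> list[str]:
    """Concatenate per-chain hits from the same species into shared MSA rows."""
    rows = [query_a + query_b]  # the first row is the concatenated query
    # Paired rows: both chains have a hit from the same species (putative orthologs).
    for species in sorted(hits_a.keys() & hits_b.keys()):
        rows.append(hits_a[species] + hits_b[species])
    # Unpaired rows: pad the missing partner with gaps so every row has the same length.
    for species in sorted(hits_a.keys() - hits_b.keys()):
        rows.append(hits_a[species] + "-" * len(query_b))
    for species in sorted(hits_b.keys() - hits_a.keys()):
        rows.append("-" * len(query_a) + hits_b[species])
    return rows
```

Only the paired rows carry inter-chain coevolutionary signal, which is why pairing matters most for complexes.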

Number Recycles

One of AlphaFold2's coolest features is its ability to build on its own prediction by feeding its outputs back in as inputs. The number of times it does this is the number of recycling steps. More recycling steps often result in better predictions; in fact, increasing the number of recycling steps is one of the most effective ways to improve prediction quality, at the cost of longer run times.

The prediction of this transmembrane protein requires a bit over 12 recycling steps in order to get the correct fold. Credit: Dr. Sergey Ovchinnikov
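The recycling loop described above is simple to picture in code. In the sketch below, `model_pass` is a hypothetical stand-in for one forward pass of AlphaFold2 and `toy_pass` is a toy that moves halfway toward a fixed answer each time; nothing here is AlphaFold2's real API. It follows the AlphaFold2 convention that the first pass is not counted as a recycle, so `num_recycles=3` means four passes in total. The optional tolerance-based early stop is an illustrative assumption.

```python
from typing import Callable, Optional

Prediction = list[float]


def run_with_recycling(
    model_pass: Callable[[Optional[Prediction]], Prediction],
    num_recycles: int,
    tolerance: Optional[float] = None,
) -> tuple[Prediction, int]:
    """Run one initial pass plus up to `num_recycles` recycling passes.

    Returns the final prediction and the total number of passes made.
    """
    prediction = model_pass(None)  # first pass: nothing to recycle yet
    passes = 1
    for _ in range(num_recycles):
        # Recycling: the previous output becomes part of the next input.
        new_prediction = model_pass(prediction)
        passes += 1
        change = max(abs(new - old) for new, old in zip(new_prediction, prediction))
        prediction = new_prediction
        if tolerance is not None and change < tolerance:
            break  # optional early stop once the prediction barely moves
    return prediction, passes


# Toy stand-in for one pass: each pass moves halfway toward a "true" answer.
TARGET = [1.0, 2.0]


def toy_pass(previous: Optional[Prediction]) -> Prediction:
    if previous is None:
        return [0.0 for _ in TARGET]
    return [p + (t - p) / 2 for p, t in zip(previous, TARGET)]


print(run_with_recycling(toy_pass, num_recycles=3))  # ([0.875, 1.75], 4)
```

Each extra recycle brings the toy prediction closer to its target, mirroring how some structures, like the transmembrane protein in the figure, only reach the correct fold after many recycles.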


Number Ensembles

By default, if the provided MSA has more than 256 sequences, AlphaFold2 randomly samples which of them to use. Ensembling means repeating this random sampling n times, where n is the Number Ensembles, and combining the results. A greater number of ensembles can result in a more accurate prediction, especially when the MSA is large. The default value is 1, while 8 was used at CASP14. Note that increasing Number Ensembles also increases the time AlphaFold2 needs to make a prediction.
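The sampling step behind ensembling can be sketched as follows. This is a simplified illustration, not AlphaFold2's actual featurization: it takes the 256-sequence figure quoted above as a parameter, always keeps the query as the first row, and draws a fresh random subset for each ensemble member. The function name and seed handling are hypothetical.

```python
import random


def sample_msa_subsets(
    msa_rows: list[str], num_ensembles: int, max_rows: int = 256, seed: int = 0
) -> list[list[str]]:
    """Draw one MSA subset per ensemble member, always keeping the query (row 0)."""
    if not msa_rows:
        raise ValueError("the MSA must contain at least the query sequence")
    rng = random.Random(seed)
    query, others = msa_rows[0], msa_rows[1:]
    subsets = []
    for _ in range(num_ensembles):
        if len(msa_rows) <= max_rows:
            subsets.append(list(msa_rows))  # small MSA: nothing to subsample
        else:
            # Large MSA: a fresh random subset for every ensemble member.
            subsets.append([query] + rng.sample(others, max_rows - 1))
    return subsets
```

With `num_ensembles=8` (the CASP14 setting mentioned above) and a large MSA, the model would see eight different subsets, which is also why run time grows with this setting. For an MSA below the limit, every member sees the same rows, so extra ensembles add little.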

Training Mode

Enabling this option activates the model's dropout layers during prediction. Dropout randomly "hides" some internal information, which forces the model to produce more diverse predictions. This is a great way to sample the structure space when you, or AlphaFold2, are unsure of a predicted structure.
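The effect of enabling dropout at prediction time can be illustrated with a toy example. The sketch below is not AlphaFold2's implementation; `toy_prediction` is a hypothetical one-layer stand-in. With dropout off, every seed gives the same answer; with it on, different random seeds mask different values and the outputs vary, which is what lets repeated runs explore alternative structures.

```python
import random


def dropout(values: list[float], rate: float, rng: random.Random, training: bool) -> list[float]:
    """Inverted dropout: zero each value with probability `rate` when training mode is on."""
    if not training or rate == 0.0:
        return list(values)  # dropout off: deterministic pass-through
    keep = 1.0 - rate
    # Surviving values are rescaled so the expected total stays the same.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def toy_prediction(features: list[float], seed: int, training: bool) -> float:
    """Hypothetical stand-in for a network: one dropout layer followed by an average."""
    rng = random.Random(seed)
    hidden = dropout(features, rate=0.25, rng=rng, training=training)
    return sum(hidden) / len(hidden)


features = [1.0] * 20
print({toy_prediction(features, seed, training=False) for seed in range(5)})  # {1.0}
print(sorted({toy_prediction(features, seed, training=True) for seed in range(5)}))
```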

Further Reading

Now that you've learned how to use AlphaFold2 effectively for protein structure prediction, it's time for the next step: interpreting the results. In our next post, we'll guide you through analyzing the output generated by AlphaFold2 and share some insights into how to interpret it accurately. Stay tuned for our next post and take your protein structure prediction skills to the next level!

Want to get started with running AlphaFold2? Register here and run your own AlphaFold2 jobs!
