In our previous post, we discussed the significance of AlphaFold2 in protein structure prediction, its relevance in the field of biotechnology, and how it's poised to transform the way we understand protein folding. In this post, we'll dive deeper into AlphaFold2: how to use it effectively, what each of its settings means, and some insider tips and tricks to help you create more accurate protein structure predictions with ease.
Neurosnap is a cutting-edge platform that offers a comprehensive range of tools to simplify biological research. Our AlphaFold2 implementation, which is based on ColabFold, is a prime example, designed with a focus on user-friendliness and ease of use. In this post, we provide essential information about the input parameters and configuration options used on our platform, ensuring a smooth and streamlined experience for users.
Run your AlphaFold2 jobs on Neurosnap here!
These are the input amino acid sequences that AlphaFold2 will predict structures for. On Neurosnap, all you need to do is make a FASTA file containing the amino acid sequences you want to predict the structures of, and we handle the rest. If you want to predict the structure of a complex, you'll need to delimit the different chains with the : character. For homo-oligomers such as homodimers, simply copy and paste the chain multiple times, delimited by the : character.
Example of an input FASTA file with a single-chain protein, a complex, and a homodimer, respectively.
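As a rough sketch of the format described above, the following Python snippet builds such a FASTA file. The sequences and record names are made-up placeholders, and the helper functions are hypothetical (they are not part of Neurosnap or ColabFold); the only thing taken from this post is the use of : to separate chains.

```python
# Hypothetical placeholder sequences; replace them with your own proteins.
CHAIN_A = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
CHAIN_B = "GSHMLEDPVDAFKELGARNQW"


def complex_sequence(*chains: str) -> str:
    """Join chains with ':' so they are read as one complex."""
    return ":".join(chains)


def to_fasta(records: dict) -> str:
    """Render a {name: sequence} mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


records = {
    "single_chain": CHAIN_A,                         # one chain
    "complex": complex_sequence(CHAIN_A, CHAIN_B),   # two different chains
    "homodimer": complex_sequence(CHAIN_A, CHAIN_A), # same chain twice
}

fasta_text = to_fasta(records)
print(fasta_text)
```

Writing fasta_text to a file (for example with open("input.fasta", "w")) gives a file you could upload as the input sequences.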
The input template is an optional structure that AlphaFold2 will use to help predict the structure. Think of it as a reference that biases the prediction to resemble the structure you provide.
AlphaFold2 uses coevolutionary information in the form of a multiple sequence alignment (MSA) to help improve its prediction. This is an optional input, and if you don't want to provide one, we can automatically generate a suitable one for you. While MSAs are technically optional, we strongly discourage disabling them, as AlphaFold2 is heavily dependent on MSA data.
If you want to use Amber force fields to "relax" the predicted structure, you can enable this option. Amber is used to remove artifacts and other stereochemical violations without sacrificing accuracy. While AlphaFold2 is usually pretty good at predicting the overall backbone of a protein, it can sometimes struggle with the placement of side-chains. If side-chain placement is important to you, we recommend enabling this option.
This option allows you to choose the model type you want to use. To predict the structure of a single chain / protein, we recommend AlphaFold2-ptm. To predict the structure of a complex, use AlphaFold-multimer-v2. Note that in some cases it might be worth trying AlphaFold2-ptm on a complex as well. One example is when you have a complex but only have the sequences for 2 out of its 3 chains. AlphaFold-multimer-v2 assumes that all the chains have been provided, whereas AlphaFold2-ptm is a bit more forgiving: in some cases it assumes that not all the information has been provided and may leave space for additional chains. We recommend trying both and comparing the results.
Choose whether or not you want to use an input template. If set to none, no input template will be used. If set to pdb70, an input template from the pdb70 dataset will automatically be found and used as input. Note that specifying a custom input template will override this option. In many cases, providing an input template from the PDB can improve prediction quality.
This option allows you to specify which MSA database to use for creating the MSA. The largest database available is MMseqs2 (UniRef+Environmental), which tends to produce diverse MSAs and better predictions. If you don't want an MSA to be used at all, set this option to single_sequence. Note that uploading a custom MSA will override this option.
This option controls MSA pairing. Pairing means that sequences that form a complex with each other are concatenated together within the same MSA row. Pairing tends to improve prediction accuracy for complexes, especially when orthologous genes are paired with each other.
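To make the idea of pairing concrete, here is a minimal toy sketch in Python. It assumes each chain's MSA can be represented as a mapping from a species label to one aligned sequence, and it pairs rows only when both chains have a sequence from the same species. The function name, the data layout, and the sequences are all hypothetical; the pairing actually performed by ColabFold or AlphaFold2 is more involved than this.

```python
def pair_msas(msa_a: dict, msa_b: dict) -> list:
    """Concatenate rows from two per-chain MSAs that share a species label.

    Toy illustration only: each MSA maps species -> aligned sequence.
    """
    shared_species = [species for species in msa_a if species in msa_b]
    return [(species, msa_a[species] + msa_b[species]) for species in shared_species]


# Made-up example data for two chains of a complex.
msa_chain_a = {"human": "MKTAY", "mouse": "MKSAY", "yeast": "MRTAF"}
msa_chain_b = {"human": "GSHML", "mouse": "GSHMV", "zebrafish": "GTHML"}

paired_rows = pair_msas(msa_chain_a, msa_chain_b)
for species, row in paired_rows:
    print(species, row)
```

In this toy case only the human and mouse rows are paired, because those are the only species present in both MSAs.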
One of AlphaFold2's coolest features is its ability to build on its own prediction by taking its outputs as inputs. The number of times it does this corresponds to the number of recycling steps. More recycling steps almost always result in better predictions. In fact, one of the most effective methods for improving prediction quality is increasing the number of recycling steps.
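The control flow behind recycling can be sketched with a toy loop. Everything here is an illustrative assumption: toy_model is a stand-in that simply moves an estimate halfway toward a target number, not AlphaFold2 itself, and the sketch assumes one initial pass followed by num_recycles additional passes.

```python
def predict_with_recycling(model_step, features, num_recycles=3):
    """Run one initial pass, then feed each output back in num_recycles times."""
    previous = None
    for _ in range(num_recycles + 1):
        previous = model_step(features, previous)
    return previous


def toy_model(features, previous):
    """Stand-in for a model: refine the previous estimate toward a target."""
    start = 0.0 if previous is None else previous
    return start + 0.5 * (features["target"] - start)


features = {"target": 8.0}
no_recycling = predict_with_recycling(toy_model, features, num_recycles=0)
with_recycling = predict_with_recycling(toy_model, features, num_recycles=3)
print(no_recycling, with_recycling)
```

In this toy setting each extra pass brings the estimate closer to the target, which mirrors the intuition that additional recycling steps give the model more chances to refine its own output.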
By default, if the provided MSA has more than 256 sequences, AlphaFold2 will randomly sample which sequences to use. Ensembling simply means randomly sampling which sequences to use as the input MSA n times, where n is the Number Ensembles. A greater number of ensembles can result in a more accurate prediction, especially when the MSA is quite big. The default value is 1, and the value used for CASP14 was 8. Keep in mind that increasing the Number Ensembles will increase the amount of time AlphaFold2 needs to create a prediction.
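The sampling described above can be sketched in a few lines of Python. This is a simplified illustration, not AlphaFold2's actual code: the function names are hypothetical, the MSA is just a list of strings, and keeping the query sequence as the first row of every sample is an assumption of this sketch.

```python
import random


def sample_msa(msa, max_seqs=256, rng=None):
    """Return the MSA unchanged if small, else the query plus a random subset."""
    if rng is None:
        rng = random.Random()
    if len(msa) <= max_seqs:
        return list(msa)
    query, others = msa[0], msa[1:]
    # Assumption of this sketch: the query (first row) is always kept.
    return [query] + rng.sample(others, max_seqs - 1)


def make_ensembles(msa, num_ensembles=1, max_seqs=256, seed=0):
    """Draw num_ensembles independent random subsamples of the MSA."""
    rng = random.Random(seed)
    return [sample_msa(msa, max_seqs, rng) for _ in range(num_ensembles)]


# Made-up MSA with 1000 placeholder rows; row 0 plays the role of the query.
msa = [f"SEQ{i}" for i in range(1000)]
ensembles = make_ensembles(msa, num_ensembles=8)
print(len(ensembles), [len(e) for e in ensembles])
```

Each ensemble member sees a different random subset of a large MSA, which is also why the run time grows with the number of ensembles: the model has to process every subset.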
Enabling this option activates the model's dropout layers. This forces the model to create more diverse predictions by "hiding" certain information internally. It is a great way to sample the structure space when you or AlphaFold2 are unsure of a predicted structure.
Now that you've learned how to effectively use AlphaFold2 for protein structure prediction, it's time to dive into the next step: interpreting the results. In our next post, we'll guide you through the process of analyzing the data generated by AlphaFold2, and provide some insights into how to interpret the results accurately. Don't miss out on this valuable information - stay tuned for our next post and take your protein structure prediction skills to the next level!
Want to get started with running AlphaFold2? Register here and run your own AlphaFold2 jobs!