Frequently Asked Questions
Import data
-
How do I upload structures?
At the top of the main web page, click the upload structure and mutations menu button. This displays a form which offers three ways to upload a structure:
- from the Protein Data Bank (PDB)
- from the AlphaFold-EBI website
- or from your local machine
An AlphaFold model can be specified by:
- the UniProt ID (e.g. Q969P0)
- the full AlphaFold ID (e.g. AF-Q969P0-F1)
- or the full filename (e.g. AF-Q969P0-F1-model_v4.pdb).
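The three identifier forms listed above (UniProt ID, full AlphaFold ID, full filename) all contain the same UniProt accession. The following is a minimal sketch of how such input could be normalized. The function name and the regular expressions are illustrative assumptions, not the server's actual code.

```python
import re

# Assumed patterns, derived only from the three FAQ examples
# (Q969P0, AF-Q969P0-F1, AF-Q969P0-F1-model_v4.pdb).
_AF_PATTERN = re.compile(
    r"^AF-(?P<acc>[A-Z0-9]+)-F(?P<frag>\d+)(?:-model_v(?P<ver>\d+)\.pdb)?$"
)
_UNIPROT_PATTERN = re.compile(r"^[A-Z0-9]{6,10}$")


def uniprot_from_alphafold_input(text):
    """Return the UniProt accession contained in any of the three input forms."""
    text = text.strip()
    match = _AF_PATTERN.match(text)
    if match:
        return match.group("acc")
    if _UNIPROT_PATTERN.match(text):
        return text
    raise ValueError(f"unrecognized AlphaFold identifier: {text!r}")
```

With this sketch, all three example inputs from the list resolve to the accession Q969P0.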
-
Can I upload data from sequencing experiments?
Yes, you can upload files in VCF format (https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/).
However, only human SNP data can currently be evaluated. Further restrictions are listed on screen under upload vcf.
-
How are VCF uploads processed?
VCF files are forwarded to the Ensembl Variant Effect Predictor (GRCh38 version 105). The returned missense variants are filtered to retain only ENST transcript IDs that cross-reference directly to curated UniProtKB/Swiss-Prot UniProt IDs. To ease AlphaFold integration, only records that refer to canonical UniProt sequences of length ≤ 2700 are processed further. The AlphaFold model is selected automatically from the UniProt ID. The model and its computed missense variants are then submitted together to MutationExplorer.
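The filtering steps described above can be pictured with a short sketch. The record type, its field names, and the helper functions are hypothetical simplifications; the real pipeline works on Variant Effect Predictor output, whose format is not reproduced here. Only the filter criteria and the length cutoff come from the FAQ, and the model ID pattern follows the FAQ example AF-Q969P0-F1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

MAX_SEQUENCE_LENGTH = 2700  # cutoff stated in the FAQ


@dataclass
class MissenseVariant:
    """Hypothetical, simplified record for one annotated missense variant."""
    transcript_id: str            # transcript ID reported by the predictor
    swissprot_id: Optional[str]   # curated UniProt ID, None if no cross-reference
    is_canonical: bool            # True if it refers to the canonical sequence
    sequence_length: int
    position: int
    new_residue: str


def is_processed(variant):
    """Apply the three filter criteria named in the FAQ answer."""
    return (
        variant.transcript_id.startswith("ENST")
        and variant.swissprot_id is not None
        and variant.is_canonical
        and variant.sequence_length <= MAX_SEQUENCE_LENGTH
    )


def group_by_model(variants):
    """Map each AlphaFold model ID to the variants that pass the filter."""
    grouped = defaultdict(list)
    for variant in variants:
        if is_processed(variant):
            model_id = f"AF-{variant.swissprot_id}-F1"
            grouped[model_id].append((variant.position, variant.new_residue))
    return dict(grouped)
```

Records that lack a Swiss-Prot cross-reference, refer to a non-canonical sequence, or exceed the length cutoff are simply dropped in this sketch.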
Mutations
-
How can I define mutations?
Once you submit a structure, you are forwarded to the mutation definition page. Three options are offered for defining mutations (click on the field to open the actual sub-form):
- Define any number of mutations manually using the simple syntax explained via the info button text (also in an FAQ below).
- Upload a sequence alignment (ClustalW format)
- Provide a target sequence and select the chain in the PDB to which the target sequence belongs. You can extend the form to have three input fields for each of these upload methods:
- FASTA file upload: The FASTA file should contain only one sequence, which must be assigned to one of the chains. Only one sequence can be assigned to each chain.
- Paste sequence: Each sequence must have the same length as the protein chain to which it is assigned.
- UniProt ID: The sequence will be retrieved directly from the UniProt website.
There is no limit to the number of simultaneous mutations which can be explored.
-
What is the syntax for manually defining mutations?
Each amino acid mutation must be input in the simple yet strict format X:nnnA where:
- X is the chain as given in the PDB file, followed by the ":" separator
- nnn is the residue number
- A is the one letter code of the new (standard) amino acid to which you are mutating the residue.
Separate multiple mutations with commas. Do not enter spaces.
Example: A:12G,B:123W defines two simultaneous mutations:
- Chain A residue 12 will be mutated to glycine.
- Chain B residue 123 will be mutated to tryptophan.
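To make the X:nnnA syntax concrete, here is a minimal parsing sketch. It assumes a single-character chain ID and a non-negative residue number without insertion code; neither restriction is stated in the FAQ, and the names used are illustrative only.

```python
import re
from typing import NamedTuple

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Chain ID, ":" separator, residue number, one-letter code of the new residue.
_MUTATION = re.compile(
    rf"(?P<chain>[A-Za-z0-9]):(?P<resnum>\d+)(?P<aa>[{STANDARD_AMINO_ACIDS}])"
)


class Mutation(NamedTuple):
    chain: str
    residue_number: int
    new_residue: str


def parse_mutations(text):
    """Parse a comma-separated mutation string such as 'A:12G,B:123W'."""
    mutations = []
    for token in text.split(","):
        match = _MUTATION.fullmatch(token)
        if match is None:
            # Spaces, missing separators and nonstandard residues end up here.
            raise ValueError(f"invalid mutation: {token!r}")
        mutations.append(
            Mutation(match["chain"], int(match["resnum"]), match["aa"])
        )
    return mutations
```

Parsing the example A:12G,B:123W yields two entries: chain A, residue 12, G and chain B, residue 123, W. An input containing a space, such as "A:12G, B:123W", is rejected.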
-
How do I use an alignment to define mutations?
There are two reasons for using alignments. First, an alignment can define many more mutations, and more efficiently, than manual input. Second, the differences in structure and energy provide deep insights into the alignment itself. Only aligned positions can be used to define mutations; gaps cannot be translated into mutations. Thus, one sequence in the alignment must exactly match one chain in your PDB.
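The rule that only aligned positions define mutations can be sketched as follows. The function is a hypothetical illustration: it takes two already aligned rows (gaps written as "-") and assumes that the residues of the chain are numbered consecutively from first_residue_number, which is not guaranteed for real PDB files.

```python
def mutations_from_alignment(chain_id, aligned_chain, aligned_target,
                             first_residue_number=1):
    """Derive X:nnnA mutations from two aligned sequences of equal length.

    aligned_chain is the row that matches the PDB chain; aligned_target is
    the row to mutate towards.
    """
    if len(aligned_chain) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    mutations = []
    residue_number = first_residue_number
    for source, target in zip(aligned_chain, aligned_target):
        if source == "-":
            continue  # gap in the chain row: no residue exists to mutate
        if target != "-" and target != source:
            mutations.append(f"{chain_id}:{residue_number}{target}")
        residue_number += 1  # advance only for real residues of the chain
    return mutations
```

Columns with a gap in either row produce no mutation; a gap in the target row still advances the residue counter because the residue exists in the structure.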
It’s easy to upload an alignment. Click upload structure and mutations, upload a structure, and then choose to upload an alignment file in ClustalW format.
-
Is it possible to mutate peptides?
Yes, it is. MutationExplorer has no restrictions concerning the size of uploaded proteins. However, when interpreting calculated energies, be mindful of limitations (see FAQ "Limitations"). Energies in flexible regions should be interpreted with particular care, and shorter peptides are more likely to be flexible. Moreover, every protein conformation is just one snapshot. The smaller the peptide, the more likely it is that a given conformation does not represent the effect of a mutation as well as an alternate conformation would.
-
Is it possible to mutate transmembrane proteins?
In general, yes. However, the Rosetta energy function that we use is optimized for soluble proteins and does not consider the lipid bilayer.
-
What about nonstandard amino acids, lipids, ions?
Currently, nonstandard amino acids are not supported as mutation targets. Nevertheless, nonstandard amino acids occur frequently in the PDB (along with all manner of non-amino acid molecular species). MutationExplorer will consider nonstandard source residues when performing mutations. However, we recommend removing ions and water molecules so that variant residues can fit more easily into the structure. In general, small molecules such as lipids, drugs, ions, or water are not included in MutationExplorer calculations.
-
How do you mutate a mutant (perform multiple rounds of mutations)?
Once you have performed your first mutations, the results will be displayed along with a window where you can enter further mutations to the mutated structure.
- In the Select region, select the variant you want to mutate. The displayed protein is mutated in the next step.
- In the Mutate section, define any number of mutations. As always, you will use the X:nnnA mutation syntax explained in the FAQ above.
Explorer
-
According to which values can the structure be colored?
From the Color by drop-down in the Select area (top left of the result page), you can choose the following colorings:
- absolute Rosetta energy
- energy difference with parent
- absolute hydrophobicity
- hydrophobicity difference with parent
-
What is your naming convention for structures?
mut_0.pdb is the original protein without mutation. Subsequent naming reflects the tree-like progress of explorations which grow from the mut_0 root. You can select any listed file to mutate it.
- mut_0_1.pdb is the first mutation of the original protein.
- mut_0_1_1.pdb would be a further mutation of the first mutant.
- mut_0_2.pdb is the second independent mutation of the original protein.
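The tree structure encoded in these file names can be illustrated with a small sketch. Both helper functions are hypothetical and rely only on the naming pattern shown in the examples above.

```python
def parent_of(filename):
    """Return the parent file name, or None for the root mut_0.pdb."""
    if not filename.endswith(".pdb"):
        raise ValueError(f"not a PDB file name: {filename!r}")
    parts = filename[: -len(".pdb")].split("_")
    if len(parts) < 2 or parts[0] != "mut" or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unexpected file name: {filename!r}")
    if len(parts) == 2:
        return None  # mut_0.pdb is the original protein and has no parent
    return "_".join(parts[:-1]) + ".pdb"


def lineage(filename):
    """List the file names from the root down to the given structure."""
    chain = [filename]
    while parent_of(chain[0]) is not None:
        chain.insert(0, parent_of(chain[0]))
    return chain
```

For example, the parent of mut_0_1_1.pdb is mut_0_1.pdb, whose parent is in turn mut_0.pdb.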
-
Why is it not possible to display energy differences for mut_0.pdb?
The energy difference can only be displayed for mutants. mut_0.pdb is the original protein and has no parent protein to which it could be compared.
Viewer
-
How can I change the representation of the structures?
Currently, we only offer a cartoon representation for the structures. We want to include more representations in the future.
-
Why don’t I see an alignment?
Most likely, you have defined mutations by uploading a target sequence or an alignment that does not exactly match one PDB chain. At least one sequence must be a complete subsequence of a PDB chain. If this is not the case, no alignment is displayed. You can check the sequences at any time in the Manual Rematch Structure tab at the bottom of the viewer. There, the sequences are displayed and the longest common subsequence is highlighted in light green.
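If you want to check a sequence against a chain before uploading, a sketch along the following lines may help. It assumes that "complete subsequence" means a contiguous stretch of the chain sequence and uses Python's standard difflib module to find the longest common contiguous block. This is an illustration only and not necessarily the matching procedure used by the viewer.

```python
from difflib import SequenceMatcher


def is_contained(sequence, chain_sequence):
    """True if the sequence occurs as one contiguous stretch of the chain."""
    return sequence in chain_sequence


def longest_common_block(sequence, chain_sequence):
    """Return the longest contiguous stretch shared by both sequences."""
    matcher = SequenceMatcher(None, sequence, chain_sequence, autojunk=False)
    match = matcher.find_longest_match(0, len(sequence), 0, len(chain_sequence))
    return sequence[match.a : match.a + match.size]
```

If is_contained returns False, the block returned by longest_common_block shows how far the two sequences agree, which can help locate the first mismatch.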
-
What do I do if the wrong structure is highlighted when hovering over the sequence in the Alignment View tab?
The structures and the sequences are matched automatically at the start. It can therefore happen that the wrong sequence in the clustal file is matched to the structure. You can fix this by manually rematching the sequence in the Manual Rematch Structure tab.
- In the left column, select which sequence from which clustal file you want to match.
- In the right column, select the chain of the structure the sequence in the left column is to be matched to.
- Select Apply Matching.
You can find a more detailed description of how to manually rematch a clustal sequence to a chain within a structure on the tutorial page under Rematch clustal sequences to a structure.
-
What do I do if the mutation sphere is located on the wrong amino acid?
This can occur if the automatic matching assigns the clustal sequence to the wrong structure chain. It can be fixed by rematching the sequences manually. Read the FAQ section: "What do I do if the wrong structure is highlighted when hovering over the sequence in the Alignment View tab?".
-
What are the values of the color scale?
The default color scale ranges from -2 through 0 to +2. You can change the intervals for the color scale in the Change Visualization tab.
We provide three different scales for protein coloring. For the first two, white is always zero. The third color scale shows the conservation between two amino acid sequences:
- red - gap
- dark blue - a mutation between amino acid groups
- light blue - a mutation within an amino acid group
- white - no mutation/identical
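For the two numeric scales, the idea of a scale centered on white at zero can be sketched as below. The default interval of -2 to +2 is taken from the answer above. The end colors (blue for negative, red for positive) and the linear interpolation are assumptions made for illustration and may differ from the colors used by the viewer.

```python
def diverging_color(value, limit=2.0,
                    negative=(0, 0, 255), positive=(255, 0, 0)):
    """Map a value to an RGB triple on a scale that is white at zero.

    Values outside [-limit, +limit] are clamped to the ends of the scale.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    clamped = max(-limit, min(limit, value))
    fraction = abs(clamped) / limit          # 0 at the center, 1 at either end
    end = positive if clamped > 0 else negative
    # Blend linearly from white (255, 255, 255) towards the end color.
    return tuple(round(255 + (channel - 255) * fraction) for channel in end)
```

Changing limit in this sketch corresponds to changing the interval of the color scale: a larger limit makes moderate values appear paler.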
-
Can I change the interval of the color scale?
You can change the intervals for the color scale in the Change Visualization tab.
Other
-
Does MutationExplorer handle multichain proteins?
Yes. MutationExplorer always performs calculations in the context of the whole protein complex. You must take care to match alignments to specific chains and to specify the correct chain ID when inputting manual mutations. (Presently, VCF inputs are only displayed automatically on single-chain AlphaFold models.)
-
How are the energies calculated?
Rosetta performs the mutation and calculates the energy of a structure as the sum of the energies computed for each residue. The difference coloring shows, for each residue, the energy of the mutant (mutated result structure) minus the energy of the WT (parent structure). Be aware that these energies are indicative only and should be interpreted with care. For quantitative values that reflect experimental data, other, more time-intensive protocols should be used, e.g. https://pubmed.ncbi.nlm.nih.gov/36173174/
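The arithmetic behind the difference coloring can be written out in a few lines. The sketch assumes that per-residue energies are available as a mapping from (chain, residue number) to a value; this data layout and the numbers in the usage note are made up for illustration and are not actual Rosetta output.

```python
def total_energy(per_residue):
    """Energy of a structure as the sum of its per-residue energies."""
    return sum(per_residue.values())


def energy_difference(mutant, parent):
    """Per-residue difference: mutant energy minus parent energy.

    Only residues present in both structures are compared.
    """
    return {
        residue: mutant[residue] - parent[residue]
        for residue in mutant.keys() & parent.keys()
    }
```

For a parent with energies {("A", 12): -1.5, ("A", 13): -2.0} and a mutant with {("A", 12): -0.5, ("A", 13): -2.0}, the difference is 1.0 at residue 12 and 0.0 at residue 13. A positive difference means the residue has a higher energy in the mutant than in the parent.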
-
Why do some of my mutations exhibit high energies?
MutationExplorer is a platform which must calculate and present energy differences to users in near real-time. Minimizing the protein backbone is not possible given the tight run-time constraints of the application. Thus, mutations can cause in-silico clashes which cannot be resolved without more exhaustive sampling and relaxation techniques. In particular, mutations of residues from or to proline or glycine tend to greatly challenge Rosetta’s minimizer.
-
Why do some distant residues have a change in energy?
When a mutation is introduced, side-chain optimization is not limited to immediately adjacent residues. As the minimizer descends the energy landscape of the entire protein, distant side-chains can be re-oriented by the algorithm.
-
Which parameters were used in the examples?
Each example includes all settings and files needed to re-run and replicate the example outputs. The three examples in the first row demonstrate upload structure and mutations. The fourth example in the second row demonstrates upload vcf.
-
Where can I find more information about the tools used on the server, i.e. MDsrv/mol*/Rosetta?
MDsrv:
https://proteininformatics.informatik.uni-leipzig.de/mdsrv
mol* viewer:
https://molstar.org/
Rosetta:
https://www.rosettacommons.org/
Variant Effect Predictor:
https://www.ensembl.org/info/docs/tools/vep/