Documentation and tutorial for MutationExplorer
General philosophy and key elements of the MutationExplorer
MutationExplorer - exploring the effect of mutations on the structure
MutationExplorer maps variants onto the 3D structure of a protein, which allows you to interactively explore the effects of mutations on stability and function. wwPDB structures often contain mutations and cannot be mapped intuitively to their native sequence. The same holds for structures whose sequence is changed to that of a close or remote homolog, and for structures that are designed interactively while inspecting them. With MutationExplorer this is now easily possible, and the effect of each variant on the stability of the protein structure is visualized as well.
Energy minimization - the key to high quality
The minimization is crucial for the quality of the outcome. The better the structure is energetically minimized, the more reliable the results will be.
We continuously pre-minimize structures from the PDB using the command line described below. If you specify a PDB ID that is contained in our database, the pre-minimized structure is used.
Otherwise we offer two modes of side-chain minimization, a short and a long one. The short one should take a few minutes, depending on the size of the protein. The long one may take some hours for large proteins. It is strongly recommended to use the long minimization, despite the significant waiting period.
AlphaFold and your own models
These are not pre-minimized, so you should follow the steps below. AlphaFold models should be minimized, and the minimization has to be performed with the same energy function that MutationExplorer uses, namely that of Rosetta.
Do it yourself!
The minimizations we offer as 'long' and 'short' are both side-chain optimizations only. However, we recommend that you perform the following minimization yourself. It also optimizes the backbone in a limited way (due to '-relax:constrain_relax_to_start_coords'). A free backbone minimization might lead to problems, e.g. for membrane proteins.
PATH_TO_ROSETTA/relax.static.linuxgccrelease -relax:fast true -relax:cartesian true -score:weights ref2015_cart -use_input_sc -optimization::default_max_cycles 200 -linmem_ig 10 -relax:constrain_relax_to_start_coords -ex1 -ex2 -nstruct 20 -in:file:s INPUT.pdb -out:pdb -out:prefix OUTDIR/
This command line will create 20 models, from which you should select the one with the lowest energy. Each output PDB file contains a line starting with 'pose'. The last value in this line is the total energy.
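Picking the best of the 20 models by hand is tedious, so the selection can be scripted. The following is a minimal sketch, assuming that every output file in OUTDIR ends in '.pdb' and that the first line starting with 'pose' carries the total energy as its last value, as described above. The function names are illustrative and not part of Rosetta or MutationExplorer.

```python
from pathlib import Path
from typing import Optional, Tuple


def total_energy(pdb_text: str) -> Optional[float]:
    """Return the last value of the first line starting with 'pose', or None."""
    for line in pdb_text.splitlines():
        if line.startswith("pose "):
            return float(line.split()[-1])
    return None


def lowest_energy_model(out_dir: str) -> Tuple[str, float]:
    """Scan out_dir for *.pdb files and return (file name, total energy) of the best model."""
    scored = []
    for path in sorted(Path(out_dir).glob("*.pdb")):
        energy = total_energy(path.read_text())
        if energy is not None:  # skip files without an energy table
            scored.append((path.name, energy))
    if not scored:
        raise ValueError(f"no scored PDB files found in {out_dir}")
    # The lowest (most negative) total energy is the preferred model.
    return min(scored, key=lambda item: item[1])
```

Calling lowest_energy_model("OUTDIR") returns the file name to upload together with its total energy in Rosetta energy units.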
Database of pre-minimized structures from the Protein Data Bank (PDB)
Since the minimization is essential for correctly highlighting the energetic impact of mutations, but takes no small amount of time to perform, we calculated a database of pre-minimized structures from the PDB. Currently our database contains 45,000 models. You will be notified whether your PDB ID was found in our database or whether the minimization has to be calculated.
RaSP - bringing exploration to full bloom
RaSP is a new deep-learning-based tool that rapidly estimates protein stability changes. RaSP predictions correlate strongly with scores from Rosetta calculations, which demand longer compute times.
With RaSP, MutationExplorer presents the user with a quick initial estimation of a mutation’s (de)stabilizing effect, without having to wait for the longer full minimization process.
RaSP will calculate all mutations for all amino acids in a chain within ~10 minutes.
If this option is selected, you can press the Ctrl key and left-click on the residue in the viewer. This will open a frame with 20 energy estimates. Be aware that this is an approximation.
Getting started with MutationExplorer
Tutorial 1: Start from the AlignMe webserver - discover sequence alignments in 3D
AlignMe is a software package and webserver for detecting similarities between proteins that are too subtle to be detected on the sequence level using standard methods.
Follow these steps:
- paste two sequences in Fasta format into the windows
- if the sequences are rather similar select the 'fast' alignment
- if there is little similarity on a sequence level, use one of the three other modes.
- on the result page, click the button to MutationExplorer
Now you are on a form of MutationExplorer where you can upload one or two structures on which the alignments are mapped.
- define base and target sequence from your alignment: the dropdown will display all fasta headers that you entered on the AlignMe website
- upload structure for the base sequence (required)
- define chain in PDB (crucial when you have e.g. homomultimers, otherwise the chain is determined automatically)
- optional: upload structure for target sequence
- define chain in PDB (crucial for homomultimers)
- select energy minimization (default is 'none')
- select RaSP (default is 'off')
The result window will display different things according to what you uploaded.
If you uploaded only the structure for the base sequence:
- base PDB is mutated to target sequence
- structure is colored by sequence conservation
- coloring by energy is available (quality depends on selection of minimization)
- coloring by hydrophobicity is available
If you also uploaded a structure for the target sequence:
- PDBs are structurally aligned (superimposed) according to sequence alignment
- structure is colored by sequence conservation
- coloring by energy is available (quality depends on selection of minimization)
- coloring by hydrophobicity is available
- you can move the structures independently
Tutorial 2: Upload VCF (sequencing data)
VCF, the tab-delimited Variant Call Format, is incredibly powerful and flexible. As an example of the minimal input format used to upload SNPs to MutationExplorer, see the sample file RTEL1.vcf, described in our manuscript. This tutorial lists the steps you will take to upload RTEL1.vcf and replicate the example page RTEL1 MutationExplorer results.
RTEL1 Tutorial Steps
- From the home page, open the VCF upload form, which gives a detailed explanation of MutationExplorer’s VCF workflow and limitations.
- Select “long minimization” to replicate the energy values of our example page. Alternatively, “quick” is OK if numerical precision is not a concern. “None” is fastest, but energetic calculations will be meaningless. Click the icon at right for more details.
- Click the control under “RaSP” if you wish to skip this stability prediction calculation. Click the icon at right for more details.
- Click to start the calculations. The webpage tab at the top of the browser window will immediately indicate a refresh in progress. Very soon you are taken to a new page showing “Status of Job nnnnn”. Be patient as the calculation proceeds. It is a good idea to right-click on “Result Page” and save the link for later reference. However, this URL will not be active until the various calculations are complete. At that time, you will be automatically redirected to this same “Result Page”.
Troubleshooting your VCF uploads
The processes between the submission of your VCF file and the display of an AlphaFold model are complex. Your VCF upload will be processed by the Ensembl Variant Effect Predictor (VEP), running locally on the MutationExplorer server. If your results are unexpected, we recommend that you paste your VCF file into the public VEP website: https://ensembl.org/Tools/VEP. This is the fastest way to ensure that your VCF file is properly tab-delimited and contains the minimum required columns and headers. The raw output from the website should also list protein missense variants.
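Before turning to the VEP website, a quick local check can catch the most common formatting problem, namely spaces instead of tabs. The sketch below is an illustration only: the exact minimum that MutationExplorer requires is not spelled out here, so the code assumes a header line starting with #CHROM and the first five standard VCF columns (CHROM, POS, ID, REF, ALT). The function name check_vcf is hypothetical.

```python
from typing import List

# Assumed minimum: the first five fixed VCF columns, tab-delimited.
REQUIRED = ["#CHROM", "POS", "ID", "REF", "ALT"]


def check_vcf(text: str) -> List[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    header = None
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("##"):
            continue  # blank lines and meta-information lines
        if line.startswith("#"):
            header = line.split("\t")
            if header[:len(REQUIRED)] != REQUIRED:
                problems.append(
                    f"line {number}: header does not start with "
                    f"{' '.join(REQUIRED)} separated by tabs"
                )
            continue
        fields = line.split("\t")
        if header is None:
            problems.append(f"line {number}: data line before the #CHROM header")
        if len(fields) < len(REQUIRED):
            problems.append(
                f"line {number}: fewer than {len(REQUIRED)} tab-delimited columns"
            )
        elif not fields[1].isdigit():
            problems.append(f"line {number}: POS is not an integer")
    if header is None:
        problems.append("no #CHROM header line found")
    return problems
```

An empty result does not guarantee that VEP will report missense variants. It only rules out delimiter and column problems before you upload.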
While seeing missense variants from the public VEP website is encouraging, other difficulties may remain. For example:
- MutationExplorer can only analyze one protein at a time. If your VCF file generates missense variations for multiple genes, you will need to split it and make separate runs.
- The VEP often returns Ensembl transcript IDs which do not map to Swiss-Prot curated, canonical UniProt transcript IDs. Without a UniProt ID, MutationExplorer cannot retrieve an AlphaFold model. You may manually check cross-references between ENST… transcript IDs and UniProt IDs at ensembl.org and uniprot.org, respectively.
- Inputting GRCh37/hg19 genomic positions, which are formatted indistinguishably from GRCh38 positions, will give wildly incorrect results. You must first “liftover” such coordinates, by converting your VCF coordinates to BED format, then submitting them to https://genome.ucsc.edu/cgi-bin/hgLiftOver. As you use these kinds of tools, remember that MutationExplorer requires GRCh38.
Tutorial 3a: Upload structure and mutations: upload model
In the main web view, click on "upload structure and mutations". Structures can be uploaded in three different ways:
- directly from the PDB via PDB ID
- directly from the AlphaFold DB via UniProtKB ID
- upload a local PDB file, which has to meet the following requirements:
- No HETATM records (can be removed via filter button)
- TER after each chain
- different and non-empty chain identifiers
- no anisotropy entries or multi-state side-chains
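A local PDB file can be brought into this shape with a few lines of code. The following is a rough sketch, not the filter that MutationExplorer applies: it assumes a single-model file in fixed-column PDB format, keeps only ATOM records, keeps the first alternate location (blank or 'A'), and writes a TER after each chain. All non-ATOM records, including the header, are dropped for simplicity.

```python
def clean_pdb(text: str) -> str:
    """Keep ATOM records only, first alternate location only, TER after each chain."""
    out = []
    prev_chain = None
    for line in text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # drops HETATM, ANISOU, old TER and all other records
        line = line.ljust(80)
        alt_loc, chain = line[16], line[21]  # PDB columns 17 and 22
        if alt_loc not in (" ", "A"):
            continue  # discard additional side-chain states
        if chain == " ":
            raise ValueError("empty chain identifier: " + line.rstrip())
        if prev_chain is not None and chain != prev_chain:
            out.append("TER")  # close the previous chain
        # Blank the alternate-location flag of the state that is kept.
        out.append((line[:16] + " " + line[17:]).rstrip())
        prev_chain = chain
    if out:
        out.append("TER")
    out.append("END")
    return "\n".join(out) + "\n"
```

The sketch does not verify that chain identifiers are unique across non-contiguous segments, so it is worth inspecting the result before uploading.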
Enter your E-Mail address to get notified when your minimization is finished.
When you are done, click on the 'next' button.
The button at the bottom of the page leads to an example MutationExplorer session.
Tutorial 3b: Upload structure and mutations: define mutations
There are three ways of providing mutational data:
1) Manual mutation definition
Define a list of point amino acid substitutions with the following syntax:
- A comma-separated list of: CHAIN ':' RESIDUE-ID TARGET-AA-TYPE
- Example: A:23G,B:123A
This mutates the residue with PDB residue ID 23 in chain A to glycine and the residue with residue ID 123 in chain B to alanine.
The info box:
- provides an overview of the available chains in the uploaded PDB,
- explains the mutation syntax,
- provides the amino acid one-letter codes.
Note that all mutations will be applied to the same model; multiple individual mutations are best done via the explorer interface.
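The mutation syntax is simple enough to validate before submitting. The parser below is a sketch under stated assumptions: chain identifiers are a single letter or digit, residue IDs are plain integers without insertion codes, and only the 20 standard one-letter codes are accepted. It is not the parser used by the server, and the names Mutation and parse_mutations are illustrative.

```python
import re
from typing import List, NamedTuple

ONE_LETTER = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


class Mutation(NamedTuple):
    chain: str
    residue_id: int
    target: str


# CHAIN ':' RESIDUE-ID TARGET-AA-TYPE, for example A:23G
_PATTERN = re.compile(r"^([A-Za-z0-9]):(-?\d+)([A-Za-z])$")


def parse_mutations(spec: str) -> List[Mutation]:
    """Parse a comma-separated mutation list such as 'A:23G,B:123A'."""
    mutations = []
    for item in spec.split(","):
        item = item.strip()
        match = _PATTERN.match(item)
        if match is None:
            raise ValueError(f"cannot parse mutation {item!r}")
        target = match.group(3).upper()
        if target not in ONE_LETTER:
            raise ValueError(f"unknown amino acid {target!r} in {item!r}")
        mutations.append(Mutation(match.group(1), int(match.group(2)), target))
    return mutations
```

For the example above, parse_mutations("A:23G,B:123A") yields one entry for chain A, residue 23, target G and one for chain B, residue 123, target A.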
2) Sequence Alignment
You can mutate the model that you uploaded in the previous step using a sequence alignment in ClustalW format. One of the sequences in the alignment has to match exactly one sequence in the PDB that you uploaded in the previous step. Since one alignment is associated with one chain, you can upload up to three alignments. If you need to mutate more than three chains, you can upload target sequences for the other chains or define mutations manually.
3) Target Sequence
For each chain of the uploaded PDB, provide the sequence that it will be mutated to. The target sequences, written in one-letter code, can be provided as FASTA files (top), pasted directly (center) or fetched directly from UniProtKB (bottom). For each sequence, select the corresponding PDB chain from the dropdown menu. You can also combine different uploads for multi-chain proteins (e.g. FASTA for chain A, pasted sequence for chain B). Please ensure that the uploaded or pasted sequences match the sequence of the PDB chain exactly in length (check your PDB!), and avoid sequence offsets, which would result in the mutation of all residues.
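The length requirement can be checked locally before uploading. The sketch below reads the sequence of one chain from the CA atoms of a fixed-column PDB file, compares its length with the target sequence, and lists the resulting substitutions in the CHAIN:RESIDUE-ID TARGET syntax. It assumes standard residues only and no insertion codes. The function names are hypothetical, and the server may derive the sequence differently.

```python
from typing import List, Tuple

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_residues(pdb_text: str, chain: str) -> List[Tuple[int, str]]:
    """Return (residue ID, one-letter code) for each CA atom of the given chain."""
    residues = []
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM  ") and line[12:16] == " CA "
                and line[21] == chain and line[16] in (" ", "A")):
            # A non-standard residue name raises KeyError here.
            residues.append((int(line[22:26]), THREE_TO_ONE[line[17:20]]))
    return residues


def mutations_for_target(pdb_text: str, chain: str, target: str) -> str:
    """List the substitutions that turn the chain sequence into the target sequence."""
    residues = chain_residues(pdb_text, chain)
    if len(residues) != len(target):
        raise ValueError(
            f"chain {chain} has {len(residues)} residues, target has {len(target)}"
        )
    return ",".join(
        f"{chain}:{resid}{new}"
        for (resid, old), new in zip(residues, target.upper())
        if old != new
    )
```

If nearly every position shows up as a substitution, the target sequence is probably shifted relative to the PDB chain, which is the offset problem described above.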
The main stage: MutationExplorer result window
In the explorer view, you will be presented with information about your set of mutations, relative per-residue energies and visualization of the protein.
Left side (info bar):
- Tree View: Outlines the mutations that were performed, where mut_0 is the unmutated, minimized PDB file that serves as the basis of mutations. You can select entries in the tree view for individual mutations.
- "Color by" dropdown menu: provides different coloration methods (absolute energy, energy difference to parent, absolute hydrophobicity, difference in hydrophobicity) for the coloration of the protein in the main window
- Download button: packs all PDB files into a ZIP archive for downloading
Mutate
- "Enter mutations" field: allows further mutations to be entered, with the root being the currently active protein (printed in bold in the tree view). The "info" field at the bottom provides info on the currently selected protein.
Info
- Table showing information about your current selection: total energy (Rosetta energy units), the parent structure, introduced mutations and the exploration ID.
Right side (main window):
- A Mol* visualization of the protein with a blue-white-red color gradient, where blue represents negative values and red positive values in the default color setting. The mutated residues are highlighted with a red translucent sphere. Hovering over the cartoon will highlight residues and display a tooltip with further information in the bottom right corner.
- MDSrv Window: By default, shows a sequence alignment of your root and mutated protein with the currently mutated residue highlighted in red. Hovering over cartoon residues will also highlight the residue in the alignment and vice-versa. It will further display a tool-tip on the right side with information about the highlighted residue.
Clicking on any highlighted residue will zoom in, showing the selected residue and its surroundings in a ball-and-stick view. Different carbon colors indicate protein chains; otherwise atoms are colored by type. Dashed lines indicate residue-residue interactions such as hydrogen bonds.
- In the tree view, click on the protein to serve as the root for your next mutation, e.g. mut_0_1, which is the first mutation entered in the webpage
- In the Mutate field, enter the mutations to be added to the protein. Click on the "info" box for more information
- Submit the mutations and wait for your new protein to be added. The waiting message shows the name of the new mutation. In the tree, the new mutated protein will appear indented under the selected root
- Click on the Change Visualization tab in the bottom right of the main window to adjust the color scale range and the mutation markers
- First select for which structure you want to change the range of the color scale by choosing the corresponding file in the Structure drop-down menu. On the left and right side of the color scale, change the numbers and confirm with enter. The color scale will be adjusted upon clicking Apply below the scale
- The mutation marker color and size can be adjusted via the slider and color window. The marker can be turned off by clicking on ✓ On in the Show Field, setting it to X Off
- It is possible to hide all mutation spheres at once. Click the Disable all mutation spheres button in the Mutation Markers panel. To show all mutation spheres again, click the Enable all mutation spheres button
- If you are importing multiple structures into a scene, it may be beneficial to hide a specific structure. Select the structure you want to hide from the Structure drop-down menu. Then click the Hide Structure button at the bottom. To make the structure visible again, simply click the Show Structure button
You can adjust the metric that serves as the basis for coloration in the "Color by:" Dropdown menu. By default, the colormap will be from -2 to 2 in a blue-white-red gradient
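To make the default color scale concrete, the sketch below maps a value to an RGB color on a blue-white-red gradient from -2 to 2. It assumes linear interpolation with white at the midpoint and clamping outside the range. The actual interpolation used by the viewer is not documented here and may differ, and the function name is illustrative.

```python
from typing import Tuple


def blue_white_red(value: float, vmin: float = -2.0, vmax: float = 2.0) -> Tuple[int, int, int]:
    """Map value to (R, G, B): blue at vmin, white at the midpoint, red at vmax."""
    if vmin >= vmax:
        raise ValueError("vmin must be smaller than vmax")
    mid = (vmin + vmax) / 2.0
    value = max(vmin, min(vmax, value))  # clamp values outside the scale
    if value <= mid:
        t = (value - vmin) / (mid - vmin)  # 0 at vmin, 1 at the midpoint
        return (round(255 * t), round(255 * t), 255)
    t = (vmax - value) / (vmax - mid)  # 1 at the midpoint, 0 at vmax
    return (255, round(255 * t), round(255 * t))
```

Narrowing the range, for example to -1 and 1, makes small energy differences more visible, at the cost of saturating larger ones.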
Rematch clustal sequences to a structure
After importing structures and ClustalW files into MutationExplorer, MutationExplorer tries to match all sequences contained in the ClustalW file with the chains of available structures.
Immediately after a first match is found, it is applied.
This can lead to false matches, because the MutationExplorer does not search for further possible matches.
The result of such false matches may be that when hovering over the sequences in the Alignment View panel, the wrong amino acids are highlighted at the structures, or that mutation sphere markers are placed at wrong amino acids.
If this occurs, manual rematching must be performed.
- Open the Manual Rematch Structure tab at the bottom left.
- If several ClustalW files have been imported, select in the drop-down menu Clustal File for which ClustalW file the manual rematching should take place.
- For each imported structure in the scene there is a block for rematching.
- Now select on the left hand side which sequence from the ClustalW file should be matched to a certain structure. You do this in the drop-down menu Chain Label.
- The selected sequence will be displayed below to support the matching.
- Now select the correct chain within the structure by adjusting the drop-down menus of Entity and Chain on the right hand side. You will see the corresponding sequence below.
- Within the displayed sequence on the right side, the longest matching amino acid sequence with the sequence selected in Clustal Chain will be highlighted in green.
- When the complete sequence on the right side is marked green, manual rematching can be performed.
- Apply the manual rematching by clicking on Apply Matching.
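The green highlighting corresponds to a longest common substring search between the ClustalW sequence and the chain sequence. The sketch below reproduces that idea with Python's difflib. It assumes that gap characters ('-') are removed from the ClustalW sequence before comparison. How MutationExplorer treats gaps internally is not stated here, and the function names are illustrative.

```python
from difflib import SequenceMatcher
from typing import Tuple


def longest_match(clustal_seq: str, chain_seq: str) -> Tuple[int, int, int]:
    """Return (start in ungapped ClustalW sequence, start in chain sequence, length)."""
    ungapped = clustal_seq.replace("-", "")
    matcher = SequenceMatcher(None, ungapped, chain_seq, autojunk=False)
    match = matcher.find_longest_match(0, len(ungapped), 0, len(chain_seq))
    return match.a, match.b, match.size


def fully_matched(clustal_seq: str, chain_seq: str) -> bool:
    """True if the whole chain sequence occurs in the ClustalW sequence."""
    return longest_match(clustal_seq, chain_seq)[2] == len(chain_seq)
```

A chain for which fully_matched returns True corresponds to the case in which the complete sequence on the right side is marked green.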
Homology modeling of multiple states of a protein
AlphaFold is not able to model different states of a protein. This can be achieved using MutationExplorer. For many classes and families of proteins, multiple states are available in the PDB. Different states can be modeled by selecting different PDBs as base structure and following these steps:
- under “upload structure and mutations” enter the PDB ID of the base structure or upload a model from your machine
- ideally select the long minimization (for homology modeling the RaSP option can be disabled) and click the 'next' button
- there are multiple choices for uploading the sequence to which you want to change the base structure:
- paste or upload it under “target sequence” (when the target is rather similar to your base structure / template)
- if your target sequence has a UniProt ID you can enter that
- you will have to indicate to which chain in the base structure the sequence shall be applied (available chains are listed)
- if it is a very remote relationship, follow the AlignMe procedure
- for other cases either also use AlignMe, or upload an alignment from another source via “sequence alignment”
The part that cannot be done by MutationExplorer is the minimization of the model. The user can choose between different possibilities, e.g. using Rosetta as explained above, performing MD simulations, and many others. Whatever approach the user chooses, it requires sufficient sampling and thus has to be done locally.
Backmutating PDB structures to wildtype sequence for MD simulations
Structures deposited in the PDB are very often modified in order to solve them experimentally. Back-mutating PDB structures is therefore a common step when preparing protein simulations. Using MutationExplorer, mutating them back to their native sequence is a simple two-step process.
However, it should be noted that at its current state MutationExplorer is not able to complete the structure by modeling missing parts in the PDB.
Multi-state modeling of homologs via Molecular Dynamics simulation ensembles
Molecular dynamics simulation can generate ensembles of functionally relevant states. MutationExplorer maps those ensembles to close or distant homologs. For close homologs, an alignment will be automatically generated internally. Alternatively, a pre-calculated sequence alignment can be provided. Where more refined alignments are required, the server accepts results forwarded from the AlignMe website (https://doi.org/10.1093/nar/gkac391). Such sequence alignments carry nuanced details and MutationExplorer drills down to reveal them all.
Aiding design for experimental studies for protein characterization
When designing proteins, the effect of a variant is often only known after applying external software over multiple design rounds. After each round, the structures can be inspected. The power of MutationExplorer lies in the possibility to do this directly in the viewer and to select new rounds of mutations immediately, based on their influence on the structure.
(De-) Stabilize a protein-protein interface
3D mapping of energies is not limited to monomers. Over single or multiple rounds of mutations, protein complexes or protein-protein interfaces can flexibly be designed to achieve (de)stabilization. The effect of each design choice can be explored visually.
Methods / code sources
Structure minimization
We use the following Rosetta command to minimize the structure. It is based on the fixbb application, with a resfile that does not specify any mutation.
PATH_TO_ROSETTA/fixbb.static.linuxgccrelease -use_input_sc -in:file:s path_to_structure -resfile path_to_empty_res_file -nstruct 1 -linmem_ig 10 -out:pdb -out:prefix prefix -ex1 -ex2
Mutation
We use the following Rosetta command to mutate the structure. It is based on the fixbb application.
PATH_TO_ROSETTA/fixbb.static.linuxgccrelease -use_input_sc -in:file:s path_to_structure -resfile path_to_empty_res_file -nstruct 1 -linmem_ig 10 -out:pdb -out:prefix prefix
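The resfile itself is not shown here. For orientation, the sketch below writes a resfile in Rosetta's resfile syntax: a default command in the header, the keyword 'start', and one 'PIKAA' line per substitution. This is an assumption about what such a file could look like, not the file used by the server. In particular, the choice of NATAA (repack all other residues without changing their identity) as the default is a guess, and the function name build_resfile is hypothetical.

```python
from typing import Iterable, Tuple


def build_resfile(mutations: Iterable[Tuple[str, int, str]], default: str = "NATAA") -> str:
    """Build resfile text from (chain, residue ID, target one-letter code) tuples."""
    lines = [default, "start"]  # header default, then the body marker
    for chain, residue_id, target in mutations:
        # One body line per position: residue ID, chain, command.
        lines.append(f"{residue_id} {chain} PIKAA {target}")
    return "\n".join(lines) + "\n"
```

With an empty mutation list the result contains only the header and 'start', which is one plausible reading of a resfile without any indicated mutation.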
VCF processing
The VCF processing code can be found at:
RaSP
Details about the RaSP software can be found here: https://doi.org/10.1101/2022.07.14.500157
The RaSP software code can be found on GitHub: https://github.com/KULL-Centre/papers/tree/main/2022/ML-ddG-Blaabjerg-et-al
calc-rasp.sh output.pdb chainID out_file_name output_directory
Mol* viewer
Structure predictions are visualized in MutationExplorer with an adapted and extended version of Mol* (https://doi.org/10.1093/nar/gkab314), the successor of the NGL Viewer (https://doi.org/10.1093/nar/gkv402). The extensions of Mol* from MDsrv (https://doi.org/10.1093/nar/gkac398) have been incorporated into MutationExplorer so that sequence alignments are displayed alongside the structure visualizations. The adapted Mol* viewer can be found on GitHub:
Limitations
Currently, MutationExplorer has no potentials targeted at membrane proteins. Especially for residues facing the membrane, a dedicated score function and preparation would be desirable. Other residues, especially those outside of the membrane, can be investigated with MutationExplorer without limitations.
Moreover, some mutations are generally challenging for Rosetta, foremost those from or to proline. Mutations from glycine may require conformational adaptation beyond the protocols used here, but proteins may be more flexible.
Ligands can only be handled as far as they are included in Rosetta’s energy/scoring functions.
MutationExplorer cannot handle very large structures, such as CryoEM structures available only in CIF format. Partial AlphaFold models for transcripts with more than 2,700 amino acids are not supported. We also caution that for large structures our default minimization protocols might be insufficient. Users should minimize these on their local computer before upload, using the command line given in the section on energy minimization.
For further information, see the FAQs.
Also check out the example section.