mdciao: Accessible Analysis and Visualization of Molecular Dynamics Simulation Data
mdciao is a Python module that provides quick, “one-shot” command-line tools to analyze molecular simulation data using residue-residue distances. mdciao tries to automate as much as possible for non-experienced users while remaining highly customizable for advanced users, by exposing an API to construct your own analysis workflow.
Under the hood, the module mdtraj does most of the computation and handling of molecular information, with BioPython used for sequence alignment, pandas for many table- and IO-related operations, and matplotlib for visualization. mdciao tries to automatically use the consensus nomenclature for
GPCRs, e.g. Ballesteros-Weinstein numbering or the structure-based schemes by Gloriam et al.,
G-proteins, via Common G-alpha Numbering (CGN), and
Kinases, via their 85 pocket-residue numbering scheme,
using either local files or on-the-fly lookups of the GPCRdb and/or KLIFS.
Basic Principle
mdciao takes the files typically generated by a molecular dynamics (MD) simulation, i.e.
topology files, like prot.gro or top.pdb
trajectory files, like traj1.xtc, traj2.xtc
and calculates the time-traces of residue-residue distances and, from there, contact frequencies and distance distributions. The simplest command-line call would look approximately like this:
mdc_neighborhoods.py top.pdb traj.xtc --residues L394
[...]
The following 5 contacts capture 3.88 (~90%) of the total frequency 4.31 (over 7 contacts with nonzero frequency).
As orientation value, the first 5 ctcs already capture 90.0% of 4.31.
The 5-th contact has a frequency of 0.50.
   freq                    label   residues  fragments   sum
1  0.96  L394@frag0 - R389@frag0  353 - 348      0 - 0  0.96
2  0.92  L394@frag0 - L388@frag0  353 - 347      0 - 0  1.88
3  0.79  L394@frag0 - L230@frag3  353 - 957      0 - 3  2.67
4  0.71  L394@frag0 - R385@frag0  353 - 344      0 - 0  3.38
5  0.50  L394@frag0 - K270@frag3  353 - 972      0 - 3  3.88
The following files have been created:
./neighborhood.overall@4.0_Ang.pdf
./neighborhood.LEU394@frag0@4.0_Ang.dat
./neighborhood.LEU394@frag0.time_trace@4.0_Ang.pdf
You can also invoke:
mdc_examples.py
for a list of all the built-in command-line toy-examples or:
mdc_notebooks.py
for live Jupyter notebooks to play around with. These are shown in the Jupyter Notebook Gallery, along with other, more elaborate real-life examples.
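To make the step from distance time-traces to contact frequencies concrete, here is a minimal sketch in plain Python. It is not mdciao's implementation: the distance values are made up, the helper `contact_frequency` is hypothetical, and counting a contact as formed when the distance is at or below the cutoff is an assumption. Only the 4.0 Å cutoff is taken from the output file names above.

```python
# Synthetic distance time-traces in Angstrom, one list per residue pair.
# These values are made up for illustration and are not mdciao output.
distances = {
    ("L394", "R389"): [3.1, 3.4, 3.9, 3.6, 4.5],
    ("L394", "L388"): [3.8, 4.2, 3.7, 4.1, 3.9],
    ("L394", "K270"): [5.0, 3.9, 6.1, 4.4, 3.5],
}

CUTOFF_ANG = 4.0  # same value as the "@4.0_Ang" in the file names above


def contact_frequency(trace, cutoff):
    """Fraction of frames whose distance is at or below the cutoff (assumed convention)."""
    formed = [d <= cutoff for d in trace]
    return sum(formed) / len(formed)


freqs = {pair: contact_frequency(trace, CUTOFF_ANG)
         for pair, trace in distances.items()}

# Rank the contacts by frequency and keep a running sum,
# analogous to the "freq" and "sum" columns of the report.
ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
running_sum = 0.0
for rank, (pair, freq) in enumerate(ranked, start=1):
    running_sum += freq
    print(f"{rank}  {freq:.2f}  {pair[0]} - {pair[1]}  {running_sum:.2f}")
```

Each frequency is just the fraction of frames in which the pair is within the cutoff, so the running sum over the ranked contacts shows how much of the total frequency the first few contacts capture.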
Note
A note of caution regarding the above definitions for contact and frequency:
The kinetic information is averaged out. Contacts that break and form quickly and contacts that break (or form) only once will have the same frequency, as long as the fraction of total time they are formed is the same. For analysis that takes kinetics into account, use, e.g., pyemma.
The sharp, “distance-only” cutoff can sometimes over- or under-represent some interaction types. Modules like get_contacts capture these interactions better and have many other features.
Frequencies are just averages over the input data. In some cases, simply computing averages is a bad idea. The user is responsible for deciding over what data to average. For example, if your data is highly heterogeneous, you might want to cluster it into cluster1.xtc, cluster2.xtc, etc. and then do a per-cluster analysis with mdciao.
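The point about averaging over heterogeneous data can be illustrated with a small sketch. The contact states below are invented, the cluster labels are placeholders, and `frequency` is a hypothetical helper rather than part of mdciao.

```python
# Hypothetical formed/broken states (True = contact formed) of ONE residue pair,
# already split into two clusters of frames. The values are made up.
clusters = {
    "cluster1": [True] * 9 + [False],   # contact formed almost always
    "cluster2": [False] * 9 + [True],   # contact formed almost never
}


def frequency(formed):
    """Fraction of frames in which the contact is formed."""
    return sum(formed) / len(formed)


per_cluster = {name: frequency(states) for name, states in clusters.items()}

# Pooling all frames before averaging hides the two very different populations.
pooled_states = [state for states in clusters.values() for state in states]
pooled = frequency(pooled_states)

print(per_cluster)
print(pooled)
```

In this toy case the pooled frequency sits halfway between two clusters that behave in opposite ways, which is why a per-cluster analysis can be more informative than a single average.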
These issues (if/when they arise) can be spotted easily by looking at the time-traces, and informed decisions can then be made with respect to parameters like the cutoff value, the number of contacts displayed, and many others.
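The first caveat in the note, that frequencies average out kinetics, can be seen in a toy example. The two time-traces below are invented and the helpers `frequency` and `n_transitions` are hypothetical; this is a sketch of the idea, not of how mdciao or pyemma work internally.

```python
def frequency(formed):
    """Fraction of frames in which the contact is formed."""
    return sum(formed) / len(formed)


def n_transitions(formed):
    """Number of times the contact switches between formed and broken."""
    return sum(a != b for a, b in zip(formed, formed[1:]))


# Two made-up time-traces (True = formed), ten frames each.
flickering = [True, False] * 5             # forms and breaks every frame
breaks_once = [True] * 5 + [False] * 5     # formed at first, then broken for good

for name, trace in [("flickering", flickering), ("breaks_once", breaks_once)]:
    print(name, frequency(trace), n_transitions(trace))
```

Both traces have the same contact frequency, yet they differ strongly in how often the contact changes state, which is exactly the information a plot of the time-trace reveals.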