mdciao: Accessible Analysis and Visualization of Molecular Dynamics Simulation Data



mdciao is a Python module that provides quick, “one-shot” command-line tools to analyze molecular simulation data using residue-residue distances. mdciao tries to automate as much as possible for inexperienced users while remaining highly customizable for advanced users, who can use its API to construct their own analysis workflows.

Under the hood, mdtraj does most of the computation and handling of molecular information, with BioPython used for sequence alignment, pandas for many table- and IO-related operations, and matplotlib for visualization. mdciao tries to apply consensus nomenclature automatically, by either using local files or on-the-fly lookups of the GPCRdb and/or KLIFS.

Basic Principle

mdciao takes the files typically generated by a molecular dynamics (MD) simulation, i.e.

  • topology files, like prot.gro or top.pdb

  • trajectory files, like traj1.xtc, traj2.xtc

and calculates the time-traces of residue-residue distances and, from there, contact frequencies and distance distributions. The simplest command-line call would look approximately like this:

mdc_neighborhoods.py top.pdb traj.xtc --residues L394
[...]
The following 5 contacts capture 3.88 (~90%) of the total frequency 4.31 (over 7 contacts with nonzero frequency).
As orientation value, the first 5 ctcs already capture 90.0% of 4.31.
The 5-th contact has a frequency of 0.50.
   freq          label            residues  fragments   sum
1  0.96  L394@frag0 - R389@frag0  353 - 348    0 - 0   0.96
2  0.92  L394@frag0 - L388@frag0  353 - 347    0 - 0   1.88
3  0.79  L394@frag0 - L230@frag3  353 - 957    0 - 3   2.67
4  0.71  L394@frag0 - R385@frag0  353 - 344    0 - 0   3.38
5  0.50  L394@frag0 - K270@frag3  353 - 972    0 - 3   3.88
The following files have been created:
./neighborhood.overall@4.0_Ang.pdf
./neighborhood.LEU394@frag0@4.0_Ang.dat
./neighborhood.LEU394@frag0.time_trace@4.0_Ang.pdf
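
To make the relationship between time-traces, frequencies, and the sum column concrete, here is a minimal sketch in plain Python. It is not mdciao's implementation: the helper names contact_frequency and cumulative_frequencies are hypothetical, the distance trace is made up, and counting a distance exactly at the cutoff as a formed contact is an assumption. The only values taken from the output above are the 4.0 Å cutoff, the five frequencies, and the total frequency of 4.31.

```python
from itertools import accumulate

CUTOFF_ANG = 4.0  # cutoff taken from the output file names above


def contact_frequency(distances_ang, cutoff_ang=CUTOFF_ANG):
    """Fraction of frames in which the distance is at or below the cutoff.

    Assumption: a distance exactly equal to the cutoff counts as formed.
    """
    if not distances_ang:
        raise ValueError("empty time trace")
    formed = [d <= cutoff_ang for d in distances_ang]
    return sum(formed) / len(formed)


def cumulative_frequencies(freqs):
    """Running sum of the frequencies, sorted in descending order."""
    ordered = sorted(freqs, reverse=True)
    return [round(s, 2) for s in accumulate(ordered)]


# Made-up distance time-trace (in Angstrom) for one residue pair, 8 frames
trace = [3.1, 3.5, 4.6, 3.9, 5.2, 3.8, 3.7, 4.1]
freq = contact_frequency(trace)

# Frequencies copied from the "freq" column of the example output
table_freqs = [0.96, 0.92, 0.79, 0.71, 0.50]
running = cumulative_frequencies(table_freqs)

# Fraction of the total frequency (4.31 in the output) that these capture
captured = running[-1] / 4.31
```

With this trace, 5 of the 8 frames are at or below the cutoff, so freq is 0.625. The list running reproduces the sum column (0.96, 1.88, 2.67, 3.38, 3.88), and captured is about 0.90, matching the ~90% reported in the output.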

You can also invoke:

mdc_examples.py

for a list of all the built-in command-line toy-examples or:

mdc_notebooks.py

for live Jupyter notebooks to play around with. These are shown in the Jupyter Notebook Gallery along with other real-life, more elaborate examples.

Note

A note of caution regarding the above definitions for contact and frequency:

  • The kinetic information is averaged out. Contacts that break and form quickly and contacts that break (or form) only once will have the same frequency, as long as the fraction of total time they are formed is the same. For analysis that takes kinetics into account, use, e.g., pyemma.

  • The sharp, “distance-only” cutoff can sometimes over- or under-represent some interaction types. Modules like get_contacts capture these interactions better, and have a ton of other features.

  • Frequencies are just averages over the input data. In some cases, simply computing averages is a bad idea. The user is responsible for deciding what data to average over. For example, if your data is highly heterogeneous, you might want to cluster it into cluster1.xtc, cluster2.xtc, etc., and then do a per-cluster analysis with mdciao.

These issues (if and when they arise) can be spotted easily by looking at the time-traces, and informed decisions can then be made regarding parameters like the cutoff value, the number of contacts displayed, and many others.
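
The three caveats can be illustrated with toy data. The following is a minimal sketch in plain Python, not mdciao code: all helper names (frequency, count_transitions, is_formed) are hypothetical, all traces are made up, and the 4.5 Å alternative cutoff is chosen only for illustration.

```python
def frequency(formed):
    """Fraction of frames in which the contact is formed."""
    return sum(formed) / len(formed)


def count_transitions(formed):
    """Number of frame-to-frame changes between formed and broken."""
    return sum(1 for a, b in zip(formed, formed[1:]) if a != b)


def is_formed(distances_ang, cutoff_ang):
    """Apply a sharp, distance-only cutoff to a distance trace."""
    return [d <= cutoff_ang for d in distances_ang]


# Caveat 1: identical frequency, very different kinetics
flickering = [True, False] * 5            # forms and breaks every frame
single_break = [True] * 5 + [False] * 5   # breaks once and stays broken
same_frequency = frequency(flickering) == frequency(single_break)
transitions = (count_transitions(flickering), count_transitions(single_break))

# Caveat 2: a pair hovering just above the cutoff is missed entirely
hovering = [4.1, 4.2, 4.1, 4.3, 4.2]      # distances in Angstrom
freq_at_4_0 = frequency(is_formed(hovering, 4.0))
freq_at_4_5 = frequency(is_formed(hovering, 4.5))

# Caveat 3: pooling heterogeneous data hides per-cluster behaviour
cluster1 = [True] * 10                    # contact always formed
cluster2 = [False] * 10                   # contact never formed
pooled = frequency(cluster1 + cluster2)
per_cluster = [frequency(cluster1), frequency(cluster2)]
```

Both toy contacts in the first case have a frequency of 0.5, yet one switches state 9 times and the other only once. In the second case the frequency jumps from 0.0 to 1.0 when the cutoff moves from 4.0 to 4.5 Å. In the third, the pooled frequency of 0.5 describes neither cluster, whose individual frequencies are 1.0 and 0.0.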