.. mdciao documentation master file, created by
sphinx-quickstart on Fri Sep 6 11:54:24 2019.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
mdciao: Accessible Analysis and Visualization of Molecular Dynamics Simulation Data
===================================================================================
|Pip Package| |Python Package| |MacOs Package| |Coverage| |DOI| |License|
.. figure:: imgs/banner.png
:scale: 33%
.. figure:: imgs/distro_and_violin.png
:scale: 25%
.. figure:: imgs/timedep_ctc_matrix.png
:scale: 55%
.. _my-reference-label:
.. figure:: imgs/interface.combined.png
:scale: 33%
``mdciao`` is a Python module that provides quick, "one-shot" command-line tools to analyze molecular simulation data using residue-residue distances. ``mdciao`` tries to automate as much as possible for non-experienced users while remaining highly customizable for advanced users, by exposing an API to construct your own analysis workflow.
Under the hood, the module `mdtraj `_ is doing most of the computation and handling of molecular information, using `BioPython `_ for sequence alignment, `pandas `_ for many table and IO related operations, and `matplotlib `_ for visualization. It tries to automatically use the consensus nomenclature for
* GPCRs
* via `Ballesteros-Weinstein-Numbering `_ or structure-based schemes by `Gloriam et al `_ for the receptor's TM domain, or
* via generic-residue-numbering for the GAIN domain of `adhesion GPCRs `_
* G-proteins
* via `Common G-alpha Numbering (CGN) `_
* Kinases
* via their `85 pocket-residue numbering scheme `_
using local files or on-the-fly lookups of the `GPCRdb `_
and/or `KLIFS `_.
Basic Principle
---------------
``mdciao`` takes the files typically generated by a molecular dynamics (MD) simulation, i.e.
* topology files, like *prot.gro* or *top.pdb*
* trajectory files, like *traj1.xtc*, *traj2.xtc*
and calculates the time-traces of residue-residue distances, and from there, **contact frequencies** and **distance distributions**. The most simple command line call would look approximately like this::
mdc_neighborhoods.py top.pdb traj.xtc --residues L394
[...]
The following 6 contacts capture 5.26 (~97%) of the total frequency 5.43 (over 9 contacts with nonzero frequency at 4.50 Angstrom).
As orientation value, the first 6 ctcs already capture 90.0% of 5.43.
The 6-th contact has a frequency of 0.52.
freq label residues fragments sum
1 1.00 L394@frag0 - L388@frag0 353 - 347 0 - 0 1.00
2 1.00 L394@frag0 - R389@frag0 353 - 348 0 - 0 2.00
3 0.97 L394@frag0 - L230@frag3 353 - 957 0 - 3 2.97
4 0.97 L394@frag0 - R385@frag0 353 - 344 0 - 0 3.94
5 0.80 L394@frag0 - I233@frag3 353 - 960 0 - 3 4.74
6 0.52 L394@frag0 - K270@frag3 353 - 972 0 - 3 5.26
The following files have been created:
./neighborhood.overall@4.5_Ang.pdf
./neighborhood.LEU394@frag0@4.5_Ang.dat
./neighborhood.LEU394@frag0.time_trace@4.5_Ang.pdf
You can also invoke::
mdc_examples.py
for a list of all the built-in command-line toy-examples or::
mdc_notebooks.py
for live Jupyter notebooks play around with. These are shown in the :ref:`Jupyter Notebook Gallery` along with other real-life, more elaborated examples.
.. note::
A note of caution regarding the above definitions for *contact* and *frequency*:
* the kinetic information is averaged out. Contacts quickly breaking and forming and contacts that break (or form) only once **will have the same frequency** as long as the **fraction of total time** they are formed is the same. For analysis taking kinetics into account, use. e.g. `pyemma `_.
* The sharp, "distance-only" cutoff can sometimes over- or under-represent some interaction types. Modules like `get_contacts `_ or `ProLIF `_ and the `PLIP webserver `_ have individual geometric definitions for each interaction type.
* Frequencies are just **averages** over the input data. In some cases, *simply* computing averages is a bad idea. The user is `responsible for deciding over what data to average `_. For example, if your data is highly heterogenous you might want to `cluster `_ your data into into ``cluster1.xtc``, ``cluster.2.xtc`` etc and then do a per-cluster analysis with ``mdciao``. Same applies to single frames i.e. PDB files, where the word "frequency" doesn't make any sense.
These issues (if/when they arise) can be spotted easily by looking at the time-traces and informed decisions can be made wrt to parameters like the cutt-off value, number of contacts displayed and many others.
.. |Pip Package| image::
https://badge.fury.io/py/mdciao.svg
:target: https://badge.fury.io/py/mdciao
.. |Python Package| image::
https://github.com/gph82/mdciao/actions/workflows/python-package.yml/badge.svg
:target: https://github.com/gph82/mdciao/actions/workflows/python-package.yml
.. |MacOs Package| image::
https://github.com/gph82/mdciao/actions/workflows/python-package.macos.yml/badge.svg
:target: https://github.com/gph82/mdciao/actions/workflows/python-package.macos.yml
.. |Coverage| image::
https://codecov.io/gh/gph82/mdciao/branch/master/graph/badge.svg
:target: https://codecov.io/gh/gph82/mdciao
.. |License| image::
https://img.shields.io/github/license/gph82/mdciao
.. |DOI| image::
https://zenodo.org/badge/DOI/10.5281/zenodo.5643177.svg
:target: https://doi.org/10.5281/zenodo.5643177
.. there's this issue about the self-referencing TOC that I cannot solve
.. https://github.com/sphinx-doc/sphinx/issues/4602
.. toctree::
:hidden:
installation
CLI Tutorial
API Jupyter Notebook Tutorial
notebooks/Covid-19-Spike-Protein-Example.ipynb
notebooks/Covid-19-Spike-Protein-Interface.ipynb
cli_cli/cli_cli
api/api
gallery