Jupyter Notebook Tutorial

Roughly, the design idea of mdciao is that:

  • The CLI offers pre-packaged analysis pipelines that are essentially one-shot tools. They are an entry-point for non-experts and do not require any Python scripting knowledge. CLI tools are still highly customizable (check mdc_**.py -h or mdc_examples.py), but offer only some of the mdciao-functionalities.

  • The Python API, on the other hand, exposes:

    • CLI-equivalent functions via the mdciao.cli submodule. Here you’ll find evertying that the CLI offers, only as regular Python functions. This provides scripting flexibility, with the added value that now input and outputs are normal Python objects that can be further manipulated, by mdciao or any other Python module of your liking.

    • Other standalone submodules that the CLI uses under the hood, and that the user can access directly for any other scripting purpuse: plotting methods, alignment/sequence methods, nomenclature methods, PDB-methods etc.

Note

THE API IS NOT STABLE YET, if you are using mdciao in API mode, we assume you can handle future API changes without much hassle.

For clarity, this notebook loosely follows the same structure as the Overview section of the mdciaodocumentation. Other notebooks will follow soon, explaining basic concepts and/or advanced pipelines.

If you want to run this notebook on your own, please download and extract the data from here first. You can download it:

  • using the browser

  • using the terminal with wget http://proteinformatics.org/mdciao/mdciao_example.zip; unzip mdciao_example.zip

  • using mdciao’s own method mdciao.examples.fetch_example_data

[1]:
import mdciao, os
if not os.path.exists("mdciao_example"):
    mdciao.examples.fetch_example_data()
Unzipping to 'mdciao_example'

Basic Usage

Now we replicate the CLI command:

mdc_neighborhoods.py prot.pdb traj.xtc --residues L394 -nf #nf: don't use fragments

but in API mode. We use the method cli.residue_neighborhoods:

[2]:
result= mdciao.cli.residue_neighborhoods("L394",
                                         "mdciao_example/traj.xtc",
                                         "mdciao_example/prot.pdb",
                                         fragments=[None])
Will compute contact frequencies for (1 items):
mdciao_example/traj.xtc
 with a stride of 1 frames
Using method 'None' these fragments were found
fragment      0 with   1044 AAs     LEU4 (     0) -   P0G395 (1043  ) (0)  resSeq jumps

Will compute neighborhoods for the residues
L394
excluding 4 nearest neighbors

   residue      residx    fragment      resSeq       GPCR        CGN
    LEU394         353           0        394       None       None

Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'mdciao_example/prot.pdb':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 74 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming mdciao_example/traj.xtc (nr.   0) with stride  1 in chunks of  10000 frames. Now at chunk nr    0, frames so far    280
#idx   freq      contact       fragments     res_idxs      ctc_idx  Sum
1:     0.55   LEU394-ARG389       0-0         353-348        30     0.55
2:     0.47   LEU394-LYS270       0-0         353-972        65     1.02
3:     0.38   LEU394-LEU388       0-0         353-347        29     1.40
4:     0.23   LEU394-LEU230       0-0         353-957        51     1.62
5:     0.11   LEU394-ARG385       0-0         353-344        26     1.73
These 5 contacts capture 1.73 (~98%) of the total frequency 1.76 (over 74 input contacts)
As orientation value, the first 4 ctcs already capture 90.0% of 1.76.
The 4-th contact has a frequency of 0.23

The following files have been created:
./neighborhood.overall@3.5_Ang.pdf
./neighborhood.LEU394@frag0@3.5_Ang.dat
./neighborhood.LEU394@frag0.time_trace@3.5_Ang.pdf
../_images/notebooks_Tutorial_5_1.png

result is a dictionary of dictionaries, with the main result under the key neighborhoods. There, you’ll find a dictionary keyed with residue indices and valued with a ContactGroup for each residue neighborhood.

ContactGroups are mdciao classes that allow the further manipulation of contact data, molecular information and much more. Check here to learn more about mdciao classes.

[3]:
result["neighborhoods"][353]
[3]:
<mdciao.contacts.contacts.ContactGroup at 0x7f46c2d73cd0>

Using Python Objects

Please note that in API mode, inputs can be objects, for example mdtraj Trajectories. So, before calling the next mdciao.cli method, we use mdtraj to load the trajectory from our files:

[4]:
import mdtraj as md
traj = md.load("mdciao_example/traj.xtc", top="mdciao_example/prot.pdb")
traj
[4]:
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells at 0x7f46c2ecbf10>

And we repeat the above command using the traj object. Please note that we’re also using the no_disk option so that no files are written to disk, in case we’re only interested in working in memory.

[5]:
result = mdciao.cli.residue_neighborhoods("L394",
                                          traj,
                                          fragments=[None],
                                          no_disk=True)
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
 with a stride of 1 frames
Using method 'None' these fragments were found
fragment      0 with   1044 AAs     LEU4 (     0) -   P0G395 (1043  ) (0)  resSeq jumps

Will compute neighborhoods for the residues
L394
excluding 4 nearest neighbors

   residue      residx    fragment      resSeq       GPCR        CGN
    LEU394         353           0        394       None       None

Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 8384 atoms, 1044 residues, and unitcells>':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 74 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr.   0 (   280 frames,    280 with stride  1) in chunks of 10000 frames. Now at chunk nr    0, frames so far    280
#idx   freq      contact       fragments     res_idxs      ctc_idx  Sum
1:     0.55   LEU394-ARG389       0-0         353-348        30     0.55
2:     0.47   LEU394-LYS270       0-0         353-972        65     1.02
3:     0.38   LEU394-LEU388       0-0         353-347        29     1.40
4:     0.23   LEU394-LEU230       0-0         353-957        51     1.62
5:     0.11   LEU394-ARG385       0-0         353-344        26     1.73
These 5 contacts capture 1.73 (~98%) of the total frequency 1.76 (over 74 input contacts)
As orientation value, the first 4 ctcs already capture 90.0% of 1.76.
The 4-th contact has a frequency of 0.23

../_images/notebooks_Tutorial_11_1.png
../_images/notebooks_Tutorial_11_2.png

Now, the more elaborated CLI-command:

mdc_neighborhoods.py prot.pdb traj.xtc -r L394 --GPCR adrb2_human --CGN 3SN6 -ni -at #ni: not interactive, at: show atom-types

We keep the no_disk option to avoid writing to disk, but you can change this if you want. Please note that some options do not carry exactly the same names as their CLI equivalents. E.g. ni in the CLI (= don’t be interactive) is now accept_guess in the API. These differences are needed for compatiblity with other methods, but might get unified in the future.

[6]:
result = mdciao.cli.residue_neighborhoods("L394",
                                          traj,
                                          GPCR_uniprot="adrb2_human",
                                          CGN_PDB="3SN6",
                                          accept_guess=True,
                                          plot_atomtypes=True,
                                          no_disk=True)
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
 with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment      0 with    354 AAs     LEU4 (     0) -   LEU394 (353   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   354) -   ASN340 (693   ) (1)
fragment      2 with     66 AAs     ALA2 (   694) -    PHE67 (759   ) (2)
fragment      3 with    283 AAs    GLU30 (   760) -   LEU340 (1042  ) (3)  resSeq jumps
fragment      4 with      1 AAs   P0G395 (  1043) -   P0G395 (1043  ) (4)
No local file ./adrb2_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/adrb2_human ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
done without 404, continuing.
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).

These are the GPCR fragments mapped onto your topology:
    TM1 with     32 AAs    GLU30@1.29x29   (   760) -    PHE61@1.60x60   (791   ) (TM1)
   ICL1 with      4 AAs    GLU62@12.48x48  (   792) -    GLN65@12.51x51  (795   ) (ICL1)
    TM2 with     32 AAs    THR66@2.37x37   (   796) -    LYS97@2.68x67   (827   ) (TM2)
   ECL1 with      4 AAs    MET98@23.49x49  (   828) -   PHE101@23.52x52  (831   ) (ECL1)
    TM3 with     36 AAs   GLY102@3.21x21   (   832) -   SER137@3.56x56   (867   ) (TM3)
   ICL2 with      8 AAs   PRO138@34.50x50  (   868) -   LEU145@34.57x57  (875   ) (ICL2)
    TM4 with     27 AAs   THR146@4.38x38   (   876) -   HIS172@4.64x64   (902   ) (TM4)
   ECL2 with     20 AAs   TRP173           (   903) -   THR195           (922   ) (ECL2)  resSeq jumps
    TM5 with     42 AAs   ASN196@5.35x36   (   923) -   GLU237@5.76x76   (964   ) (TM5)
   ICL3 with      2 AAs   GLY238           (   965) -   ARG239           (966   ) (ICL3)
    TM6 with     35 AAs   CYS265@6.27x27   (   967) -   GLN299@6.61x61   (1001  ) (TM6)
   ECL3 with      4 AAs   ASP300           (  1002) -   ILE303           (1005  ) (ECL3)
    TM7 with     25 AAs   ARG304@7.31x30   (  1006) -   ARG328@7.55x55   (1030  ) (TM7)
     H8 with     12 AAs   SER329@8.47x47   (  1031) -   LEU340@8.58x58   (1042  ) (H8)
Using CGN-nomenclature, please cite
 * Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
   Nature 2015 524:7564 524, 173--179
   https://doi.org/10.1038/nature14663
No local file ./CGN_3SN6.txt found, checking online in
https://www.mrc-lmb.cam.ac.uk/CGN/lookup_results/3SN6.txt ...done without 404, continuing.
No local PDB file for 3SN6 found in directory '.', checking online in
https://files.rcsb.org/download/3SN6.pdb ...found! Continuing normally
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).

These are the CGN fragments mapped onto your topology:
   G.HN with     28 AAs     THR9@G.HN.26   (     5) -    VAL36@G.HN.53   (32    ) (G.HN)
 G.hns1 with      3 AAs    TYR37@G.hns1.1  (    33) -    ALA39@G.hns1.3  (35    ) (G.hns1)
   G.S1 with      7 AAs    THR40@G.S1.1    (    36) -    LEU46@G.S1.7    (42    ) (G.S1)
 G.s1h1 with      6 AAs    GLY47@G.s1h1.1  (    43) -    GLY52@G.s1h1.6  (48    ) (G.s1h1)
   G.H1 with      7 AAs    LYS53@G.H1.1    (    49) -    GLN59@G.H1.7    (55    ) (G.H1)
   H.HA with     26 AAs    LYS88@H.HA.4    (    56) -   LEU113@H.HA.29   (81    ) (H.HA)
 H.hahb with      9 AAs   VAL114@H.hahb.1  (    82) -   PRO122@H.hahb.9  (90    ) (H.hahb)
   H.HB with     14 AAs   GLU123@H.HB.1    (    91) -   ASN136@H.HB.14   (104   ) (H.HB)
 H.hbhc with      7 AAs   VAL137@H.hbhc.1  (   105) -   PRO143@H.hbhc.15 (111   ) (H.hbhc)
   H.HC with     12 AAs   PRO144@H.HC.1    (   112) -   GLU155@H.HC.12   (123   ) (H.HC)
 H.hchd with      1 AAs   ASP156@H.hchd.1  (   124) -   ASP156@H.hchd.1  (124   ) (H.hchd)
   H.HD with     12 AAs   GLU157@H.HD.1    (   125) -   GLU168@H.HD.12   (136   ) (H.HD)
 H.hdhe with      5 AAs   TYR169@H.hdhe.1  (   137) -   ASP173@H.hdhe.5  (141   ) (H.hdhe)
   H.HE with     13 AAs   CYS174@H.HE.1    (   142) -   LYS186@H.HE.13   (154   ) (H.HE)
 H.hehf with      7 AAs   GLN187@H.hehf.1  (   155) -   SER193@H.hehf.7  (161   ) (H.hehf)
   H.HF with      6 AAs   ASP194@H.HF.1    (   162) -   ARG199@H.HF.6    (167   ) (H.HF)
 G.hfs2 with      5 AAs   CYS200@G.hfs2.1  (   168) -   GLY206@G.hfs2.7  (172   ) (G.hfs2)  resSeq jumps
   G.S2 with      8 AAs   ILE207@G.S2.1    (   173) -   VAL214@G.S2.8    (180   ) (G.S2)
 G.s2s3 with      2 AAs   ASP215@G.s2s3.1  (   181) -   LYS216@G.s2s3.2  (182   ) (G.s2s3)
   G.S3 with      8 AAs   VAL217@G.S3.1    (   183) -   VAL224@G.S3.8    (190   ) (G.S3)
 G.s3h2 with      3 AAs   GLY225@G.s3h2.1  (   191) -   GLN227@G.s3h2.3  (193   ) (G.s3h2)
   G.H2 with     10 AAs   ARG228@G.H2.1    (   194) -   CYS237@G.H2.10   (203   ) (G.H2)
 G.h2s4 with      5 AAs   PHE238@G.h2s4.1  (   204) -   THR242@G.h2s4.5  (208   ) (G.h2s4)
   G.S4 with      7 AAs   ALA243@G.S4.1    (   209) -   ALA249@G.S4.7    (215   ) (G.S4)
 G.s4h3 with      8 AAs   SER250@G.s4h3.1  (   216) -   ASN264@G.s4h3.15 (223   ) (G.s4h3)  resSeq jumps
   G.H3 with     18 AAs   ARG265@G.H3.1    (   224) -   LEU282@G.H3.18   (241   ) (G.H3)
 G.h3s5 with      3 AAs   ARG283@G.h3s5.1  (   242) -   ILE285@G.h3s5.3  (244   ) (G.h3s5)
   G.S5 with      7 AAs   SER286@G.S5.1    (   245) -   ASN292@G.S5.7    (251   ) (G.S5)
 G.s5hg with      1 AAs   LYS293@G.s5hg.1  (   252) -   LYS293@G.s5hg.1  (252   ) (G.s5hg)
   G.HG with     17 AAs   GLN294@G.HG.1    (   253) -   ASP310@G.HG.17   (269   ) (G.HG)
 G.hgh4 with     10 AAs   TYR311@G.hgh4.1  (   270) -   THR320@G.hgh4.10 (279   ) (G.hgh4)
   G.H4 with     27 AAs   PRO321@G.H4.1    (   280) -   ARG347@G.H4.27   (306   ) (G.H4)
 G.h4s6 with     11 AAs   ILE348@G.h4s6.1  (   307) -   TYR358@G.h4s6.20 (317   ) (G.h4s6)
   G.S6 with      5 AAs   CYS359@G.S6.1    (   318) -   PHE363@G.S6.5    (322   ) (G.S6)
 G.s6h5 with      5 AAs   THR364@G.s6h5.1  (   323) -   ASP368@G.s6h5.5  (327   ) (G.s6h5)
   G.H5 with     26 AAs   THR369@G.H5.1    (   328) -   LEU394@G.H5.26   (353   ) (G.H5)

Will compute neighborhoods for the residues
L394
excluding 4 nearest neighbors

   residue      residx    fragment      resSeq       GPCR        CGN
    LEU394         353           0        394       None    G.H5.26

Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 8384 atoms, 1044 residues, and unitcells>':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 74 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr.   0 (   280 frames,    280 with stride  1) in chunks of 10000 frames. Now at chunk nr    0, frames so far    280
#idx   freq      contact       fragments     res_idxs      ctc_idx  Sum
1:     0.55   LEU394-ARG389       0-0         353-348        30     0.55
2:     0.47   LEU394-LYS270       0-3         353-972        65     1.02
3:     0.38   LEU394-LEU388       0-0         353-347        29     1.40
4:     0.23   LEU394-LEU230       0-3         353-957        51     1.62
5:     0.11   LEU394-ARG385       0-0         353-344        26     1.73
These 5 contacts capture 1.73 (~98%) of the total frequency 1.76 (over 74 input contacts)
As orientation value, the first 4 ctcs already capture 90.0% of 1.76.
The 4-th contact has a frequency of 0.23

../_images/notebooks_Tutorial_13_1.png
../_images/notebooks_Tutorial_13_2.png

Consensus Nomenclature: GPCR and/or CGN

Above, we declared our intention to use GPCR consensus nomenclature and Common G-alpha Numbering (CGN) by passing the descriptor strings GPCR_uniprot="adrb2_human" or CGN_PDB="3SN6", respectively, to contact the online databases, in particular the

For that retrieval and handling of these labels, we use the module mdciao.nomenclature, which offers two main classes and other helper function to deal with the nomenclature. An overview of the relevant references is contained in the module’s documentation as well as in the documentation of each class, e.g. here.

Additionally, you can use mdciao.nomenclature.references to print all relevant nomenclature references:

[7]:
mdciao.nomenclature.references()
mdciao.nomenclature functions thanks to these online databases. Please cite them if you use this module:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
 * Berman et al, (2000) The Protein Data Bank
   Nucleic Acids Research 28, 235--242
   https://doi.org/10.1093/NAR/28.1.235
 * Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
   Nature 2015 524:7564 524, 173--179
   https://doi.org/10.1038/nature14663
 * Kanev et al, (2021) KLIFS: an overhaul after the first 5 years of supporting kinase research
   Nucleic Acids Research 49, D562--D569
   https://doi.org/10.1093/NAR/GKAA895

Additionally, depending on the chosen nomenclature type, you should cite:
 * Structure based GPCR notation
   - Isberg et al, (2015) Generic GPCR residue numbers - Aligning topology maps while minding the gaps
     Trends in Pharmacological Sciences 36, 22--31
     https://doi.org/10.1016/j.tips.2014.11.001
   - Isberg et al, (2016) GPCRdb: An information system for G protein-coupled receptors
     Nucleic Acids Research 44, D356--D364
     https://doi.org/10.1093/nar/gkv1178
 * Sequence based GPCR schemes:
   - Wootten et al, (2013) Polar transmembrane interactions drive formation of ligand-specific and signal
     pathway-biased family B G protein-coupled receptor conformations
     Proceedings of the National Academy of Sciences of the United States of America 110, 5211--5216
     https://doi.org/10.1073/pnas.1221585110
   - Pin et al, (2003) Evolution, structure, and activation mechanism of family 3/C G-protein-coupled
     receptors
     Pharmacology and Therapeutics 98, 325--354
     https://doi.org/10.1016/S0163-7258(03)00038-X
   - Wu et al, (2014) Structure of a class C GPCR metabotropic glutamate receptor 1 bound to an
     allosteric modulator
     Science 344, 58--64
     https://doi.org/10.1126/science.1249489
   - Oliveira et al, (1993) A common motif in G-protein-coupled seven transmembrane helix receptors
     Journal of Computer-Aided Molecular Design 7, 649--658
     https://doi.org/10.1007/BF00125323
   - Baldwin et al, (1993) The probable arrangement of the helices in G protein-coupled receptors.
     The EMBO Journal 12, 1693--1703
     https://doi.org/10.1002/J.1460-2075.1993.TB05814.X
   - Baldwin et al, (1997) An alpha-carbon template for the transmembrane helices in the rhodopsin family
     of G-protein-coupled receptors
     Journal of Molecular Biology 272, 144--164
     https://doi.org/10.1006/jmbi.1997.1240
   - Schwartz et al, (1994) Locating ligand-binding sites in 7tm receptors by protein engineering
     Current Opinion in Biotechnology 5, 434--444
     https://doi.org/10.1016/0958-1669(94)90054-X
   - Schwartz et al, (1995) Molecular mechanism of action of non-peptide ligands for peptide receptors
     Current Pharmaceutical Design 1, 325--342
 * KLIFS 85 ligand binding site residues of kinases
   - Van Linden et al, (2014) KLIFS: A knowledge-based structural database to navigate kinase-ligand
     interaction space
     Journal of Medicinal Chemistry 57, 249--277
     https://doi.org/10.1021/JM400378W
   - Kooistra et al, (2016) KLIFS: a structural kinase-ligand interaction database
     Nucleic Acids Research 44, D365--D371
     https://doi.org/10.1093/NAR/GKV1082
   - Kanev et al, (2021) KLIFS: an overhaul after the first 5 years of supporting kinase research
     Nucleic Acids Research 49, D562--D569
     https://doi.org/10.1093/NAR/GKAA895

You can find all these references distributed as BibTex file distributed with mdciao here
 * /home/perezheg/Programs/mdciao/tests/data/nomenclature/nomenclature.bib

The nomenclature classes produce standalone objects, and can do much more than just be inputs to mdciao.cli methods. As with any Python class, you can learn a lot about its methods and attributes by using the tab autocompletion feature of IPython. Or you can check here for more mdciao docs.

Since we’ll be using these labels more than once in the notebook, instead of using the network each time, we can have them as Python objects in memory. Alternatively, it’s possible to save the labeling data locally after the first database query. This allows for inspection and re-use of the retrieved data outside the notebook (in a spreadsheet, for example).

[8]:
from mdciao import nomenclature
GPCR = nomenclature.LabelerGPCR("adrb2_human",
                            #write_to_disk=True
                           )
CGN = nomenclature.LabelerCGN("3SN6",
                             # write_to_disk=True
                             )
No local file ./adrb2_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/adrb2_human ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
done without 404, continuing.
Using CGN-nomenclature, please cite
 * Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
   Nature 2015 524:7564 524, 173--179
   https://doi.org/10.1038/nature14663
No local file ./CGN_3SN6.txt found, checking online in
https://www.mrc-lmb.cam.ac.uk/CGN/lookup_results/3SN6.txt ...done without 404, continuing.
No local PDB file for 3SN6 found in directory '.', checking online in
https://files.rcsb.org/download/3SN6.pdb ...found! Continuing normally

Residue Selection

Now, we can play around with residue selection, replicating the CLI-command:

mdc_residues.py GLU*,P0G,380-394,G.HN.* prot.pdb --GPCR adrb2_human --CGN 3SN6 -ni

Check the docs here to check the output values res_idxs_list,fragments, and consensus_maps, although most of out useful output is written out.

Please note that we’re now using mdciao.nomenclature classes directly as inputs (GPCR and CGN), speeding up the method by avoiding queries over the network.

[9]:
res_idxs_list, fragments, consensus_maps = mdciao.cli.residue_selection(
    "GLU*,P0G,380-394,G.HN.*",
    traj,
    GPCR_uniprot=GPCR,
    CGN_PDB=CGN,
    accept_guess=True)
Using method 'lig_resSeq+' these fragments were found
fragment      0 with    354 AAs     LEU4 (     0) -   LEU394 (353   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   354) -   ASN340 (693   ) (1)
fragment      2 with     66 AAs     ALA2 (   694) -    PHE67 (759   ) (2)
fragment      3 with    283 AAs    GLU30 (   760) -   LEU340 (1042  ) (3)  resSeq jumps
fragment      4 with      1 AAs   P0G395 (  1043) -   P0G395 (1043  ) (4)
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).

These are the GPCR fragments mapped onto your topology:
    TM1 with     32 AAs    GLU30@1.29x29   (   760) -    PHE61@1.60x60   (791   ) (TM1)
   ICL1 with      4 AAs    GLU62@12.48x48  (   792) -    GLN65@12.51x51  (795   ) (ICL1)
    TM2 with     32 AAs    THR66@2.37x37   (   796) -    LYS97@2.68x67   (827   ) (TM2)
   ECL1 with      4 AAs    MET98@23.49x49  (   828) -   PHE101@23.52x52  (831   ) (ECL1)
    TM3 with     36 AAs   GLY102@3.21x21   (   832) -   SER137@3.56x56   (867   ) (TM3)
   ICL2 with      8 AAs   PRO138@34.50x50  (   868) -   LEU145@34.57x57  (875   ) (ICL2)
    TM4 with     27 AAs   THR146@4.38x38   (   876) -   HIS172@4.64x64   (902   ) (TM4)
   ECL2 with     20 AAs   TRP173           (   903) -   THR195           (922   ) (ECL2)  resSeq jumps
    TM5 with     42 AAs   ASN196@5.35x36   (   923) -   GLU237@5.76x76   (964   ) (TM5)
   ICL3 with      2 AAs   GLY238           (   965) -   ARG239           (966   ) (ICL3)
    TM6 with     35 AAs   CYS265@6.27x27   (   967) -   GLN299@6.61x61   (1001  ) (TM6)
   ECL3 with      4 AAs   ASP300           (  1002) -   ILE303           (1005  ) (ECL3)
    TM7 with     25 AAs   ARG304@7.31x30   (  1006) -   ARG328@7.55x55   (1030  ) (TM7)
     H8 with     12 AAs   SER329@8.47x47   (  1031) -   LEU340@8.58x58   (1042  ) (H8)
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).

These are the CGN fragments mapped onto your topology:
   G.HN with     28 AAs     THR9@G.HN.26   (     5) -    VAL36@G.HN.53   (32    ) (G.HN)
 G.hns1 with      3 AAs    TYR37@G.hns1.1  (    33) -    ALA39@G.hns1.3  (35    ) (G.hns1)
   G.S1 with      7 AAs    THR40@G.S1.1    (    36) -    LEU46@G.S1.7    (42    ) (G.S1)
 G.s1h1 with      6 AAs    GLY47@G.s1h1.1  (    43) -    GLY52@G.s1h1.6  (48    ) (G.s1h1)
   G.H1 with      7 AAs    LYS53@G.H1.1    (    49) -    GLN59@G.H1.7    (55    ) (G.H1)
   H.HA with     26 AAs    LYS88@H.HA.4    (    56) -   LEU113@H.HA.29   (81    ) (H.HA)
 H.hahb with      9 AAs   VAL114@H.hahb.1  (    82) -   PRO122@H.hahb.9  (90    ) (H.hahb)
   H.HB with     14 AAs   GLU123@H.HB.1    (    91) -   ASN136@H.HB.14   (104   ) (H.HB)
 H.hbhc with      7 AAs   VAL137@H.hbhc.1  (   105) -   PRO143@H.hbhc.15 (111   ) (H.hbhc)
   H.HC with     12 AAs   PRO144@H.HC.1    (   112) -   GLU155@H.HC.12   (123   ) (H.HC)
 H.hchd with      1 AAs   ASP156@H.hchd.1  (   124) -   ASP156@H.hchd.1  (124   ) (H.hchd)
   H.HD with     12 AAs   GLU157@H.HD.1    (   125) -   GLU168@H.HD.12   (136   ) (H.HD)
 H.hdhe with      5 AAs   TYR169@H.hdhe.1  (   137) -   ASP173@H.hdhe.5  (141   ) (H.hdhe)
   H.HE with     13 AAs   CYS174@H.HE.1    (   142) -   LYS186@H.HE.13   (154   ) (H.HE)
 H.hehf with      7 AAs   GLN187@H.hehf.1  (   155) -   SER193@H.hehf.7  (161   ) (H.hehf)
   H.HF with      6 AAs   ASP194@H.HF.1    (   162) -   ARG199@H.HF.6    (167   ) (H.HF)
 G.hfs2 with      5 AAs   CYS200@G.hfs2.1  (   168) -   GLY206@G.hfs2.7  (172   ) (G.hfs2)  resSeq jumps
   G.S2 with      8 AAs   ILE207@G.S2.1    (   173) -   VAL214@G.S2.8    (180   ) (G.S2)
 G.s2s3 with      2 AAs   ASP215@G.s2s3.1  (   181) -   LYS216@G.s2s3.2  (182   ) (G.s2s3)
   G.S3 with      8 AAs   VAL217@G.S3.1    (   183) -   VAL224@G.S3.8    (190   ) (G.S3)
 G.s3h2 with      3 AAs   GLY225@G.s3h2.1  (   191) -   GLN227@G.s3h2.3  (193   ) (G.s3h2)
   G.H2 with     10 AAs   ARG228@G.H2.1    (   194) -   CYS237@G.H2.10   (203   ) (G.H2)
 G.h2s4 with      5 AAs   PHE238@G.h2s4.1  (   204) -   THR242@G.h2s4.5  (208   ) (G.h2s4)
   G.S4 with      7 AAs   ALA243@G.S4.1    (   209) -   ALA249@G.S4.7    (215   ) (G.S4)
 G.s4h3 with      8 AAs   SER250@G.s4h3.1  (   216) -   ASN264@G.s4h3.15 (223   ) (G.s4h3)  resSeq jumps
   G.H3 with     18 AAs   ARG265@G.H3.1    (   224) -   LEU282@G.H3.18   (241   ) (G.H3)
 G.h3s5 with      3 AAs   ARG283@G.h3s5.1  (   242) -   ILE285@G.h3s5.3  (244   ) (G.h3s5)
   G.S5 with      7 AAs   SER286@G.S5.1    (   245) -   ASN292@G.S5.7    (251   ) (G.S5)
 G.s5hg with      1 AAs   LYS293@G.s5hg.1  (   252) -   LYS293@G.s5hg.1  (252   ) (G.s5hg)
   G.HG with     17 AAs   GLN294@G.HG.1    (   253) -   ASP310@G.HG.17   (269   ) (G.HG)
 G.hgh4 with     10 AAs   TYR311@G.hgh4.1  (   270) -   THR320@G.hgh4.10 (279   ) (G.hgh4)
   G.H4 with     27 AAs   PRO321@G.H4.1    (   280) -   ARG347@G.H4.27   (306   ) (G.H4)
 G.h4s6 with     11 AAs   ILE348@G.h4s6.1  (   307) -   TYR358@G.h4s6.20 (317   ) (G.h4s6)
   G.S6 with      5 AAs   CYS359@G.S6.1    (   318) -   PHE363@G.S6.5    (322   ) (G.S6)
 G.s6h5 with      5 AAs   THR364@G.s6h5.1  (   323) -   ASP368@G.s6h5.5  (327   ) (G.s6h5)
   G.H5 with     26 AAs   THR369@G.H5.1    (   328) -   LEU394@G.H5.26   (353   ) (G.H5)
0.0)       P0G395 in fragment 4 with residue index 1043
0.0)       ARG380 in fragment 0 with residue index 339 (CGN: G.H5.12)
0.0)       LEU394 in fragment 0 with residue index 353 (CGN: G.H5.26)
Your selection 'GLU*,P0G,380-394,G.HN.*' yields:
   residue      residx    fragment      resSeq       GPCR        CGN
     GLU10           6           0         10       None    G.HN.27
     GLU15          11           0         15       None    G.HN.32
     GLU16          12           0         16       None    G.HN.33
     GLU21          17           0         21       None    G.HN.38
     GLU27          23           0         27       None    G.HN.44
     GLU50          46           0         50       None   G.s1h1.4
    GLU101          69           0        101       None    H.HA.17
    GLU104          72           0        104       None    H.HA.20
    GLU118          86           0        118       None   H.hahb.5
    GLU123          91           0        123       None     H.HB.1
    GLU145         113           0        145       None     H.HC.2
    GLU148         116           0        148       None     H.HC.5
    GLU155         123           0        155       None    H.HC.12
    GLU157         125           0        157       None     H.HD.1
    GLU164         132           0        164       None     H.HD.8
    GLU168         136           0        168       None    H.HD.12
    GLU209         175           0        209       None     G.S2.3
    GLU230         196           0        230       None     G.H2.3
    GLU268         227           0        268       None     G.H3.4
    GLU299         258           0        299       None     G.HG.6
    GLU309         268           0        309       None    G.HG.16
    GLU314         273           0        314       None   G.hgh4.4
    GLU322         281           0        322       None     G.H4.2
    GLU327         286           0        327       None     G.H4.7
    GLU330         289           0        330       None    G.H4.10
    GLU344         303           0        344       None    G.H4.24
    GLU370         329           0        370       None     G.H5.2
    GLU392         351           0        392       None    G.H5.24
      GLU3         356           1          3       None       None
     GLU10         363           1         10       None       None
     GLU12         365           1         12       None       None
    GLU130         483           1        130       None       None
    GLU138         491           1        138       None       None
    GLU172         525           1        172       None       None
    GLU215         568           1        215       None       None
    GLU226         579           1        226       None       None
    GLU260         613           1        260       None       None
     GLU17         709           2         17       None       None
     GLU22         714           2         22       None       None
     GLU42         734           2         42       None       None
     GLU47         739           2         47       None       None
     GLU58         750           2         58       None       None
     GLU63         755           2         63       None       None
     GLU30         760           3         30    1.29x29       None
     GLU62         792           3         62   12.48x48       None
    GLU107         837           3        107    3.26x26       None
    GLU122         852           3        122    3.41x41       None
    GLU180         907           3        180       None       None
    GLU188         915           3        188       None       None
    GLU225         952           3        225    5.64x64       None
    GLU237         964           3        237    5.76x76       None
    GLU268         970           3        268    6.30x30       None
    GLU306        1008           3        306    7.33x32       None
    GLU338        1040           3        338    8.56x56       None
    P0G395        1043           4        395       None       None
    ARG380         339           0        380       None    G.H5.12
    ASP381         340           0        381       None    G.H5.13
    ILE382         341           0        382       None    G.H5.14
    ILE383         342           0        383       None    G.H5.15
    GLN384         343           0        384       None    G.H5.16
    ARG385         344           0        385       None    G.H5.17
    MET386         345           0        386       None    G.H5.18
    HIS387         346           0        387       None    G.H5.19
    LEU388         347           0        388       None    G.H5.20
    ARG389         348           0        389       None    G.H5.21
    GLN390         349           0        390       None    G.H5.22
    TYR391         350           0        391       None    G.H5.23
    LEU393         352           0        393       None    G.H5.25
    LEU394         353           0        394       None    G.H5.26
      THR9           5           0          9       None    G.HN.26
     ASP11           7           0         11       None    G.HN.28
     GLN12           8           0         12       None    G.HN.29
     ARG13           9           0         13       None    G.HN.30
     ASN14          10           0         14       None    G.HN.31
     LYS17          13           0         17       None    G.HN.34
     ALA18          14           0         18       None    G.HN.35
     GLN19          15           0         19       None    G.HN.36
     ARG20          16           0         20       None    G.HN.37
     ALA22          18           0         22       None    G.HN.39
     ASN23          19           0         23       None    G.HN.40
     LYS24          20           0         24       None    G.HN.41
     LYS25          21           0         25       None    G.HN.42
     ILE26          22           0         26       None    G.HN.43
     LYS28          24           0         28       None    G.HN.45
     GLN29          25           0         29       None    G.HN.46
     LEU30          26           0         30       None    G.HN.47
     GLN31          27           0         31       None    G.HN.48
     LYS32          28           0         32       None    G.HN.49
     ASP33          29           0         33       None    G.HN.50
     LYS34          30           0         34       None    G.HN.51
     GLN35          31           0         35       None    G.HN.52
     VAL36          32           0         36       None    G.HN.53

PDB Queries

Now we grab a structure directly from the PDB, replicating the CLI command:

mdc_pdb.py 3SN6 -o 3SN6.gro

by using mdciao.cli.pdb. Check here or use the inline docstring for more info. Please note that we’re not storing the retrived structure on disk, but rather having it in memory as an mdtraj.Trajectory:

[10]:
xray3SN6= mdciao.cli.pdb("3SN6")
xray3SN6
Checking https://files.rcsb.org/download/3SN6.pdb ...Please cite the following 3rd party publication:
 * Crystal structure of the beta2 adrenergic receptor-Gs protein complex
   Rasmussen, S.G. et al., Nature 2011
   https://doi.org/10.1038/nature10361
[10]:
<mdtraj.Trajectory with 1 frames, 10269 atoms, 1319 residues, and unitcells at 0x7f46c157b940>

Now we can use the awesome nglviewer to 3D-visualize the freshly grabbed structure inside the notebook.

We need to import the module first, which needs to be installed in your Python environment. If you don’t we recommend you install it via pip:

pip install nglview
jupyter-nbextension enable nglview --py --sys-prefix

If you don’t feel like installing now, you can continue use the notebook.

[11]:
try:
    import nglview
    iwd = nglview.show_mdtraj(xray3SN6)
except ImportError:
    iwd = None
iwd

Fragmentation Heuristics

Now we go to fragmentation heuristics, replicating the CLI command:

mdc_fragments.py 3SN6.gro

but with the object xray3SN6 (the .gro-file comes below) and by using the cli.fragments method. Check here or the inline docstring for more info. Also note that cli.fragments simply wraps around mdciao.fragments.get_fragments, and you can use that method (or others in mdciao.fragments) directly.

[12]:
frags= mdciao.cli.fragment_overview(xray3SN6.top)
Auto-detected fragments with method 'chains'
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    443 AAs  ASN1002 (   747) -   CYS341 (1189  ) (3)  resSeq jumps
fragment      4 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (4)
fragment      5 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (5)

Auto-detected fragments with method 'resSeq'
fragment      0 with     51 AAs     THR9 (     0) -    GLN59 (50    ) (0)
fragment      1 with    115 AAs    LYS88 (    51) -   VAL202 (165   ) (1)
fragment      2 with     51 AAs   SER205 (   166) -   MET255 (216   ) (2)
fragment      3 with    132 AAs   THR263 (   217) -   LEU394 (348   ) (3)
fragment      4 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (4)
fragment      5 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (5)
fragment      6 with    159 AAs  ASN1002 (   747) -  ALA1160 (905   ) (6)
fragment      7 with    146 AAs    GLU30 (   906) -   ARG175 (1051  ) (7)
fragment      8 with     61 AAs   GLN179 (  1052) -   ARG239 (1112  ) (8)
fragment      9 with     77 AAs   CYS265 (  1113) -   CYS341 (1189  ) (9)
fragment     10 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (10)
fragment     11 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (11)

Auto-detected fragments with method 'resSeq+'
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    159 AAs  ASN1002 (   747) -  ALA1160 (905   ) (3)
fragment      4 with    284 AAs    GLU30 (   906) -   CYS341 (1189  ) (4)  resSeq jumps
fragment      5 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (5)
fragment      6 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (6)

Auto-detected fragments with method 'lig_resSeq+'
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    159 AAs  ASN1002 (   747) -  ALA1160 (905   ) (3)
fragment      4 with    284 AAs    GLU30 (   906) -   CYS341 (1189  ) (4)  resSeq jumps
fragment      5 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (5)
fragment      6 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (6)

Auto-detected fragments with method 'bonds'
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    443 AAs  ASN1002 (   747) -   CYS341 (1189  ) (3)  resSeq jumps
fragment      4 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (4)
fragment      5 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (5)

Auto-detected fragments with method 'resSeq_bonds'
fragment      0 with     51 AAs     THR9 (     0) -    GLN59 (50    ) (0)
fragment      1 with    115 AAs    LYS88 (    51) -   VAL202 (165   ) (1)
fragment      2 with     51 AAs   SER205 (   166) -   MET255 (216   ) (2)
fragment      3 with    132 AAs   THR263 (   217) -   LEU394 (348   ) (3)
fragment      4 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (4)
fragment      5 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (5)
fragment      6 with    159 AAs  ASN1002 (   747) -  ALA1160 (905   ) (6)
fragment      7 with    146 AAs    GLU30 (   906) -   ARG175 (1051  ) (7)
fragment      8 with     61 AAs   GLN179 (  1052) -   ARG239 (1112  ) (8)
fragment      9 with     77 AAs   CYS265 (  1113) -   CYS341 (1189  ) (9)
fragment     10 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (10)
fragment     11 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (11)

Auto-detected fragments with method 'None'
fragment      0 with   1319 AAs     THR9 (     0) -  P0G1601 (1318  ) (0)  resSeq jumps

This call iterates through all available heuristics on the mdtrajTopology, arriving at different definitions of molecular fragments. They are all returned as a dictionary:

[13]:
frags.keys()
[13]:
dict_keys(['chains', 'resSeq', 'resSeq+', 'lig_resSeq+', 'bonds', 'resSeq_bonds', 'None'])

Please note that since xray3SN6 comes from the PDB directly, it contains chain descriptors, s.t. the method chains (first one) can simply list the chain information encoded into the PDB, which you can check here:

Auto-detected fragments with method 'chains'
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    443 AAs  ASN1002 (   747) -   CYS341 (1189  ) (3)  resSeq jumps
fragment      4 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (4)
fragment      5 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (5)       5 with    1 AAs  P0G1601           (1318) -  P0G1601           (1318) (5)

These fragments are:

  1. G-protein \(\alpha\) sub-unit

  2. G-protein \(\beta\) sub-unit

  3. G-protein \(\gamma\) sub-unit

  4. \(\beta_2\) adrenergic receptor, together with the bacteriophage T4 lysozyme as N-terminus

  5. VHH nanobody

  6. Ligand P0G

However, we loose that chain information if we store the structure as .gro, which doesn’t encode for chains (i.e., the entire topology is put into a single chain).

[14]:
from tempfile import NamedTemporaryFile
with NamedTemporaryFile(suffix=".gro") as tmpgro:
    xray3SN6.save(tmpgro.name)
    xray3SN6gro = md.load(tmpgro.name)
mdciao.cli.fragment_overview(xray3SN6gro.top, methods=["chains", "lig_resSeq+"]);
Auto-detected fragments with method 'chains'
fragment      0 with   1319 AAs     THR9 (     0) -  P0G1601 (1318  ) (0)  resSeq jumps

Auto-detected fragments with method 'lig_resSeq+'
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    159 AAs  ASN1002 (   747) -  ALA1160 (905   ) (3)
fragment      4 with    284 AAs    GLU30 (   906) -   CYS341 (1189  ) (4)  resSeq jumps
fragment      5 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (5)
fragment      6 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (6)

The lig_resSeq+ (the current default) has attempted to recover some meaningful fragments, closely resembling the original chains:

  1. G-protein \(\alpha\) sub-unit

  2. G-protein \(\beta\) sub-unit

  3. G-protein \(\gamma\) sub-unit

  4. Bacteriophage T4 lysozyme as N-terminus

  5. \(\beta_2\) adrenergic receptor

  6. VHH nanobody

  7. Ligand P0G

The former fragment 3 (4TL-\(\beta_2\)AR) chain has been broken up into T4L and \(\beta_2\)AR.

Interfaces

Now we move to a more elaborate command:

mdc_interface.py prot.pdb traj.xtc -fg1 0 -fg2 3 --GPCR adrb2_human --CGN 3SN6 -t "3SN6 beta2AR-Galpha interface" -ni

and replicate it using cli.interface. Check the docs here or in the method’s docstring.

Additionally, we now have two other notebooks explicitly devoted to the representation of interfaces:

[15]:
mdciao.cli.interface(traj,
                     frag_idxs_group_1=[0],
                     frag_idxs_group_2=[3],
                     GPCR_uniprot=GPCR,
                     CGN_PDB=CGN,
                     title="3SN6 beta2AR-Galpha interface",
                     accept_guess=True,
                     plot_timedep=False,
                     no_disk=True)
Will compute contact frequencies for trajectories:
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
 with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment      0 with    354 AAs     LEU4 (     0) -   LEU394 (353   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   354) -   ASN340 (693   ) (1)
fragment      2 with     66 AAs     ALA2 (   694) -    PHE67 (759   ) (2)
fragment      3 with    283 AAs    GLU30 (   760) -   LEU340 (1042  ) (3)  resSeq jumps
fragment      4 with      1 AAs   P0G395 (  1043) -   P0G395 (1043  ) (4)
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).

These are the GPCR fragments mapped onto your topology:
    TM1 with     32 AAs    GLU30@1.29x29   (   760) -    PHE61@1.60x60   (791   ) (TM1)
   ICL1 with      4 AAs    GLU62@12.48x48  (   792) -    GLN65@12.51x51  (795   ) (ICL1)
    TM2 with     32 AAs    THR66@2.37x37   (   796) -    LYS97@2.68x67   (827   ) (TM2)
   ECL1 with      4 AAs    MET98@23.49x49  (   828) -   PHE101@23.52x52  (831   ) (ECL1)
    TM3 with     36 AAs   GLY102@3.21x21   (   832) -   SER137@3.56x56   (867   ) (TM3)
   ICL2 with      8 AAs   PRO138@34.50x50  (   868) -   LEU145@34.57x57  (875   ) (ICL2)
    TM4 with     27 AAs   THR146@4.38x38   (   876) -   HIS172@4.64x64   (902   ) (TM4)
   ECL2 with     20 AAs   TRP173           (   903) -   THR195           (922   ) (ECL2)  resSeq jumps
    TM5 with     42 AAs   ASN196@5.35x36   (   923) -   GLU237@5.76x76   (964   ) (TM5)
   ICL3 with      2 AAs   GLY238           (   965) -   ARG239           (966   ) (ICL3)
    TM6 with     35 AAs   CYS265@6.27x27   (   967) -   GLN299@6.61x61   (1001  ) (TM6)
   ECL3 with      4 AAs   ASP300           (  1002) -   ILE303           (1005  ) (ECL3)
    TM7 with     25 AAs   ARG304@7.31x30   (  1006) -   ARG328@7.55x55   (1030  ) (TM7)
     H8 with     12 AAs   SER329@8.47x47   (  1031) -   LEU340@8.58x58   (1042  ) (H8)
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).

These are the CGN fragments mapped onto your topology:
   G.HN with     28 AAs     THR9@G.HN.26   (     5) -    VAL36@G.HN.53   (32    ) (G.HN)
 G.hns1 with      3 AAs    TYR37@G.hns1.1  (    33) -    ALA39@G.hns1.3  (35    ) (G.hns1)
   G.S1 with      7 AAs    THR40@G.S1.1    (    36) -    LEU46@G.S1.7    (42    ) (G.S1)
 G.s1h1 with      6 AAs    GLY47@G.s1h1.1  (    43) -    GLY52@G.s1h1.6  (48    ) (G.s1h1)
   G.H1 with      7 AAs    LYS53@G.H1.1    (    49) -    GLN59@G.H1.7    (55    ) (G.H1)
   H.HA with     26 AAs    LYS88@H.HA.4    (    56) -   LEU113@H.HA.29   (81    ) (H.HA)
 H.hahb with      9 AAs   VAL114@H.hahb.1  (    82) -   PRO122@H.hahb.9  (90    ) (H.hahb)
   H.HB with     14 AAs   GLU123@H.HB.1    (    91) -   ASN136@H.HB.14   (104   ) (H.HB)
 H.hbhc with      7 AAs   VAL137@H.hbhc.1  (   105) -   PRO143@H.hbhc.15 (111   ) (H.hbhc)
   H.HC with     12 AAs   PRO144@H.HC.1    (   112) -   GLU155@H.HC.12   (123   ) (H.HC)
 H.hchd with      1 AAs   ASP156@H.hchd.1  (   124) -   ASP156@H.hchd.1  (124   ) (H.hchd)
   H.HD with     12 AAs   GLU157@H.HD.1    (   125) -   GLU168@H.HD.12   (136   ) (H.HD)
 H.hdhe with      5 AAs   TYR169@H.hdhe.1  (   137) -   ASP173@H.hdhe.5  (141   ) (H.hdhe)
   H.HE with     13 AAs   CYS174@H.HE.1    (   142) -   LYS186@H.HE.13   (154   ) (H.HE)
 H.hehf with      7 AAs   GLN187@H.hehf.1  (   155) -   SER193@H.hehf.7  (161   ) (H.hehf)
   H.HF with      6 AAs   ASP194@H.HF.1    (   162) -   ARG199@H.HF.6    (167   ) (H.HF)
 G.hfs2 with      5 AAs   CYS200@G.hfs2.1  (   168) -   GLY206@G.hfs2.7  (172   ) (G.hfs2)  resSeq jumps
   G.S2 with      8 AAs   ILE207@G.S2.1    (   173) -   VAL214@G.S2.8    (180   ) (G.S2)
 G.s2s3 with      2 AAs   ASP215@G.s2s3.1  (   181) -   LYS216@G.s2s3.2  (182   ) (G.s2s3)
   G.S3 with      8 AAs   VAL217@G.S3.1    (   183) -   VAL224@G.S3.8    (190   ) (G.S3)
 G.s3h2 with      3 AAs   GLY225@G.s3h2.1  (   191) -   GLN227@G.s3h2.3  (193   ) (G.s3h2)
   G.H2 with     10 AAs   ARG228@G.H2.1    (   194) -   CYS237@G.H2.10   (203   ) (G.H2)
 G.h2s4 with      5 AAs   PHE238@G.h2s4.1  (   204) -   THR242@G.h2s4.5  (208   ) (G.h2s4)
   G.S4 with      7 AAs   ALA243@G.S4.1    (   209) -   ALA249@G.S4.7    (215   ) (G.S4)
 G.s4h3 with      8 AAs   SER250@G.s4h3.1  (   216) -   ASN264@G.s4h3.15 (223   ) (G.s4h3)  resSeq jumps
   G.H3 with     18 AAs   ARG265@G.H3.1    (   224) -   LEU282@G.H3.18   (241   ) (G.H3)
 G.h3s5 with      3 AAs   ARG283@G.h3s5.1  (   242) -   ILE285@G.h3s5.3  (244   ) (G.h3s5)
   G.S5 with      7 AAs   SER286@G.S5.1    (   245) -   ASN292@G.S5.7    (251   ) (G.S5)
 G.s5hg with      1 AAs   LYS293@G.s5hg.1  (   252) -   LYS293@G.s5hg.1  (252   ) (G.s5hg)
   G.HG with     17 AAs   GLN294@G.HG.1    (   253) -   ASP310@G.HG.17   (269   ) (G.HG)
 G.hgh4 with     10 AAs   TYR311@G.hgh4.1  (   270) -   THR320@G.hgh4.10 (279   ) (G.hgh4)
   G.H4 with     27 AAs   PRO321@G.H4.1    (   280) -   ARG347@G.H4.27   (306   ) (G.H4)
 G.h4s6 with     11 AAs   ILE348@G.h4s6.1  (   307) -   TYR358@G.h4s6.20 (317   ) (G.h4s6)
   G.S6 with      5 AAs   CYS359@G.S6.1    (   318) -   PHE363@G.S6.5    (322   ) (G.S6)
 G.s6h5 with      5 AAs   THR364@G.s6h5.1  (   323) -   ASP368@G.s6h5.5  (327   ) (G.s6h5)
   G.H5 with     26 AAs   THR369@G.H5.1    (   328) -   LEU394@G.H5.26   (353   ) (G.H5)

Computing distances in the interface between fragments
0
and
3
The interface is restricted to the residues within 35.0 Angstrom of each other in the reference topology.
Computing interface...done!

From 100182 potential group_1-group_2 distances, the interface was reduced to only 20307 potential contacts.
If this number is still too high (i.e. the computation is too slow) consider using a smaller interface cutoff

  0%|                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
Streaming over trajectory object nr.   0 (   280 frames,    280 with stride  1) in chunks of 10000 frames. Now at chunk nr    0, frames so far    280
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.89s/it]
These 20 contacts capture 12.03 (~77%) of the total frequency 15.54 (over 20307 input contacts)
As orientation value, the first 31 ctcs already capture 90.0% of 15.54.
The 31-th contact has a frequency of 0.14


    freq                            label residue idxs    sum
0   1.00   D381@G.H5.13    - Q229@5.68x68      340 956   1.00
1   1.00   R385@G.H5.17    - Q229@5.68x68      344 956   2.00
2   1.00   D381@G.H5.13    - K232@5.71x71      340 959   3.00
3   0.98   Q384@G.H5.16    - I135@3.54x54      343 865   3.98
4   0.96      T350@G.h4s6.3   - R239@ICL3      309 966   4.93
5   0.85   E392@G.H5.24    - T274@6.36x36      351 976   5.79
6   0.68   Q384@G.H5.16    - Q229@5.68x68      343 956   6.47
7   0.64   H387@G.H5.19    - A134@3.53x53      346 864   7.10
8   0.57   Y391@G.H5.23    - T274@6.36x36      350 976   7.67
9   0.55   Y358@G.h4s6.20  - E237@5.76x76      317 964   8.23
10  0.49   R385@G.H5.17    - K232@5.71x71      344 959   8.71
11  0.47   L394@G.H5.26    - K270@6.32x32      353 972   9.19
12  0.47      L346@G.H4.26    - R239@ICL3      305 966   9.65
13  0.45   Y358@G.h4s6.20  - S236@5.75x75      317 963  10.11
14  0.39   Y360@G.S6.2     - S236@5.75x75      319 963  10.50
15  0.35   Q384@G.H5.16    - E225@5.64x64      343 952  10.85
16  0.34  H41@G.S1.2      - F139@34.51x51       37 869  11.19
17  0.31   Q384@G.H5.16    - T136@3.55x55      343 866  11.49
18  0.28  D139@H.hbhc.3   - K140@34.52x52      107 870  11.77
19  0.26   Y358@G.h4s6.20  - I233@5.72x72      317 960  12.03

             label  freq
0     Q384@G.H5.16  2.32
1     D381@G.H5.13  2.00
2     R385@G.H5.17  1.49
3   Y358@G.h4s6.20  1.26
4    T350@G.h4s6.3  0.96
5     E392@G.H5.24  0.85
6     H387@G.H5.19  0.64
7     Y391@G.H5.23  0.57
8     L394@G.H5.26  0.47
9     L346@G.H4.26  0.47
10     Y360@G.S6.2  0.39
11      H41@G.S1.2  0.34
12   D139@H.hbhc.3  0.28

            label  freq
0    Q229@5.68x68  2.68
1    K232@5.71x71  1.48
2       R239@ICL3  1.42
3    T274@6.36x36  1.42
4    I135@3.54x54  0.98
5    S236@5.75x75  0.85
6    A134@3.53x53  0.64
7    E237@5.76x76  0.55
8    K270@6.32x32  0.47
9    E225@5.64x64  0.35
10  F139@34.51x51  0.34
11   T136@3.55x55  0.31
12  K140@34.52x52  0.28
13   I233@5.72x72  0.26
Drawing this many dots (637 residues + 53 padding spaces) in a panel 10.0 inches wide/high
forces too small dotsizes and fontsizes. If crowding effects occur, either reduce the
number of residues or increase the panel size
[15]:
<mdciao.contacts.contacts.ContactGroup at 0x7f46a1095e50>
../_images/notebooks_Tutorial_34_6.png
../_images/notebooks_Tutorial_34_7.png
../_images/notebooks_Tutorial_34_8.png

Sites

Now we use a different approach. Instead of letting mdciao discover contacts automatically, we list them beforehand as site dictionaries, and feed this dictionaries to directly to the method cli.sites. The CLI command we’re replicating is:

mdc_sites.py prot.pdb traj.xtc --site tip.json -at -nf -sa #sa: short AA-names

However, in the API-spirit, we’re not even using a file on disk to define the site, but create it on the fly as a Python dictionary:

[16]:
my_site = {
    "name":"interface small",
    "pairs": {"AAresSeq": [
        "L394-K270",
        "D381-Q229",
        "Q384-Q229",
        "R385-Q229",
        "D381-K232",
        "Q384-I135"
        ]}}
[17]:
sites = mdciao.cli.sites([my_site],
                         traj,
                         no_disk=True,
                         plot_atomtypes=True,
                         fragments=[None],
                         short_AA_names=True)
Will compute the sites
 site dict with name 'interface small'
in the trajectories:
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
 with a stride of 1 frames.

Using method 'None' these fragments were found
fragment      0 with   1044 AAs     LEU4 (     0) -   P0G395 (1043  ) (0)  resSeq jumps
These are the residues that could be found:
   residue      residx    fragment      resSeq       GPCR        CGN
    ASP381         340           0        340       None       None
    GLN384         343           0        343       None       None
    ARG385         344           0        344       None       None
    LEU394         353           0        353       None       None
    ILE135         865           0        865       None       None
    GLN229         956           0        956       None       None
    LYS232         959           0        959       None       None
    LYS270         972           0        972       None       None
Streaming over trajectory object nr.   0 (   280 frames,    280 with stride  1) in chunks of 10000 frames. Now at chunk nr    0, frames so far    280
   freq                         label residue idxs   sum
0  0.47  L394@frag0      - K270@frag0      353 972  0.47
1  1.00  D381@frag0      - Q229@frag0      340 956  1.47
2  0.68  Q384@frag0      - Q229@frag0      343 956  2.15
3  1.00  R385@frag0      - Q229@frag0      344 956  3.15
4  1.00  D381@frag0      - K232@frag0      340 959  4.15
5  0.98  Q384@frag0      - I135@frag0      343 865  5.13

../_images/notebooks_Tutorial_37_1.png

The return value sites is a dictionary keyed with the site names (interface small in this case) and valued with mdciao's ContactGroup-objects.

[18]:
sites
[18]:
{'interface small': <mdciao.contacts.contacts.ContactGroup at 0x7f468fdc7f10>}

Contact Groups

The ContactGroup class is at the center of mdciao and offers extensive of manipulation through it’s methods. A helpful analogy would be that, what the Trajectory is to mdtraj, the ContactGroup is to mdciao. Both classes:

  • store a lot of organized information for further use

  • have attributes and methods that can be used standalone

  • can themselves be the input for other methods (of mdtraj and mdciao, respectively).

  • are rarely created from scratch, but rather generated by the module itself.

The best way to learn about the ContactGroup is to inspect it with the autocomplete feature if IPython and check the informative names of the attributes and methods.

If you’re in a hurry, mdciao offers a quick way to generate a ContactGroup to play around with and investigate it’s methods and attributes:

[19]:
from mdciao import examples
CG = examples.ContactGroupL394()
CG
[19]:
<mdciao.contacts.contacts.ContactGroup at 0x7f469cc43610>

However, instead of using CG now, we go back to object sites that resulted from using cli.sites above. The returned sites-object is a dictionary keyed with site names (you can compute different sites simultaneously) and valued with ContactGroups. In our case (check above) we called it it interface small

[20]:
mysite = sites["interface small"]

Frequencies as Bars

We use the class’s method plot_freqs_as_bars to produce the now familiar neighborhood plots:

[21]:
mysite.plot_freqs_as_bars(3.5,
                          shorten_AAs=True,
                          defrag="@",
                          atom_types=True);
../_images/notebooks_Tutorial_45_0.png

Frequencies as Distributions

It is also very useful to inspect the residue-residue distances of any ContactGroup by looking at their overall distributions instead of their frequencies, since the hard cutoffs can sometimes hide part of the story:

[22]:
jax = mysite.plot_distance_distributions(bins=15,
                                         defrag="@",
                                         ctc_cutoff_Ang=3.5
                                        )
../_images/notebooks_Tutorial_47_0.png

Please note that, because the example dataset is quite small (280 frames and 2.8 ns) and we are simply histogramming (=counting), the curves aren’t very smooth. Histograms of real data will look better.

Frequencies as Violins

Other ways of looking at distance-data as distributions is to use violin plots, which uses a density estimator (check the bw_method-parameter here) to generate smooth densities and plot them vertically. This is somehow in-between the histogram plot and the frequency-bar plot:

[23]:
jax = mysite.plot_violins(defrag="@",
                          ctc_cutoff_Ang=3.5,
                          color="tab10",
                         )
../_images/notebooks_Tutorial_49_0.png

Note

In principle, we could also use a density estimator in plot_distance_distributionsto make them look smooth, but we have decided to leave them as.

Comparisons Between Contact Groups

Finally, we replicate the CLI comand

mdc_compare.py 3SN6.X.ARG131@4.0_Ang.dat 3SN6.MD.ARG131@4.0_Ang.dat -k Xray,MD -t "3SN6 cutoff 4AA" -a R131

in API mode. This looks different because most of the inputs will now be Python objects in memory.

First, we create the Xray and the MD ContactGroups separately:

[24]:
R131_Xray = mdciao.cli.residue_neighborhoods("R131", xray3SN6,
                                             ctc_cutoff_Ang=4,
                                             no_disk=True,
                                             GPCR_uniprot=GPCR,
                                             figures=False,
                                             CGN_PDB=CGN,
                                             accept_guess=True)["neighborhoods"];
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 1 frames, 10269 atoms, 1319 residues, and unitcells>
 with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment      0 with    349 AAs     THR9 (     0) -   LEU394 (348   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   349) -   ASN340 (688   ) (1)
fragment      2 with     58 AAs     ASN5 (   689) -    ARG62 (746   ) (2)
fragment      3 with    159 AAs  ASN1002 (   747) -  ALA1160 (905   ) (3)
fragment      4 with    284 AAs    GLU30 (   906) -   CYS341 (1189  ) (4)  resSeq jumps
fragment      5 with    128 AAs     GLN1 (  1190) -   SER128 (1317  ) (5)
fragment      6 with      1 AAs  P0G1601 (  1318) -  P0G1601 (1318  ) (6)
GPCR-labels align best with fragments: [4] (first-last: GLU30-CYS341).

These are the GPCR fragments mapped onto your topology:
    TM1 with     32 AAs    GLU30@1.29x29   (   906) -    PHE61@1.60x60   (937   ) (TM1)
   ICL1 with      4 AAs    GLU62@12.48x48  (   938) -    GLN65@12.51x51  (941   ) (ICL1)
    TM2 with     32 AAs    THR66@2.37x37   (   942) -    LYS97@2.68x67   (973   ) (TM2)
   ECL1 with      4 AAs    THR98@23.49x49  (   974) -   PHE101@23.52x52  (977   ) (ECL1)
    TM3 with     36 AAs   GLY102@3.21x21   (   978) -   SER137@3.56x56   (1013  ) (TM3)
   ICL2 with      8 AAs   PRO138@34.50x50  (  1014) -   LEU145@34.57x57  (1021  ) (ICL2)
    TM4 with     27 AAs   THR146@4.38x38   (  1022) -   HIS172@4.64x64   (1048  ) (TM4)
   ECL2 with     20 AAs   TRP173           (  1049) -   THR195           (1068  ) (ECL2)  resSeq jumps
    TM5 with     42 AAs   ASN196@5.35x36   (  1069) -   GLU237@5.76x76   (1110  ) (TM5)
   ICL3 with      2 AAs   GLY238           (  1111) -   ARG239           (1112  ) (ICL3)
    TM6 with     35 AAs   CYS265@6.27x27   (  1113) -   GLN299@6.61x61   (1147  ) (TM6)
   ECL3 with      4 AAs   ASP300           (  1148) -   ILE303           (1151  ) (ECL3)
    TM7 with     25 AAs   ARG304@7.31x30   (  1152) -   ARG328@7.55x55   (1176  ) (TM7)
     H8 with     13 AAs   SER329@8.47x47   (  1177) -   CYS341@8.59x59   (1189  ) (H8)
CGN-labels align best with fragments: [0] (first-last: THR9-LEU394).

These are the CGN fragments mapped onto your topology:
   G.HN with     28 AAs     THR9@G.HN.26   (     0) -    VAL36@G.HN.53   (27    ) (G.HN)
 G.hns1 with      3 AAs    TYR37@G.hns1.1  (    28) -    ALA39@G.hns1.3  (30    ) (G.hns1)
   G.S1 with      7 AAs    THR40@G.S1.1    (    31) -    LEU46@G.S1.7    (37    ) (G.S1)
 G.s1h1 with      6 AAs    GLY47@G.s1h1.1  (    38) -    GLY52@G.s1h1.6  (43    ) (G.s1h1)
   G.H1 with      7 AAs    LYS53@G.H1.1    (    44) -    GLN59@G.H1.7    (50    ) (G.H1)
   H.HA with     26 AAs    LYS88@H.HA.4    (    51) -   LEU113@H.HA.29   (76    ) (H.HA)
 H.hahb with      9 AAs   VAL114@H.hahb.1  (    77) -   PRO122@H.hahb.9  (85    ) (H.hahb)
   H.HB with     14 AAs   GLU123@H.HB.1    (    86) -   ASN136@H.HB.14   (99    ) (H.HB)
 H.hbhc with      7 AAs   VAL137@H.hbhc.1  (   100) -   PRO143@H.hbhc.15 (106   ) (H.hbhc)
   H.HC with     12 AAs   PRO144@H.HC.1    (   107) -   GLU155@H.HC.12   (118   ) (H.HC)
 H.hchd with      1 AAs   ASP156@H.hchd.1  (   119) -   ASP156@H.hchd.1  (119   ) (H.hchd)
   H.HD with     12 AAs   GLU157@H.HD.1    (   120) -   GLU168@H.HD.12   (131   ) (H.HD)
 H.hdhe with      5 AAs   TYR169@H.hdhe.1  (   132) -   ASP173@H.hdhe.5  (136   ) (H.hdhe)
   H.HE with     13 AAs   CYS174@H.HE.1    (   137) -   LYS186@H.HE.13   (149   ) (H.HE)
 H.hehf with      7 AAs   GLN187@H.hehf.1  (   150) -   SER193@H.hehf.7  (156   ) (H.hehf)
   H.HF with      6 AAs   ASP194@H.HF.1    (   157) -   ARG199@H.HF.6    (162   ) (H.HF)
 G.hfs2 with      5 AAs   CYS200@G.hfs2.1  (   163) -   GLY206@G.hfs2.7  (167   ) (G.hfs2)  resSeq jumps
   G.S2 with      8 AAs   ILE207@G.S2.1    (   168) -   VAL214@G.S2.8    (175   ) (G.S2)
 G.s2s3 with      2 AAs   ASP215@G.s2s3.1  (   176) -   LYS216@G.s2s3.2  (177   ) (G.s2s3)
   G.S3 with      8 AAs   VAL217@G.S3.1    (   178) -   VAL224@G.S3.8    (185   ) (G.S3)
 G.s3h2 with      3 AAs   GLY225@G.s3h2.1  (   186) -   GLN227@G.s3h2.3  (188   ) (G.s3h2)
   G.H2 with     10 AAs   ARG228@G.H2.1    (   189) -   CYS237@G.H2.10   (198   ) (G.H2)
 G.h2s4 with      5 AAs   PHE238@G.h2s4.1  (   199) -   THR242@G.h2s4.5  (203   ) (G.h2s4)
   G.S4 with      7 AAs   ALA243@G.S4.1    (   204) -   ALA249@G.S4.7    (210   ) (G.S4)
 G.s4h3 with      8 AAs   SER250@G.s4h3.1  (   211) -   ASN264@G.s4h3.15 (218   ) (G.s4h3)  resSeq jumps
   G.H3 with     18 AAs   ARG265@G.H3.1    (   219) -   LEU282@G.H3.18   (236   ) (G.H3)
 G.h3s5 with      3 AAs   ARG283@G.h3s5.1  (   237) -   ILE285@G.h3s5.3  (239   ) (G.h3s5)
   G.S5 with      7 AAs   SER286@G.S5.1    (   240) -   ASN292@G.S5.7    (246   ) (G.S5)
 G.s5hg with      1 AAs   LYS293@G.s5hg.1  (   247) -   LYS293@G.s5hg.1  (247   ) (G.s5hg)
   G.HG with     17 AAs   GLN294@G.HG.1    (   248) -   ASP310@G.HG.17   (264   ) (G.HG)
 G.hgh4 with     10 AAs   TYR311@G.hgh4.1  (   265) -   THR320@G.hgh4.10 (274   ) (G.hgh4)
   G.H4 with     27 AAs   PRO321@G.H4.1    (   275) -   ARG347@G.H4.27   (301   ) (G.H4)
 G.h4s6 with     11 AAs   ILE348@G.h4s6.1  (   302) -   TYR358@G.h4s6.20 (312   ) (G.h4s6)
   G.S6 with      5 AAs   CYS359@G.S6.1    (   313) -   PHE363@G.S6.5    (317   ) (G.S6)
 G.s6h5 with      5 AAs   THR364@G.s6h5.1  (   318) -   ASP368@G.s6h5.5  (322   ) (G.s6h5)
   G.H5 with     26 AAs   THR369@G.H5.1    (   323) -   LEU394@G.H5.26   (348   ) (G.H5)

Will compute neighborhoods for the residues
R131
excluding 4 nearest neighbors

   residue      residx    fragment      resSeq       GPCR        CGN
    ARG131        1007           4        131    3.50x50       None

Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 10269 atoms, 1319 residues, and unitcells>':...done!
From 1310 potential distances, the neighborhoods have been reduced to only 120 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr.   0 (     1 frames,      1 with stride  1) in chunks of 10000 frames. Now at chunk nr    0, frames so far      1
#idx   freq      contact       fragments     res_idxs      ctc_idx  Sum
1:     1.00   ARG131-TYR391       4-0        1007-345        14     1.00
2:     1.00   ARG131-TYR326       4-4        1007-1174      111     2.00
3:     1.00   ARG131-ILE278       4-4        1007-1126       97     3.00
These 4 contacts capture 3.00 (~100%) of the total frequency 3.00 (over 120 input contacts)
As orientation value, the first 3 ctcs already capture 90.0% of 3.00.
The 3-th contact has a frequency of 1.00

[25]:
R131_MD = mdciao.cli.residue_neighborhoods("R131",traj,
                                           ctc_cutoff_Ang=4,
                                           no_disk=True,
                                           GPCR_uniprot=GPCR,
                                           figures=False,
                                           CGN_PDB=CGN,
                                           accept_guess=True)["neighborhoods"];
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
 with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment      0 with    354 AAs     LEU4 (     0) -   LEU394 (353   ) (0)  resSeq jumps
fragment      1 with    340 AAs     GLN1 (   354) -   ASN340 (693   ) (1)
fragment      2 with     66 AAs     ALA2 (   694) -    PHE67 (759   ) (2)
fragment      3 with    283 AAs    GLU30 (   760) -   LEU340 (1042  ) (3)  resSeq jumps
fragment      4 with      1 AAs   P0G395 (  1043) -   P0G395 (1043  ) (4)
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).

These are the GPCR fragments mapped onto your topology:
    TM1 with     32 AAs    GLU30@1.29x29   (   760) -    PHE61@1.60x60   (791   ) (TM1)
   ICL1 with      4 AAs    GLU62@12.48x48  (   792) -    GLN65@12.51x51  (795   ) (ICL1)
    TM2 with     32 AAs    THR66@2.37x37   (   796) -    LYS97@2.68x67   (827   ) (TM2)
   ECL1 with      4 AAs    MET98@23.49x49  (   828) -   PHE101@23.52x52  (831   ) (ECL1)
    TM3 with     36 AAs   GLY102@3.21x21   (   832) -   SER137@3.56x56   (867   ) (TM3)
   ICL2 with      8 AAs   PRO138@34.50x50  (   868) -   LEU145@34.57x57  (875   ) (ICL2)
    TM4 with     27 AAs   THR146@4.38x38   (   876) -   HIS172@4.64x64   (902   ) (TM4)
   ECL2 with     20 AAs   TRP173           (   903) -   THR195           (922   ) (ECL2)  resSeq jumps
    TM5 with     42 AAs   ASN196@5.35x36   (   923) -   GLU237@5.76x76   (964   ) (TM5)
   ICL3 with      2 AAs   GLY238           (   965) -   ARG239           (966   ) (ICL3)
    TM6 with     35 AAs   CYS265@6.27x27   (   967) -   GLN299@6.61x61   (1001  ) (TM6)
   ECL3 with      4 AAs   ASP300           (  1002) -   ILE303           (1005  ) (ECL3)
    TM7 with     25 AAs   ARG304@7.31x30   (  1006) -   ARG328@7.55x55   (1030  ) (TM7)
     H8 with     12 AAs   SER329@8.47x47   (  1031) -   LEU340@8.58x58   (1042  ) (H8)
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).

These are the CGN fragments mapped onto your topology:
   G.HN with     28 AAs     THR9@G.HN.26   (     5) -    VAL36@G.HN.53   (32    ) (G.HN)
 G.hns1 with      3 AAs    TYR37@G.hns1.1  (    33) -    ALA39@G.hns1.3  (35    ) (G.hns1)
   G.S1 with      7 AAs    THR40@G.S1.1    (    36) -    LEU46@G.S1.7    (42    ) (G.S1)
 G.s1h1 with      6 AAs    GLY47@G.s1h1.1  (    43) -    GLY52@G.s1h1.6  (48    ) (G.s1h1)
   G.H1 with      7 AAs    LYS53@G.H1.1    (    49) -    GLN59@G.H1.7    (55    ) (G.H1)
   H.HA with     26 AAs    LYS88@H.HA.4    (    56) -   LEU113@H.HA.29   (81    ) (H.HA)
 H.hahb with      9 AAs   VAL114@H.hahb.1  (    82) -   PRO122@H.hahb.9  (90    ) (H.hahb)
   H.HB with     14 AAs   GLU123@H.HB.1    (    91) -   ASN136@H.HB.14   (104   ) (H.HB)
 H.hbhc with      7 AAs   VAL137@H.hbhc.1  (   105) -   PRO143@H.hbhc.15 (111   ) (H.hbhc)
   H.HC with     12 AAs   PRO144@H.HC.1    (   112) -   GLU155@H.HC.12   (123   ) (H.HC)
 H.hchd with      1 AAs   ASP156@H.hchd.1  (   124) -   ASP156@H.hchd.1  (124   ) (H.hchd)
   H.HD with     12 AAs   GLU157@H.HD.1    (   125) -   GLU168@H.HD.12   (136   ) (H.HD)
 H.hdhe with      5 AAs   TYR169@H.hdhe.1  (   137) -   ASP173@H.hdhe.5  (141   ) (H.hdhe)
   H.HE with     13 AAs   CYS174@H.HE.1    (   142) -   LYS186@H.HE.13   (154   ) (H.HE)
 H.hehf with      7 AAs   GLN187@H.hehf.1  (   155) -   SER193@H.hehf.7  (161   ) (H.hehf)
   H.HF with      6 AAs   ASP194@H.HF.1    (   162) -   ARG199@H.HF.6    (167   ) (H.HF)
 G.hfs2 with      5 AAs   CYS200@G.hfs2.1  (   168) -   GLY206@G.hfs2.7  (172   ) (G.hfs2)  resSeq jumps
   G.S2 with      8 AAs   ILE207@G.S2.1    (   173) -   VAL214@G.S2.8    (180   ) (G.S2)
 G.s2s3 with      2 AAs   ASP215@G.s2s3.1  (   181) -   LYS216@G.s2s3.2  (182   ) (G.s2s3)
   G.S3 with      8 AAs   VAL217@G.S3.1    (   183) -   VAL224@G.S3.8    (190   ) (G.S3)
 G.s3h2 with      3 AAs   GLY225@G.s3h2.1  (   191) -   GLN227@G.s3h2.3  (193   ) (G.s3h2)
   G.H2 with     10 AAs   ARG228@G.H2.1    (   194) -   CYS237@G.H2.10   (203   ) (G.H2)
 G.h2s4 with      5 AAs   PHE238@G.h2s4.1  (   204) -   THR242@G.h2s4.5  (208   ) (G.h2s4)
   G.S4 with      7 AAs   ALA243@G.S4.1    (   209) -   ALA249@G.S4.7    (215   ) (G.S4)
 G.s4h3 with      8 AAs   SER250@G.s4h3.1  (   216) -   ASN264@G.s4h3.15 (223   ) (G.s4h3)  resSeq jumps
   G.H3 with     18 AAs   ARG265@G.H3.1    (   224) -   LEU282@G.H3.18   (241   ) (G.H3)
 G.h3s5 with      3 AAs   ARG283@G.h3s5.1  (   242) -   ILE285@G.h3s5.3  (244   ) (G.h3s5)
   G.S5 with      7 AAs   SER286@G.S5.1    (   245) -   ASN292@G.S5.7    (251   ) (G.S5)
 G.s5hg with      1 AAs   LYS293@G.s5hg.1  (   252) -   LYS293@G.s5hg.1  (252   ) (G.s5hg)
   G.HG with     17 AAs   GLN294@G.HG.1    (   253) -   ASP310@G.HG.17   (269   ) (G.HG)
 G.hgh4 with     10 AAs   TYR311@G.hgh4.1  (   270) -   THR320@G.hgh4.10 (279   ) (G.hgh4)
   G.H4 with     27 AAs   PRO321@G.H4.1    (   280) -   ARG347@G.H4.27   (306   ) (G.H4)
 G.h4s6 with     11 AAs   ILE348@G.h4s6.1  (   307) -   TYR358@G.h4s6.20 (317   ) (G.h4s6)
   G.S6 with      5 AAs   CYS359@G.S6.1    (   318) -   PHE363@G.S6.5    (322   ) (G.S6)
 G.s6h5 with      5 AAs   THR364@G.s6h5.1  (   323) -   ASP368@G.s6h5.5  (327   ) (G.s6h5)
   G.H5 with     26 AAs   THR369@G.H5.1    (   328) -   LEU394@G.H5.26   (353   ) (G.H5)

Will compute neighborhoods for the residues
R131
excluding 4 nearest neighbors

   residue      residx    fragment      resSeq       GPCR        CGN
    ARG131         861           3        131    3.50x50       None

Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 8384 atoms, 1044 residues, and unitcells>':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 104 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr.   0 (   280 frames,    280 with stride  1) in chunks of 10000 frames. Now at chunk nr    0, frames so far    280
#idx   freq      contact       fragments     res_idxs      ctc_idx  Sum
1:     0.88   ARG131-TYR391       3-0         861-350        11     0.88
2:     0.69   ARG131-TYR326       3-3         861-1028       95     1.56
3:     0.44   ARG131-TYR219       3-3         861-946        65     2.00
4:     0.12   ARG131-ILE278       3-3         861-980        84     2.12
These 5 contacts capture 2.12 (~100%) of the total frequency 2.12 (over 104 input contacts)
As orientation value, the first 3 ctcs already capture 90.0% of 2.12.
The 3-th contact has a frequency of 0.44

Please note that, because the molecular topologies differ, the residue R131 is has different indices in each topology, namely 1007 in the X-ray crystal, 861 in the MD simulation:

[26]:
R131_Xray, R131_MD
[26]:
({1007: <mdciao.contacts.contacts.ContactGroup at 0x7f468fa05fd0>},
 {861: <mdciao.contacts.contacts.ContactGroup at 0x7f469d472910>})

That will frequently be the case when comparing different proteins of the same family, or topologies where not all sub-units have been modelled etc or antying that produces a shift in these indices.

mdciao understands R131 automatically and doesn’t ask more questions, as long as there’s an obvious R131 candidate. Otherwise the user will be prompted for disambiguation.

In this case, now we create a dictionary of ContactGroups that represent the R131 in both topologies:

[27]:
R131 = {
    "Xray": R131_Xray[1007],
    "MD"  : R131_MD[861]
}
#np.save("R131.npz",R131)

Now we can just pass this dictionary to cli.compare to see their contact frequencies. This module is pretty flexible on inputs and outputs, check the documentation to learn more:

[28]:
mdciao.cli.compare(R131, ctc_cutoff_Ang=4, defrag=None, anchor="R131");
These interactions are not shared:
Y219@5.58x58
Their cumulative ctc freq is 0.44.
Created files
freq_comparison.pdf
freq_comparison.xlsx
../_images/notebooks_Tutorial_60_1.png