Jupyter Notebook Tutorial¶
Roughly, the design idea of mdciao
is that:
The CLI offers pre-packaged analysis pipelines that are essentially one-shot tools. They are an entry-point for non-experts and do not require any Python scripting knowledge. CLI tools are still highly customizable (check
mdc_**.py -h
ormdc_examples.py
), but offer only some of themdciao
-functionalities.The Python API, on the other hand, exposes:
CLI-equivalent functions via the
mdciao.cli
submodule. Here you’ll find evertying that the CLI offers, only as regular Python functions. This provides scripting flexibility, with the added value that now input and outputs are normal Python objects that can be further manipulated, bymdciao
or any other Python module of your liking.Other standalone submodules that the CLI uses under the hood, and that the user can access directly for any other scripting purpuse: plotting methods, alignment/sequence methods, nomenclature methods, PDB-methods etc.
Note
THE API IS NOT STABLE YET, if you are using mdciao
in API mode, we assume you can handle future API changes without much hassle.
For clarity, this notebook loosely follows the same structure as the Overview section of the mdciao
documentation. Other notebooks will follow soon, explaining basic concepts and/or advanced pipelines.
If you want to run this notebook on your own, please download and extract the data from here first. You can download it:
using the browser
using the terminal with
wget http://proteinformatics.org/mdciao/mdciao_example.zip; unzip mdciao_example.zip
using mdciao’s own method mdciao.examples.fetch_example_data
[1]:
import mdciao, os
if not os.path.exists("mdciao_example"):
mdciao.examples.fetch_example_data()
Unzipping to 'mdciao_example'
Basic Usage¶
Now we replicate the CLI command:
mdc_neighborhoods.py prot.pdb traj.xtc --residues L394 -nf #nf: don't use fragments
but in API mode. We use the method cli.residue_neighborhoods:
[2]:
result= mdciao.cli.residue_neighborhoods("L394",
"mdciao_example/traj.xtc",
"mdciao_example/prot.pdb",
fragments=[None])
Will compute contact frequencies for (1 items):
mdciao_example/traj.xtc
with a stride of 1 frames
Using method 'None' these fragments were found
fragment 0 with 1044 AAs LEU4 ( 0) - P0G395 (1043 ) (0) resSeq jumps
Will compute neighborhoods for the residues
L394
excluding 4 nearest neighbors
residue residx fragment resSeq GPCR CGN
LEU394 353 0 394 None None
Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'mdciao_example/prot.pdb':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 74 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming mdciao_example/traj.xtc (nr. 0) with stride 1 in chunks of 10000 frames. Now at chunk nr 0, frames so far 280
#idx freq contact fragments res_idxs ctc_idx Sum
1: 0.55 LEU394-ARG389 0-0 353-348 30 0.55
2: 0.47 LEU394-LYS270 0-0 353-972 65 1.02
3: 0.38 LEU394-LEU388 0-0 353-347 29 1.40
4: 0.23 LEU394-LEU230 0-0 353-957 51 1.62
5: 0.11 LEU394-ARG385 0-0 353-344 26 1.73
These 5 contacts capture 1.73 (~98%) of the total frequency 1.76 (over 74 input contacts)
As orientation value, the first 4 ctcs already capture 90.0% of 1.76.
The 4-th contact has a frequency of 0.23
The following files have been created:
./neighborhood.overall@3.5_Ang.pdf
./neighborhood.LEU394@frag0@3.5_Ang.dat
./neighborhood.LEU394@frag0.time_trace@3.5_Ang.pdf
result
is a dictionary of dictionaries, with the main result under the key neighborhoods
. There, you’ll find a dictionary keyed with residue indices and valued with a ContactGroup for each residue neighborhood.
ContactGroups are mdciao
classes that allow the further manipulation of contact data, molecular information and much more. Check here to learn more about mdciao
classes.
[3]:
result["neighborhoods"][353]
[3]:
<mdciao.contacts.contacts.ContactGroup at 0x7f46c2d73cd0>
Using Python Objects¶
Please note that in API mode, inputs can be objects, for example mdtraj
Trajectories. So, before calling the next mdciao.cli
method, we use mdtraj
to load the trajectory from our files:
[4]:
import mdtraj as md
traj = md.load("mdciao_example/traj.xtc", top="mdciao_example/prot.pdb")
traj
[4]:
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells at 0x7f46c2ecbf10>
And we repeat the above command using the traj
object. Please note that we’re also using the no_disk
option so that no files are written to disk, in case we’re only interested in working in memory.
[5]:
result = mdciao.cli.residue_neighborhoods("L394",
traj,
fragments=[None],
no_disk=True)
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
with a stride of 1 frames
Using method 'None' these fragments were found
fragment 0 with 1044 AAs LEU4 ( 0) - P0G395 (1043 ) (0) resSeq jumps
Will compute neighborhoods for the residues
L394
excluding 4 nearest neighbors
residue residx fragment resSeq GPCR CGN
LEU394 353 0 394 None None
Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 8384 atoms, 1044 residues, and unitcells>':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 74 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr. 0 ( 280 frames, 280 with stride 1) in chunks of 10000 frames. Now at chunk nr 0, frames so far 280
#idx freq contact fragments res_idxs ctc_idx Sum
1: 0.55 LEU394-ARG389 0-0 353-348 30 0.55
2: 0.47 LEU394-LYS270 0-0 353-972 65 1.02
3: 0.38 LEU394-LEU388 0-0 353-347 29 1.40
4: 0.23 LEU394-LEU230 0-0 353-957 51 1.62
5: 0.11 LEU394-ARG385 0-0 353-344 26 1.73
These 5 contacts capture 1.73 (~98%) of the total frequency 1.76 (over 74 input contacts)
As orientation value, the first 4 ctcs already capture 90.0% of 1.76.
The 4-th contact has a frequency of 0.23
Now, the more elaborated CLI-command:
mdc_neighborhoods.py prot.pdb traj.xtc -r L394 --GPCR adrb2_human --CGN 3SN6 -ni -at #ni: not interactive, at: show atom-types
We keep the no_disk
option to avoid writing to disk, but you can change this if you want. Please note that some options do not carry exactly the same names as their CLI equivalents. E.g. ni
in the CLI (= don’t be interactive) is now accept_guess
in the API. These differences are needed for compatiblity with other methods, but might get unified in the future.
[6]:
result = mdciao.cli.residue_neighborhoods("L394",
traj,
GPCR_uniprot="adrb2_human",
CGN_PDB="3SN6",
accept_guess=True,
plot_atomtypes=True,
no_disk=True)
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment 0 with 354 AAs LEU4 ( 0) - LEU394 (353 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 354) - ASN340 (693 ) (1)
fragment 2 with 66 AAs ALA2 ( 694) - PHE67 (759 ) (2)
fragment 3 with 283 AAs GLU30 ( 760) - LEU340 (1042 ) (3) resSeq jumps
fragment 4 with 1 AAs P0G395 ( 1043) - P0G395 (1043 ) (4)
No local file ./adrb2_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/adrb2_human ...done!
Please cite the following reference to the GPCRdb:
* Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
Nucleic Acids Research 49, D335--D343
https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
done without 404, continuing.
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).
These are the GPCR fragments mapped onto your topology:
TM1 with 32 AAs GLU30@1.29x29 ( 760) - PHE61@1.60x60 (791 ) (TM1)
ICL1 with 4 AAs GLU62@12.48x48 ( 792) - GLN65@12.51x51 (795 ) (ICL1)
TM2 with 32 AAs THR66@2.37x37 ( 796) - LYS97@2.68x67 (827 ) (TM2)
ECL1 with 4 AAs MET98@23.49x49 ( 828) - PHE101@23.52x52 (831 ) (ECL1)
TM3 with 36 AAs GLY102@3.21x21 ( 832) - SER137@3.56x56 (867 ) (TM3)
ICL2 with 8 AAs PRO138@34.50x50 ( 868) - LEU145@34.57x57 (875 ) (ICL2)
TM4 with 27 AAs THR146@4.38x38 ( 876) - HIS172@4.64x64 (902 ) (TM4)
ECL2 with 20 AAs TRP173 ( 903) - THR195 (922 ) (ECL2) resSeq jumps
TM5 with 42 AAs ASN196@5.35x36 ( 923) - GLU237@5.76x76 (964 ) (TM5)
ICL3 with 2 AAs GLY238 ( 965) - ARG239 (966 ) (ICL3)
TM6 with 35 AAs CYS265@6.27x27 ( 967) - GLN299@6.61x61 (1001 ) (TM6)
ECL3 with 4 AAs ASP300 ( 1002) - ILE303 (1005 ) (ECL3)
TM7 with 25 AAs ARG304@7.31x30 ( 1006) - ARG328@7.55x55 (1030 ) (TM7)
H8 with 12 AAs SER329@8.47x47 ( 1031) - LEU340@8.58x58 (1042 ) (H8)
Using CGN-nomenclature, please cite
* Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
Nature 2015 524:7564 524, 173--179
https://doi.org/10.1038/nature14663
No local file ./CGN_3SN6.txt found, checking online in
https://www.mrc-lmb.cam.ac.uk/CGN/lookup_results/3SN6.txt ...done without 404, continuing.
No local PDB file for 3SN6 found in directory '.', checking online in
https://files.rcsb.org/download/3SN6.pdb ...found! Continuing normally
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).
These are the CGN fragments mapped onto your topology:
G.HN with 28 AAs THR9@G.HN.26 ( 5) - VAL36@G.HN.53 (32 ) (G.HN)
G.hns1 with 3 AAs TYR37@G.hns1.1 ( 33) - ALA39@G.hns1.3 (35 ) (G.hns1)
G.S1 with 7 AAs THR40@G.S1.1 ( 36) - LEU46@G.S1.7 (42 ) (G.S1)
G.s1h1 with 6 AAs GLY47@G.s1h1.1 ( 43) - GLY52@G.s1h1.6 (48 ) (G.s1h1)
G.H1 with 7 AAs LYS53@G.H1.1 ( 49) - GLN59@G.H1.7 (55 ) (G.H1)
H.HA with 26 AAs LYS88@H.HA.4 ( 56) - LEU113@H.HA.29 (81 ) (H.HA)
H.hahb with 9 AAs VAL114@H.hahb.1 ( 82) - PRO122@H.hahb.9 (90 ) (H.hahb)
H.HB with 14 AAs GLU123@H.HB.1 ( 91) - ASN136@H.HB.14 (104 ) (H.HB)
H.hbhc with 7 AAs VAL137@H.hbhc.1 ( 105) - PRO143@H.hbhc.15 (111 ) (H.hbhc)
H.HC with 12 AAs PRO144@H.HC.1 ( 112) - GLU155@H.HC.12 (123 ) (H.HC)
H.hchd with 1 AAs ASP156@H.hchd.1 ( 124) - ASP156@H.hchd.1 (124 ) (H.hchd)
H.HD with 12 AAs GLU157@H.HD.1 ( 125) - GLU168@H.HD.12 (136 ) (H.HD)
H.hdhe with 5 AAs TYR169@H.hdhe.1 ( 137) - ASP173@H.hdhe.5 (141 ) (H.hdhe)
H.HE with 13 AAs CYS174@H.HE.1 ( 142) - LYS186@H.HE.13 (154 ) (H.HE)
H.hehf with 7 AAs GLN187@H.hehf.1 ( 155) - SER193@H.hehf.7 (161 ) (H.hehf)
H.HF with 6 AAs ASP194@H.HF.1 ( 162) - ARG199@H.HF.6 (167 ) (H.HF)
G.hfs2 with 5 AAs CYS200@G.hfs2.1 ( 168) - GLY206@G.hfs2.7 (172 ) (G.hfs2) resSeq jumps
G.S2 with 8 AAs ILE207@G.S2.1 ( 173) - VAL214@G.S2.8 (180 ) (G.S2)
G.s2s3 with 2 AAs ASP215@G.s2s3.1 ( 181) - LYS216@G.s2s3.2 (182 ) (G.s2s3)
G.S3 with 8 AAs VAL217@G.S3.1 ( 183) - VAL224@G.S3.8 (190 ) (G.S3)
G.s3h2 with 3 AAs GLY225@G.s3h2.1 ( 191) - GLN227@G.s3h2.3 (193 ) (G.s3h2)
G.H2 with 10 AAs ARG228@G.H2.1 ( 194) - CYS237@G.H2.10 (203 ) (G.H2)
G.h2s4 with 5 AAs PHE238@G.h2s4.1 ( 204) - THR242@G.h2s4.5 (208 ) (G.h2s4)
G.S4 with 7 AAs ALA243@G.S4.1 ( 209) - ALA249@G.S4.7 (215 ) (G.S4)
G.s4h3 with 8 AAs SER250@G.s4h3.1 ( 216) - ASN264@G.s4h3.15 (223 ) (G.s4h3) resSeq jumps
G.H3 with 18 AAs ARG265@G.H3.1 ( 224) - LEU282@G.H3.18 (241 ) (G.H3)
G.h3s5 with 3 AAs ARG283@G.h3s5.1 ( 242) - ILE285@G.h3s5.3 (244 ) (G.h3s5)
G.S5 with 7 AAs SER286@G.S5.1 ( 245) - ASN292@G.S5.7 (251 ) (G.S5)
G.s5hg with 1 AAs LYS293@G.s5hg.1 ( 252) - LYS293@G.s5hg.1 (252 ) (G.s5hg)
G.HG with 17 AAs GLN294@G.HG.1 ( 253) - ASP310@G.HG.17 (269 ) (G.HG)
G.hgh4 with 10 AAs TYR311@G.hgh4.1 ( 270) - THR320@G.hgh4.10 (279 ) (G.hgh4)
G.H4 with 27 AAs PRO321@G.H4.1 ( 280) - ARG347@G.H4.27 (306 ) (G.H4)
G.h4s6 with 11 AAs ILE348@G.h4s6.1 ( 307) - TYR358@G.h4s6.20 (317 ) (G.h4s6)
G.S6 with 5 AAs CYS359@G.S6.1 ( 318) - PHE363@G.S6.5 (322 ) (G.S6)
G.s6h5 with 5 AAs THR364@G.s6h5.1 ( 323) - ASP368@G.s6h5.5 (327 ) (G.s6h5)
G.H5 with 26 AAs THR369@G.H5.1 ( 328) - LEU394@G.H5.26 (353 ) (G.H5)
Will compute neighborhoods for the residues
L394
excluding 4 nearest neighbors
residue residx fragment resSeq GPCR CGN
LEU394 353 0 394 None G.H5.26
Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 8384 atoms, 1044 residues, and unitcells>':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 74 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr. 0 ( 280 frames, 280 with stride 1) in chunks of 10000 frames. Now at chunk nr 0, frames so far 280
#idx freq contact fragments res_idxs ctc_idx Sum
1: 0.55 LEU394-ARG389 0-0 353-348 30 0.55
2: 0.47 LEU394-LYS270 0-3 353-972 65 1.02
3: 0.38 LEU394-LEU388 0-0 353-347 29 1.40
4: 0.23 LEU394-LEU230 0-3 353-957 51 1.62
5: 0.11 LEU394-ARG385 0-0 353-344 26 1.73
These 5 contacts capture 1.73 (~98%) of the total frequency 1.76 (over 74 input contacts)
As orientation value, the first 4 ctcs already capture 90.0% of 1.76.
The 4-th contact has a frequency of 0.23
Consensus Nomenclature: GPCR and/or CGN¶
Above, we declared our intention to use GPCR consensus nomenclature and Common G-alpha Numbering (CGN) by passing the descriptor strings GPCR_uniprot="adrb2_human"
or CGN_PDB="3SN6"
, respectively, to contact the online databases, in particular the
GPCRdb, Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function, Nucleic Acids Research 49, D335–D343, https://doi.org/10.1093/nar/gkaa1080
For that retrieval and handling of these labels, we use the module mdciao.nomenclature
, which offers two main classes and other helper function to deal with the nomenclature. An overview of the relevant references is contained in the module’s
documentation as well as in the documentation of each class, e.g. here.
Additionally, you can use mdciao.nomenclature.references to print all relevant nomenclature references:
[7]:
mdciao.nomenclature.references()
mdciao.nomenclature functions thanks to these online databases. Please cite them if you use this module:
* Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
Nucleic Acids Research 49, D335--D343
https://doi.org/10.1093/nar/gkaa1080
* Berman et al, (2000) The Protein Data Bank
Nucleic Acids Research 28, 235--242
https://doi.org/10.1093/NAR/28.1.235
* Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
Nature 2015 524:7564 524, 173--179
https://doi.org/10.1038/nature14663
* Kanev et al, (2021) KLIFS: an overhaul after the first 5 years of supporting kinase research
Nucleic Acids Research 49, D562--D569
https://doi.org/10.1093/NAR/GKAA895
Additionally, depending on the chosen nomenclature type, you should cite:
* Structure based GPCR notation
- Isberg et al, (2015) Generic GPCR residue numbers - Aligning topology maps while minding the gaps
Trends in Pharmacological Sciences 36, 22--31
https://doi.org/10.1016/j.tips.2014.11.001
- Isberg et al, (2016) GPCRdb: An information system for G protein-coupled receptors
Nucleic Acids Research 44, D356--D364
https://doi.org/10.1093/nar/gkv1178
* Sequence based GPCR schemes:
- Wootten et al, (2013) Polar transmembrane interactions drive formation of ligand-specific and signal
pathway-biased family B G protein-coupled receptor conformations
Proceedings of the National Academy of Sciences of the United States of America 110, 5211--5216
https://doi.org/10.1073/pnas.1221585110
- Pin et al, (2003) Evolution, structure, and activation mechanism of family 3/C G-protein-coupled
receptors
Pharmacology and Therapeutics 98, 325--354
https://doi.org/10.1016/S0163-7258(03)00038-X
- Wu et al, (2014) Structure of a class C GPCR metabotropic glutamate receptor 1 bound to an
allosteric modulator
Science 344, 58--64
https://doi.org/10.1126/science.1249489
- Oliveira et al, (1993) A common motif in G-protein-coupled seven transmembrane helix receptors
Journal of Computer-Aided Molecular Design 7, 649--658
https://doi.org/10.1007/BF00125323
- Baldwin et al, (1993) The probable arrangement of the helices in G protein-coupled receptors.
The EMBO Journal 12, 1693--1703
https://doi.org/10.1002/J.1460-2075.1993.TB05814.X
- Baldwin et al, (1997) An alpha-carbon template for the transmembrane helices in the rhodopsin family
of G-protein-coupled receptors
Journal of Molecular Biology 272, 144--164
https://doi.org/10.1006/jmbi.1997.1240
- Schwartz et al, (1994) Locating ligand-binding sites in 7tm receptors by protein engineering
Current Opinion in Biotechnology 5, 434--444
https://doi.org/10.1016/0958-1669(94)90054-X
- Schwartz et al, (1995) Molecular mechanism of action of non-peptide ligands for peptide receptors
Current Pharmaceutical Design 1, 325--342
* KLIFS 85 ligand binding site residues of kinases
- Van Linden et al, (2014) KLIFS: A knowledge-based structural database to navigate kinase-ligand
interaction space
Journal of Medicinal Chemistry 57, 249--277
https://doi.org/10.1021/JM400378W
- Kooistra et al, (2016) KLIFS: a structural kinase-ligand interaction database
Nucleic Acids Research 44, D365--D371
https://doi.org/10.1093/NAR/GKV1082
- Kanev et al, (2021) KLIFS: an overhaul after the first 5 years of supporting kinase research
Nucleic Acids Research 49, D562--D569
https://doi.org/10.1093/NAR/GKAA895
You can find all these references distributed as BibTex file distributed with mdciao here
* /home/perezheg/Programs/mdciao/tests/data/nomenclature/nomenclature.bib
The nomenclature classes produce standalone objects, and can do much more than just be inputs to mdciao.cli
methods. As with any Python class, you can learn a lot about its methods and attributes by using the tab autocompletion feature of IPython. Or you can check here for more mdciao
docs.
Since we’ll be using these labels more than once in the notebook, instead of using the network each time, we can have them as Python objects in memory. Alternatively, it’s possible to save the labeling data locally after the first database query. This allows for inspection and re-use of the retrieved data outside the notebook (in a spreadsheet, for example).
[8]:
from mdciao import nomenclature
GPCR = nomenclature.LabelerGPCR("adrb2_human",
#write_to_disk=True
)
CGN = nomenclature.LabelerCGN("3SN6",
# write_to_disk=True
)
No local file ./adrb2_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/adrb2_human ...done!
Please cite the following reference to the GPCRdb:
* Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
Nucleic Acids Research 49, D335--D343
https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
done without 404, continuing.
Using CGN-nomenclature, please cite
* Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
Nature 2015 524:7564 524, 173--179
https://doi.org/10.1038/nature14663
No local file ./CGN_3SN6.txt found, checking online in
https://www.mrc-lmb.cam.ac.uk/CGN/lookup_results/3SN6.txt ...done without 404, continuing.
No local PDB file for 3SN6 found in directory '.', checking online in
https://files.rcsb.org/download/3SN6.pdb ...found! Continuing normally
Residue Selection¶
Now, we can play around with residue selection, replicating the CLI-command:
mdc_residues.py GLU*,P0G,380-394,G.HN.* prot.pdb --GPCR adrb2_human --CGN 3SN6 -ni
Check the docs here to check the output values res_idxs_list
,fragments
, and consensus_maps
, although most of out useful output is written out.
Please note that we’re now using mdciao.nomenclature
classes directly as inputs (GPCR
and CGN
), speeding up the method by avoiding queries over the network.
[9]:
res_idxs_list, fragments, consensus_maps = mdciao.cli.residue_selection(
"GLU*,P0G,380-394,G.HN.*",
traj,
GPCR_uniprot=GPCR,
CGN_PDB=CGN,
accept_guess=True)
Using method 'lig_resSeq+' these fragments were found
fragment 0 with 354 AAs LEU4 ( 0) - LEU394 (353 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 354) - ASN340 (693 ) (1)
fragment 2 with 66 AAs ALA2 ( 694) - PHE67 (759 ) (2)
fragment 3 with 283 AAs GLU30 ( 760) - LEU340 (1042 ) (3) resSeq jumps
fragment 4 with 1 AAs P0G395 ( 1043) - P0G395 (1043 ) (4)
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).
These are the GPCR fragments mapped onto your topology:
TM1 with 32 AAs GLU30@1.29x29 ( 760) - PHE61@1.60x60 (791 ) (TM1)
ICL1 with 4 AAs GLU62@12.48x48 ( 792) - GLN65@12.51x51 (795 ) (ICL1)
TM2 with 32 AAs THR66@2.37x37 ( 796) - LYS97@2.68x67 (827 ) (TM2)
ECL1 with 4 AAs MET98@23.49x49 ( 828) - PHE101@23.52x52 (831 ) (ECL1)
TM3 with 36 AAs GLY102@3.21x21 ( 832) - SER137@3.56x56 (867 ) (TM3)
ICL2 with 8 AAs PRO138@34.50x50 ( 868) - LEU145@34.57x57 (875 ) (ICL2)
TM4 with 27 AAs THR146@4.38x38 ( 876) - HIS172@4.64x64 (902 ) (TM4)
ECL2 with 20 AAs TRP173 ( 903) - THR195 (922 ) (ECL2) resSeq jumps
TM5 with 42 AAs ASN196@5.35x36 ( 923) - GLU237@5.76x76 (964 ) (TM5)
ICL3 with 2 AAs GLY238 ( 965) - ARG239 (966 ) (ICL3)
TM6 with 35 AAs CYS265@6.27x27 ( 967) - GLN299@6.61x61 (1001 ) (TM6)
ECL3 with 4 AAs ASP300 ( 1002) - ILE303 (1005 ) (ECL3)
TM7 with 25 AAs ARG304@7.31x30 ( 1006) - ARG328@7.55x55 (1030 ) (TM7)
H8 with 12 AAs SER329@8.47x47 ( 1031) - LEU340@8.58x58 (1042 ) (H8)
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).
These are the CGN fragments mapped onto your topology:
G.HN with 28 AAs THR9@G.HN.26 ( 5) - VAL36@G.HN.53 (32 ) (G.HN)
G.hns1 with 3 AAs TYR37@G.hns1.1 ( 33) - ALA39@G.hns1.3 (35 ) (G.hns1)
G.S1 with 7 AAs THR40@G.S1.1 ( 36) - LEU46@G.S1.7 (42 ) (G.S1)
G.s1h1 with 6 AAs GLY47@G.s1h1.1 ( 43) - GLY52@G.s1h1.6 (48 ) (G.s1h1)
G.H1 with 7 AAs LYS53@G.H1.1 ( 49) - GLN59@G.H1.7 (55 ) (G.H1)
H.HA with 26 AAs LYS88@H.HA.4 ( 56) - LEU113@H.HA.29 (81 ) (H.HA)
H.hahb with 9 AAs VAL114@H.hahb.1 ( 82) - PRO122@H.hahb.9 (90 ) (H.hahb)
H.HB with 14 AAs GLU123@H.HB.1 ( 91) - ASN136@H.HB.14 (104 ) (H.HB)
H.hbhc with 7 AAs VAL137@H.hbhc.1 ( 105) - PRO143@H.hbhc.15 (111 ) (H.hbhc)
H.HC with 12 AAs PRO144@H.HC.1 ( 112) - GLU155@H.HC.12 (123 ) (H.HC)
H.hchd with 1 AAs ASP156@H.hchd.1 ( 124) - ASP156@H.hchd.1 (124 ) (H.hchd)
H.HD with 12 AAs GLU157@H.HD.1 ( 125) - GLU168@H.HD.12 (136 ) (H.HD)
H.hdhe with 5 AAs TYR169@H.hdhe.1 ( 137) - ASP173@H.hdhe.5 (141 ) (H.hdhe)
H.HE with 13 AAs CYS174@H.HE.1 ( 142) - LYS186@H.HE.13 (154 ) (H.HE)
H.hehf with 7 AAs GLN187@H.hehf.1 ( 155) - SER193@H.hehf.7 (161 ) (H.hehf)
H.HF with 6 AAs ASP194@H.HF.1 ( 162) - ARG199@H.HF.6 (167 ) (H.HF)
G.hfs2 with 5 AAs CYS200@G.hfs2.1 ( 168) - GLY206@G.hfs2.7 (172 ) (G.hfs2) resSeq jumps
G.S2 with 8 AAs ILE207@G.S2.1 ( 173) - VAL214@G.S2.8 (180 ) (G.S2)
G.s2s3 with 2 AAs ASP215@G.s2s3.1 ( 181) - LYS216@G.s2s3.2 (182 ) (G.s2s3)
G.S3 with 8 AAs VAL217@G.S3.1 ( 183) - VAL224@G.S3.8 (190 ) (G.S3)
G.s3h2 with 3 AAs GLY225@G.s3h2.1 ( 191) - GLN227@G.s3h2.3 (193 ) (G.s3h2)
G.H2 with 10 AAs ARG228@G.H2.1 ( 194) - CYS237@G.H2.10 (203 ) (G.H2)
G.h2s4 with 5 AAs PHE238@G.h2s4.1 ( 204) - THR242@G.h2s4.5 (208 ) (G.h2s4)
G.S4 with 7 AAs ALA243@G.S4.1 ( 209) - ALA249@G.S4.7 (215 ) (G.S4)
G.s4h3 with 8 AAs SER250@G.s4h3.1 ( 216) - ASN264@G.s4h3.15 (223 ) (G.s4h3) resSeq jumps
G.H3 with 18 AAs ARG265@G.H3.1 ( 224) - LEU282@G.H3.18 (241 ) (G.H3)
G.h3s5 with 3 AAs ARG283@G.h3s5.1 ( 242) - ILE285@G.h3s5.3 (244 ) (G.h3s5)
G.S5 with 7 AAs SER286@G.S5.1 ( 245) - ASN292@G.S5.7 (251 ) (G.S5)
G.s5hg with 1 AAs LYS293@G.s5hg.1 ( 252) - LYS293@G.s5hg.1 (252 ) (G.s5hg)
G.HG with 17 AAs GLN294@G.HG.1 ( 253) - ASP310@G.HG.17 (269 ) (G.HG)
G.hgh4 with 10 AAs TYR311@G.hgh4.1 ( 270) - THR320@G.hgh4.10 (279 ) (G.hgh4)
G.H4 with 27 AAs PRO321@G.H4.1 ( 280) - ARG347@G.H4.27 (306 ) (G.H4)
G.h4s6 with 11 AAs ILE348@G.h4s6.1 ( 307) - TYR358@G.h4s6.20 (317 ) (G.h4s6)
G.S6 with 5 AAs CYS359@G.S6.1 ( 318) - PHE363@G.S6.5 (322 ) (G.S6)
G.s6h5 with 5 AAs THR364@G.s6h5.1 ( 323) - ASP368@G.s6h5.5 (327 ) (G.s6h5)
G.H5 with 26 AAs THR369@G.H5.1 ( 328) - LEU394@G.H5.26 (353 ) (G.H5)
0.0) P0G395 in fragment 4 with residue index 1043
0.0) ARG380 in fragment 0 with residue index 339 (CGN: G.H5.12)
0.0) LEU394 in fragment 0 with residue index 353 (CGN: G.H5.26)
Your selection 'GLU*,P0G,380-394,G.HN.*' yields:
residue residx fragment resSeq GPCR CGN
GLU10 6 0 10 None G.HN.27
GLU15 11 0 15 None G.HN.32
GLU16 12 0 16 None G.HN.33
GLU21 17 0 21 None G.HN.38
GLU27 23 0 27 None G.HN.44
GLU50 46 0 50 None G.s1h1.4
GLU101 69 0 101 None H.HA.17
GLU104 72 0 104 None H.HA.20
GLU118 86 0 118 None H.hahb.5
GLU123 91 0 123 None H.HB.1
GLU145 113 0 145 None H.HC.2
GLU148 116 0 148 None H.HC.5
GLU155 123 0 155 None H.HC.12
GLU157 125 0 157 None H.HD.1
GLU164 132 0 164 None H.HD.8
GLU168 136 0 168 None H.HD.12
GLU209 175 0 209 None G.S2.3
GLU230 196 0 230 None G.H2.3
GLU268 227 0 268 None G.H3.4
GLU299 258 0 299 None G.HG.6
GLU309 268 0 309 None G.HG.16
GLU314 273 0 314 None G.hgh4.4
GLU322 281 0 322 None G.H4.2
GLU327 286 0 327 None G.H4.7
GLU330 289 0 330 None G.H4.10
GLU344 303 0 344 None G.H4.24
GLU370 329 0 370 None G.H5.2
GLU392 351 0 392 None G.H5.24
GLU3 356 1 3 None None
GLU10 363 1 10 None None
GLU12 365 1 12 None None
GLU130 483 1 130 None None
GLU138 491 1 138 None None
GLU172 525 1 172 None None
GLU215 568 1 215 None None
GLU226 579 1 226 None None
GLU260 613 1 260 None None
GLU17 709 2 17 None None
GLU22 714 2 22 None None
GLU42 734 2 42 None None
GLU47 739 2 47 None None
GLU58 750 2 58 None None
GLU63 755 2 63 None None
GLU30 760 3 30 1.29x29 None
GLU62 792 3 62 12.48x48 None
GLU107 837 3 107 3.26x26 None
GLU122 852 3 122 3.41x41 None
GLU180 907 3 180 None None
GLU188 915 3 188 None None
GLU225 952 3 225 5.64x64 None
GLU237 964 3 237 5.76x76 None
GLU268 970 3 268 6.30x30 None
GLU306 1008 3 306 7.33x32 None
GLU338 1040 3 338 8.56x56 None
P0G395 1043 4 395 None None
ARG380 339 0 380 None G.H5.12
ASP381 340 0 381 None G.H5.13
ILE382 341 0 382 None G.H5.14
ILE383 342 0 383 None G.H5.15
GLN384 343 0 384 None G.H5.16
ARG385 344 0 385 None G.H5.17
MET386 345 0 386 None G.H5.18
HIS387 346 0 387 None G.H5.19
LEU388 347 0 388 None G.H5.20
ARG389 348 0 389 None G.H5.21
GLN390 349 0 390 None G.H5.22
TYR391 350 0 391 None G.H5.23
LEU393 352 0 393 None G.H5.25
LEU394 353 0 394 None G.H5.26
THR9 5 0 9 None G.HN.26
ASP11 7 0 11 None G.HN.28
GLN12 8 0 12 None G.HN.29
ARG13 9 0 13 None G.HN.30
ASN14 10 0 14 None G.HN.31
LYS17 13 0 17 None G.HN.34
ALA18 14 0 18 None G.HN.35
GLN19 15 0 19 None G.HN.36
ARG20 16 0 20 None G.HN.37
ALA22 18 0 22 None G.HN.39
ASN23 19 0 23 None G.HN.40
LYS24 20 0 24 None G.HN.41
LYS25 21 0 25 None G.HN.42
ILE26 22 0 26 None G.HN.43
LYS28 24 0 28 None G.HN.45
GLN29 25 0 29 None G.HN.46
LEU30 26 0 30 None G.HN.47
GLN31 27 0 31 None G.HN.48
LYS32 28 0 32 None G.HN.49
ASP33 29 0 33 None G.HN.50
LYS34 30 0 34 None G.HN.51
GLN35 31 0 35 None G.HN.52
VAL36 32 0 36 None G.HN.53
PDB Queries¶
Now we grab a structure directly from the PDB, replicating the CLI command:
mdc_pdb.py 3SN6 -o 3SN6.gro
by using mdciao.cli.pdb
. Check here or use the inline docstring for more info. Please note that we’re not storing the retrived structure on disk, but rather having it in memory as an mdtraj.Trajectory
:
[10]:
xray3SN6= mdciao.cli.pdb("3SN6")
xray3SN6
Checking https://files.rcsb.org/download/3SN6.pdb ...Please cite the following 3rd party publication:
* Crystal structure of the beta2 adrenergic receptor-Gs protein complex
Rasmussen, S.G. et al., Nature 2011
https://doi.org/10.1038/nature10361
[10]:
<mdtraj.Trajectory with 1 frames, 10269 atoms, 1319 residues, and unitcells at 0x7f46c157b940>
Now we can use the awesome nglviewer to 3D-visualize the freshly grabbed structure inside the notebook.
We need to import the module first, which needs to be installed in your Python environment. If you don’t we recommend you install it via pip:
pip install nglview
jupyter-nbextension enable nglview --py --sys-prefix
If you don’t feel like installing now, you can continue use the notebook.
[11]:
try:
import nglview
iwd = nglview.show_mdtraj(xray3SN6)
except ImportError:
iwd = None
iwd
Fragmentation Heuristics¶
Now we go to fragmentation heuristics, replicating the CLI command:
mdc_fragments.py 3SN6.gro
but with the object xray3SN6
(the .gro
-file comes below) and by using the cli.fragments
method. Check here or the inline docstring for more info. Also note that cli.fragments
simply wraps around mdciao.fragments.get_fragments, and
you can use that method (or others in mdciao.fragments) directly.
[12]:
frags= mdciao.cli.fragment_overview(xray3SN6.top)
Auto-detected fragments with method 'chains'
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 443 AAs ASN1002 ( 747) - CYS341 (1189 ) (3) resSeq jumps
fragment 4 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (4)
fragment 5 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (5)
Auto-detected fragments with method 'resSeq'
fragment 0 with 51 AAs THR9 ( 0) - GLN59 (50 ) (0)
fragment 1 with 115 AAs LYS88 ( 51) - VAL202 (165 ) (1)
fragment 2 with 51 AAs SER205 ( 166) - MET255 (216 ) (2)
fragment 3 with 132 AAs THR263 ( 217) - LEU394 (348 ) (3)
fragment 4 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (4)
fragment 5 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (5)
fragment 6 with 159 AAs ASN1002 ( 747) - ALA1160 (905 ) (6)
fragment 7 with 146 AAs GLU30 ( 906) - ARG175 (1051 ) (7)
fragment 8 with 61 AAs GLN179 ( 1052) - ARG239 (1112 ) (8)
fragment 9 with 77 AAs CYS265 ( 1113) - CYS341 (1189 ) (9)
fragment 10 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (10)
fragment 11 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (11)
Auto-detected fragments with method 'resSeq+'
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 159 AAs ASN1002 ( 747) - ALA1160 (905 ) (3)
fragment 4 with 284 AAs GLU30 ( 906) - CYS341 (1189 ) (4) resSeq jumps
fragment 5 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (5)
fragment 6 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (6)
Auto-detected fragments with method 'lig_resSeq+'
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 159 AAs ASN1002 ( 747) - ALA1160 (905 ) (3)
fragment 4 with 284 AAs GLU30 ( 906) - CYS341 (1189 ) (4) resSeq jumps
fragment 5 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (5)
fragment 6 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (6)
Auto-detected fragments with method 'bonds'
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 443 AAs ASN1002 ( 747) - CYS341 (1189 ) (3) resSeq jumps
fragment 4 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (4)
fragment 5 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (5)
Auto-detected fragments with method 'resSeq_bonds'
fragment 0 with 51 AAs THR9 ( 0) - GLN59 (50 ) (0)
fragment 1 with 115 AAs LYS88 ( 51) - VAL202 (165 ) (1)
fragment 2 with 51 AAs SER205 ( 166) - MET255 (216 ) (2)
fragment 3 with 132 AAs THR263 ( 217) - LEU394 (348 ) (3)
fragment 4 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (4)
fragment 5 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (5)
fragment 6 with 159 AAs ASN1002 ( 747) - ALA1160 (905 ) (6)
fragment 7 with 146 AAs GLU30 ( 906) - ARG175 (1051 ) (7)
fragment 8 with 61 AAs GLN179 ( 1052) - ARG239 (1112 ) (8)
fragment 9 with 77 AAs CYS265 ( 1113) - CYS341 (1189 ) (9)
fragment 10 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (10)
fragment 11 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (11)
Auto-detected fragments with method 'None'
fragment 0 with 1319 AAs THR9 ( 0) - P0G1601 (1318 ) (0) resSeq jumps
This call iterates through all available heuristics on the mdtraj
Topology, arriving at different definitions of molecular fragments. They are all returned as a dictionary:
[13]:
frags.keys()
[13]:
dict_keys(['chains', 'resSeq', 'resSeq+', 'lig_resSeq+', 'bonds', 'resSeq_bonds', 'None'])
Please note that since xray3SN6
comes from the PDB directly, it contains chain descriptors, s.t. the method chains
(first one) can simply list the chain information encoded into the PDB, which you can check here:
Auto-detected fragments with method 'chains'
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 443 AAs ASN1002 ( 747) - CYS341 (1189 ) (3) resSeq jumps
fragment 4 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (4)
fragment 5 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (5) 5 with 1 AAs P0G1601 (1318) - P0G1601 (1318) (5)
These fragments are:
G-protein \(\alpha\) sub-unit
G-protein \(\beta\) sub-unit
G-protein \(\gamma\) sub-unit
\(\beta_2\) adrenergic receptor, together with the bacteriophage T4 lysozyme as N-terminus
VHH nanobody
Ligand P0G
However, we loose that chain information if we store the structure as .gro
, which doesn’t encode for chains (i.e., the entire topology is put into a single chain).
[14]:
from tempfile import NamedTemporaryFile
with NamedTemporaryFile(suffix=".gro") as tmpgro:
xray3SN6.save(tmpgro.name)
xray3SN6gro = md.load(tmpgro.name)
mdciao.cli.fragment_overview(xray3SN6gro.top, methods=["chains", "lig_resSeq+"]);
Auto-detected fragments with method 'chains'
fragment 0 with 1319 AAs THR9 ( 0) - P0G1601 (1318 ) (0) resSeq jumps
Auto-detected fragments with method 'lig_resSeq+'
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 159 AAs ASN1002 ( 747) - ALA1160 (905 ) (3)
fragment 4 with 284 AAs GLU30 ( 906) - CYS341 (1189 ) (4) resSeq jumps
fragment 5 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (5)
fragment 6 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (6)
The lig_resSeq+
(the current default) has attempted to recover some meaningful fragments, closely resembling the original chains:
G-protein \(\alpha\) sub-unit
G-protein \(\beta\) sub-unit
G-protein \(\gamma\) sub-unit
Bacteriophage T4 lysozyme as N-terminus
\(\beta_2\) adrenergic receptor
VHH nanobody
Ligand P0G
The former fragment 3 (4TL-\(\beta_2\)AR) chain has been broken up into T4L and \(\beta_2\)AR.
Interfaces¶
Now we move to a more elaborate command:
mdc_interface.py prot.pdb traj.xtc -fg1 0 -fg2 3 --GPCR adrb2_human --CGN 3SN6 -t "3SN6 beta2AR-Galpha interface" -ni
and replicate it using cli.interface
. Check the docs here or in the method’s docstring.
Additionally, we now have two other notebooks explicitly devoted to the representation of interfaces:
[15]:
mdciao.cli.interface(traj,
frag_idxs_group_1=[0],
frag_idxs_group_2=[3],
GPCR_uniprot=GPCR,
CGN_PDB=CGN,
title="3SN6 beta2AR-Galpha interface",
accept_guess=True,
plot_timedep=False,
no_disk=True)
Will compute contact frequencies for trajectories:
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment 0 with 354 AAs LEU4 ( 0) - LEU394 (353 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 354) - ASN340 (693 ) (1)
fragment 2 with 66 AAs ALA2 ( 694) - PHE67 (759 ) (2)
fragment 3 with 283 AAs GLU30 ( 760) - LEU340 (1042 ) (3) resSeq jumps
fragment 4 with 1 AAs P0G395 ( 1043) - P0G395 (1043 ) (4)
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).
These are the GPCR fragments mapped onto your topology:
TM1 with 32 AAs GLU30@1.29x29 ( 760) - PHE61@1.60x60 (791 ) (TM1)
ICL1 with 4 AAs GLU62@12.48x48 ( 792) - GLN65@12.51x51 (795 ) (ICL1)
TM2 with 32 AAs THR66@2.37x37 ( 796) - LYS97@2.68x67 (827 ) (TM2)
ECL1 with 4 AAs MET98@23.49x49 ( 828) - PHE101@23.52x52 (831 ) (ECL1)
TM3 with 36 AAs GLY102@3.21x21 ( 832) - SER137@3.56x56 (867 ) (TM3)
ICL2 with 8 AAs PRO138@34.50x50 ( 868) - LEU145@34.57x57 (875 ) (ICL2)
TM4 with 27 AAs THR146@4.38x38 ( 876) - HIS172@4.64x64 (902 ) (TM4)
ECL2 with 20 AAs TRP173 ( 903) - THR195 (922 ) (ECL2) resSeq jumps
TM5 with 42 AAs ASN196@5.35x36 ( 923) - GLU237@5.76x76 (964 ) (TM5)
ICL3 with 2 AAs GLY238 ( 965) - ARG239 (966 ) (ICL3)
TM6 with 35 AAs CYS265@6.27x27 ( 967) - GLN299@6.61x61 (1001 ) (TM6)
ECL3 with 4 AAs ASP300 ( 1002) - ILE303 (1005 ) (ECL3)
TM7 with 25 AAs ARG304@7.31x30 ( 1006) - ARG328@7.55x55 (1030 ) (TM7)
H8 with 12 AAs SER329@8.47x47 ( 1031) - LEU340@8.58x58 (1042 ) (H8)
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).
These are the CGN fragments mapped onto your topology:
G.HN with 28 AAs THR9@G.HN.26 ( 5) - VAL36@G.HN.53 (32 ) (G.HN)
G.hns1 with 3 AAs TYR37@G.hns1.1 ( 33) - ALA39@G.hns1.3 (35 ) (G.hns1)
G.S1 with 7 AAs THR40@G.S1.1 ( 36) - LEU46@G.S1.7 (42 ) (G.S1)
G.s1h1 with 6 AAs GLY47@G.s1h1.1 ( 43) - GLY52@G.s1h1.6 (48 ) (G.s1h1)
G.H1 with 7 AAs LYS53@G.H1.1 ( 49) - GLN59@G.H1.7 (55 ) (G.H1)
H.HA with 26 AAs LYS88@H.HA.4 ( 56) - LEU113@H.HA.29 (81 ) (H.HA)
H.hahb with 9 AAs VAL114@H.hahb.1 ( 82) - PRO122@H.hahb.9 (90 ) (H.hahb)
H.HB with 14 AAs GLU123@H.HB.1 ( 91) - ASN136@H.HB.14 (104 ) (H.HB)
H.hbhc with 7 AAs VAL137@H.hbhc.1 ( 105) - PRO143@H.hbhc.15 (111 ) (H.hbhc)
H.HC with 12 AAs PRO144@H.HC.1 ( 112) - GLU155@H.HC.12 (123 ) (H.HC)
H.hchd with 1 AAs ASP156@H.hchd.1 ( 124) - ASP156@H.hchd.1 (124 ) (H.hchd)
H.HD with 12 AAs GLU157@H.HD.1 ( 125) - GLU168@H.HD.12 (136 ) (H.HD)
H.hdhe with 5 AAs TYR169@H.hdhe.1 ( 137) - ASP173@H.hdhe.5 (141 ) (H.hdhe)
H.HE with 13 AAs CYS174@H.HE.1 ( 142) - LYS186@H.HE.13 (154 ) (H.HE)
H.hehf with 7 AAs GLN187@H.hehf.1 ( 155) - SER193@H.hehf.7 (161 ) (H.hehf)
H.HF with 6 AAs ASP194@H.HF.1 ( 162) - ARG199@H.HF.6 (167 ) (H.HF)
G.hfs2 with 5 AAs CYS200@G.hfs2.1 ( 168) - GLY206@G.hfs2.7 (172 ) (G.hfs2) resSeq jumps
G.S2 with 8 AAs ILE207@G.S2.1 ( 173) - VAL214@G.S2.8 (180 ) (G.S2)
G.s2s3 with 2 AAs ASP215@G.s2s3.1 ( 181) - LYS216@G.s2s3.2 (182 ) (G.s2s3)
G.S3 with 8 AAs VAL217@G.S3.1 ( 183) - VAL224@G.S3.8 (190 ) (G.S3)
G.s3h2 with 3 AAs GLY225@G.s3h2.1 ( 191) - GLN227@G.s3h2.3 (193 ) (G.s3h2)
G.H2 with 10 AAs ARG228@G.H2.1 ( 194) - CYS237@G.H2.10 (203 ) (G.H2)
G.h2s4 with 5 AAs PHE238@G.h2s4.1 ( 204) - THR242@G.h2s4.5 (208 ) (G.h2s4)
G.S4 with 7 AAs ALA243@G.S4.1 ( 209) - ALA249@G.S4.7 (215 ) (G.S4)
G.s4h3 with 8 AAs SER250@G.s4h3.1 ( 216) - ASN264@G.s4h3.15 (223 ) (G.s4h3) resSeq jumps
G.H3 with 18 AAs ARG265@G.H3.1 ( 224) - LEU282@G.H3.18 (241 ) (G.H3)
G.h3s5 with 3 AAs ARG283@G.h3s5.1 ( 242) - ILE285@G.h3s5.3 (244 ) (G.h3s5)
G.S5 with 7 AAs SER286@G.S5.1 ( 245) - ASN292@G.S5.7 (251 ) (G.S5)
G.s5hg with 1 AAs LYS293@G.s5hg.1 ( 252) - LYS293@G.s5hg.1 (252 ) (G.s5hg)
G.HG with 17 AAs GLN294@G.HG.1 ( 253) - ASP310@G.HG.17 (269 ) (G.HG)
G.hgh4 with 10 AAs TYR311@G.hgh4.1 ( 270) - THR320@G.hgh4.10 (279 ) (G.hgh4)
G.H4 with 27 AAs PRO321@G.H4.1 ( 280) - ARG347@G.H4.27 (306 ) (G.H4)
G.h4s6 with 11 AAs ILE348@G.h4s6.1 ( 307) - TYR358@G.h4s6.20 (317 ) (G.h4s6)
G.S6 with 5 AAs CYS359@G.S6.1 ( 318) - PHE363@G.S6.5 (322 ) (G.S6)
G.s6h5 with 5 AAs THR364@G.s6h5.1 ( 323) - ASP368@G.s6h5.5 (327 ) (G.s6h5)
G.H5 with 26 AAs THR369@G.H5.1 ( 328) - LEU394@G.H5.26 (353 ) (G.H5)
Computing distances in the interface between fragments
0
and
3
The interface is restricted to the residues within 35.0 Angstrom of each other in the reference topology.
Computing interface...done!
From 100182 potential group_1-group_2 distances, the interface was reduced to only 20307 potential contacts.
If this number is still too high (i.e. the computation is too slow) consider using a smaller interface cutoff
0%| | 0/1 [00:00<?, ?it/s]
Streaming over trajectory object nr. 0 ( 280 frames, 280 with stride 1) in chunks of 10000 frames. Now at chunk nr 0, frames so far 280
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.89s/it]
These 20 contacts capture 12.03 (~77%) of the total frequency 15.54 (over 20307 input contacts)
As orientation value, the first 31 ctcs already capture 90.0% of 15.54.
The 31-th contact has a frequency of 0.14
freq label residue idxs sum
0 1.00 D381@G.H5.13 - Q229@5.68x68 340 956 1.00
1 1.00 R385@G.H5.17 - Q229@5.68x68 344 956 2.00
2 1.00 D381@G.H5.13 - K232@5.71x71 340 959 3.00
3 0.98 Q384@G.H5.16 - I135@3.54x54 343 865 3.98
4 0.96 T350@G.h4s6.3 - R239@ICL3 309 966 4.93
5 0.85 E392@G.H5.24 - T274@6.36x36 351 976 5.79
6 0.68 Q384@G.H5.16 - Q229@5.68x68 343 956 6.47
7 0.64 H387@G.H5.19 - A134@3.53x53 346 864 7.10
8 0.57 Y391@G.H5.23 - T274@6.36x36 350 976 7.67
9 0.55 Y358@G.h4s6.20 - E237@5.76x76 317 964 8.23
10 0.49 R385@G.H5.17 - K232@5.71x71 344 959 8.71
11 0.47 L394@G.H5.26 - K270@6.32x32 353 972 9.19
12 0.47 L346@G.H4.26 - R239@ICL3 305 966 9.65
13 0.45 Y358@G.h4s6.20 - S236@5.75x75 317 963 10.11
14 0.39 Y360@G.S6.2 - S236@5.75x75 319 963 10.50
15 0.35 Q384@G.H5.16 - E225@5.64x64 343 952 10.85
16 0.34 H41@G.S1.2 - F139@34.51x51 37 869 11.19
17 0.31 Q384@G.H5.16 - T136@3.55x55 343 866 11.49
18 0.28 D139@H.hbhc.3 - K140@34.52x52 107 870 11.77
19 0.26 Y358@G.h4s6.20 - I233@5.72x72 317 960 12.03
label freq
0 Q384@G.H5.16 2.32
1 D381@G.H5.13 2.00
2 R385@G.H5.17 1.49
3 Y358@G.h4s6.20 1.26
4 T350@G.h4s6.3 0.96
5 E392@G.H5.24 0.85
6 H387@G.H5.19 0.64
7 Y391@G.H5.23 0.57
8 L394@G.H5.26 0.47
9 L346@G.H4.26 0.47
10 Y360@G.S6.2 0.39
11 H41@G.S1.2 0.34
12 D139@H.hbhc.3 0.28
label freq
0 Q229@5.68x68 2.68
1 K232@5.71x71 1.48
2 R239@ICL3 1.42
3 T274@6.36x36 1.42
4 I135@3.54x54 0.98
5 S236@5.75x75 0.85
6 A134@3.53x53 0.64
7 E237@5.76x76 0.55
8 K270@6.32x32 0.47
9 E225@5.64x64 0.35
10 F139@34.51x51 0.34
11 T136@3.55x55 0.31
12 K140@34.52x52 0.28
13 I233@5.72x72 0.26
Drawing this many dots (637 residues + 53 padding spaces) in a panel 10.0 inches wide/high
forces too small dotsizes and fontsizes. If crowding effects occur, either reduce the
number of residues or increase the panel size
[15]:
<mdciao.contacts.contacts.ContactGroup at 0x7f46a1095e50>
Sites¶
Now we use a different approach. Instead of letting mdciao
discover contacts automatically, we list them beforehand as site
dictionaries, and feed this dictionaries to directly to the method cli.sites
. The CLI command we’re replicating is:
mdc_sites.py prot.pdb traj.xtc --site tip.json -at -nf -sa #sa: short AA-names
However, in the API-spirit, we’re not even using a file on disk to define the site
, but create it on the fly as a Python dictionary:
[16]:
my_site = {
"name":"interface small",
"pairs": {"AAresSeq": [
"L394-K270",
"D381-Q229",
"Q384-Q229",
"R385-Q229",
"D381-K232",
"Q384-I135"
]}}
[17]:
sites = mdciao.cli.sites([my_site],
traj,
no_disk=True,
plot_atomtypes=True,
fragments=[None],
short_AA_names=True)
Will compute the sites
site dict with name 'interface small'
in the trajectories:
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
with a stride of 1 frames.
Using method 'None' these fragments were found
fragment 0 with 1044 AAs LEU4 ( 0) - P0G395 (1043 ) (0) resSeq jumps
These are the residues that could be found:
residue residx fragment resSeq GPCR CGN
ASP381 340 0 340 None None
GLN384 343 0 343 None None
ARG385 344 0 344 None None
LEU394 353 0 353 None None
ILE135 865 0 865 None None
GLN229 956 0 956 None None
LYS232 959 0 959 None None
LYS270 972 0 972 None None
Streaming over trajectory object nr. 0 ( 280 frames, 280 with stride 1) in chunks of 10000 frames. Now at chunk nr 0, frames so far 280
freq label residue idxs sum
0 0.47 L394@frag0 - K270@frag0 353 972 0.47
1 1.00 D381@frag0 - Q229@frag0 340 956 1.47
2 0.68 Q384@frag0 - Q229@frag0 343 956 2.15
3 1.00 R385@frag0 - Q229@frag0 344 956 3.15
4 1.00 D381@frag0 - K232@frag0 340 959 4.15
5 0.98 Q384@frag0 - I135@frag0 343 865 5.13
The return value sites
is a dictionary keyed with the site names (interface small
in this case) and valued with mdciao's
ContactGroup-objects.
[18]:
sites
[18]:
{'interface small': <mdciao.contacts.contacts.ContactGroup at 0x7f468fdc7f10>}
Contact Groups¶
The ContactGroup class is at the center of mdciao
and offers extensive of manipulation through it’s methods. A helpful analogy would be that, what the Trajectory is to mdtraj
, the
ContactGroup is to mdciao
. Both classes:
store a lot of organized information for further use
have attributes and methods that can be used standalone
can themselves be the input for other methods (of
mdtraj
andmdciao
, respectively).are rarely created from scratch, but rather generated by the module itself.
The best way to learn about the ContactGroup is to inspect it with the autocomplete feature if IPython and check the informative names of the attributes and methods.
If you’re in a hurry, mdciao
offers a quick way to generate a ContactGroup to play around with and investigate it’s methods and attributes:
[19]:
from mdciao import examples
CG = examples.ContactGroupL394()
CG
[19]:
<mdciao.contacts.contacts.ContactGroup at 0x7f469cc43610>
However, instead of using CG
now, we go back to object sites
that resulted from using cli.sites
above. The returned sites
-object is a dictionary keyed with site names (you can compute different sites simultaneously) and valued with ContactGroups. In our case (check above) we called it it interface small
[20]:
mysite = sites["interface small"]
Frequencies as Bars¶
We use the class’s method plot_freqs_as_bars to produce the now familiar neighborhood plots:
[21]:
mysite.plot_freqs_as_bars(3.5,
shorten_AAs=True,
defrag="@",
atom_types=True);
Frequencies as Distributions¶
It is also very useful to inspect the residue-residue distances of any ContactGroup by looking at their overall distributions instead of their frequencies, since the hard cutoffs can sometimes hide part of the story:
[22]:
jax = mysite.plot_distance_distributions(bins=15,
defrag="@",
ctc_cutoff_Ang=3.5
)
Please note that, because the example dataset is quite small (280 frames and 2.8 ns) and we are simply histogramming (=counting), the curves aren’t very smooth. Histograms of real data will look better.
Frequencies as Violins¶
Other ways of looking at distance-data as distributions is to use violin plots, which uses a density estimator (check the bw_method
-parameter here) to generate smooth densities and plot them vertically. This is somehow in-between the histogram plot and the
frequency-bar plot:
[23]:
jax = mysite.plot_violins(defrag="@",
ctc_cutoff_Ang=3.5,
color="tab10",
)
Note
In principle, we could also use a density estimator in plot_distance_distributions
to make them look smooth, but we have decided to leave them as.
Comparisons Between Contact Groups¶
Finally, we replicate the CLI comand
mdc_compare.py 3SN6.X.ARG131@4.0_Ang.dat 3SN6.MD.ARG131@4.0_Ang.dat -k Xray,MD -t "3SN6 cutoff 4AA" -a R131
in API mode. This looks different because most of the inputs will now be Python objects in memory.
First, we create the Xray and the MD ContactGroups separately:
[24]:
R131_Xray = mdciao.cli.residue_neighborhoods("R131", xray3SN6,
ctc_cutoff_Ang=4,
no_disk=True,
GPCR_uniprot=GPCR,
figures=False,
CGN_PDB=CGN,
accept_guess=True)["neighborhoods"];
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 1 frames, 10269 atoms, 1319 residues, and unitcells>
with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment 0 with 349 AAs THR9 ( 0) - LEU394 (348 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 349) - ASN340 (688 ) (1)
fragment 2 with 58 AAs ASN5 ( 689) - ARG62 (746 ) (2)
fragment 3 with 159 AAs ASN1002 ( 747) - ALA1160 (905 ) (3)
fragment 4 with 284 AAs GLU30 ( 906) - CYS341 (1189 ) (4) resSeq jumps
fragment 5 with 128 AAs GLN1 ( 1190) - SER128 (1317 ) (5)
fragment 6 with 1 AAs P0G1601 ( 1318) - P0G1601 (1318 ) (6)
GPCR-labels align best with fragments: [4] (first-last: GLU30-CYS341).
These are the GPCR fragments mapped onto your topology:
TM1 with 32 AAs GLU30@1.29x29 ( 906) - PHE61@1.60x60 (937 ) (TM1)
ICL1 with 4 AAs GLU62@12.48x48 ( 938) - GLN65@12.51x51 (941 ) (ICL1)
TM2 with 32 AAs THR66@2.37x37 ( 942) - LYS97@2.68x67 (973 ) (TM2)
ECL1 with 4 AAs THR98@23.49x49 ( 974) - PHE101@23.52x52 (977 ) (ECL1)
TM3 with 36 AAs GLY102@3.21x21 ( 978) - SER137@3.56x56 (1013 ) (TM3)
ICL2 with 8 AAs PRO138@34.50x50 ( 1014) - LEU145@34.57x57 (1021 ) (ICL2)
TM4 with 27 AAs THR146@4.38x38 ( 1022) - HIS172@4.64x64 (1048 ) (TM4)
ECL2 with 20 AAs TRP173 ( 1049) - THR195 (1068 ) (ECL2) resSeq jumps
TM5 with 42 AAs ASN196@5.35x36 ( 1069) - GLU237@5.76x76 (1110 ) (TM5)
ICL3 with 2 AAs GLY238 ( 1111) - ARG239 (1112 ) (ICL3)
TM6 with 35 AAs CYS265@6.27x27 ( 1113) - GLN299@6.61x61 (1147 ) (TM6)
ECL3 with 4 AAs ASP300 ( 1148) - ILE303 (1151 ) (ECL3)
TM7 with 25 AAs ARG304@7.31x30 ( 1152) - ARG328@7.55x55 (1176 ) (TM7)
H8 with 13 AAs SER329@8.47x47 ( 1177) - CYS341@8.59x59 (1189 ) (H8)
CGN-labels align best with fragments: [0] (first-last: THR9-LEU394).
These are the CGN fragments mapped onto your topology:
G.HN with 28 AAs THR9@G.HN.26 ( 0) - VAL36@G.HN.53 (27 ) (G.HN)
G.hns1 with 3 AAs TYR37@G.hns1.1 ( 28) - ALA39@G.hns1.3 (30 ) (G.hns1)
G.S1 with 7 AAs THR40@G.S1.1 ( 31) - LEU46@G.S1.7 (37 ) (G.S1)
G.s1h1 with 6 AAs GLY47@G.s1h1.1 ( 38) - GLY52@G.s1h1.6 (43 ) (G.s1h1)
G.H1 with 7 AAs LYS53@G.H1.1 ( 44) - GLN59@G.H1.7 (50 ) (G.H1)
H.HA with 26 AAs LYS88@H.HA.4 ( 51) - LEU113@H.HA.29 (76 ) (H.HA)
H.hahb with 9 AAs VAL114@H.hahb.1 ( 77) - PRO122@H.hahb.9 (85 ) (H.hahb)
H.HB with 14 AAs GLU123@H.HB.1 ( 86) - ASN136@H.HB.14 (99 ) (H.HB)
H.hbhc with 7 AAs VAL137@H.hbhc.1 ( 100) - PRO143@H.hbhc.15 (106 ) (H.hbhc)
H.HC with 12 AAs PRO144@H.HC.1 ( 107) - GLU155@H.HC.12 (118 ) (H.HC)
H.hchd with 1 AAs ASP156@H.hchd.1 ( 119) - ASP156@H.hchd.1 (119 ) (H.hchd)
H.HD with 12 AAs GLU157@H.HD.1 ( 120) - GLU168@H.HD.12 (131 ) (H.HD)
H.hdhe with 5 AAs TYR169@H.hdhe.1 ( 132) - ASP173@H.hdhe.5 (136 ) (H.hdhe)
H.HE with 13 AAs CYS174@H.HE.1 ( 137) - LYS186@H.HE.13 (149 ) (H.HE)
H.hehf with 7 AAs GLN187@H.hehf.1 ( 150) - SER193@H.hehf.7 (156 ) (H.hehf)
H.HF with 6 AAs ASP194@H.HF.1 ( 157) - ARG199@H.HF.6 (162 ) (H.HF)
G.hfs2 with 5 AAs CYS200@G.hfs2.1 ( 163) - GLY206@G.hfs2.7 (167 ) (G.hfs2) resSeq jumps
G.S2 with 8 AAs ILE207@G.S2.1 ( 168) - VAL214@G.S2.8 (175 ) (G.S2)
G.s2s3 with 2 AAs ASP215@G.s2s3.1 ( 176) - LYS216@G.s2s3.2 (177 ) (G.s2s3)
G.S3 with 8 AAs VAL217@G.S3.1 ( 178) - VAL224@G.S3.8 (185 ) (G.S3)
G.s3h2 with 3 AAs GLY225@G.s3h2.1 ( 186) - GLN227@G.s3h2.3 (188 ) (G.s3h2)
G.H2 with 10 AAs ARG228@G.H2.1 ( 189) - CYS237@G.H2.10 (198 ) (G.H2)
G.h2s4 with 5 AAs PHE238@G.h2s4.1 ( 199) - THR242@G.h2s4.5 (203 ) (G.h2s4)
G.S4 with 7 AAs ALA243@G.S4.1 ( 204) - ALA249@G.S4.7 (210 ) (G.S4)
G.s4h3 with 8 AAs SER250@G.s4h3.1 ( 211) - ASN264@G.s4h3.15 (218 ) (G.s4h3) resSeq jumps
G.H3 with 18 AAs ARG265@G.H3.1 ( 219) - LEU282@G.H3.18 (236 ) (G.H3)
G.h3s5 with 3 AAs ARG283@G.h3s5.1 ( 237) - ILE285@G.h3s5.3 (239 ) (G.h3s5)
G.S5 with 7 AAs SER286@G.S5.1 ( 240) - ASN292@G.S5.7 (246 ) (G.S5)
G.s5hg with 1 AAs LYS293@G.s5hg.1 ( 247) - LYS293@G.s5hg.1 (247 ) (G.s5hg)
G.HG with 17 AAs GLN294@G.HG.1 ( 248) - ASP310@G.HG.17 (264 ) (G.HG)
G.hgh4 with 10 AAs TYR311@G.hgh4.1 ( 265) - THR320@G.hgh4.10 (274 ) (G.hgh4)
G.H4 with 27 AAs PRO321@G.H4.1 ( 275) - ARG347@G.H4.27 (301 ) (G.H4)
G.h4s6 with 11 AAs ILE348@G.h4s6.1 ( 302) - TYR358@G.h4s6.20 (312 ) (G.h4s6)
G.S6 with 5 AAs CYS359@G.S6.1 ( 313) - PHE363@G.S6.5 (317 ) (G.S6)
G.s6h5 with 5 AAs THR364@G.s6h5.1 ( 318) - ASP368@G.s6h5.5 (322 ) (G.s6h5)
G.H5 with 26 AAs THR369@G.H5.1 ( 323) - LEU394@G.H5.26 (348 ) (G.H5)
Will compute neighborhoods for the residues
R131
excluding 4 nearest neighbors
residue residx fragment resSeq GPCR CGN
ARG131 1007 4 131 3.50x50 None
Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 10269 atoms, 1319 residues, and unitcells>':...done!
From 1310 potential distances, the neighborhoods have been reduced to only 120 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr. 0 ( 1 frames, 1 with stride 1) in chunks of 10000 frames. Now at chunk nr 0, frames so far 1
#idx freq contact fragments res_idxs ctc_idx Sum
1: 1.00 ARG131-TYR391 4-0 1007-345 14 1.00
2: 1.00 ARG131-TYR326 4-4 1007-1174 111 2.00
3: 1.00 ARG131-ILE278 4-4 1007-1126 97 3.00
These 4 contacts capture 3.00 (~100%) of the total frequency 3.00 (over 120 input contacts)
As orientation value, the first 3 ctcs already capture 90.0% of 3.00.
The 3-th contact has a frequency of 1.00
[25]:
R131_MD = mdciao.cli.residue_neighborhoods("R131",traj,
ctc_cutoff_Ang=4,
no_disk=True,
GPCR_uniprot=GPCR,
figures=False,
CGN_PDB=CGN,
accept_guess=True)["neighborhoods"];
Will compute contact frequencies for (1 items):
<mdtraj.Trajectory with 280 frames, 8384 atoms, 1044 residues, and unitcells>
with a stride of 1 frames
Using method 'lig_resSeq+' these fragments were found
fragment 0 with 354 AAs LEU4 ( 0) - LEU394 (353 ) (0) resSeq jumps
fragment 1 with 340 AAs GLN1 ( 354) - ASN340 (693 ) (1)
fragment 2 with 66 AAs ALA2 ( 694) - PHE67 (759 ) (2)
fragment 3 with 283 AAs GLU30 ( 760) - LEU340 (1042 ) (3) resSeq jumps
fragment 4 with 1 AAs P0G395 ( 1043) - P0G395 (1043 ) (4)
GPCR-labels align best with fragments: [3] (first-last: GLU30-LEU340).
These are the GPCR fragments mapped onto your topology:
TM1 with 32 AAs GLU30@1.29x29 ( 760) - PHE61@1.60x60 (791 ) (TM1)
ICL1 with 4 AAs GLU62@12.48x48 ( 792) - GLN65@12.51x51 (795 ) (ICL1)
TM2 with 32 AAs THR66@2.37x37 ( 796) - LYS97@2.68x67 (827 ) (TM2)
ECL1 with 4 AAs MET98@23.49x49 ( 828) - PHE101@23.52x52 (831 ) (ECL1)
TM3 with 36 AAs GLY102@3.21x21 ( 832) - SER137@3.56x56 (867 ) (TM3)
ICL2 with 8 AAs PRO138@34.50x50 ( 868) - LEU145@34.57x57 (875 ) (ICL2)
TM4 with 27 AAs THR146@4.38x38 ( 876) - HIS172@4.64x64 (902 ) (TM4)
ECL2 with 20 AAs TRP173 ( 903) - THR195 (922 ) (ECL2) resSeq jumps
TM5 with 42 AAs ASN196@5.35x36 ( 923) - GLU237@5.76x76 (964 ) (TM5)
ICL3 with 2 AAs GLY238 ( 965) - ARG239 (966 ) (ICL3)
TM6 with 35 AAs CYS265@6.27x27 ( 967) - GLN299@6.61x61 (1001 ) (TM6)
ECL3 with 4 AAs ASP300 ( 1002) - ILE303 (1005 ) (ECL3)
TM7 with 25 AAs ARG304@7.31x30 ( 1006) - ARG328@7.55x55 (1030 ) (TM7)
H8 with 12 AAs SER329@8.47x47 ( 1031) - LEU340@8.58x58 (1042 ) (H8)
CGN-labels align best with fragments: [0] (first-last: LEU4-LEU394).
These are the CGN fragments mapped onto your topology:
G.HN with 28 AAs THR9@G.HN.26 ( 5) - VAL36@G.HN.53 (32 ) (G.HN)
G.hns1 with 3 AAs TYR37@G.hns1.1 ( 33) - ALA39@G.hns1.3 (35 ) (G.hns1)
G.S1 with 7 AAs THR40@G.S1.1 ( 36) - LEU46@G.S1.7 (42 ) (G.S1)
G.s1h1 with 6 AAs GLY47@G.s1h1.1 ( 43) - GLY52@G.s1h1.6 (48 ) (G.s1h1)
G.H1 with 7 AAs LYS53@G.H1.1 ( 49) - GLN59@G.H1.7 (55 ) (G.H1)
H.HA with 26 AAs LYS88@H.HA.4 ( 56) - LEU113@H.HA.29 (81 ) (H.HA)
H.hahb with 9 AAs VAL114@H.hahb.1 ( 82) - PRO122@H.hahb.9 (90 ) (H.hahb)
H.HB with 14 AAs GLU123@H.HB.1 ( 91) - ASN136@H.HB.14 (104 ) (H.HB)
H.hbhc with 7 AAs VAL137@H.hbhc.1 ( 105) - PRO143@H.hbhc.15 (111 ) (H.hbhc)
H.HC with 12 AAs PRO144@H.HC.1 ( 112) - GLU155@H.HC.12 (123 ) (H.HC)
H.hchd with 1 AAs ASP156@H.hchd.1 ( 124) - ASP156@H.hchd.1 (124 ) (H.hchd)
H.HD with 12 AAs GLU157@H.HD.1 ( 125) - GLU168@H.HD.12 (136 ) (H.HD)
H.hdhe with 5 AAs TYR169@H.hdhe.1 ( 137) - ASP173@H.hdhe.5 (141 ) (H.hdhe)
H.HE with 13 AAs CYS174@H.HE.1 ( 142) - LYS186@H.HE.13 (154 ) (H.HE)
H.hehf with 7 AAs GLN187@H.hehf.1 ( 155) - SER193@H.hehf.7 (161 ) (H.hehf)
H.HF with 6 AAs ASP194@H.HF.1 ( 162) - ARG199@H.HF.6 (167 ) (H.HF)
G.hfs2 with 5 AAs CYS200@G.hfs2.1 ( 168) - GLY206@G.hfs2.7 (172 ) (G.hfs2) resSeq jumps
G.S2 with 8 AAs ILE207@G.S2.1 ( 173) - VAL214@G.S2.8 (180 ) (G.S2)
G.s2s3 with 2 AAs ASP215@G.s2s3.1 ( 181) - LYS216@G.s2s3.2 (182 ) (G.s2s3)
G.S3 with 8 AAs VAL217@G.S3.1 ( 183) - VAL224@G.S3.8 (190 ) (G.S3)
G.s3h2 with 3 AAs GLY225@G.s3h2.1 ( 191) - GLN227@G.s3h2.3 (193 ) (G.s3h2)
G.H2 with 10 AAs ARG228@G.H2.1 ( 194) - CYS237@G.H2.10 (203 ) (G.H2)
G.h2s4 with 5 AAs PHE238@G.h2s4.1 ( 204) - THR242@G.h2s4.5 (208 ) (G.h2s4)
G.S4 with 7 AAs ALA243@G.S4.1 ( 209) - ALA249@G.S4.7 (215 ) (G.S4)
G.s4h3 with 8 AAs SER250@G.s4h3.1 ( 216) - ASN264@G.s4h3.15 (223 ) (G.s4h3) resSeq jumps
G.H3 with 18 AAs ARG265@G.H3.1 ( 224) - LEU282@G.H3.18 (241 ) (G.H3)
G.h3s5 with 3 AAs ARG283@G.h3s5.1 ( 242) - ILE285@G.h3s5.3 (244 ) (G.h3s5)
G.S5 with 7 AAs SER286@G.S5.1 ( 245) - ASN292@G.S5.7 (251 ) (G.S5)
G.s5hg with 1 AAs LYS293@G.s5hg.1 ( 252) - LYS293@G.s5hg.1 (252 ) (G.s5hg)
G.HG with 17 AAs GLN294@G.HG.1 ( 253) - ASP310@G.HG.17 (269 ) (G.HG)
G.hgh4 with 10 AAs TYR311@G.hgh4.1 ( 270) - THR320@G.hgh4.10 (279 ) (G.hgh4)
G.H4 with 27 AAs PRO321@G.H4.1 ( 280) - ARG347@G.H4.27 (306 ) (G.H4)
G.h4s6 with 11 AAs ILE348@G.h4s6.1 ( 307) - TYR358@G.h4s6.20 (317 ) (G.h4s6)
G.S6 with 5 AAs CYS359@G.S6.1 ( 318) - PHE363@G.S6.5 (322 ) (G.S6)
G.s6h5 with 5 AAs THR364@G.s6h5.1 ( 323) - ASP368@G.s6h5.5 (327 ) (G.s6h5)
G.H5 with 26 AAs THR369@G.H5.1 ( 328) - LEU394@G.H5.26 (353 ) (G.H5)
Will compute neighborhoods for the residues
R131
excluding 4 nearest neighbors
residue residx fragment resSeq GPCR CGN
ARG131 861 3 131 3.50x50 None
Pre-computing likely neighborhoods by reducing the neighbor-list
to those within 15 Angstrom in the first frame of reference geom
'<mdtraj.Trajectory with 1 frames, 8384 atoms, 1044 residues, and unitcells>':...done!
From 1035 potential distances, the neighborhoods have been reduced to only 104 potential contacts.
If this number is still too high (i.e. the computation is too slow), consider using a smaller nlist_cutoff_Ang
Streaming over trajectory object nr. 0 ( 280 frames, 280 with stride 1) in chunks of 10000 frames. Now at chunk nr 0, frames so far 280
#idx freq contact fragments res_idxs ctc_idx Sum
1: 0.88 ARG131-TYR391 3-0 861-350 11 0.88
2: 0.69 ARG131-TYR326 3-3 861-1028 95 1.56
3: 0.44 ARG131-TYR219 3-3 861-946 65 2.00
4: 0.12 ARG131-ILE278 3-3 861-980 84 2.12
These 5 contacts capture 2.12 (~100%) of the total frequency 2.12 (over 104 input contacts)
As orientation value, the first 3 ctcs already capture 90.0% of 2.12.
The 3-th contact has a frequency of 0.44
Please note that, because the molecular topologies differ, the residue R131
is has different indices in each topology, namely 1007 in the X-ray crystal, 861 in the MD simulation:
[26]:
R131_Xray, R131_MD
[26]:
({1007: <mdciao.contacts.contacts.ContactGroup at 0x7f468fa05fd0>},
{861: <mdciao.contacts.contacts.ContactGroup at 0x7f469d472910>})
That will frequently be the case when comparing different proteins of the same family, or topologies where not all sub-units have been modelled etc or antying that produces a shift in these indices.
mdciao
understands R131
automatically and doesn’t ask more questions, as long as there’s an obvious R131
candidate. Otherwise the user will be prompted for disambiguation.
In this case, now we create a dictionary of ContactGroups that represent the R131 in both topologies:
[27]:
R131 = {
"Xray": R131_Xray[1007],
"MD" : R131_MD[861]
}
#np.save("R131.npz",R131)
Now we can just pass this dictionary to cli.compare
to see their contact frequencies. This module is pretty flexible on inputs and outputs, check the documentation to learn more:
[28]:
mdciao.cli.compare(R131, ctc_cutoff_Ang=4, defrag=None, anchor="R131");
These interactions are not shared:
Y219@5.58x58
Their cumulative ctc freq is 0.44.
Created files
freq_comparison.pdf
freq_comparison.xlsx