Covid-19 Spike Protein Example 1: Residue Neighborhoods

In this notebook, we use mdciao to analyze publicly available MD data of the Covid-19 Spike Protein, curated in the impressive **COVID-19 Molecular Structure and Therapeutics Hub** put together by the Molecular Sciences Software Institute (molSSI).

In particular, we use the data generated in the Chodera-Lab by Ivy Zhang, consisting of Folding@home simulations of the SARS-CoV-2 spike RBD bound to human ACE2 (725.3 µs ). We quote:

All-atom MD simulations of the SARS-CoV-2 spike protein receptor binding domain (RBD) bound to human angiotensin converting enzyme-related carboypeptidase (ACE2), simulated using Folding@Home. The “wild-type” RBD and three mutants (N439K, K417V, and the double mutant N439K/K417V) were simulated.
RUNs denote different RBD mutants: N439K (RUN0), K417V (RUN1), N439K/K417V (RUN2), and WT (RUN3). CLONEs denote different independent replica trajectories

Goals

We are going to use mdciao to compare the effect of the above mutations on the mutated residues’s vicinities or neighborhoods. This is rather simple:

  1. Compute the residue neighborhoods for the mutated positions N439 and K417 in all four datasets, using mdciao.cli.residue_neighborhoods

  2. Compare these neighborhoods using mdciao.cli.compare for changes in the residue-residue contacts.

Residues of Interest and Molecular Topology

Since we’re not really familiar with the dataset, we can do several things to ease us into the data. Most helpful is to visualize one sample file in 3D, using the awesome nglviewer and mdtraj. We can visually inspect the setup, its components etc:

Note

A great deal of mdciao’s handling of molecular information and I/O is done using mdtraj.

[3]:
import nglview, mdtraj as md, os
datadir = "/media/perezheg/6TB_HD/FAH_aws/"
iwd = nglview.show_mdtraj(md.load(os.path.join(datadir, "run3-clone0.h5"))[0])
iwd
/home/perezheg/miniconda3/lib/python3.8/site-packages/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
  warnings.warn("top= kwargs ignored since this file parser does not support it")

Next, we’ll use mdciao’s mdciao.cli.residue_selection to locate the mutated residues in the molecular topology. Since mdciao.cli.residue_selection internally calls mdciao.fragments.get_fragments, we also get to see how the topology is structured into chains

Note

chains isn’t really a heuristic, since the .h5 files contain chain-tags for the residues, but that will very often not be the case.

[4]:
import mdciao
import numpy as np
from glob import glob
import os
[5]:
for ff in sorted(glob(os.path.join(datadir, "run?-clone0*h5"))):
    print("Chains for files of type:",os.path.basename(ff))
    mdciao.cli.residue_selection([417,439],ff,fragments=["chains"]);
    #frags = mdciao.fragments.get_fragments(ff, method="chains");
    print()
Chains for files of type: run0-clone0.h5
Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    365 AAs   Na+976 ( 975) -  Na+1340 (1339) (5)
0.0)       LYS417 in fragment 0 with residue index 85
2.0)       HIS417 in fragment 2 with residue index 605

0.0)       LYS439 in fragment 0 with residue index 107
2.0)       LEU439 in fragment 2 with residue index 627

Your selection '[417, 439]' yields:
   residue      residx    fragment      resSeq
    LYS417          85           0         417
    HIS417         605           2         417
    LYS439         107           0         439
    LEU439         627           2         439

Chains for files of type: run1-clone0.h5
Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    367 AAs   Na+976 ( 975) -  Na+1342 (1341) (5)
0.0)       VAL417 in fragment 0 with residue index 85
2.0)       HIS417 in fragment 2 with residue index 605

0.0)       ASN439 in fragment 0 with residue index 107
2.0)       LEU439 in fragment 2 with residue index 627

Your selection '[417, 439]' yields:
   residue      residx    fragment      resSeq
    VAL417          85           0         417
    HIS417         605           2         417
    ASN439         107           0         439
    LEU439         627           2         439

Chains for files of type: run2-clone0.h5
Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    366 AAs   Na+976 ( 975) -  Na+1341 (1340) (5)
0.0)       VAL417 in fragment 0 with residue index 85
2.0)       HIS417 in fragment 2 with residue index 605

0.0)       LYS439 in fragment 0 with residue index 107
2.0)       LEU439 in fragment 2 with residue index 627

Your selection '[417, 439]' yields:
   residue      residx    fragment      resSeq
    VAL417          85           0         417
    HIS417         605           2         417
    LYS439         107           0         439
    LEU439         627           2         439

Chains for files of type: run3-clone0.h5
Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    366 AAs   Na+976 ( 975) -  Na+1341 (1340) (5)
0.0)       LYS417 in fragment 0 with residue index 85
2.0)       HIS417 in fragment 2 with residue index 605

0.0)       ASN439 in fragment 0 with residue index 107
2.0)       LEU439 in fragment 2 with residue index 627

Your selection '[417, 439]' yields:
   residue      residx    fragment      resSeq
    LYS417          85           0         417
    HIS417         605           2         417
    ASN439         107           0         439
    LEU439         627           2         439

Some observations:

  1. All setups have the same overall topology, with the same six fragments/chains:

  • RBD (SARS-CoV-2 spike protein receptor binding domain)

  • Glycosylations@RBD

  • ACE (human angiotensin converting enzyme-related carboypeptidase)

  • Glysicalations@RBD

  • ions (probably crystallographic ions CL and ZN)

  • ions (NaCl)

  1. We clearly identify that the residues with resSeq 417 and 439 in the Spike-RBD have the serial indices 85 and 107, respectively in all setups. It’s the part of the outputs that look like this:

    ``` 0.0) LYS417 in fragment 0 with residue index 85 2.0) HIS417 in fragment 2 with residue index 605 0.0) ASN439 in fragment 0 with residue index 107 2.0) LEU439 in fragment 2 with residue index 627 Your selection ‘[417, 439]’ yields: residue residx fragment resSeq BW CGN LYS417 85 0 417 None None HIS417 605 2 417 None None ASN439 107 0 439 None None LEU439 627 2 439 None None

  2. Depending on the RUN index (0 to 3), these positions have different aminoacids (AAs) because of point mutations:

  • run0 : N439K

  • run1 : K417V

  • run2 : N439K/K417V

  • run3 : WT

We can also use the RUN1,RUN2,... filename-schemes to create a dictionary assigning filenaming-schemes to a setup, as we do below with the filename2system-dictionary:

[6]:
filename2system = {
    "run0" : "N439K",
    "run1" : "K417V",
    "run2" : "N439K/K417V",
    "run3" : "WT"
}

Viewing Local Changes through Residue Neighborhoods

We now compute the residue neighborhoods around the mutated sites using mdciao.cli.residue_neighborhoods.

Since the point-mutations didn’t alter the serial indices 85 and 107 for resSeqs 417 and 439, respectively, we can use them to address these positions in all setups.

Computing the Neighborhoods

[7]:
if False:
    neighborhoods = {}
    for fn, syskey in filename2system.items():
        ineigh = mdciao.cli.residue_neighborhoods([85,107],
                                                  os.path.join(datadir,fn+"*.h5"),
                                                  res_idxs=True,
                                                  n_jobs=7,
                                                  fragment_names=["RBD","GLY_RBD","ACE","GLY_ACE","ions","ions"],
                                                  fragments="chains",
                                                  ctc_control=1.,
                                                  no_disk=True,
                                                  plot_timedep=False)
        neighborhoods[syskey]=ineigh
else:
    pos417 = np.load("pos417.npy",allow_pickle=True)[()]
    pos439 = np.load("pos439.npy",allow_pickle=True)[()]

After we have computed the per-residue neighborhoods, the next cell is just some reordering of the data into more practical, per-residue dictionaries:

[8]:
if "neighborhoods" in locals():
    neighborhoods = {key : neighborhoods[key] for key in ['WT', 'K417V', 'N439K','N439K/K417V']}
    pos417 = {key:val[85] for key, val in neighborhoods.items()}
    pos439 = {key:val[107] for key, val in neighborhoods.items()}

Before we continue, we can save the neighborhoods for later use, s.t. they don’t have to be computed every time:

[9]:
np.save("pos417.npy",pos417)
np.save("pos439.npy",pos439)

Plotting the Contact-Frequencies

Next, we use mdciao.cli.compare plot the per-residue contact-frequencies of all four systems next to one another. The function call can be minimal or very customized, here we present some parameter values that yield the most informative plots:

[10]:
from matplotlib import pyplot as plt
myfig, myax = plt.subplots(1,2, sharey=True, figsize=(20,6))
mdciao.cli.compare(pos417, ctc_cutoff_Ang=4,
                   mutations_dict={"V417": "K417",
                                   "K439": "N439"
                                  },
                   anchor="K417",
                   defrag=None,
                   width=.15,
                   lower_cutoff_val=.2,
                   fontsize=12,
                   legend_rows=2,
                   colors="Set2",
                   ax=myax[0],
                   #distro=True,
                   #n_cols=3,
                  );

mdciao.cli.compare(pos439, ctc_cutoff_Ang=4,
                   mutations_dict={"V417": "K417",
                                   "K439": "N439"
                                  },
                   anchor="N439",
                   width=.15,
                   lower_cutoff_val=.2,
                   defrag=None,
                   fontsize=12,
                   legend_rows=2,
                   colors="Set2",
                   ax=myax[1],
                  );
These interactions are not shared:
0YB754@GLY_ACE, 0fA755@GLY_ACE, 4YB748@GLY_ACE, E23@ACE, F456@RBD, K26@ACE, N460@RBD, Na+1240@ions, Na+1338@ions, Q493@RBD, R403@RBD, R408@RBD, VMB749@GLY_ACE
Their cumulative ctc freq is 0.66.
/home/perezheg/Programs/mdciao/mdciao/cli/cli.py:2078: UserWarning: The figure layout has changed to tight
  myfig.tight_layout()
Created files
freq_comparison.pdf
freq_comparison.xlsx
These interactions are not shared:
0YB768@GLY_ACE, 0YB770@GLY_ACE, 0fA771@GLY_ACE, 2MA732@GLY_RBD, 2MA767@GLY_ACE, 2MA769@GLY_ACE, 4YB733@GLY_RBD, E329@ACE, F497@RBD, G326@ACE, R509@RBD, T324@ACE, T500@RBD, V503@RBD, VMB766@GLY_ACE
Their cumulative ctc freq is 0.02.
/home/perezheg/Programs/mdciao/mdciao/plots/plots.py:1002: UserWarning: The figure layout has changed to tight
  _plt.gcf().tight_layout()
/home/perezheg/Programs/mdciao/mdciao/cli/cli.py:2078: UserWarning: The figure layout has changed to tight
  myfig.tight_layout()
Created files
freq_comparison.pdf
freq_comparison.xlsx
../_images/notebooks_Covid-19-Spike-Protein-Example_20_5.png

Some observations:

Distributions and Violin Plots

Frequencies with a hard cutoff (4 Angstrom, in this case) are reductive, and don’t tell the whole story, but allow for a compact overview a the large dataset (8K trajectories, 4 setups) in one plot per mutated residue, and already give an impression of the local changes around the mutated sites.

To get a cut-off independent comparison, we can use several mdciao-methods:

  1. Use again mdciao.cli.compare, but uncommenting the distro=True parameter: For each residue neighborhood, this will produce a multi-panel figure (you can control its column-to-row form with n_cols) where each panel contains the distribution of one residue-residue distance for all systems: 'WT', 'K417V', 'N439K','N439K/K417V'. The distributions are much more telling of the behaviour of the residue-residue distance, informing about it’s about it’s modes and spread.

Note

You also have to comment-out the lower_cutoff_val=.2 parameter.

  1. Compact all the distributions into one single-panel, using violin-plots. The violins are simply the same distributions as above, only mirored (to get the nice, symmetric violins) and oriented vertically along the y-axis

Let’s try both options for the K417-neighborhood:

[11]:
mdciao.cli.compare(pos417, ctc_cutoff_Ang=4,
                   mutations_dict={"V417": "K417",
                                   "K439": "N439"
                                  },
                   anchor="K417",
                   defrag=None,
                   width=.15,
                   lower_cutoff_val=.2,
                   fontsize=12,
                   legend_rows=2,
                   colors="Set2",
                   distro=True,
                   n_cols=3,
                   #sharex=True,
                  );
/home/perezheg/Programs/mdciao/mdciao/cli/cli.py:2078: UserWarning: The figure layout has changed to tight
  myfig.tight_layout()
Created files
freq_comparison.pdf
freq_comparison.xlsx
../_images/notebooks_Covid-19-Spike-Protein-Example_23_2.png

The multi-panel figure gives more detail about each residue-residue distance, but makes comparing accross distances a bit harder (even if you use the sharex=True parameter, which you should try).

Now we try the comparison via violin plots, using mdciao.plots.compare_violins:

[12]:
mdciao.plots.compare_violins(pos417,
                             ctc_cutoff_Ang=4,
                             figsize=(15,5),
                             colors="Set2",
                             defrag=None,
                             anchor="K417",
                             legend_rows=2,
                             mutations_dict={"V417": "K417",
                                             "K439": "N439"
                                            }
                            );
../_images/notebooks_Covid-19-Spike-Protein-Example_25_0.png

This plot is now more crowded but contains more information in one single panel, allowing to quickly differentiate the residue-residue distance-distributions, and the effect that the mutations have on them. We see how some of them remain compact, some are simply shifted, some are bimodal, have long tails etc. Please note the use of the zero_freq=0.01 parameter by default in mdciao.compare.compare_violins has taken hidden some distributions away (the Na+ ions in particular).

Note

It’s a good exercise to compare the violins with the individual, per-contact panels from above and see that, indeed, the violins are just mirrored distributions.

Targeting Pre-Defined Sites

Wheareas the mdciao.cli.residue_neighborhoods generates a list of likely neighbors by scannig over all possible pairs, over all data, we can also simply provide a list of residue-residue distances of interest, regardless of whether they’re likely neighbors or not. This has the advantage of reducing the number of overall residue-residue distances computed, but neglects anything not included in that list.

In mdciao, such a small(ish) group of contacts is called a site, and there’s an API-submodule to handle them as well as a CLI-method to compute them, mdciao.cli.sites.

To select these contacts of interest, we can use our observations of the frequency plots from above. We will focus on the contacts most impacted by the mutations, and put them into a site definition, which is just a python dictionary with some human-readable form.

Note

Although the word ``site`` somehow hints at the contacts being spatially close (like a binding site, or a motif, etc), this doesn’t have to be the case. ``site`` is just a label under which to group contacts.

In any case, you can pass a list of sites to the CLI and API methods if you prefer to break down a large(ish) list of contacts into several smaller ones.

Check mdciao.sites for more info on constructing sites, interactively or as .JSON files.

[13]:
my_site ={"name" :"changed contacts",
          "pairs":{"AAresSeq":[
                     "K417-D30",
                     "K417-L455",
                     "K417-R454",
                     "K417-Y453",
                     "N439-P499",
                     "N439-Q498"
                     ]}}

We do a little Python to make sure the above dictionary works with all mutants:

[14]:
from copy import deepcopy
my_sites_mut_adapted={}
for syskey in filename2system.values():
    #print(syskey)
    my_sites_mut_adapted[syskey]=deepcopy(my_site)
    mutations_dict={}

    #Only populate mutations_dict if needed
    if syskey!="WT":
        for mut in syskey.split("/"):
            mutations_dict.update({mut[:4]:mut[-1]+mut[1:-1]})
    # We use one of mdciao's str_and_dict method for recursive mutation
    my_sites_mut_adapted[syskey]["pairs"]['AAresSeq']=[mdciao.utils.str_and_dict.replace_w_dict(pair,mutations_dict) for pair in my_sites_mut_adapted[syskey]["pairs"]["AAresSeq"]]
my_sites_mut_adapted
[14]:
{'N439K': {'name': 'changed contacts',
  'pairs': {'AAresSeq': ['K417-D30',
    'K417-L455',
    'K417-R454',
    'K417-Y453',
    'K439-P499',
    'K439-Q498']}},
 'K417V': {'name': 'changed contacts',
  'pairs': {'AAresSeq': ['V417-D30',
    'V417-L455',
    'V417-R454',
    'V417-Y453',
    'N439-P499',
    'N439-Q498']}},
 'N439K/K417V': {'name': 'changed contacts',
  'pairs': {'AAresSeq': ['V417-D30',
    'V417-L455',
    'V417-R454',
    'V417-Y453',
    'K439-P499',
    'K439-Q498']}},
 'WT': {'name': 'changed contacts',
  'pairs': {'AAresSeq': ['K417-D30',
    'K417-L455',
    'K417-R454',
    'K417-Y453',
    'N439-P499',
    'N439-Q498']}}}

Now that we have per-setup site definitions, we simply iterate again over setups passing the corresponding site definition:

Note

simply means that a very simple call to the method would be enough. Below, we use the optional parameters to showcase some of the advanced functionality.

Computing the Sites

[15]:
sites = {}
for fn, syskey in filename2system.items():
    print(syskey)
    pattern = os.path.join(datadir,fn+"*.h5")
    sites[syskey] = mdciao.cli.sites([my_sites_mut_adapted[syskey]],
                                     pattern,
                                     n_jobs=7,
                                     ctc_cutoff_Ang=4,
                                     fragment_names=["RBD","GLY_RBD","ACE","GLY_ACE","ions","ions"],
                                     fragments="chains",
                                     no_disk=True,
                                     figures=False,
                                     plot_timedep=False)["changed contacts"]
N439K
/home/perezheg/miniconda3/lib/python3.8/site-packages/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
  warnings.warn("top= kwargs ignored since this file parser does not support it")
Will compute the sites
 site dict with name 'changed contacts'
in the trajectories:
/media/perezheg/6TB_HD/FAH_aws/run0-clone0.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone2.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone3.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone4.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone5.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone6.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone7.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone8.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone9.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone10.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone11.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone12.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone13.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone14.h5
...[long list: omitted 1969 items]...
/media/perezheg/6TB_HD/FAH_aws/run0-clone1985.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1986.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1987.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1988.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1989.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1990.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1991.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1992.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1993.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1994.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1995.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1996.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1997.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1998.h5
/media/perezheg/6TB_HD/FAH_aws/run0-clone1999.h5
 with a stride of 1 frames.

Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    365 AAs   Na+976 ( 975) -  Na+1340 (1339) (5)
These are the residues that could be found:
   residue      residx    fragment      resSeq
    LYS417          85           0         417
    LYS439         107           0         439
    TYR453         121           0         453
    ARG454         122           0         454
    LEU455         123           0         455
    GLN498         166           0         498
    PRO499         167           0         499
     ASP30         218           2          30

Site 'changed contacts':
   freq                label   residues fragments   sum
0  0.81  K417@RBD - D30@ACE    85 - 218     0 - 2  0.81
1  0.88  K417@RBD - L455@RBD   85 - 123     0 - 0  1.68
2  0.43  K417@RBD - R454@RBD   85 - 122     0 - 0  2.12
3  0.06  K417@RBD - Y453@RBD   85 - 121     0 - 0  2.18
4  0.33  K439@RBD - P499@RBD  107 - 167     0 - 0  2.52
5  0.00  K439@RBD - Q498@RBD  107 - 166     0 - 0  2.52

K417V
/home/perezheg/miniconda3/lib/python3.8/site-packages/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
  warnings.warn("top= kwargs ignored since this file parser does not support it")
Will compute the sites
 site dict with name 'changed contacts'
in the trajectories:
/media/perezheg/6TB_HD/FAH_aws/run1-clone0.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone2.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone3.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone4.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone5.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone6.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone7.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone8.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone9.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone10.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone11.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone12.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone13.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone14.h5
...[long list: omitted 1968 items]...
/media/perezheg/6TB_HD/FAH_aws/run1-clone1985.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1986.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1987.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1988.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1989.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1990.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1991.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1992.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1993.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1994.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1995.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1996.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1997.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1998.h5
/media/perezheg/6TB_HD/FAH_aws/run1-clone1999.h5
 with a stride of 1 frames.

Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    367 AAs   Na+976 ( 975) -  Na+1342 (1341) (5)
These are the residues that could be found:
   residue      residx    fragment      resSeq
    VAL417          85           0         417
    ASN439         107           0         439
    TYR453         121           0         453
    ARG454         122           0         454
    LEU455         123           0         455
    GLN498         166           0         498
    PRO499         167           0         499
     ASP30         218           2          30

Site 'changed contacts':
   freq                label   residues fragments   sum
0  0.00  V417@RBD - D30@ACE    85 - 218     0 - 2  0.00
1  0.58  V417@RBD - L455@RBD   85 - 123     0 - 0  0.58
2  0.90  V417@RBD - R454@RBD   85 - 122     0 - 0  1.47
3  0.45  V417@RBD - Y453@RBD   85 - 121     0 - 0  1.92
4  0.98  N439@RBD - P499@RBD  107 - 167     0 - 0  2.91
5  0.55  N439@RBD - Q498@RBD  107 - 166     0 - 0  3.46

N439K/K417V
/home/perezheg/miniconda3/lib/python3.8/site-packages/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
  warnings.warn("top= kwargs ignored since this file parser does not support it")
Will compute the sites
 site dict with name 'changed contacts'
in the trajectories:
/media/perezheg/6TB_HD/FAH_aws/run2-clone0.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone2.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone3.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone4.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone5.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone6.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone7.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone8.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone9.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone10.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone11.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone12.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone13.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone14.h5
...[long list: omitted 1970 items]...
/media/perezheg/6TB_HD/FAH_aws/run2-clone1985.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1986.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1987.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1988.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1989.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1990.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1991.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1992.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1993.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1994.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1995.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1996.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1997.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1998.h5
/media/perezheg/6TB_HD/FAH_aws/run2-clone1999.h5
 with a stride of 1 frames.

Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    366 AAs   Na+976 ( 975) -  Na+1341 (1340) (5)
These are the residues that could be found:
   residue      residx    fragment      resSeq
    VAL417          85           0         417
    LYS439         107           0         439
    TYR453         121           0         453
    ARG454         122           0         454
    LEU455         123           0         455
    GLN498         166           0         498
    PRO499         167           0         499
     ASP30         218           2          30

Site 'changed contacts':
   freq                label   residues fragments   sum
0  0.00  V417@RBD - D30@ACE    85 - 218     0 - 2  0.00
1  0.64  V417@RBD - L455@RBD   85 - 123     0 - 0  0.64
2  0.94  V417@RBD - R454@RBD   85 - 122     0 - 0  1.58
3  0.52  V417@RBD - Y453@RBD   85 - 121     0 - 0  2.10
4  0.40  K439@RBD - P499@RBD  107 - 167     0 - 0  2.50
5  0.00  K439@RBD - Q498@RBD  107 - 166     0 - 0  2.50

WT
/home/perezheg/miniconda3/lib/python3.8/site-packages/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
  warnings.warn("top= kwargs ignored since this file parser does not support it")
Will compute the sites
 site dict with name 'changed contacts'
in the trajectories:
/media/perezheg/6TB_HD/FAH_aws/run3-clone0.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone2.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone3.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone4.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone5.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone6.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone7.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone8.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone9.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone10.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone11.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone12.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone13.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone14.h5
...[long list: omitted 1969 items]...
/media/perezheg/6TB_HD/FAH_aws/run3-clone1985.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1986.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1987.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1988.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1989.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1990.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1991.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1992.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1993.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1994.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1995.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1996.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1997.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1998.h5
/media/perezheg/6TB_HD/FAH_aws/run3-clone1999.h5
 with a stride of 1 frames.

Using method 'chains' these fragments were found
fragment      0 with    196 AAs   ACE332 (   0) -   NME527 (195 ) (0)
fragment      1 with     10 AAs   UYB729 ( 196) -   0fA738 (205 ) (1)
fragment      2 with    709 AAs    ACE18 ( 206) -   NME726 (914 ) (2)
fragment      3 with     58 AAs   UYB739 ( 915) -   0fA796 (972 ) (3)
fragment      4 with      2 AAs    CL973 ( 973) -      ZN1 (974 ) (4)  resSeq jumps
fragment      5 with    366 AAs   Na+976 ( 975) -  Na+1341 (1340) (5)
These are the residues that could be found:
   residue      residx    fragment      resSeq
    LYS417          85           0         417
    ASN439         107           0         439
    TYR453         121           0         453
    ARG454         122           0         454
    LEU455         123           0         455
    GLN498         166           0         498
    PRO499         167           0         499
     ASP30         218           2          30

Site 'changed contacts':
   freq                label   residues fragments   sum
0  0.70  K417@RBD - D30@ACE    85 - 218     0 - 2  0.70
1  0.92  K417@RBD - L455@RBD   85 - 123     0 - 0  1.62
2  0.45  K417@RBD - R454@RBD   85 - 122     0 - 0  2.08
3  0.05  K417@RBD - Y453@RBD   85 - 121     0 - 0  2.13
4  0.99  N439@RBD - P499@RBD  107 - 167     0 - 0  3.12
5  0.54  N439@RBD - Q498@RBD  107 - 166     0 - 0  3.65

We can also save the sites for later use:

[16]:
np.save("sites.npy",sites)

Reporting the Sites

Finally, we can use again the compare-method of mdciao to see the impact of the mutations on the site. Since the overall number of computed distances is much smaller, we can use the distro==True and get much more information on the distances without having a large figure:

[17]:
#sites = np.load("sites.npy",allow_pickle=True)[()]
mdciao.cli.compare(sites, distro=True,
                   n_cols=2,
                   mutations_dict={"V417": "K417",
                                   "K439": "N439"
                                  },
                   sharex=True,
                   ctc_cutoff_Ang=4,
                   colors="Set2");
/home/perezheg/Programs/mdciao/mdciao/cli/cli.py:2078: UserWarning: The figure layout has changed to tight
  myfig.tight_layout()
Created files
freq_comparison.pdf
freq_comparison.xlsx
../_images/notebooks_Covid-19-Spike-Protein-Example_38_2.png

We see the effects mentioned before about the disruption of salt-bridges by e.g. with D30, in the upper left panel. We also notice that the frequencies reported in the legend coincide with those initially reported in the bar plots at the very top of this page.

Note

When comparing these distributions obtained via the site-method, with the distributions reported further up, both as per-contact panels or in the one-panel-for-all violin-plots, you’ll notice that some distributions are missing. Look, for example, at the the K417-D30 violins: two of them are missing.

This is per-design: when computing those residue neighborhoods using mdciao.cli.residue_neighborhoods, we specified ctc_cutoff_Ang = 4. This means that only those residues considered neighbors at a cutoff of 4 Angstrom are kept in the results. In the K417V mutants, the salt bridge with D30 is totally disrupted and never contributes to the frequency (only very marginally), so it has been completely eliminated from the neighborhood.

site-methods, on the other hand, have a static list of target residue-pairs and report on them regardless of whether they contribute to the neighborhood or not. The ctc_cutoff_Ang is used only to inform on frequencies, but not to control what contacts are kept in the site.

Two last observations:

  • sites or neighborhoods are both ContactGroup-objects, and behave very similarly. A ContactGroup is mdciao’s class to handle groups of residue-residue distances.

  • If you suspect that a residue-residue pair is missing from your figures and/or tables, please take a look at our FAQ: Missing Contacts.

3D Visualization

In future releases, we will include a method in the ContactGroup to automatically highlight it’s residues in an nglviewer widget. For now, we simply expose here how that method would look like:

[18]:
def show_neighborhood3D(self,iwd,ctc_cutoff_Ang=None, top=None, representation="licorice"):
    if top is None:
        assert isinstance(self.top, md.Topology)
    if ctc_cutoff_Ang is None:
        freqs = np.ones(self.n_ctcs)
    else:
        freqs = self.frequency_per_contact(ctc_cutoff_Ang)

    if self.is_neighborhood:
        anchor_ats = [aa.index for aa in self.top.residue(self.shared_anchor_residue_index).atoms]
    atoms = []
    for fr, pair in zip(freqs, self.res_idxs_pairs):
        if fr >0:
            for rr in pair:
                atoms.extend([aa.index for aa in self.top.residue(rr).atoms])
    [atoms.remove(aa) for aa in anchor_ats]
    iwd.add_representation(representation, selection=atoms)
    iwd.add_representation("ball_and_stick", selection=anchor_ats)
    iwd.add_representation("spacefill", selection=anchor_ats)

iwd = nglview.show_mdtraj(md.load(os.path.join(datadir,"run3-clone0.h5"))[0])
show_neighborhood3D(pos417["WT"],
                    iwd,
                    #representation="spacefill"
                   )
iwd
/home/perezheg/miniconda3/lib/python3.8/site-packages/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
  warnings.warn("top= kwargs ignored since this file parser does not support it")
[ ]: