Comparing Frequencies: Consensus Nomenclature

In this notebook, we exploit the GPCR consensus nomenclature to compute and compare contact frequencies across four GPCRs that have very little sequence identity.

Nevertheless, the consensus nomenclature will allow us to:

Use the same function calls for all systems, regardless of the underlying primary sequence
Compare the frequencies across systems by using consensus labels

The four systems we will be comparing are:
* Beta 2 adrenergic receptor in complex with Gs-protein.
Provided kindly by Dr. H. Batebi

Growth hormone secretagogue receptor type 1, ghrelin receptor for short.

Provided kindly by Dr. A. Vogel
Neuropeptide Y receptor type 1, Y1 receptor for short, in apo form.

Provided kindly by Dr. A. Vogel.
Active mu-opioid receptor bound to the agonist morphine.

Kindly made available for this purpose by the GPCRmd.

All these example datasets will be downloaded on the fly using mdciao.examples.fetch_example_data, please note the individual references for each dataset there.

Also note that mdciao ships with all references regarding the used nomenclature schemes, you can issue mdciao.nomenclature.references to print them out.

[1]:

import mdciao

Download Trajectory Data

Throughout the notebook, we will use the same aliases used by mdciao.examples.fetch_example_data to adress the different sytems/datasets, "b2ar@Gs", "ghrelin@ghsr" , "mor@muor", "y1_apo", but one could create an alias dictionary for nicer tagging of plots etc (see note at the bottom of the notebook).

[2]:

systems = ["b2ar@Gs",
           "ghrelin@ghsr" ,
           "mor@muor",
           "y1_apo"]
for system in systems:
    d = mdciao.examples.fetch_example_data(system,
                                           unzip=system,
                                           skip_on_existing=True)

Unzipping to 'b2ar@Gs'

Unzipping to 'ghrelin@ghsr'

Unzipping to 'mor@muor'

Unzipping to 'y1_apo'

Nomenclature Data

We will get the nomenclature data, i.e. the per-residue consensus labels mapped to the canonical primary sequence of the receptor, from the GPCRdb (in this case). The lookup happens via UniProt Entry Names and uses mdciao’s nomenclature clases.

These objects contain all nomenclature information, and map the consensus labels and fragments (“3.50”, or “TM3”, respectively) not only to the canonical sequence, but to tht of the topologies at hand, using class methods like top2labels and top2frags.

So, as a user, you need to know these UniProt Entry Names for each one of your systems.

[3]:

key2GPCR_UniProt = {"b2ar@Gs" :      "adrb2_human",
                    "ghrelin@ghsr" : "ghsr_human",
                    "mor@muor" :     "oprm_mouse",
                    "y1_apo" :       "npy1r_human"
                   }
GPCR = {key : mdciao.nomenclature.LabelerGPCR(val, scheme="BW",
                                              write_to_disk=True, local_path=key) for key, val in key2GPCR_UniProt.items()}

No local file b2ar@Gs/adrb2_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/adrb2_human ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
wrote b2ar@Gs/adrb2_human.xlsx for future use
No local file ghrelin@ghsr/ghsr_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/ghsr_human ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
wrote ghrelin@ghsr/ghsr_human.xlsx for future use
No local file mor@muor/oprm_mouse.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/oprm_mouse ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
wrote mor@muor/oprm_mouse.xlsx for future use
No local file y1_apo/npy1r_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/npy1r_human ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
wrote y1_apo/npy1r_human.xlsx for future use

For the "b2ar@Gs system, we also get the CGN labels, i.e. those for the G-protein.

[4]:

CGN = {"b2ar@Gs" : mdciao.nomenclature.LabelerCGN("gnas2_human", write_to_disk=True, local_path="b2ar@Gs")}

No local file b2ar@Gs/gnas2_human.xlsx found, checking online in
https://gpcrdb.org/services/residues/extended/gnas2_human ...done!
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
Please cite the following reference to the CGN nomenclature:
 * Flock et al, (2015) Universal allosteric mechanism for G$\alpha$ activation by GPCRs
   Nature 2015 524:7564 524, 173--179
   https://doi.org/10.1038/nature14663
For more information, call mdciao.nomenclature.references()
wrote b2ar@Gs/gnas2_human.xlsx for future use

Note that in the above cells we’re storing the retrieved data as .xlsx-files in the individual directories.

Residue Neighborhood of 3.50

We start by computing the residue neighborhood of notorious residue 3.50 of TM3 in all systems, without even knowing what residue precisely is 3.50 in all systems.

Since we will store the results in the rn dictionary and compare them across systems later, we’re supressing all outputs in the cell below, but feel free to comment out the %%capture and the #figures=False statements

[5]:

%%capture
rn = {}
for system in systems:
    rn[system] = list(mdciao.cli.residue_neighborhoods("3.50", f"{system}/traj.xtc", topology=f"{system}/top.pdb",
                                                       output_dir=system, GPCR_UniProt=GPCR[system], accept_guess=True,
                                                       ctc_control=1.0,
                                                       no_disk=True,
                                                       figures=False,
                                                       CGN_UniProt=CGN.get(system,None),
                                                       fragments="chains").values())[0]

Compare Frequency Bars

We show all contact frequencies of all four systems together using mdciao.plots.compare_groups_of_contacts.

To show what the plot would look like without the consensus labels, initially we won’t make use of them:

[6]:

fig, ax ,plotted_freqs = mdciao.plots.compare_groups_of_contacts(rn, ctc_cutoff_Ang=4.5,
                                                                 figsize=(20,5),
                                                                 defrag=None);

These interactions are not shared:
C146@3.55-R141@3.50, D340@8.47-R165@3.50, E257@6.29-R138@3.50, I234@5.61-R138@3.50, I235@5.61-R141@3.50, I261@6.33-R138@3.50, I278@6.33-R165@3.50, I278@6.40-R131@3.50, I325@7.52-R131@3.50, I72@2.43-R131@3.50, K152@34.54-R141@3.50, K328@8.48-R141@3.50, K329@8.49-R141@3.50, L239@5.65-R141@3.50, L265@6.37-R138@3.50, L323@7.56-R138@3.50, L339@7.56-R165@3.50, L82@2.43-R141@3.50, M244@5.71-R138@3.50, M255@5.61-R165@3.50, M281@6.36-R165@3.50, M326@7.56-R141@3.50, N262@6.34-R138@3.50, R131@3.50-V222@5.61, R131@3.50-Y219@5.58, R131@3.50-Y326@7.53, R131@3.50-Y391@G.H5.23, R138@3.50-T258@6.30, R138@3.50-T75@2.39, R138@3.50-Y231@5.58, R138@3.50-Y253@6.25, R141@3.50-R150@34.52, R141@3.50-S327@8.47, R141@3.50-T76@2.37, R141@3.50-T78@2.39, R141@3.50-Y232@5.58, R141@3.50-Y323@7.53, R141@3.50-Y330@8.50, R165@3.50-R179@34.57, R165@3.50-T103@2.39, R165@3.50-Y106@2.42, R165@3.50-Y252@5.58
Their cumulative ctc freq is 18.11.

../_images/notebooks_09.Consensus_Labels_12_1.png

This plot is a mess, since all 3.50 residues are different residues in all systems: * R131 in b2ar@Gs * R141 in ghrelin@ghsr * R165 in mor@muor * R138 in y1_apo

Of course, the same goes for all other residues. This means there’s no shared contact pairs to be compared agains each other when using residue names.

However, if we specifiy AA_format="try_consensus", the method will try to use those labels (which are unified across systems) when possible:

[7]:

fig, ax ,plotted_freqs = mdciao.plots.compare_groups_of_contacts(rn, ctc_cutoff_Ang=4.5, defrag=None,
                                                                 AA_format="try_consensus",
                                                                 anchor="3.50",
                                                                 sort_by="consensus",
                                                                );

These interactions are not shared:
2.37, 2.39, 2.42, 2.43, 3.55, 34.52, 34.54, 34.57, 5.65, 5.71, 6.25, 6.29, 6.30, 6.33, 6.34, 6.36, 6.37, 6.40, 7.52, 7.53, 7.56, 8.47, 8.48, 8.49, 8.50, G.H5.23
Their cumulative ctc freq is 13.55.

../_images/notebooks_09.Consensus_Labels_14_1.png

Much nicer. We also make the plot even more compact by using: * anchor="3.50" this eliminates “3.50” from all labels and only uses the label of the other residue in the residue pair. * sort_by="consensus" sorts the contacts, insted of by descending frequency (like the first plot), by their consensus label.

Also note, in b2ar@Gs, one residue corresponds to the Gs-unit, the 23rd residue of the α5 helix.

Interface: TM3 vs all other TMs

We now compute whole interfaces between fragments following the same approach, i.e using consensus labels. In this case, we’re computing the contacts of the TM3 with all other elements of the system. We’re doing so by using the keyword arguments:

interface_selection_1="TM3",
interface_selection_2="*",

For the purposes of this notebook, we focus on the usage of consensus descriptors, but please read the full documentation in mdciao.cli.interface for how interfaces can be defined.

Also note that we’re supressing the output since we will be comparing (like above) the different systems later. Just comment out the %%capture if you want to see the output and the figures. ### Compute

[8]:

%%capture
intf = {}
for system in systems:
    intf[system] = mdciao.cli.interface(f"{system}/traj.xtc", topology=f"{system}/top.pdb", output_dir=system,
                                        GPCR_UniProt=GPCR[system],
                                        accept_guess=True,
                                        CGN_UniProt=CGN.get(system, None),
                                        ctc_control=1.0,
                                        interface_selection_1="TM3",
                                        interface_selection_2="*",
                                        no_disk=True,
                                        fragments="chains",
                                        plot_timedep=False,
                                        figures=True,
                                        self_interface=True,
                                        title=f"interface {system}",
                                        n_nearest=4
                                       )

Compare Frequency Bars

Using the same method as above, we compare now frequencies, but instead of resolving to each individual pair (there’s about 150 TM3 vs all contacts), we aggregate the frequencies to each residue using per_residue=True. This loses the per-pair information but makes the plot more compact to begin with.

[9]:

mdciao.plots.plots.compare_groups_of_contacts(intf, ctc_cutoff_Ang=4.5, defrag=None,
                                              per_residue=True,
                                              AA_format="try_consensus",
                                              figsize=None,
                                              sort_by="consensus",
                                              lower_cutoff_val=1,
                                              interface=True,
                                             );

These interactions are not shared:
3.23, 3.56
Their cumulative ctc freq is 2.61.
These interactions are not shared:
2.37, 2.38, 2.50, 2.57, 2.59, 2.61, 2.63, 2.64, 34.50, 34.52, 34.53, 34.55, 34.56, 34.57, 4.38, 4.46, 4.58, 4.60, 4.63, 45.52, 5.38, 5.39, 5.42, 5.43, 5.47, 5.51, 5.65, 5.70, 5.71, 5.72, 6.23, 6.25, 6.29, 6.30, 6.33, 6.34, 6.36, 6.37, 6.40, 6.41, 6.45, 6.48, 6.51, 6.52, 6.55, 7.35, 7.39, 7.42, 7.43, 7.45, 7.46, 7.49, 7.52, 7.53, 7.56, 8.47, C184@ECL2, C190@ECL2, E188@ECL2, G.H5.16, G.H5.19, G.H5.20, G.H5.23, G.H5.25, G1, G183@ECL2, K247@ICL3, M179@ECL2, M248@ICL3, MOR353, P0G395, P150@ICL2, R146@ICL2, R149@ICL2, R175@ECL2, R187@ECL2, S3, T207@ECL2, T208@ECL2, V178@ECL2, V184@ECL2, W148@ICL2, Y174@ECL2, Y185@ECL2
Their cumulative ctc freq is 151.03.

../_images/notebooks_09.Consensus_Labels_19_1.png

There’s a lot to unpack here, but we we can immediately see that e.g. 3.28 and 3.37 behave differently across systems. We’ll check them later, but now we pick a representation that tries to be compact without losing pair information.

Compare Flareplots

[10]:

from matplotlib import pyplot as plt
myfig, myax = plt.subplots(2,2, sharex=True, sharey=True, figsize=(70,70), tight_layout=True)
for (system, iintf) , iax in zip(intf.items(), myax.flatten()):
    consensus_maps=[GPCR[system]]
    iCGN = CGN.get(system, None)
    if iCGN is not None:
        consensus_maps.append(iCGN)
    iintf.plot_freqs_as_flareplot(4.5, fragments="chains",
                                  scheme="consensus_sparse",
                                  aura=iintf.frequency_sum_per_residue_idx_dict(4.5, return_array=True),
                                  panelsize=15,
                                  ax=iax,
                                  consensus_maps=consensus_maps)
    iax.set_title(f"{system}: TM3 interface at 4.5 Å", fontsize=60)
print()
myfig.tight_layout(h_pad=5)

Auto-detected fragments with method 'chains'
fragment      0 with    354 AAs     LEU4 (   0) -   LEU394 (353 ) (0) resSeq jumps
fragment      1 with    340 AAs     GLN1 ( 354) -   ASN340 (693 ) (1)
fragment      2 with     66 AAs     ALA2 ( 694) -    PHE67 (759 ) (2)
fragment      3 with    207 AAs    GLU30 ( 760) -   ARG239 (966 ) (3) resSeq jumps
fragment      4 with     76 AAs   CYS265 ( 967) -   LEU340 (1042) (4)
fragment      5 with      1 AAs   P0G395 (1043) -   P0G395 (1043) (5)
Auto-detected fragments with method 'chains'
fragment      0 with     15 AAs     GLY1 (   0) -    ARG15 (14  ) (0)
fragment      1 with    308 AAs    ASP32 (  15) -   GLY339 (322 ) (1)
Auto-detected fragments with method 'chains'
fragment      0 with    289 AAs    SER64 (   0) -   ILE352 (288 ) (0)
fragment      1 with      1 AAs   MOR353 ( 289) -   MOR353 (289 ) (1)
Auto-detected fragments with method 'chains'
fragment      0 with    321 AAs    PHE18 (   0) -   ASP339 (320 ) (0) resSeq jumps

../_images/notebooks_09.Consensus_Labels_22_1.png

This representation tries to capture the system’s topology, and the individual pairs as well as collective behaviours. We have an `entire notebook <>`__ devoted on how these plots work and how one can fine-tune them. One of the easiest things to spot is the difference in TM3 vs TM6 contacts.

Coarse-Graining to Consensus Fragments

We can also coarse-grain the frequencies to the fragments, showing a bit the structure and scaffolding role of TM3 in the TM-bundle. All this, without having to directly define the fragments for each system:

[11]:

import pandas
CG_freqs = {}
for (system, iintf) , iax in zip(intf.items(), myax.flatten()):
    consensus_maps=[GPCR[system]]
    iCGN = CGN.get(system, None)
    if iCGN is not None:
        consensus_maps.append(iCGN)
    CG_freqs[system] = iintf.frequency_as_contact_matrix_CG (4.5, fragments=mdciao.fragments.get_fragments(iintf.top, "chains"),
                                                             sparse=True,
                                  consensus_labelers=consensus_maps)["TM3"]
df = pandas.DataFrame(CG_freqs).T
df = df[[key for key in mdciao.nomenclature.nomenclature._GPCR_fragments+mdciao.nomenclature.nomenclature._CGN_fragments if key in df.keys()]]
df.round(1).fillna("")

Auto-detected fragments with method 'chains'
fragment      0 with    354 AAs     LEU4 (   0) -   LEU394 (353 ) (0) resSeq jumps
fragment      1 with    340 AAs     GLN1 ( 354) -   ASN340 (693 ) (1)
fragment      2 with     66 AAs     ALA2 ( 694) -    PHE67 (759 ) (2)
fragment      3 with    207 AAs    GLU30 ( 760) -   ARG239 (966 ) (3) resSeq jumps
fragment      4 with     76 AAs   CYS265 ( 967) -   LEU340 (1042) (4)
fragment      5 with      1 AAs   P0G395 (1043) -   P0G395 (1043) (5)
Auto-detected fragments with method 'chains'
fragment      0 with     15 AAs     GLY1 (   0) -    ARG15 (14  ) (0)
fragment      1 with    308 AAs    ASP32 (  15) -   GLY339 (322 ) (1)
Auto-detected fragments with method 'chains'
fragment      0 with    289 AAs    SER64 (   0) -   ILE352 (288 ) (0)
fragment      1 with      1 AAs   MOR353 ( 289) -   MOR353 (289 ) (1)
Auto-detected fragments with method 'chains'
fragment      0 with    321 AAs    PHE18 (   0) -   ASP339 (320 ) (0) resSeq jumps

[11]:

	TM2	ECL1	TM3	ICL2	TM4	ECL2	TM5	ICL3	TM6	TM7	H8	G.H5
b2ar@Gs	27.2	3.0	0.2	7.6	23.3	12.1	23.0		4.1	8.9		8.3
ghrelin@ghsr	27.3	2.9	0.2	1.3	27.6	8.1	25.7		2.1	5.0	0.4
mor@muor	21.0	2.8	0.0	8.4	19.6	10.4	25.0		12.0	1.8
y1_apo	28.3	1.3	0.1	2.5	26.4	5.7	27.7	0.6	13.8	0.5

The table above summarized TM3’s behavior, contact-wise, with the other elements. Some things are immediately apparent, like mor@muor having less contacts with TM2.

Residue Neighborhood: Select 3.37 via consensus labels without recomputing anything

We now go back to the residue-level analysis.

We coould re-compute 3.37’s neighbordhood the same way we did 3.50’s initially. However, the individual contacts have already been computed when computing TM3’s interface, s.t. we can simply filter the interface for any contacts containing 3.37 (or any other residue) using ContactGroup.select_by_residues

[12]:

rn337 = {system : iintf.select_by_residues("3.37") for system, iintf in intf.items()}

Compare Frequency Bars

[13]:

mdciao.plots.compare_groups_of_contacts(rn337, ctc_cutoff_Ang=4.5, AA_format="try_consensus", anchor="3.37", sort_by="consensus");

These interactions are not shared:
4.49, 4.52, 4.57, 4.60, 5.42, 5.43, 5.45, P0G395
Their cumulative ctc freq is 7.35.

../_images/notebooks_09.Consensus_Labels_31_1.png

As we saw in the interface plot, 3.37 in ghrelin@ghsr has many more contacts than the other systems. Now we see why: that the Ghrelin-receptor’s 3.37 residue is interacting with TM4 and TM5 more than the rest.

Compare Contact-Distances

We will inspect this further, first using violin plots in the next cell:

[14]:

fig, ax, plotted_labels = mdciao.plots.compare_violins(rn337,
                                                       ctc_cutoff_Ang=4.5, defrag=None,
                                                       AA_format="try_consensus",
                                                       sort_by="consensus",
                                                       anchor="3.37",
                                                       figsize=None);

../_images/notebooks_09.Consensus_Labels_34_0.png

Turns out, 4.57, 4.60, 5.42, and 5.43 weren’t even included other 3.37 neighborhoods (mor@muor, b2ar@Gs, y1_apo), because they never came in contact with 3.37 (or almost never, according to min_freq=0.1, the default value of mdciao.cli.interface).

Still, we can use mdciao’s mdciao.cli.sites to provide an explict list of pairs of residues we want to look at, regardless of them forming a contact or not.

Sites

You guessed right, we can specifiy them simply using the consensus labels. We stored them in the cell above in the plotted_labels variable, so it’s very easy to pass them on to the site definitions:

[15]:

site_337_def = {"name": "selected 3.77 distances",
                "pairs" : {"consensus" : [f"3.37-{label}" for label in plotted_labels]}}
site_337_def

[15]:

{'name': 'selected 3.77 distances',
 'pairs': {'consensus': ['3.37-4.49',
   '3.37-4.52',
   '3.37-4.53',
   '3.37-4.56',
   '3.37-4.57',
   '3.37-4.60',
   '3.37-5.42',
   '3.37-5.43',
   '3.37-5.45',
   '3.37-5.46',
   '3.37-P0G395']}}

Again, feel free to comment in %%capture to see all outputs, but we will be comparing the distances two cells below anyways.

[16]:

%%capture
site_337 = {}
for system in systems:
    site_337[system] = mdciao.cli.sites(site_337_def,
                                        f"{system}/traj.xtc", topology=f"{system}/top.pdb", output_dir=system,
                                        GPCR_UniProt=GPCR[system], accept_guess=True,
                                        no_disk=True,allow_partial_sites=True,
                                        CGN_UniProt=CGN.get(system, None),figures=False,
                                        fragments="chains")["selected 3.77 distances"]

[17]:

fig, ax, plotted_labels, repframes = mdciao.plots.compare_violins(site_337, defrag=None,
                                                                  AA_format="try_consensus",
                                                                  anchor="3.37",
                                                                  sort_by="consensus",
                                                                  representatives=True,
                                                                  ctc_cutoff_Ang=4.5,
                                       );

Returning frame 149 of traj nr. 0: b2ar@Gs/traj.xtc
Returning frame 302 of traj nr. 0: ghrelin@ghsr/traj.xtc
Returning frame 35 of traj nr. 0: mor@muor/traj.xtc
Returning frame 331 of traj nr. 0: y1_apo/traj.xtc

../_images/notebooks_09.Consensus_Labels_39_1.png

We see that the distances for4.60, and 5.43 for mor@muor, b2ar@Gs, y1_apo virtually never cross the 4.5 Å cutoff.

Also note that by using representatives=True we have triggered some things in the violin plots. For each system, we have tried to locate a frame in the trajectory in which the shown residue-residue distances adopt values close to the most likely value (=where the violin is widest). You can read about these representative frames here: ContactGroup.repframes.

These frames are represented as dots inside the individual violins and also as returned geometries in form of mdtraj trajectories (stored in the repframes returned value) which we can visualize in 3D.

Visualizing Representative Frames

The first thing we note is that geometries won’t be aligned, because they’re all coming from different simulations. Also, b2ar@Gs has the whole G-protein as well.

[18]:

import nglview
from matplotlib import colors as mplcolors
colors = mdciao.plots.color_dict_guesser("tab10", repframes.keys())
iwd = nglview.NGLWidget()
for ii, (system, geom) in enumerate(repframes.items()):
    iwd.add_trajectory(geom)#, [[gpcr_idxs]]), title="test")
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(component=ii, color=mplcolors.to_hex(colors[system]), name=system, radius=.1)
iwd

To produce a high quality alignment of the receptor structures, even with low primary-sequence identity, we can arrive at a multiple-sequence-alignment (MSA) via the consensus labels, which act as a proxy for sequence identity. For this, we use mdciao’s AlignerConsensus class. There’s a whole notebook about them here.

[19]:

AC = mdciao.nomenclature.AlignerConsensus(GPCR, tops={key : val.top for key, val in repframes.items()})

Fragments derived from 'resSeq':
fragment      0 with    320 AAs    PHE18           (   0) -   PHE337           (319 ) (0)
fragment      1 with      1 AAs   ASP339           ( 320) -   ASP339           (320 ) (1)
* Alignment  0
There are clashes between the above 'resSeq' definitions and the consensus fragments' definitions:
fragment C-term with      3 AAs   ASN336           ( 318) -   ASP339           (320 ) (C-term) resSeq jumps
clashes/spreads across these 'resSeq' fragments:
fragment      0 with    320 AAs    PHE18           (   0) -   PHE337           (319 ) (0)
fragment      1 with      1 AAs   ASP339           ( 320) -   ASP339           (320 ) (1)
but all the residues of `top` that have been assigned to C-term using this alignment are contained
(= same name, same index) in the reference C-term, meaning, there's probably just a hole in C-term
in your `top`. Rerun with `verbose=True` to show that part of this alignment here.
=> C-term is hence considered compatible with alignment '0'.

Now that we have the AlignerConsensus object, we can use the sequence_match method to get an overview of how different the primary sequences are:

[20]:

AC.sequence_match()

[20]:

	b2ar@Gs	ghrelin@ghsr	mor@muor	y1_apo
b2ar@Gs	100%	23%	26%	24%
ghrelin@ghsr	23%	100%	26%	24%
mor@muor	26%	26%	100%	26%
y1_apo	24%	24%	26%	100%

We can do a lot of things with the AC object, e.g. this is the consensus-MSA for TM3

[21]:

AC.AAresSeq_match("3.*")

[21]:

	consensus	b2ar@Gs	ghrelin@ghsr	mor@muor	y1_apo
73	3.21	G102	G112	G136	G109
74	3.22	N103	D113	N137	E110
75	3.23	F104	L114	I138	A111
76	3.24	W105	L115	L139	M112
77	3.25	C106	C116	C140	C113
78	3.26	E107	K117	K141	K114
79	3.27	F108	L118	I142	L115
80	3.28	W109	F119	V143	N116
81	3.29	T110	Q120	I144	P117
82	3.30	S111	F121	S145	F118
83	3.31	I112	V122	I146	V119
84	3.32	D113	S123	D147	Q120
85	3.33	V114	E124	Y148	C121
86	3.34	L115	S125	Y149	V122
87	3.35	C116	C126	N150	S123
88	3.36	V117	T127	M151	I124
89	3.37	T118	Y128	F152	T125
90	3.38	A119	A129	T153	V126
91	3.39	S120	T130	S154	S127
92	3.40	I121	V131	I155	I128
93	3.41	E122	L132	F156	F129
94	3.42	T123	T133	T157	S130
95	3.43	L124	I134	L158	L131
96	3.44	C125	T135	C159	V132
97	3.45	V126	A136	T160	L133
98	3.46	I127	L137	M161	I134
99	3.47	A128	S138	S162	A135
100	3.48	V129	V139	V163	V136
101	3.49	D130	E140	D164	E137
102	3.50	R131	R141	R165	R138
103	3.51	Y132	Y142	Y166	H139
104	3.52	F133	F143	I167	Q140
105	3.53	A134	A144	A168	L141
106	3.54	I135	I145	V169	I142
107	3.55	T136	C146	C170	I143
108	3.56	S137	F147	H171	N144

Superpose Geometries

For the alignment we use the selection “?.*,-6.*,8.*” which selects all TMs, except TM6 and H8:

[22]:

CA_idxs = AC.CAidxs_match("?.*,-6.*,8.*")

[23]:

ref_key = "b2ar@Gs"
for ii, (system, geom) in enumerate(repframes.items()):
    if system!=ref_key:
        geom.superpose(repframes[ref_key], atom_indices=CA_idxs[system], ref_atom_indices=CA_idxs[ref_key])

Show Geometries

[24]:

iwd = nglview.NGLWidget()
for ii, (system, geom) in enumerate(repframes.items()):
    iwd.add_trajectory(geom)
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(component=ii, color=mplcolors.to_hex(colors[system]), name=system, radius=.1)
iwd

Show 3.37’s neighborhood

Finally, we fine-tune the 3D representation to include the most varying contacts of 3.37: 4.60, 5.42, and 5.43, which we show first as a table across the four systems:

[25]:

show=["3.37","4.60", "5.42", "5.43"]
AC.AAresSeq_match(",".join(show))

[25]:

	consensus	b2ar@Gs	ghrelin@ghsr	mor@muor	y1_apo
89	3.37	T118	Y128	F152	T125
139	4.60	P168	I178	V202	F173
159	5.42	S203	V216	V236	L215
160	5.43	S204	S217	F237	L216

[26]:

iwd = nglview.NGLWidget()
ref_key = "b2ar@Gs"
for ii, (system, geom) in enumerate(repframes.items()):
    iwd.add_trajectory(geom, title=system)
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(component=ii, color=mplcolors.to_hex(colors[system]), name=system, radius=.1)
    AAs = " ".join([GPCR[system].conlab2AA[jj][1:] for jj in show])
    iwd.add_licorice(component=ii, selection=f"({AAs}) and not Hydrogen", radius=.5, color=mplcolors.to_hex(colors[system]))
iwd.gui_style = "ngl"
iwd

From the 3D plot above we can make some observations, most clearly we note that ghrelin@ghsr has the bulikest residue, Y128@3.50, which is pointing straight down to a region where TM5 appears to have a bulge, precisely towards the 3.50 position, something the other TM5s don’t have.

Final Remarks

Some final observations

The point of this notebook isn’t to arrive at a particular finding but rather to showcase the utility of streamilining the contact-analysis across diffeent systems using consensus nomenclature.
We have kept the system names as they are downloaded with mdciao.examples.fetch_example_data, because they all follow the convention of having a traj.xtc and top.pdb files, but you can map any topology and trajectory files using aliases and dictionaries:

```python alias = {”b2ar@Gs” : “adrb2”, “ghrelin@ghsr” : “ghsr”, “mor@muor” : “muor”, “y1_apo” : “y1” } #these are just examples of possible other topology filenames. top = {“adrb2” : “adrb2/prot1.pdb”,
“ghsr” : “ghrelin/system.pdb”, “muor” : “muor/muor.pdb”, “y1” : “y1/top.pdb” } #these are just examples of possible other trajectory filenames. trajs = {“adrb2” : “adrb2/traj.xtc”,
“ghsr” : “ghrelin/traj1.xtc”, “ghrelin/traj2.xtc”, “muor” : “muor/run*.xtc”, “y1” : “y1/run1.xtc” }

```

Althouth the trajectories we have been using are similar in number of frames, they are wildly different in simulated physical length, s.o there isn’t really much physical or biological sense in comparing them other than for this demo:

* b2ar@Gs:      280 frames, dt = 10ps, 2.8ns  in total
* ghrelin@ghsr: 411 frames, dt = 10ns, 41μs   in total
* mor@muor:     400 frames, dt = 10ns, 40μs   in total
* y1_apo:       528 frames, dt = 50ns, 26.4ns in total

[ ]: