Use Consensus Labels as Multiple-Sequence-Alignment (MSA)

In this notebook, we will use mdciao and the consensus labels of the GPCRdb to structurally align four GPCR structures of four receptors with low sequence identity:

  • opsin, OPS in the rest of the notebook

  • beta2 adrenergic receptor, B2AR in the rest of the notebook

  • mu-opioid receptor, MUOR in the rest of the notebook

  • dopamine D1 receptor, DOP in the rest of the notebook

Whereas we use directly PDB structures, the point should be clear that the MSA works on any arbitary geometries and topologies that can be imported into the notebook, in particular user provided trajectories.

[1]:
import mdciao
import nglview
import matplotlib

Load PDB Structures into the Notebook

[2]:
pdbs = {"OPS"  : mdciao.cli.pdb("3CAP"),
        "B2AR" : mdciao.cli.pdb("3SN6"),
        "MUOR" : mdciao.cli.pdb("6DDF"),
        "DOP"  : mdciao.cli.pdb("7CKW")}
Checking https://files.rcsb.org/download/3CAP.pdb ...Please cite the following 3rd party publication:
 * Crystal structure of the ligand-free G-protein-coupled receptor opsin
   Park, J.H. et al., Nature 2008
   https://doi.org/10.1038/nature07063
Please cite the following 3rd party publication:b ...
 * Crystal structure of the beta2 adrenergic receptor-Gs protein complex
   Rasmussen, S.G. et al., Nature 2011
   https://doi.org/10.1038/nature10361
Checking https://files.rcsb.org/download/6DDF.pdb ...
/home/guille/miniconda3/lib/python3.12/site-packages/mdtraj/formats/pdb/pdbfile.py:208: UserWarning: Unlikely unit cell vectors detected in PDB file likely resulting from a dummy CRYST1 record. Discarding unit cell vectors.
  warnings.warn(
Please cite the following 3rd party publication:
 * Structure of the mu-opioid receptor-Giprotein complex.
   Koehl, A. et al., Nature 2018
   https://doi.org/10.1038/s41586-018-0219-7
Please cite the following 3rd party publication:b ...
 * Ligand recognition and allosteric regulation of DRD1-Gs signaling complexes.
   Xiao, P. et al., Cell 2021
   https://doi.org/10.1016/j.cell.2021.01.028

Load Consensus Labels from the GPCRdb into the Notebook

[3]:
maps = { "OPS": mdciao.nomenclature.LabelerGPCR("opsd_bovin"),
        "B2AR": mdciao.nomenclature.LabelerGPCR("adrb2_human"),
        "MUOR": mdciao.nomenclature.LabelerGPCR("oprm_mouse"),
        "DOP" : mdciao.nomenclature.LabelerGPCR("DRD1_HUMAN")
       }
No local file ./opsd_bovin.xlsx found, checking online in
done!://gpcrdb.org/services/residues/extended/opsd_bovin ...
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
No local file ./adrb2_human.xlsx found, checking online in
done!://gpcrdb.org/services/residues/extended/adrb2_human ...
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
No local file ./oprm_mouse.xlsx found, checking online in
done!://gpcrdb.org/services/residues/extended/oprm_mouse ...
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()
No local file ./DRD1_HUMAN.xlsx found, checking online in
done!://gpcrdb.org/services/residues/extended/drd1_human ...
Please cite the following reference to the GPCRdb:
 * Kooistra et al, (2021) GPCRdb in 2021: Integrating GPCR sequence, structure and function
   Nucleic Acids Research 49, D335--D343
   https://doi.org/10.1093/nar/gkaa1080
For more information, call mdciao.nomenclature.references()

Use the Consensus Labels to Trim down the PDBs to the just Receptors

This can happen regardless of chain definitions and co-crystalized entities

[4]:
pdb_just_receptor = {}
for key, pdb  in pdbs.items():
    print(key)
    receptor_residue_idxs = mdciao.nomenclature.guess_by_nomenclature(maps[key],
                                                               pdb.top,
                                                               fragments="resSeq",
                                                               return_residue_idxs=True,
                                                               accept_guess=True,
                                                              return_str=False)
    pdb_just_receptor[key] = mdciao.fragments.fragment_slice(pdb, [receptor_residue_idxs])
    print()
OPS
Auto-detected fragments with method 'resSeq'
fragment      0 with    326 AAs     MET1 (   0) -   ASN326 (325 ) (0)
fragment      1 with    326 AAs     MET1 ( 326) -   ASN326 (651 ) (1)
fragment      2 with      4 AAs     NAG1 ( 652) -     BMA4 (655 ) (2)
fragment      3 with      6 AAs     NAG1 ( 656) -     BMA4 (661 ) (3) resSeq jumps
fragment      4 with      2 AAs     NAG1 ( 662) -     NAG2 (663 ) (4)
fragment      5 with      4 AAs   BGL801 ( 664) -   BGL804 (667 ) (5)
fragment      6 with      1 AAs   PLM901 ( 668) -   PLM901 (668 ) (6)
fragment      7 with      2 AAs   BGL805 ( 669) -   BGL806 (670 ) (7)
fragment      8 with      1 AAs   PLM902 ( 671) -   PLM902 (671 ) (8)
fragment      9 with     10 AAs   HOH902 ( 672) -   HOH907 (681 ) (9) resSeq jumps
The GPCR-labels align best with fragments: [0] (first-last: MET1-ASN326).

B2AR
Auto-detected fragments with method 'resSeq'
fragment      0 with     51 AAs     THR9 (   0) -    GLN59 (50  ) (0)
fragment      1 with    115 AAs    LYS88 (  51) -   VAL202 (165 ) (1)
fragment      2 with     51 AAs   SER205 ( 166) -   MET255 (216 ) (2)
fragment      3 with    132 AAs   THR263 ( 217) -   LEU394 (348 ) (3)
fragment      4 with    340 AAs     GLN1 ( 349) -   ASN340 (688 ) (4)
fragment      5 with     58 AAs     ASN5 ( 689) -    ARG62 (746 ) (5)
fragment      6 with    159 AAs  ASN1002 ( 747) -  ALA1160 (905 ) (6)
fragment      7 with    146 AAs    GLU30 ( 906) -   ARG175 (1051) (7)
fragment      8 with     61 AAs   GLN179 (1052) -   ARG239 (1112) (8)
fragment      9 with     77 AAs   CYS265 (1113) -   CYS341 (1189) (9)
fragment     10 with    128 AAs     GLN1 (1190) -   SER128 (1317) (10)
fragment     11 with      1 AAs  P0G1601 (1318) -  P0G1601 (1318) (11)
The GPCR-labels align best with fragments: [7, 8, 9] (first-last: GLU30-CYS341).

MUOR
Auto-detected fragments with method 'resSeq'
fragment      0 with     51 AAs     LEU5 (   0) -    ILE55 (50  ) (0)
fragment      1 with     52 AAs   THR182 (  51) -   VAL233 (102 ) (1)
fragment      2 with    114 AAs   ASN241 ( 103) -   PHE354 (216 ) (2)
fragment      3 with    336 AAs     ASP5 ( 217) -   ASN340 (552 ) (3)
fragment      4 with     53 AAs     ILE9 ( 553) -    PHE61 (605 ) (4)
fragment      5 with    281 AAs    MET65 ( 606) -   ARG345 (886 ) (5)
fragment      6 with      5 AAs     TYR1 ( 887) -     ETA5 (891 ) (6)
The GPCR-labels align best with fragments: [5] (first-last: MET65-ARG345).

DOP
Auto-detected fragments with method 'resSeq'
fragment      0 with     53 AAs    ASP11 (   0) -    LEU63 (52  ) (0)
fragment      1 with     51 AAs   THR205 (  53) -   MET255 (103 ) (1)
fragment      2 with    132 AAs   THR263 ( 104) -   LEU394 (235 ) (2)
fragment      3 with    340 AAs     GLY1 ( 236) -   ASN340 (575 ) (3)
fragment      4 with     58 AAs     ASN5 ( 576) -    ARG62 (633 ) (4)
fragment      5 with    128 AAs     GLN1 ( 634) -   SER128 (761 ) (5)
fragment      6 with    146 AAs    PHE20 ( 762) -   LYS165 (907 ) (6)
fragment      7 with     53 AAs   ASN185 ( 908) -   HIS237 (960 ) (7)
fragment      8 with     36 AAs   SER263 ( 961) -   CYS298 (996 ) (8)
fragment      9 with     38 AAs   CYS307 ( 997) -   LEU344 (1034) (9)
fragment     10 with      2 AAs   G3C501 (1035) -   CLR502 (1036) (10)
The GPCR-labels align best with fragments: [6, 7, 8, 9] (first-last: PHE20-LEU344).

Receptors are not 3D Aligned

This is a bit obvious because they come from different PDBs, but helps highlight the point

[5]:
colors = {"MUOR":"tab:red",
          "OPS":"tab:blue",
          "B2AR":"tab:green",
          "DOP": "tab:orange"}
iwd = nglview.NGLWidget()
for ii, (key, geom) in enumerate(pdb_just_receptor.items()):
    iwd.add_trajectory(geom)
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(color=matplotlib.colors.to_hex(colors[key]), component=ii)
iwd

Use the Consensus Labels to generate an AlignerConsensus for MSA

[6]:
AC = mdciao.nomenclature.AlignerConsensus(maps,
                                          tops={key : geom.top for key, geom in pdb_just_receptor.items()})

Sequence Identity within Residues with Consensus Labels

[7]:
AC.sequence_match()
[7]:
OPS B2AR MUOR DOP
OPS 100% 22% 24% 21%
B2AR 22% 100% 28% 42%
MUOR 24% 28% 100% 28%
DOP 21% 42% 28% 100%

Pick a Reference Structure (e.g. Opsin) and 3D-align all Receptors on It

We do this using the CAidxs_match method of the AlignerConsensus that will generate pairs of indices matching one another via their consensus labels. For brevity, here we show the example in the “3.50…3.59” region of TM3, but for the 3D alignment, we take all consensus labels

[8]:
AC.CAidxs_match("3.5*", keys=["OPS","B2AR"])
[8]:
consensus OPS B2AR
115 3.50x50 1065 746
116 3.51x51 1076 757
117 3.52x52 1088 769
118 3.53x53 1095 780
119 3.54x54 1102 785
120 3.55x55 1109 793
121 3.56x56 1115 800
[9]:
ref_key = "OPS"
ref_geom = pdb_just_receptor[ref_key]
for key, geom in pdb_just_receptor.items():
     if key!=ref_key:
        ref_CAs, key_CAs = AC.CAidxs_match(keys=[ref_key, key])[[ref_key, key]].values.T
        geom.superpose(ref_geom, atom_indices=key_CAs, ref_atom_indices=ref_CAs)

Receptors are now 3D-aligned

[10]:
iwd = nglview.NGLWidget()
for ii, (key, geom) in enumerate(pdb_just_receptor.items()):
    iwd.add_trajectory(geom)
    iwd.clear_representations(component=ii)
    iwd.add_cartoon(color=matplotlib.colors.to_hex(colors[key]), component=ii)
iwd
[ ]: