mdciao.cli.interface

mdciao.cli.interface(trajectories, topology=None, fragments='lig_resSeq+', interface_selection_1=None, interface_selection_2=None, AA_selection=None, GPCR_UniProt='None', CGN_UniProt='None', KLIFS_string=None, chunksize_in_frames=2000, ctc_cutoff_Ang=4.5, curve_color='auto', fragment_names=None, graphic_dpi=150, graphic_ext='.pdf', background=True, ctc_control=50, n_smooth_hw=0, pbc=True, output_desc='interface', output_dir='.', short_AA_names=False, stride=1, t_unit='ns', plot_timedep=True, accept_guess=False, n_jobs=1, n_nearest=0, sort_by_av_ctcs=True, scheme='closest-heavy', separate_N_ctcs=False, table_ext='dat', title=None, min_freq=0.1, contact_matrix=True, cmap='binary', flareplot=True, save_nomenclature_files=False, no_disk=False, savefigs=True, savetabs=True, savetrajs=False, figures=True, self_interface=False, n_repframes=1, progressbar=True)

Contact-frequencies between two groups of residues

The two groups of residues can be defined directly:

  • by using specific residue indices or ranges

  • by using defined molecular fragments, e.g. chains defined in the topology or pdb-file.

  • by guessing molecular fragments, using some fragmentation heuristic.

  • by guessing molecular fragments, using a consensus nomenclature like GPCR, CGN or KLIFS generic residue numbering.

The fragment definition and the fragment selection are separate, i.e. there might be six chains but one can specify to compute the interface between chains [0,1] vs [2,3]. Read more in the documentation for fragments, interface_selection_1, and interface_selection_2.

One can further refine the fragment selection at the level of single aminoacids (AAs) using AA_selection. This can fine-tune the residues of interest if the fragment definitions are too broad. See the docstring for more info.

Typically, the two groups of residues conforming both sides of the interface, also called interface members, do not share common residues, because the members belong to different molecular units. For example, in a receptor–G-protein complex, one partner is the receptor and the other partner is the G-protein.

Note

If your definitions of interface_selection_1 and interface_selection_2 lead to some overlap between the interface members (see below), mdciao’s default is to ignore contact pairs within the same fragment. E.g., in the context of a GPCR, computing “TM3” vs “TM*” (“TM3” vs “all TMs”) won’t include TM3-TM3 contacts by default. To include these (or equivalent) contacts set self_interface = True.

Another example could be computing the interface of the C-terminus of a receptor with the entire receptor, where it might be useful to include the contacts of the C-terminus with itself.

When using self_interface = True, it’s advisable to increase n_nearest, since otherwise neighboring residues of the shared set (the TM3-TM3 or the Cterm-Cterm) will always appear as formed.

See the documentation on fragments, interface_selection_1, interface_selection_2, AA_selection, n_nearest and self_interface.

Finally, the interface strength, defined as the per-residue sum of contacts participating in the interface, is written as the bfactor in a .pdb file called (for the default ctc_cutoff_Ang`=4) ‘interface.overall@4.0_Ang.as_bfactors.pdb’. You can see an example of how to use this file (e.g. with VMD) in the online documentation. The structures, i.e. frames, in that .pdb-file are chosen using the method :obj:`mdciao.contacts.ContactGroup.repframes . See below the parameter n_repframes for more info.

Parameters:
  • trajectories (str, mdtraj.Trajectory or lists thereof) – The MD-trajectories to calculate the frequencies from. This input is pretty flexible. For more info check mdciao.utils.str_and_dict.get_trajectories_from_input. Accepted values are:

  • topology (str or Trajectory, default is None) – The topology associated with the trajectories If None, the topology of the first trajectory will be used, i.e. when no topology is passed, the first trajectory has to be either a .gro or .pdb file, or an Trajectory object

  • fragments (str, list, None, default is “lig_resSeq+”) –

    How to fragment the topology. Will be used for:

    • tagging of residues, e.g. “GLU30@frag1

    • disambiguation of residues, e.g. more than one “GLU30” exists.

    • grouping of residues in graphical representations, e.g. flareplots

    • defining the interface fragments

    There exist several input modes:

    • A single string with the name of a fragmentation heuristic, e.g. “lig_resSeq+”, which is the default and usually yields good results. See mdciao.fragments.get_fragments for more info on defaults and other heuristics.

    • A list of definitions. Each entry of this list can be:

    • an iterable of integers (lists or np.arrays, e.g. np.arange(20,30)

    • a range expressed as an integer string, “20-30”

    • a ranges expressed as residue descriptors “GLU30-LEU40”

    • A special string, “consensus”, to use consensus

    subdomains, like “TM1” or “G.H5”, as fragment definitions.

    Numeric expressions are interpreted as zero-indexed, unique residue serial indices, i.e. 30-40 does not necessarily equate “GLU30-LEU40” unless serial and sequence index coincide. If there’s more than one “GLU30”, the user gets asked to disambiguate.

    Please note, since fragment definiton and fragment selection are separate, one can use consensus definitions to define the interface regardless of having passed “consensus” here. I.e., you can use fragments=’chains’ to divide the topology for representation and residue-tagging purposes but then define the interface as:

    >>> interface_selection_1="TM3"
    >>> interface_selection_2="TM2"
    

    to compute the interface of TM3 vs TM2 in a GPCR. For this mode of selection to work, the only condition is that the consensus labels have been provided via GPCR_Uniprot, CGN_UniProt or KLIFS_string (see below).

  • interface_selection_1 (str or list, default is None) – Selection of the fragments that belong to one side of the interface. Strings can be CSVs and include:

    • ranges, e.g. ‘1,3-4’

    • wildcards, e.g. “TM*” or “G.H.??”

    • exclusions, e.g. “TM*,-TM6” (all TMs except TM6)

    The default (None) is to prompt the user for information, except when:

    • fragments yielded only one fragment that doesn’t cover the whole topology. Then all othe residues are put into a second fragment and then the interface is computed between these two fragments.

    • fragments yielded just two fragments. Then the interface is computed between these two fragments.

  • interface_selection_2 (str or list, default is None) – Selection of the fragments that belong to the other side of the interface. Strings can be CSVs and include:

    • ranges, e.g. ‘1,3-4’

    • wildcards, e.g. “TM*” or “G.H.??”

    • exclusions, e.g. “TM*,-TM6” (all TMs except TM6)

    The default (None) is to prompt the user for information, except when:

    • fragments yielded only one fragment that doesn’t cover the whole topology. Then all othe residues are put into a second fragment and then the interface is computed between these two fragments.

    • fragments yielded just two fragments. Then the interface is computed between these two fragments.

  • AA_selection (str or list, default is None) – Whatever the fragment definition and fragment selection has been, one can further refine the list of potential residue pairs by making a selection at the level of single aminoacids (AAs). E.g., if (like above) one has selected the interface to be “TM3” vs “TM2”,

    >>> interface_selection_1="TM3"
    >>> interface_selection_2="TM2"
    

    but wants to select only some regions of those helices, one can pass here an AA_selection. This can be a string or a list of two items:

    • A string leads to a boolean “or” selection, i.e. keep residue pair [ii,jj] if either ii or jj match AA_selection. E.g.

      >>> AA_selection = "3.45-3.55"
      

      is equivalent of “3.45-3.55” vs “TM2” contacts

    • A list of with two items (each a string expression) leads to a boolean “and” selection, i.e. keep residue pair [ii,jj] if ii and jj match AA_selection. E.g.

      >>> AA_selection = ["3.45-3.55","2.45-2.55"]
      

      is equivalent of “3.45-3.55” vs “2.45-2.55” contacts.

    The strings for the selection are interpreted by rangeexpand_residues2residxs, so read there for more info on what expressions are allowed, like mixed descriptors and wildcards, eg: “GLU*,ARG*,GDP*,LEU394,GLU30-ARG50”. are valid.

    Finally, CSVs are interpreted as boolean “or”, i.e.:

    >>> AA_selection = "GLU30,TRP50"
    

    will select pairs that contain GLU30 or TRP50. If you are sure about your residue pair selection, i.e. you have a very specific list of residue-pairs you want to compute, use mdciao.cli.sites.

  • GPCR_UniProt (str or mdciao.nomenclature.LabelerGPCR, default is None) – For GPCR nomenclature. If str, e.g. “adrb2_human”. will try to locate a local filename or do a web lookup in the GPCRdb. If mdciao.nomenclature.LabelerGPCR, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Please note the difference between UniProt Accession Code and UniProt entry name as explained here .

  • CGN_UniProt (str or mdciao.nomenclature.LabelerCGN, default is None) – For CGN (G-alpha Numbering definitions) nomenclature. If str, e.g. “gnas2_human”, try to locate local filenames “gnas2_human.xlsx” or do web lookups in the GPCRdb. If mdciao.nomenclature.LabelerCGN, use this object directly (allows for object re-use when in API mode) See mdciao.nomenclature for more info and references.

  • KLIFS_string (str or mdciao.nomenclature.LabelerKLIFS, default is None) – String for kinase KLIFS nomenclature. First, try to locate a local file that directly has the KLIFS_string as a name. If that fails, then combine the filename-format expected by mdciao.nomenclature.LabelerKLIFS with KLIFS_string to construct a filename and check again. If that doesn’t work, then go online to contact the KLIFS database.

    For the online lookup in the KLIFS database, the string has to be formatted “key:value”, which ultimately leads to a given KLIFS entry. Acceptable keys and values for KLIFS_string are:

    • “UniProtAC”, e.g. “UniProtAC:P31751”

    • “kinase_ID”, e.g. “kinase_ID:2”

    • “structure_ID”, e.g. “structure_ID:1904”, e.g. “P31751”,

    Please check the documentation on mdciao.nomenclature.LabelerKLIFS for a more elaborate explanation on when to pick one of these key:value pairs.

    Finally, if KLIFS_string is an mdciao.nomenclature.LabelerKLIFS, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Alos, please note the difference between UniProt Accession Code and UniProt entry name as explained here .

  • chunksize_in_frames (int, default is 2000) – Stream through the trajectories in chunks of this size.

  • ctc_cutoff_Ang (float, default is 4.5) – Any residue-residue distance is considered a contact if d<=ctc_cutoff_Ang

  • curve_color (str, default is ‘auto’) – Type of color used for the curves. Alternatives are “P” or “H”

  • fragment_names (string, list of strings, or None. Default is None.) – Default is not to use fragment names. Otherwise, you can pass a string of comma-separated values or a list of fragment names. You have to provide as many names as there are fragments. The special string “auto” names fragments “frag0”, “frag1” up to the number of fragments. mdciao will use mdciao.utils.str_and_dict.replace4latex to try to generate LaTeX expressions from stuff like “Galpha”.

  • graphic_dpi (int, default is 150) – Dots per Inch (DPI) of the graphic output. Only has an effect for bitmap outputs.

  • graphic_ext (str, default is ‘.pdf’) – The extension (=format) of the saved figures

  • background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors * True: use a fainted version of color * False: don’t plot any background * color-like: use this color for the background,

    can be: str, hex, rgba, anything matplotlib.pyplot.colors understands

  • ctc_control (int, default is 50) – Control the number of reported contacts. Can be an integer (keep the first n contacts) or a float representing a fraction [0,1] of the total number of contacts. Default is 50.

  • n_smooth_hw (int, default is 0) – Plots of the time-traces will be smoothed using a window of 2*n_smooth_hw

  • pbc (bool, default is True) – Use periodic boundary conditions, i.e. the minimum image convention, to compute distances.

  • output_desc (str, default is ‘interface’) – Descriptor for output files.

  • output_dir (str, default is ‘.’) – Directory to which the results are written.

  • short_AA_names (bool, default is False) – Use one-letter aminoacid names when possible, e.g. K145 instead of Lys145.

  • stride (int, default is 1) – Stride the input data by this number of frames

  • t_unit (str, default is ‘ns’) – Unit used for the temporal axis.

  • plot_timedep (bool, default is True) – Plot and save time-traces of the contacts

  • accept_guess (bool, default is False) – Accept mdciao’s guesses regarding fragment identification using nomenclature labels

  • n_jobs (int, default is 1) – Number of processors to use. The parallelization is done over trajectories and not over contacts, beyond n_jobs>n_trajs parallelization will not have any effect.

  • n_nearest (int, default is 0) – Exclude these many bonded neighbors for each residue. Usually, the chosen molecular fragments belong to different chains and don’t share any bonds, so this parameter has no effect. However, if you choose to compare molecular fragments that are bonded (e.g. the C-terminus with the rest of the molecule), there’s one pair that’ll be bonded across the fragment-boundary, yielding one contact that’s always formed. Setting n_nearest to 1 will delete this contact.

  • sort_by_av_ctcs (bool, default is True) – When presenting the results summarized by residue, sort by sum of frequencies (~average number of contacts). Default is True.

  • scheme (str, default is ‘closest-heavy’) – Type of scheme for computing distance between residues. Choices are {‘ca’, ‘closest’, ‘closest- heavy’, ‘sidechain’, ‘sidechain-heavy’}. See mdtraj.compute_distances documentation for more info

  • separate_N_ctcs (bool, default is False) – Separate the plot with the total number contacts from the time-trace plot.

  • table_ext (str, default is “dat”) – The extension (=format) of the saved tables

  • title (NoneType, default is None) – Name of the system. Used for figure titles (not filenames) Defaults to output_desc if None is given

  • min_freq (float, default is 0.1) – Do not show frequencies smaller than this. If you notice the output being truncated at values too far above this value, you need to increase the ctc_control parameter.

  • contact_matrix (bool, default is True) – Produce a plot of the interface contact matrix

  • cmap (str, default is ‘binary’) – The colormap for the contact matrix. Default is ‘binary’ which is black and white, but you can choose anything from here: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html

  • flareplot (bool, default is True) – Produce a flare plot of interface the contact matrix. Regardless of the graphic_ext, the flareplot will always be in .pdf-format, unless graphic_ext is ‘svg’.

  • save_nomenclature_files (bool, default is False) – Save available nomenclature definitions to disk so that they can be accessed locally in later uses.

  • no_disk (bool, default is False) – If True, don’t save any files at all: figs, tables, trajs, nomenclature

  • savefigs (bool, default is True) – Save the figures

  • savetabs (bool, default is True) – Save the frequency tables

  • savetrajs (bool, default is False) – Save the timetraces

  • figures (bool, default is True) – Draw figures

  • self_interface (bool, default is False) – Allow the interface members to share residues

  • n_repframes (int, default is 1) – The number of representative frames to write to the .pdb file ‘interface.overall@4.0_Ang.as_bfactors.pdb’, where the interface strength has been stored as bfactor. A “representative frame” means, in this context, a frame where the pairs of residues participating in the interface are, on average over all pairs, at a distance close to the most-likely the residue-residue distances over all data. This has some caveats, expressed in the documentation of mdciao.contacts.ContactGroup.repframes. To check what frames have been chosen as representative, it is better to run mdciao in API mode and call mdciao.contacts.ContactGroup.n_repframes with show_violins=True. A value of 0 means don’t write any such .pdb file. Max value of 50 frames is enforced.

  • progressbar (bool, default is True) – Report progress as the computation advances.

Returns:

CG_interface – The object containing the mdciao.contacts.ContactPair objects tha conform the interface. If no contacts have been found, returns None.

Return type:

mdciao.contacts.ContactGroup