mdciao.cli.sites

mdciao.cli.sites(site_inputs, trajectories, topology=None, ctc_cutoff_Ang=4.5, stride=1, scheme='closest-heavy', chunksize_in_frames=2000, n_smooth_hw=0, pbc=True, GPCR_UniProt='None', CGN_UniProt='None', KLIFS_string=None, fragments='lig_resSeq+', allow_partial_sites=False, default_fragment_index=None, fragment_names=None, output_dir='.', graphic_ext='.pdf', t_unit='ns', curve_color='auto', background=True, graphic_dpi=150, short_AA_names=False, save_nomenclature_files=False, ylim_Ang=10, n_jobs=1, accept_guess=False, table_ext='dat', output_desc='sites', plot_atomtypes=False, distro=False, no_disk=False, savefigs=True, savetabs=True, savetrajs=False, figures=True, plot_timedep=True, progressbar=True)

Compute distances between groups of contact-pairs that are already pre-defined as sites

Parameters:
  • site_inputs (dict or path to file, or list thereof) – Site(s) to compute. A site can be either a path to a site file in json format or directly a site dictionary. A site dictionary is something like

    >>> {"name": "interesting contacts",
    >>>  "pairs": {"AAresSeq": ["L394-K270",
    >>>                         "D381-Q229"]}}
    

    Any site containing a residue that can’t be found in the topology will be discarded. The list in “pairs” can be specified as:

    • ‘AAresSeq’:

    >>> {"name": "interesting contacts",
    >>>  "pairs": {"AAresSeq": ["L394-K270",
    >>>                         "D381-Q229"]}}
    The 'AAresSeq' definitions are transferable to
    another system where the same amino acids are
    present, regardless of their actual zero-indexing.
    
    • ‘residx’:

    >>> {"name": "interesting contacts",
    >>>  "pairs": {"residx":[[353,972],
    >>>                      [340,956]]}}
    The 'residx' definitions are only transferable
    across systems as long as both systems share the same
    zero-indexing scheme.
    
    • ‘consensus’

    >>> {"name": "interesting contacts",
    >>>  "pairs": {"consensus": ["G.H5.26-6.32x32",
    >>>                          "G.H5.13-5.68x68"]}}
    The 'consensus' definitions are transferable to
    another system, even if the selected amino acids are
    different. Please note that, in order
    to use 'consensus' definitions, you need
    to pass at least one of the `GPCR_UniProt`,
    `CGN_UniProt` or `KLIFS_string` arguments, else
    there is no way to know which residues the labels
    belong to.
    

    See mdciao.sites for more info on the site format.
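As an illustration of the site format above, the following sketch writes a site dictionary to a JSON file and reads it back. The file name `interesting_contacts.json` is a hypothetical choice for this example, not something mdciao requires.

```python
import json
import os
import tempfile

# Site dictionary in the 'AAresSeq' format described above
site = {"name": "interesting contacts",
        "pairs": {"AAresSeq": ["L394-K270",
                               "D381-Q229"]}}

# Hypothetical file name, written to a temporary directory for this example
site_file = os.path.join(tempfile.mkdtemp(), "interesting_contacts.json")
with open(site_file, "w") as f:
    json.dump(site, f, indent=2)

# Reading the file back recovers the same dictionary
with open(site_file) as f:
    loaded_site = json.load(f)
```

Either `site` or `site_file` (or a list mixing both) could then be passed as `site_inputs`.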

  • trajectories (str, mdtraj.Trajectory or lists thereof) – The MD-trajectories to calculate the frequencies from. This input is pretty flexible. For more info check mdciao.utils.str_and_dict.get_trajectories_from_input. Accepted values are:

  • topology (str or Trajectory, default is None) – The topology associated with the trajectories. If None, the topology of the first trajectory will be used, i.e. when no topology is passed, the first trajectory has to be either a .gro or .pdb file, or a Trajectory object

  • ctc_cutoff_Ang (float, default is 4.5) – Any residue-residue distance is considered a contact if d<=ctc_cutoff_Ang
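A minimal sketch of how the `ctc_cutoff_Ang` criterion can turn a distance time-trace into a contact frequency. The distance values are made up for illustration, and `contact_frequency` is a hypothetical helper, not part of mdciao.

```python
def contact_frequency(distances_Ang, ctc_cutoff_Ang=4.5):
    """Fraction of frames in which d <= ctc_cutoff_Ang."""
    if len(distances_Ang) == 0:
        raise ValueError("need at least one frame")
    n_contacts = sum(1 for d in distances_Ang if d <= ctc_cutoff_Ang)
    return n_contacts / len(distances_Ang)

# Made-up residue-residue distances (in Angstrom) over five frames
distances = [3.9, 4.5, 4.6, 7.2, 4.1]
freq = contact_frequency(distances)
```

Note that a distance exactly equal to the cutoff counts as a contact, because the comparison is `<=`.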

  • stride (int, default is 1) – Stride the input data by this number of frames

  • scheme (str, default is ‘closest-heavy’) – Type of scheme for computing distance between residues. Choices are {‘ca’, ‘closest’, ‘closest-heavy’, ‘sidechain’, ‘sidechain-heavy’}. See mdtraj documentation for more info

  • chunksize_in_frames (int, default is 2000) – Stream through the trajectories in chunks of this size.

  • n_smooth_hw (int, default is 0) – Plots of the time-traces will be smoothed using a window of 2*n_smooth_hw
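To illustrate what a smoothing half-width means for `n_smooth_hw`, here is a sketch of a centered moving average. It assumes a window that also includes the central frame (2*n_smooth_hw + 1 samples) and drops the edges where the window does not fit. mdciao's exact windowing and edge handling may differ, and `smooth` is a hypothetical helper.

```python
def smooth(values, n_smooth_hw):
    """Centered moving average; returns only points with a full window."""
    if n_smooth_hw == 0:
        return list(values)  # no smoothing requested
    width = 2 * n_smooth_hw + 1
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]

# Made-up time-trace of six frames
trace = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoothed = smooth(trace, 1)
```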

  • pbc (bool, default is True) – Use periodic boundary conditions, i.e. the minimum image convention, to compute distances.
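The minimum image convention mentioned for `pbc` can be sketched as follows, assuming an orthorhombic (rectangular) box for simplicity. The function name and the coordinates are hypothetical; the actual distance computation is delegated to the underlying libraries and also has to handle general box shapes.

```python
import math

def minimum_image_distance(xyz_a, xyz_b, box_lengths):
    """Distance between two points in an orthorhombic periodic box."""
    squared = 0.0
    for a, b, length in zip(xyz_a, xyz_b, box_lengths):
        delta = b - a
        # Shift the displacement by whole box lengths into [-length/2, length/2]
        delta -= length * round(delta / length)
        squared += delta * delta
    return math.sqrt(squared)

# Two atoms near opposite faces of a 10 x 10 x 10 box (arbitrary units)
d_pbc = minimum_image_distance((0.5, 5.0, 5.0), (9.5, 5.0, 5.0),
                               (10.0, 10.0, 10.0))
```

Without periodic boundary conditions the two atoms would be 9 units apart; with the minimum image convention the nearest periodic copy is only 1 unit away.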

  • GPCR_UniProt (str or mdciao.nomenclature.LabelerGPCR, default is None) – For GPCR nomenclature. If str, e.g. “adrb2_human”, try to locate a local filename or do a web lookup in the GPCRdb. If mdciao.nomenclature.LabelerGPCR, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Please note the difference between UniProt Accession Code and UniProt entry name.

  • CGN_UniProt (str or mdciao.nomenclature.LabelerCGN, default is None) – For CGN (G-alpha Numbering definitions) nomenclature. If str, e.g. “gnas2_human”, try to locate local filenames “gnas2_human.xlsx” or do web lookups in the GPCRdb. If mdciao.nomenclature.LabelerCGN, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references.

  • KLIFS_string (str or mdciao.nomenclature.LabelerKLIFS, default is None) – String for kinase KLIFS nomenclature. First, try to locate a local file that directly has the KLIFS_string as a name. If that fails, then combine the filename-format expected by mdciao.nomenclature.LabelerKLIFS with KLIFS_string to construct a filename and check again. If that doesn’t work, then go online to contact the KLIFS database.

    For the online lookup in the KLIFS database, the string has to be formatted “key:value”, which ultimately leads to a given KLIFS entry. Acceptable keys and values for KLIFS_string are:

    • “UniProtAC”, e.g. “UniProtAC:P31751”

    • “kinase_ID”, e.g. “kinase_ID:2”

    • “structure_ID”, e.g. “structure_ID:1904”

    Please check the documentation on mdciao.nomenclature.LabelerKLIFS for a more elaborate explanation on when to pick one of these key:value pairs.

    Finally, if KLIFS_string is an mdciao.nomenclature.LabelerKLIFS, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Also, please note the difference between UniProt Accession Code and UniProt entry name.
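The “key:value” format required for the online KLIFS lookup can be checked with a few lines of code. The helper `split_KLIFS_string` below is hypothetical and only illustrates the format; it is not how mdciao parses the string, and it does not model the local-file lookups described above.

```python
# Keys listed in the documentation above
ACCEPTED_KEYS = ("UniProtAC", "kinase_ID", "structure_ID")

def split_KLIFS_string(KLIFS_string):
    """Split 'key:value' and check the key against the accepted keys."""
    key, sep, value = KLIFS_string.partition(":")
    if not sep or not value:
        raise ValueError(f"expected 'key:value', got {KLIFS_string!r}")
    if key not in ACCEPTED_KEYS:
        raise ValueError(f"key {key!r} is not one of {ACCEPTED_KEYS}")
    return key, value

parsed = [split_KLIFS_string(s) for s in
          ("UniProtAC:P31751", "kinase_ID:2", "structure_ID:1904")]
```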

  • fragments (str, list, None, default is “lig_resSeq+”) – Topology fragments. There exist several input modes:

    • Name of a fragmentation heuristic, e.g. “lig_resSeq+”, which is the default and usually yields good results. See mdciao.fragments.get_fragments for more info on defaults and other heuristics.

    • List of len N that can mix different possibilities:

      • iterable of integers (lists or np.arrays, e.g. np.arange(20,30))

      • ranges expressed as integer strings, “20-30”

      • ranges expressed as residue descriptors [“GLU30-LEU40”]

    • “consensus” : use things like “TM*” or “G.H*”, i.e. GPCR or CGN-sub-subunit labels.

    Numeric expressions are interpreted as zero-indexed and unique residue serial indices, i.e. 30-40 does not necessarily equate to “GLU30-LEU40” unless serial and sequence index coincide. If there’s more than one “GLU30”, the user gets asked to disambiguate. The resulting fragments need not cover all of the topology; they only need to not overlap.
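The requirement that fragments may leave gaps but must not overlap can be expressed as a simple check. This is a sketch with a hypothetical helper, `fragments_overlap`, operating on plain lists or ranges of zero-indexed residue serial indices; it is not mdciao's own validation.

```python
def fragments_overlap(fragments):
    """Return True if any residue index appears in more than one fragment."""
    seen = set()
    for frag in fragments:
        frag = set(frag)
        if seen & frag:
            return True
        seen |= frag
    return False

# Gaps in the coverage (e.g. residues 30 to 44) are allowed
partial_cover = [range(0, 20), range(20, 30), [45, 46, 47]]
# Residues 15 to 19 appear in both fragments, which is not allowed
overlapping = [range(0, 20), range(15, 30)]
```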

  • default_fragment_index (NoneType, default is None) – In case a residue identified as, e.g., “GLU30”, appears more than once in the topology, e.g. in case of a dimer, pass which fragment/monomer should be chosen by default. The default behaviour (None) will prompt the user when necessary.

  • fragment_names (string, list of strings, or None. Default is None.) – Default is not to use fragment names. Otherwise, you can pass a string of comma-separated values or a list of fragment names. You have to provide as many names as there are fragments. The special string “auto” names fragments “frag0”, “frag1” up to the number of fragments. mdciao will use mdciao.utils.str_and_dict.replace4latex to try to generate LaTeX expressions from stuff like “Galpha”.

  • allow_partial_sites (bool, default is False) – If False, a single missing residue is enough to discard an entire site. If True, the site definitions get modified to keep the residues that could be found
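The difference between the two settings of `allow_partial_sites` can be sketched like this. The helper `filter_site` is hypothetical, and it makes two assumptions that go beyond the description above: a pair is kept only if both of its residues are found, and a site with no surviving pairs is discarded even when partial sites are allowed.

```python
def filter_site(site, available_residues, allow_partial_sites=False):
    """Return the (possibly trimmed) site, or None if it gets discarded."""
    pairs = site["pairs"]["AAresSeq"]
    # Assumption: a pair survives only if both of its residues are found
    found = [p for p in pairs
             if all(res in available_residues for res in p.split("-"))]
    if len(found) == len(pairs):
        return site
    if allow_partial_sites and found:
        return {"name": site["name"], "pairs": {"AAresSeq": found}}
    return None

site = {"name": "interesting contacts",
        "pairs": {"AAresSeq": ["L394-K270", "D381-Q229"]}}
# Made-up topology content in which Q229 is missing
residues_in_topology = {"L394", "K270", "D381"}

strict = filter_site(site, residues_in_topology)
partial = filter_site(site, residues_in_topology, allow_partial_sites=True)
```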

  • output_dir (str, default is ‘.’) – directory to which the results are written

  • graphic_ext (str, default is ‘.pdf’) – Extension of the output graphics, default is .pdf

  • t_unit (str, default is ‘ns’) – Unit used for the temporal axis.

  • curve_color (str, default is ‘auto’) – Type of color used for the curves. Alternatives are “P” or “H”

  • background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors:

    • True: use a fainter version of the curve color

    • False: don’t plot any background

    • color-like: use this color for the background; can be: str, hex, rgba, anything matplotlib.pyplot.colors understands

  • graphic_dpi (int, default is 150) – Dots per Inch (DPI) of the graphic output. Only has an effect for bitmap outputs.

  • short_AA_names (bool, default is False) – Use one-letter amino acid names when possible, e.g. K145 instead of Lys145.

  • save_nomenclature_files (bool, default is False) – Save available nomenclature definitions to disk so that they can be accessed locally in later uses.

  • ylim_Ang (int, default is 10) – Limit in Angstrom of the y-axis of the time-traces. Switch to any other float or ‘auto’ for automatic scaling

  • n_jobs (int, default is 1) – Number of processors to use. The parallelization is done over trajectories and not over contacts, so setting n_jobs > n_trajs will not have any additional effect.

  • accept_guess (bool, default is False) – Accept mdciao’s guesses regarding fragment identification using nomenclature labels

  • table_ext (str, default is dat) – Extension for tabled files (.dat, .txt, .xlsx).

  • output_desc (str, default is ‘sites’) – Descriptor for output files.

  • plot_atomtypes (bool, default is False) – Add the atom-types to the frequency bars by ‘hatching’ them:

    • ‘--’ is sidechain-sidechain

    • ‘|’ is backbone-backbone

    • ‘\’ is backbone-sidechain

    • ‘/’ is sidechain-backbone

    See Fig XX for an example

  • distro (bool, default is False) – Plot distance distributions instead of contact bar plots

  • savefigs (bool, default is True) – Save the figures

  • savetabs (bool, default is True) – Save the frequency tables

  • savetrajs (bool, default is False) – Save the timetraces

  • no_disk (bool, default is False) – If True, don’t save any files at all: figs, tables, trajs, nomenclature

  • figures (bool, default is True) – Draw figures

  • plot_timedep (bool, default is True) – Plot time-traces of the contacts

  • progressbar (bool, default is True) – Report progress as the computation advances.

Returns:

CG_site – Keyed with the site names; its values are the mdciao.contacts.ContactGroup objects that make up each site

Return type:

dictionary
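Putting the pieces together, a call could look like the sketch below. The keyword arguments come from the signature above. The file names in the commented-out lines are hypothetical, and the call itself is wrapped in a function so that the rest of the sketch runs without mdciao or real input files.

```python
site = {"name": "interesting contacts",
        "pairs": {"AAresSeq": ["L394-K270", "D381-Q229"]}}

# Keyword arguments taken from the signature above
options = dict(ctc_cutoff_Ang=4.0,
               stride=5,
               no_disk=True,    # don't save any files at all
               figures=False)   # don't draw figures either

def compute_site(trajectory, topology):
    """Call mdciao.cli.sites; needs mdciao installed and real input files."""
    import mdciao  # imported here so the rest of the sketch runs without it
    return mdciao.cli.sites(site, trajectory, topology=topology, **options)

# Hypothetical file names:
# CG_site = compute_site("traj.xtc", "topology.pdb")
# contact_group = CG_site["interesting contacts"]
```

Since the returned dictionary is keyed with the site name, the ContactGroup for this site is retrieved with the string given under "name".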