mdciao.cli.interface
mdciao.cli.interface(trajectories, topology=None, frag_idxs_group_1=None, frag_idxs_group_2=None, GPCR_uniprot='None', CGN_PDB='None', KLIFS_uniprotAC=None, chunksize_in_frames=10000, ctc_cutoff_Ang=3.5, curve_color='auto', fragments=['lig_resSeq+'], fragment_names='', graphic_dpi=150, graphic_ext='.pdf', background=True, interface_cutoff_Ang=35, ctc_control=20, n_smooth_hw=0, output_desc='interface', output_dir='.', short_AA_names=False, stride=1, t_unit='ns', plot_timedep=True, accept_guess=False, n_jobs=1, n_nearest=0, sort_by_av_ctcs=True, scheme='closest-heavy', separate_N_ctcs=False, table_ext='dat', title=None, min_freq=0.1, contact_matrix=True, cmap='binary', flareplot=True, save_nomenclature_files=False, no_disk=False, savefigs=True, savetabs=True, savetrajs=False, figures=True, self_interface=False)
Contact-frequencies between two groups of residues
The groups of residues can be defined directly by using residue indices or by defining molecular fragments and using these definitions as a shorthand to address large sub-domains of the molecular topology. See in particular the documentation for fragments, frag_idxs_group_1, and frag_idxs_group_2.
Typically, the two groups of residues that form both sides of the interface, also called interface members, do not share common residues, because the members belong to different molecular units. For example, in a receptor–G-protein complex, one partner is the receptor and the other partner is the G-protein.
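A minimal sketch of how a call for such a two-partner complex might be assembled. The file names traj.xtc and topology.pdb are hypothetical placeholders, the keyword names are taken from the signature above, and the fragment indices assume that the fragmentation yields the receptor as fragment 0 and the G-protein as fragment 1. The call is wrapped in a function (run_interface, a name chosen for this sketch) so that nothing is executed until real files are supplied.

```python
# Keyword arguments taken from the signature above; the values are illustrative.
interface_kwargs = dict(
    topology="topology.pdb",     # hypothetical file name
    frag_idxs_group_1="0",       # assumption: fragment 0 is the receptor
    frag_idxs_group_2="1",       # assumption: fragment 1 is the G-protein
    ctc_cutoff_Ang=3.5,          # contact if d <= 3.5 Angstrom
    interface_cutoff_Ang=35,     # pre-select pairs within 35 Angstrom in the reference topology
    ctc_control=20,              # report the first 20 contacts
    no_disk=True,                # don't save any files at all
    figures=False,               # don't draw figures
)


def run_interface(trajectories="traj.xtc"):
    """Call mdciao.cli.interface with the settings above.

    mdciao is imported lazily so that this sketch can be read and
    adapted without the package or the (hypothetical) files present.
    """
    import mdciao

    return mdciao.cli.interface(trajectories, **interface_kwargs)
```

Setting no_disk=True and figures=False keeps such a run free of output files, which may be convenient when only the returned object is of interest.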
By default, mdciao.cli.interface doesn’t allow interface members to share residues. However, sometimes it’s useful to allow it because the contacts of one fragment with itself (the self-contacts) are also important, e.g. the C-terminus of a receptor interfacing with the entire receptor, including the C-terminus. To allow for this behaviour, use self_interface=True, and possibly increase n_nearest, since otherwise contacts between neighboring residues of the shared set (e.g. the C-terminus) will always appear as formed.
- Parameters
trajectories –
The MD-trajectories to calculate the frequencies from. This input is pretty flexible. For more info check mdciao.utils.str_and_dict.get_sorted_trajectories. Accepted values are:
* pattern, e.g. “*.ext”
* one string containing a filename
* list of filenames
* one Trajectory object
* list of Trajectory objects
topology (str or Trajectory, default is None) – The topology associated with the trajectories. If None, the topology of the first trajectory will be used, i.e. when no topology is passed, the first trajectory has to be either a .gro or .pdb file, or a Trajectory object.
frag_idxs_group_1 (NoneType, default is None) – Indices of the fragments that belong to group_1. Strings can be CSVs and include ranges, e.g. ‘1,3-4’, or be consensus labels, e.g. “TM*,-TM6”. Defaults to None, which will prompt the user for information, except when only two fragments are present, in which case it defaults to [0].
frag_idxs_group_2 (NoneType, default is None) – Indices of the fragments that belong to group_2. Strings can be CSVs and include ranges, e.g. ‘1,3-4’, or be consensus labels, e.g. “TM*,-TM6”. Defaults to None, which will prompt the user for information, except when only two fragments are present, in which case it defaults to [1].
GPCR_uniprot (str or mdciao.nomenclature.LabelerGPCR, default is None) – For GPCR nomenclature. If str, e.g. “adrb2_human”, will try to locate a local filename or do a web lookup in the GPCRdb. If mdciao.nomenclature.LabelerGPCR, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Please note the difference between UniProt Accession Code and UniProt entry name.
CGN_PDB (str or mdciao.nomenclature.LabelerCGN, default is None) – For CGN (G-alpha Numbering definitions) nomenclature. If str, e.g. “3SN6”, try to locate local filenames (“3SN6.pdb”, “CGN_3SN6.txt”) or do web lookups in https://www.mrc-lmb.cam.ac.uk/CGN/ and http://www.rcsb.org/. If mdciao.nomenclature.LabelerCGN, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references.
KLIFS_uniprotAC (str or mdciao.nomenclature.LabelerKLIFS, default is None) – UniProt Accession Code for kinase KLIFS nomenclature. If str, e.g. “P31751”, try to locate a local filename or do a web lookup in the KLIFS database. If mdciao.nomenclature.LabelerKLIFS, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Please note the difference between UniProt Accession Code and UniProt entry name.
chunksize_in_frames (int, default is 10000) – Stream through the trajectory data in chunks of this many frames. Can lead to memory errors if n_jobs makes it so that e.g. 4 trajectories of 10000 frames each are loaded to memory and their residue-residue distances computed.
ctc_cutoff_Ang (float, default is 3.5) – Any residue-residue distance is considered a contact if d<=ctc_cutoff_Ang
curve_color (str, default is 'auto') – Type of color used for the curves. Alternatives are “P” or “H”
fragments (list, default is ['lig_resSeq+']) –
Fragment control. For compatibility reasons, it has to be a list, even if it only has one element. There exist several input modes:
* [“consensus”] : use things like “TM*” or “G.H*”, i.e. GPCR or CGN-sub-subunit labels.
* List of len 1 with some fragmentation heuristic, e.g. [“lig_resSeq+”] : will use the default of mdciao.fragments.get_fragments. See there for info on defaults and other heuristics.
* List of len N that can mix different possibilities:
  * iterable of integers (lists or np.arrays, e.g. np.arange(20,30))
  * ranges expressed as integer strings, “20-30”
  * ranges expressed as residue descriptors [“GLU30-LEU40”]
Numeric expressions are interpreted as zero-indexed and unique residue serial indices, i.e. 30-40 does not necessarily equate to “GLU30-LEU40” unless serial and sequence index coincide. If there’s more than one “GLU30”, the user gets asked to disambiguate. The resulting fragments need not cover all of the topology, they only need to not overlap.
fragment_names (str or list, default is '') – If string, it has to be a list of comma-separated values. Has to contain names for all fragments that result from fragments, or more. mdciao will try to use replace4latex to generate LaTeX expressions from strings like “Galpha”. If you want unnamed fragments, use None, “None”, or “”.
graphic_dpi (int, default is 150) – Dots per inch (DPI) of the graphic output. Only has an effect for bitmap outputs.
graphic_ext (str, default is '.pdf') – The extension (=format) of the saved figures
background (bool or color-like (str, hex, rgb), default is True) –
When smoothing, the original curve can appear in the background in different colors:
* True: use a faint version of color
* False: don’t plot any background
* color-like: use this color for the background; can be str, hex, rgba, anything matplotlib.colors understands
interface_cutoff_Ang (float, default is 35) – The interface between both groups is defined as the set of group_1-group_2 distances that are within this cutoff in the reference topology. Without a cutoff, a large number of unnecessary distances (e.g. between N-terminus and G-protein) would be computed. Setting this cutoff to None is equivalent to using no cutoff, i.e. all possible contacts are considered.
ctc_control (int, default is 20) – Control the number of reported contacts. Can be an integer (keep the first n contacts) or a float representing a fraction [0,1] of the total number of contacts.
n_smooth_hw (int, default is 0) – Plots of the time-traces will be smoothed using a window of 2*n_smooth_hw
output_desc (str, default is 'interface') – Descriptor for output files.
output_dir (str, default is '.') – Directory to which the results are written.
short_AA_names (bool, default is False) – Use one-letter amino acid names when possible, e.g. K145 instead of Lys145.
stride (int, default is 1) – Stride the input data by this number of frames
t_unit (str, default is 'ns') – Unit used for the temporal axis.
plot_timedep (bool, default is True) – Plot and save time-traces of the contacts
accept_guess (bool, default is False) – Accept mdciao’s guesses regarding fragment identification using nomenclature labels
n_jobs (int, default is 1) – Number of processors to use. The parallelization is done over trajectories and not over contacts, so increasing n_jobs beyond the number of trajectories will not have any effect.
n_nearest (int, default is 0) – Exclude this many bonded neighbors for each residue. Usually, the chosen molecular fragments belong to different chains and don’t share any bonds, so this parameter has no effect. However, if you choose to compare molecular fragments that are bonded (e.g. the C-terminus with the rest of the molecule), there’s one pair that’ll be bonded across the fragment boundary, yielding one contact that’s always formed. Setting n_nearest to 1 will delete this contact.
sort_by_av_ctcs (bool, default is True) – When presenting the results summarized by residue, sort by sum of frequencies (~average number of contacts).
scheme (str, default is 'closest-heavy') – Type of scheme for computing distance between residues. Choices are {‘ca’, ‘closest’, ‘closest-heavy’, ‘sidechain’, ‘sidechain-heavy’}. See the mdtraj.compute_contacts documentation for more info.
separate_N_ctcs (bool, default is False) – Separate the plot with the total number of contacts from the time-trace plot.
table_ext (str, default is "dat") – The extension (=format) of the saved tables
title (NoneType, default is None) – Name of the system. Used for figure titles (not filenames). Defaults to output_desc if None is given.
min_freq (float, default is 0.1) – Do not show frequencies smaller than this. If you notice the output being truncated at values too far away from this, you need to increase the ctc_control parameter.
contact_matrix (bool, default is True) – Produce a plot of the interface contact matrix
cmap (str, default is 'binary') – The colormap for the contact matrix. Default is ‘binary’, which is black and white, but you can choose anything from here: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
flareplot (bool, default is True) – Produce a flare plot of the interface contact matrix. Regardless of the graphic_ext, the flareplot will always be in .pdf-format, unless graphic_ext is ‘svg’.
save_nomenclature_files (bool, default is False) – Save available nomenclature definitions to disk so that they can be accessed locally in later uses.
no_disk (bool, default is False) – If True, don’t save any files at all: figs, tables, trajs, nomenclature
savefigs (bool, default is True) – Save the figures
savetabs (bool, default is True) – Save the frequency tables
savetrajs (bool, default is False) – Save the timetraces
figures (bool, default is True) – Draw figures
self_interface (bool, default is False) – Allow the interface members to share residues
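How ctc_cutoff_Ang, ctc_control and min_freq relate to each other can be sketched in plain Python. This is not mdciao's implementation, only an illustration under stated assumptions: distances_Ang is a made-up mapping from residue pairs to per-frame distances, contact_frequencies and select_contacts are hypothetical helpers, and only the integer form of ctc_control is covered.

```python
def contact_frequencies(distances_Ang, ctc_cutoff_Ang=3.5):
    """Fraction of frames in which each pair satisfies d <= ctc_cutoff_Ang."""
    return {
        pair: sum(d <= ctc_cutoff_Ang for d in ds) / len(ds)
        for pair, ds in distances_Ang.items()
    }


def select_contacts(freqs, ctc_control=20, min_freq=0.1):
    """Sort by descending frequency, keep the first ctc_control contacts,
    then drop those whose frequency is smaller than min_freq."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:ctc_control]
    return [(pair, f) for pair, f in kept if f >= min_freq]


# Made-up per-frame distances (in Angstrom) for three hypothetical residue pairs
distances_Ang = {
    ("res_a", "res_x"): [3.1, 3.4, 3.6, 3.2],  # formed in 3 of 4 frames
    ("res_b", "res_y"): [4.0, 3.5, 5.2, 6.1],  # formed in 1 of 4 frames
    ("res_c", "res_z"): [7.0, 8.2, 6.9, 7.5],  # never formed
}

freqs = contact_frequencies(distances_Ang)
shown = select_contacts(freqs, ctc_control=2, min_freq=0.1)
truncated = select_contacts(freqs, ctc_control=1, min_freq=0.1)
```

In this sketch, truncated holds a single contact even though a second one exceeds min_freq. This is the situation in which the reported list stops well above min_freq and ctc_control needs to be increased.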
- Returns
CG_interface – The object containing the mdciao.contacts.ContactPair objects that form the interface.
- Return type