mdciao.cli.interface
- mdciao.cli.interface(trajectories, topology=None, fragments='lig_resSeq+', interface_selection_1=None, interface_selection_2=None, AA_selection=None, GPCR_UniProt='None', CGN_UniProt='None', KLIFS_string=None, chunksize_in_frames=2000, ctc_cutoff_Ang=4.5, curve_color='auto', fragment_names=None, graphic_dpi=150, graphic_ext='.pdf', background=True, ctc_control=50, n_smooth_hw=0, pbc=True, output_desc='interface', output_dir='.', short_AA_names=False, stride=1, t_unit='ns', plot_timedep=True, accept_guess=False, n_jobs=1, n_nearest=0, sort_by_av_ctcs=True, scheme='closest-heavy', separate_N_ctcs=False, table_ext='dat', title=None, min_freq=0.1, contact_matrix=True, cmap='binary', flareplot=True, save_nomenclature_files=False, no_disk=False, savefigs=True, savetabs=True, savetrajs=False, figures=True, self_interface=False, n_repframes=1, progressbar=True)
Contact-frequencies between two groups of residues
The two groups of residues can be defined directly:
by using specific residue indices or ranges
by using defined molecular fragments, e.g. chains defined in the topology or pdb-file.
by guessing molecular fragments, using some fragmentation heuristic.
by guessing molecular fragments, using a consensus nomenclature like GPCR, CGN or KLIFS generic residue numbering.
The fragment definition and the fragment selection are separate, i.e. there might be six chains but one can specify to compute the interface between chains [0,1] vs [2,3]. Read more in the documentation for fragments, interface_selection_1, and interface_selection_2.
One can further refine the fragment selection at the level of single aminoacids (AAs) using AA_selection. This can fine-tune the residues of interest if the fragment definitions are too broad. See the docstring for more info.
Typically, the two groups of residues conforming both sides of the interface, also called interface members, do not share common residues, because the members belong to different molecular units. For example, in a receptor–G-protein complex, one partner is the receptor and the other partner is the G-protein.
Note
If your definitions of interface_selection_1 and interface_selection_2 lead to some overlap between the interface members (see below), mdciao’s default is to ignore contact pairs within the same fragment. E.g., in the context of a GPCR, computing “TM3” vs “TM*” (“TM3” vs “all TMs”) won’t include TM3-TM3 contacts by default. To include these (or equivalent) contacts set self_interface = True.
Another example could be computing the interface of the C-terminus of a receptor with the entire receptor, where it might be useful to include the contacts of the C-terminus with itself.
When using self_interface = True, it’s advisable to increase n_nearest, since otherwise neighboring residues of the shared set (the TM3-TM3 or the Cterm-Cterm) will always appear as formed.
See the documentation on fragments, interface_selection_1, interface_selection_2, AA_selection, n_nearest and self_interface.
Finally, the interface strength, defined as the per-residue sum of contacts participating in the interface, is written as the bfactor in a .pdb file called (for the default ctc_cutoff_Ang`=4) ‘interface.overall@4.0_Ang.as_bfactors.pdb’. You can see an example of how to use this file (e.g. with VMD) in the online documentation. The structures, i.e. frames, in that .pdb-file are chosen using the method :obj:`mdciao.contacts.ContactGroup.repframes . See below the parameter n_repframes for more info.
- Parameters:
trajectories (str,
mdtraj.Trajectory
or lists thereof) – The MD-trajectories to calculate the frequencies from. This input is pretty flexible. For more info checkmdciao.utils.str_and_dict.get_trajectories_from_input
. Accepted values are:pattern, e.g. “*.ext”
one string containing a filename
list of filenames
one
mdtraj.Trajectory
objectlist of
mdtraj.Trajectory
objectslist mixing filenames and
mdtraj.Trajectory
objects
topology (str or
Trajectory
, default is None) – The topology associated with thetrajectories
If None, the topology of the firsttrajectory
will be used, i.e. when notopology
is passed, the firsttrajectory
has to be either a .gro or .pdb file, or anTrajectory
objectfragments (str, list, None, default is “lig_resSeq+”) –
How to fragment the topology. Will be used for:
tagging of residues, e.g. “GLU30@frag1”
disambiguation of residues, e.g. more than one “GLU30” exists.
grouping of residues in graphical representations, e.g. flareplots
defining the interface fragments
There exist several input modes:
A single string with the name of a fragmentation heuristic, e.g. “lig_resSeq+”, which is the default and usually yields good results. See
mdciao.fragments.get_fragments
for more info on defaults and other heuristics.A list of definitions. Each entry of this list can be:
an iterable of integers (lists or np.arrays, e.g. np.arange(20,30)
a range expressed as an integer string, “20-30”
a ranges expressed as residue descriptors “GLU30-LEU40”
A special string, “consensus”, to use consensus
subdomains, like “TM1” or “G.H5”, as fragment definitions.
Numeric expressions are interpreted as zero-indexed, unique residue serial indices, i.e. 30-40 does not necessarily equate “GLU30-LEU40” unless serial and sequence index coincide. If there’s more than one “GLU30”, the user gets asked to disambiguate.
Please note, since fragment definiton and fragment selection are separate, one can use consensus definitions to define the interface regardless of having passed “consensus” here. I.e., you can use fragments=’chains’ to divide the topology for representation and residue-tagging purposes but then define the interface as:
>>> interface_selection_1="TM3" >>> interface_selection_2="TM2"
to compute the interface of TM3 vs TM2 in a GPCR. For this mode of selection to work, the only condition is that the consensus labels have been provided via GPCR_Uniprot, CGN_UniProt or KLIFS_string (see below).
interface_selection_1 (str or list, default is None) – Selection of the fragments that belong to one side of the interface. Strings can be CSVs and include:
ranges, e.g. ‘1,3-4’
wildcards, e.g. “TM*” or “G.H.??”
exclusions, e.g. “TM*,-TM6” (all TMs except TM6)
The default (None) is to prompt the user for information, except when:
fragments yielded only one fragment that doesn’t cover the whole topology. Then all othe residues are put into a second fragment and then the interface is computed between these two fragments.
fragments yielded just two fragments. Then the interface is computed between these two fragments.
interface_selection_2 (str or list, default is None) – Selection of the fragments that belong to the other side of the interface. Strings can be CSVs and include:
ranges, e.g. ‘1,3-4’
wildcards, e.g. “TM*” or “G.H.??”
exclusions, e.g. “TM*,-TM6” (all TMs except TM6)
The default (None) is to prompt the user for information, except when:
fragments yielded only one fragment that doesn’t cover the whole topology. Then all othe residues are put into a second fragment and then the interface is computed between these two fragments.
fragments yielded just two fragments. Then the interface is computed between these two fragments.
AA_selection (str or list, default is None) – Whatever the fragment definition and fragment selection has been, one can further refine the list of potential residue pairs by making a selection at the level of single aminoacids (AAs). E.g., if (like above) one has selected the interface to be “TM3” vs “TM2”,
>>> interface_selection_1="TM3" >>> interface_selection_2="TM2"
but wants to select only some regions of those helices, one can pass here an AA_selection. This can be a string or a list of two items:
A string leads to a boolean “or” selection, i.e. keep residue pair [ii,jj] if either ii or jj match AA_selection. E.g.
>>> AA_selection = "3.45-3.55"
is equivalent of “3.45-3.55” vs “TM2” contacts
A list of with two items (each a string expression) leads to a boolean “and” selection, i.e. keep residue pair [ii,jj] if ii and jj match AA_selection. E.g.
>>> AA_selection = ["3.45-3.55","2.45-2.55"]
is equivalent of “3.45-3.55” vs “2.45-2.55” contacts.
The strings for the selection are interpreted by
rangeexpand_residues2residxs
, so read there for more info on what expressions are allowed, like mixed descriptors and wildcards, eg: “GLU*,ARG*,GDP*,LEU394,GLU30-ARG50”. are valid.Finally, CSVs are interpreted as boolean “or”, i.e.:
>>> AA_selection = "GLU30,TRP50"
will select pairs that contain GLU30 or TRP50. If you are sure about your residue pair selection, i.e. you have a very specific list of residue-pairs you want to compute, use
mdciao.cli.sites
.GPCR_UniProt (str or
mdciao.nomenclature.LabelerGPCR
, default is None) – For GPCR nomenclature. If str, e.g. “adrb2_human”. will try to locate a local filename or do a web lookup in the GPCRdb. Ifmdciao.nomenclature.LabelerGPCR
, use this object directly (allows for object re-use when in API mode). Seemdciao.nomenclature
for more info and references. Please note the difference between UniProt Accession Code and UniProt entry name as explained here .CGN_UniProt (str or
mdciao.nomenclature.LabelerCGN
, default is None) – For CGN (G-alpha Numbering definitions) nomenclature. If str, e.g. “gnas2_human”, try to locate local filenames “gnas2_human.xlsx” or do web lookups in the GPCRdb. Ifmdciao.nomenclature.LabelerCGN
, use this object directly (allows for object re-use when in API mode) Seemdciao.nomenclature
for more info and references.KLIFS_string (str or
mdciao.nomenclature.LabelerKLIFS
, default is None) – String for kinase KLIFS nomenclature. First, try to locate a local file that directly has the KLIFS_string as a name. If that fails, then combine the filename-format expected bymdciao.nomenclature.LabelerKLIFS
with KLIFS_string to construct a filename and check again. If that doesn’t work, then go online to contact the KLIFS database.For the online lookup in the KLIFS database, the string has to be formatted “key:value”, which ultimately leads to a given KLIFS entry. Acceptable keys and values for KLIFS_string are:
“UniProtAC”, e.g. “UniProtAC:P31751”
“kinase_ID”, e.g. “kinase_ID:2”
“structure_ID”, e.g. “structure_ID:1904”, e.g. “P31751”,
Please check the documentation on
mdciao.nomenclature.LabelerKLIFS
for a more elaborate explanation on when to pick one of these key:value pairs.Finally, if KLIFS_string is an
mdciao.nomenclature.LabelerKLIFS
, use this object directly (allows for object re-use when in API mode). Seemdciao.nomenclature
for more info and references. Alos, please note the difference between UniProt Accession Code and UniProt entry name as explained here .chunksize_in_frames (int, default is 2000) – Stream through the trajectories in chunks of this size.
ctc_cutoff_Ang (float, default is 4.5) – Any residue-residue distance is considered a contact if d<=ctc_cutoff_Ang
curve_color (str, default is ‘auto’) – Type of color used for the curves. Alternatives are “P” or “H”
fragment_names (string, list of strings, or None. Default is None.) – Default is not to use fragment names. Otherwise, you can pass a string of comma-separated values or a list of fragment names. You have to provide as many names as there are fragments. The special string “auto” names fragments “frag0”, “frag1” up to the number of fragments. mdciao will use
mdciao.utils.str_and_dict.replace4latex
to try to generate LaTeX expressions from stuff like “Galpha”.graphic_dpi (int, default is 150) – Dots per Inch (DPI) of the graphic output. Only has an effect for bitmap outputs.
graphic_ext (str, default is ‘.pdf’) – The extension (=format) of the saved figures
background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors * True: use a fainted version of
color
* False: don’t plot any background * color-like: use this color for the background,can be: str, hex, rgba, anything matplotlib.pyplot.colors understands
ctc_control (int, default is 50) – Control the number of reported contacts. Can be an integer (keep the first n contacts) or a float representing a fraction [0,1] of the total number of contacts. Default is 50.
n_smooth_hw (int, default is 0) – Plots of the time-traces will be smoothed using a window of 2*n_smooth_hw
pbc (bool, default is True) – Use periodic boundary conditions, i.e. the minimum image convention, to compute distances.
output_desc (str, default is ‘interface’) – Descriptor for output files.
output_dir (str, default is ‘.’) – Directory to which the results are written.
short_AA_names (bool, default is False) – Use one-letter aminoacid names when possible, e.g. K145 instead of Lys145.
stride (int, default is 1) – Stride the input data by this number of frames
t_unit (str, default is ‘ns’) – Unit used for the temporal axis.
plot_timedep (bool, default is True) – Plot and save time-traces of the contacts
accept_guess (bool, default is False) – Accept mdciao’s guesses regarding fragment identification using nomenclature labels
n_jobs (int, default is 1) – Number of processors to use. The parallelization is done over trajectories and not over contacts, beyond n_jobs>n_trajs parallelization will not have any effect.
n_nearest (int, default is 0) – Exclude these many bonded neighbors for each residue. Usually, the chosen molecular fragments belong to different chains and don’t share any bonds, so this parameter has no effect. However, if you choose to compare molecular fragments that are bonded (e.g. the C-terminus with the rest of the molecule), there’s one pair that’ll be bonded across the fragment-boundary, yielding one contact that’s always formed. Setting
n_nearest
to 1 will delete this contact.sort_by_av_ctcs (bool, default is True) – When presenting the results summarized by residue, sort by sum of frequencies (~average number of contacts). Default is True.
scheme (str, default is ‘closest-heavy’) – Type of scheme for computing distance between residues. Choices are {‘ca’, ‘closest’, ‘closest- heavy’, ‘sidechain’, ‘sidechain-heavy’}. See
mdtraj.compute_distances
documentation for more infoseparate_N_ctcs (bool, default is False) – Separate the plot with the total number contacts from the time-trace plot.
table_ext (str, default is “dat”) – The extension (=format) of the saved tables
title (NoneType, default is None) – Name of the system. Used for figure titles (not filenames) Defaults to
output_desc
if None is givenmin_freq (float, default is 0.1) – Do not show frequencies smaller than this. If you notice the output being truncated at values too far above this value, you need to increase the
ctc_control
parameter.contact_matrix (bool, default is True) – Produce a plot of the interface contact matrix
cmap (str, default is ‘binary’) – The colormap for the contact matrix. Default is ‘binary’ which is black and white, but you can choose anything from here: https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
flareplot (bool, default is True) – Produce a flare plot of interface the contact matrix. Regardless of the
graphic_ext
, the flareplot will always be in .pdf-format, unlessgraphic_ext
is ‘svg’.save_nomenclature_files (bool, default is False) – Save available nomenclature definitions to disk so that they can be accessed locally in later uses.
no_disk (bool, default is False) – If True, don’t save any files at all: figs, tables, trajs, nomenclature
savefigs (bool, default is True) – Save the figures
savetabs (bool, default is True) – Save the frequency tables
savetrajs (bool, default is False) – Save the timetraces
figures (bool, default is True) – Draw figures
self_interface (bool, default is False) – Allow the interface members to share residues
n_repframes (int, default is 1) – The number of representative frames to write to the .pdb file ‘interface.overall@4.0_Ang.as_bfactors.pdb’, where the interface strength has been stored as bfactor. A “representative frame” means, in this context, a frame where the pairs of residues participating in the interface are, on average over all pairs, at a distance close to the most-likely the residue-residue distances over all data. This has some caveats, expressed in the documentation of
mdciao.contacts.ContactGroup.repframes
. To check what frames have been chosen as representative, it is better to run mdciao in API mode and callmdciao.contacts.ContactGroup.n_repframes
with show_violins=True. A value of 0 means don’t write any such .pdb file. Max value of 50 frames is enforced.progressbar (bool, default is True) – Report progress as the computation advances.
- Returns:
CG_interface – The object containing the
mdciao.contacts.ContactPair
objects tha conform the interface. If no contacts have been found, returns None.- Return type: