mdciao.cli.residue_neighborhoods
- mdciao.cli.residue_neighborhoods(residues, trajectories, topology=None, res_idxs=False, ctc_cutoff_Ang=4.5, stride=1, ctc_control=6, n_nearest=4, scheme='closest-heavy', min_freq=0.01, chunksize_in_frames=2000, n_smooth_hw=0, sort=True, pbc=True, ylim_Ang=15, fragments='lig_resSeq+', fragment_names='auto', fragment_colors=None, graphic_ext='.pdf', table_ext='.dat', GPCR_UniProt=None, CGN_UniProt=None, KLIFS_string=None, output_dir='.', output_desc='neighborhood', t_unit='ns', curve_color='auto', background=True, graphic_dpi=150, short_AA_names=False, allow_same_fragment_ctcs=True, save_nomenclature_files=False, plot_timedep=True, n_cols=4, distro=False, n_jobs=1, separate_N_ctcs=False, accept_guess=False, switch_off_Ang=None, plot_atomtypes=False, no_disk=False, savefigs=True, savetabs=True, savetrajs=False, figures=True, naive_bonds=False, progressbar=True)
Per-residue neighborhoods based on contact frequencies between pairs of residues.
A neighborhood is a mdciao.contacts.ContactGroup-object containing a set of mdciao.contacts.ContactPair-objects with a shared residue, called the anchor_residue. The contact frequencies will be printed, plotted and saved. The residue-residue distance time-traces used for their computation will also be returned.
Note
The time-independent figures (e.g. “neighborhood.overall@3.5_Ang.pdf”) are always shown whereas the time-dependent figures (e.g. “neighborhood.GDP395.time_trace@3.5_Ang.pdf”) are never shown, because the number of time-traces becomes very high very quickly. It’s easier to look at them with an outside viewer.
The user may be prompted when necessary, although this behaviour can be turned off with accept_guess.
Input can be from disk and/or from memory (see below).
The computation can be parallelized up to the number of trajectories used (see n_jobs).
Many other optional parameters are exposed to allow fine-tuning of the computing, plotting, printing, and saving. Additional options cover nomenclature, fragmentation heuristics, fragment naming and/or coloring, residue labeling, time-trace averaging, and data streaming.
- Parameters:
residues (int, iterable of ints or str) – The residue(s) for which the neighborhood will be computed. This input is pretty flexible wrt strings and numbers, which are interpreted as sequence indices unless res_idxs is True. Valid inputs are:
residues = [1,10,11,12]
residues = ‘1,10,11,12’
residues = ‘1,10-12’
residues = [1]
residues = 1
residues = ‘1’
residues = ‘1,10-12,GLU*,GDP*,E30’
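The numeric part of these expressions expands to a flat list of indices. A minimal plain-Python sketch of that expansion, assuming ranges like "10-12" are inclusive; `expand_numeric_residues` is a hypothetical helper, not mdciao's own parser, and it does not handle name-based descriptors such as "GLU*" or "E30".

```python
def expand_numeric_residues(residues):
    """Hypothetical helper: expand 1, '1', '1,10-12' or [1, 10, 11, 12]
    into a flat list of ints. Ranges 'a-b' are treated as inclusive."""
    if isinstance(residues, int):
        return [residues]
    if not isinstance(residues, str):
        # Already an iterable of ints
        return [int(r) for r in residues]
    expanded = []
    for token in residues.split(","):
        token = token.strip()
        if "-" in token:
            start, stop = token.split("-")
            expanded.extend(range(int(start), int(stop) + 1))
        else:
            expanded.append(int(token))
    return expanded


print(expand_numeric_residues("1,10-12"))  # [1, 10, 11, 12]
```

All numeric examples listed above ([1,10,11,12], '1,10,11,12' and '1,10-12') resolve to the same list under this reading.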
Please refer to mdciao.utils.residue_and_atom.rangeexpand_residues2residxs for more info.
trajectories (str, mdtraj.Trajectory or lists thereof) – The MD-trajectories to calculate the frequencies from. This input is pretty flexible. For more info check mdciao.utils.str_and_dict.get_trajectories_from_input. Accepted values are:
a pattern, e.g. “*.ext”
one string containing a filename
a list of filenames
one mdtraj.Trajectory object
a list of mdtraj.Trajectory objects
a list mixing filenames and mdtraj.Trajectory objects
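One way to picture how these input modes collapse into a single list of trajectories is the stdlib sketch below. `normalize_trajectory_input` is hypothetical; the assumptions that wildcard patterns are expanded with glob and sorted, and that non-string items pass through untouched, are illustrative and may differ from mdciao.utils.str_and_dict.get_trajectories_from_input.

```python
import glob


def normalize_trajectory_input(trajectories):
    """Hypothetical helper: turn the accepted input types into a flat list.

    Strings containing wildcards are expanded with glob (sorted); plain
    filenames are kept as-is; any non-string item (e.g. an already loaded
    mdtraj.Trajectory) is passed through untouched.
    """
    if isinstance(trajectories, str) or not isinstance(trajectories, (list, tuple)):
        trajectories = [trajectories]
    normalized = []
    for item in trajectories:
        if isinstance(item, str) and any(ch in item for ch in "*?["):
            normalized.extend(sorted(glob.glob(item)))
        else:
            normalized.append(item)
    return normalized


print(normalize_trajectory_input("run1.xtc"))  # ['run1.xtc']
```

Keeping loaded objects and filenames in the same list mirrors the "list mixing filenames and mdtraj.Trajectory objects" mode.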
topology (str or Trajectory, default is None) – The topology associated with the trajectories. If None, the topology of the first trajectory will be used, i.e. when no topology is passed, the first trajectory has to be either a .gro or .pdb file, or a Trajectory object.
- Other Parameters:
res_idxs (bool, default is False) – Whether the indices of residues should be understood as
- zero-indexed, residue serial indices or
- residue sequence, e.g. 30 in GLU30, this is called ‘resSeq’ in an mdtraj.core.Residue-object
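The two readings can be contrasted on a toy topology in plain Python. `toy_topology`, `serial_to_label` and `resseq_to_serials` are hypothetical names used only for illustration; mdciao resolves this against the real mdtraj topology.

```python
# Hypothetical toy topology: (residue name, resSeq) in serial order.
# Two chains both contain a GLU30, so resSeq alone is ambiguous.
toy_topology = [("ALA", 28), ("LYS", 29), ("GLU", 30),
                ("ALA", 28), ("LYS", 29), ("GLU", 30)]


def serial_to_label(res_idx):
    """res_idxs=True: the number is a zero-indexed serial index."""
    name, resseq = toy_topology[res_idx]
    return f"{name}{resseq}"


def resseq_to_serials(resseq):
    """res_idxs=False: the number is a resSeq; return every match."""
    return [idx for idx, (_, rs) in enumerate(toy_topology) if rs == resseq]


print(serial_to_label(2))     # GLU30
print(resseq_to_serials(30))  # [2, 5] -> ambiguous, needs disambiguation
```

Here the serial index 2 and the resSeq 30 point to the same residue, which is exactly why the interpretation has to be stated explicitly.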
ctc_cutoff_Ang (float, default is 4.5) – Any residue-residue distance is considered a contact if d<=ctc_cutoff_Ang
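Averaging this d<=ctc_cutoff_Ang criterion over all frames gives the contact frequency of a residue pair. A minimal NumPy sketch of that averaging for one distance time-trace (not mdciao's implementation, which streams trajectories in chunks); `contact_frequency` is a hypothetical name.

```python
import numpy as np


def contact_frequency(distances_ang, ctc_cutoff_ang=4.5):
    """Fraction of frames in which d <= ctc_cutoff_ang."""
    d = np.asarray(distances_ang, dtype=float)
    return float(np.mean(d <= ctc_cutoff_ang))


# Toy residue-residue distance time-trace (Angstrom), 5 frames
trace = [3.9, 4.5, 5.2, 4.1, 6.0]
print(contact_frequency(trace))  # 0.6
```

Note the inclusive comparison: the frame at exactly 4.5 Å counts as a contact.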
stride (int, default is 1) – Stride the input data by this number of frames
ctc_control (int or float, default is 6) – Control the number of reported contacts. Can be an integer (keep the first n contacts) or a float representing a fraction [0,1] of the total number of contacts. Default is 6.
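The integer and float modes can be sketched as follows. The integer branch follows the description directly; for the float branch, this sketch assumes that "fraction of the total number of contacts" means the cumulative sum of the sorted frequencies relative to their total. `select_by_ctc_control` is a hypothetical helper.

```python
import numpy as np


def select_by_ctc_control(freqs, ctc_control=6):
    """Return indices of the contacts to report, highest frequency first.

    int   -> keep the first ctc_control contacts.
    float -> keep contacts until their cumulative frequency reaches the
             fraction ctc_control of the summed frequencies (assumption).
    """
    freqs = np.asarray(freqs, dtype=float)
    order = np.argsort(freqs)[::-1]  # descending frequency
    if isinstance(ctc_control, (int, np.integer)):
        return order[:ctc_control].tolist()
    cumulative = np.cumsum(freqs[order]) / freqs.sum()
    # First position where the cumulative fraction reaches ctc_control
    n_keep = int(np.searchsorted(cumulative, ctc_control)) + 1
    return order[:n_keep].tolist()


freqs = [0.2, 1.0, 0.3, 0.5]
print(select_by_ctc_control(freqs, 2))    # [1, 3]
print(select_by_ctc_control(freqs, 0.8))  # [1, 3, 2]
```

With a float, the number of reported contacts adapts to how the frequency is distributed, instead of being fixed in advance.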
n_nearest (int, default is 4) – Exclude this many bonded neighbors for each residue
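Finding the residues to exclude amounts to a breadth-first search over the residue bond graph, stopping after n_nearest bonds. A stdlib sketch, assuming bonds are given as (i, j) residue-index pairs (a hypothetical input format); `bonded_neighbors_within` is not an mdciao function.

```python
from collections import deque


def bonded_neighbors_within(bonds, n_residues, anchor, n_nearest=4):
    """Residues reachable from anchor in <= n_nearest residue-residue bonds."""
    adjacency = {i: set() for i in range(n_residues)}
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    hops = {anchor: 0}
    queue = deque([anchor])
    while queue:
        current = queue.popleft()
        if hops[current] == n_nearest:
            continue  # do not walk further than n_nearest bonds
        for neighbor in adjacency[current]:
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    return sorted(idx for idx in hops if idx != anchor)


# Linear chain of 10 residues: 0-1-2-...-9
chain = [(i, i + 1) for i in range(9)]
print(bonded_neighbors_within(chain, 10, anchor=5, n_nearest=2))  # [3, 4, 6, 7]
```

Contacts between the anchor and any residue in the returned list would be excluded from the neighborhood.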
min_freq (float, default is 0.01) – Do not show frequencies smaller than this. If you notice the output being truncated at values too far above this value, you need to increase the ctc_control parameter.
scheme (str, default is ‘closest-heavy’) – Type of scheme for computing distance between residues. Choices are {‘ca’, ‘closest’, ‘closest-heavy’, ‘sidechain’, ‘sidechain-heavy’}. See mdtraj.compute_contacts documentation for more info.
chunksize_in_frames (int, default is 2000) – Stream through the trajectories in chunks of this size.
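Under scheme=‘closest-heavy’, each residue pair is reduced to the smallest distance between any two of their heavy (non-hydrogen) atoms. A single-frame NumPy sketch of that idea, not mdtraj's implementation (which also handles periodic images and many frames); the coordinate/element input format is an assumption.

```python
import numpy as np


def closest_heavy_distance(xyz_a, elements_a, xyz_b, elements_b):
    """Minimum distance between the heavy (non-hydrogen) atoms of two residues.

    xyz_a, xyz_b: (n_atoms, 3) coordinates of a single frame.
    elements_a, elements_b: element symbols, e.g. "C", "N", "H".
    """
    mask_a = np.array([e != "H" for e in elements_a])
    mask_b = np.array([e != "H" for e in elements_b])
    heavy_a = np.asarray(xyz_a, dtype=float)[mask_a]
    heavy_b = np.asarray(xyz_b, dtype=float)[mask_b]
    # All pairwise differences, shape (n_heavy_a, n_heavy_b, 3)
    diff = heavy_a[:, None, :] - heavy_b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


res_a = ([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]], ["C", "H"])
res_b = ([[0.0, 0.0, 1.5], [3.0, 0.0, 0.0]], ["H", "O"])
# 3.0; an all-atom 'closest' scheme would report the H-H distance, 0.5
print(closest_heavy_distance(res_a[0], res_a[1], res_b[0], res_b[1]))
```

Ignoring hydrogens makes the distance less sensitive to whether the topology carries explicit hydrogens at all.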
n_smooth_hw (int, default is 0) – Plots of the time-traces will be smoothed using a window of 2*n_smooth_hw
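Smoothing with a half-width can be sketched as a centered running mean. This sketch assumes an odd window of 2*n_smooth_hw + 1 frames with the edges trimmed; mdciao's exact window length and edge handling may differ. `running_average` is a hypothetical name.

```python
import numpy as np


def running_average(trace, n_smooth_hw=0):
    """Centered running mean over 2*n_smooth_hw + 1 frames (an assumption).

    Frames closer than n_smooth_hw to either end are left out, so the
    output has len(trace) - 2*n_smooth_hw points.
    """
    trace = np.asarray(trace, dtype=float)
    if n_smooth_hw == 0:
        return trace
    width = 2 * n_smooth_hw + 1
    # Sum over each window first, then divide by the window width
    return np.convolve(trace, np.ones(width), mode="valid") / width


print(running_average([1, 2, 3, 4, 5], n_smooth_hw=1))  # [2. 3. 4.]
```

With n_smooth_hw=0 the trace is returned unchanged, matching the "no smoothing" default.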
sort (bool, default is True) – Sort the input residues according to their indices
pbc (bool, default is True) – Use periodic boundary conditions, i.e. the minimum image convention, to compute distances.
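The minimum image convention picks, for every pair of atoms, the closest periodic copy of one relative to the other. A NumPy sketch restricted to orthorhombic (rectangular) boxes, which is an assumption made here for brevity; general triclinic boxes need a more careful treatment.

```python
import numpy as np


def minimum_image_distance(r1, r2, box_lengths):
    """Distance between two points under the minimum image convention,
    for an orthorhombic box with edge lengths box_lengths."""
    delta = np.asarray(r2, dtype=float) - np.asarray(r1, dtype=float)
    box = np.asarray(box_lengths, dtype=float)
    # Shift each component by whole box lengths into [-L/2, L/2]
    delta -= box * np.round(delta / box)
    return float(np.linalg.norm(delta))


# Two atoms near opposite faces of a 50 Angstrom cubic box
print(minimum_image_distance([1.0, 0.0, 0.0], [49.0, 0.0, 0.0], [50.0, 50.0, 50.0]))  # 2.0
```

Without pbc the same pair would be reported 48 Å apart and never counted as a contact.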
ylim_Ang (float, default is 15) – Limit in Angstrom of the y-axis of the time-traces. Default is 15. Switch to any other float or ‘auto’ for automatic scaling
fragments (str, list, None, default is “lig_resSeq+”) – Topology fragments. There exist several input modes:
Name of a fragmentation heuristic, e.g. “lig_resSeq+”, which is the default and usually yields good results. See mdciao.fragments.get_fragments for more info on defaults and other heuristics.
List of len N that can mix different possibilities:
iterable of integers (lists or np.arrays, e.g. np.arange(20,30))
ranges expressed as integer strings, “20-30”
ranges expressed as residue descriptors [“GLU30-LEU40”]
“consensus”: use things like “TM*” or “G.H*”, i.e. GPCR or CGN-sub-subunit labels.
Numeric expressions are interpreted as zero-indexed and unique residue serial indices, i.e. 30-40 does not necessarily equal “GLU30-LEU40” unless serial and sequence index coincide. If there’s more than one “GLU30”, the user gets asked to disambiguate. The resulting fragments need not cover all of the topology; they only need to not overlap.
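The non-overlap requirement for numeric fragment definitions can be checked as sketched below. `parse_fragment` and `check_no_overlap` are hypothetical helpers; they assume “20-30” is inclusive and handle only integer iterables and integer-string ranges, not residue descriptors or consensus labels.

```python
def parse_fragment(spec):
    """Hypothetical helper: turn "20-30" (inclusive) or an iterable of ints
    into a set of zero-indexed residue serial indices."""
    if isinstance(spec, str):
        start, stop = (int(x) for x in spec.split("-"))
        return set(range(start, stop + 1))
    return {int(i) for i in spec}


def check_no_overlap(fragment_specs):
    """Map residue index -> fragment index, raising if a residue is repeated."""
    seen = {}
    for frag_idx, spec in enumerate(fragment_specs):
        for res_idx in parse_fragment(spec):
            if res_idx in seen:
                raise ValueError(
                    f"residue {res_idx} is in fragments {seen[res_idx]} and {frag_idx}")
            seen[res_idx] = frag_idx
    return seen


# Residues 10-19 belong to no fragment, which is allowed
assignment = check_no_overlap(["0-9", range(20, 30)])
print(len(assignment))  # 20
```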
fragment_names (string, list of strings, or None. Default is “auto”.) – The default “auto” names fragments “frag0”, “frag1” up to however many fragments there are. If a string other than “auto”, it has to contain comma-separated values, with enough names for however many fragments there are. mdciao will use mdciao.utils.str_and_dict.replace4latex to try to generate LaTeX expressions from stuff like “Galpha”. You can use None or “” to avoid using fragment names.
fragment_colors (None, boolean or list, default is None) – Assign colors to fragments. These colors will be used to color-code the frequency bars. If True, colors will be automatically selected, otherwise picked from the list. Use with caution, plots get shrill quickly.
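The fragment_names rules can be summarized in a small sketch. `resolve_fragment_names` is hypothetical and skips the LaTeX conversion that mdciao applies; it only mirrors the “auto”, comma-separated and None/“” cases described above.

```python
def resolve_fragment_names(fragment_names, n_fragments):
    """Hypothetical helper mirroring the fragment_names rules."""
    if fragment_names is None or fragment_names == "":
        return None  # no fragment names at all
    if fragment_names == "auto":
        return [f"frag{i}" for i in range(n_fragments)]
    if isinstance(fragment_names, str):
        fragment_names = [name.strip() for name in fragment_names.split(",")]
    if len(fragment_names) < n_fragments:
        raise ValueError(f"need {n_fragments} names, got {len(fragment_names)}")
    return list(fragment_names)


print(resolve_fragment_names("auto", 3))           # ['frag0', 'frag1', 'frag2']
print(resolve_fragment_names("Galpha, Gbeta", 2))  # ['Galpha', 'Gbeta']
```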
graphic_ext (str, default is “.pdf”) – The extension (=format) of the saved figures
table_ext (str, default is “.dat”) – The extension (=format) of the saved tables
GPCR_UniProt (str or mdciao.nomenclature.LabelerGPCR, default is None) – For GPCR nomenclature. If str, e.g. “adrb2_human”, try to locate a local filename or do a web lookup in the GPCRdb. If mdciao.nomenclature.LabelerGPCR, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Please note the difference between UniProt Accession Code and UniProt entry name as explained here.
CGN_UniProt (str or mdciao.nomenclature.LabelerCGN, default is None) – For CGN (G-alpha Numbering definitions) nomenclature. If str, e.g. “gnas2_human”, try to locate local filenames “gnas2_human.xlsx” or do web lookups in the GPCRdb. If mdciao.nomenclature.LabelerCGN, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references.
KLIFS_string (str or mdciao.nomenclature.LabelerKLIFS, default is None) – String for kinase KLIFS nomenclature. First, try to locate a local file that directly has the KLIFS_string as a name. If that fails, then combine the filename-format expected by mdciao.nomenclature.LabelerKLIFS with KLIFS_string to construct a filename and check again. If that doesn’t work, then go online to contact the KLIFS database. For the online lookup in the KLIFS database, the string has to be formatted “key:value”, which ultimately leads to a given KLIFS entry. Acceptable keys and values for KLIFS_string are:
“UniProtAC”, e.g. “UniProtAC:P31751”
“kinase_ID”, e.g. “kinase_ID:2”
“structure_ID”, e.g. “structure_ID:1904”
Please check the documentation on mdciao.nomenclature.LabelerKLIFS for a more elaborate explanation on when to pick one of these key:value pairs. Finally, if KLIFS_string is an mdciao.nomenclature.LabelerKLIFS, use this object directly (allows for object re-use when in API mode). See mdciao.nomenclature for more info and references. Also, please note the difference between UniProt Accession Code and UniProt entry name as explained here.
output_dir (str, default is ‘.’) – Directory to which the results are written.
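For the online lookup, KLIFS_string has to follow the “key:value” format with one of the three keys listed above. A stdlib sketch of that format check; `parse_klifs_string` and `VALID_KLIFS_KEYS` are hypothetical names, and the sketch does not cover the local-file lookups that are tried first.

```python
VALID_KLIFS_KEYS = ("UniProtAC", "kinase_ID", "structure_ID")


def parse_klifs_string(klifs_string):
    """Hypothetical helper: split an online-lookup string "key:value"
    and check the key against the documented options."""
    key, sep, value = klifs_string.partition(":")
    if not sep or not value:
        raise ValueError(f"expected 'key:value', got {klifs_string!r}")
    if key not in VALID_KLIFS_KEYS:
        raise ValueError(f"unknown key {key!r}; use one of {VALID_KLIFS_KEYS}")
    return key, value


print(parse_klifs_string("UniProtAC:P31751"))   # ('UniProtAC', 'P31751')
print(parse_klifs_string("structure_ID:1904"))  # ('structure_ID', '1904')
```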
output_desc (str, default is ‘neighborhood’) – Descriptor for output files.
t_unit (str, default is ‘ns’) – Unit used for the temporal axis.
curve_color (str, default is ‘auto’) – Type of color used for the curves. Alternatives are “P” or “H”
background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors:
* True: use a faded version of the curve's color
* False: don’t plot any background
* color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.colors understands
graphic_dpi (int, default is 150) – Dots per Inch (DPI) of the graphic output. Only has an effect for bitmap outputs.
short_AA_names (bool, default is False) – Use one-letter amino-acid names when possible, e.g. K145 instead of Lys145.
allow_same_fragment_ctcs (bool, default is True) – Allow contacts within the same fragment.
save_nomenclature_files (bool, default is False) – Save available nomenclature definitions to disk so that they can be accessed locally in later uses.
plot_timedep (bool, default is True) – Plot and save time-traces of the contacts
n_cols (int, default is 4) – number of columns of the overall plot.
distro (bool, default is False) – Plot distance distributions instead of contact bar plots
n_jobs (int, default is 1) – Number of processors to use. The parallelization is done over trajectories and not over contacts, so setting n_jobs higher than the number of trajectories has no additional effect.
separate_N_ctcs (bool, default is False) – Separate the plot with the total number of contacts from the time-trace plot.
accept_guess (bool, default is False) – Accept mdciao’s guesses regarding fragment identification using nomenclature labels
switch_off_Ang (NoneType, default is None) – Use a linear switchoff instead of a crisp one.
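A linear switch-off replaces the crisp 0/1 contact criterion with a value that fades out beyond the cutoff. This NumPy sketch assumes the value is 1 up to ctc_cutoff_Ang and decays linearly to 0 at ctc_cutoff_Ang + switch_off_Ang; that exact form is an assumption, not taken from mdciao's source. `linear_switchoff` is a hypothetical name.

```python
import numpy as np


def linear_switchoff(distances_ang, ctc_cutoff_ang=4.5, switch_off_ang=None):
    """Per-frame contact value: 1 up to the cutoff, then (assumption) a linear
    decay to 0 at ctc_cutoff_ang + switch_off_ang. switch_off_ang=None gives
    the crisp 0/1 criterion."""
    d = np.asarray(distances_ang, dtype=float)
    if switch_off_ang is None:
        return (d <= ctc_cutoff_ang).astype(float)
    values = 1.0 - (d - ctc_cutoff_ang) / switch_off_ang
    return np.clip(values, 0.0, 1.0)


print(linear_switchoff([4.0, 5.0, 6.0], switch_off_ang=1.0))  # [1.  0.5 0. ]
```

Averaging these per-frame values over time would then give a "soft" contact frequency.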
plot_atomtypes (bool, default is False) – Add the atom-types to the frequency bars by ‘hatching’ them. ‘--’ is sidechain-sidechain, ‘|’ is backbone-backbone, ‘\’ is backbone-sidechain, ‘/’ is sidechain-backbone. See Fig XX for an example.
savefigs (bool, default is True) – Save the figures
savetabs (bool, default is True) – Save the frequency tables
savetrajs (bool, default is False) – Save the timetraces
no_disk (bool, default is False) – If True, don’t save any files at all: figs, tables, trajs, nomenclature
figures (bool, default is True) – Draw figures
naive_bonds (bool, default is False) – If the topology doesn’t automatically yield a list of bonds between residues, build naive (=linear) bonds using mdciao.utils.bonds.top2residue_bond_matrix_naive. These bonds are needed to exclude bonded neighbors using n_nearest.
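“Naive (=linear)” bonds can be pictured as a residue-residue matrix in which each residue is bonded only to its predecessor and successor in the topology. A NumPy sketch of such a matrix; `naive_residue_bond_matrix` is hypothetical and is not mdciao.utils.bonds.top2residue_bond_matrix_naive. In this sketch, any two consecutive residues count as bonded, even across chain breaks.

```python
import numpy as np


def naive_residue_bond_matrix(n_residues):
    """Square boolean matrix with residue i bonded to i-1 and i+1 only."""
    bonds = np.zeros((n_residues, n_residues), dtype=bool)
    idx = np.arange(n_residues - 1)
    bonds[idx, idx + 1] = True  # i bonded to i+1
    bonds[idx + 1, idx] = True  # and symmetrically i+1 to i
    return bonds


print(naive_residue_bond_matrix(4).astype(int))
```

Such a matrix is all that is needed to find the n_nearest bonded neighbors of each residue along the sequence.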
progressbar (bool, default is True) – Report progress as the computation advances.
- Returns:
neighborhoods – Keyed by unique, zero-indexed residue indices, valued with mdciao.contacts.ContactGroup objects. If no contacts have been found, returns None.
- Return type:
dict