mdciao.contacts.ContactPair
- class mdciao.contacts.ContactPair(res_idxs_pair, ctc_trajs, time_trajs, top=None, trajs=None, atom_pair_trajs=None, fragment_idxs=None, fragment_names=None, fragment_colors=None, anchor_residue_idx=None, consensus_labels=None, consensus_fragnames=None)
Container for a contact between two residues
This is the first level of abstraction of mdciao. It is the “closest” to the actual data, and its methods carry out most of the low-level operations on the data, e.g., the frequency calculations or the basic plotting. Other classes like ContactGroup usually just wrap around a collection of ContactPair objects and use their methods.
This class just needs the pair of residue (serial) indices, the time-traces of the distances between the residues (for all input trajectories), and the time-traces of the timestamps in those trajectories.
Many other pieces of complementary information can be provided as optional parameters, allowing the class to produce better plots, labels, and tables.
Some sanity checks are carried out upon instantiation to ensure things like the same number of steps in the distance and timestamp time-traces.
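The kind of input the class expects, and the kind of check described above, can be sketched without calling mdciao at all. The helper name `check_trace_shapes` is hypothetical and is not part of the mdciao API; it only mimics the idea that every distance trace needs a timestamp trace with the same number of frames.

```python
def check_trace_shapes(ctc_trajs, time_trajs):
    """Hypothetical helper (not mdciao API): raise ValueError unless every
    distance trace has a timestamp trace with the same number of frames."""
    if len(ctc_trajs) != len(time_trajs):
        raise ValueError("different number of trajectories: %d vs %d"
                         % (len(ctc_trajs), len(time_trajs)))
    for ii, (dists, times) in enumerate(zip(ctc_trajs, time_trajs)):
        if len(dists) != len(times):
            raise ValueError("traj %d: %d distances but %d timestamps"
                             % (ii, len(dists), len(times)))
    return True


# Zero-indexed residue serial numbers of the two residues
res_idxs_pair = (10, 25)
# Two trajectories of different lengths (3 and 2 frames), distances in nm
ctc_trajs = [[0.35, 0.42, 0.31], [0.50, 0.33]]
# Matching timestamps, in ps
time_trajs = [[0.0, 10.0, 20.0], [0.0, 10.0]]

shapes_ok = check_trace_shapes(ctc_trajs, time_trajs)
```

Trajectories may differ in length from one another; what has to match is each distance trace and its own timestamp trace.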
Note
Higher-level methods in the API, like those exposed by mdciao.cli, will return ContactPair or ContactGroup objects already instantiated and ready to use. It is recommended to use those instead of instantiating ContactPair or ContactGroup individually.
- __init__(res_idxs_pair, ctc_trajs, time_trajs, top=None, trajs=None, atom_pair_trajs=None, fragment_idxs=None, fragment_names=None, fragment_colors=None, anchor_residue_idx=None, consensus_labels=None, consensus_fragnames=None)
- Parameters:
res_idxs_pair (iterable of two ints) – pair of residue indices, corresponding to the zero-indexed, serial number of the residues
ctc_trajs (list of iterables of floats) – time traces of the contact in nm. len(ctc_trajs) is N_trajs. The trajectories can have different lengths. Will be cast into arrays.
time_trajs (list of iterables of floats) – time traces of the time-values, in ps. Not having the same shape as ctc_trajs will raise an error
top (mdtraj.Topology, default is None) – topology associated with the contact
trajs (list, default is None) – The molecular trajectories for which the contact has been evaluated. The list can contain Trajectory objects or strings with pathnames to the trajectory files. Not having the same shape as ctc_trajs will raise an error
atom_pair_trajs (list of iterables of integers, default is None) – Time traces of the pair of atom indices responsible for the distance in ctc_trajs. Has to be of len(ctc_trajs) and each iterable of shape (Nframes, 2)
fragment_idxs (iterable of two ints, default is None) – Indices of the fragments of the residues of res_idxs_pair
fragment_names (iterable of two strings, default is None) – Names of the fragments of the residues of res_idxs_pair
fragment_colors (iterable of len 2, default is None) – Colors associated to the fragments of the residues of res_idxs_pair. A color is anything that matplotlib.colors recognizes
anchor_residue_idx (int, default is None) – Label this residue as the anchor of the contact, i.e. the residue that’s shared across a number of contacts. Has to be in res_idxs_pair.
Note
- Using this argument will automatically populate other properties, like (this is not a complete list)
anchor_index will contain the [0,1] index of the anchor residue in res_idxs_pair
partner_index will contain the [0,1] index of the partner residue in res_idxs_pair
partner_residue_index will contain the other index of res_idxs_pair
and other properties which depend on having defined an anchor and a partner
- Furthermore, if a topology is passed as an argument:
anchor_residue_name will contain the anchor residue as an mdtraj.core.Topology.Residue object
partner_residue_name will contain the partner residue as an mdtraj.core.Topology.Residue object
consensus_labels (iterable of strings, default is None) – Consensus nomenclature of the residues of res_idxs_pair
consensus_fragnames (iterable of strings, default is None) – Consensus fragment names of the residues of res_idxs_pair
Methods
__init__(res_idxs_pair, ctc_trajs, time_trajs) – Parameters: res_idxs_pair (iterable of two ints) – pair of residue indices, corresponding to the zero-indexed, serial number of the residues
binarize_trajs(ctc_cutoff_Ang[, switch_off_Ang]) – Turn each distance-trajectory into a boolean using a cutoff.
copy() – Copy this object by re-instantiating another ContactPair object with the same attributes.
count_formed_atom_pairs(ctc_cutoff_Ang[, sort]) – Count how many times each atom-pair is considered in contact in the trajectories
distro_overall_trajs([bins]) – Wrapper around numpy.histogram to produce a distribution of the distance values (not the contact frequencies) of this contact over all trajectories
frequency_dict(ctc_cutoff_Ang[, ...]) – Returns the frequency_overall_trajs as a more informative dictionary with keys "freq", "residues", "fragments", "label"
frequency_overall_trajs(ctc_cutoff_Ang[, ...]) – How many times this contact is formed over all frames.
frequency_per_traj(ctc_cutoff_Ang[, ...]) – Contact frequencies for each trajectory
gen_label([AA_format, fragments, delete_anchor]) – Generate a label with different parameters
label_flex([AA_format, pad_label, defrag, ...]) – A more flexible method to produce the label of this ContactPair
partial_counts_formed_atom_pairs(ctc_cutoff_Ang) – Count how many times each atom-pair is considered in contact in the trajectories
plot_distance_distribution([label, ...]) – Plot the distance distribution of this ContactPair
plot_timetrace([ax, color_scheme, ...]) – Plot this ContactPair's timetraces for all trajs onto ax
relative_frequency_of_formed_atom_pairs_overall_trajs(ctc_cutoff_Ang[, ...]) – For those frames in which the contact is formed, group them by relative frequencies of individual atom pairs
retop(top, mapping[, deepcopy]) – Return a copy of this object with a different topology.
save(filename) – Save this ContactPair as a pickle
Attributes
fragments
label
labels
n
neighborhood
residues
stacked_time_traces – The time-traces of the contact distance for each trajectory stacked into one array
time_max – int or float, maximum time from list of list of time
time_min – int or float, minimum time from list of list of time
time_traces – Contains the time-traces, stored as a _TimeTraces object
- binarize_trajs(ctc_cutoff_Ang, switch_off_Ang=None)
Turn each distance-trajectory into a boolean using a cutoff. The comparison is done using “<=”, s.t. d=ctc_cutoff yields True
Whereas ctc_cutoff_Ang is in Angstrom, the trajectories are in nm, as produced by mdtraj.compute_contacts
Note
The method creates a dictionary in self._binarized_trajs keyed with the ctc_cutoff_Ang, to avoid re-computing already binarized trajs
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
- Returns:
bintrajs
- Return type:
list of boolean arrays with the same shape as the trajectories
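The unit handling and the “<=” comparison can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not mdciao's implementation: the function name `binarize` is hypothetical, and the switch_off_Ang argument is deliberately left out.

```python
def binarize(ctc_trajs_nm, ctc_cutoff_Ang):
    """Hypothetical stand-in (not mdciao API): True wherever the distance,
    given in nm, is <= the cutoff, given in Angstrom."""
    cutoff_nm = ctc_cutoff_Ang / 10  # 1 nm = 10 Angstrom
    return [[d <= cutoff_nm for d in traj] for traj in ctc_trajs_nm]


# Two distance trajectories in nm, with 3 and 2 frames
ctc_trajs_nm = [[0.30, 0.35, 0.40], [0.50, 0.20]]

# A 3.5 Angstrom cutoff corresponds to 0.35 nm
bintrajs = binarize(ctc_trajs_nm, 3.5)
# The second frame of the first trajectory sits exactly on the cutoff
# and counts as formed because the comparison is "<="
```

The output keeps the shape of the input: one boolean per frame, one list per trajectory.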
- copy()
Copy this object by re-instantiating another ContactPair object with the same attributes. In theory self == self.copy() should hold (but not self is self.copy()).
- Returns:
CP
- Return type:
ContactPair
- count_formed_atom_pairs(ctc_cutoff_Ang, sort=True)
Count how many times each atom-pair is considered in contact in the trajectories
Ideally we would return a dictionary, but atom pairs are not hashable
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
sort (boolean, default is True) – Return the counts in descending order
- Returns:
atom_pairs (list of atom pairs)
counts (list of ints)
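The two parallel return values can be pictured with the following sketch. It is not mdciao's code: `count_pairs` is a hypothetical helper, and converting each pair to a tuple for counting is a choice made here to sidestep the hashability issue mentioned above.

```python
from collections import Counter


def count_pairs(atom_pair_trajs, bintrajs, sort=True):
    """Hypothetical helper (not mdciao API): count, over the formed frames
    only, how often each atom pair was responsible for the contact."""
    counter = Counter()
    for pairs, formed in zip(atom_pair_trajs, bintrajs):
        for pair, is_formed in zip(pairs, formed):
            if is_formed:
                counter[tuple(pair)] += 1  # tuples are hashable, lists are not
    items = counter.most_common() if sort else list(counter.items())
    atom_pairs = [list(pair) for pair, _ in items]
    counts = [count for _, count in items]
    return atom_pairs, counts


# One (atom_i, atom_j) pair per frame, for two trajectories
atom_pair_trajs = [[[100, 200], [100, 201], [100, 200]],
                   [[101, 200], [100, 200]]]
# Which frames count as formed for some cutoff
bintrajs = [[True, True, False], [False, True]]

atom_pairs, counts = count_pairs(atom_pair_trajs, bintrajs)
# atom_pairs -> [[100, 200], [100, 201]], counts -> [2, 1]
```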
- distro_overall_trajs(bins=10)
Wrapper around numpy.histogram to produce a distribution of the distance values (not the contact frequencies) of this contact over all trajectories
- Parameters:
bins (int or anything numpy.histogram accepts)
- Returns:
h (_np.ndarray) – The counts (integer valued)
x (_np.ndarray) – The bin edges (length(hist)+1).
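What such a wrapper amounts to can be sketched directly with numpy.histogram. Pooling the trajectories with numpy.hstack before histogramming is an assumption about what “over all trajectories” means; the variable names are made up for this example.

```python
import numpy as np

# Two distance trajectories in nm, of different lengths
ctc_trajs_nm = [np.array([0.30, 0.35, 0.40]), np.array([0.50, 0.20])]

# Pool the frames of all trajectories into one flat array
all_distances = np.hstack(ctc_trajs_nm)

# h holds the counts per bin, x holds the bin edges
h, x = np.histogram(all_distances, bins=3)
```

As stated in the return values, there is always one more bin edge than there are bins, and the counts add up to the total number of frames.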
- frequency_dict(ctc_cutoff_Ang, switch_off_Ang=None, atom_types=False, **kwargs_label_flex)
Returns the frequency_overall_trajs as a more informative dictionary with keys “freq”, “residues”, “fragments”, “label”
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
switch_off_Ang (float, default is None) – TODO
atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact
kwargs_label_flex (dict) – Optional arguments for label_flex. The optional parameters of label_flex are:
- Other Parameters:
AA_format (str, default is “short”) –
- Amino-acid format for the label, can be
“short”: A35@4.50
“long”: ALA35@4.50
“just_consensus”: 4.50 if consensus labels are present, else fail
“try_consensus”: 4.50 if consensus labels are present, else fallback to “short”
pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier to read in plain ASCII formats
defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are.
fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True
fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True
- Returns:
fdict
- Return type:
dictionary
- frequency_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None)
How many times this contact is formed over all frames. Frequencies have values between 0 and 1
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
- Returns:
freq – Frequency of the contact over all trajectories
- Return type:
float
- frequency_per_traj(ctc_cutoff_Ang, switch_off_Ang=None)
Contact frequencies for each trajectory
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
- Returns:
freqs
- Return type:
array of len self.n.n_trajs with floats between [0,1]
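The difference between the per-trajectory and the overall frequency is easiest to see on a toy example. The two helpers below are hypothetical and do not call mdciao; the sketch assumes that “over all frames” means pooling the frames of every trajectory before dividing.

```python
def per_traj_freqs(bintrajs):
    """Hypothetical helper: fraction of formed frames in each trajectory."""
    return [sum(traj) / len(traj) for traj in bintrajs]


def overall_freq(bintrajs):
    """Hypothetical helper: fraction of formed frames over all frames."""
    n_formed = sum(sum(traj) for traj in bintrajs)
    n_frames = sum(len(traj) for traj in bintrajs)
    return n_formed / n_frames


# Binarized trajectories with 4 frames and 1 frame, respectively
bintrajs = [[True, True, False, False], [True]]

per_traj = per_traj_freqs(bintrajs)  # [0.5, 1.0]
overall = overall_freq(bintrajs)     # 3 formed frames out of 5 -> 0.6
```

Under that assumption the overall value is weighted by trajectory length, so it need not equal the plain average of the per-trajectory values (0.75 here).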
- gen_label(AA_format='short', fragments=False, delete_anchor=False)
Generate a label with different parameters
- Parameters:
AA_format (str, default is “short”) –
- Options are:
“short”: “E30@3.50”
“long”: GLU30@3.50
“just_consensus”: 3.50, fail if none is found
“try_consensus”: 3.50, fallback to “short” if none is found
fragments (bool, default is False) – Include fragment information. Will get the “best” information available, i.e. consensus>fragname>fragindex. When trying to get consensus labels, this option is ignored, s.t. the full “E30@3.50” is returned regardless.
delete_anchor (bool, default is False) – Delete the anchor from the label
- Returns:
label – The contact label, containing both or only one residue, depending on the value of delete_anchor.
- Return type:
str
- label_flex(AA_format='short', pad_label=True, defrag=None, fmt1='%-15s', fmt2='%-15s')
A more flexible method to produce the label of this ContactPair
- Parameters:
AA_format (str, default is “short”) –
- Amino-acid format for the label, can be
“short”: A35@4.50
“long”: ALA35@4.50
“just_consensus”: 4.50 if consensus labels are present, else fail
“try_consensus”: 4.50 if consensus labels are present, else fallback to “short”
pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier to read in plain ASCII formats
defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are.
fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True
fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True
- Returns:
label
- Return type:
str
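The effect of the default fmt1 and fmt2 values can be checked with Python's printf-style formatting alone. The residue labels are taken from the examples in this page; the " - " separator used to join the two halves is an assumption made for illustration, not something this page specifies.

```python
fmt1 = "%-15s"  # left-justify in a field of width 15
fmt2 = "%-15s"

# (res1, res2) labels for two different contacts
contacts = [("E30@3.50", "ALA35@4.50"),
            ("GLU30@3.50", "A35@4.50")]

# Hypothetical separator, chosen only for this sketch
separator = " - "

padded_labels = [(fmt1 % res1) + separator + (fmt2 % res2)
                 for res1, res2 in contacts]

# Printed one below the other, the separators line up in the same column
for label in padded_labels:
    print(label)
```

Because every half is padded to the same width, stacked labels align even when the residue names have different lengths.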
- partial_counts_formed_atom_pairs(ctc_cutoff_Ang, switch_off_Ang=None, sort=True)
Count how many times each atom-pair is considered in contact in the trajectories
Since the switch_off_Ang parameter introduces partial counts, the return value need not be integer counts
Ideally we would return a dictionary, but atom pairs are not hashable
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
sort (boolean, default is True) – Return the counts in descending order
- Returns:
atom_pairs (list of atom pairs)
counts (list of ints)
- plot_distance_distribution(label=None, shorten_AAs=False, defrag=None, ctc_cutoff_Ang=None, delete_anchor=False, xlim=None, **kwargs_histogram_w_smoothing_auto) → Axes
Plot the distance distribution of this ContactPair
- Parameters:
label (str or None, default is None) – Default behavior is to construct the label automatically using shorten_AAs, defrag, and ctc_cutoff_Ang, but any label can be passed here to override automatic label generation.
shorten_AAs (bool, default is False) – Shorten residue labels from e.g. GLU30 to E30
defrag (None or char) – None means do not defrag the contact label. A character, e.g. “@” means use this character to defrag
ctc_cutoff_Ang (float or None, default is None) – If float, use this cutoff to compute frequencies and add them to the labels. Also, draw a vertical line in the plot. Before the vertical line is drawn, it’s checked whether the plot already contains a similar line.
delete_anchor (bool, default is False) – If True (and possible), the anchor residue will be deleted from the label.
kwargs_histogram_w_smoothing_auto (dict) – Optional parameters for mdciao.plots.histogram_w_smoothing_auto, which are listed below
- Other Parameters:
bins (int, default is 10) – Since this will be passed directly to numpy.histogram, it can also take the same values that numpy.histogram_bin_edges can take.
ax (Axes or None, default is None) – The axis to draw onto. If None, the current axis will be used, invoking gca. If there’s no current axis, one will be created.
smooth_bw (bool or float, default is True) – If True, smooth the histogram using a Gaussian-kernel-density estimation with an estimator bandwidth of .5 (Angstrom). If float, use this value as estimator bandwidth, check matplotlib.mlab.GaussianKDE for more info.
background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors
True: use a faint version of color
False: don’t plot any background
color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.pyplot.colors understands
fill_below (bool, default is True) – Fill the area underneath the histogram with a shade of color
color (None or color-like, default is None) – Default behaviour is to take the next color of the color-cycle of the plot.
alpha_below (float, default is .25) – The area below the curve will be filled with this alpha (transparency) value. Only has an effect if fill_below is True
maxcount (bool or positive float, default is False) – Normalize when plotting the histogram, s.t. different datasets can be plotted together at the same height even with very different numbers of absolute counts. If True, counts will be normalized to the maximum number of counts, s.t. histograms will peak at 1. If any other positive value, that’s where the peak will be.
- Returns:
ax – The axis (new or passed in) where the distribution has been plotted.
- Return type:
Axes
- plot_timetrace(ax=None, color_scheme=None, ctc_cutoff_Ang=None, switch_off_Ang=None, n_smooth_hw=0, dt=1, background=True, shorten_AAs=False, t_unit='ps', ylim_Ang=10, max_handles_per_row=4)
Plot this ContactPair’s timetraces for all trajs onto ax
- Parameters:
ax (None, Axes) – The axis where to plot the timetrace. Default is to plot on the current axis, and if there is no current axis, a new one will be created. If a new one is created, it’ll have the default width and height; you have to change it afterwards or create it beforehand with your desired size.
color_scheme (list, default is None) – Pass a list of colors, each one should be understandable by matplotlib.colors.is_color_like
ctc_cutoff_Ang (float or None, default is None) – The cutoff to use, in Angstrom. If None, don’t use any cutoff.
n_smooth_hw (int, default is 0) – Size, in frames, of half the smoothing window
dt (float, default is 1) – How many units of t_unit one frame represents
background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors
True: use a faint version of color
False: don’t plot any background
color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.colors.is_color_like understands
shorten_AAs (bool, default is False) – Whether to shorten the AA labels
t_unit (str, default is ‘ps’) – The time unit with which to label the x-axis
ylim_Ang (float or “auto”) – The limit in Angstrom of the y-axis
max_handles_per_row (int, default is 4) – How many handles per row the legend can have
- Returns:
ax – The axis with the plotted timetrace
- Return type:
Axes
- relative_frequency_of_formed_atom_pairs_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None, keep_resname=False, aggregate_by_atomtype=True, min_freq=0.05)
For those frames in which the contact is formed, group them by relative frequencies of individual atom pairs
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
keep_resname (bool, default is False) – Keep the atom’s residue name in its descriptor. Only makes sense if aggregate_by_atomtype is False
aggregate_by_atomtype (bool, default is True) – Aggregate the frequencies of the contact by the atom types involved. Atom types are backbone, sidechain or other (BB, SC, X)
min_freq (float, default is .05) – Do not report relative frequencies below this cutoff, e.g. “BB-BB”:.9, “BB-SC”:0.03, “SC-SC”:0.03, “SC-BB”:0.03 gets reported as “BB-BB”:.9
- Returns:
out_dict – Relative freqs, keyed by atom-type (atoms) involved in the contact. The order is the same as in self.ctc_labels
- Return type:
dictionary
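The min_freq example given for this method can be reproduced with a one-line filter. The helper `drop_rare` is hypothetical; in particular, whether a value exactly equal to min_freq is kept is an assumption of this sketch (it uses “>=”).

```python
def drop_rare(rel_freqs, min_freq=0.05):
    """Hypothetical helper (not mdciao API): keep only the entries whose
    relative frequency reaches min_freq."""
    return {key: val for key, val in rel_freqs.items() if val >= min_freq}


# Relative frequencies keyed by atom-type pair, as in the example above
rel_freqs = {"BB-BB": 0.9, "BB-SC": 0.03, "SC-SC": 0.03, "SC-BB": 0.03}

reported = drop_rare(rel_freqs)
# reported -> {"BB-BB": 0.9}
```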
- retop(top, mapping, deepcopy=False, **CP_kwargs)
Return a copy of this object with a different topology.
Uses the mapping to generate new residue- and atom-indices where necessary, using the rest of the object’s attributes (time-traces, labels, colors, fragments…) as they were.
Note
- This method will (rightly) fail if:
the mapping doesn’t contain the needed residues
the individual atoms of those residues cannot be uniquely mapped between topologies
- Parameters:
top (Topology) – The new topology
mapping (indexable (array, dict, list)) – A mapping of old residue indices to new residue indices. Usually, comes from aligning the old and the new topology using mdciao.utils.sequence.maptops. These maps only contain (key,value) pairs whenever there’s been a “match”, s.t. this method will fail if mapping doesn’t contain all the residues in this ContactPair.
deepcopy (bool, default is False) – Use copy.deepcopy on the attributes when creating the new ContactPair. If False, the identity holds:
>>> self.residues.consensus_labels is CP.residues.consensus_labels
If True, only the equality holds:
>>> self.residues.consensus_labels == CP.residues.consensus_labels
Note that time_traces are always created new no matter what.
CP_kwargs (dict) – Optional keyword arguments to instantiate the new ContactPair. Any key-value pairs passed here will update the internal dictionary being used, which is:
>>> { "top": top, "trajs": self.time_traces.trajs, "fragment_idxs": self.fragments.idxs, "fragment_names": self.fragments.names, "fragment_colors": self.fragments.colors, "anchor_residue_idx": anchor_residue_index, "consensus_labels": self.residues.consensus_labels }
- Returns:
CP – A new CP with updated top and indices
- Return type:
ContactPair
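The residue-index part of retop can be pictured with a small sketch. It covers only the dict flavour of mapping and only the residue indices, not the atom indices; `map_residue_pair` is a hypothetical helper and not the method's actual code.

```python
def map_residue_pair(res_idxs_pair, mapping):
    """Hypothetical helper (not mdciao API): translate old residue indices
    into new ones, failing if a residue has no match in the dict mapping."""
    missing = [idx for idx in res_idxs_pair if idx not in mapping]
    if missing:
        raise KeyError("residues %s are not in the mapping" % missing)
    return tuple(mapping[idx] for idx in res_idxs_pair)


# Old residue index -> new residue index, only for matched residues
mapping = {10: 12, 25: 27, 26: 28}

new_pair = map_residue_pair((10, 25), mapping)  # (12, 27)

# A residue without a match makes the translation fail
try:
    map_residue_pair((10, 99), mapping)
    failed = False
except KeyError:
    failed = True
```

Failing early on an incomplete mapping mirrors the behaviour described in the note: a contact cannot be carried over if one of its residues has no counterpart in the new topology.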
- save(filename)
Save this ContactPair as a pickle
- Parameters:
filename (str) – filename
- property stacked_time_traces: ndarray
The time-traces of the contact distance for each trajectory stacked into one array
- Returns:
stacked_time_traces
- Return type:
np.ndarray
- property time_max
Maximum time from list of list of time
- Return type:
int or float
- property time_min
Minimum time from list of list of time
- Return type:
int or float
- property time_traces
Contains the time-traces, stored as a _TimeTraces object
- property top
- property topology