mdciao.contacts.ContactPair

class mdciao.contacts.ContactPair(res_idxs_pair, ctc_trajs, time_trajs, top=None, trajs=None, atom_pair_trajs=None, fragment_idxs=None, fragment_names=None, fragment_colors=None, anchor_residue_idx=None, consensus_labels=None, consensus_fragnames=None)

Container for a contact between two residues

This is the first level of abstraction of mdciao. It is the “closest” to the actual data, and its methods carry out most of the low-level operations on the data, e.g., the frequency calculations or the basic plotting. Other classes like ContactGroup usually just wrap around a collection of ContactPair-objects and use their methods.

This class just needs the pair of residue (serial) indices, the time-traces of the distances between the residues (for all input trajectories), and the time-traces of the timestamps in those trajectories.

Many other pieces of complementary information can be provided as optional parameters, allowing the class to produce better plots, labels, and tables.

Some sanity checks are carried out upon instantiation to ensure things like the same number of steps in the distance and timestamp time-traces.
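
The kind of consistency check described above can be sketched in plain Python. The helper name check_traces is hypothetical and not part of mdciao; the real class may check more (or different) things.

```python
def check_traces(ctc_trajs, time_trajs):
    """Raise ValueError unless every distance trace matches its time trace in length."""
    if len(ctc_trajs) != len(time_trajs):
        raise ValueError(
            f"{len(ctc_trajs)} distance traces but {len(time_trajs)} time traces"
        )
    for ii, (dists, times) in enumerate(zip(ctc_trajs, time_trajs)):
        if len(dists) != len(times):
            raise ValueError(
                f"trajectory {ii}: {len(dists)} distances but {len(times)} timestamps"
            )


# Trajectories may differ in length from each other, as long as each
# distance trace (nm) has as many frames as its own time trace (ps).
ctc_trajs = [[0.30, 0.35, 0.50], [0.45, 0.32]]
time_trajs = [[0.0, 10.0, 20.0], [0.0, 10.0]]
check_traces(ctc_trajs, time_trajs)  # passes silently
```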

Note

Higher-level methods in the API, like those exposed by mdciao.cli, will return ContactPair or ContactGroup objects already instantiated and ready to use. It is recommended to use those instead of instantiating ContactPair or ContactGroup individually.

__init__(res_idxs_pair, ctc_trajs, time_trajs, top=None, trajs=None, atom_pair_trajs=None, fragment_idxs=None, fragment_names=None, fragment_colors=None, anchor_residue_idx=None, consensus_labels=None, consensus_fragnames=None)
Parameters:
  • res_idxs_pair (iterable of two ints) – pair of residue indices, corresponding to the zero-indexed, serial number of the residues

  • ctc_trajs (list of iterables of floats) – time traces of the contact in nm. len(ctc_trajs) is N_trajs. Each traj can have a different length. Will be cast into arrays.

  • time_trajs (list of iterables of floats) – time traces of the time-values, in ps. Not having the same shape as ctc_trajs will raise an error

  • top (mdtraj.Topology, default is None) – topology associated with the contact

  • trajs (list, default is None) – The molecular trajectories for which the contact has been evaluated. The list can contain Trajectory objects or strings with pathnames to the trajectory files. Not having the same shape as ctc_trajs will raise an error

  • atom_pair_trajs (list of iterables of integers, default is None) – Time traces of the pair of atom indices responsible for the distance in ctc_trajs. Has to be of len(ctc_trajs) and each iterable of shape (Nframes, 2)

  • fragment_idxs (iterable of two ints, default is None) – Indices of the fragments that the residues of res_idxs_pair belong to

  • fragment_names (iterable of two strings, default is None) – Names of the fragments that the residues of res_idxs_pair belong to

  • fragment_colors (iterable of len 2, default is None) – Colors associated to the fragments of the residues of res_idxs_pair. A color is anything that matplotlib.colors recognizes

  • anchor_residue_idx (int, default is None) – Label this residue as the anchor of the contact, i.e. the residue that’s shared across a number of contacts. Has to be in res_idxs_pair.

    Note

    Using this argument will automatically populate other properties, like (this is not a complete list)
    • anchor_index will contain the [0,1] index of the anchor residue in res_idxs_pair

    • partner_index will contain the [0,1] index of the partner residue in res_idxs_pair

    • partner_residue_index will contain the other index of res_idxs_pair

    and other properties which depend on having defined an anchor and a partner

    Furthermore, if a topology is passed as an argument:
    • anchor_residue_name will contain the anchor residue as an mdtraj.core.Topology.Residue object

    • partner_residue_name will contain the partner residue as an mdtraj.core.Topology.Residue object

  • consensus_labels (iterable of strings, default is None) – Consensus nomenclature of the residues of res_idxs_pair

  • consensus_fragnames (iterable of strings, default is None) – Consensus fragments names of the residues of res_idxs_pair
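
How anchor_residue_idx relates to the anchor and partner properties listed in the note above can be illustrated with a small sketch. The function anchor_and_partner is hypothetical and only mirrors the documented relationships; it is not mdciao code.

```python
def anchor_and_partner(res_idxs_pair, anchor_residue_idx):
    """Return (anchor_index, partner_index, partner_residue_index)."""
    pair = list(res_idxs_pair)
    if anchor_residue_idx not in pair:
        raise ValueError(f"{anchor_residue_idx} is not in {pair}")
    anchor_index = pair.index(anchor_residue_idx)  # position 0 or 1 in the pair
    partner_index = 1 - anchor_index               # the other position
    return anchor_index, partner_index, pair[partner_index]


# Residue 125 is the anchor, so residue 30 is its partner.
print(anchor_and_partner([30, 125], anchor_residue_idx=125))  # (1, 0, 30)
```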

Methods

__init__(res_idxs_pair, ctc_trajs, time_trajs)

Parameters:
  • res_idxs_pair (iterable of two ints) -- pair of residue indices, corresponding to the zero-indexed, serial number of the residues

binarize_trajs(ctc_cutoff_Ang[, switch_off_Ang])

Turn each distance-trajectory into a boolean using a cutoff.

copy()

Copy this object by re-instantiating another ContactPair object with the same attributes.

count_formed_atom_pairs(ctc_cutoff_Ang[, sort])

Count how many times each atom-pair is considered in contact in the trajectories

distro_overall_trajs([bins])

Wrapper around numpy.histogram to produce a distribution of the distance values (not the contact frequencies) of this contact over all trajectories

frequency_dict(ctc_cutoff_Ang[, ...])

Returns the frequency_overall_trajs as a more informative dictionary with keys "freq", "residues", "fragments", "label"

frequency_overall_trajs(ctc_cutoff_Ang[, ...])

How many times this contact is formed over all frames.

frequency_per_traj(ctc_cutoff_Ang[, ...])

Contact frequencies for each trajectory

gen_label([AA_format, fragments, delete_anchor])

Generate a label with different parameters

label_flex([AA_format, pad_label, defrag, ...])

A more flexible method to produce the label of this ContactPair

partial_counts_formed_atom_pairs(ctc_cutoff_Ang)

Count how many times each atom-pair is considered in contact in the trajectories

plot_distance_distribution([label, ...])

Plot the distance distribution of this ContactPair

plot_timetrace([ax, color_scheme, ...])

Plot this ContactPair's timetraces for all trajs onto ax

relative_frequency_of_formed_atom_pairs_overall_trajs(...)

For those frames in which the contact is formed, group them by relative frequencies of individual atom pairs

retop(top, mapping[, deepcopy])

Return a copy of this object with a different topology.

save(filename)

Save this ContactPair as a pickle

Attributes

fragments

label

labels

n

neighborhood

residues

stacked_time_traces

The time-traces of the contact distance for each trajectory stacked into one array

time_max

rtype: int or float, maximum time from list of list of time

time_min

rtype: int or float, minimum time from list of list of time

time_traces

Contains time-traces stored as a _TimeTraces object

top

topology

binarize_trajs(ctc_cutoff_Ang, switch_off_Ang=None)

Turn each distance-trajectory into a boolean using a cutoff. The comparison is done using “<=”, s.t. d=ctc_cutoff yields True

Whereas ctc_cutoff_Ang is in Angstrom, the trajectories are in nm, as produced by mdtraj.compute_contacts

Note

The method creates a dictionary in self._binarized_trajs keyed with the ctc_cutoff_Ang, to avoid re-computing already binarized trajs

Parameters:

ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

Returns:

bintrajs

Return type:

list of boolean arrays with the same shape as the trajectories
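
The unit conversion, the “<=” comparison, and the per-cutoff cache described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not mdciao's implementation: the function takes an explicit cache dictionary (standing in for self._binarized_trajs) and ignores switch_off_Ang.

```python
def binarize_trajs(ctc_trajs_nm, ctc_cutoff_Ang, cache):
    """Return one list of booleans per trajectory, True where d <= cutoff."""
    if ctc_cutoff_Ang not in cache:
        cutoff_nm = ctc_cutoff_Ang / 10  # cutoff is in Angstrom, distances in nm
        cache[ctc_cutoff_Ang] = [
            [d <= cutoff_nm for d in traj] for traj in ctc_trajs_nm
        ]
    return cache[ctc_cutoff_Ang]


cache = {}
ctc_trajs_nm = [[0.30, 0.35, 0.50], [0.45, 0.32]]
bintrajs = binarize_trajs(ctc_trajs_nm, 3.5, cache)
print(bintrajs)  # [[True, True, False], [False, True]]
```

Note that the frame at exactly 0.35 nm counts as formed for a 3.5 Angstrom cutoff, and that a second call with the same cutoff returns the cached result.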

copy()

Copy this object by re-instantiating another ContactPair object with the same attributes. In theory self == self.copy() should hold (but not self is self.copy())

Returns:

CP

Return type:

ContactPair

count_formed_atom_pairs(ctc_cutoff_Ang, sort=True)

Count how many times each atom-pair is considered in contact in the trajectories

Ideally we would return a dictionary, but atom pairs are not hashable

Parameters:
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • sort (boolean, default is True) – Return the counts by descending order

Returns:

  • atom_pairs (list of atom pairs)

  • counts (list of ints)
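
The two-list return format can be illustrated with a sketch for a single trajectory. It assumes one atom pair per frame (as in atom_pair_trajs) and converts each pair to a tuple internally so it can be counted; the function below is a hypothetical stand-in, not the mdciao method.

```python
from collections import Counter


def count_formed_atom_pairs(ctc_traj_nm, atom_pair_traj, ctc_cutoff_Ang, sort=True):
    """Count atom pairs over the frames where the distance is <= cutoff."""
    cutoff_nm = ctc_cutoff_Ang / 10  # Angstrom -> nm
    counter = Counter(
        tuple(pair)  # lists are not hashable, tuples are
        for d, pair in zip(ctc_traj_nm, atom_pair_traj)
        if d <= cutoff_nm
    )
    items = counter.most_common() if sort else list(counter.items())
    atom_pairs = [list(pair) for pair, _ in items]
    counts = [count for _, count in items]
    return atom_pairs, counts


dists_nm = [0.30, 0.32, 0.50, 0.34]
atom_pairs_per_frame = [[10, 52], [10, 52], [11, 52], [12, 53]]
print(count_formed_atom_pairs(dists_nm, atom_pairs_per_frame, 3.5))
# ([[10, 52], [12, 53]], [2, 1])
```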

distro_overall_trajs(bins=10)

Wrapper around numpy.histogram to produce a distribution of the distance values (not the contact frequencies) of this contact over all trajectories

Parameters:

bins (int or anything numpy.histogram accepts)

Returns:

  • h (_np.ndarray) – The counts (integer valued)

  • x (_np.ndarray) – The bin edges (length(h)+1).

frequency_dict(ctc_cutoff_Ang, switch_off_Ang=None, atom_types=False, **kwargs_label_flex)

Returns the frequency_overall_trajs as a more informative dictionary with keys “freq”, “residues”, “fragments”, “label”

Parameters:
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • switch_off_Ang (float, default is None) – TODO

  • atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact

  • kwargs_label_flex (dict) – Optional arguments for label_flex, which are:

Other Parameters:
  • AA_format (str, default is “short”) –

    Amino-acid format for the label, can be
    • “short”: A35@4.50

    • “long”: ALA35@4.50

    • “just_consensus”: 4.50 if consensus labels are present, else fail

    • “try_consensus”: 4.50 if consensus labels are present, else fall back to “short”

  • pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats

  • defrag (char, default is None) – Character at which to defrag the contact label, e.g. “@”. Default is None, which leaves the labels as they are.

  • fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True

  • fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True

Returns:

fdict

Return type:

dictionary

frequency_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None)

How many times this contact is formed over all frames. Frequencies have values between 0 and 1

Parameters:

ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

Returns:

freq – Frequency of the contact over all trajectories

Return type:

float

frequency_per_traj(ctc_cutoff_Ang, switch_off_Ang=None)

Contact frequencies for each trajectory

Parameters:

ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

Returns:

freqs

Return type:

array of len self.n.n_trajs with floats between [0,1]
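
The difference between the per-trajectory and the overall frequency can be sketched from already-binarized trajectories. This sketch assumes that "over all frames" means pooling the frames of all trajectories before averaging; the function names mirror the methods above but are plain stand-ins, not the mdciao code.

```python
def frequency_per_traj(bintrajs):
    """Fraction of formed frames, one value per trajectory."""
    return [sum(traj) / len(traj) for traj in bintrajs]


def frequency_overall_trajs(bintrajs):
    """Fraction of formed frames after pooling the frames of all trajectories."""
    n_formed = sum(sum(traj) for traj in bintrajs)
    n_frames = sum(len(traj) for traj in bintrajs)
    return n_formed / n_frames


# A long trajectory where the contact is mostly formed, a short one where it never is.
bintrajs = [[True, True, True, False], [False, False]]
print(frequency_per_traj(bintrajs))       # [0.75, 0.0]
print(frequency_overall_trajs(bintrajs))  # 0.5
```

Under that pooling assumption, longer trajectories weigh more in the overall value, so it is generally not the mean of the per-trajectory values (0.375 here).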

gen_label(AA_format='short', fragments=False, delete_anchor=False)

Generate a label with different parameters

Parameters:
  • AA_format (str, default is “short”) –

    Options are:
    • “short”: E30@3.50

    • “long”: GLU30@3.50

    • “just_consensus”: 3.50, fail if none is found

    • “try_consensus”: 3.50, fallback to “short” if none is found

  • fragments (bool, default is False) – Include fragment information. Will get the “best” information available, i.e. consensus>fragname>fragindex. When trying to get consensus labels, this option is ignored, s.t. the full “E30@3.50” is returned regardless.

  • delete_anchor (bool, default is False) – Delete the anchor from the label

Returns:

label – The contact label, containing both or only one residue, depending on the value of delete_anchor.

Return type:

str

label_flex(AA_format='short', pad_label=True, defrag=None, fmt1='%-15s', fmt2='%-15s')

A more flexible method to produce the label of this ContactPair

Parameters:
  • AA_format (str, default is “short”) –

    Amino-acid format for the label, can be
    • “short”: A35@4.50

    • “long”: ALA35@4.50

    • “just_consensus”: 4.50 if consensus labels are present, else fail

    • “try_consensus”: 4.50 if consensus labels are present, else fall back to “short”

  • pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats

  • defrag (char, default is None) – Character at which to defrag the contact label, e.g. “@”. Default is None, which leaves the labels as they are.

  • fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True

  • fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True

Returns:

label

Return type:

str
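
What pad_label, fmt1 and fmt2 do to the two residue labels can be shown with Python's %-formatting. The helper padded_label is hypothetical, and joining the two labels with “-” is an assumption made for this sketch only.

```python
def padded_label(label1, label2, pad_label=True, fmt1="%-15s", fmt2="%-15s"):
    """Join two residue labels, optionally padding each to a fixed width."""
    if pad_label:
        # "%-15s" left-justifies the string in a field of 15 characters
        label1, label2 = fmt1 % label1, fmt2 % label2
    return f"{label1}-{label2}"


# Padded labels line up when several contacts are printed one below the other.
for res1, res2 in [("A35@4.50", "E30@3.50"), ("R131@3.50", "Y391@G.H5.23")]:
    print(repr(padded_label(res1, res2)))
print(repr(padded_label("A35@4.50", "E30@3.50", pad_label=False)))
```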

partial_counts_formed_atom_pairs(ctc_cutoff_Ang, switch_off_Ang=None, sort=True)

Count how many times each atom-pair is considered in contact in the trajectories

Since the switch_off_Ang parameter introduces partial counts, the return value need not be integer counts

Ideally we would return a dictionary, but atom pairs are not hashable

Parameters:
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • sort (boolean, default is True) – Return the counts by descending order

Returns:

  • atom_pairs (list of atom pairs)

  • counts (list of floats)

plot_distance_distribution(label=None, shorten_AAs=False, defrag=None, ctc_cutoff_Ang=None, delete_anchor=False, xlim=None, **kwargs_histogram_w_smoothing_auto) → Axes

Plot the distance distribution of this ContactPair

Parameters:
  • label (str or None, default is None) – Default behavior is to construct the label automatically using shorten_AAs, defrag, and ctc_cutoff_Ang, but any label can be passed here to override automatic label generation.

  • shorten_AAs (bool, default is False) – Shorten residue labels from e.g. GLU30 to E30

  • defrag (None or char) – None means do not defrag the contact label. A character, e.g. “@” means use this character to defrag

  • ctc_cutoff_Ang (float or None, default is None) – If float, use this cutoff to compute frequencies and add them to the labels. Also, draw a vertical line in the plot. Before the vertical line is drawn, it’s checked whether the plot already contains a similar line.

  • delete_anchor (bool, default is False) – If True (and possible), the anchor residue will be deleted from the label.

  • kwargs_histogram_w_smoothing_auto (dict) – Optional parameters for mdciao.plots.histogram_w_smoothing_auto, which are listed below

Other Parameters:
  • bins (int, default is 10) – Since this will be passed directly to numpy.histogram it can also take the same values that numpy.histogram_bin_edges can take.

  • ax (Axes or None, default is None) – The axis to draw onto. If None, the current axis will be used invoking gca. If there’s no current axis, one will be created.

  • smooth_bw (bool or float, default is True) – If True, smooth the histogram using a Gaussian-kernel-density estimation with an estimator bandwidth of .5 (Angstrom). If float, use this value as estimator bandwidth, check matplotlib.mlab.GaussianKDE for more info.

  • background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors

    • True: use a faint version of color

    • False: don’t plot any background

    • color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.pyplot.colors understands

  • fill_below (bool, default is True) – Fill the area underneath the histogram with a shade of color

  • color (None or color-like, default is None) – Default behaviour is to take the next color of the color-cycle of the plot.

  • alpha_below (float, default is .25) – The area below the curve will be filled with this alpha (transparency) value. Only has an effect if fill_below is True

  • maxcount (bool or positive float, default is False) – Normalize when plotting the histogram, s.t. different datasets can be plotted together at the same height even with very different number of absolute counts. If True, counts will be normalized to the maximum number of counts, s.t. histograms will peak at 1. If any other positive value, that’s where the peak will be.

Returns:

ax – The axis (new or passed in) where the distribution has been plotted.

Return type:

Axes
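
The effect of the maxcount option on histogram counts can be sketched as a simple rescaling. The helper normalize_counts is hypothetical and only mirrors the behaviour described for maxcount; it is not taken from mdciao.plots.

```python
def normalize_counts(counts, maxcount=False):
    """Rescale counts so that the highest bin sits at 1 (True) or at maxcount."""
    if maxcount is False:
        return list(counts)  # leave absolute counts untouched
    peak = 1.0 if maxcount is True else float(maxcount)
    highest = max(counts)
    return [c * peak / highest for c in counts]


counts = [2, 8, 4]
print(normalize_counts(counts))                # [2, 8, 4]
print(normalize_counts(counts, maxcount=True)) # [0.25, 1.0, 0.5]
print(normalize_counts(counts, maxcount=2))    # [0.5, 2.0, 1.0]
```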

plot_timetrace(ax=None, color_scheme=None, ctc_cutoff_Ang=None, switch_off_Ang=None, n_smooth_hw=0, dt=1, background=True, shorten_AAs=False, t_unit='ps', ylim_Ang=10, max_handles_per_row=4)

Plot this ContactPair’s timetraces for all trajs onto ax

Parameters:
  • ax (None, Axes) – The axis where to plot the timetrace. Default is to plot on the current axis, and if there’s no current axis, a new one will be created. A newly created axis has the default width and height; change its size afterwards or create the axis beforehand with your desired size.

  • color_scheme (list, default is None) – Pass a list of colors, each one should be understandable by matplotlib.colors.is_color_like

  • ctc_cutoff_Ang (float or None, default is None) – The cutoff to use, in Angstrom. If None, don’t use any cutoff.

  • n_smooth_hw (int, default is 0) – Size, in frames, of half the window size of the smoothing window

  • dt (float, default is 1) – How many units of t_unit one frame represents

  • background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors

    • True: use a faint version of color

    • False: don’t plot any background

    • color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.colors.is_color_like understands

  • shorten_AAs (bool, default is False) – Whether to shorten the AA labels

  • t_unit (str, default is ‘ps’) – The time unit with which to label the x-axis

  • ylim_Ang (float or “auto”) – The limit in Angstrom of the y-axis

  • max_handles_per_row (int, default is 4) – How many handles each row of the legend can have

Returns:

ax – The axis with the plotted timetrace

Return type:

Axes

relative_frequency_of_formed_atom_pairs_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None, keep_resname=False, aggregate_by_atomtype=True, min_freq=0.05)

For those frames in which the contact is formed, group them by relative frequencies of individual atom pairs

Parameters:
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • keep_resname (bool, default is False) – Keep the atom’s residue name in its descriptor. Only makes sense if aggregate_by_atomtype is False

  • aggregate_by_atomtype (bool, default is True) – Aggregate the frequencies of the contact by the atom types involved. Atom types are backbone, sidechain or other (BB, SC, X)

  • min_freq (float, default is .05) – Do not report relative frequencies below this cutoff, e.g. “BB-BB”:.9, “BB-SC”:0.03, “SC-SC”:0.03, “SC-BB”:0.03 gets reported as “BB-BB”:.9

Returns:

out_dict – Relative freqs, keyed by atom-type (atoms) involved in the contact. The order is the same as in self.ctc_labels

Return type:

dictionary
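
The min_freq filter can be illustrated with the example values from the parameter description. The helper drop_rare is hypothetical, and treating a value exactly equal to min_freq as reportable is an assumption.

```python
def drop_rare(rel_freqs, min_freq=0.05):
    """Keep only the entries whose relative frequency is not below min_freq."""
    return {key: val for key, val in rel_freqs.items() if val >= min_freq}


rel_freqs = {"BB-BB": 0.9, "BB-SC": 0.03, "SC-SC": 0.03, "SC-BB": 0.03}
print(drop_rare(rel_freqs))  # {'BB-BB': 0.9}
```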

retop(top, mapping, deepcopy=False, **CP_kwargs)

Return a copy of this object with a different topology.

Uses the mapping to generate new residue- and atom-indices where necessary, using the rest of the object’s attributes (time-traces, labels, colors, fragments…) as they were.

Note

This method will (rightly) fail if:
  • the mapping doesn’t contain the needed residues

  • the individual atoms of those residues cannot be uniquely mapped between topologies

Parameters:
  • top (Topology) – The new topology
  • mapping (indexable (array, dict, list)) – A mapping of old residue indices to new residue indices. Usually, it comes from aligning the old and the new topology using mdciao.utils.sequence.maptops. These maps only contain (key,value) pairs whenever there’s been a “match”, s.t. this method will fail if mapping doesn’t contain all the residues in this ContactPair.

  • deepcopy (bool, default is False) – Use copy.deepcopy on the attributes when creating the new ContactPair. If False, the identity holds:

    >>> self.residues.consensus_labels is CP.residues.consensus_labels

    If True, only the equality holds:

    >>> self.residues.consensus_labels == CP.residues.consensus_labels

    Note that time_traces are always created new no matter what.

  • CP_kwargs (dict) – Optional keyword arguments to instantiate the new ContactPair. Any key-value pairs passed here will update the internal dictionary being used, which is:

    >>> {
    "top": top,
    "trajs": self.time_traces.trajs,
    "fragment_idxs": self.fragments.idxs,
    "fragment_names": self.fragments.names,
    "fragment_colors": self.fragments.colors,
    "anchor_residue_idx": anchor_residue_index,
    "consensus_labels": self.residues.consensus_labels
    }

Returns:

CP – A new CP with updated top and indices

Return type:

ContactPair
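
The residue-index part of the remapping, including the failure on unmapped residues, can be sketched with a plain dictionary. The helper remap_residue_pair and the index values are hypothetical; the real method also remaps atom indices and rebuilds the object.

```python
def remap_residue_pair(res_idxs_pair, mapping):
    """Translate a residue pair through a dict of old -> new residue indices."""
    missing = [idx for idx in res_idxs_pair if idx not in mapping]
    if missing:
        raise KeyError(f"mapping has no entry for residue(s) {missing}")
    return [mapping[idx] for idx in res_idxs_pair]


# Only residues that "matched" in the alignment appear as keys.
mapping = {30: 28, 31: 29, 125: 120}
print(remap_residue_pair([30, 125], mapping))  # [28, 120]
```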

save(filename)

Save this ContactPair as a pickle

Parameters:

filename (str) – filename
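
Assuming the file written by save is an ordinary pickle of the object, it can be read back with the standard pickle module. The sketch below uses a plain dictionary as a stand-in for a ContactPair so that it runs without mdciao; with a real object, mdciao has to be importable when loading.

```python
import os
import pickle
import tempfile

# Stand-in for a ContactPair; a real object would be pickled the same way.
contact = {"res_idxs_pair": [30, 125], "ctc_trajs": [[0.30, 0.35, 0.50]]}

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "contact.pkl")
    with open(filename, "wb") as f:   # pickles are binary files
        pickle.dump(contact, f)
    with open(filename, "rb") as f:
        restored = pickle.load(f)

print(restored == contact)  # True
```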

property stacked_time_traces: ndarray

The time-traces of the contact distance for each trajectory stacked into one array

Returns:

stacked_time_traces

Return type:

np.ndarray

property time_max

rtype: int or float, maximum time from list of list of time

property time_min

rtype: int or float, minimum time from list of list of time

property time_traces

Contains time-traces stored as a _TimeTraces object

property top
property topology