mdciao.contacts.ContactGroup

class mdciao.contacts.ContactGroup(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)

Container for ContactPair-objects

This class is the second level of abstraction after ContactPair and provides methods to

  • perform operations on all the contact-pairs simultaneously and

  • plot/show/save the result of these operations

In many cases, the methods of ContactGroup thinly wrap and iterate around equally named methods of the ContactPair-objects.

Note

Higher-level methods in the API, like those exposed by mdciao.cli, will return ContactPair or ContactGroup objects already instantiated and ready to use. It is recommended to use those instead of individually calling ContactPair or ContactGroup.

__init__(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)
Parameters:
  • list_of_contact_objects (list) – List of ContactPair objects. Will be accessible at ContactGroup.contact_pairs.

  • interface_fragments (list of two iterables of indexes, default is None) – An interface is defined by two groups of residue indices.

    This input doesn’t need to have all or any of the residue indices in res_idxs_pairs.

    This input will be used to group the object’s own residue idxs present in res_idxs_pairs into the two groups of the interface. These two groups will be accessible through the attribute self.interface_residxs

    The input itself will remain accessible through the object’s equally named attribute self.interface_fragments

  • top (Topology, default is None) – The molecular topology associated with this object. Normally, the default behaviour is enough. It checks whether all ContactPairs of list_of_contact_objects share the same self.top and uses that one. If they have different topologies, the method fails, since you can’t instantiate a ContactGroup with ContactPairs from different topologies. In case the ContactPairs don’t have any topology at all (self.top is None for all ContactPairs) you can pass one here. Or, if they have one, and you pass one here, it will be checked that the top provided here coincides with the ContactPairs’ shared topology.

  • name (string, default is None) – Optional name you want to give this object. ATM it is only used for the title of ContactGroup.plot_distance_distributions when the object is not a neighborhood

  • neighbors_excluded (int, default is None) – The neighbors excluded when creating the underlying ContactPairs passed in list_of_contact_objects

  • max_cutoff_Ang (float, default is None) – Operations involving cutoffs higher than this will be forbidden and will raise ValueError. Prevents the user from asking for contact-frequencies that aren’t present in the ContactGroup
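
The grouping described for interface_fragments can be pictured with a minimal sketch. The helper group_interface_residxs below is hypothetical and not part of mdciao; it only assumes that each residue found in the contact pairs is assigned to the fragment that contains it, and the sorted output order is an assumption made here for reproducibility.

```python
def group_interface_residxs(res_idxs_pairs, interface_fragments):
    """Group the residues present in res_idxs_pairs by interface fragment.

    Hypothetical helper for illustration, not part of mdciao.
    """
    # All residue indices that take part in at least one contact pair
    in_pairs = {idx for pair in res_idxs_pairs for idx in pair}
    # Per fragment, keep only the residues that appear in some pair
    # (sorting is an assumption, chosen here for reproducibility)
    return [sorted(in_pairs.intersection(frag)) for frag in interface_fragments]


pairs = [[10, 200], [11, 200], [10, 205]]
# The fragments may hold residues that never appear in the pairs
fragments = [range(0, 100), range(150, 300)]
interface_residxs = group_interface_residxs(pairs, fragments)
print(interface_residxs)  # [[10, 11], [200, 205]]
```

Fragments that share no residue with the pairs simply yield empty groups in this sketch.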

Methods

__init__(list_of_contact_objects[, ...])

Parameters:
  • list_of_contact_objects (list) -- List of ContactPair objects.

archive([filename])

Save this ContactGroup's list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup

binarize_trajs(ctc_cutoff_Ang[, ...])

Binarize trajs

copy()

Copy this object by re-instantiating another ContactGroup object with the same attributes.

distribution_dicts([bins])

Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label

frequency_as_contact_matrix(ctc_cutoff_Ang)

Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in residxs_pairs, and those pairs only, the rest will be NaNs

frequency_as_contact_matrix_CG(ctc_cutoff_Ang)

Coarse-grained contact-matrix

frequency_dataframe(ctc_cutoff_Ang[, ...])

Output a formatted dataframe with fields "label", "freq" and "sum", optionally dis-aggregated by type of contact by atom types

frequency_delta(otherCG, ctc_cutoff_Ang[, ...])

Compute per-contact frequency differences between self and some other ContactGroup

frequency_dict_by_consensus_labels(...[, ...])

Return frequencies as a dictionary of dictionaries keyed by consensus labels

frequency_dicts(ctc_cutoff_Ang[, sort_by_freq])

Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPairs and returns one frequency dict keyed by contact label

frequency_per_contact(ctc_cutoff_Ang[, ...])

Frequency per contact over all trajs

frequency_per_traj(ctc_cutoff_Ang[, ...])

Frequency per contact, per-trajectory, over all trajectories

frequency_spreadsheet(sheet1_dataframe, ...)

Write an Excel file with the Dataframe that is returned by self.frequency_dataframe.

frequency_str_ASCII_file(idf[, ascii_file])

Create a string with the frequencies from a DataFrame

frequency_sum_per_residue_idx_dict(...[, ...])

Dictionary of aggregated frequency_per_contact per residue index. Values larger than 1 are possible, e.g. if [0,1] and [0,2] are always formed (=1), then freqs_dict[0]=2

frequency_sum_per_residue_names(ctc_cutoff_Ang)

Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible, see residx2resnamefragnamebest for more info on this

frequency_table(ctc_cutoff_Ang, fname[, ...])

Print and/or save frequencies as a formatted table

frequency_to_bfactor(ctc_cutoff_Ang, ...[, ...])

Save the contact frequency aggregated by residue to a pdb file

gen_ctc_labels(**kwargs)

Generate labels with different parameters

interface_frequency_matrix(ctc_cutoff_Ang[, ...])

Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.

n_ctcs_timetraces(ctc_cutoff_Ang[, ...])

Time-traces of the number of contacts, obtained by summing over all contacts for each frame

plot_distance_distributions([bins, xlim, ...])

Plot distance distributions for the distance trajectories of the contacts

plot_freqs_as_bars(ctc_cutoff_Ang[, ...])

Plot contact frequencies as a bar plot

plot_freqs_as_flareplot(ctc_cutoff_Ang[, ...])

Produce contact flareplots by wrapping around mdciao.flare.freqs2flare

plot_frequency_sums_as_bars(ctc_cutoff_Ang, ...)

Bar plot with per-residue sums of frequencies (called Sigma in mdciao)

plot_interface_frequency_matrix(ctc_cutoff_Ang)

Plot the interface_frequency_matrix

plot_neighborhood_freqs(ctc_cutoff_Ang[, ...])

Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods

plot_timedep_ctcs([panelheight, ...])

For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts

plot_timedep_ctcs_matrix(ctc_cutoff_Ang[, ...])

Per-trajectory time-traces of the formed contacts, shown as binary traces, i.e. formed or not formed.

plot_violins([sort_by, ctc_cutoff_Ang, ...])

Plot residue-residue distances as violin plots

relabel_consensus([new_labels])

Relabel any residue missing its consensus label to shortAA

relative_frequency_formed_atom_pairs_overall_trajs(...)

Relative frequencies interaction-type (by atom-type) for all contact-pairs in the ContactGroup

repframes([scheme, ctc_cutoff_Ang, ...])

Find representative frames for this ContactGroup

residx2ctcidx(idx)

Indices of the contacts and the position (0 or 1) in which the residue with residue idx appears

residx2resnamefragnamebest([fragsep, ...])

Dictionary mapping residue indices to best possible residue+fragment label

retop(top, mapping[, deepcopy])

Return a copy of this object with a different topology.

save(filename)

Save this ContactGroup as a pickle

save_trajs(prepend_filename, ext[, ...])

Save time-traces to disk.

select_by_frames(frames)

Return a copy of this ContactGroup, but with a sub-selection of trajectories and frames.

select_by_residues([CSVexpression, ...])

Return a copy of this ContactGroup, but with a sub-selection of ContactGroup.contact_pairs based on residues.

to_ContactGroups_per_traj()

Break this ContactGroup (potentially containing many trajectories) into individual, per-trajectory ContactGroups

Attributes

anchor_fragment_color

The color associated with the fragment of the anchor residue

anchor_res_and_fragment_str

Label of the anchor residue of this neighborhood, including fragment

anchor_res_and_fragment_str_short

Label of the anchor residue (short) of this neighborhood, including fragment

consensus_labels

List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.

consensuslabel2resname

Dictionary mapping consensus labels to residue names:

contact_pairs

List of ContactPair objects composing this ContactGroup

ctc_labels

List of simple labels (no fragment info) for the residue pairs in ContactPairs

ctc_labels_short

List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs

ctc_labels_w_fragments_short_AA

List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs

fragment_names_best

Best possible fragment names for the residue pairs in ContactPairs

interface_fragments

Two residue lists provided at initialization

interface_labels_consensus

Consensus labels of whatever residues interface_residxs holds.

interface_residue_names_w_best_fragments_short

Best possible residue@fragment string for the residues in interface_residxs

interface_residxs

The residues of self.res_idxs_pairs grouped into two lists, depending on what self.interface_fragments they belong to

interface_reslabels_short

Residue labels of whatever residues interface_residxs holds

is_interface

Whether this ContactGroup can be interpreted as an interface.

is_neighborhood

Whether this ContactGroup is a neighborhood or not

max_cutoff_Ang

Operations involving cutoffs higher than this will be forbidden and will raise ValueError.

maxima

Per-contact maximum values over all distance time-traces

means

Per-contact mean values over all distance time-traces

minima

Per-contact minimum values over all distance time-traces

modes

Per-contact modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces

n_ctcs

The number of contact pairs (mdciao.contacts.ContactPair -objects) stored in this object

n_frames

List of per-trajectory n_frames

n_frames_total

Total number of frames

n_trajs

The number of trajectories contained in this ContactGroup

name

The name of this ContactGroup, given when creating it

neighbors_excluded

The number of neighbors that were excluded when creating this ContactGroup

partner_fragment_colors

The colors associated with the fragments of the anchor partner residues

partner_res_and_fragment_labels

List of labels of the partner (not anchor) residues of this neighborhood, including fragment

partner_res_and_fragment_labels_short

List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment

res_idxs_pairs

Pairs of residue indices of the contacts in this object

residue_names_long

Pairs of long residue names of the ContactPairs

residue_names_short

Pairs of short residue names of the ContactPairs

residx2consensuslabel

Dictionary mapping residue indices to consensus labels:

residx2fragnamebest

Dictionary mapping residue indices to best possible fragment names

residx2resnamelong

Dictionary mapping residue indices to long residue names:

residx2resnameshort

Dictionary mapping residue indices to short residue names:

shared_anchor_residue_index

The index of the anchor residue, i.e. the residue at the center of this neighborhood.

stacked_time_traces

All ContactPair time_traces stacked into a 2D np.array

time_arrays

The time-arrays of each trajectory contained in this ContactGroup

time_max

Maximum time-value of the ContactGroup

time_min

Minimum time-value of the ContactGroup

top

The topology used to instantiate the ContactPairs in this ContactGroup

topology

The topology used to instantiate the ContactPairs in this ContactGroup

trajlabels

List of trajectory labels shared by all ContactGroup.contact_pairs.

property anchor_fragment_color: str

The color associated with the fragment of the anchor residue

Two fragment colors were given to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_fragment_color
'tab:blue'
Returns:

color

Return type:

str

property anchor_res_and_fragment_str: str

Label of the anchor residue of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str
'LEU394@G.H5.26'
Returns:

label

Return type:

str

property anchor_res_and_fragment_str_short: str

Label of the anchor residue (short) of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str_short
'L394@G.H5.26'
Returns:

label

Return type:

str

archive(filename=None, **kwargs)

Save this ContactGroup’s list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup

The method ContactGroup.save creates a pickle that has a lot of redundant information

Parameters:

filename (str, default is None) – Has to end in “npy”. Default is to return the dictionary

Other Parameters:

kwargs (dict) – Optional parameters for mdciao.contacts.ContactPair._serialized_as_dict

Returns:

archive

Return type:

dict

binarize_trajs(ctc_cutoff_Ang, switch_off_Ang=None, order='contact')

Binarize trajs

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – Implements a linear switch-off from ctc_cutoff_Ang to ctc_cutoff_Ang + switch_off_Ang. E.g. if the cutoff is 3 Ang and the switch is 1 Ang, then

    • 3.0 -> 1.0

    • 3.5 -> .5

    • 4.0 -> 0.0

  • order (str, default is “contact”) – Sort first by contact, then by traj index. Alternative is “traj”, i.e. sort first by traj index, then by contact

  • TODO (change the name “binarize”)

Returns:

bintrajs – if order==traj, each item of the list is a 2D np.ndarray of shape (Nt, n_ctcs), where Nt is the number of frames of that trajectory

Return type:

list of boolean arrays
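
The linear switch-off listed under switch_off_Ang can be illustrated with a minimal sketch. The function switched_contact_value is a hypothetical helper, not mdciao code; it assumes the value is 1 up to the cutoff, decays linearly, and is 0 beyond ctc_cutoff_Ang + switch_off_Ang.

```python
def switched_contact_value(distance_Ang, ctc_cutoff_Ang, switch_off_Ang=None):
    """Contact value in [0, 1] for one distance (hypothetical helper)."""
    if distance_Ang <= ctc_cutoff_Ang:
        return 1.0
    if not switch_off_Ang:  # None or 0 means a hard cutoff
        return 0.0
    # Linear decay from 1 at the cutoff to 0 at cutoff + switch
    value = 1.0 - (distance_Ang - ctc_cutoff_Ang) / switch_off_Ang
    return max(value, 0.0)


values = [switched_contact_value(d, 3.0, 1.0) for d in (3.0, 3.5, 4.0, 5.0)]
print(values)  # [1.0, 0.5, 0.0, 0.0]
```

Note that with a switch the values are no longer strictly 0 or 1, so in this sketch they are floats rather than booleans.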

property consensus_labels: list

List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.

They were parsed at initialization

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensus_labels
[['G.H5.21', 'G.H5.26'],
 ['G.H5.26', '6.32'],
 ['G.H5.20', 'G.H5.26'],
 ['G.H5.26', '5.69'],
 ['G.H5.17', 'G.H5.26']]
Returns:

consensus_labels

Return type:

list

property consensuslabel2resname: dict

Dictionary mapping consensus labels to residue names:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensuslabel2resname
{'G.H5.21': 'R389',
 'G.H5.26': 'L394',
 '6.32': 'K270',
 'G.H5.20': 'L388',
 '5.69': 'L230',
 'G.H5.17': 'R385'}
Returns:

consensuslabel2resname

Return type:

dict

property contact_pairs

List of ContactPair objects composing this ContactGroup

Gives direct access for (expert) users to manipulate, plot, save, individual ContactPair objects

The order of these ContactPair objects is the order of the list_of_contact_objects passed to this ContactGroup at initialization.

Returns:

contact_pairs – List of ContactPair objects

Return type:

list

copy()

Copy this object by re-instantiating another ContactGroup object with the same attributes.

In theory self == self.copy() should hold, but not self is self.copy()

Returns:

CG

Return type:

ContactGroup
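
The difference between equality and identity mentioned for copy() can be seen in a toy example. The class Group below is a hypothetical stand-in, not the ContactGroup implementation; it assumes equality is defined by comparing attributes.

```python
class Group:
    """Toy stand-in for a container object (hypothetical, not mdciao code)."""

    def __init__(self, pairs, name=None):
        self.pairs = list(pairs)
        self.name = name

    def copy(self):
        # Re-instantiate a new object from the same attributes
        return Group(self.pairs, name=self.name)

    def __eq__(self, other):
        # Equal if all attributes compare equal
        return isinstance(other, Group) and vars(self) == vars(other)


g = Group([(0, 1), (0, 2)], name="demo")
c = g.copy()
print(g == c, g is c)  # True False
```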

property ctc_labels: list

List of simple labels (no fragment info) for the residue pairs in ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels
['ARG389-LEU394',
 'LEU394-LYS270',
 'LEU388-LEU394',
 'LEU394-LEU230',
 'ARG385-LEU394']

Returns:

ctc_labels : list

property ctc_labels_short: list

List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_short
['R389-L394',
 'L394-K270',
 'L388-L394',
 'L394-L230',
 'R385-L394']

Returns:

ctc_labels_short : list

property ctc_labels_w_fragments_short_AA: list

List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_w_fragments_short_AA
['R389@G.H5.21-L394@G.H5.26',
 'L394@G.H5.26-K270@6.32',
 'L388@G.H5.20-L394@G.H5.26',
 'L394@G.H5.26-L230@5.69',
 'R385@G.H5.17-L394@G.H5.26']

Returns:

ctc_labels_w_fragments_short_AA : list

distribution_dicts(bins=10, **kwargs)

Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label

Parameters:
  • bins (int or sequence of scalars or str, optional, default is 10) – If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.

  • kwargs (dict) – Optional keyword arguments for ContactPair.label_flex, which are listed below

Other Parameters:
  • AA_format (str, default is “short”) –

    Amino-acid format for the label, can be
    • “short”: A35@4.50

    • “long”: ALA35@4.50

    • “just_consensus”: 4.50 if consensus labels are present, else fail

    • “try_consensus”: 4.50 if consensus labels are present, else fallback to “short”

  • pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats

  • defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are

  • fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True

  • fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True

Returns:

fdict

Return type:

dictionary

property fragment_names_best: list

Best possible fragment names for the residue pairs in ContactPairs

The fragment name will try to pick the consensus nomenclature. If no consensus label for the residue exists, the actual fragment names are used as fallback (which themselves fallback to the fragment index)

Only if no consensus label, no fragment name and no fragment indices are there, will this yield “None” as a string.

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.fragment_names_best
[['G.H5.21', 'G.H5.26'],
 ['G.H5.26', '6.32'],
 ['G.H5.20', 'G.H5.26'],
 ['G.H5.26', '5.69'],
 ['G.H5.17', 'G.H5.26']]

Returns:

fragment_names_best : list

frequency_as_contact_matrix(ctc_cutoff_Ang, switch_off_Ang=None)

Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in residxs_pairs, and those pairs only, the rest will be NaNs

If top is None the method will fail.

Note

This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns:

mat

Return type:

numpy.ndarray
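
As an illustration of the layout described above, here is a minimal sketch that fills a symmetric matrix from a list of pairs and frequencies and leaves everything else as NaN. It uses plain nested lists instead of a numpy.ndarray, and the function name frequency_matrix is hypothetical.

```python
import math


def frequency_matrix(res_idxs_pairs, freqs, n_residues):
    """Symmetric n_residues x n_residues matrix, NaN where no pair is stored."""
    mat = [[math.nan] * n_residues for _ in range(n_residues)]
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        mat[i][j] = freq
        mat[j][i] = freq  # mirror the value to keep the matrix symmetric
    return mat


mat = frequency_matrix([(0, 2), (1, 2)], [0.8, 0.25], n_residues=4)
print(mat[0][2], mat[2][0])   # 0.8 0.8
print(math.isnan(mat[0][1]))  # True
```

Pairs that were never part of the group stay NaN, which is why the result is not the full contact matrix.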

frequency_as_contact_matrix_CG(ctc_cutoff_Ang, switch_off_Ang=None, fragments=None, fragment_names=None, consensus_labelers=None, verbose=False, sparse=False, interface=False, zero_freq=0.01, dec_round=3, return_fragments=False)

Coarse-grained contact-matrix

Frequencies of self.frequency_per_contact get coarse-grained into fragments. Fragment definitions come from fragments and/or from the consensus_labelers. These definitions need to contain all residues in self.res_idxs_pairs

User-defined and consensus-derived fragment definitions get spliced together using splice_orphan_fragments. This might lead to sub-sets of the input fragments getting re-labeled as “subfrags” and residues not defined anywhere being labelled “orphans”. This leads to cumbersome fragment names (and can change in the future), but at least it’s “traceable” for the moment

If you want to have the fragment definitions, use return_fragments = True

Anytime some argument leads to a row/column being deleted from the output, the matrix is returned as an annotated DataFrame, to be able to provide row/columns with names and keep track of their meaning

If interface is True and this ContactGroup is indeed an interface, the matrix will be asymmetric.

If self.top is None the method will fail.

Note

This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • fragments (dict) – The fragment definitions

  • fragment_names (iterable of strings, default is None) – The names of the fragments

  • consensus_labelers (list, default is None) – It has to contain LabelerConsensus-objects, where the fragments are obtained from.

  • verbose (bool, default is False) – Be verbose

  • sparse (bool, default is False) – Delete rows and columns where all elements are < zero_freq. Since the row/column indices lose their meaning this way, a DataFrame with named row/columns is returned instead of an array. If no fragment_names are passed, some will be created.

  • interface (bool, default is False) – If True, an asymmetric matrix is reported, with rows and columns representing fragments on each side of the interface, respectively. Since this is done using self.interface_residxs, and not all input fragments are necessarily contained therein, interface=True introduces a sparsity, which makes the return type be a DataFrame (see above)

  • zero_freq (float, default is 0.01) – Only has effect when sparse is True. The cutoff for a frequency to be considered zero

  • dec_round (int, default is 3) – The number of decimals to round to when reporting results. It’s assumed the CG matrix doesn’t need much precision beyond this

  • return_fragments (bool, default is False) – Whether to return the fragments that the input produced.

Returns:

  • mat (numpy.ndarray or DataFrame) – The coarse-grained contact matrix

  • fragments (dict) – The fragment definitions
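
A rough sketch of the coarse-graining idea follows. It assumes that coarse-graining means summing the per-contact frequencies of all residue pairs that fall into the same pair of fragments; this is an assumption for illustration, and the helper coarse_grain and the fragment names are hypothetical. It returns a plain dict rather than a matrix or DataFrame.

```python
from collections import defaultdict


def coarse_grain(res_idxs_pairs, freqs, fragments):
    """Sum per-contact frequencies into fragment-pair totals.

    fragments maps a fragment name to the residue indices it contains.
    Summation is an assumption made for this sketch.
    """
    res2frag = {idx: name for name, idxs in fragments.items() for idx in idxs}
    totals = defaultdict(float)
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        # A KeyError here means a residue is missing from the definitions,
        # which mirrors the requirement that fragments cover all residues
        key = tuple(sorted((res2frag[i], res2frag[j])))
        totals[key] += freq
    return dict(totals)


fragments = {"TM3": [0, 1], "TM6": [5, 6]}
cg = coarse_grain([(0, 5), (1, 6), (0, 1)], [0.5, 0.25, 1.0], fragments)
print(cg)  # {('TM3', 'TM6'): 0.75, ('TM3', 'TM3'): 1.0}
```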

frequency_dataframe(ctc_cutoff_Ang, switch_off_Ang=None, atom_types=False, sort_by_freq=False, **ctc_fd_kwargs)

Output a formatted dataframe with fields “label”, “freq” and “sum”, optionally dis-aggregated by type of contact by atom types

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact

  • sort_by_freq (bool, default is False) – Sort by descending frequency value, default is to keep the order of self.contact_pairs

  • ctc_fd_kwargs (named optional arguments) – Optional parameters for mdciao.ContactPair.frequency_dict, which are listed below.

Other Parameters:
  • AA_format (str, default is “short”) –

    Amino-acid format for the label, can be
    • “short”: A35@4.50

    • “long”: ALA35@4.50

    • “just_consensus”: 4.50 if consensus labels are present, else fail

    • “try_consensus”: 4.50 if consensus labels are present, else fallback to “short”

  • pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats

  • defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are

  • fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True

  • fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True

Returns:

df

Return type:

pandas.DataFrame

frequency_delta(otherCG, ctc_cutoff_Ang, residuemap=None)

Compute per-contact frequency differences between self and some other ContactGroup

The difference is defined as

\(\Delta_{AB} = freq_B - freq_A\),

i.e. the delta that occurs upon “reacting” from self to otherCG

No sanity checks are performed, residue indices are assumed to have the same meaning in both self and otherCG, unless residuemap is provided.

Parameters:
  • otherCG (ContactGroup) – The ContactGroup to compute the difference with

  • ctc_cutoff_Ang (float) – The cutoff to use to compute the frequencies

  • residuemap (dict) – Maps residue indices of otherCG to residue indices of self, in case self and otherCG have different topologies.

    >>> residuemap[0]=20
    

    Means the residue with the index 0 in otherCG is the residue with the index 20 in this ContactGroup. (self)

    Residues of otherCG absent from residuemap are un-mappable to self and thus their associated frequencies are ignored, so beware of incomplete maps.

Returns:

  • delta_freq (1D np.ndarray) – The value resulting from doing otherCG.frequency_per_contact(ctc_cutoff_Ang) - self.frequency_per_contact(ctc_cutoff_Ang)

  • res_idxs_pairs (2D np.ndarray of len(delta_freq)) – The res_idxs_pairs for the delta_freq values
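
The role of residuemap can be sketched as follows. The function below is a hypothetical illustration, not the mdciao implementation: it takes dictionaries mapping residue-index pairs to frequencies and returns a dict instead of two arrays. It also assumes that a contact missing from one of the two groups counts as frequency 0 there.

```python
def frequency_delta(freqs_self, freqs_other, residuemap=None):
    """Per-contact delta, freq_other - freq_self (hypothetical helper).

    Both inputs map residue-index pairs (tuples) to frequencies.
    """
    # Start from -freq_self for every contact of self
    delta = {tuple(sorted(pair)): -freq for pair, freq in freqs_self.items()}
    for pair, freq in freqs_other.items():
        if residuemap is not None:
            if any(idx not in residuemap for idx in pair):
                continue  # un-mappable residue: this contact is ignored
            pair = [residuemap[idx] for idx in pair]
        key = tuple(sorted(pair))
        # Assumption: a contact missing in self counts as frequency 0 there
        delta[key] = delta.get(key, 0.0) + freq
    return delta


freqs_A = {(20, 30): 0.5, (20, 40): 1.0}
freqs_B = {(0, 10): 0.75, (0, 99): 0.2}
# Residue 0 of B is residue 20 of A, residue 10 of B is residue 30 of A
delta = frequency_delta(freqs_A, freqs_B, residuemap={0: 20, 10: 30})
print(delta)  # {(20, 30): 0.25, (20, 40): -1.0}
```

The contact (0, 99) of the other group is dropped because residue 99 is absent from the map, which is the kind of silent loss the warning about incomplete maps refers to.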

frequency_dict_by_consensus_labels(ctc_cutoff_Ang, switch_off_Ang=None, return_as_triplets=False, sort_by_interface=False, include_trilower=False)

Return frequencies as a dictionary of dictionaries keyed by consensus labels

Note

Will fail if not all residues have consensus labels. TODO: this is very similar to frequency_sum_per_residue_names, look at the use case closely and try to unify both methods

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • return_as_triplets (bool, default is False) – Return the dictionary as a list of triplets, s.t. freq_dict[3.50][4.50]=.25 is returned as [[3.50,4.50,.25]]. Makes it easier to iterate through in other methods

  • sort_by_interface (bool, default is False) – Not implemented ATM, will raise NotImplementedError

  • include_trilower (bool, default is False) – Include the transposed indexes in the returned dictionary, s.t. the contact pair [3.50][4.50]=.25 also generates [4.50][3.50]=.25

Returns:

freqs

Return type:

dictionary of dictionaries or list of triplets (if return_as_triplets is True)
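
The two output options can be illustrated on a toy dictionary. The helpers as_triplets and with_trilower are hypothetical names used only for this sketch; they reproduce the transformations described for return_as_triplets and include_trilower.

```python
def as_triplets(freq_dict):
    """Flatten freq_dict[label1][label2] = freq into [label1, label2, freq] rows."""
    return [[lab1, lab2, freq]
            for lab1, inner in freq_dict.items()
            for lab2, freq in inner.items()]


def with_trilower(freq_dict):
    """Return a copy that also holds the transposed entries."""
    out = {lab1: dict(inner) for lab1, inner in freq_dict.items()}
    for lab1, inner in freq_dict.items():
        for lab2, freq in inner.items():
            out.setdefault(lab2, {})[lab1] = freq
    return out


freqs = {"3.50": {"4.50": 0.25}}
print(as_triplets(freqs))    # [['3.50', '4.50', 0.25]]
print(with_trilower(freqs))  # {'3.50': {'4.50': 0.25}, '4.50': {'3.50': 0.25}}
```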

frequency_dicts(ctc_cutoff_Ang, sort_by_freq=False, **kwargs)

Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPairs and returns one frequency dict keyed by contact label

Parameters:
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • sort_by_freq (bool, default is False) – Sort by descending frequency. Default is to return in the same order as ContactGroup._contacts

  • kwargs (dict) – Optional keyword arguments for ContactPair.frequency_dict

Returns:

fdict

Return type:

dictionary

frequency_per_contact(ctc_cutoff_Ang, switch_off_Ang=None)

Frequency per contact over all trajs

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns:

freqs

Return type:

1D np.ndarray of len(n_ctcs)

frequency_per_traj(ctc_cutoff_Ang, switch_off_Ang=None) → ndarray

Frequency per contact, per-trajectory, over all trajectories

Wraps around mdciao.contacts.ContactPair.frequency_per_traj

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns:

freqs – Shape (n,m) is (self.n_trajs, self.n_ctcs)

Return type:

np.ndarray
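
For a single contact, the difference between a per-trajectory frequency and a frequency over all trajectories can be sketched as below. The function names are hypothetical, distances are taken to be in Angstrom, and the “<=” comparison follows the description given for frequency_dicts. Pooling all frames for the overall value is an assumption of this sketch.

```python
def frequency_per_traj(distance_trajs_Ang, ctc_cutoff_Ang):
    """Fraction of frames with distance <= cutoff, one value per trajectory."""
    return [sum(d <= ctc_cutoff_Ang for d in traj) / len(traj)
            for traj in distance_trajs_Ang]


def frequency_overall(distance_trajs_Ang, ctc_cutoff_Ang):
    """Fraction of formed frames over all trajectories pooled together."""
    formed = sum(d <= ctc_cutoff_Ang for traj in distance_trajs_Ang for d in traj)
    n_frames_total = sum(len(traj) for traj in distance_trajs_Ang)
    return formed / n_frames_total


# Two trajectories of different length for one residue pair
trajs = [[3.0, 3.5, 4.5, 5.0], [3.2, 3.4]]
per_traj = frequency_per_traj(trajs, 3.5)
overall = frequency_overall(trajs, 3.5)
print(per_traj)  # [0.5, 1.0]
print(overall)   # 4 of 6 frames are formed
```

The pooled value is weighted by trajectory length, so it generally differs from the plain average of the per-trajectory values.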

frequency_spreadsheet(sheet1_dataframe, sheet2_dataframes, ctc_cutoff_Ang, fname_excel, sheet1_name='pairs by frequency', sheet2_name='residues by frequency')

Write an Excel file with the Dataframe that is returned by self.frequency_dataframe.

Parameters:
  • sheet1_dataframe (DataFrame) – Normally, these are pairwise frequencies

  • sheet2_dataframes (list) – Contains DataFrame objects with per-residue frequencies

  • ctc_cutoff_Ang (float) – The cutoff used

  • fname_excel (str) – The filename to save to

  • sheet1_name (str, default is “pairs by frequency”)

  • sheet2_name (str, default is ‘residues by frequency’)

frequency_str_ASCII_file(idf, ascii_file=None)

Create a string with the frequencies from a DataFrame

Parameters:
  • idf (DataFrame) – A frequency table, typically generated by self.frequency_dataframe

  • ascii_file (str, default is None) – Instead of returning the formatted table as a string, provide a filename here and the frequencies will be written directly to it

Returns:

freq_str

Return type:

str or None

frequency_sum_per_residue_idx_dict(ctc_cutoff_Ang, switch_off_Ang=None, sort_by_freq=True, return_array=False)

Dictionary of aggregated frequency_per_contact per residue index. Values larger than 1 are possible, e.g. if [0,1] and [0,2] are always formed (=1), then freqs_dict[0]=2

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • sort_by_freq (bool, default is True) – Sort the dictionary by descending order of frequency. If False, it will be sorted by residue index. sort_by_freq only has effect if return_array is False

  • return_array (bool, default is False) – If True, the return value is not a dict but an array of len(self.top.n_residues). In this case, sort_by_freq doesn’t have any effect.

Returns:

freqs_dict – If dict, keys are the residue indices present in res_idxs_pairs. If array, idxs are the residue indices of self.top

Return type:

dictionary or array
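
The aggregation can be reproduced in a few lines. The function below is a hypothetical sketch that takes the pairs and their frequencies directly; it shows why a residue taking part in several always-formed contacts ends up with a sum larger than 1.

```python
from collections import defaultdict


def frequency_sum_per_residue(res_idxs_pairs, freqs, sort_by_freq=True):
    """Sum contact frequencies per residue index (hypothetical helper)."""
    sums = defaultdict(float)
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        # Each contact contributes to both of its residues
        sums[i] += freq
        sums[j] += freq
    if sort_by_freq:
        items = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
    else:
        items = sorted(sums.items())  # ascending residue index
    return dict(items)


freqs_dict = frequency_sum_per_residue([[0, 1], [0, 2]], [1.0, 1.0])
print(freqs_dict)  # {0: 2.0, 1: 1.0, 2: 1.0}
```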

frequency_sum_per_residue_names(ctc_cutoff_Ang, switch_off_Ang=None, sort_by='freq', AA_format='short', list_by_interface=False, return_as_dataframe=False)

Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible, see residx2resnamefragnamebest for more info on this

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • sort_by (str or None, default is “freq”) – With sort_by=None, the frequencies are returned in the order in which the ContactPair-objects are stored in ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via:

    • sort_by=“freq”

      Use the ctc_cutoff_Ang provided here to recompute new frequencies and sort the contacts in ascending order

    • sort_by=“residue” or “numeric”

      Sort by ascending residue number. Currently limited to AA_format=“short” or “long” (see below).

  • AA_format (str, default is ‘short’) – Use E30@3.50 instead of GLU30@3.50. Alternatives are:

    • “long”: GLU30@3.50

    • “just_consensus”: 3.50, fail if none is found

    • “try_consensus”: 3.50, fallback to “short” if none is found

  • list_by_interface (bool, default is False) – group the freq_dict by interface residues. Only has an effect if self.is_interface

  • return_as_dataframe (bool, default is False) – Return a DataFrame with the column names labels and freqs

Returns:

res – list of dictionaries (or dataframes). If list_by_interface is True, then the list has two items, default (False) is to be of len=1

Return type:

list

frequency_table(ctc_cutoff_Ang, fname, switch_off_Ang=None, write_interface=True, sort_by_freq=False, **freq_dataframe_kwargs)

Print and/or save frequencies as a formatted table

Internally, it calls frequency_spreadsheet and/or frequency_str_ASCII_file depending on the extension of fname

If you want a DataFrame use frequency_dataframe

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • fname (str or None) – Full path to the desired filename. Spreadsheet extensions are currently only ‘.xlsx’, all other extensions save to formatted ascii. None returns the formatted ascii string.

  • switch_off_Ang (float, default is None) – TODO

  • write_interface (bool, default is True) – Only has effect if self.is_interface is True. A second sheet will be added to the table where residues are sorted by interface membership and per-residue interface participation.

  • sort_by_freq (bool, default is False) – Only has effect if self.is_interface is True and write_interface is True. Sort the second sheet by descending order of frequencies. If False, residues are in ascending order within each member of the interface, as returned by self.interface_residxs

  • freq_dataframe_kwargs (dict) – Optional parameters for self.frequency_dataframe, which are listed below.

Other Parameters:
  • switch_off_Ang (float, default is None) – TODO

  • atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact

  • sort_by_freq (bool, default is False) – Sort by descending frequency value, default is to keep the order of self.contact_pairs

  • AA_format (str, default is “short”) –

    Amino-acid format for the label, can be
    • “short”: A35@4.50

    • “long”: ALA35@4.50

    • “just_consensus”: 4.50 if consensus labels are present, else fail

    • “try_consensus”: 4.50 if consensus labels are present, else fallback to “short”

  • pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats

  • defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are

  • fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True

  • fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True

Returns:

table – If fname is None, then return the table as formatted string, using

Return type:

None or str
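
The extension-based dispatch described for fname can be pictured with the minimal sketch below. It is not mdciao's implementation: the function name pick_table_writer and the strings it returns are hypothetical and only mirror the three documented cases (None, ‘.xlsx’, anything else).

```python
import os


def pick_table_writer(fname):
    """Hypothetical helper mirroring the documented rules for fname."""
    if fname is None:
        # No file is written; the formatted ASCII string is returned instead
        return "return_string"
    ext = os.path.splitext(fname)[1]
    if ext == ".xlsx":
        # Documented as the only spreadsheet extension
        return "spreadsheet"
    # Every other extension is saved as formatted ASCII
    return "ascii_file"


choices = [pick_table_writer(f) for f in (None, "freqs.xlsx", "freqs.dat")]
```

Under these assumptions, choices lists one outcome per input, in the same order as the inputs.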

frequency_to_bfactor(ctc_cutoff_Ang, pdbfile, geom, interface_sign=False, verbose=True)

Save the contact frequency aggregated by residue to a pdb file

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • pdbfile (str) – The path to the pdbfile to save the geom

  • geom (mdtraj.Trajectory) – Has to have the same topology as self.top

  • interface_sign (bool, default is False) – Give the bfactor values of the members of the interface different signs, s.t. they appear with different colors in a visualizer

  • verbose (bool, default is True) – Inform of the file being saved

Returns:

bfactors

Return type:

1D np.array of len(self.top.n_atoms)
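
As a rough illustration of what "aggregated by residue" and interface_sign imply for the returned array, the sketch below spreads per-residue values onto atoms. It is an assumption-laden toy, not mdciao code: residue_values_to_bfactors, atom_to_residue and the choice of giving the first interface member the negative sign are all hypothetical.

```python
def residue_values_to_bfactors(per_residue, atom_to_residue, interface_residxs=None):
    """Spread per-residue values onto atoms (hypothetical helper).

    per_residue: dict {residue_idx: aggregated frequency}
    atom_to_residue: list, item i is the residue index of atom i
    interface_residxs: optional pair of residue-index lists; residues in the
        first list get a negative sign (arbitrary choice for this sketch)
    """
    negative = set(interface_residxs[0]) if interface_residxs else set()
    bfactors = []
    for res_idx in atom_to_residue:
        value = per_residue.get(res_idx, 0.0)  # residues without contacts get 0
        bfactors.append(-value if res_idx in negative else value)
    return bfactors


per_residue = {0: 1.5, 2: 0.5}
atom_to_residue = [0, 0, 1, 2, 2, 2]
plain = residue_values_to_bfactors(per_residue, atom_to_residue)
signed = residue_values_to_bfactors(per_residue, atom_to_residue,
                                    interface_residxs=[[0], [2]])
```

The result has one value per atom, which matches the documented length of len(self.top.n_atoms).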

gen_ctc_labels(**kwargs) list

Generate labels with different parameters

Wraps around mdciao.contacts.ContactPair.gen_label

Parameters:
  • AA_format (str, default is “short”) –

    Options are:
    • “short”: E30@3.50

    • “long”: GLU30@3.50

    • “just_consensus”: 3.50, fail if none is found

    • “try_consensus”: 3.50, fallback to “short” if none is found

  • fragments (bool, default is False) – Include fragment information. Will get the “best” information available, i.e. consensus>fragname>fragindex. When trying to get consensus labels, this option is ignored, s.t. the full “E30@3.50” is returned regardless.

  • delete_anchor (bool, default is False) – Delete the anchor from the label

Returns:

labels

Return type:

list
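
The four AA_format options can be summarized with a small sketch. This is not the code behind ContactPair.gen_label: format_label, THREE_TO_ONE and the use of ValueError for the "fail" case are hypothetical and only reproduce the label shapes listed above.

```python
# Deliberately tiny lookup table, only for this sketch
THREE_TO_ONE = {"GLU": "E", "ALA": "A", "ARG": "R"}


def format_label(resname, resseq, consensus=None, AA_format="short"):
    """Build one residue label following the documented AA_format options."""
    short = f"{THREE_TO_ONE[resname]}{resseq}"
    if AA_format == "just_consensus":
        if consensus is None:
            # The documentation only says "fail"; the exception type is assumed
            raise ValueError("no consensus label found")
        return consensus
    if AA_format == "try_consensus":
        # Fall back to the short format when no consensus label exists
        return consensus if consensus is not None else short
    if AA_format == "short":
        base = short
    elif AA_format == "long":
        base = f"{resname}{resseq}"
    else:
        raise ValueError(f"unknown AA_format {AA_format!r}")
    return base if consensus is None else f"{base}@{consensus}"


labels = [format_label("GLU", 30, "3.50", AA_format=fmt)
          for fmt in ("short", "long", "just_consensus", "try_consensus")]
```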

property interface_fragments: list

Two residue lists provided at initialization

They are supersets of the residues contained in self.interface_residxs

Empty lists mean no residues were found in the interface defined at initialization

Returns:

interface_fragments

Return type:

list

interface_frequency_matrix(ctc_cutoff_Ang, switch_off_Ang=None)

Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.

Note

Pairs missing from res_idxs_pairs will be NaNs, to differentiate from those pairs that were present but have zero contact

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns:

mat

Return type:

2D numpy.ndarray
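
To make the NaN-versus-zero distinction concrete, here is a minimal pure-Python sketch of such a matrix. It assumes a hypothetical dict pair_freqs mapping residue-index pairs to frequencies; the helper name interface_matrix is also made up, and nested lists stand in for the numpy array that the method returns.

```python
import math


def interface_matrix(interface_residxs, pair_freqs):
    """Rows follow the first residue list, columns the second one."""
    rows, cols = interface_residxs
    # Start with NaN everywhere: "this pair was never computed"
    mat = [[float("nan") for _ in cols] for _ in rows]
    for (r1, r2), freq in pair_freqs.items():
        if r1 in rows and r2 in cols:
            mat[rows.index(r1)][cols.index(r2)] = freq
        elif r2 in rows and r1 in cols:
            # The pair was stored in the opposite order
            mat[rows.index(r2)][cols.index(r1)] = freq
    return mat


mat = interface_matrix([[10, 11], [50, 51, 52]],
                       {(10, 50): 0.8, (52, 11): 0.0})
```

Here mat[1][2] is 0.0 (the pair exists but never forms a contact), whereas mat[0][1] is NaN (the pair is absent from the input).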

property interface_labels_consensus

Consensus labels of whatever residues interface_residxs holds.

If a residue has no consensus label, the corresponding label is None

property interface_residue_names_w_best_fragments_short

Best possible residue@fragment string for the residues in interface_residxs

In case no consensus label, fragment name or fragment index (in that order of preference) is found, nothing is returned after the residue name

property interface_residxs: list

The residues of self.res_idxs_pairs grouped into two lists, depending on which of the self.interface_fragments they belong to

Empty lists mean no residues were found in the interface defined at initialization

Returns:

interface_residxs

Return type:

list
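
The relation between res_idxs_pairs, interface_fragments and interface_residxs can be sketched as follows. This is an illustrative guess at the grouping logic, not the library's code; the helper name group_by_interface is hypothetical, and sorting each group in ascending order is an assumption.

```python
def group_by_interface(res_idxs_pairs, interface_fragments):
    """Collect the residues of the pairs into the two interface groups."""
    frag_sets = [set(frag) for frag in interface_fragments]
    groups = [set(), set()]
    for pair in res_idxs_pairs:
        for res_idx in pair:
            for ii, frag in enumerate(frag_sets):
                if res_idx in frag:
                    groups[ii].add(res_idx)
    # Ascending order within each group is an assumption of this sketch
    return [sorted(group) for group in groups]


residxs = group_by_interface([(3, 40), (5, 41), (3, 41)],
                             [range(0, 10), range(35, 60)])
no_overlap = group_by_interface([(3, 40)], [[100], [200]])
```

The fragments passed in are supersets of the resulting groups, and fragments that share no residue with the pairs yield empty lists.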

property interface_reslabels_short

Residue labels of whatever residues interface_residxs holds

property is_interface

Whether this ContactGroup can be interpreted as an interface.

Note

If none of the residxs_pairs were found in the interface_residxs (both provided at initialization), this property will evaluate to False even if some indices were parsed

property is_neighborhood: bool

Whether this ContactGroup is a neighborhood or not

When instantiating this ContactGroup, it is checked whether all the used ContactPair objects have a shared anchor_residue_idx attribute and whether self.neighbors_excluded is None. If the check passes, this ContactGroup is a neighborhood around the residue stored in the attribute self.shared_anchor_residue_index

Other neighborhood-only attributes get populated, e.g.
  • self.anchor_res_and_fragment_str

  • self.anchor_res_and_fragment_str_short

  • self.partner_res_and_fragment_labels

  • self.partner_res_and_fragment_labels_short

  • self.partner_fragment_colors

  • self.anchor_fragment_color

Note that all these attributes will raise an Exception when called if self.is_neighborhood is False

Returns:

is_neighborhood

Return type:

bool
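
The "shared anchor" part of this check can be illustrated with the toy function below. It only models that one condition (the role of neighbors_excluded is not modeled), and shared_anchor is a hypothetical name, not an mdciao function.

```python
def shared_anchor(anchor_residue_idxs):
    """Return the anchor shared by all pairs, or None if there is none.

    anchor_residue_idxs: one entry per contact pair, None if that pair
    has no anchor residue.
    """
    unique = set(anchor_residue_idxs)
    if len(unique) == 1 and None not in unique:
        return unique.pop()
    return None


same = shared_anchor([348, 348, 348])
mixed = shared_anchor([348, 12, 348])
missing = shared_anchor([None, None])
```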

property max_cutoff_Ang: float

Operations involving cutoffs higher than this will be forbidden and will raise ValueError.

property maxima

Per-contact maximum values over all distance time-traces

Returns:

maxima – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here

Return type:

1D np.array of len(self.n_ctcs)

property means

Per-contact mean values over all distance time-traces

Returns:

mean – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here

Return type:

1D np.array of len(self.n_ctcs)

property minima

Per-contact minimum values over all distance time-traces

Returns:

minima – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here

Return type:

1D np.array of len(self.n_ctcs)

property modes

Per-contact modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces

Note

In order to quickly compute modes, residue-residue distances are multiplied by 1000 and rounded to integers, to be able to use numpy.bincount for speed. Then, the argmax(bincount) is returned

Returns:

modes – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here

Return type:

1D np.array of len(self.n_ctcs)
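
The bincount trick mentioned in the note can be sketched for a single distance trace as below. The function name mode_via_bincount is hypothetical, and dividing by the scale factor at the end (to get back to the input units) is an assumption about how the result is reported.

```python
import numpy as np


def mode_via_bincount(distances, scale=1000):
    """Mode of a 1D array of non-negative distances via integer binning."""
    # Multiply and round so that every distance becomes a non-negative integer
    as_int = np.round(np.asarray(distances) * scale).astype(int)
    # bincount counts how often each integer occurs; argmax picks the mode
    counts = np.bincount(as_int)
    return np.argmax(counts) / scale


trace = [0.3501, 0.3499, 0.350, 0.412, 0.500]
mode = mode_via_bincount(trace)
```

With a scale of 1000, distances that agree to three decimals fall into the same bin, so the first three values of trace count as the same distance.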

property n_ctcs: int

The number of contact pairs (mdciao.contacts.ContactPair -objects) stored in this object

Returns:

n_ctcs

Return type:

int

n_ctcs_timetraces(ctc_cutoff_Ang, switch_off_Ang=None)

Time-traces of the number of contacts, obtained by summing over all contacts for each frame

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns:

nctc_trajs

Return type:

list of 1D np.ndarrays
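
For one trajectory, the per-frame sum can be sketched as follows. The sketch assumes distances stored in nanometers, a cutoff given in Angstrom and a "less than or equal" comparison; these conventions, and the name n_ctcs_timetrace, are assumptions for illustration rather than a description of the library's internals.

```python
def n_ctcs_timetrace(distance_traces_nm, ctc_cutoff_Ang):
    """Number of formed contacts per frame for one trajectory.

    distance_traces_nm: list with one distance time-trace (list) per contact
    """
    cutoff_nm = ctc_cutoff_Ang / 10  # Angstrom -> nanometer
    n_frames = len(distance_traces_nm[0])
    return [sum(1 for trace in distance_traces_nm if trace[frame] <= cutoff_nm)
            for frame in range(n_frames)]


traces = [[0.30, 0.50, 0.34],   # contact 1
          [0.40, 0.33, 0.31]]   # contact 2
counts = n_ctcs_timetrace(traces, ctc_cutoff_Ang=3.5)
```

The method itself returns one such array per trajectory, hence the list in the return type.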

property n_frames: list

List of per-trajectory n_frames

Returns:

n_frames

Return type:

list

property n_frames_total: int

Total number of frames

Returns:

n_frames_total

Return type:

int

property n_trajs: int

The number of trajectories contained in this ContactGroup

Returns:

n_trajs

Return type:

int

property name: str

The name of this ContactGroup, given when creating it

Returns:

name

Return type:

str

property neighbors_excluded: int

The number of neighbors that were excluded when creating this ContactGroup

Returns:

neighbors_excluded

Return type:

int

property partner_fragment_colors

The colors associated with the fragments of the anchor partner residues

The fragment colors were given as pairs of values to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there.

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_fragment_colors
['tab:blue', 'tab:blue', 'tab:blue', 'tab:blue', 'tab:blue']

or

>>> CG = mdciao.examples.ContactGroupL394(fragment_colors=["red","blue","yellow","orange","black"])
>>> CG.partner_fragment_colors
['red', 'orange', 'red', 'orange', 'red']

Note

These colors are not automatically used by self.plot_neighborhood_freqs or self.plot_freqs_as_bars unless passed as color=self.partner_fragment_colors

Will fail if self.is_neighborhood is False

Returns:

color

Return type:

str

property partner_res_and_fragment_labels: list

List of labels of the partner (not anchor) residues of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels
['ARG389@G.H5.21',
 'LYS270@6.32',
 'LEU388@G.H5.20',
 'LEU230@5.69',
 'ARG385@G.H5.17']
Returns:

labels

Return type:

list

property partner_res_and_fragment_labels_short: list

List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels_short
['R389@G.H5.21',
 'K270@6.32',
 'L388@G.H5.20',
 'L230@5.69',
 'R385@G.H5.17']
Returns:

labels

Return type:

list

plot_distance_distributions(bins=10, xlim=None, ax=None, shorten_AAs=False, ctc_cutoff_Ang=None, legend_sort=True, label_fontsize_factor=1, max_handles_per_row=4, defrag=None, smooth_bw=False, background=True) Axes

Plot distance distributions for the distance trajectories of the contacts

The title will try to get the name from self.name

Parameters:
  • bins (int, default is 10) – How many bins to use for the distribution

  • xlim (iterable of two floats, default is None) – Limits of the x-axis. Outliers can stretch the scale; this forces it to a given range

  • ax (Axes, default is None) – One will be created if None is passed

  • shorten_AAs (bool, default is False) – Use amino-acid one-letter codes

  • ctc_cutoff_Ang (float, default is None) – Include in the legend of the plot how much of the distribution is below this cutoff. A vertical line will be drawn at this x-value

  • legend_sort (boolean, default is True) – Sort the legend in descending order of frequency. Has only an effect when ctc_cutoff_Ang is not None

  • label_fontsize_factor (int, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor

  • max_handles_per_row (int, default is 4) – legend control

  • defrag (char, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label

  • smooth_bw (bool or float) – If True smooth the histogram using a Gaussian-kernel-density estimation with an estimator bandwidth of .5 Angstrom. If float, use this value as estimator bandwidth, check matplotlib.mlab.GaussianKDE for more info. If False, don’t smooth

  • background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors

    • True: use a faint version of the curve’s color

    • False: don’t plot any background

    • color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.colors understands

Returns:

ax

Return type:

Axes

plot_freqs_as_bars(ctc_cutoff_Ang, title_label=None, switch_off_Ang=None, xlim=None, ax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, lower_cutoff_val=None, plot_atomtypes=False, sort_by=None, sum_freqs=True, total_freq=None, defrag=None, cumsum=False)

Plot contact frequencies as a bar plot

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail

  • switch_off_Ang (float, default is None) – TODO

  • xlim (float, default is None) – The right limit of the x-axis. +.5 will be added to this number to accommodate some padding around the bars. If None, it’s chosen automatically

  • ax (Axes, default is None) – Draw into this axis. If None is passed, then one will be created

  • shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30” to “E30”

  • color (color-like (str or RGB triple) or list thereof, default is “tab:blue”) – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed to be in the order of the self.res_idx_pairs. It will get re-sorted according to sort_by, s.t. residues always have the same color no matter the order

  • label_fontsize_factor (float, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor

  • lower_cutoff_val (float, default is None) – Only plot frequencies above this value. Default is to plot all

  • plot_atomtypes (bool, default is False) – Use stripe-patterns to inform about the types of interactions (sidechain, backbone, etc)

  • sort_by (str or None, default is None) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via:

    • `sort_by`=’freq’

      Use the ctc_cutoff_Ang provided here to recompute new frequencies and sort the contacts in ascending order

    • `sort_by`=’residue’ or ‘numeric’

      Sort by ascending residue number

  • sum_freqs (bool, default is True) – Inform, in the legend and in the title, about the sum of frequencies/bar-heights being plotted

  • total_freq (float, default is None) – Add a line to the title informing about the fraction of the total_freq that’s being plotted in the figure. Only has an effect if sum_freqs is True

  • defrag (str, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label

  • cumsum (bool, default is False) – Plot the cumulative frequency (aka cumsum, as in numpy.cumsum) as a faint dotted line in the graph. This quantity is normalized to 1 s.t. the summed frequencies numerically coincide with the y-axis limit regardless of the value of truncate, which hides some of these. I.e. it might be that you don’t see the cumulative frequency fully arrive at 1 if some small contributions have been truncated

Returns:

ax

Return type:

Axes
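
The statement that a color list "gets re-sorted" together with the bars can be illustrated with the sketch below, which sorts by ascending residue number. The helper resort_for_display and its keys argument are hypothetical; the point is only that labels, frequencies and colors are permuted with the same index order while the input lists stay untouched.

```python
def resort_for_display(labels, freqs, colors, keys):
    """Return re-ordered copies of labels, freqs and colors, sorted by keys."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    return ([labels[i] for i in order],
            [freqs[i] for i in order],
            [colors[i] for i in order])


labels = ["R389", "K270", "L388"]
freqs = [0.9, 0.7, 0.4]
colors = ["red", "orange", "red"]
resseqs = [389, 270, 388]  # sort key: residue sequence numbers

new_labels, new_freqs, new_colors = resort_for_display(labels, freqs, colors, resseqs)
```

After the call, K270 is still drawn in orange even though it moved to the first position.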

plot_freqs_as_flareplot(ctc_cutoff_Ang, fragments=None, fragment_names=None, fragment_colors=None, consensus_maps=None, SS=None, scheme='auto', **kwargs_freqs2flare)

Produce contact flareplots by wrapping around mdciao.flare.freqs2flare

Note

The logic to assign fragments and colors can lead to unexpected behavior in cases where too much guess-work has to be done. If a particular combination of fragments and colors is desired but not achievable through this method, it is highly recommended to use mdciao.flare.freqs2flare directly and experiment there with parameter combinations. It is also a good idea to check out the notebook called “Controlling Flareplots”

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • fragments (string or list of iterables, default is None) – The way the topology is fragmented. Default is to put all residues in one fragment. This optarg can modify the behaviour of scheme=’all’, since residues absent from fragments will not be plotted, see below. If string, it will be passed as method to mdciao.fragments.get_fragments, to get the fragments on the fly.

  • fragment_names (list of strings, default is None) – The fragment names, at least len(fragments)

  • fragment_colors (None or list of color-likes) – Will be used to give the fragments their colors, needs to be color-like and of len(fragments)

  • consensus_maps (list, default is None) –

    The items of this list are either:
    • indexables containing the consensus labels (strings) themselves

      They need to be “gettable” by residue index, i.e. dict, list or array. Typically, one generates these maps by using mdciao.nomenclature.LabelerConsensus.top2labels.

    • mdciao.nomenclature.LabelerConsensus-objects

      When these objects are passed, their mdciao.nomenclature.LabelerConsensus.top2labels and mdciao.nomenclature.LabelerConsensus.top2fragments are called on-the-fly, generating not only the consensus labels but also the consensus fragments (i.e. subdomains) to further fragment the topology into sub-domains, like TM6 or G.H5. If fragments are parsed, they will be made compatible with the consensus fragments.

    If you want the consensus labels but not the sub-fragmentation, simply use the first option.

  • SS (secondary structure information, default is None) – Whether and how to include information about secondary structure. Can be many things:

    • triple of ints (CP_idx, traj_idx, frame_idx)

      Go to contact group CP_idx, trajectory traj_idx and grab this frame to compute the SS. Will read xtcs when necessary or otherwise directly grab it from a mdtraj.Trajectory in case it was passed. Ignores potential stride values. See ContactPair.time_traces for more info

    • True

      same as [0,0,0]

    • None or False

      Do nothing

    • mdtraj.Trajectory

      Use this geometry to compute the SS

    • string

      Path to a filename, of which only the first frame will be read. The SS will be computed from there. The method first tries to read the file without topology information (this works for e.g. .pdb, .gro, .h5); when this fails, self.top will be passed (e.g. for .xtc, .dcd)

    • array_like

      Use the SS from here, s.t. ss_inf[idx] gives the SS-info for the residue with that idx

  • scheme (str, default is ‘auto’) –

    How to decide which residues to plot
    • ‘all’

      plot as many residues as possible. E.g., if a self.topology is present, plot all its residues. This can be modified with fragments, see above. Using ‘all’ without any fragments means that the topology won’t be separated into interface fragments, even if it is an interface. Given that some of the topology (which the user insists on plotting) might not have been assigned to either side of the interface, it’s unclear how to proceed here.

    • ‘interface’:

      use only the fragments in self.interface_fragments. Will only work if self.is_interface is True

    • ‘auto’

      Uses self.is_interface to decide. If True, scheme is set to ‘interface’. If False, e.g. a residue neighborhood or a site, then scheme is set to ‘all’

    • ‘interface_sparse’:

      like ‘interface’, but using the input fragments to break self.interface_fragments (which are only two, by definition) further down into other fragments. Of these, show only the ones where at least one residue participates in the interface. If fragments is None, scheme=’interface’ and scheme=’interface_sparse’ are the same thing.

    • ‘residues’:

      plot only the residues present in self.res_idxs_pairs

    • ‘residues_sparse’ :

      plot only the residues that have a non-zero frequency

    • ‘consensus_sparse’:

      like ‘interface_sparse’, but leaving out sub-domains not participating in the interface with any contacts. For this, the consensus_maps need to be actual LabelerConsensus-objects

  • kwargs_freqs2flare (dict) – Optional keyword arguments for mdciao.flare.freqs2flare. Note that many of these kwargs will be overwritten internally by this method, mostly to accommodate the scheme+fragment+color combinations, but not only (please see the note above). These are the kwargs that this method manipulates internally and might be overwritten:

    • top, ss_array, fragments, fragment_names, colors

    Note that some of the values in kwargs_freqs2flare (in particular sparse_residues) might alter (with or w/o conflict) the scheme option. The full list of optional arguments is listed below

Other Parameters:
  • sparse_residues (boolean, default is False) – Show only those residues that appear in the initial res_idxs_pairs

    Note

    There is a development option for this argument where a residue list is passed, meaning, show these residues regardless of any other option that has been passed. Perhaps this changes in the future.

  • sparse_fragments (boolean, default is False) – Same as sparse_residues, but with fragments. When sparse_residues isn’t False, this option has no effect.

  • exclude_neighbors (int, default is 1) – Do not show contacts where the partners are separated by these many residues. If no top is passed, the neighborhood-condition is checked using residue serial-numbers, assuming the molecule only has one long peptidic-chain.

  • freq_cutoff (float, default is 0) – Contact frequencies lower than this value will not be shown

  • ax (Axes) – Parse an axis to draw on, otherwise one will be created using panelsize. In case you want to re-use the same circle of residues as a background to plot different sets of freqs, YOU HAVE TO USE THE SAME fragments and sparse values on all calls, else the bezier lines will be placed erroneously.

  • center (np.ndarray, default is [0,0]) – In axis units, where the flareplot will be centered around

  • r (float, default is 1) – In axis units, the radius of the flareplot

  • mute_fragments (iterable of integers, default is None) – Curves involving these fragments will be hidden. Fragments are expressed as indices of fragments

  • anchor_fragments (iterable of integers, default is None) – Curves not involving these fragments will not be shown, i.e. it is the complement of mute_fragments. Both cannot be passed simultaneously.

  • panelsize (float, default is 10) – Size in inches of the panel (=figsize in matplotlib). Will be ignored if a pre-existing axis object is parsed

  • angle_offset (float, default is 0) – In degrees, where the flareplot “begins”. Zero is xy = [1,0]

  • highlight_residxs (iterable of ints, default is None) – Show the labels for these residues in red

  • select_residxs (iterable of ints, default is None) – Only the residues here can be connected with a Bezier curve

  • fontsize (float, default is None) – Currently, the fontsize is internally computed as a function of the dotsize, since the space available for the labels is determined by the dotsize. There are plans for user control in the future, but until then NotImplementedError will be thrown

  • shortenAAs (boolean, default is True) – Use short AA-codes, e.g. E30 for GLU30. Only has effect if a topology is parsed

  • aa_offset (int, default is 0) – Add this number to the resSeq value

  • markersize (float, default is None) – The size of the dots. It is internally optimized to have adjacent dots fill the available space without overlapping each other. There are plans for user control in the future, but until then NotImplementedError will be thrown

  • bezier_linecolor (color-like, default is ‘k’) – The color of the bezier curves connecting the residues. Can be a character, string or RGB value (not RGBA)

  • plot_curves_only (bool, default is False) – Only plot the curves connecting the dots, but not the dots themselves or any other annotation (labels, fragment names or SS information). The same caution as for ax applies.

  • textlabels (bool or array_like, default is True) – How to label the residue dots. Gets passed directly to mdciao.flare.circle_plot_residues. Options are:

    • True: the dots representing the residues will get a label automatically, either their serial index or the residue name, e.g. GLU30, if a top was passed.

    • False: no labeling

    • array_like : will be passed as replacement_labels to mdciao.flare.add_fragmented_residue_labels

  • padding (iterable of len 3, default is [1,1,1]) – The padding, expressed as empty dot positions. Each number is used for:

    • the beginning of the flareplot, before the first residue

    • between fragments

    • at the end of the plot, after the last residue

  • lw (float, default is None) – Line width of the contact lines

  • signed_colors (dict, default is None) – Provide a color dictionary, e.g. {-1:”b”, +1:”r”} to give different colors to positive and negative alpha values. If None, defaults to bezier_linecolor

  • subplot (bool, default is False) – If True, the method checks if ax is the last axis in a figure (=all other panels have been already drawn) and then transfers the last plot’s fontsizes and linewidths to panels (if possible). It will help produce more homogeneous plots when heuristics about font-sizing fail

  • aura (iterable, default is None) – Scalar array (positive or negative), indexed with residue indices, e.g. RMSF, SASA, degree of conservation etc. It will be drawn as an aura around the flareplot.

  • coarse_grain (bool, default is False) – If True, will use the fragment definitions of fragments and/or sparse_fragments to coarse grain the frequencies into per-fragment frequencies and show them as a chord-diagram, wrapping around freqs2chord. Check there for more info

  • normalize_to_sigma (bool or float, default is False) – Only used if coarse_grain is True. Allows for scaling the arc occupied by the chords to a particular sigma value. This is explained in detail in the documentation of freqs2chord.

Returns:

  • ifig (Figure)

  • ax (Axes)

  • flareplot_attrs (dict) – Flareplot attributes as dictionary containing matplotlib objects (texts, dots, curves etc) for further manipulation and fine tuning of the plot if necessary. See the returned values of mdciao.flare.freqs2flare for more information.
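
How scheme=’auto’ resolves can be summarized with the short sketch below. The function resolve_scheme is hypothetical and only encodes two documented rules: ‘auto’ becomes ‘interface’ or ‘all’ depending on is_interface, and ‘interface’ only works if is_interface is True. Raising ValueError in the latter case is an assumption.

```python
def resolve_scheme(scheme, is_interface):
    """Turn 'auto' into a concrete scheme, following the documented rules."""
    if scheme == "auto":
        return "interface" if is_interface else "all"
    if scheme == "interface" and not is_interface:
        # Documented as "will only work if self.is_interface is True";
        # the exception type is an assumption of this sketch
        raise ValueError("scheme='interface' needs an interface")
    return scheme


for_interface = resolve_scheme("auto", is_interface=True)
for_neighborhood = resolve_scheme("auto", is_interface=False)
explicit = resolve_scheme("residues", is_interface=False)
```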

plot_frequency_sums_as_bars(ctc_cutoff_Ang, title_str, switch_off_Ang=None, xmax=None, ax=None, shorten_AAs=False, label_fontsize_factor=1, lower_cutoff_val=0, bar_width_in_inches=0.75, list_by_interface=False, sort_by='freq', interface_vline=False)

Bar plot with per-residue sums of frequencies (called Sigma in mdciao)

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • title_str (str) – The title of the plot

  • switch_off_Ang (float, default is None) – TODO

  • xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5

  • ax (Axes, default is None) – If None, one will be created, else draw here

  • shorten_AAs (boolean, default is False) – Unused ATM

  • label_fontsize_factor (float, default is 1) – Some control over fontsizes when plotting a high number of bars

  • lower_cutoff_val (float, default is 0) – Do not show sums of freqs lower than this value

  • bar_width_in_inches (float, default is .75) – If no ax is parsed, this ensures that the drawn figure always has a size proportional to the number of frequencies being shown. Allows for combining multiple subplots with different numbers of bars in one figure with all bars equally wide regardless of the subplot

  • list_by_interface (boolean, default is False) – Separate residues by interface

  • sort_by (str or None, default is ‘freq’) – With sort_by=None, the frequencies are plotted in the order in which the ContactPair-objects are stored in ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via:

    • `sort_by`=’freq’

      Use the ctc_cutoff_Ang provided here to recompute new frequencies and sort the contacts in ascending order

    • `sort_by`=’residue’ or ‘numeric’

      Sort by ascending residue number

  • interface_vline (bool, default is False) – Plot a vertical line visually separating both interfaces

Returns:

ax

Return type:

Axes
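
The per-residue sums of frequencies (Sigma) plotted by this method can be pictured with the following sketch. It assumes that each contact's frequency is added to both participating residues; per_residue_sums and its inputs are hypothetical and not part of the mdciao API.

```python
def per_residue_sums(res_idxs_pairs, freqs):
    """Sum the frequencies of all contacts each residue takes part in."""
    sigma = {}
    for (r1, r2), freq in zip(res_idxs_pairs, freqs):
        # Each contact contributes its frequency to both of its residues
        sigma[r1] = sigma.get(r1, 0.0) + freq
        sigma[r2] = sigma.get(r2, 0.0) + freq
    return sigma


sigma = per_residue_sums([(3, 40), (3, 41), (5, 41)], [0.5, 0.25, 1.0])
```

Residue 3 takes part in two contacts, so its Sigma is the sum of both frequencies, which is why values above 1 are possible.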

plot_interface_frequency_matrix(ctc_cutoff_Ang, switch_off_Ang=None, transpose=False, label_type='best', **kwargs_plot_matrix)

Plot the interface_frequency_matrix

The first group of interface_residxs are the row indices, shown in the y-axis top-to-bottom (since imshow is used to plot). The second group of interface_residxs are the column indices, shown in the x-axis left-to-right

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • transpose (bool, default is False) – Transpose the contact matrix in the plot

  • label_type (str, default is “best”) – “best” tries resname@consensus(>fragname>fragidx). Alternatives are “residue” or “consensus”, but “consensus” alone might lead to empty labels since it is not guaranteed that all residues of the interface have consensus labels

  • kwargs_plot_matrix (dict, default is None) – Optional keyword arguments for mdciao.plots.plot_matrix, listed below.

Other Parameters:
  • pixelsize (int, default is 1) – The size in inches of the pixel representing the contact. Ultimately controls the size of the figure, because figsize = _np.array(mat.shape)*pixelsize

  • grid (boolean, default is False) – overlap a grid of dashed lines

  • cmap (str, default is ‘binary’) – What matplotlib.cmap to use

  • colorbar (boolean, default is False) – whether to use a colorbar

Returns:

  • ax (Axes)

  • fig (matplotlib.pyplot.Figure)

plot_neighborhood_freqs(ctc_cutoff_Ang, switch_off_Ang=None, color='tab:blue', xmax=None, ax=None, shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, plot_atomtypes=False, sort_by=None)

Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods

#TODO perhaps get rid of the wrapper altogether. ATM it would break the API

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • color (color-like (str or RGB triple) or list thereof, default is “tab:blue”) – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed to be in the order of the self.res_idx_pairs. It will get re-sorted according to sort_by, s.t. residues always have the same color no matter the order

  • xmax (int, default is None) – Default behaviour is to go to n_ctcs, use this parameter to homogenize different calls to this function over different contact groups, s.t. each subplot has equal xlimits

  • ax (Axes, default is None) – Axes to plot into, if None, one will be created

  • shorten_AAs (bool, default is False) – Shorten residue names from “GLU30”->”E30”

  • label_fontsize_factor (float, default is 1) – Fontsize for the tilted labels and the legend, as fraction [0,1] of the default value in rcParams[“font.size”]

  • sum_freqs (bool, default is True) – Add the sum of frequencies of the represented (and only those) frequencies

  • plot_atomtypes (bool, default is False) – Add stripes to frequency bars to include the atom-types (backbone, sidechain, etc)

  • sort_by (str or None, default is None) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via:

    • `sort_by`=’freq’

      Use the ctc_cutoff_Ang provided here to recompute new frequencies and sort the contacts in ascending order

    • `sort_by`=’residue’ or ‘numeric’

      Sort by ascending residue number

Returns:

ax

Return type:

Axes

plot_timedep_ctcs(panelheight=3, plot_N_ctcs=True, pop_N_ctcs=False, skip_timedep=False, ctc_cutoff_Ang=None, sort_by_freq=False, **plot_timetrace_kwargs)

For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts

In order for the number of contacts to be plotted, ctc_cutoff_Ang should be provided.

Parameters:
  • panelheight (float, default is 3) – The height of the per-contact panels, in inches

  • plot_N_ctcs (bool, default is True) – Add an extra panel at the bottom of the figure containing the number of formed contacts for each frame for each trajectory. A valid cutoff has to be passed via ctc_cutoff_Ang, otherwise this has no effect

  • pop_N_ctcs (bool, default is False) – Put the panel with the number of contacts in a separate figure. A valid cutoff has to be passed via ctc_cutoff_Ang, otherwise this has no effect

  • skip_timedep (bool, default is False) – Skip plotting the individual time-traces and plot only the time-trace of overall formed contacts. This sets pop_N_ctcs to True internally

  • ctc_cutoff_Ang (float, default is None) – The cutoff to use, in Angstrom

  • sort_by_freq (bool, default is False) – Sort by descending frequency. Default is to plot in the same order as ContactGroup._contacts, which will be in descending order of frequencies with the cutoff used originally to compute this ContactGroup. Only works if a ctc_cutoff_Ang is provided.

  • plot_timetrace_kwargs (dict) – Optional parameters for mdciao.contacts.ContactPair.plot_timetrace, which are documented below:

Other Parameters:
  • ax (None, Axes) – The axis where to plot the timetrace. Default is to plot on the current axis, and if there’s no current axes, a new one will be created. If a new one is created, it’ll have the default width and height, you have to change it afterwards or create it beforehand with your desired size.

  • color_scheme (list, default is None) – Pass a list of colors, each one should be understandable by matplotlib.colors.is_color_like

  • n_smooth_hw (int, default is 0) – Half-width, in frames, of the smoothing window

  • dt (float, default is 1) – How many units of t_unit one frame represents

  • background (bool or color-like (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors

    • True: use a faint version of color

    • False: don’t plot any background

    • color-like: use this color for the background; can be str, hex, rgba, or anything matplotlib.colors.is_color_like understands

  • shorten_AAs (bool, default is False) – Whether to shorten the AA labels

  • t_unit (str, default is ‘ps’) – The time unit with which to label the x-axis

  • ylim_Ang (float or “auto”) – The limit in Angstrom of the y-axis

  • max_handles_per_row (int, default is 4) – How many handles each row of the legend can have

Returns:

list_of_figs – The wanted figure(s)

Return type:

list

Note

The keywords plot_N_ctcs, pop_N_ctcs, and skip_timedep allow this method to both include or totally exclude the total number of contacts and/or the time-traces in the figure. This might change in the future, it was coded this way to avoid breaking the command_line tools API. Also note that some combinations will produce an empty return!

plot_timedep_ctcs_matrix(ctc_cutoff_Ang, inches_per_contact=0.35, figsize=None, panelwidth=10, color='lightblue', shorten_AAs=True, dt=1, t_unit='ps', grid=True, show_freqs=True, anchor=None, bookends=True, defrag=None, ctc_control=None, sort_by='freq', lower_cutoff_val=0, n_smooth_hw=0) → tuple

Per-trajectory time-traces of the formed contacts, shown as binary traces, i.e. formed or not formed.

Each trajectory gets displayed in its own panel.

Note

Contacts are shown in descending order of contact-frequency, as obtained using ctc_cutoff_Ang, over the whole dataset. Expect different orders when changing ctc_cutoff_Ang.

Parameters:
  • ctc_cutoff_Ang (float) – The cutoff to use, in Angstrom

  • inches_per_contact (float, default is 0.35) – The height, in inches, that each contact will take up on the whole plot. Making this number too small in order to flatten the figure might squeeze the contact-labels vertically; try using panelwidth instead.

  • figsize (tuple, default is None) – Default behavior is to set the size of the figure automatically as

    height, width = self.n_trajs * self.n_ctcs * inches_per_contact, panelwidth

    s.t. figure sizes are consistent across systems and number of contacts. However, you can override this behavior by setting the figsize yourself here.

  • panelwidth (float, default is 10) – The width of the figure, in inches

  • color (any color-like, default is “lightblue”) – The color assigned to the formed contacts

  • shorten_AAs (bool, default is True) – Whether to use short versions of residue names

  • dt (float, default is 1) – How many units of t_unit one frame represents

  • t_unit (str, default is “ps”) – The time unit with which to label the x-axis

  • grid (boolean, default is True) – Overlay a grid of faint dashed lines on x and y ticks

  • show_freqs (bool, default is True) – Use the right-hand side y-axis to annotate each contact with its contact-frequency. When multiple trajectories are plotted, the label includes per-trajectory frequency and overall frequency.

  • anchor (str, default is None) – This string will be deleted from the contact labels, leaving only the partner-residue to identify the contact. The final anchor label will be that of the deleted keys (allows for keeping e.g. pre-existing consensus nomenclature). No consistency-checks are carried out, i.e. use at your own risk (plus it looks ugly, somehow).

  • bookends (bool, default is True) – Indicate the beginning and end of each trajectory with a faint dashed line, to differentiate non-formed contacts from simply absent trajectory data. Only has an effect if trajectories have different starting or ending timestamps.

  • defrag (bool, default is None) – Whether or not to include the fragment information in the contact labels

  • ctc_control (None, float or int, default None) – Control the number of contacts that gets plotted. Default is to show all regardless of their frequency value.

    • If integer, interpret directly as number of contacts to be shown, e.g. ctc_control = 5 means show the 5 most frequent contacts (regardless of how many others there might be).

    • If float, it must be in [0,1]. It is interpreted as the fraction of the total aggregated frequency to keep over the whole dataset, i.e. ctc_control=.75 means show contacts until 75% of all aggregated frequency is shown. The aggregate is computed on the frequencies that have not been truncated by lower_cutoff_val.

    • If None show all contacts regardless of their frequency.

    • This parameter will be ignored if sort_by is different from “freq”, as it is only meaningful if contacts are sorted in descending order of frequency.

    The difference between None and 1.0 (100% of overall frequency) is that ctc_control = None will still show zero-frequency contacts, whereas ctc_control = 1.0 won’t, since 100% of overall frequency is achieved without the zero-frequency contacts.

  • sort_by (str, default is “freq”) – Default is to sort contacts by descending order of frequency. Alternatively, you can sort them by residue number by passing “residue” or “numeric” here

  • lower_cutoff_val (float, default is 0) – Hide contacts with frequencies lower than this value.

  • n_smooth_hw (int, default is 0) – Half-window size for smoothing the time-traces before computing the contacts

Returns:

  • fig (Figure) – The figure with the plots

  • plotted_freqs (dict) – A dictionary keyed with the plotted contact labels and valued with the plotted overall frequencies. Keys are sorted in the same order as plotted.

  • plotted_trajs (list) – The binary trajectories, as plotted, i.e. each item of this list is a np.ndarray of shape (len(plotted_freqs), n_frames_i), where i is the trajectory index. The order of the rows is the same as the order of the keys in plotted_freqs.
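
The three ctc_control modes can be illustrated with a small, self-contained sketch. This is not mdciao's implementation: the helper name n_contacts_to_show and the frequencies are made up, and the float case (add contacts, most frequent first, until the requested fraction of the summed frequency is reached) is an assumption based on the parameter description.

```python
def n_contacts_to_show(freqs, ctc_control=None):
    """Hypothetical helper, not part of mdciao.

    freqs: contact frequencies, already sorted in descending order.
    Returns how many of them would be shown.
    """
    if ctc_control is None:
        # Show everything, zero-frequency contacts included
        return len(freqs)
    if isinstance(ctc_control, int):
        # Integer: show at most this many contacts
        return min(ctc_control, len(freqs))
    if not 0 <= ctc_control <= 1:
        raise ValueError("a float ctc_control must be in [0, 1]")
    # Float: accumulate frequencies until the requested fraction is reached
    target = ctc_control * sum(freqs)
    accumulated, n_shown = 0.0, 0
    for freq in freqs:
        if accumulated >= target:
            break
        accumulated += freq
        n_shown += 1
    return n_shown


# Made-up frequencies, chosen so that all sums are exact in floating point
freqs = [1.0, 0.5, 0.25, 0.25, 0.0]
print(n_contacts_to_show(freqs, None))  # 5
print(n_contacts_to_show(freqs, 2))     # 2
print(n_contacts_to_show(freqs, 0.75))  # 2
print(n_contacts_to_show(freqs, 1.0))   # 4
```

In this sketch, ctc_control=1.0 stops before the zero-frequency contact, whereas None keeps it, which is the difference described for this parameter.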

plot_violins(sort_by=False, ctc_cutoff_Ang=None, truncate_at_mean=None, zero_freq=0.01, switch_off_Ang=None, ax=None, title_label=None, xmax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, defrag=None, stride=1)

Plot residue-residue distances as violin plots (see violinplot)

The default behaviour is to plot all residue pairs in the order in which the ContactPair-objects are stored in the ContactGroup. You can check this order in self.res_idxs_pairs. This order typically depends on the original ctc_cutoff_Ang used to instantiate this ContactGroup, which might not carry the same meaning here.

For more than 50 contacts or so, violin plots take some time to compute, because a Gaussian-kernel-density estimation is done for each residue pair.

Also, plots with many residue pairs simply might be difficult to read.

Hence, to control the number of shown contacts, you can use these parameters, sorted somewhat hierarchically

  • sort_by

  • ctc_cutoff_Ang

  • truncate_at_mean

  • zero_freq

Please check their documentation below.

Finally, if the plots still take too long to compute/show for the desired number of violins, try reducing the amount of data by using stride > 1.

Parameters:
  • sort_by (iterable of ints, str, boolean, int, default is False) –

    Can be different things:
    • iterable of ints

      Strongest selection. Show only these residue pairs, in this order. Indices are intended as self.res_idxs_pairs indices. All other parameters are ignored.

    • str “numeric” or “residue”

      Sort by ascending residue number

    • boolean False

      Don’t sort, i.e. use the order in self.contact_pairs

    • boolean True

      Sort. There are two options for sorting, depending on the value of ctc_cutoff_Ang (more below)

      • sort by distance means, ascending: ctc_cutoff_Ang is None

      • sort by contact-frequencies, descending: ctc_cutoff_Ang is a float

        For contacts with zero frequency, fall back on ascending distance means. This means that frequent contacts will be displayed first (sorted by frequency, high to low), followed by infrequent ones sorted by mean distance (short to long)

    • int n

      Like True but up to n contacts at most. Other parameters like truncate_at_mean can reduce this number automatically

  • ctc_cutoff_Ang (float, optional, default is None) – If provided, contact-frequencies will be computed and shown in the contact-labels. Additionally, if sort_by is True or int, then the violins are sorted by contact-frequency in the plot

  • truncate_at_mean (float, default is None) – Don’t show violins with mean values higher than this (in Angstrom). This has no effect on contacts in which the mean is above the cutoff BUT the frequency is > zero_freq. This case is very common, since a contact can be formed at small distances but broken at very large ones, s.t. the mean (or median) values are meaningless.

  • zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero and not shown. For this parameter to have effect, you need a ctc_cutoff_Ang

  • switch_off_Ang (float, default is None) – TODO

  • ax (None or Axes, default is None) – The axis to plot into. If None, one will be created

  • title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail

  • xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5

  • color (iterable (list or dict), or str, default is “tab:blue”) –

    • list, the colors will be reordered so that the same residue pair always gets the same color, regardless of order in which they appear. This way you can track a violin across different sorting orders

    • str, it has to be a matplotlib color or a case-sensitive matplotlib colormap name https://matplotlib.org/stable/tutorials/colors/colormaps.html

    • dict, keys are integers and values are colors. This is the best way to work when sort_by is an iterable of ints, e.g. [ii,jj], because you can pass only those colors here as {ii:“red”, jj:“blue”}

    • If None, the ‘tab10’ colormap (tableau) is chosen

  • shorten_AAs (bool, default is False) – Shorten residue labels, e.g. “GLU30” -> “E30”

  • label_fontsize_factor (float, default is 1) – Labels will use the fontsize rcParams[“font.size”]*label_fontsize_factor

  • sum_freqs (bool, default is True) – Whether to sum per-contact frequencies and place the sum in the label as \(\Sigma\) values

  • defrag (char, default is None) – Whether to leave out the fragment affiliation, e.g. “GLU30@3.50” w/ defrag=”@” appears as “GLU30” only

  • stride (int, default is 1) – Stride the data down by this much, in case the computation of the violins takes too long

Returns:

  • ax (Axes)

  • order (np.ndarray) – Indices of the plotted residue pairs, in the order in which they were plotted. It is the result of the combination of the above selection parameters
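
The ordering used when sort_by is True and a ctc_cutoff_Ang is given can be sketched as follows. This is an illustration only, not mdciao's code: the helper name violin_order and the numbers are made up, and the sketch covers only the sorting (frequent contacts first, zero-frequency contacts afterwards by ascending mean distance), not the truncation done by truncate_at_mean.

```python
def violin_order(freqs, means, zero_freq=0.01):
    """Hypothetical helper, not part of mdciao.

    freqs: per-contact frequencies; means: per-contact mean distances.
    Returns contact indices in plotting order.
    """
    # Frequencies below zero_freq are treated as zero
    formed = [ii for ii, ff in enumerate(freqs) if ff >= zero_freq]
    broken = [ii for ii, ff in enumerate(freqs) if ff < zero_freq]
    formed.sort(key=lambda ii: -freqs[ii])  # high to low frequency
    broken.sort(key=lambda ii: means[ii])   # short to long mean distance
    return formed + broken


freqs = [0.2, 0.0, 0.9, 0.005]
means = [4.5, 12.0, 3.5, 8.0]
print(violin_order(freqs, means))  # [2, 0, 3, 1]
```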

relabel_consensus(new_labels=None)

Relabel any residue missing its consensus label to shortAA

Alternative (or additional) labels can be given as a dictionary.

Parameters:

new_labels (dict) – keyed with shortAA-codes and valued with the new desired labels

Warning

For expert use only. The changes in consensus labels propagate down to the low-level attribute Residues.consensus_labels of the Residues objects underlying each of the ContactPairs in this ContactGroup

relative_frequency_formed_atom_pairs_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None, **kwargs) → list

Relative frequencies of the interaction types (by atom type) for all contact-pairs in the ContactGroup

“Relative” means that they will sum up to 1 regardless of the contact’s frequency

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.relative_frequency_formed_atom_pairs_overall_trajs(4)
[{'SC-SC': 0.62, 'SC-BB': 0.21, 'BB-BB': 0.09, 'BB-SC': 0.08}
 {'BB-BB': 0.74, 'SC-SC': 0.26}
 {'SC-SC': 1.0}
 {'BB-SC': 0.59, 'SC-SC': 0.41}
 {'BB-SC': 0.73, 'SC-SC': 0.27}]
Parameters:
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • switch_off_Ang (float, default is None) – TODO

  • kwargs (dict) – Optional parameters for mdciao.ContactPair.relative_frequency_of_formed_atom_pairs_overall_trajs, which are listed below.

Other Parameters:
  • keep_resname (bool, default is False) – Keep the atom’s residue name in its descriptor. Only makes sense if aggregate_by_atomtype is False

  • aggregate_by_atomtype (bool, default is True) – Aggregate the frequencies of the contact by the atom types involved. Atom types are backbone, sidechain or other (BB, SC, X)

  • min_freq (float, default is .05) – Do not report relative frequencies below this cutoff, e.g. “BB-BB”:.9, “BB-SC”:0.03, “SC-SC”:0.03, “SC-BB”:0.03 gets reported as “BB-BB”:.9

Returns:

refreq_dicts – List of dictionaries with the relative frequencies, keyed by the atom types involved in the contact. The order is the same as in self.ctc_labels

Return type:

list
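
What “relative” and min_freq mean for a single contact can be sketched with plain Python. The helper name relative_freqs and the counts are hypothetical, and the rounding to two decimals only mimics the look of the example output; mdciao may compute and format these values differently.

```python
def relative_freqs(counts, min_freq=0.05):
    """Hypothetical helper, not part of mdciao.

    counts: number of frames in which each atom-type pair formed the contact.
    Returns the relative frequencies, sorted high to low, dropping entries
    below min_freq.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    relative = {key: value / total for key, value in counts.items()}
    ordered = sorted(relative.items(), key=lambda item: -item[1])
    return {key: round(value, 2) for key, value in ordered if value >= min_freq}


# Made-up counts: only "BB-BB" survives the min_freq filter
counts = {"BB-BB": 90, "BB-SC": 4, "SC-SC": 3, "SC-BB": 3}
print(relative_freqs(counts))  # {'BB-BB': 0.9}
```

Before filtering, the relative frequencies add up to 1 no matter how often the contact itself is formed; the filtered values are reported as they are, without renormalizing.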

repframes(scheme='mode', ctc_cutoff_Ang=None, return_traj=False, show_violins=False, n_frames=1, verbose=True)

Find representative frames for this ContactGroup

A “representative frame” means, in this context, a frame that minimizes the average distance to the modes (or means) of the residue-residue distances contained in this object.

Please note that “representative” can have other meanings in other contexts. Here, it’s just a way to pick frames/geometries that will most likely resemble most of what is also seen in the distributions, barplots, violinplots, and flareplots.

Please also note that minimizing averages has its own limitations and might not always yield the best result. However, it is the easiest and quickest to implement. Feel free to use any of Sklearn’s great regression tools under constraints to get a better “representative”.

Parameters:
  • scheme (str, default is “mode”) – Four options:

    • “mode” : minimize average distance to the most likely distance, i.e. to the mode, i.e. to the distance values at which the distributions (plot_distance_distributions or plot_violins) peak. You can check the mode values in modes

    • “mean” : minimize average distance to the mean values of the distances. You can check the means in means

    • “min” : minimize average distance to the minimum values of the distances. You can check the minima in minima

    • “max” : minimize average distance to the maximum values of the distances. You can check the maxima in maxima

  • ctc_cutoff_Ang (float, default is None) – THIS IS EXPERIMENTAL. If given, the contact frequencies will be used as weights when computing the average. In cases with many contacts, many of them broken, this might help

  • return_traj (bool, default is False) – If True, also try to return the Trajectory objects. Will fail if that is not possible because the original files aren’t accessible (or there weren’t any)

  • show_violins (bool, default is False) – Superimpose the distance values as dots on top of a violin plot, created using plot_violins

  • n_frames (int, default is 1) – The number of representative frames to return

  • verbose (bool, default is True) – Inform of the frames that are being selected

Returns:

  • frames (list) – A list of n_frames tuples, each tuple containing the trajectory and frame index that minimize RMSDd.

  • RMSDd (np.ndarray) – A 1D array containing the root-mean-square-deviation (in Angstrom) over distances (not positions) of the returned frames to the computed reference as specified by the scheme. This mean is weighted by the contact frequencies in case a ctc_cutoff_Ang was given. Should always be in ascending order, i.e. the frames are sorted from closest to furthest to the reference.

  • values (np.ndarray) – A 2D array of shape (n_frames, n_ctcs) containing the distance values of the frames in Angstrom

  • trajs (Trajectory) – A Trajectory with n_frames frames. Only if return_traj=True
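
The idea behind a representative frame can be sketched without mdciao. The snippet below uses the “mean” scheme on a single, made-up trajectory: it builds the reference from the per-contact means, computes the unweighted root-mean-square deviation over distances for every frame, and returns the closest frames. The helper name representative_frames is hypothetical, and the real method additionally handles several trajectories, the other schemes, and frequency weights.

```python
import math
from statistics import mean


def representative_frames(distances, n_frames=1):
    """Hypothetical helper, not part of mdciao.

    distances: one list per frame, holding the per-contact distances (Angstrom).
    Returns (frame indices, RMSDd values), sorted from closest to furthest.
    """
    n_ctcs = len(distances[0])
    # Reference geometry: mean distance of each contact over all frames
    reference = [mean(frame[jj] for frame in distances) for jj in range(n_ctcs)]
    # Root-mean-square deviation over distances (not positions), per frame
    rmsd_d = [
        math.sqrt(mean((dd - rr) ** 2 for dd, rr in zip(frame, reference)))
        for frame in distances
    ]
    order = sorted(range(len(distances)), key=lambda ii: rmsd_d[ii])[:n_frames]
    return order, [rmsd_d[ii] for ii in order]


# Three frames, two contacts; the per-contact means are [4.0, 6.0]
distances = [[3.0, 8.0], [4.0, 6.0], [5.0, 4.0]]
frames, rmsd_d = representative_frames(distances, n_frames=2)
print(frames)     # [1, 0]
print(rmsd_d[0])  # 0.0
```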

property res_idxs_pairs: ndarray

Pairs of residue indices of the contacts in this object

Returns:

res_idxs_pairs

Return type:

np.ndarray

property residue_names_long: list

Pairs of long residue names of the ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_long
[['ARG389', 'LEU394'],
 ['LEU394', 'LYS270'],
 ['LEU388', 'LEU394'],
 ['LEU394', 'LEU230'],
 ['ARG385', 'LEU394']]
Returns:

residue_names_long

Return type:

list

property residue_names_short: list

Pairs of short residue names of the ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_short
[['R389', 'L394'],
 ['L394', 'K270'],
 ['L388', 'L394'],
 ['L394', 'L230'],
 ['R385', 'L394']]
Returns:

residue_names_short

Return type:

list

property residx2consensuslabel: dict

Dictionary mapping residue indices to consensus labels:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2consensuslabel
{348: 'G.H5.21',
 353: 'G.H5.26',
 972: '6.32',
 347: 'G.H5.20',
 957: '5.69',
 344: 'G.H5.17'}
Returns:

residx2consensuslabel

Return type:

dict

residx2ctcidx(idx)

Indices of the contacts and the position (0 or 1) in which the residue with residue idx appears

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.res_idxs_pairs
array([[348, 353],
       [353, 972],
       [347, 353],
       [353, 957],
       [344, 353]])
>>> CG.residx2ctcidx(347)
array([[2, 0]])
Parameters:

idx (int) – A residue index

Returns:

ctc_idxs – The first index is the contact index, the second the pair index (0 or 1)

Return type:

2D np.ndarray of shape (N,2)
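
The lookup itself is simple enough to reproduce in plain Python, which may help when reading the returned array. The helper name find_residue_in_pairs is hypothetical; the residue pairs are the ones printed in the example above.

```python
def find_residue_in_pairs(res_idxs_pairs, idx):
    """Hypothetical helper, not part of mdciao.

    Returns [contact index, position in the pair] for every occurrence
    of the residue index idx.
    """
    return [
        [ctc_idx, position]
        for ctc_idx, pair in enumerate(res_idxs_pairs)
        for position, res_idx in enumerate(pair)
        if res_idx == idx
    ]


res_idxs_pairs = [[348, 353], [353, 972], [347, 353], [353, 957], [344, 353]]
print(find_residue_in_pairs(res_idxs_pairs, 347))  # [[2, 0]]
print(find_residue_in_pairs(res_idxs_pairs, 353))
# [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1]]
```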

property residx2fragnamebest: dict

Dictionary mapping residue indices to best possible fragment names

“best” means consensus label > fragment name > fragment index

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2fragnamebest
{348: 'G.H5.21',
 353: 'G.H5.26',
 972: '6.32',
 347: 'G.H5.20',
 957: '5.69',
 344: 'G.H5.17'}
Returns:

residx2fragnamebest

Return type:

dict

residx2resnamefragnamebest(fragsep='@', shorten_AAs=True) dict

Dictionary mapping residue indices to best possible residue+fragment label

“best” means consensus label > fragment name > fragment index

Parameters:
  • fragsep (str, default is “@”) – The str or char to separate residue labels from fragment labels, e.g. “A30@frag1”

  • shorten_AAs (bool, default is True) – Whether to use short residue names

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamefragnamebest()
{344: 'R385@G.H5.17',
 347: 'L388@G.H5.20',
 348: 'R389@G.H5.21',
 353: 'L394@G.H5.26',
 957: 'L230@5.69',
 972: 'K270@6.32'}

Returns:

residx2resnamefragnamebest

Return type:

dict
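
The “consensus label > fragment name > fragment index” precedence can be pictured as a simple fallback chain. The sketch below is an assumption about the logic, not mdciao's code: both helper names are hypothetical, and how a bare fragment index is actually rendered in the label is not specified here.

```python
def best_fragment_label(consensus_label=None, fragment_name=None, fragment_index=None):
    """Hypothetical helper: return the first available descriptor, or None."""
    for candidate in (consensus_label, fragment_name, fragment_index):
        if candidate is not None:
            return str(candidate)
    return None


def residue_label(resname, fragsep="@", **fragment_info):
    """Hypothetical helper: join residue name and best fragment label."""
    fragment = best_fragment_label(**fragment_info)
    return resname if fragment is None else f"{resname}{fragsep}{fragment}"


print(residue_label("L394", consensus_label="G.H5.26", fragment_name="frag1"))
# L394@G.H5.26
print(residue_label("A30", fragment_name="frag1"))  # A30@frag1
print(residue_label("A30"))                         # A30
```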

property residx2resnamelong: dict

Dictionary mapping residue indices to long residue names:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamelong
{348: 'ARG389',
 353: 'LEU394',
 972: 'LYS270',
 347: 'LEU388',
 957: 'LEU230',
 344: 'ARG385'}
Returns:

residx2resnamelong

Return type:

dict

property residx2resnameshort

Dictionary mapping residue indices to short residue names:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnameshort
{348: 'R389',
 353: 'L394',
 972: 'K270',
 347: 'L388',
 957: 'L230',
 344: 'R385'}
Returns:

residx2resnameshort

Return type:

dict

retop(top, mapping, deepcopy=False)

Return a copy of this object with a different topology.

Uses the mapping to generate new residue-indices where necessary, using the rest of the attributes (time-traces, labels, colors, fragments…) as they were.

Wraps thinly around mdciao.contacts.ContactPair.retop

Note

When re-topping interfaces, those residues of the ‘old’ interface_fragments which are not covered by the mapping will be missing in the ‘new’ interface_fragments. However, the new interface is guaranteed to have at least all the ‘new’ interface_residxs mapped. So, as long as the ‘old’ interface_residxs are covered by the mapping, this isn’t a problem (TODO except, perhaps, when plotting flareplots using the spare=”interface” option after re-topping)

Parameters:
  • top (Topology) – The new topology

  • mapping (indexable (array, dict, list)) – A mapping of old residue indices to new residue indices. It usually comes from aligning the old and the new topology using mdciao.utils.sequence.maptops.

  • deepcopy (bool, default is False) – Use copy.deepcopy on the attributes when creating the new ContactPair.

Returns:

CG

Return type:

ContactGroup

save(filename)

Save this ContactGroup as a pickle

Parameters:

filename (str) – filename

save_trajs(prepend_filename, ext, output_dir='.', t_unit='ps', verbose=False, ctc_cutoff_Ang=None, self_descriptor='mdciaoCG')

Save time-traces to disk.

Filenames will be created based on the property self.trajlabels, but using only the basenames and prepending with the string prepend_filename

If there is an anchor residue (i.e. this ContactGroup is a neighborhood), the anchor will be included in the filename, otherwise the string “contact_group” will be used. You can control the output directory using output_dir

If a ctc_cutoff_Ang is given, the time-traces will be binarized (see self.binarize_trajs). Else, the distances themselves are stored.

Parameters:
  • prepend_filename (str) – Each filename will be prepended with this string

  • ext (str) – Extension, can be “xlsx” or anything numpy.savetxt can handle

  • output_dir (str, default is “.”) – The output directory

  • t_unit (str, default is “ps”) – Other units are “ns”, “mus”, and “ms”. The transformation happens internally

  • verbose (boolean, default is False) – Prints filenames

  • ctc_cutoff_Ang (float, default is None) – Use this cutoff and save bintrajs instead

  • self_descriptor (str, default is “mdciaoCG”) – Saved filenames will be tagged with this descriptor

Return type:

None
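
What binarizing a time-trace means can be shown in a few lines. This is a sketch with made-up distances, not the code behind self.binarize_trajs; the helper name binarize is hypothetical, and the “<=” comparison follows the convention stated for ctc_cutoff_Ang elsewhere in this class.

```python
def binarize(time_trace_Ang, ctc_cutoff_Ang):
    """Hypothetical helper: 1 where the contact is formed, 0 otherwise."""
    return [int(distance <= ctc_cutoff_Ang) for distance in time_trace_Ang]


distances_Ang = [3.2, 4.0, 5.1, 3.9]
bintraj = binarize(distances_Ang, ctc_cutoff_Ang=4.0)
print(bintraj)                      # [1, 1, 0, 1]
print(sum(bintraj) / len(bintraj))  # 0.75, the contact frequency
```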

select_by_frames(frames) → ContactPair

Return a copy of this ContactGroup, but with a sub-selection of trajectories and frames. The returned ContactGroup has the same ContactPairs as the original.

Parameters:

frames (int, dict, or iterable of pairs) – Control which frames of the trajectory data get used in the returned ContactGroup. Several modes of input are possible.

  • integer n: select the first n frames of each trajectory. If n is negative, then select the last |n| frames of each trajectory. If a trajectory has fewer than |n| frames, all frames are selected.

  • dict: keyed with trajectory indices, valued with a list of trajectory frames. E.g. if frames = {2 : [101,100], 0: [10, 20]}, then the new ContactGroup has two trajectories which consist of old trajectories 2 and 0, with the frames 101,100 and 10,20, respectively. The output order corresponds to the input order both in terms of keys and values of the input dictionary.

  • list of pairs of integers: individual frames of individual trajectories merged into a single ContactGroup, e.g.

    >>> frames = [[i,j],
    >>>           [k,l],
    >>>           [m,n]]
    
    means the new ContactGroup has three frames
    • frame j of trajectory i

    • frame l of trajectory k

    • frame n of trajectory m

Returns:

newCG – A new ContactGroup, equivalent to the original one but with only those trajectories and frames selected by frames

Return type:

ContactGroup

Note

Any trajectory filenames used to instantiate the original ContactGroup, which are stored in ContactGroup.trajlabels, are NOT passed onto the newCG returned by this method. This is because frame-indices of the time-traces contained in the newCG most likely do not correspond to the frame-indices of those original filenames. However, the methods of newCG are not aware of this and things like ContactGroup.repframes will return the wrong frames. Hence, the newCG always gets mdtraj.Trajectory objects as traj input and accordingly has [“mdtraj.00”, “mdtraj.01”…] as trajlabels. The same principle applies to the order of trajectories, i.e. if you reorder trajectories by passing a dict to frames, the newCG is not aware of the fact that these trajectories had a previous order. newCG has them stored (and readily available) as Trajectory objects and calls them [“mdtraj.00”, “mdtraj.01”…].
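
The three input modes of frames can be compared by expanding each of them into explicit (trajectory index, frame index) tuples. The helper below is a hypothetical illustration of the selection rules described above, not the method's implementation, and the trajectory lengths are made up.

```python
def frames_to_pairs(frames, n_frames_per_traj):
    """Hypothetical helper, not part of mdciao.

    Expand a `frames` selection into a flat list of
    (trajectory index, frame index) tuples.
    """
    if isinstance(frames, int):
        pairs = []
        for traj_idx, n_frames in enumerate(n_frames_per_traj):
            all_frames = list(range(n_frames))
            # Positive n: first n frames; negative n: last |n| frames.
            # Slicing returns all frames if the trajectory is shorter.
            chosen = all_frames[:frames] if frames >= 0 else all_frames[frames:]
            pairs.extend((traj_idx, ff) for ff in chosen)
        return pairs
    if isinstance(frames, dict):
        # Keys are trajectory indices; the input order of keys and values is kept
        return [(traj_idx, ff) for traj_idx, ffs in frames.items() for ff in ffs]
    # Otherwise: an iterable of (trajectory index, frame index) pairs
    return [(traj_idx, ff) for traj_idx, ff in frames]


n_frames_per_traj = [5, 3]  # two made-up trajectories
print(frames_to_pairs(2, n_frames_per_traj))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
print(frames_to_pairs(-2, n_frames_per_traj))
# [(0, 3), (0, 4), (1, 1), (1, 2)]
print(frames_to_pairs({1: [2, 0], 0: [4]}, n_frames_per_traj))
# [(1, 2), (1, 0), (0, 4)]
print(frames_to_pairs([[0, 1], [1, 2]], n_frames_per_traj))
# [(0, 1), (1, 2)]
```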

select_by_residues(CSVexpression=None, residue_indices=None, residue_pairs=None, allow_multiple_matches=False, merge=True, keep_interface=True, n_residues=1)

Return a copy of this ContactGroup, but with a sub-selection of ContactGroup.contact_pairs based on residues. The returned ContactGroup has the same trajectories and frames as the original.

The filtering of ContactPairs is done using CSVexpression, residue_indices, or residue_pairs so that:

  • one residue match per ContactPair is enough, or

  • both residues of the ContactPair need to match

for the ContactPair to be selected for the new ContactGroup. See n_residues for more info.

CSVexpression, residue_indices, and residue_pairs are mutually exclusive, only one of them can be not None.

Parameters:
  • CSVexpression (str or None, default is None) – CSV expression like “GLU30,K*,3.50” to select the residue-pairs of self for the new ContactGroup. See mdciao.utils.residue_and_atom.find_AA for the syntax of the expression.

  • residue_indices (list, default is None) – Input your selection via zero-indexed residue indices of self.top.

  • residue_pairs (list, default is None) – Input your selection via pairs of zero-indexed residue indices of self.top. Sets n_residues automatically to two.

  • allow_multiple_matches (bool, default is False) – Fail if the substrings of the CSVexpression return more than one residue. Protects from over-grabbing residues. Only has effect if CSVexpression is used, since residue_indices matches are unique

  • merge (bool, default is True) – Merge the selected residue-pairs into one single ContactGroup. If False every sub-string of CSVexpression returns its own ContactGroup

  • keep_interface (bool, default is True) – If self.is_interface and merge are both True, then the returned ContactGroup will also be an interface itself

  • n_residues (int, default is 1) – Number of residue-matches that a ContactPair needs in order to be selected for the new ContactGroup. By default, one residue alone is enough. Using n_residues = 2 selects only ContactPairs where both residues match against CSVexpression, residue_indices, or residue_pairs. This is useful when trying to keep interface properties. Any n_residues value other than 1 or 2 will raise an error.

Returns:

newCG – If dict, it’s keyed with substrings of CSVexpression and valued with ContactGroups

Return type:

ContactGroup or dict
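
The effect of n_residues can be sketched on bare residue-index pairs. The helper name select_pairs is hypothetical and only mimics the matching rule for residue_indices; the pairs are the res_idxs_pairs of the ContactGroupL394 example used throughout this page.

```python
def select_pairs(res_idxs_pairs, residue_indices, n_residues=1):
    """Hypothetical helper, not part of mdciao.

    Keep the pairs in which at least n_residues residues (1 or 2)
    belong to residue_indices.
    """
    if n_residues not in (1, 2):
        raise ValueError("n_residues must be 1 or 2")
    wanted = set(residue_indices)
    return [
        pair for pair in res_idxs_pairs
        if sum(res_idx in wanted for res_idx in pair) >= n_residues
    ]


res_idxs_pairs = [[348, 353], [353, 972], [347, 353], [353, 957], [344, 353]]
print(select_pairs(res_idxs_pairs, [972, 957]))                # [[353, 972], [353, 957]]
print(select_pairs(res_idxs_pairs, [348, 353], n_residues=2))  # [[348, 353]]
```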

property shared_anchor_residue_index: int

The index of the anchor residue, i.e. the residue at the center of this neighborhood

Only populated if self.is_neighborhood is True, else returns None

Returns:

idx

Return type:

int

property stacked_time_traces

All ContactPair time_traces stacked into a 2D np.array

Returns:

data – The array is of shape (self.n_frames_total, self.n_ctcs)

Return type:

np.ndarray
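
The stacking can be pictured with nested lists: the per-trajectory time-traces of each contact are concatenated into one column, and the columns are placed side by side. The helper name stack_time_traces and the distances are made up, and the real property returns a NumPy array rather than lists.

```python
def stack_time_traces(time_traces_per_contact):
    """Hypothetical helper, not part of mdciao.

    time_traces_per_contact[c][t] is the time-trace (list of distances)
    of contact c in trajectory t.
    Returns rows of shape (n_frames_total, n_ctcs).
    """
    # One column per contact: all its trajectories, one after the other
    columns = [
        [distance for traj in contact for distance in traj]
        for contact in time_traces_per_contact
    ]
    # Transpose so that rows are frames and columns are contacts
    return [list(row) for row in zip(*columns)]


# Two contacts, two trajectories with 2 and 1 frames, respectively
traces = [
    [[1.0, 2.0], [3.0]],  # contact 0
    [[4.0, 5.0], [6.0]],  # contact 1
]
print(stack_time_traces(traces))
# [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```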

property time_arrays: list

The time-arrays of each trajectory contained in this ContactGroup

Returns:

time_arrays – The units of these arrays will be whatever was given to the ContactPairs used to instantiate this ContactGroup

Return type:

list

property time_max: float

Maximum time-value of the ContactGroup

Returns:

time_max – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that’s how time arrays are stored in xtc files

Return type:

float

property time_min: float

Minimum time-value of the ContactGroup

Returns:

time_min – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that’s how time arrays are stored in xtc files

Return type:

float

to_ContactGroups_per_traj() → dict

Break this ContactGroup (potentially containing many trajectories) into individual, per-trajectory ContactGroups

Returns:

CGs – The dictionary is keyed with each of the original self.trajlabels, and valued with ContactGroups that only contain information regarding that single trajectory.

Return type:

dict

Note

The attribute mdciao.contacts.ContactGroup.trajlabels of the returned, n-th CG will necessarily only contain one trajectory label. In case the original labels were strings containing pathnames, that name will coincide with the n-th original trajlabel. On the contrary, in case it contained a placeholder name created on-the-fly (e.g. ‘mdtraj.01’) because no pathnames were originally known, but rather mdtraj.Trajectory objects were passed as trajs, that placeholder-name gets re-set to mdtraj.00 since each returned CG only “knows” one traj and it’s necessarily the first one.

property top

The topology used to instantiate the ContactPairs in this ContactGroup

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.top
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>

Returns:

top (Topology or None)

property topology

The topology used to instantiate the ContactPairs in this ContactGroup

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.topology
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>

Returns:

topology (Topology or None)

property trajlabels: list

List of trajectory labels shared by all ContactGroup.contact_pairs.

If Trajectory objects were passed originally to the underlying ContactGroup.contact_pairs, then [“mdtraj.00”, “mdtraj.01”,…] descriptors will be used. If filenames were passed, then the trajlabels are the basenames of the files, without the extension. If no labels and no trajectories were passed, then labels like [“traj 0”, “traj 1”,…] are used.

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.trajlabels
['gs-b2ar.noH.stride.5']
Returns:

trajlabels

Return type:

list
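
How a label relates to a filename can be reproduced with the standard library. The path below is hypothetical (only the basename matches the example label above), and both helper names are made up for illustration; they are not mdciao functions.

```python
import os.path


def label_from_filename(filename):
    """Hypothetical helper: basename of the file, without its extension."""
    return os.path.splitext(os.path.basename(filename))[0]


def placeholder_labels(n_trajs):
    """Hypothetical helper: descriptors used when Trajectory objects are passed."""
    return [f"mdtraj.{ii:02d}" for ii in range(n_trajs)]


print(label_from_filename("/hypothetical/dir/gs-b2ar.noH.stride.5.xtc"))
# gs-b2ar.noH.stride.5
print(placeholder_labels(2))
# ['mdtraj.00', 'mdtraj.01']
```

Only the last extension is removed, so dots inside the basename survive in the label.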