mdciao.contacts.ContactGroup

class mdciao.contacts.ContactGroup(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)

Container for ContactPair-objects

This class is the second level of abstraction after ContactPair and provides methods to

  • perform operations on all the contact-pairs simultaneously and

  • plot/show/save the result of these operations

In many cases, the methods of ContactGroup thinly wrap and iterate around equally named methods of the ContactPair-objects.

Note

Higher-level methods in the API, like those exposed by mdciao.cli, return ContactPair or ContactGroup objects already instantiated and ready to use. It is recommended to use those instead of instantiating ContactPair or ContactGroup individually.

__init__(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)
Parameters
  • list_of_contact_objects (list) – list of ContactPair objects

  • interface_fragments (list of two iterables of indexes, default is None) –

    An interface is defined by two groups of residue indices.

    This input doesn’t need to have all or any of the residue indices in res_idxs_pairs.

    This input will be used to group the object’s own residue idxs present in residxs_pairs into the two groups of the interface. These two groups will be accessible through the attribute self.interface_residxs

    The input itself will remain accessible through the object’s equally named attribute self.interface_fragments

  • top (Topology, default is None) – The molecular topology associated with this object. Normally, the default behaviour is enough. It checks whether all ContactPairs of list_of_contact_objects share the same self.top and uses that one. If they have different topologies, the method fails, since you can’t instantiate a ContactGroup with ContactPairs from different topologies. In case the ContactPairs don’t have any topology at all (self.top is None for all ContactPairs) you can pass one here. Or, if they have one and you pass one here, it will be checked that the top provided here coincides with the ContactPairs’ shared topology

  • name (string, default is None) – Optional name you want to give this object. ATM it is only used for the title of ContactGroup.plot_distance_distributions when the object is not a neighborhood

  • neighbors_excluded (int, default is None) – The neighbors excluded when creating the underlying ContactPairs passed in list_of_contact_objects

  • max_cutoff_Ang (float, default is None) – Operations involving cutoffs higher than this will be forbidden and will raise ValueError. Prevents the user from asking for contact-frequencies that aren’t present in the ContactGroup
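The grouping that interface_fragments triggers can be pictured with a small, self-contained sketch. It is not mdciao's implementation: the helper name group_by_interface and the sorted output order are assumptions made for illustration only.

```python
def group_by_interface(res_idxs_pairs, interface_fragments):
    """Split the residues present in res_idxs_pairs into two interface groups.

    Hypothetical helper: interface_fragments may hold residues that never
    appear in any pair; those are simply ignored.
    """
    present = {idx for pair in res_idxs_pairs for idx in pair}
    # Sorting is an assumption of this sketch, not documented mdciao behaviour
    return [sorted(present.intersection(fragment)) for fragment in interface_fragments]


pairs = [[10, 200], [11, 200], [12, 205]]
fragments = [range(0, 100), range(150, 300)]
interface_residxs = group_by_interface(pairs, fragments)
print(interface_residxs)
```

The two fragments act as supersets: only residues that take part in at least one contact pair end up in the resulting groups.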

Methods

__init__(list_of_contact_objects[, …])

Parameters: list_of_contact_objects, a list of ContactPair objects

archive([filename])

Save this ContactGroup’s list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup

binarize_trajs(ctc_cutoff_Ang[, …])

Binarize trajs

copy()

copy this object by re-instantiating another ContactGroup object with the same attributes.

distribution_dicts([bins])

Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label (see kwargs and CP.label_flex)

distributions_of_distances([bins])

Histograms the distance values of each contact, returning a list with as many distributions as there are contacts.

frequency_as_contact_matrix(ctc_cutoff_Ang)

Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in residxs_pairs, and those pairs only, the rest will be NaNs

frequency_as_contact_matrix_CG(ctc_cutoff_Ang)

Coarse-grained contact-matrix

frequency_dataframe(ctc_cutoff_Ang[, …])

Output a formatted dataframe with fields “label”, “freq” and “sum”, optionally dis-aggregated by type of contact in “by_atomtypes”

frequency_delta(otherCG, ctc_cutoff_Ang)

Compute per-contact frequency differences between self and some other ContactGroup

frequency_dict_by_consensus_labels(…[, …])

Return frequencies as a dictionary of dictionaries keyed by consensus labels

frequency_dicts(ctc_cutoff_Ang[, sort_by_freq])

Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPair s and returns one frequency dict keyed by contact label

frequency_per_contact(ctc_cutoff_Ang[, …])

Frequency per contact over all trajs

frequency_spreadsheet(sheet1_dataframe, …)

Write an Excel file with the Dataframe that is returned by self.frequency_dataframe.

frequency_str_ASCII_file(idf[, ascii_file])

Create a string with the frequencies from a DataFrame

frequency_sum_per_residue_idx_dict(…[, …])

Dictionary of aggregated frequency_per_contact per residue indices. Values over 1 are possible, for example if [0,1], [0,2] are always formed (=1) then freqs_dict[0]=2

frequency_sum_per_residue_names(ctc_cutoff_Ang)

Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible, see self.residx2resnamefragnamebest for more info on this

frequency_table(ctc_cutoff_Ang, fname[, …])

Print and/or save frequencies as a formatted table

frequency_to_bfactor(ctc_cutoff_Ang, …[, …])

Save the contact frequency aggregated by residue to a pdb file

gen_ctc_labels(**kwargs)

Generate labels with different parameters

interface_frequency_matrix(ctc_cutoff_Ang[, …])

Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.

n_ctcs_timetraces(ctc_cutoff_Ang[, …])

Time-traces of the number of contacts, obtained by summing over all contacts for each frame

plot_distance_distributions([bins, xlim, …])

Plot distance distributions for the distance trajectories of the contacts

plot_freqs_as_bars(ctc_cutoff_Ang[, …])

Plot contact frequencies as a bar plot

plot_freqs_as_flareplot(ctc_cutoff_Ang[, …])

Produce contact flareplots by wrapping around mdciao.flare.freqs2flare

plot_frequency_sums_as_bars(ctc_cutoff_Ang, …)

Bar plot with per-residue sums of frequencies (called Sigma in mdciao)

plot_interface_frequency_matrix(ctc_cutoff_Ang)

Plot the interface_frequency_matrix

plot_neighborhood_freqs(ctc_cutoff_Ang[, …])

Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods

plot_timedep_ctcs(panelheight[, …])

For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts

plot_violins([sort_by, ctc_cutoff_Ang, …])

Plot residue-residue distances as violin plots (violinplot)

relabel_consensus([new_labels])

Relabel any residue missing its consensus label to shortAA

relative_frequency_formed_atom_pairs_overall_trajs(…)

Relative frequencies interaction-type (by atom-type) for all contact-pairs in the ContactGroup

repframes([scheme, ctc_cutoff_Ang, …])

Find representative frames for this ContactGroup

residx2ctcidx(idx)

Indices of the contacts and the position (0 or 1) in which the residue with residue idx appears

residx2resnamefragnamebest([fragsep, …])

Dictionary mapping residue indices to best possible residue+fragment label

retop(top, mapping[, deepcopy])

Return a copy of this object with a different topology.

save(filename)

Save this ContactGroup as a pickle

save_trajs(prepend_filename, ext[, …])

Save time-traces to disk.

to_new_ContactGroup(CSVexpression[, …])

Creates a new ContactGroup from this one using a CSV expression to filter for residues

Attributes

anchor_fragment_color

The color associated with the fragment of the anchor residue

anchor_res_and_fragment_str

Label of the anchor residue of this neighborhood, including fragment

anchor_res_and_fragment_str_short

Label of the anchor residue (short) of this neighborhood, including fragment

consensus_labels

List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.

consensuslabel2resname

Dictionary mapping consensus labels to residue names:

ctc_labels

List of simple labels (no fragment info) for the residue pairs in ContactPairs

ctc_labels_short

List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs

ctc_labels_w_fragments_short_AA

List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs

fragment_names_best

Best possible fragment names for the residue pairs in ContactPairs

interface_fragments

Two residue lists provided at initialization

interface_labels_consensus

Consensus labels of whatever residues interface_residxs holds.

interface_residue_names_w_best_fragments_short

Best possible residue@fragment string for the residues in interface_residxs

interface_residxs

The residues of self.res_idxs_pairs grouped into two lists, depending on what self.interface_fragments they belong to

interface_reslabels_short

Residue labels of whatever residues interface_residxs holds

is_interface

Whether this ContactGroup can be interpreted as an interface.

is_neighborhood

Whether this ContactGroup is a neighborhood or not

max_cutoff_Ang

Operations involving cutoffs higher than this will be forbidden and will raise ValueError.

means

The mean value over all distance time-traces

modes

The modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces

n_ctcs

The number of contact pairs (mdciao.contacts.ContactPair -objects) stored in this object

n_frames

List of per-trajectory n_frames

n_frames_total

Total number of frames

n_trajs

The number of trajectories contained in this ContactGroup

name

The name of this ContactGroup, given when creating it

neighbors_excluded

The number of neighbors that were excluded when creating this ContactGroup

partner_fragment_colors

The colors associated with the fragments of the anchor partner residues

partner_res_and_fragment_labels

List of labels of the partner (not anchor) residues of this neighborhood, including fragment

partner_res_and_fragment_labels_short

List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment

res_idxs_pairs

Pairs of residue indices of the contacts in this object

residue_names_long

Pairs of long residue names of the ContactPairs

residue_names_short

Pairs of short residue names of the ContactPairs

residx2consensuslabel

Dictionary mapping residue indices to consensus labels:

residx2fragnamebest

Dictionary mapping residue indices to best possible fragment names

residx2resnamelong

Dictionary mapping residue indices to long residue names:

residx2resnameshort

Dictionary mapping residue indices to short residue names:

shared_anchor_residue_index

The index of the anchor residue, i.e.

stacked_time_traces

All ContactPair time_traces stacked into a 2D np.array

time_arrays

The time-arrays of each trajectory contained in this ContactGroup

time_max

Maximum time-value of the ContactGroup

time_min

Minimum time-value of the ContactGroup

top

The topology used to instantiate the ContactPairs in this ContactGroup

topology

The topology used to instantiate the ContactPairs in this ContactGroup

trajlabels

List of trajectory labels

property anchor_fragment_color

The color associated with the fragment of the anchor residue

Two fragment colors were given to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_fragment_color

'tab:blue'
Returns

color

Return type

str

property anchor_res_and_fragment_str

Label of the anchor residue of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str
'LEU394@G.H5.26'
Returns

label

Return type

str

property anchor_res_and_fragment_str_short

Label of the anchor residue (short) of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str_short
'L394@G.H5.26'
Returns

label

Return type

str

archive(filename=None, **kwargs)

Save this ContactGroup’s list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup

The method ContactGroup.save creates a pickle that has a lot of redundant information

Parameters

filename (str, default is None) – Has to end in “npy”. Default is to return the dictionary

Other Parameters

kwargs (dict) – Optional parameters for mdciao.contacts.ContactPair._serialized_as_dict

Returns

archive

Return type

dict

binarize_trajs(ctc_cutoff_Ang, switch_off_Ang=None, order='contact')

Binarize trajs

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) –

    Implements a linear switch-off from ctc_cutoff_Ang to ctc_cutoff_Ang + switch_off_Ang. E.g. if the cutoff is 3 Ang and the switch is 1 Ang, then

    • 3.0 -> 1.0

    • 3.5 -> .5

    • 4.0 -> 0.0

  • order (str, default is "contact") – Sort first by contact, then by traj index. Alternative is “traj”, i.e. sort first by traj index, then by contact

  • TODO (change the name "binarize") –

Returns

bintrajs – if order==”traj”, each item of the list is a 2D np.ndarray of shape (Nt, n_ctcs), where Nt is the number of frames of that trajectory

Return type

list of boolean arrays
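The linear switch-off described for switch_off_Ang can be checked with a few lines of plain Python. This is only a sketch of the rule stated above, not the library's code; the function name linear_switch is hypothetical.

```python
def linear_switch(distance_Ang, ctc_cutoff_Ang, switch_off_Ang=None):
    """Return 1.0 inside the cutoff, 0.0 beyond cutoff + switch, linear in between."""
    if distance_Ang <= ctc_cutoff_Ang:
        return 1.0
    # Without a switch (or beyond its end) the contact counts as broken
    if switch_off_Ang is None or distance_Ang >= ctc_cutoff_Ang + switch_off_Ang:
        return 0.0
    return 1.0 - (distance_Ang - ctc_cutoff_Ang) / switch_off_Ang


values = [linear_switch(d, ctc_cutoff_Ang=3.0, switch_off_Ang=1.0) for d in (3.0, 3.5, 4.0)]
print(values)
```

With a cutoff of 3 Ang and a switch of 1 Ang this reproduces the three values listed for switch_off_Ang.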

property consensus_labels

List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.

They were parsed at initialization

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensus_labels
[['G.H5.21', 'G.H5.26'],
 ['G.H5.26', '6.32'],
 ['G.H5.20', 'G.H5.26'],
 ['G.H5.26', '5.69'],
 ['G.H5.17', 'G.H5.26']]
Returns

consensus_labels

Return type

list

property consensuslabel2resname

Dictionary mapping consensus labels to residue names:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensuslabel2resname
{'G.H5.21': 'R389',
 'G.H5.26': 'L394',
 '6.32': 'K270',
 'G.H5.20': 'L388',
 '5.69': 'L230',
 'G.H5.17': 'R385'}
Returns

consensuslabel2resname

Return type

dict

copy()

copy this object by re-instantiating another ContactGroup object with the same attributes.

In theory self == self.copy() should hold, but not self is self.copy()

Returns

CG

Return type

ContactGroup

property ctc_labels

List of simple labels (no fragment info) for the residue pairs in ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels
['ARG389-LEU394',
 'LEU394-LYS270',
 'LEU388-LEU394',
 'LEU394-LEU230',
 'ARG385-LEU394']

Returns

ctc_labels

Return type

list

property ctc_labels_short

List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_short
['R389-L394',
 'L394-K270',
 'L388-L394',
 'L394-L230',
 'R385-L394']

Returns

ctc_labels_short

Return type

list

property ctc_labels_w_fragments_short_AA

List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_w_fragments_short_AA
['R389@G.H5.21-L394@G.H5.26',
 'L394@G.H5.26-K270@6.32',
 'L388@G.H5.20-L394@G.H5.26',
 'L394@G.H5.26-L230@5.69',
 'R385@G.H5.17-L394@G.H5.26']

Returns

ctc_labels_w_fragments_short_AA

Return type

list

distribution_dicts(bins=10, **kwargs)

Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label (see kwargs and CP.label_flex)

Parameters
  • bins (int or sequence of scalars or str, optional, default is 10) – If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.

  • kwargs (optional keyword arguments) – Check ContactPair.frequency_dict

Returns

fdict

Return type

dictionary

distributions_of_distances(bins=10)

Histograms the distance values of each contact, returning a list with as many distributions as there are contacts.

Parameters

bins (int or sequence of scalars or str, optional, default is 10) – If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.

Returns

list_of_distros – List of len self.n_ctcs, each entry contains the counts and edges of the bins

Return type

list
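The description of bins matches the one used by numpy.histogram, so a single entry of list_of_distros can be pictured as below. That mdciao delegates to numpy.histogram is an assumption of this sketch, and the distance values are made up.

```python
import numpy as np

# Made-up distance values for a single contact
distances = np.array([0.30, 0.32, 0.35, 0.41, 0.45, 0.50])

# bins=10 gives ten equal-width bins between the min and max value
counts, edges = np.histogram(distances, bins=10)
print(counts, edges)
```

Note that there is always one more edge than there are bins, because the rightmost edge is included.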

property fragment_names_best

Best possible fragment names for the residue pairs in ContactPairs

The fragment name will try to pick the consensus nomenclature. If no consensus label for the residue exists, the actual fragment names are used as fallback (which themselves fall back to the fragment index)

Only if no consensus label, no fragment name and no fragment indices are there, will this yield “None” as a string.

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.fragment_names_best
[['G.H5.21', 'G.H5.26'],
 ['G.H5.26', '6.32'],
 ['G.H5.20', 'G.H5.26'],
 ['G.H5.26', '5.69'],
 ['G.H5.17', 'G.H5.26']]

Returns

fragment_names_best

Return type

list

frequency_as_contact_matrix(ctc_cutoff_Ang, switch_off_Ang=None)

Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in residxs_pairs, and those pairs only, the rest will be NaNs

If top is None the method will fail.

Note

This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns

mat

Return type

numpy.ndarray
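As a rough picture of the returned array, the sketch below fills a NaN matrix symmetrically from a list of pairs and their frequencies. The helper name and the input values are hypothetical, and the real method computes the frequencies itself from ctc_cutoff_Ang.

```python
import numpy as np


def pairs_to_symmetric_matrix(res_idxs_pairs, freqs, n_residues):
    """Place each pair's frequency at [i, j] and [j, i]; everything else stays NaN."""
    mat = np.full((n_residues, n_residues), np.nan)
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        mat[i, j] = freq
        mat[j, i] = freq
    return mat


contact_mat = pairs_to_symmetric_matrix([[0, 2], [1, 3]], [0.75, 0.10], n_residues=4)
print(contact_mat)
```

The NaNs make it possible to tell pairs that were never part of the ContactGroup apart from pairs that are present but have zero frequency.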

frequency_as_contact_matrix_CG(ctc_cutoff_Ang, switch_off_Ang=None, fragments=None, fragment_names=None, consensus_labelers=None, verbose=False, sparse=False, interface=False, zero_freq=0.01, dec_round=3, return_fragments=False)

Coarse-grained contact-matrix

Frequencies of self.frequency_per_contact get coarse-grained into fragments. Fragment definitions come from fragments and/or from the consensus_labelers. These definitions need to contain all residues in self.res_idxs_pairs

User-defined and consensus-derived fragment definitions get spliced together using splice_orphan_fragments. This might lead to sub-sets of the input fragments getting re-labeled as “subfrags” and residues not defined anywhere being labelled “orphans”. This leads to cumbersome fragment names (and can change in the future), but at least it’s “traceable” for the moment

If you want to have the fragment definitions, use return_fragments = True

Anytime some argument leads to a row/column being deleted from the output, the matrix is returned as an annotated DataFrame, to be able to provide row/columns with names and keep track of their meaning

If interface is True and this ContactGroup is indeed an interface, the matrix will be asymmetric.

If self.top is None the method will fail.

Note

This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • fragments (dict) – The fragment definitions

  • fragment_names (iterable of strings, default is None) – The names of the fragments

  • consensus_labelers (list, default is None) – It has to contain LabelerConsensus-objects, where the fragments are obtained from.

  • verbose (bool, default is False) – Be verbose

  • sparse (bool, default is False) – Delete rows and columns where all elements are < zero_freq. Since the row/column indices lose their meaning this way, a DataFrame with named row/columns is returned instead of an array. If no fragment_names are passed, some will be created.

  • interface (bool, default is False) – If True, an asymmetric matrix is reported, with rows and columns representing fragments on each side of the interface, respectively. Since this is done using self.interface_residxs, and not all input fragments are necessarily contained therein, interface=True introduces a sparsity, which makes the return type be a DataFrame (see above)

  • zero_freq (float, default is 0.01) – Only has effect when sparse is True. The cutoff for a frequency to be considered zero

  • dec_round (int, default is 3) – The number of decimals to round to when reporting results. It’s assumed the CG matrix doesn’t need much precision beyond this

  • return_fragments (bool, default is False) – Whether to return the fragments that the input produced.

Returns

  • mat (numpy.ndarray or DataFrame) – The coarse-grained contact matrix

  • fragments (dict) – The fragment definitions

frequency_dataframe(ctc_cutoff_Ang, switch_off_Ang=None, atom_types=False, sort_by_freq=False, **ctc_fd_kwargs)

Output a formatted dataframe with fields “label”, “freq” and “sum”, optionally dis-aggregated by type of contact in “by_atomtypes”

Note

The contacts in the table are sorted by their order in the instantiation

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact

  • sort_by_freq (bool, default is False) – Sort by descending frequency value, default is to keep the order of self._contacts

  • ctc_fd_kwargs (named optional arguments) – Check ContactPair.frequency_dict for more info on e.g. AA_format=’short’ and/or split_label

Returns

df

Return type

pandas.DataFrame

frequency_delta(otherCG, ctc_cutoff_Ang)

Compute per-contact frequency differences between self and some other ContactGroup

The difference is defined as

\(\Delta_{AB} = freq_B - freq_A\),

i.e. the delta that occurs upon “reacting” from self to otherCG

No sanity checks are performed, residue indices are assumed to have the same meaning in both self and otherCG

Parameters
  • otherCG (ContactGroup) – The ContactGroup to compute the difference with

  • ctc_cutoff_Ang (float) – The cutoff to use to compute the frequencies

Returns

  • delta_freq (1D np.ndarray) – The value resulting from doing otherCG.frequency_per_contact(ctc_cutoff_Ang)-self.frequency_per_contact(ctc_cutoff_Ang)

  • res_idxs_pairs (2D np.ndarray of len(delta_freq)) – The res_idxs_pairs for the delta_freq values
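The sign convention of the delta can be illustrated with plain dictionaries keyed by residue pairs. This is a sketch, not the method's code: treating pairs that are missing from one group as having frequency 0 is an assumption, as are the helper name and the numbers.

```python
def frequency_delta_sketch(freqs_A, freqs_B):
    """Return freq_B - freq_A for every pair present in either dictionary."""
    pairs = sorted(set(freqs_A) | set(freqs_B))
    # Assumption of this sketch: a pair absent from one group counts as 0.0
    delta = [freqs_B.get(pair, 0.0) - freqs_A.get(pair, 0.0) for pair in pairs]
    return delta, pairs


freqs_self = {(0, 10): 0.75, (0, 11): 0.25}
freqs_other = {(0, 10): 0.25, (0, 12): 0.5}
delta_freq, delta_pairs = frequency_delta_sketch(freqs_self, freqs_other)
print(delta_freq, delta_pairs)
```

Negative values mark contacts that are weaker in otherCG than in self, positive values mark contacts that are stronger.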

frequency_dict_by_consensus_labels(ctc_cutoff_Ang, switch_off_Ang=None, return_as_triplets=False, sort_by_interface=False, include_trilower=False)

Return frequencies as a dictionary of dictionaries keyed by consensus labels

Note

Will fail if not all residues have consensus labels. TODO: this is very similar to frequency_sum_per_residue_names, look at the use case closely and try to unify both methods

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • return_as_triplets (bool, default is False) – Return the dictionary as a list of triplets, s.t. freq_dict[3.50][4.50]=.25 is returned as [[3.50,4.50,.25]]. Makes it easier to iterate through in other methods

  • sort_by_interface (bool, default is False) – Not implemented ATM, will raise NotImplementedError

  • include_trilower (bool, default is False) – Include the transposed indexes in the returned dictionary. s.t. the contact pair [3.50][4.50]=.25 also generates [4.50][3.50]=.25

Returns

freqs

Return type

dictionary of dictionaries or list of triplets (if return_as_triplets is True)
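The two return shapes and the effect of include_trilower can be sketched with nested dictionaries. The helpers below are hypothetical and take ready-made (label, label, frequency) tuples, whereas the real method derives them from the ContactGroup.

```python
def nested_freqs(labelled_freqs, include_trilower=False):
    """Build a dict of dicts keyed by consensus labels."""
    freqs = {}
    for label_1, label_2, freq in labelled_freqs:
        freqs.setdefault(label_1, {})[label_2] = freq
        if include_trilower:
            # Also store the transposed entry
            freqs.setdefault(label_2, {})[label_1] = freq
    return freqs


def as_triplets(freqs):
    """Flatten the dict of dicts into [label_1, label_2, freq] triplets."""
    return [[key_1, key_2, freq] for key_1, inner in freqs.items() for key_2, freq in inner.items()]


freqs_both = nested_freqs([("3.50", "4.50", 0.25)], include_trilower=True)
triplets = as_triplets(nested_freqs([("3.50", "4.50", 0.25)]))
print(freqs_both, triplets)
```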

frequency_dicts(ctc_cutoff_Ang, sort_by_freq=False, **kwargs)

Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPair s and returns one frequency dict keyed by contact label

Parameters
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • sort_by_freq (bool, default is False) – Sort by descending frequency. Default is to return in the same order as ContactGroup._contacts

  • kwargs (optional keyword arguments) – Check ContactPair.frequency_dict

Returns

fdict

Return type

dictionary

frequency_per_contact(ctc_cutoff_Ang, switch_off_Ang=None)

Frequency per contact over all trajs

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns

freqs

Return type

1D np.ndarray of len(n_ctcs)
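For a single contact, a frequency "over all trajs" can be pictured as the fraction of frames, pooled across trajectories, in which the distance is within the cutoff. The pooling of frames is an assumption of this sketch, and the "<=" comparison follows what is documented for frequency_dicts; the function name and distances are made up.

```python
def frequency_over_all_trajs(distance_trajs_Ang, ctc_cutoff_Ang):
    """Fraction of all frames (pooled over trajectories) with distance <= cutoff."""
    n_formed = sum(d <= ctc_cutoff_Ang for traj in distance_trajs_Ang for d in traj)
    n_frames_total = sum(len(traj) for traj in distance_trajs_Ang)
    return n_formed / n_frames_total


# Two trajectories of different length for the same contact
trajs = [[3.1, 3.6, 4.2], [3.4, 5.0]]
freq = frequency_over_all_trajs(trajs, ctc_cutoff_Ang=3.5)
print(freq)
```

Pooling frames means longer trajectories weigh more than shorter ones, which differs from averaging per-trajectory frequencies.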

frequency_spreadsheet(sheet1_dataframe, sheet2_dataframes, ctc_cutoff_Ang, fname_excel, sheet1_name='pairs by frequency', sheet2_name='residues by frequency')

Write an Excel file with the Dataframe that is returned by self.frequency_dataframe.

Parameters
  • sheet1_dataframe (DataFrame) – Normally, these are pairwise frequencies

  • sheet2_dataframes (list) – Contains DataFrame objects with per-residue frequencies

  • ctc_cutoff_Ang (float) – The cutoff used

  • fname_excel (str) – The filename to save to

  • sheet1_name (str, default is "pairs by frequency",) –

  • sheet2_name (str, default is 'residues by frequency') –

frequency_str_ASCII_file(idf, ascii_file=None)

Create a string with the frequencies from a DataFrame

Parameters
  • idf (DataFrame) – A frequency table, typically generated by self.frequency_dataframe

  • ascii_file (str, default is None) – Instead of returning the formatted table as a string, provide a filename here and the frequencies will be written directly to it

Returns

freq_str

Return type

str or None

frequency_sum_per_residue_idx_dict(ctc_cutoff_Ang, switch_off_Ang=None, sort_by_freq=True, return_array=False)

Dictionary of aggregated frequency_per_contact per residue indices. Values over 1 are possible, for example if [0,1], [0,2] are always formed (=1) then freqs_dict[0]=2

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • sort_by_freq (bool, default is True) – Sort the dictionary by descending order of frequency. If False, it will be sorted by residue index. sort_by_freq only has effect if return_array is False

  • return_array (bool, default is False) – If True, the return value is not a dict but an array of len(self.top.n_residues)

Returns

freqs_dict – If dict, keys are the residue indices present in res_idxs_pairs If array, idxs are the residue indices of self.top

Return type

dictionary or array
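The aggregation, including the case where a value exceeds 1, can be reproduced with a short sketch. The function below is hypothetical and takes precomputed frequencies instead of a cutoff.

```python
from collections import defaultdict


def frequency_sum_per_residue(res_idxs_pairs, freqs, sort_by_freq=True):
    """Sum each pair's frequency onto both residues of the pair."""
    sums = defaultdict(float)
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        sums[i] += freq
        sums[j] += freq
    if sort_by_freq:
        ordered = sorted(sums.items(), key=lambda item: item[1], reverse=True)
    else:
        ordered = sorted(sums.items())  # ascending residue index
    return dict(ordered)


# Residue 0 takes part in two contacts that are always formed
freqs_dict = frequency_sum_per_residue([[0, 1], [0, 2]], [1.0, 1.0])
print(freqs_dict)
```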

frequency_sum_per_residue_names(ctc_cutoff_Ang, switch_off_Ang=None, sort_by_freq=True, shorten_AAs=True, list_by_interface=False, return_as_dataframe=False)

Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible, see self.residx2resnamefragnamebest for more info on this

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • sort_by_freq (bool, default is True) – Sort by descending order of frequencies. If list_by_interface is True, then sorting will be descending within each member of the interface, see self.interface_residxs for more info. If False, residues are in ascending order of residue indices

  • shorten_AAs (bool, default is True) – Use E30 instead of GLU30

  • list_by_interface (bool, default is False) – group the freq_dict by interface residues. Only has an effect if self.is_interface

  • return_as_dataframe (bool, default is False) – Return a DataFrame with the column names labels and freqs

Returns

res – list of dictionaries (or dataframes). If list_by_interface is True, then the list has two items, default (False) is to be of len=1

Return type

list

frequency_table(ctc_cutoff_Ang, fname, switch_off_Ang=None, write_interface=True, sort_by_freq=False, **freq_dataframe_kwargs)

Print and/or save frequencies as a formatted table

Internally, it calls frequency_spreadsheet and/or frequency_str_ASCII_file depending on the extension of fname

If you want a DataFrame use frequency_dataframe

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • fname (str or None) – Full path to the desired filename. Spreadsheet extensions are currently only ‘.xlsx’, all other extensions save to formatted ascii. None returns the formatted ascii string.

  • switch_off_Ang (float, default is None) – TODO

  • write_interface (bool, default is True) – Only has effect if self.is_interface is True. A second sheet will be added to the table where residues are sorted by interface membership and per-residue interface participation.

  • sort_by_freq (bool, default is False) – Only has effect if self.is_interface is True and write_interface is True. Sort the second sheet by descending order of frequencies. If False, residues are in ascending order within each member of the interface, as returned by self.interface_residxs

  • freq_dataframe_kwargs (dict) – Optional parameters for self.frequency_dataframe

Returns

table – If fname is None, then return the table as formatted string, using

Return type

None or str

frequency_to_bfactor(ctc_cutoff_Ang, pdbfile, geom, interface_sign=False)

Save the contact frequency aggregated by residue to a pdb file

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • pdbfile (str) – The path to the pdbfile to save the geom

  • geom (mdtraj.Trajectory) – Has to have the same topology as self.top

  • interface_sign (bool, default is False) – Give the bfactor values of the members of the interface different signs, s.t. they appear with different colors in a visualizer

Returns

bfactors

Return type

1D np.array of len(self.top.n_atoms)

gen_ctc_labels(**kwargs) → list

Generate labels with different parameters

Wraps around mdciao.contacts.ContactPair.gen_label

Parameters
  • AA_format (str, default is “short”) – Alternative is “long” (“E30” vs “GLU30”)

  • fragments (bool, default is False) – Include fragment information. Will get the “best” information available, i.e. consensus>fragname>fragindex

  • delete_anchor (bool, default is False) – the anchor

Returns

labels

Return type

list

property interface_fragments

Two residue lists provided at initialization

They are supersets of the residues contained in self.interface_residxs

Empty lists mean no residues were found in the interface defined at initialization

Returns

interface_fragments

Return type

list

interface_frequency_matrix(ctc_cutoff_Ang, switch_off_Ang=None)

Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.

Note

Pairs missing from res_idxs_pairs will be NaNs, to differentiate from those pairs that were present but have zero contact

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns

mat

Return type

2D numpy.ndarray
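The shape of the matrix and the distinction between NaN and zero can be sketched as follows. The helper and its inputs are hypothetical, and the real method computes the frequencies from the stored time-traces.

```python
import numpy as np


def interface_matrix(interface_residxs, res_idxs_pairs, freqs):
    """Rows follow the first interface group, columns the second; missing pairs stay NaN."""
    rows, cols = interface_residxs
    mat = np.full((len(rows), len(cols)), np.nan)
    for (a, b), freq in zip(res_idxs_pairs, freqs):
        if a in rows and b in cols:
            mat[rows.index(a), cols.index(b)] = freq
        elif b in rows and a in cols:
            # The pair was stored in the opposite order
            mat[rows.index(b), cols.index(a)] = freq
    return mat


iface_mat = interface_matrix([[10, 11], [200, 205, 210]], [[10, 200], [205, 11]], [0.5, 0.0])
print(iface_mat)
```

Here the pair (11, 205) is present with zero frequency, while for example (10, 205) was never part of the group and stays NaN.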

property interface_labels_consensus

Consensus labels of whatever residues interface_residxs holds.

If there is no consensus label, the corresponding label is None

property interface_residue_names_w_best_fragments_short

Best possible residue@fragment string for the residues in interface_residxs

In case none of consensus label > fragment name > fragment index is found, nothing is returned after the residue name

property interface_residxs

The residues of self.res_idxs_pairs grouped into two lists, depending on what self.interface_fragments they belong to

Empty lists mean no residues were found in the interface defined at initialization

Returns

interface_residxs

Return type

list

property interface_reslabels_short

Residue labels of whatever residues interface_residxs holds

property is_interface

Whether this ContactGroup can be interpreted as an interface.

Note

If none of the residxs_pairs were found in the interface_residxs (both provided at initialization), this property will evaluate to False even if some indices were parsed

property is_neighborhood

Whether this ContactGroup is a neighborhood or not

When instantiating this ContactGroup, it is checked whether all the ContactPair objects used share an anchor_residue_idx attribute, and whether self.neighbors_excluded is None. If this ContactGroup is a neighborhood, its anchor is the residue stored in the attribute self.shared_anchor_residue_index

Other neighborhood-only attributes get populated, e.g.
  • self.anchor_res_and_fragment_str

  • self.anchor_res_and_fragment_str_short

  • self.partner_res_and_fragment_labels

  • self.partner_res_and_fragment_labels_short

  • self.partner_fragment_colors

  • self.anchor_fragment_color

Note that all these attributes will raise an Exception when called if self.is_neighborhood is False

Returns

is_neighborhood

Return type

bool

property max_cutoff_Ang

Operations involving cutoffs higher than this will be forbidden and will raise ValueError.

property means

The mean value over all distance time-traces

Returns

mean – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here

Return type

1D np.array of len(self.n_ctcs)

property modes

The modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces

Note

In order to quickly compute modes, residue-residue distances are multiplied by 1000 and rounded to integers, to be able to use numpy.bincount for speed. Then, the argmax(bincount) is returned

Returns

modes – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here

Return type

1D np.array of len(self.n_ctcs)
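The bincount trick mentioned in the note can be sketched as follows. The helper `mode_via_bincount` and the sample distances are hypothetical, and dividing the result back by the factor (to recover the original units) is an assumption of this sketch:

```python
import numpy as np


def mode_via_bincount(distances, factor=1000):
    """Mode of a 1D array of non-negative distances, to a precision of 1/factor."""
    as_ints = np.round(np.asarray(distances) * factor).astype(int)
    # bincount counts how often each integer occurs; argmax picks the most frequent
    return np.argmax(np.bincount(as_ints)) / factor


d = np.array([0.3501, 0.3502, 0.3502, 0.4100, 0.3498])  # hypothetical values, in nm
mode = mode_via_bincount(d)
```

Four of the five values round to the integer 350, so the mode is found in that bin. The price of the speed-up is that the result is only resolved to 1/1000 of the input unit.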

property n_ctcs

The number of contact pairs (mdciao.contacts.ContactPair -objects) stored in this object

Returns

n_ctcs

Return type

int

n_ctcs_timetraces(ctc_cutoff_Ang, switch_off_Ang=None)

Time-traces of the number of contacts, obtained by summing over all contacts for each frame

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

Returns

nctc_trajs

Return type

list of 1D np.ndarrays
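As a rough illustration for a single trajectory, the per-frame sum can be written with NumPy as below. The sketch assumes distances stored in nanometers, a "<=" comparison against the cutoff and no switch_off_Ang; `n_ctcs_timetrace` is a made-up helper, not the method itself:

```python
import numpy as np


def n_ctcs_timetrace(distances_nm, ctc_cutoff_Ang):
    """distances_nm: 2D array of shape (n_frames, n_ctcs) for one trajectory."""
    formed = np.asarray(distances_nm) * 10 <= ctc_cutoff_Ang  # nm -> Angstrom
    return formed.sum(axis=1)  # number of formed contacts per frame


# Hypothetical trajectory with 3 frames and 2 contacts
traj = np.array([[0.30, 0.50],
                 [0.32, 0.33],
                 [0.60, 0.70]])
n = n_ctcs_timetrace(traj, 3.5)
```

Applying the same function to each trajectory would give a list of 1D arrays, one per trajectory.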

property n_frames

List of per-trajectory n_frames

Returns

n_frames

Return type

list

property n_frames_total

Total number of frames

Returns

n_frames_total

Return type

int

property n_trajs

The number of trajectories contained in this ContactGroup

Returns

n_trajs

Return type

int

property name

The name of this ContactGroup, given when creating it

Returns

name

Return type

str

property neighbors_excluded

The number of neighbors that were excluded when creating this ContactGroup

Returns

neighbors_excluded

Return type

int

property partner_fragment_colors

The colors associated with the fragments of the partner (not anchor) residues

The fragment colors were given as pairs of values to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there.

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_fragment_colors
['tab:blue', 'tab:blue', 'tab:blue', 'tab:blue', 'tab:blue']

or

>>> CG = mdciao.examples.ContactGroupL394(fragment_colors=["red","blue","yellow","orange","black"])
>>> CG.partner_fragment_colors
['red', 'orange', 'red', 'orange', 'red']

Note

These colors are not automatically used by self.plot_neighborhood_freqs or self.plot_freqs_as_bars unless passed as color=self.partner_fragment_colors

Will fail if self.is_neighborhood is False

Returns

color

Return type

list of str

property partner_res_and_fragment_labels

List of labels of the partner (not anchor) residues of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels
['ARG389@G.H5.21',
 'LYS270@6.32',
 'LEU388@G.H5.20',
 'LEU230@5.69',
 'ARG385@G.H5.17']
Returns

labels

Return type

list

property partner_res_and_fragment_labels_short

List of short labels of the partner (not anchor) residues of this neighborhood, including fragment

Will fail if self.is_neighborhood is False

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels_short
['R389@G.H5.21',
 'K270@6.32',
 'L388@G.H5.20',
 'L230@5.69',
 'R385@G.H5.17']
Returns

labels

Return type

list

plot_distance_distributions(bins=10, xlim=None, jax=None, shorten_AAs=False, ctc_cutoff_Ang=None, legend_sort=True, label_fontsize_factor=1, max_handles_per_row=4, defrag=None)

Plot distance distributions for the distance trajectories of the contacts

The title will try to get the name from self.name

Parameters
  • bins (int, default is 10) – How many bins to use for the distribution

  • xlim (iterable of two floats, default is None) – Limits of the x-axis. Outliers can stretch the scale; this forces it to a given range

  • jax (Axes, default is None) – One will be created if None is passed

  • shorten_AAs (bool, default is False) – Use amino-acid one-letter codes

  • ctc_cutoff_Ang (float, default is None) – Include in the legend of the plot how much of the distribution is below this cutoff. A vertical line will be drawn at this x-value

  • legend_sort (boolean, default is True) – Sort the legend in descending order of frequency. Has only an effect when ctc_cutoff_Ang is not None

  • label_fontsize_factor (int, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor

  • max_handles_per_row (int, default is 4) – legend control

  • defrag (char, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label

Returns

jax

Return type

Axes

plot_freqs_as_bars(ctc_cutoff_Ang, title_label=None, switch_off_Ang=None, xlim=None, ax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, truncate_at=None, atom_types=False, sort_by_freq=False, sum_freqs=True, total_freq=None, defrag=None)

Plot contact frequencies as a bar plot

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail

  • switch_off_Ang (float, default is None) – TODO

  • xlim (float, default is None) – The right limit of the x-axis. +.5 will be added to this number to accommodate some padding around the bars. If None, it’s chosen automatically

  • ax (Axes, default is None) – Draw into this axis. If None is passed, then one will be created

  • shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30” to “E30”

  • color (color-like (str or RGB triple) or list thereof, default is "tab:blue") – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed to be in the order of self.res_idxs_pairs. It will get re-sorted according to sort_by_freq, s.t. residues always have the same color no matter the order

  • label_fontsize_factor (float, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor

  • truncate_at (float, default is None) – Only plot frequencies above this value. Default is to plot all

  • atom_types (bool, default is False) – Use stripe-patterns to inform about the types of interactions (sidechain, backbone, etc)

  • sort_by_freq (boolean, default is False) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in the ContactGroup-object’s _contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. If True, they are re-sorted with this cutoff for display purposes only (the original order is untouched)

  • sum_freqs (bool, default is True) – Inform, in the legend and in the title, about the sum of frequencies/bar-heights being plotted

  • total_freq (float, default is None) – Add a line to the title informing about the fraction of the total_freq that’s being plotted in the figure. Only has an effect if sum_freqs is True

  • defrag (str, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label

Returns

ax

Return type

Axes

plot_freqs_as_flareplot(ctc_cutoff_Ang, fragments=None, fragment_names=None, fragment_colors=None, consensus_maps=None, SS=None, scheme='auto', **kwargs_freqs2flare)

Produce contact flareplots by wrapping around mdciao.flare.freqs2flare

Note

The logic to assign fragments and colors can lead to unexpected behavior in cases where too much guess-work has to be done. If a particular combination of fragments and colors is desired but not achievable through this method, it is highly recommended to use mdciao.flare.freqs2flare directly and experiment there with parameter combinations. It is also a good idea to check out the notebook called “Controlling Flareplots”

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • fragments (list of iterables, default is None) – The way the topology is fragmented. Default is to put all residues in one fragment. This optarg can modify the behaviour of scheme=’all’, since residues absent from fragments will not be plotted, see below.

  • fragment_names (list of strings, default is None) – The fragment names, at least len(fragments)

  • fragment_colors (None or list of color-likes) – Will be used to give the fragments their colors, needs to be color-like and of len(fragments)

  • consensus_maps (list, default is None) –

    The items of this list are either:
    • indexables containing the consensus labels (strings) themselves. They need to be “gettable” by residue index, i.e. dict, list or array. Typically, one generates these maps by using the top2labels method of the LabelerConsensus object

    • LabelerConsensus-objects

      When these objects are passed, their top2labels and top2fragments methods are called on-the-fly, generating not only the consensus labels but also the consensus fragments (i.e. subdomains) to further fragment the topology into sub-domains, like TM6 or G.H5. If fragments are passed, they will be made compatible with the consensus fragments.

    If you want the consensus labels but not the sub-fragmentation, simply use the first option.

  • SS (secondary structure information, default is None) –

    Whether and how to include information about secondary structure. Can be many things:

    • triple of ints (CP_idx, traj_idx, frame_idx)

      Go to contact group CP_idx, trajectory traj_idx and grab this frame to compute the SS. Will read xtcs when necessary or otherwise directly grab it from a mdtraj.Trajectory in case it was passed. Ignores potential stride values. See ContactPair.time_traces for more info

    • True

      same as [0,0,0]

    • None or False

      Do nothing

    • mdtraj.Trajectory

      Use this geometry to compute the SS

    • string

      Path to a filename, of which only the first frame will be read. The SS will be computed from there. The method first tries to read the file without topology information (this works for e.g. .pdb, .gro, .h5); when this fails, self.top is passed along (needed for e.g. .xtc, .dcd)

    • array_like

      Use the SS from here, s.t. ss_inf[idx] gives the SS-info for the residue with that idx

  • scheme (str, default is 'auto') –

    How to decide which residues to plot
    • ’all’

      plot as many residues as possible. E.g., if a self.topology is present, plot all its residues. This can be modified with fragments, see above. Using ‘all’ without any fragments means that the topology won’t be separated into interface fragments, even if it is an interface. Given that some of the topology (which the user insists on plotting) might not have been assigned to either side of the interface, it’s unclear how to proceed here.

    • ’interface’:

      use only the fragments in self.interface_fragments. Will only work if self.is_interface is True

    • ’auto’

      Uses self.is_interface to decide. If True, scheme is set to ‘interface’. If False, e.g. a residue neighborhood or a site, then scheme is set to ‘all’

    • ’interface_sparse’:

      like ‘interface’, but using the input fragments to break self.interface_fragments (which are only two, by definition) further down into other fragments. Of these, show only the ones where at least one residue participates in the interface. If fragments is None, scheme=’interface’ and scheme=’interface_sparse’ are the same thing.

    • ’residues’:

      plot only the residues present in self.res_idxs_pairs

    • ’residues_sparse’ :

      plot only the residues that have a non-zero frequency

    • ’consensus_sparse’:

      like ‘interface_sparse’, but leaving out sub-domains not participating in the interface with any contacts. For this, the consensus_maps need to be actual LabelerConsensus-objects

  • kwargs_freqs2flare (optargs) –

    Keyword arguments for mdciao.flare.freqs2flare. Note that many of these kwargs will be overwritten, mostly to accommodate the scheme+fragment+color combinations, but not only (please see the note above). These are the kwargs that this method manipulates internally and that might be overwritten: top, ss_array, fragments, fragment_names, colors

    Note that some of freqs2flare kwargs (in particular sparse_residues) might alter (with or w/o conflict) the scheme option.

Returns

plot_frequency_sums_as_bars(ctc_cutoff_Ang, title_str, switch_off_Ang=None, xmax=None, jax=None, shorten_AAs=False, label_fontsize_factor=1, truncate_at=0, bar_width_in_inches=0.75, list_by_interface=False, sort_by_freq=True, interface_vline=False)

Bar plot with per-residue sums of frequencies (called Sigma in mdciao)

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • title_str (str) – The title of the plot

  • switch_off_Ang (float, default is None) – TODO

  • xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5

  • jax (Axes, default is None) – If None, one will be created, else draw here

  • shorten_AAs (boolean, default is False) – Unused ATM

  • label_fontsize_factor (float, default is 1) – Some control over fontsizes when plotting a high number of bars

  • truncate_at (float, default is 0) – Do not show sums of freqs lower than this value

  • bar_width_in_inches (float, default is .75) – If no jax is passed, this ensures that the drawn figure always has a size proportional to the number of frequencies being shown. Allows for combining multiple subplots with different numbers of bars in one figure, with all bars equally wide regardless of the subplot

  • list_by_interface (boolean, default is False) – Separate residues by interface

  • sort_by_freq (boolean, default is True) – Sort sums of freqs in descending order

  • interface_vline (bool, default is False) – Plot a vertical line visually separating both interfaces

Returns

ax

Return type

Axes

plot_interface_frequency_matrix(ctc_cutoff_Ang, switch_off_Ang=None, transpose=False, label_type='best', **plot_mat_kwargs)

Plot the interface_frequency_matrix

The first group of interface_residxs are the row indices, shown in the y-axis top-to-bottom (since imshow is used to plot). The second group of interface_residxs are the column indices, shown in the x-axis left-to-right

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • transpose (bool, default is False) – Transpose the contact matrix in the plot

  • label_type (str, default is "best") – “best” tries resname@consensus(>fragname>fragidx). Alternatives are “residue” or “consensus”, but “consensus” alone might lead to empty labels since it is not guaranteed that all residues of the interface have consensus labels

  • plot_mat_kwargs (see plot_mat) – pixelsize, transpose, grid, cmap, colorbar

Returns

  • iax (Axes)

  • fig (matplotlib.pyplot.Figure)

plot_neighborhood_freqs(ctc_cutoff_Ang, switch_off_Ang=None, color='tab:blue', xmax=None, ax=None, shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, plot_atomtypes=False, sort_by_freq=False)

Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods

#TODO perhaps get rid of the wrapper altogether. ATM it would break the API

Parameters
  • ctc_cutoff_Ang (float) – The cutoff to use

  • switch_off_Ang (float, default is None) – TODO

  • color (color-like (str or RGB triple) or list thereof, default is "tab:blue") – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed to be in the order of self.res_idxs_pairs. It will get re-sorted according to sort_by_freq, s.t. residues always have the same color no matter the order

  • xmax (int, default is None) – Default behaviour is to go to n_ctcs, use this parameter to homogenize different calls to this function over different contact groups, s.t. each subplot has equal xlimits

  • ax (Axes, default is None) – Axes to plot into, if None, one will be created

  • shorten_AAs (bool, default is False) – Shorten residue names from “GLU30”->”E30”

  • label_fontsize_factor (float, default is 1) – Fontsize for the tilted labels and the legend, as fraction [0,1] of the default value in rcParams[“font.size”]

  • sum_freqs (bool, default is True) – Add the sum of the represented (and only those) frequencies

  • plot_atomtypes (bool, default is False) – Add stripes to frequency bars to include the atom-types (backbone, sidechain, etc)

  • sort_by_freq (boolean, default is False) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in the ContactGroup-object’s _contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. If True, they are re-sorted with this cutoff for display purposes only (the original order is untouched)

Returns

ax

Return type

Axes

plot_timedep_ctcs(panelheight, plot_N_ctcs=True, pop_N_ctcs=False, skip_timedep=False, **plot_timetrace_kwargs)

For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts

Parameters
  • panelheight (int) – The height of the per-contact panels

  • plot_N_ctcs (bool, default is True) – Add an extra panel at the bottom of the figure containing the number of formed contacts for each frame for each trajectory. A valid cutoff has to be passed along in plot_timetrace_kwargs, otherwise this has no effect

  • pop_N_ctcs (bool, default is False) – Put the panel with the number of contacts in a separate figure. A valid cutoff has to be passed along in plot_timetrace_kwargs, otherwise this has no effect

  • skip_timedep (bool, default is False) – Skip plotting the individual timetraces and plot only the time trace of overall formed contacts. This sets pop_N_ctcs to True internally

  • plot_timetrace_kwargs (dict) – Optional parameters for _plot_timedep_Nctcs

Returns

list_of_figs – The wanted figure(s)

Return type

list

Note

The keywords plot_N_ctcs, pop_N_ctcs, and skip_timedep allow this method to include or totally exclude the total number of contacts and/or the time-traces in the figure. This might change in the future; it was coded this way to avoid breaking the command-line tools API. Also note that some combinations will produce an empty return!

plot_violins(sort_by=False, ctc_cutoff_Ang=None, truncate_at_mean=None, zero_freq=0.01, switch_off_Ang=None, ax=None, title_label=None, xmax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, defrag=None, stride=1)

Plot residue-residue distances as violin plots (see violinplot)

The default behaviour is to plot all residue pairs in the order in which the ContactPair-objects are stored in the ContactGroup. You can check this order in self.res_idxs_pairs. This order typically depends on the original ctc_cutoff_Ang used to instantiate this ContactGroup, which might not carry the same meaning here.

For more than 50 contacts or so, violin plots take some time to compute, because a Gaussian-kernel-density estimation is done for each residue pair.

Also, plots with many residue pairs simply might be difficult to read.

Hence, to control the number of shown contacts, you can use these parameters, sorted somewhat hierarchically:

  • sort_by

  • ctc_cutoff_Ang

  • truncate_at_mean

  • zero_freq

Please check their documentation below.

Finally, if the plots still take too long to compute/show for the desired number of violins, try reducing the amount of data by using stride > 1

Parameters
  • sort_by (iterable of ints, boolean, int, default is False) –

    Can be different things:
    • iterable of ints

      Strongest selection. Show only these residue pairs, in this order. Indices are intended as self.res_idxs_pairs indices. All other parameters are ignored.

    • boolean False

      Don’t sort, i.e. use the original order

    • boolean True

      Sort. There are two options for sorting, depending on the value of ctc_cutoff_Ang (more below)

      • sort by distance means, ascending: ctc_cutoff_Ang is None

      • sort by contact-frequencies, descending: ctc_cutoff_Ang is a float

        Contacts with zero frequency fall back on ascending distance means. This means that frequent contacts will be displayed first (sorted by frequency, high to low), followed by infrequent ones sorted by distance mean (short to long)

    • int n

      Like True but up to n contacts at most. Other parameters like truncate_at_mean can reduce this number automatically

  • ctc_cutoff_Ang (opt, default is None) – If provided, contact-frequencies will be computed and shown in the contact-labels. Additionally, if sort_by is True or an int, then the violins are sorted by contact-frequency in the plot

  • truncate_at_mean (float, default is None) – Don’t show violins with mean values higher than this (in Angstrom). This has no effect on contacts in which the mean is above the cutoff but the frequency is > zero_freq. This case is very common, since a contact can be formed at small distances but broken at very large ones, s.t. the mean (or median) values are meaningless.

  • zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero and not shown. For this parameter to have effect, you need a ctc_cutoff_Ang

  • switch_off_Ang (float, default is None) – TODO

  • ax (None or Axes, default is None) – The axis to plot into. If None, one will be created

  • title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail

  • xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5

  • color (iterable (list or dict), or str, default is "tab:blue") –

    • list, the colors will be reordered so that the same residue pair always gets the same color, regardless of the order in which they appear. This way you can track a violin across different sorting orders

    • str, it has to be a matplotlib color or a case-sensitive matplotlib colormap name, see https://matplotlib.org/stable/tutorials/colors/colormaps.html

    • dict, keys are integers and values are colors. This is the best way to work when sort_by is an iterable of ints, e.g. [ii,jj], because you can pass only those colors here as {ii:"red",jj:"blue"}

    • If None, the ‘tab10’ colormap (tableau) is chosen

  • shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30”->”E30”

  • label_fontsize_factor (float, default is 1) – Labels will use the fontsize rcParams[“font.size”]*label_fontsize_factor

  • sum_freqs (bool, default is True) – Whether to sum per-contact frequencies and place them in the label as \(\Sigma\) values

  • defrag (char, default is None) – Whether to leave out the fragment affiliation, e.g. “GLU30@3.50” w/ defrag=”@” appears as “GLU30” only

  • stride (int, default is 1) – Stride the data down by this much, in case the computation of the violins takes too long

Returns

  • ax (Axes)

  • order (np.ndarray) – Indices of the plotted residue pairs, in the order in which they were plotted. It is the result of combining the above selection parameters
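The two sorting modes described for sort_by=True can be sketched with NumPy. This only illustrates the ordering logic (frequency descending, with a fallback on ascending means for zero-frequency contacts). The helper `violin_order` and the toy numbers are hypothetical, and the sketch does not drop or truncate any contacts:

```python
import numpy as np


def violin_order(means, freqs=None, zero_freq=1e-2):
    """Return the indices of the contacts in plotting order."""
    means = np.asarray(means, dtype=float)
    if freqs is None:
        return np.argsort(means)  # ascending distance means
    freqs = np.asarray(freqs, dtype=float)
    freqs = np.where(freqs < zero_freq, 0.0, freqs)  # treat tiny freqs as zero
    # lexsort uses the LAST key as the primary one:
    # primary = descending frequency, tie-breaker = ascending mean
    return np.lexsort((means, -freqs))


# Hypothetical means (Angstrom) and frequencies for four contacts
means = [4.0, 3.0, 9.0, 7.0]
freqs = [0.8, 0.2, 0.0, 0.001]

order_by_mean = violin_order(means)
order_by_freq = violin_order(means, freqs)
```

The last two contacts both count as zero-frequency, so they are appended after the frequent ones and sorted by their means, short to long.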

relabel_consensus(new_labels=None)

Relabel any residue missing its consensus label to shortAA

Alternative (or additional) labels can be given as a dictionary.

Parameters

new_labels (dict) – keyed with shortAA-codes and valued with the new desired labels

Warning

For expert use only. The changes in consensus labels propagate down to the low-level attribute Residues.consensus_labels of the Residues objects underlying each of the ContactPair objects in this ContactGroup

relative_frequency_formed_atom_pairs_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None, **kwargs) → list

Relative frequencies of interaction types (by atom type) for all contact-pairs in the ContactGroup

“Relative” means that they will sum up to 1 regardless of the contact’s frequency

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.relative_frequency_formed_atom_pairs_overall_trajs(3.5)
[{'SC-BB': 0.33, 'SC-SC': 0.52, 'BB-BB': 0.12},
 {'BB-SC': 0.73, 'SC-SC': 0.27},
 {'BB-BB': 0.84, 'SC-SC': 0.16},
 {'SC-SC': 1.0},
 {'SC-SC': 0.5, 'BB-SC': 0.5}]
Parameters
  • ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”

  • switch_off_Ang (float, default is None) – TODO

Other Parameters
  • keep_resname (bool, default is False) – Keep the atom’s residue name in its descriptor. Only makes sense if aggregate_by_atomtype is False

  • aggregate_by_atomtype (bool, default is True) – Aggregate the frequencies of the contact by the atom types involved. Atom types are backbone, sidechain or other (BB, SC, X)

  • min_freq (float, default is .05) – Do not report relative frequencies below this cutoff, e.g. “BB-BB”:.9, “BB-SC”:0.03, “SC-SC”:0.03, “SC-BB”:0.03 gets reported as “BB-BB”:.9

Returns

refreq_dicts – List of dictionaries with the relative freqs, keyed by the atom types (or atoms) involved in the contact. The order is the same as in self.ctc_labels

Return type

list
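The effect of min_freq can be illustrated with a pure-Python sketch for a single contact. The input format (one atom-type label per frame in which the contact is formed), the rounding to two decimals and the helper `relative_frequencies` are assumptions made for this illustration:

```python
from collections import Counter


def relative_frequencies(atom_type_pairs, min_freq=0.05):
    """Relative frequency of each interaction type, dropping those below min_freq."""
    counts = Counter(atom_type_pairs)
    total = sum(counts.values())
    return {key: round(n / total, 2)
            for key, n in counts.most_common()
            if n / total >= min_freq}


# Hypothetical contact: formed in 100 frames, mostly backbone-backbone
formed = ["BB-BB"] * 90 + ["BB-SC"] * 3 + ["SC-SC"] * 3 + ["SC-BB"] * 4

rel = relative_frequencies(formed)
rel_all = relative_frequencies(formed, min_freq=0)
```

With the default min_freq only the dominant "BB-BB" entry is reported, which is why reported values do not always add up to exactly 1.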

repframes(scheme='mode', ctc_cutoff_Ang=None, return_traj=False, show_violins=False, n_frames=1)

Find representative frames for this ContactGroup

A “representative frame” means, in this context, a frame that minimizes the average distance to the modes (or means) of the residue-residue distances contained in this object.

Please note that “representative” can have other meanings in other contexts. Here, it’s just a way to pick frames/geometries that will most likely resemble most of what is also seen in the distributions, barplots and flareplots.

Please also note that minimizing averages has its own limitations and might not always yield the best result. However, it is the easiest and quickest to implement. Feel free to use any of Sklearn’s great regression tools under constraints to get a better “representative”

Parameters
  • scheme (str, default is "mode") –

    Two options:

    • “mode” : minimize average distance to the most likely distance, i.e. to the mode, i.e. to the distance values at which the distributions (plot_distance_distributions) peak. You can check the modes in modes

    • “mean” : minimize average distance to the mean values of the distances. You can check the means in means

  • ctc_cutoff_Ang (float, default is None) – THIS IS EXPERIMENTAL. If given, the contact frequencies will be used as weights when computing the average. In cases with many contacts, many of them broken, this might help

  • return_traj (bool, default is False) – If True, try to also return the Trajectory objects. Will fail if that is not possible because the original files aren’t accessible (or there weren’t any)

  • show_violins (bool, default is False) – Superimpose the distance values as dots on top of a violin plot, created by using the plot_violins

  • n_frames (int, default is 1) – The number of representative frames to return

Returns

  • frames (list) – A list of n_frames tuples, each tuple containing the traj_idx and the frame_idx that minimize RMSDd

  • RMSDd (np.ndarray) – A 1D array containing the root-mean-square-deviation (in Angstrom) over distances (not positions) of the returned frames to the computed reference. This mean is weighted by the contact frequencies in case a ctc_cutoff_Ang was given. Should always be in ascending order

  • values (np.ndarray) – A 2D array of shape(n_frames, n_ctcs) containing the distance values of the frames in Angstrom

  • trajs (list) – A list of Trajectory objects. Only returned if return_traj=True
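The idea of picking the frames closest to a reference can be sketched with NumPy. This is a simplified illustration and not the method's implementation: all frames are assumed to be concatenated into one array, frames are identified by a flat index instead of (traj_idx, frame_idx) tuples, and the helper `representative_frames` is hypothetical:

```python
import numpy as np


def representative_frames(distances, reference, n_frames=1, weights=None):
    """distances: (n_frames_total, n_ctcs); reference: (n_ctcs,) modes or means."""
    sq_dev = (np.asarray(distances) - np.asarray(reference)) ** 2
    # Root-mean-square deviation over distances, optionally weighted per contact
    rmsd = np.sqrt(np.average(sq_dev, axis=1, weights=weights))
    order = np.argsort(rmsd)[:n_frames]  # frames closest to the reference first
    return order, rmsd[order]


# Hypothetical distances (Angstrom): 3 frames, 2 contacts
d = np.array([[3.0, 8.0],
              [3.4, 5.1],
              [6.0, 5.0]])
ref = d.mean(axis=0)  # corresponds to scheme="mean"

frames, rmsd = representative_frames(d, ref, n_frames=2)
```

Passing per-contact frequencies as `weights` would mimic what the experimental ctc_cutoff_Ang option is described to do.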

property res_idxs_pairs

Pairs of residue indices of the contacts in this object

Returns

res_idxs_pairs

Return type

np.ndarray

property residue_names_long

Pairs of long residue names of the ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_long
[['ARG389', 'LEU394'],
 ['LEU394', 'LYS270'],
 ['LEU388', 'LEU394'],
 ['LEU394', 'LEU230'],
 ['ARG385', 'LEU394']]
Returns

residue_names_long

Return type

list

property residue_names_short

Pairs of short residue names of the ContactPairs

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_short
[['R389', 'L394'],
 ['L394', 'K270'],
 ['L388', 'L394'],
 ['L394', 'L230'],
 ['R385', 'L394']]
Returns

residue_names_short

Return type

list

property residx2consensuslabel

Dictionary mapping residue indices to consensus labels:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2consensuslabel
{348: 'G.H5.21',
 353: 'G.H5.26',
 972: '6.32',
 347: 'G.H5.20',
 957: '5.69',
 344: 'G.H5.17'}
Returns

residx2consensuslabel

Return type

dict

residx2ctcidx(idx)

Indices of the contacts and the position (0 or 1) in which the residue with residue idx appears

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.res_idxs_pairs
array([[348, 353],
       [353, 972],
       [347, 353],
       [353, 957],
       [344, 353]])
>>> CG.residx2ctcidx(347)
array([[2, 0]])
Parameters

idx (int) – A residue index

Returns

ctc_idxs – The first index is the contact index, the second the pair index (0 or 1)

Return type

2D np.ndarray of shape (N,2)
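The same kind of lookup can be reproduced with numpy.argwhere on the array of pairs shown in the example above. This is a sketch of the idea on a standalone function, not necessarily how the method is implemented:

```python
import numpy as np

# Same pairs as in the doctest above
res_idxs_pairs = np.array([[348, 353],
                           [353, 972],
                           [347, 353],
                           [353, 957],
                           [344, 353]])


def residx2ctcidx(res_idxs_pairs, idx):
    """Rows are [contact index, position in the pair (0 or 1)]."""
    return np.argwhere(res_idxs_pairs == idx)


single = residx2ctcidx(res_idxs_pairs, 347)
anchor = residx2ctcidx(res_idxs_pairs, 353)
```

For the anchor residue 353, which takes part in all five contacts, the result has five rows, one per contact.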

property residx2fragnamebest

Dictionary mapping residue indices to best possible fragment names

“best” means consensus label > fragment name > fragment index

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2fragnamebest
{348: 'G.H5.21',
 353: 'G.H5.26',
 972: '6.32',
 347: 'G.H5.20',
 957: '5.69',
 344: 'G.H5.17'}
Returns

residx2fragnamebest

Return type

dict

residx2resnamefragnamebest(fragsep='@', shorten_AAs=True) → dict

Dictionary mapping residue indices to best possible residue+fragment label

“best” means consensus label > fragment name > fragment index

Parameters
Returns

residx2resnamefragnamebest

Return type

dict
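The “best” fallback order (consensus label > fragment name > fragment index) can be illustrated with a small pure-Python sketch. The function `best_label`, its signature and the fragment name "Galpha" are hypothetical and only meant to show the precedence:

```python
def best_label(resname, consensus=None, fragname=None, fragidx=None, fragsep="@"):
    """Append the best available fragment information to the residue name."""
    for candidate in (consensus, fragname, fragidx):
        if candidate is not None:
            return f"{resname}{fragsep}{candidate}"
    return resname  # nothing available: the residue name alone


labels = [
    best_label("R389", consensus="G.H5.21", fragname="Galpha"),
    best_label("L394", fragname="Galpha"),
    best_label("K270", fragidx=3),
    best_label("E30"),
]
```

The consensus label wins whenever it exists, and a residue without any fragment information keeps only its name.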

property residx2resnamelong

Dictionary mapping residue indices to long residue names:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamelong
{348: 'ARG389',
 353: 'LEU394',
 972: 'LYS270',
 347: 'LEU388',
 957: 'LEU230',
 344: 'ARG385'}
Returns

residx2resnamelong

Return type

dict

property residx2resnameshort

Dictionary mapping residue indices to short residue names:

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnameshort
{348: 'R389',
 353: 'L394',
 972: 'K270',
 347: 'L388',
 957: 'L230',
 344: 'R385'}
Returns

residx2resnameshort

Return type

dict

retop(top, mapping, deepcopy=False)

Return a copy of this object with a different topology.

Uses the mapping to generate new residue-indices where necessary, using the rest of the attributes (time-traces, labels, colors, fragments…) as they were

Wraps thinly around mdciao.contacts.ContactPair.retop

Note

When re-topping interfaces, those residues of the ‘old’ interface_fragments which are not covered by the mapping will be missing in the ‘new’ interface_fragments. However, the new interface is guaranteed to have at least all the ‘new’ interface_residxs mapped. So, as long as the ‘old’ interface_residxs are covered by the mapping, this isn’t a problem (TODO except, perhaps, when plotting flareplots using the spare=”interface” option after re-topping)

Parameters
  • top (Topology) – The new topology

  • mapping (indexable (array, dict, list)) – A mapping of old residue indices to new residue indices. Usually, comes from aligning the old and the new topology using mdciao.utils.sequence.maptops.

  • deepcopy (bool, default is False) – Use copy.deepcopy on the attributes when creating the new ContactPair.

Returns

CG

Return type

ContactGroup
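
The behaviour described in the note can be illustrated with a small sketch. It assumes the mapping is a dict from old to new residue indices; the helper remap_fragments and all index values in the mapping are hypothetical and not part of mdciao.

```python
def remap_fragments(fragments, mapping):
    """Translate each fragment's residue indices, dropping residues the mapping does not cover."""
    return [[mapping[r] for r in frag if r in mapping] for frag in fragments]


old_interface_fragments = [[344, 347, 348, 353], [957, 972, 999]]
# Hypothetical old -> new mapping; residue 999 is not covered
mapping = {344: 10, 347: 13, 348: 14, 353: 19, 957: 200, 972: 215}

new_interface_fragments = remap_fragments(old_interface_fragments, mapping)
print(new_interface_fragments)
```

Residue 999 is absent from the new fragments because the mapping does not cover it, which is the situation the note warns about.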

save(filename)

Save this ContactGroup as a pickle

Parameters

filename (str) – filename
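
A pickle round trip can be sketched with the standard library as follows. This assumes the file written by save is a regular pickle readable with pickle.load; a plain dict stands in for the ContactGroup so the sketch runs without mdciao.

```python
import os
import pickle
import tempfile

# Stand-in object; in practice this would be a ContactGroup
stand_in = {"name": "stand-in for a ContactGroup", "n_ctcs": 5}

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "CG.pickle")
    # Write the object to disk
    with open(filename, "wb") as f:
        pickle.dump(stand_in, f)
    # Read it back
    with open(filename, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in)
```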

save_trajs(prepend_filename, ext, output_dir='.', t_unit='ps', verbose=False, ctc_cutoff_Ang=None, self_descriptor='mdciaoCG')

Save time-traces to disk.

Filenames will be created based on the property self.trajlabels, but using only the basenames and prepending the string prepend_filename

If there is an anchor residue (i.e. this ContactGroup is a neighborhood), the anchor will be included in the filename, otherwise the string “contact_group” will be used. You can control the output directory using output_dir

If a ctc_cutoff is given, the time-traces will be binarized (see self.binarize_trajs). Else, the distances themselves are stored.

Parameters
  • prepend_filename (str) – Each filename will be prepended with this string

  • ext (str) – Extension, can be “xlsx” or anything numpy.savetxt can handle

  • output_dir (str, default is ".") – The output directory

  • t_unit (str, default is "ps") – Other units are “ns”, “mus”, and “ms”. The transformation happens internally

  • verbose (boolean, default is False) – Prints filenames

  • ctc_cutoff_Ang (float, default is None) – Use this cutoff and save bintrajs instead

  • self_descriptor (str, default is "mdciaoCG") – Saved filenames will be tagged with this descriptor

Returns

Return type

None
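
The effect of t_unit and ctc_cutoff_Ang on the saved data can be sketched like this. The sketch assumes the stored times are in ps and that a frame counts as a contact when the distance is less than or equal to the cutoff; prepare_trace is a hypothetical helper, not an mdciao function.

```python
# Number of picoseconds per supported time unit (plain unit arithmetic)
PS_PER_UNIT = {"ps": 1, "ns": 1e3, "mus": 1e6, "ms": 1e9}


def prepare_trace(times_ps, distances_Ang, t_unit="ps", ctc_cutoff_Ang=None):
    """Convert times to t_unit and, if a cutoff is given, binarize the distances."""
    times = [t / PS_PER_UNIT[t_unit] for t in times_ps]
    if ctc_cutoff_Ang is None:
        values = list(distances_Ang)  # keep the distances themselves
    else:
        values = [int(d <= ctc_cutoff_Ang) for d in distances_Ang]  # 1 = contact formed
    return list(zip(times, values))


rows = prepare_trace([0, 1000, 2000], [3.2, 4.8, 3.9], t_unit="ns", ctc_cutoff_Ang=4.0)
print(rows)
```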

property shared_anchor_residue_index

The index of the anchor residue, i.e. the residue at the center of this neighborhood

Only populated if self.is_neighborhood is True, else returns None

Returns

idx

Return type

int

property stacked_time_traces

All ContactPair time_traces stacked into a 2D np.array

Returns

data – The array is of shape (self.n_frames_total, self.n_ctcs)

Return type

np.ndarray
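
The resulting shape can be illustrated without any dependencies. The nested list below is hypothetical data, and plain lists are used instead of NumPy arrays so the sketch runs anywhere; the real property returns an np.ndarray.

```python
# Hypothetical time-traces: time_traces[pair][trajectory] -> one distance per frame
time_traces = [
    [[0.31, 0.35], [0.40, 0.42, 0.38]],  # pair 0: 2 + 3 frames
    [[0.55, 0.52], [0.60, 0.58, 0.61]],  # pair 1: 2 + 3 frames
]

# Concatenate the trajectories of each pair into one long column
columns = [[d for traj in pair for d in traj] for pair in time_traces]

# Transpose so that rows are frames and columns are contact pairs
stacked = [list(row) for row in zip(*columns)]
shape = (len(stacked), len(stacked[0]))
print(shape)
```

Here the total number of frames is 5 and the number of contact pairs is 2.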

property time_arrays

The time-arrays of each trajectory contained in this ContactGroup

Returns

time_arrays – The units of these arrays will be whatever was given to the ContactPairs used to instantiate this ContactGroup

Return type

list

property time_max

Maximum time-value of the ContactGroup

Returns

time_max – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that’s how time arrays are stored in xtc files

Return type

float

property time_min

Minimum time-value of the ContactGroup

Returns

time_min – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that’s how time arrays are stored in xtc files

Return type

float
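
Conceptually, time_min and time_max are the extremes over all per-trajectory time arrays. A minimal sketch with made-up values (the arrays below are hypothetical and keep whatever unit they were given in):

```python
# Hypothetical time arrays, one per trajectory
time_arrays = [[0.0, 10.0, 20.0], [5.0, 15.0, 25.0, 35.0]]

# Extremes across all trajectories
time_min = min(min(t) for t in time_arrays)
time_max = max(max(t) for t in time_arrays)
print(time_min, time_max)
```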

to_new_ContactGroup(CSVexpression, allow_multiple_matches=False, merge=True)

Creates a new ContactGroup from this one using a CSV expression to filter for residues

Parameters
  • CSVexpression (str) – CSV expression like “GLU30,K*” to select the residue-pairs of self for the new ContactGroup. See mdciao.utils.residue_and_atom.find_AA for the syntax of the expression.

  • allow_multiple_matches (bool, default is False) – If False, fail when a substring of the CSVexpression matches more than one residue. Protects from over-grabbing residues

  • merge (bool, default is True) – Merge the selected residue-pairs into one single ContactGroup. If False every sub-string of CSVexpression returns its own ContactGroup

Returns

newCG – If dict, it’s keyed with substrings of CSVexpression and valued with ContactGroups

Return type

ContactGroup or dict
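
The interplay of the three parameters can be sketched with the standard library. This is not how mdciao resolves the expression (see mdciao.utils.residue_and_atom.find_AA for the real syntax); the sketch assumes shell-style wildcards matched against both long and short residue names, and the pairs, the THREE_TO_ONE table and the helpers names_of and select_pairs are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical residue pairs, labelled with long residue names
pairs = [("GLU30", "LYS270"), ("ARG389", "LEU230"), ("GLU30", "LEU394")]
THREE_TO_ONE = {"GLU": "E", "LYS": "K", "ARG": "R", "LEU": "L"}


def names_of(residue):
    """Return the long and the short name of a residue."""
    return residue, THREE_TO_ONE[residue[:3]] + residue[3:]


def select_pairs(csv_expression, allow_multiple_matches=False, merge=True):
    residues = sorted({r for pair in pairs for r in pair})
    selected = {}
    for pattern in csv_expression.split(","):
        matches = [r for r in residues
                   if any(fnmatchcase(n, pattern) for n in names_of(r))]
        if len(matches) > 1 and not allow_multiple_matches:
            raise ValueError(f"'{pattern}' matches several residues: {matches}")
        # Keep every pair in which a matched residue takes part
        selected[pattern] = [p for p in pairs if any(r in matches for r in p)]
    if not merge:
        return selected  # one group per sub-string
    merged = []
    for group in selected.values():
        merged.extend(p for p in group if p not in merged)
    return merged


merged = select_pairs("GLU30,K*")
per_substring = select_pairs("GLU30,K*", merge=False)
print(merged)
print(per_substring)
```

With merge=True the result is a single selection; with merge=False it is a dict keyed by the sub-strings of the expression, mirroring the two documented return types.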

property top

The topology used to instantiate the ContactPairs in this ContactGroup

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.top
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>

Returns

top

Return type

Topology or None

property topology

The topology used to instantiate the ContactPairs in this ContactGroup

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.top
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>

Returns

topology

Return type

Topology or None

property trajlabels

List of trajectory labels

If labels were not passed, labels like 'traj 0', 'traj 1' and so on are assigned. If Trajectory objects were passed, the “mdtraj” descriptor will be used. If filenames were passed, the labels are the filenames (basename only) without the extension

>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.trajlabels
['gs-b2ar.noH.stride.5']
Returns

trajlabels

Return type

list
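
For the filename case, the label can be derived with the standard library as sketched below. The directory /path/to and the .xtc extension are assumptions, chosen so that the result matches the label shown in the example above; label_from_filename is a hypothetical helper.

```python
import os


def label_from_filename(filename):
    """Strip the directory and the last extension from a filename."""
    return os.path.splitext(os.path.basename(filename))[0]


label = label_from_filename("/path/to/gs-b2ar.noH.stride.5.xtc")
print(label)
```

Only the last extension is removed, so the dots inside the basename are preserved.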