mdciao.contacts.ContactGroup
-
class mdciao.contacts.ContactGroup(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)
Container for ContactPair-objects.
This class is the second level of abstraction after ContactPair and provides methods to perform operations on all the contact-pairs simultaneously and to plot/show/save the results of these operations.
In many cases, the methods of ContactGroup thinly wrap and iterate around equally named methods of the ContactPair-objects.
Note
Higher-level methods in the API, like those exposed by mdciao.cli, will return ContactPair or ContactGroup objects already instantiated and ready to use. It is recommended to use those instead of individually calling ContactPair or ContactGroup.
-
__init__
(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)
- Parameters
list_of_contact_objects (list) – list of ContactPair objects
interface_fragments (list of two iterables of indexes, default is None) –
An interface is defined by two groups of residue indices.
This input doesn’t need to have all or any of the residue indices in res_idxs_pairs.
This input will be used to group the object’s own residue idxs present in residxs_pairs into the two groups of the interface. These two groups will be accessible through the attribute self.interface_residxs. The input itself will remain accessible through the object’s equally named attribute self.interface_fragments
top (Topology, default is None) – The molecular topology associated with this object. Normally, the default behaviour is enough: it checks whether all ContactPairs of list_of_contact_objects share the same self.top and uses that one. If they have different topologies, the method fails, since you can’t instantiate a ContactGroup with ContactPairs from different topologies. In case the ContactPairs don’t have any topology at all (self.top is None for all ContactPairs), you can pass one here. Or, if they have one and you pass one here, it will be checked that the top provided here coincides with the ContactPairs’ shared topology
name (string, default is None) – Optional name you want to give this object. ATM it is only used for the title of ContactGroup.plot_distance_distributions when the object is not a neighborhood
neighbors_excluded (int, default is None) – The neighbors excluded when creating the underlying ContactPairs passed in list_of_contact_objects
max_cutoff_Ang (float, default is None) – Operations involving cutoffs higher than this will be forbidden and will raise ValueError. Prevents the user from asking for contact-frequencies that aren’t present in the ContactGroup
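The interface grouping described for interface_fragments can be pictured with a small pure-Python sketch. It assumes plain membership: every residue index that appears in any pair is assigned to whichever of the two input groups contains it. The helper name group_interface_residxs is hypothetical; this is an illustration, not mdciao's implementation.

```python
def group_interface_residxs(res_idxs_pairs, interface_fragments):
    """Hypothetical helper: split the residues that appear in the contact
    pairs into the two interface groups by simple membership."""
    # Every residue index present in any pair, in ascending order
    present = sorted({int(r) for pair in res_idxs_pairs for r in pair})
    groups = []
    for fragment in interface_fragments:
        members = set(fragment)  # fast membership tests
        groups.append([r for r in present if r in members])
    return groups


pairs = [[0, 10], [1, 11], [2, 12]]
fragments = [[0, 1, 2, 3, 4], [10, 11, 12, 13]]
print(group_interface_residxs(pairs, fragments))
```

Fragment residues that never appear in a pair (3, 4 and 13 here) are dropped, and pair residues outside both fragments are left out; if nothing matches, both lists come back empty.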
Methods
__init__(list_of_contact_objects[, …]) – param list_of_contact_objects: list of ContactPair objects
archive([filename]) – Save this ContactGroup’s list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup
binarize_trajs(ctc_cutoff_Ang[, …]) – Binarize trajs
copy() – Copy this object by re-instantiating another ContactGroup object with the same attributes.
distribution_dicts([bins]) – Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label (see kwargs and CP.label_flex)
distributions_of_distances([bins]) – Histograms the distance values of each contact, returning a list with as many distributions as there are contacts.
frequency_as_contact_matrix(ctc_cutoff_Ang) – Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in residxs_pairs, and those pairs only; the rest will be NaNs
frequency_as_contact_matrix_CG(ctc_cutoff_Ang) – Coarse-grained contact-matrix
frequency_dataframe(ctc_cutoff_Ang[, …]) – Output a formatted dataframe with fields “label”, “freq” and “sum”, optionally disaggregated by type of contact in “by_atomtypes”
frequency_delta(otherCG, ctc_cutoff_Ang) – Compute per-contact frequency differences between self and some other ContactGroup
frequency_dict_by_consensus_labels(ctc_cutoff_Ang[, …]) – Return frequencies as a dictionary of dictionaries keyed by consensus labels
frequency_dicts(ctc_cutoff_Ang[, sort_by_freq]) – Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPairs and returns one frequency dict keyed by contact label
frequency_per_contact(ctc_cutoff_Ang[, …]) – Frequency per contact over all trajs
frequency_spreadsheet(sheet1_dataframe, …) – Write an Excel file with the DataFrame that is returned by self.frequency_dataframe.
frequency_str_ASCII_file(idf[, ascii_file]) – Create a string with the frequencies from a DataFrame
frequency_sum_per_residue_idx_dict(ctc_cutoff_Ang[, …]) – Dictionary of aggregated frequency_per_contact per residue index. Values over 1 are possible, e.g. if [0,1] and [0,2] are always formed (=1), then freqs_dict[0]=2
frequency_sum_per_residue_names(ctc_cutoff_Ang) – Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible; see self.residx2resnamefragnamebest for more info on this
frequency_table(ctc_cutoff_Ang, fname[, …]) – Print and/or save frequencies as a formatted table
frequency_to_bfactor(ctc_cutoff_Ang, …[, …]) – Save the contact frequency aggregated by residue to a pdb file
gen_ctc_labels(**kwargs) – Generate labels with different parameters
interface_frequency_matrix(ctc_cutoff_Ang[, …]) – Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.
n_ctcs_timetraces(ctc_cutoff_Ang[, …]) – Time-traces of the number of contacts, obtained by summing over all contacts for each frame
plot_distance_distributions([bins, xlim, …]) – Plot distance distributions for the distance trajectories of the contacts
plot_freqs_as_bars(ctc_cutoff_Ang[, …]) – Plot contact frequencies as a bar plot
plot_freqs_as_flareplot(ctc_cutoff_Ang[, …]) – Produce contact flareplots by wrapping around mdciao.flare.freqs2flare
plot_frequency_sums_as_bars(ctc_cutoff_Ang, …) – Bar plot with per-residue sums of frequencies (called Sigma in mdciao)
plot_interface_frequency_matrix(ctc_cutoff_Ang) – Plot the interface_frequency_matrix
plot_neighborhood_freqs(ctc_cutoff_Ang[, …]) – Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods
plot_timedep_ctcs(panelheight[, …]) – For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts
plot_violins([sort_by, ctc_cutoff_Ang, …]) – Plot residue-residue distances as violin plots
violinplot
relabel_consensus([new_labels]) – Relabel any residue missing its consensus label to shortAA
Relative frequencies of interaction-types (by atom-type) for all contact-pairs in the ContactGroup
repframes([scheme, ctc_cutoff_Ang, …]) – Find representative frames for this ContactGroup
residx2ctcidx(idx) – Indices of the contacts and the position (0 or 1) in which the residue with residue index idx appears
residx2resnamefragnamebest([fragsep, …]) – Dictionary mapping residue indices to best possible residue+fragment label
retop(top, mapping[, deepcopy]) – Return a copy of this object with a different topology.
save(filename) – Save this ContactGroup as a pickle
save_trajs(prepend_filename, ext[, …]) – Save time-traces to disk.
to_new_ContactGroup(CSVexpression[, …]) – Creates a new ContactGroup from this one, using a CSV expression to filter for residues
Attributes
The color associated with the fragment of the anchor residue
Label of the anchor residue of this neighborhood, including fragment
Label of the anchor residue (short) of this neighborhood, including fragment
List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.
Dictionary mapping consensus labels to residue names:
List of simple labels (no fragment info) for the residue pairs in ContactPairs
List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs
List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs
Best possible fragment names for the residue pairs in ContactPairs
Two residue lists provided at initialization
Consensus labels of whatever residues interface_residxs holds.
Best possible residue@fragment string for the residues in interface_residxs
The residues of self.res_idxs_pairs grouped into two lists, depending on what self.interface_fragments they belong to
Residue labels of whatever residues interface_residxs holds
Whether this ContactGroup can be interpreted as an interface.
Whether this ContactGroup is a neighborhood or not
Operations involving cutoffs higher than this will be forbidden and will raise ValueError.
The mean value over all distance time-traces
The modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces
The number of contact pairs (mdciao.contacts.ContactPair-objects) stored in this object
List of per-trajectory n_frames
Total number of frames
The number of trajectories contained in this ContactGroup
The name of this ContactGroup, given when creating it
The number of neighbors that were excluded when creating this ContactGroup
The colors associated with the fragments of the anchor partner residues
List of labels of the partner (not anchor) residues of this neighborhood, including fragment
List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment
Pairs of residue indices of the contacts in this object
Pairs of long residue names of the ContactPairs
Pairs of short residue names of the ContactPairs
Dictionary mapping residue indices to consensus labels:
Dictionary mapping residue indices to best possible fragment names
Dictionary mapping residue indices to short residue names:
Dictionary mapping residue indices to short residue names:
The index of the anchor residue, i.e.
All ContactPair time_traces stacked into a 2D np.array
The time-arrays of each trajectory contained in this ContactGroup
Maximum time-value of the ContactGroup
Minimum time-value of the ContactGroup
The topology used to instantiate the ContactPairs in this ContactGroup
The topology used to instantiate the ContactPairs in this ContactGroup
List of trajectory labels
-
property
anchor_fragment_color
The color associated with the fragment of the anchor residue
Two fragment colors were given to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default, e.g. by mdciao.cli._parse_coloring_options. Check the defaults there.
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_fragment_color
'tab:blue'
- Returns
color
- Return type
str
-
property
anchor_res_and_fragment_str
Label of the anchor residue of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str
'LEU394@G.H5.26'
- Returns
label
- Return type
str
-
property
anchor_res_and_fragment_str_short
Label of the anchor residue (short) of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str_short
'L394@G.H5.26'
- Returns
label
- Return type
str
-
archive
(filename=None, **kwargs)
Save this ContactGroup’s list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup
The method ContactGroup.save creates a pickle that has a lot of redundant information
- Parameters
filename (str, default is None) – Has to end in “npy”. If None (the default), the dictionary is returned instead of being saved
- Other Parameters
kwargs (dict) – Optional parameters for
mdciao.contacts.ContactPair._serialized_as_dict
- Returns
archive
- Return type
dict
-
binarize_trajs
(ctc_cutoff_Ang, switch_off_Ang=None, order='contact')
Binarize trajs
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) –
Implements a linear switch-off from ctc_cutoff_Ang to ctc_cutoff_Ang+switch_off_Ang. E.g. if the cutoff is 3 Ang and the switch is 1 Ang, then
3.0 -> 1.0
3.5 -> .5
4.0 -> 0.0
order (str, default is "contact") – Sort first by contact, then by traj index. Alternative is “traj”, i.e. sort first by traj index, then by contact
TODO: change the name “binarize”
- Returns
bintrajs – if order==traj, each item of the list is a 2D np.ndarray of shape (Nt, n_ctcs), where Nt is the number of frames of that trajectory
- Return type
list of boolean arrays
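The linear switch-off described for switch_off_Ang can be written as a clipped linear ramp. This is a minimal NumPy sketch assuming distances and cutoffs in Angstrom and a hard "<=" comparison when no switch is given; linear_switch_off is a hypothetical name, not an mdciao function. Note that with a switch the values are fractional rather than strictly boolean.

```python
import numpy as np


def linear_switch_off(distances_Ang, ctc_cutoff_Ang, switch_off_Ang=None):
    """Hypothetical helper: map distances to contact values in [0, 1]."""
    d = np.asarray(distances_Ang, dtype=float)
    if switch_off_Ang is None:
        # Hard cutoff: 1.0 at or below the cutoff, 0.0 above it
        return (d <= ctc_cutoff_Ang).astype(float)
    # 1.0 up to the cutoff, then a linear decay reaching 0.0 at cutoff + switch
    return np.clip(1.0 - (d - ctc_cutoff_Ang) / switch_off_Ang, 0.0, 1.0)


# Reproduces the 3 Ang cutoff / 1 Ang switch example: 1.0, 0.5, 0.0
print(linear_switch_off([3.0, 3.5, 4.0], 3.0, 1.0))
```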
-
property
consensus_labels
List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.
They were parsed at initialization.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensus_labels
[['G.H5.21', 'G.H5.26'], ['G.H5.26', '6.32'], ['G.H5.20', 'G.H5.26'], ['G.H5.26', '5.69'], ['G.H5.17', 'G.H5.26']]
- Returns
consensus_labels
- Return type
list
-
property
consensuslabel2resname
Dictionary mapping consensus labels to residue names:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensuslabel2resname
{'G.H5.21': 'R389', 'G.H5.26': 'L394', '6.32': 'K270', 'G.H5.20': 'L388', '5.69': 'L230', 'G.H5.17': 'R385'}
- Returns
consensuslabel2resname
- Return type
dict
-
copy
()
Copy this object by re-instantiating another ContactGroup object with the same attributes.
In theory, self == self.copy() should hold, but self is self.copy() should not.
- Returns
CG
- Return type
ContactGroup
-
property
ctc_labels
List of simple labels (no fragment info) for the residue pairs in ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels
['ARG389-LEU394', 'LEU394-LYS270', 'LEU388-LEU394', 'LEU394-LEU230', 'ARG385-LEU394']
- Returns
ctc_labels
- Return type
list
-
property
ctc_labels_short
List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_short
['R389-L394', 'L394-K270', 'L388-L394', 'L394-L230', 'R385-L394']
- Returns
ctc_labels_short
- Return type
list
-
property
ctc_labels_w_fragments_short_AA
List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_w_fragments_short_AA
['R389@G.H5.21-L394@G.H5.26', 'L394@G.H5.26-K270@6.32', 'L388@G.H5.20-L394@G.H5.26', 'L394@G.H5.26-L230@5.69', 'R385@G.H5.17-L394@G.H5.26']
- Returns
ctc_labels_w_fragments_short_AA
- Return type
list
-
distribution_dicts
(bins=10, **kwargs)
Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label (see kwargs and CP.label_flex)
- Parameters
bins (int or sequence of scalars or str, optional, default is 10) – If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
kwargs (optional keyword arguments) – Check
ContactPair.frequency_dict
- Returns
fdict
- Return type
dictionary
-
distributions_of_distances
(bins=10)
Histograms the distance values of each contact, returning a list with as many distributions as there are contacts.
- Parameters
bins (int or sequence of scalars or str, optional, default is 10) – If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
- Returns
list_of_distros – List of len self.n_ctcs, each entry contains the counts and edges of the bins
- Return type
list
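A minimal sketch of per-contact histogramming with numpy.histogram, which accepts the same kinds of bins argument described above. The input layout (one list of per-trajectory distance arrays per contact) and the pooling of all trajectories before histogramming are assumptions; distributions_of_distances_sketch is a hypothetical name.

```python
import numpy as np


def distributions_of_distances_sketch(time_traces, bins=10):
    """Hypothetical helper: one (counts, edges) tuple per contact.

    time_traces[i] is assumed to be a list of 1D distance arrays,
    one per trajectory, for contact i."""
    distros = []
    for per_traj in time_traces:
        pooled = np.hstack(per_traj)  # pool frames from all trajectories
        counts, edges = np.histogram(pooled, bins=bins)
        distros.append((counts, edges))
    return distros


traces = [
    [np.array([0.30, 0.32, 0.35]), np.array([0.40, 0.41])],  # contact 0, two trajs
    [np.array([0.80, 0.85, 0.90]), np.array([0.95, 1.00])],  # contact 1, two trajs
]
distros = distributions_of_distances_sketch(traces, bins=4)
```

With an integer bins there are bins+1 edges per contact, and the counts of each contact add up to its total number of frames.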
-
property
fragment_names_best
Best possible fragment names for the residue pairs in ContactPairs
The fragment name will try to pick the consensus nomenclature. If no consensus label for the residue exists, the actual fragment names are used as fallback (which themselves fall back to the fragment index).
Only if no consensus label, no fragment name and no fragment index are available will this yield “None” as a string.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.fragment_names_best
[['G.H5.21', 'G.H5.26'], ['G.H5.26', '6.32'], ['G.H5.20', 'G.H5.26'], ['G.H5.26', '5.69'], ['G.H5.17', 'G.H5.26']]
- Returns
fragment_names_best
- Return type
list
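The fallback order described above (consensus label, then fragment name, then fragment index, then the string “None”) can be sketched as a first-non-None lookup. best_fragment_name is a hypothetical helper for illustration only.

```python
def best_fragment_name(consensus_label=None, fragment_name=None, fragment_index=None):
    """Hypothetical helper: return the most informative available name."""
    # Try the candidates in decreasing order of informativeness
    for candidate in (consensus_label, fragment_name, fragment_index):
        if candidate is not None:
            return str(candidate)
    # Nothing available at all
    return "None"


print(best_fragment_name("G.H5.26", "Galpha", 3))
print(best_fragment_name(None, None, 3))
```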
-
frequency_as_contact_matrix
(ctc_cutoff_Ang, switch_off_Ang=None)
Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in residxs_pairs, and those pairs only; the rest will be NaNs.
If top is None, the method will fail.
Note
This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns
mat
- Return type
numpy.ndarray
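The NaN-filled symmetric matrix described above can be sketched directly from residue-index pairs and their frequencies. The helper name and inputs are hypothetical; the point is that only listed pairs (and their transposes) get values, so a NaN means "not in this ContactGroup" rather than "frequency zero".

```python
import numpy as np


def freqs_to_contact_matrix(res_idxs_pairs, freqs, n_residues):
    """Hypothetical helper: square, symmetric matrix, NaN for absent pairs."""
    mat = np.full((n_residues, n_residues), np.nan)
    for (ii, jj), ff in zip(res_idxs_pairs, freqs):
        mat[ii, jj] = ff  # fill both triangles to keep it symmetric
        mat[jj, ii] = ff
    return mat


mat = freqs_to_contact_matrix([[0, 2], [1, 3]], [0.5, 0.0], n_residues=4)
```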
-
frequency_as_contact_matrix_CG
(ctc_cutoff_Ang, switch_off_Ang=None, fragments=None, fragment_names=None, consensus_labelers=None, verbose=False, sparse=False, interface=False, zero_freq=0.01, dec_round=3, return_fragments=False)
Coarse-grained contact-matrix
Frequencies of self.frequency_per_contact get coarse-grained into fragments. Fragment definitions come from fragments and/or from the consensus_labelers. These definitions need to contain all residues in self.res_idxs_pairs.
User-defined and consensus-derived fragment definitions get spliced together using splice_orphan_fragments. This might lead to sub-sets of the input fragments getting re-labeled as “subfrags” and residues not defined anywhere being labelled “orphans”. This leads to cumbersome fragment names (and can change in the future), but at least it’s “traceable” for the moment.
If you want to have the fragment definitions, use return_fragments=True.
Anytime some argument leads to a row/column being deleted from the output, the matrix is returned as an annotated DataFrame, to be able to provide rows/columns with names and keep track of their meaning.
If interface is True and this ContactGroup is indeed an interface, the matrix will be asymmetric.
If top is None, the method will fail.
Note
This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
fragments (dict) – The fragment definitions
fragment_names (iterable of strings, default is None) – The names of the fragments
consensus_labelers (list, default is None) – It has to contain LabelerConsensus-objects, from which the fragments are obtained.
verbose (bool, default is False) – Be verbose
sparse (bool, default is False) – Delete rows and columns where all elements are < zero_freq. Since the row/column indices lose their meaning this way, a DataFrame with named rows/columns is returned instead of an array. If no fragment_names are passed, some will be created.
interface (bool, default is False) – If True, an asymmetric matrix is reported, with rows and columns representing fragments on each side of the interface, respectively. Since this is done using self.interface_residxs, and not all input fragments are necessarily contained therein, interface=True introduces a sparsity, which makes the return type be a DataFrame (see above)
zero_freq (float, default is 0.01) – Only has effect when sparse is True. The cutoff for a frequency to be considered zero
dec_round (int, default is 3) – The number of decimals to round to when reporting results. It’s assumed the CG matrix doesn’t need much precision beyond this
return_fragments (bool, default is False) – Whether to return the fragments that the input produced.
- Returns
mat (numpy.ndarray or DataFrame) – The coarse-grained contact matrix
fragments (dict) – The fragment definitions, only returned if return_fragments is True
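One way to picture coarse-graining into fragments is to add up the frequency of every residue-residue contact into the cell of the fragment pair it connects. This sketch assumes summation as the aggregation and fragments given as plain lists of residue indices; both are assumptions, and coarse_grain_freqs is a hypothetical name. As in the text above, a residue missing from every fragment makes the lookup fail.

```python
import numpy as np


def coarse_grain_freqs(res_idxs_pairs, freqs, fragments):
    """Hypothetical helper: (n_frags, n_frags) matrix of summed frequencies.

    fragments is assumed to be a list of lists of residue indices."""
    # Map every residue to the index of the fragment containing it
    res2frag = {r: fi for fi, frag in enumerate(fragments) for r in frag}
    n_frags = len(fragments)
    mat = np.zeros((n_frags, n_frags))
    for (ii, jj), ff in zip(res_idxs_pairs, freqs):
        fi, fj = res2frag[ii], res2frag[jj]  # KeyError if a residue is undefined
        mat[fi, fj] += ff
        if fi != fj:
            mat[fj, fi] += ff  # keep the matrix symmetric
    return mat


cg = coarse_grain_freqs([[0, 2], [1, 3], [0, 1]], [0.5, 0.25, 1.0], [[0, 1], [2, 3]])
```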
-
frequency_dataframe
(ctc_cutoff_Ang, switch_off_Ang=None, atom_types=False, sort_by_freq=False, **ctc_fd_kwargs)
Output a formatted dataframe with fields “label”, “freq” and “sum”, optionally disaggregated by type of contact in “by_atomtypes”
Note
The contacts in the table are sorted by their order in the instantiation, unless sort_by_freq is True
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact
sort_by_freq (bool, default is False) – Sort by descending frequency value, default is to keep the order of
self._contacts
ctc_fd_kwargs (named optional arguments) – Check ContactPair.frequency_dict for more info on e.g. AA_format=’short’ and/or split_label
- Returns
df
- Return type
DataFrame
-
frequency_delta
(otherCG, ctc_cutoff_Ang)
Compute per-contact frequency differences between self and some other ContactGroup
The difference is defined as
\(\Delta_{AB} = freq_B - freq_A\),
i.e. the delta that occurs upon “reacting” from self (A) to otherCG (B).
No sanity checks are performed; residue indices are assumed to have the same meaning in both self and otherCG
- Parameters
otherCG (ContactGroup) – The ContactGroup to compute the difference with
ctc_cutoff_Ang (float) – The cutoff to use to compute the frequencies
- Returns
delta_freq (1D np.ndarray) – The values resulting from otherCG.frequency_per_contact(ctc_cutoff_Ang) - self.frequency_per_contact(ctc_cutoff_Ang)
res_idxs_pairs (2D np.ndarray of len(delta_freq)) – The res_idxs_pairs for the delta_freq values
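A sketch of \(\Delta_{AB} = freq_B - freq_A\) computed per residue pair. How contacts present in only one of the two groups are handled is not stated above; this sketch assumes they count as frequency 0 and treats [i, j] and [j, i] as the same pair, both assumptions. frequency_delta_sketch is a hypothetical name.

```python
import numpy as np


def frequency_delta_sketch(pairs_A, freqs_A, pairs_B, freqs_B):
    """Hypothetical helper: freq_B - freq_A for every pair seen in A or B."""
    # Key each contact by its sorted residue pair so [1, 0] matches [0, 1]
    fA = {tuple(sorted(p)): f for p, f in zip(pairs_A, freqs_A)}
    fB = {tuple(sorted(p)): f for p, f in zip(pairs_B, freqs_B)}
    union = sorted(set(fA) | set(fB))
    # Assumption: a contact missing from one group has frequency 0 there
    delta = np.array([fB.get(p, 0.0) - fA.get(p, 0.0) for p in union])
    return delta, np.array(union)


delta, pairs = frequency_delta_sketch([[0, 1], [2, 3]], [0.8, 0.1],
                                      [[1, 0], [4, 5]], [0.3, 0.6])
```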
-
frequency_dict_by_consensus_labels
(ctc_cutoff_Ang, switch_off_Ang=None, return_as_triplets=False, sort_by_interface=False, include_trilower=False)
Return frequencies as a dictionary of dictionaries keyed by consensus labels
Note
Will fail if not all residues have consensus labels. TODO: this is very similar to frequency_sum_per_residue_names; look at the use case closely and try to unify both methods
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
return_as_triplets (bool, default is False) – Return the dictionary as a list of triplets, s.t. freq_dict[3.50][4.50]=.25 is returned as [[3.50,4.50,.25]]. Makes it easier to iterate through in other methods
sort_by_interface (bool, default is False) – Not implemented ATM, will raise NotImplementedError
include_trilower (bool, default is False) – Include the transposed indexes in the returned dictionary, s.t. the contact pair [3.50][4.50]=.25 also generates [4.50][3.50]=.25
- Returns
freqs
- Return type
dictionary of dictionary or list of triplets (if return_as_triplets is True)
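The dictionary-of-dictionaries layout, its triplet form and the include_trilower option can be sketched in plain Python, starting from already-computed consensus label pairs and frequencies (a hypothetical input format). freqs_by_consensus is a hypothetical helper, not the method itself.

```python
def freqs_by_consensus(label_pairs, freqs, return_as_triplets=False, include_trilower=False):
    """Hypothetical helper: {label1: {label2: freq}} or [[label1, label2, freq], ...]."""
    out = {}
    for (l1, l2), f in zip(label_pairs, freqs):
        out.setdefault(l1, {})[l2] = f
        if include_trilower:
            out.setdefault(l2, {})[l1] = f  # transposed entry
    if return_as_triplets:
        return [[k1, k2, f] for k1, inner in out.items() for k2, f in inner.items()]
    return out


print(freqs_by_consensus([("3.50", "4.50")], [0.25]))
print(freqs_by_consensus([("3.50", "4.50")], [0.25], return_as_triplets=True))
```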
-
frequency_dicts
(ctc_cutoff_Ang, sort_by_freq=False, **kwargs)
Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPairs and returns one frequency dict keyed by contact label
- Parameters
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
sort_by_freq (bool, default is False) – Sort by descending frequency. Default is to return in the same order as
ContactGroup._contacts
kwargs (optional keyword arguments) – Check
ContactPair.frequency_dict
- Returns
fdict
- Return type
dictionary
-
frequency_per_contact
(ctc_cutoff_Ang, switch_off_Ang=None)
Frequency per contact over all trajs
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns
freqs
- Return type
1D np.ndarray of len(n_ctcs)
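A minimal sketch of a per-contact frequency: the fraction of frames in which the distance is at or below the cutoff, using the “<=” operator mentioned for frequency_dicts. It assumes distances stored in nanometers (converted here to Angstrom) and frames pooled across trajectories; the data layout and the name frequency_per_contact_sketch are hypothetical.

```python
import numpy as np


def frequency_per_contact_sketch(time_traces_nm, ctc_cutoff_Ang):
    """Hypothetical helper: fraction of frames with distance <= cutoff.

    time_traces_nm[i] is assumed to be a list of 1D distance arrays (nm),
    one per trajectory, for contact i."""
    freqs = []
    for per_traj in time_traces_nm:
        d_Ang = np.hstack(per_traj) * 10.0  # nm -> Angstrom, pooled over trajs
        freqs.append(np.mean(d_Ang <= ctc_cutoff_Ang))
    return np.array(freqs)


traces = [
    [np.array([0.30, 0.50]), np.array([0.35])],  # contact 0: formed in 2 of 3 frames
    [np.array([0.60, 0.70]), np.array([0.20])],  # contact 1: formed in 1 of 3 frames
]
freqs = frequency_per_contact_sketch(traces, ctc_cutoff_Ang=4.0)
```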
-
frequency_spreadsheet
(sheet1_dataframe, sheet2_dataframes, ctc_cutoff_Ang, fname_excel, sheet1_name='pairs by frequency', sheet2_name='residues by frequency')
Write an Excel file with the DataFrame that is returned by self.frequency_dataframe.
- Parameters
sheet1_dataframe (DataFrame) – Normally, these are pairwise frequencies
sheet2_dataframes (list) – Contains DataFrame objects with per-residue frequencies
ctc_cutoff_Ang (float) – The cutoff used
fname_excel (str) – The filename to save to
sheet1_name (str, default is 'pairs by frequency') –
sheet2_name (str, default is 'residues by frequency') –
-
frequency_str_ASCII_file
(idf, ascii_file=None)
Create a string with the frequencies from a DataFrame
- Parameters
idf (DataFrame) – A frequency table, typically generated by self.frequency_dataframe
ascii_file (str, default is None) – Instead of returning the formatted table as a string, provide a filename here and the frequencies will be written directly to it
- Returns
freq_str
- Return type
str or None
-
frequency_sum_per_residue_idx_dict
(ctc_cutoff_Ang, switch_off_Ang=None, sort_by_freq=True, return_array=False)
Dictionary of aggregated frequency_per_contact per residue index. Values over 1 are possible, e.g. if [0,1] and [0,2] are always formed (=1), then freqs_dict[0]=2
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
sort_by_freq (bool, default is True) – Sort the dictionary by descending order of frequency. If False, it will be sorted by residue index. sort_by_freq only has effect if return_array is False
return_array (bool, default is False) – If True, the return value is not a dict but an array of len(self.top.n_residues)
- Returns
freqs_dict – If dict, keys are the residue indices present in res_idxs_pairs. If array, idxs are the residue indices of self.top
- Return type
dictionary or array
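The per-residue aggregation, including the example where [0,1] and [0,2] are always formed and residue 0 ends up with 2, can be sketched by adding each contact's frequency to both of its residues. sum_per_residue is a hypothetical helper; ties keep their first-seen order because Python's sort is stable.

```python
def sum_per_residue(res_idxs_pairs, freqs, sort_by_freq=True):
    """Hypothetical helper: {residue_index: summed frequency}."""
    sums = {}
    for pair, f in zip(res_idxs_pairs, freqs):
        for r in pair:  # each contact counts for both of its residues
            sums[r] = sums.get(r, 0.0) + f
    if sort_by_freq:
        # Descending frequency
        return dict(sorted(sums.items(), key=lambda kv: kv[1], reverse=True))
    # Ascending residue index
    return dict(sorted(sums.items()))


print(sum_per_residue([[0, 1], [0, 2]], [1.0, 1.0]))
```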
-
frequency_sum_per_residue_names
(ctc_cutoff_Ang, switch_off_Ang=None, sort_by_freq=True, shorten_AAs=True, list_by_interface=False, return_as_dataframe=False)
Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible; see self.residx2resnamefragnamebest for more info on this
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
sort_by_freq (bool, default is True) – Sort by descending order of frequencies. If list_by_interface is True, then sorting will be descending within each member of the interface; see self.interface_residxs for more info. If False, residues are in ascending order of residue indices
shorten_AAs (bool, default is True) – Use E30 instead of GLU30
list_by_interface (bool, default is False) – Group the freq_dict by interface residues. Only has an effect if self.is_interface
return_as_dataframe (bool, default is False) – Return a DataFrame with the column names labels and freqs
- Returns
res – list of dictionaries (or dataframes). If list_by_interface is True, then the list has two items; the default (False) is a list of len=1
- Return type
list
-
frequency_table
(ctc_cutoff_Ang, fname, switch_off_Ang=None, write_interface=True, sort_by_freq=False, **freq_dataframe_kwargs)
Print and/or save frequencies as a formatted table
Internally, it calls frequency_spreadsheet and/or frequency_str_ASCII_file depending on the extension of fname
If you want a DataFrame, use frequency_dataframe
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
fname (str or None) – Full path to the desired filename. Spreadsheet extensions are currently only ‘.xlsx’; all other extensions save to formatted ascii. None returns the formatted ascii string.
switch_off_Ang (float, default is None) – TODO
write_interface (bool, default is True) – Only has effect if self.is_interface is True. A second sheet will be added to the table where residues are sorted by interface membership and per-residue interface participation.
sort_by_freq (bool, default is False) – Only has effect if self.is_interface is True and write_interface is True. Sort the second sheet by descending order of frequencies. If False, residues are in ascending order within each member of the interface, as returned by self.interface_residxs
freq_dataframe_kwargs (dict) – Optional parameters for self.frequency_dataframe
- Returns
table – If fname is None, then return the table as formatted string, using
- Return type
None or str
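The extension-based dispatch described for fname can be sketched with os.path.splitext. choose_output is a hypothetical helper that only reports which kind of output would be produced; it does not write anything, and the exact, case-sensitive match on “.xlsx” is an assumption.

```python
import os


def choose_output(fname):
    """Hypothetical helper: decide how a frequency table would be emitted."""
    if fname is None:
        return "ascii string"  # nothing written, formatted string returned
    extension = os.path.splitext(fname)[1]
    if extension == ".xlsx":
        return "spreadsheet"  # the only spreadsheet extension mentioned above
    return "ascii file"  # every other extension


print(choose_output("freqs.xlsx"))
```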
-
frequency_to_bfactor
(ctc_cutoff_Ang, pdbfile, geom, interface_sign=False)
Save the contact frequency aggregated by residue to a pdb file
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
pdbfile (str) – The path to the pdbfile to save the geom
geom (mdtraj.Trajectory) – Has to have the same topology as self.top
interface_sign (bool, default is False) – Give the bfactor values of the members of the interface different signs, s.t. they appear with different colors in a visualizer
- Returns
bfactors
- Return type
1D np.array of len(self.top.n_atoms)
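Turning per-residue values into the per-atom array returned here amounts to indexing the residue values with each atom's residue index. This NumPy sketch takes a hypothetical atom2res array as input and assumes that interface_sign negates the second interface group; which side gets the negative sign is an assumption. residue_to_atom_bfactors is a hypothetical name.

```python
import numpy as np


def residue_to_atom_bfactors(res_freqs, atom2res, interface_residxs=None, interface_sign=False):
    """Hypothetical helper: broadcast per-residue values onto atoms.

    atom2res[k] is assumed to be the residue index of atom k."""
    atom2res = np.asarray(atom2res)
    # Fancy indexing returns a new array, one value per atom
    bfactors = np.asarray(res_freqs, dtype=float)[atom2res]
    if interface_sign and interface_residxs is not None:
        # Assumption: flip the sign of the atoms of the second interface group
        second_group = np.isin(atom2res, interface_residxs[1])
        bfactors[second_group] *= -1
    return bfactors


bf = residue_to_atom_bfactors([0.5, 1.0, 0.0], [0, 0, 1, 2, 2],
                              interface_residxs=[[0], [1]], interface_sign=True)
```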
-
gen_ctc_labels
(**kwargs) → list
Generate labels with different parameters
Wraps around mdciao.contacts.ContactPair.gen_label
- AA_format : str, default is “short”
Alternative is “long” (“E30” vs “GLU30”)
- fragments : bool, default is False
Include fragment information. Will get the “best” information available, i.e. consensus>fragname>fragindex
- delete_anchor : bool, default is False
the anchor
- Returns
labels
- Return type
list
-
property
interface_fragments
Two residue lists provided at initialization
They are supersets of the residues contained in self.interface_residxs.
Empty lists mean no residues were found in the interface defined at initialization
- Returns
interface_fragments
- Return type
list
-
interface_frequency_matrix
(ctc_cutoff_Ang, switch_off_Ang=None)
Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.
Note
Pairs missing from res_idxs_pairs will be NaNs, to differentiate them from those pairs that were present but have zero contact frequency
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns
mat
- Return type
2D numpy.ndarray
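The rectangular, NaN-padded interface matrix can be sketched by placing each pair that straddles the two interface groups at its (row, column) position, regardless of the order inside the pair. Pairs within one group are skipped. The inputs and the name interface_matrix_sketch are hypothetical.

```python
import numpy as np


def interface_matrix_sketch(interface_residxs, res_idxs_pairs, freqs):
    """Hypothetical helper: (N, M) matrix, NaN where no pair was present."""
    rows, cols = interface_residxs
    mat = np.full((len(rows), len(cols)), np.nan)
    row_pos = {r: i for i, r in enumerate(rows)}
    col_pos = {c: j for j, c in enumerate(cols)}
    for (a, b), f in zip(res_idxs_pairs, freqs):
        if a in row_pos and b in col_pos:
            mat[row_pos[a], col_pos[b]] = f
        elif b in row_pos and a in col_pos:
            mat[row_pos[b], col_pos[a]] = f  # pair listed in reverse order
        # pairs inside a single group do not belong in the interface matrix
    return mat


m = interface_matrix_sketch([[0, 1], [5, 6, 7]], [[0, 5], [6, 1], [0, 1]], [0.3, 0.0, 0.9])
```

The 0.0 at position (1, 1) stays distinguishable from the NaNs of pairs that were never part of the ContactGroup.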
-
property
interface_labels_consensus
Consensus labels of whatever residues interface_residxs holds.
If there are no consensus labels, the corresponding label is None
-
property
interface_residue_names_w_best_fragments_short
Best possible residue@fragment string for the residues in interface_residxs
If none of consensus label, fragment name or fragment index is found (tried in that order of preference), nothing is appended after the residue name
-
property
interface_residxs
The residues of self.res_idxs_pairs grouped into two lists, depending on which of the self.interface_fragments they belong to
Empty lists mean no residues were found in the interface defined at initialization
- Returns
interface_residxs
- Return type
list
-
property
interface_reslabels_short
Residue labels of whatever residues interface_residxs holds
-
property
is_interface
Whether this ContactGroup can be interpreted as an interface.
Note
If none of the residxs_pairs were found in the interface_fragments (both provided at initialization), this property will evaluate to False even if some indices were parsed
-
property
is_neighborhood
Whether this ContactGroup is a neighborhood or not
When instantiating this ContactGroup, it is checked whether all the used ContactPairs have a shared anchor_residue_idx attribute, and whether self.neighbors_excluded is None. This means this ContactGroup is a neighborhood around the residue stored in the attribute self.shared_anchor_residue_index
- Other neighborhood-only attributes get populated, e.g.
self.anchor_res_and_fragment_str
self.anchor_res_and_fragment_str_short
self.partner_res_and_fragment_labels
self.partner_res_and_fragment_labels_short
self.partner_fragment_colors
self.anchor_fragment_color
Note that all these attributes will raise an Exception when called if self.is_neighborhood is False
- Returns
is_neighborhood
- Return type
bool
-
property
max_cutoff_Ang
Operations involving cutoffs higher than this will be forbidden and will raise ValueError.
-
property
means
The mean value over all distance time-traces
- Returns
mean – No unit transformation is done: whatever was given at instantiation (most likely nanometers) is returned here
- Return type
1D np.array of len(self.n_ctcs)
-
property
modes
¶ The modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces
Note
In order to quickly compute modes, residue-residue distances are multiplied by 1000 and rounded to integers, to be able to use
numpy.bincount
for speed. Then, the argmax(bincount) is returned.
- Returns
modes – No unit transformation is done: whatever was given at instantiation (most likely nanometers) is returned here
- Return type
1D np.array of length self.n_ctcs
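The bincount trick described in the note above can be sketched with plain numpy. This is a minimal standalone sketch, not mdciao's own code: the function name modes_via_bincount is hypothetical, the input is assumed to be a (n_frames, n_ctcs) array of non-negative distances, and the winning bin is divided back by 1000 (an assumption) so that the result stays in the input units.

```python
import numpy as np


def modes_via_bincount(distances):
    """Approximate per-contact modes of a (n_frames, n_ctcs) distance array.

    Hypothetical helper: distances are scaled by 1000 and rounded to
    integers so that np.bincount can histogram them quickly.
    """
    distances = np.asarray(distances, dtype=float)
    modes = []
    for column in distances.T:
        # Assumes non-negative distances, as np.bincount requires
        as_ints = np.round(column * 1000).astype(int)
        # argmax picks the most populated bin (the smallest one on ties)
        modes.append(np.argmax(np.bincount(as_ints)) / 1000)
    return np.array(modes)


# Toy data: 3 frames, 2 contacts, distances in nm
toy = np.array([[0.30, 0.80],
                [0.30, 0.81],
                [0.45, 0.81]])
print(modes_via_bincount(toy))
```

The integer scaling trades resolution (here 0.001 units) for speed, since np.bincount avoids any explicit binning or kernel-density estimation.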
-
property
n_ctcs
¶ The number of contact pairs (
mdciao.contacts.ContactPair
-objects) stored in this object
- Returns
n_ctcs
- Return type
int
-
n_ctcs_timetraces
(ctc_cutoff_Ang, switch_off_Ang=None)¶ Time-traces of the number of formed contacts, obtained by summing over all contacts for each frame
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns
nctc_trajs
- Return type
list of 1D np.ndarrays
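The per-frame summation can be sketched as follows. This is a minimal illustration, not mdciao's implementation: the function name and the input layout (one (n_frames_i, n_ctcs) array of distances in nm per trajectory) are hypothetical, switch_off_Ang is ignored, and a contact is assumed to be formed when its distance is "<=" the cutoff, as documented for relative_frequency_formed_atom_pairs_overall_trajs.

```python
import numpy as np


def n_ctcs_timetraces_sketch(per_traj_distances, ctc_cutoff_Ang):
    """Count formed contacts per frame, for each trajectory.

    per_traj_distances: hypothetical layout, a list with one
    (n_frames_i, n_ctcs) array of distances in nm per trajectory.
    """
    cutoff_nm = ctc_cutoff_Ang / 10  # 10 Angstrom = 1 nm
    # A contact counts as formed when its distance is <= the cutoff
    return [np.sum(np.asarray(d) <= cutoff_nm, axis=1) for d in per_traj_distances]


traj0 = np.array([[0.30, 0.50],
                  [0.20, 0.25]])
traj1 = np.array([[0.60, 0.90]])
print(n_ctcs_timetraces_sketch([traj0, traj1], ctc_cutoff_Ang=4.0))
```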
-
property
n_frames
¶ List of per-trajectory n_frames
- Returns
n_frames
- Return type
list
-
property
n_frames_total
¶ Total number of frames
- Returns
n_frames_total
- Return type
int
-
property
n_trajs
¶ The number of trajectories contained in this ContactGroup
- Returns
n_trajs
- Return type
int
-
property
name
¶ The name of this ContactGroup, given when creating it
- Returns
name
- Return type
str
-
property
neighbors_excluded
¶ The number of neighbors that were excluded when creating this ContactGroup
- Returns
neighbors_excluded
- Return type
int
-
property
partner_fragment_colors
¶ The colors associated with the fragments of the anchor partner residues
The fragment colors were given as pairs of values to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_fragment_colors
['tab:blue', 'tab:blue', 'tab:blue', 'tab:blue', 'tab:blue']
or
>>> CG = mdciao.examples.ContactGroupL394(fragment_colors=["red","blue","yellow","orange","black"])
>>> CG.partner_fragment_colors
['red', 'orange', 'red', 'orange', 'red']
Note
These colors are not automatically used by self.plot_neighborhood_freqs or self.plot_freqs_as_bars unless passed as
color=self.partner_fragment_colors
Will fail if self.is_neighborhood is False
- Returns
color
- Return type
list
-
property
partner_res_and_fragment_labels
¶ List of labels of the partner (not anchor) residues of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels
['ARG389@G.H5.21', 'LYS270@6.32', 'LEU388@G.H5.20', 'LEU230@5.69', 'ARG385@G.H5.17']
- Returns
labels
- Return type
list
-
property
partner_res_and_fragment_labels_short
¶ List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels_short
['R389@G.H5.21', 'K270@6.32', 'L388@G.H5.20', 'L230@5.69', 'R385@G.H5.17']
- Returns
labels
- Return type
list
-
plot_distance_distributions
(bins=10, xlim=None, jax=None, shorten_AAs=False, ctc_cutoff_Ang=None, legend_sort=True, label_fontsize_factor=1, max_handles_per_row=4, defrag=None)¶ Plot distance distributions for the distance trajectories of the contacts
The title will try to get the name from
self.name
- Parameters
bins (int, default is 10) – How many bins to use for the distribution
xlim (iterable of two floats, default is None) – Limits of the x-axis. Outliers can stretch the scale; this forces it to a given range
jax (
Axes
, default is None) – One will be created if None is passed
shorten_AAs (bool, default is False) – Use amino-acid one-letter codes
ctc_cutoff_Ang (float, default is None) – Include in the legend of the plot how much of the distribution is below this cutoff. A vertical line will be drawn at this x-value
legend_sort (boolean, default is True) – Sort the legend in descending order of frequency. Only has an effect when
ctc_cutoff_Ang
is not None
label_fontsize_factor (int, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor
max_handles_per_row (int, default is 4) – legend control
defrag (char, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label
- Returns
jax
- Return type
-
plot_freqs_as_bars
(ctc_cutoff_Ang, title_label=None, switch_off_Ang=None, xlim=None, ax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, truncate_at=None, atom_types=False, sort_by_freq=False, sum_freqs=True, total_freq=None, defrag=None)¶ Plot contact frequencies as a bar plot
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail
switch_off_Ang (float, default is None) – TODO
xlim (float, default is None) – The right limit of the x-axis. +.5 will be added to this number to accommodate some padding around the bars. If None, it’s chosen automatically
ax (
Axes
, default is None) – Draw into this axis. If None is passed, then one will be created
shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30” to “E30”
color (color-like (str or RGB triple) or list thereof, default is "tab:blue") – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed in the order of the self.res_idxs_pairs. It will get re-sorted according to
sort
, s.t. residues always have the same color no matter the order
label_fontsize_factor (float, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor
truncate_at (float, default is None) – Only plot frequencies above this value. Default is to plot all
atom_types (bool, default is False) – Use stripe-patterns to inform about the types of interactions (sidechain, backbone, etc)
sort_by_freq (boolean, default is False) – The frequencies are by default plotted in the order in which the
ContactPair
-objects are stored in the ContactGroup
-object’s _contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup.
If True, you can re-sort them with this cutoff for display purposes only (the original order is untouched)
sum_freqs (bool, default is True) – Inform, in the legend and in the title, about the sum of frequencies/bar-heights being plotted
total_freq (float, default is None) – Add a line to the title informing about the fraction of the total_freq that’s being plotted in the figure. Only has an effect if
sum_freqs
is True
defrag (str, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label
- Returns
ax
- Return type
-
plot_freqs_as_flareplot
(ctc_cutoff_Ang, fragments=None, fragment_names=None, fragment_colors=None, consensus_maps=None, SS=None, scheme='auto', **kwargs_freqs2flare)¶ Produce contact flareplots by wrapping around
mdciao.flare.freqs2flare
Note
The logic to assign fragments and colors can lead to unexpected behavior in cases where too much guess-work has to be done. If a particular combination of fragments and colors is desired but not achievable through this method, it is highly recommended that the user call
mdciao.flare.freqs2flare
directly and experiment there with parameter combinations. It is also a good idea to check out the notebook called “Controlling Flareplots”.
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
fragments (list of iterables, default is None) – The way the topology is fragmented. Default is to put all residues in one fragment. This optarg can modify the behaviour of scheme=’all’, since residues absent from
fragments
will not be plotted, see below.
fragment_names (list of strings, default is None) – The fragment names, at least len(fragments)
fragment_colors (None or list of color-likes) – Will be used to give the fragments their colors, needs to be color-like and of len(fragments)
consensus_maps (list, default is None) –
- The items of this list are either:
- indexables containing the consensus labels (strings) themselves. They need to be “gettable” by residue index, i.e. dict, list or array. Typically, one generates these maps by using the top2labels method of the LabelerConsensus object
- LabelerConsensus
-objects. When these objects are passed, their top2labels and top2fragments methods are called on-the-fly, generating not only the consensus labels but also the consensus fragments (i.e. subdomains) to further fragment the topology into sub-domains, like TM6 or G.H5. If
fragments
are passed, they will be made compatible with the consensus fragments.
If you want the consensus labels but not the sub-fragmentation, simply use the first option.
SS (secondary structure information, default is None) –
Whether and how to include information about secondary structure. Can be many things:
- triple of ints (CP_idx, traj_idx, frame_idx)
Go to contact pair CP_idx, trajectory traj_idx and grab this frame to compute the SS. Will read xtcs when necessary or otherwise directly grab it from a
mdtraj.Trajectory
in case it was passed. Ignores potential stride values. See ContactPair.time_traces
for more info
- True
same as [0,0,0]
- None or False
Do nothing
- mdtraj.Trajectory
Use this geometry to compute the SS
- string
Path to a filename, of which only the first frame will be read. The SS will be computed from there. The file is first read without topology information, which works for e.g. .pdb, .gro or .h5 files; if this fails, self.top will be passed, as needed for e.g. .xtc or .dcd files
- array_like
Use the SS from here, s.t. ss_inf[idx] gives the SS-info for the residue with that idx
scheme (str, default is 'auto') –
- How to decide which residues to plot
- ’all’
plot as many residues as possible. E.g., if a
self.topology
is present, plot all its residues. This can be modified with fragments, see above. Using ‘all’ without any fragments means that the topology won’t be separated into interface fragments, even if it is an interface. Given that some of the topology (which the user insists on plotting) might not have been assigned to either side of the interface, it’s unclear how to proceed here.
- ’interface’:
use only the fragments in
self.interface_fragments
. Will only work if self.is_interface is True
- ’auto’
Uses
self.is_interface
to decide. If True, scheme
is set to ‘interface’. If False, e.g. a residue neighborhood or a site, then scheme
is set to ‘all’
- ’interface_sparse’:
like ‘interface’, but using the input
fragments
to break self.interface_fragments (which are only two, by definition) further down into other fragments. Of these, show only the ones where at least one residue participates in the interface. If fragments
is None, scheme=’interface’ and scheme=’interface_sparse’ are the same thing.
- ’residues’:
plot only the residues present in self.res_idxs_pairs
- ’residues_sparse’ :
plot only the residues that have a non-zero frequency
- ’consensus_sparse’:
like ‘interface_sparse’, but leaving out sub-domains not participating in the interface with any contacts. For this, the
consensus_maps
need to be actual LabelerConsensus
-objects
kwargs_freqs2flare (optargs) –
Keyword arguments for
mdciao.flare.freqs2flare
. Note that many of these kwargs will be overwritten, mostly to accommodate the scheme+fragment+color combinations, but not only (please see the note above). These are the kwargs that this method manipulates internally and might be overwritten: top, ss_array, fragments, fragment_names, colors
Note that some of freqs2flare kwargs (in particular sparse_residues) might alter (with or w/o conflict) the scheme option.
- Returns
ifig (matplotlib.figure.Figure)
iax (
matplotlib.axes.Axes
)
-
plot_frequency_sums_as_bars
(ctc_cutoff_Ang, title_str, switch_off_Ang=None, xmax=None, jax=None, shorten_AAs=False, label_fontsize_factor=1, truncate_at=0, bar_width_in_inches=0.75, list_by_interface=False, sort_by_freq=True, interface_vline=False)¶ Bar plot with per-residue sums of frequencies (called Sigma in mdciao)
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
title_str (str) – The title of the plot
switch_off_Ang (float, default is None) – TODO
xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5
jax (Axes, default is None) – If None, one will be created, else draw here
shorten_AAs (boolean, default is False) – Unused ATM
label_fontsize_factor (float, default is 1) – Some control over fontsizes when plotting a high number of bars
truncate_at (float, default is 0) – Do not show sums of freqs lower than this value
bar_width_in_inches (float, default is .75) – If no
jax
is passed, this controls that the drawn figure always has a size proportional to the number of frequencies being shown. Allows for combining multiple subplots with different number of bars in one figure with all bars equally wide regardless of the subplot
list_by_interface (boolean, default is False) – Separate residues by interface
sort_by_freq (boolean, default is True) – Sort sums of freqs in descending order
interface_vline (bool, default is False) – Plot a vertical line visually separating both interfaces
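The per-residue sums of frequencies (Sigma) plotted by this method can be sketched as follows: every contact adds its frequency to both of its residues. This is a minimal illustration, not mdciao's implementation; the helper name and the frequencies are made up, and switch_off_Ang, truncate_at and list_by_interface are not modeled.

```python
from collections import defaultdict


def per_residue_freq_sums(res_idxs_pairs, freqs):
    """Sum, for every residue, the frequencies of all contacts it appears in."""
    sums = defaultdict(float)
    for (ii, jj), freq in zip(res_idxs_pairs, freqs):
        sums[ii] += freq
        sums[jj] += freq
    return dict(sums)


# Made-up frequencies, for illustration only
pairs = [[348, 353], [353, 972], [347, 353]]
sigma = per_residue_freq_sums(pairs, [0.5, 0.25, 1.0])
# Descending order, analogous to sort_by_freq=True
print(sorted(sigma.items(), key=lambda kv: kv[1], reverse=True))
```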
- Returns
ax
- Return type
-
plot_interface_frequency_matrix
(ctc_cutoff_Ang, switch_off_Ang=None, transpose=False, label_type='best', **plot_mat_kwargs)¶ Plot the
interface_frequency_matrix
The first group of
interface_residxs
are the row indices, shown in the y-axis top-to-bottom (since imshow is used to plot). The second group of interface_residxs
are the column indices, shown in the x-axis left-to-right.
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
transpose (bool, default is False) – Transpose the contact matrix in the plot
label_type (str, default is "best") – “best” tries resname@consensus(>fragname>fragidx). Alternatives are “residue” or “consensus”, but “consensus” alone might lead to empty labels since it is not guaranteed that all residues of the interface have consensus labels
plot_mat_kwargs (see
plot_mat
) – pixelsize, transpose, grid, cmap, colorbar
- Returns
iax (
Axes
)
fig (
matplotlib.pyplot.Figure
)
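The row/column convention described above can be illustrated by filling a matrix from a list of residue pairs and their frequencies. This is a hypothetical standalone sketch (the helper name, indices and frequencies are made up), not the implementation of interface_frequency_matrix; in particular, how mdciao treats pairs that do not span both interface groups is not documented here, so this sketch simply ignores them.

```python
import numpy as np


def interface_matrix_sketch(interface_residxs, res_idxs_pairs, freqs):
    """Rows: first interface group, columns: second interface group."""
    rows, cols = interface_residxs
    row_pos = {res: ii for ii, res in enumerate(rows)}
    col_pos = {res: jj for jj, res in enumerate(cols)}
    mat = np.zeros((len(rows), len(cols)))
    for (aa, bb), freq in zip(res_idxs_pairs, freqs):
        # A pair can be stored in either order, so check both orientations
        if aa in row_pos and bb in col_pos:
            mat[row_pos[aa], col_pos[bb]] = freq
        elif bb in row_pos and aa in col_pos:
            mat[row_pos[bb], col_pos[aa]] = freq
        # Pairs not spanning both groups are ignored in this sketch
    return mat


mat = interface_matrix_sketch([[1, 2], [10, 11, 12]],
                              [[1, 10], [12, 2], [1, 12]],
                              [0.9, 0.4, 0.1])
print(mat)    # rows: residues 1, 2; columns: residues 10, 11, 12
print(mat.T)  # transposed view
```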
-
plot_neighborhood_freqs
(ctc_cutoff_Ang, switch_off_Ang=None, color='tab:blue', xmax=None, ax=None, shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, plot_atomtypes=False, sort_by_freq=False)¶ Wrapper around
ContactGroup.plot_freqs_as_bars
for plotting neighborhoods.
#TODO perhaps get rid of the wrapper altogether. ATM it would break the API
- Parameters
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
color (color-like (str or RGB triple) or list thereof, default is "tab:blue") – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed in the order of the self.res_idxs_pairs. It will get re-sorted according to
sort
, s.t. residues always have the same color no matter the order
xmax (int, default is None) – Default behaviour is to go to n_ctcs, use this parameter to homogenize different calls to this function over different contact groups, s.t. each subplot has equal xlimits
ax (
Axes
, default is None) – Axes to plot into, if None, one will be created
shorten_AAs (bool, default is False) – Shorten residue names from “GLU30”->”E30”
label_fontsize_factor (float, default is 1) – Fontsize for the tilted labels and the legend, as fraction [0,1] of the default value in rcParams[“font.size”]
sum_freqs (bool, default is True) – Add the sum of frequencies of the represented (and only those) frequencies
plot_atomtypes (bool, default is False) – Add stripes to frequency bars to include the atom-types (backbone, sidechain, etc)
sort_by_freq (boolean, default is False) – The frequencies are by default plotted in the order in which the
ContactPair
-objects are stored in the ContactGroup
-object’s _contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup.
If True, you can re-sort them with this cutoff for display purposes only (the original order is untouched)
- Returns
ax
- Return type
-
plot_timedep_ctcs
(panelheight, plot_N_ctcs=True, pop_N_ctcs=False, skip_timedep=False, **plot_timetrace_kwargs)¶ For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts
- Parameters
panelheight (int) – The height of the per-contact panels
plot_N_ctcs (bool, default is True) – Add an extra panel at the bottom of the figure containing the number of formed contacts for each frame for each trajectory. A valid cutoff has to be passed along in
plot_timetrace_kwargs
, otherwise this has no effect
pop_N_ctcs (bool, default is False) – Put the panel with the number of contacts in a separate figure. A valid cutoff has to be passed along in
plot_timetrace_kwargs
, otherwise this has no effect
skip_timedep (bool, default is False) – Skip plotting the individual timetraces and plot only the time trace of overall formed contacts. This sets pop_N_ctcs to True internally
plot_timetrace_kwargs (dict) – Optional parameters for
_plot_timedep_Nctcs
- Returns
list_of_figs (list) – The wanted figure(s)
Note
The keywords
plot_N_ctcs
, pop_N_ctcs
, and skip_timedep
allow this method to both include or totally exclude the total number of contacts and/or the time-traces in the figure. This might change in the future; it was coded this way to avoid breaking the command_line tools API. Also note that some combinations will produce an empty return!
-
plot_violins
(sort_by=False, ctc_cutoff_Ang=None, truncate_at_mean=None, zero_freq=0.01, switch_off_Ang=None, ax=None, title_label=None, xmax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, defrag=None, stride=1)¶ Plot residue-residue distances as violin plots
violinplot
The default behaviour is to plot all residue pairs in the order in which the
ContactPair
-objects are stored in theContactGroup
. You can check this order in self.res_idxs_pairs. This order typically depends on the original ctc_cutoff_Ang used to instantiate this ContactGroup
, which might not carry the same meaning here.
For more than 50 contacts or so, violin plots take some time to compute, because a Gaussian-kernel-density estimation is done for each residue pair.
Also, plots with many residue pairs simply might be difficult to read.
Hence, to control the number of shown contacts, you can use these parameters, sorted somewhat hierarchically:
sort_by
ctc_cutoff_Ang
truncate_at_mean
zero_freq
Please check their documentation below.
Finally, if the plots still take too long to compute/show for the desired number of violins, try reducing the amount of data by using stride > 1
- Parameters
sort_by (iterable of ints, boolean, int, default is False) –
- Can be different things:
- iterable of ints
Strongest selection. Show only these residue pairs, in this order. Indices are intended as self.res_idxs_pairs indices. All other parameters are ignored.
- boolean False
Don’t sort, i.e. use the original order
- boolean True
Sort. There are two options for sorting, depending on the value of ctc_cutoff_Ang:
- sort by distance means, ascending: ctc_cutoff_Ang is None
- sort by contact-frequencies, descending: ctc_cutoff_Ang is a float
For contacts with zero frequency, fall back on ascending distance means. This means that frequent contacts will be displayed first (sorted by frequency, high to low), followed by infrequent ones (sorted by mean distance, short to long)
- int n
Like True but up to n contacts at most. Other parameters like truncate_at_mean can reduce this number automatically
ctc_cutoff_Ang (float, default is None) – If provided, contact-frequencies will be computed and shown in the contact-labels. Additionally, if
sort_by
is True or int, then the violins are sorted by contact-frequency in the plot
truncate_at_mean (float, default is None) – Don’t show violins with mean values higher than this (in Angstrom). This has no effect on contacts in which the mean is above the cutoff BUT the frequency is > zero_freq. This case is very common, since a contact can be formed at small distances but broken at very large ones, s.t. the mean (or median) values are meaningless.
zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero and not shown. For this parameter to have effect, you need a
ctc_cutoff_Ang
switch_off_Ang (float, default is None) – TODO
ax (None or
Axes
, default is None) – The axis to plot into. If None, one will be created
title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail
xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5
color (iterable (list or dict), or str, default is "tab:blue") –
If list, the colors will be reordered so that the same residue pair always gets the same color, regardless of the order in which they appear. This way you can track a violin across different sorting orders
If str, it has to be a matplotlib color or a case-sensitive matplotlib colorname https://matplotlib.org/stable/tutorials/colors/colormaps.html
If dict, keys are integers and values are colors. This is the best option when
sort_by
is an iterable of ints, e.g. [ii,jj], because you can pass only those colors here as {ii:”red”,jj:”blue”}
If None, the ‘tab10’ colormap (tableau) is chosen
shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30”->”E30”
label_fontsize_factor (float, default is 1) – Labels will use the fontsize rcParams[“font.size”]*label_fontsize_factor
sum_freqs (bool, default is True) – Whether to sum per-contact frequencies and place them in the label as Σ values
defrag (char, default is None) – Whether to leave out the fragment affiliation, e.g. “GLU30@3.50” w/ defrag=”@” appears as “GLU30” only
stride (int,default is 1) – Stride the data down by this much, in case the computation of the violins takes too long
- Returns
ax (
Axes
)
order (np.ndarray) – Indices of the plotted residue pairs, in the order in which they were plotted. Is the result from the combination of the above selection parameters
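The ordering described for sort_by=True can be sketched as follows. This is a simplified illustration, not mdciao's code: the helper name is hypothetical, "zero frequency" is taken literally as freq == 0, and the extra filters (truncate_at_mean, zero_freq, an int sort_by) are left out; for an int n one would presumably keep only the first n entries of the returned order.

```python
import numpy as np


def violin_order_sketch(means, freqs=None):
    """Order contacts as described for sort_by=True.

    Without freqs: ascending mean distance.
    With freqs: non-zero frequencies first (high to low), then the
    zero-frequency contacts by ascending mean distance.
    """
    means = np.asarray(means, dtype=float)
    if freqs is None:
        return np.argsort(means, kind="stable")
    freqs = np.asarray(freqs, dtype=float)
    formed = np.flatnonzero(freqs > 0)
    never = np.flatnonzero(freqs == 0)
    first = formed[np.argsort(-freqs[formed], kind="stable")]
    rest = never[np.argsort(means[never], kind="stable")]
    return np.concatenate([first, rest])


means = [0.5, 0.3, 0.9, 0.4]
print(violin_order_sketch(means))                        # distance-sorted
print(violin_order_sketch(means, [0.2, 0.0, 0.8, 0.0]))  # frequency-sorted
```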
-
relabel_consensus
(new_labels=None)¶ Relabel any residue missing its consensus label to shortAA
Alternative (or additional) labels can be given as a dictionary.
- Parameters
new_labels (dict) – keyed with shortAA-codes and valued with the new desired labels
Warning
For expert use only. The changes in consensus labels propagate down to the low-level attribute
Residues.consensus_labels
of the Residues
objects underlying each of the ContactPairs in this ContactGroup
-
relative_frequency_formed_atom_pairs_overall_trajs
(ctc_cutoff_Ang, switch_off_Ang=None, **kwargs) → list¶ Relative frequencies of interaction-types (by atom-type) for all contact-pairs in the ContactGroup
“Relative” means that they will sum up to 1 regardless of the contact’s frequency
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.relative_frequency_formed_atom_pairs_overall_trajs(3.5)
[{'SC-BB': 0.33, 'SC-SC': 0.52, 'BB-BB': 0.12}, {'BB-SC': 0.73, 'SC-SC': 0.27}, {'BB-BB': 0.84, 'SC-SC': 0.16}, {'SC-SC': 1.0}, {'SC-SC': 0.5, 'BB-SC': 0.5}]
- Parameters
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
switch_off_Ang (float, default is None) – TODO
- Other Parameters
keep_resname (bool, default is False) – Keep the atom’s residue name in its descriptor. Only makes sense if aggregate_by_atomtype is False
aggregate_by_atomtype (bool, default is True) – Aggregate the frequencies of the contact by the atom types involved. Atom types are backbone, sidechain or other (BB, SC, X)
min_freq (float, default is .05) – Do not report relative frequencies below this cutoff, e.g. “BB-BB”:.9, “BB-SC”:0.03, “SC-SC”:0.03, “SC-BB”:0.03 gets reported as “BB-BB”:.9
- Returns
refreq_dicts – Lists of dictionaries with the relative freqs, keyed by atom-type (atoms) involved in the contact The order is the same as in
self.ctc_labels
- Return type
list
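The meaning of "relative" and of min_freq can be illustrated for a single contact. This sketch assumes (it is not stated above) that each frame in which the contact is formed contributes exactly one atom-type label such as "BB-SC"; the helper name and the labels are hypothetical. Labels below min_freq are dropped without renormalizing the rest, which is consistent with the example values above not always summing to exactly 1.

```python
from collections import Counter


def relative_atomtype_freqs(formed_frame_labels, min_freq=0.05):
    """Relative frequency of each atom-type label among formed frames.

    formed_frame_labels: hypothetical input, one label like "BB-SC" for
    each frame in which the contact is formed.
    """
    counts = Counter(formed_frame_labels)
    n_formed = len(formed_frame_labels)
    relative = {label: count / n_formed for label, count in counts.items()}
    # Drop labels below min_freq (the remaining values are not renormalized)
    return {label: freq for label, freq in relative.items() if freq >= min_freq}


labels = ["BB-BB"] * 94 + ["BB-SC"] * 3 + ["SC-SC"] * 3
print(relative_atomtype_freqs(labels))
```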
-
repframes
(scheme='mode', ctc_cutoff_Ang=None, return_traj=False, show_violins=False, n_frames=1)¶ Find representative frames for this
ContactGroup
A “representative frame” means, in this context, a frame that minimizes the average distance to the modes (or means) of the residue-residue distances contained in this object.
Please note that “representative” can have other meanings in other contexts. Here, it’s just a way to pick frames/geometries that will most likely resemble most of what is also seen in the distributions, barplots and flareplots.
Please also note that minimizing averages has its own limitations and might not always yield the best result. However, it is the easiest and quickest to implement. Feel free to use any of Sklearn’s great regression tools under constraints to get a better “representative”
- Parameters
scheme (str, default is "mode") –
Two options:
- “mode”: minimize average distance to the most likely distance, i.e. to the mode, i.e. to the distance values at which the distributions (see
plot_distance_distributions
) peak. You can check the modes in
modes
- “mean”: minimize average distance to the mean values of the distances. You can check the means in
means
ctc_cutoff_Ang (float, default is None) – THIS IS EXPERIMENTAL If given, the contact frequencies will be used as weights when computing the average. In cases with many contacts, many of them broken, this might help
return_traj (bool, default is False) – If True, try to return also the
Trajectory
objects. Will fail if that is not possible because the original files aren’t accessible (or there weren’t any)
show_violins (bool, default is False) – Superimpose the distance values as dots on top of a violin plot, created by using
plot_violins
n_frames (int, default is 1) – The number of representative frames to return
- Returns
frames (list) – A list of
n_frames
tuples, each tuple containing the traj_idx and the frame_idx that minimize RMSDd
RMSDd (np.ndarray) – A 1D array containing the root-mean-square-deviation (in Angstrom) over distances (not positions) of the returned
frames
to the computed reference
. This mean is weighted by the contact frequencies in case a ctc_cutoff_Ang
was given. Should always be in ascending order
values (np.ndarray) – A 2D array of shape (n_frames, n_ctcs) containing the distance values of the
frames
in Angstrom
trajs (list) – A list of
Trajectory
objects. Only if return_traj=True
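The core ranking idea (RMSD over distances to a reference such as the modes or means, optionally weighted per contact) can be sketched as below. This is a simplified, hypothetical helper, not mdciao's repframes: it works on stacked frames, so it returns indices into the stacked array instead of (traj_idx, frame_idx) tuples, and it keeps the input units instead of converting to Angstrom.

```python
import numpy as np


def repframes_sketch(stacked_distances, reference, n_frames=1, weights=None):
    """Rank frames by their RMSD over distances to a reference.

    stacked_distances: (n_frames_total, n_ctcs) array
    reference: (n_ctcs,) array, e.g. the modes or the means
    weights: optional per-contact weights, e.g. contact frequencies
    """
    distances = np.asarray(stacked_distances, dtype=float)
    sq_dev = (distances - np.asarray(reference, dtype=float)) ** 2
    # (Weighted) mean over contacts, then the square root, for every frame
    rmsdd = np.sqrt(np.average(sq_dev, axis=1, weights=weights))
    best = np.argsort(rmsdd, kind="stable")[:n_frames]
    return best, rmsdd[best]


toy = np.array([[0.30, 0.80],
                [0.35, 0.60],
                [0.50, 0.90]])
frames, rmsdd = repframes_sketch(toy, reference=[0.30, 0.80], n_frames=2)
print(frames, rmsdd)
```

Because the frames are taken from an ascending argsort, the returned RMSDd values come out in ascending order, matching the behaviour documented above.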
-
property
res_idxs_pairs
¶ Pairs of residue indices of the contacts in this object
- Returns
res_idxs_pairs
- Return type
np.ndarray
-
property
residue_names_long
¶ Pairs of long residue names of the ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_long
[['ARG389', 'LEU394'], ['LEU394', 'LYS270'], ['LEU388', 'LEU394'], ['LEU394', 'LEU230'], ['ARG385', 'LEU394']]
- Returns
residue_names_long
- Return type
list
-
property
residue_names_short
¶ Pairs of short residue names of the ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_short
[['R389', 'L394'], ['L394', 'K270'], ['L388', 'L394'], ['L394', 'L230'], ['R385', 'L394']]
- Returns
residue_names_short
- Return type
list
-
property
residx2consensuslabel
¶ Dictionary mapping residue indices to consensus labels:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2consensuslabel
{348: 'G.H5.21', 353: 'G.H5.26', 972: '6.32', 347: 'G.H5.20', 957: '5.69', 344: 'G.H5.17'}
- Returns
residx2consensuslabel
- Return type
dict
-
residx2ctcidx
(idx)¶ Indices of the contacts and the position (0 or 1) in which the residue with residue index
idx
appears
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.res_idxs_pairs
array([[348, 353], [353, 972], [347, 353], [353, 957], [344, 353]])
>>> CG.residx2ctcidx(347)
array([[2, 0]])
- Parameters
idx (int) – A residue index
- Returns
ctc_idxs – The first index is the contact index, the second the pair index (0 or 1)
- Return type
2D np.ndarray of shape (N,2)
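The (contact index, position) pairs shown in the example above are exactly what numpy.argwhere returns on a boolean mask of res_idxs_pairs. The following is a standalone sketch reproducing that example with plain numpy (the helper name is hypothetical, not part of mdciao):

```python
import numpy as np

res_idxs_pairs = np.array([[348, 353], [353, 972], [347, 353], [353, 957], [344, 353]])


def residx2ctcidx_sketch(pairs, idx):
    # Each row is [contact index, position within the pair (0 or 1)]
    return np.argwhere(np.asarray(pairs) == idx)


print(residx2ctcidx_sketch(res_idxs_pairs, 347))
print(residx2ctcidx_sketch(res_idxs_pairs, 353))  # the anchor appears in every contact
```

A residue absent from all pairs yields an empty array of shape (0, 2).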
-
property
residx2fragnamebest
¶ Dictionary mapping residue indices to best possible fragment names
“best” means consensus label > fragment name > fragment index
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2fragnamebest
{348: 'G.H5.21', 353: 'G.H5.26', 972: '6.32', 347: 'G.H5.20', 957: '5.69', 344: 'G.H5.17'}
- Returns
residx2fragnamebest
- Return type
dict
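The "consensus label > fragment name > fragment index" priority can be written as a simple fallback chain. This is a hypothetical sketch: the helper name and the dictionaries residx2consensus and residx2fragidx are illustrative stand-ins, not mdciao attributes.

```python
def best_fragname(residx, residx2consensus, residx2fragidx, fragment_names=None):
    """Pick consensus label > fragment name > fragment index (as a string)."""
    label = residx2consensus.get(residx)
    if label:  # skip missing or empty consensus labels
        return label
    fragidx = residx2fragidx[residx]
    if fragment_names is not None:
        return fragment_names[fragidx]
    return str(fragidx)


residx2consensus = {348: "G.H5.21", 10: None}
residx2fragidx = {348: 0, 10: 1}
print(best_fragname(348, residx2consensus, residx2fragidx, ["frag0", "frag1"]))
print(best_fragname(10, residx2consensus, residx2fragidx, ["frag0", "frag1"]))
print(best_fragname(10, residx2consensus, residx2fragidx))
```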
-
residx2resnamefragnamebest
(fragsep='@', shorten_AAs=True) → dict¶ Dictionary mapping residue indices to best possible residue+fragment label
“best” means consensus label > fragment name > fragment index
- Parameters
fragsep (str, default is "@") – The str or char to separate residue labels from fragment labels, “A30@frag1”
shorten_AAs (bool, default is True) – Whether to use short residue names
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamefragnamebest()
{344: 'R385@G.H5.17', 347: 'L388@G.H5.20', 348: 'R389@G.H5.21', 353: 'L394@G.H5.26', 957: 'L230@5.69', 972: 'K270@6.32'}
- Returns
residx2resnamefragnamebest
- Return type
dict
-
property
residx2resnamelong
¶ Dictionary mapping residue indices to long residue names:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamelong
{348: 'ARG389', 353: 'LEU394', 972: 'LYS270', 347: 'LEU388', 957: 'LEU230', 344: 'ARG385'}
- Returns
residx2resnamelong
- Return type
dict
-
property
residx2resnameshort
¶ Dictionary mapping residue indices to short residue names:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnameshort
{348: 'R389', 353: 'L394', 972: 'K270', 347: 'L388', 957: 'L230', 344: 'R385'}
- Returns
residx2resnameshort
- Return type
dict
-
retop
(top, mapping, deepcopy=False)¶ Return a copy of this object with a different topology.
Uses the
mapping
to generate new residue-indices where necessary, using the rest of the attributes (time-traces, labels, colors, fragments…) as they were.
Wraps thinly around
mdciao.contacts.ContactPair.retop
Note
When re-topping interfaces, those residues of the ‘old’ interface_fragments which are not covered by the
mapping
will be missing in the ‘new’ interface_fragments. However, the new interface is guaranteed to have at least all the ‘new’ interface_residxs mapped. So, as long as the ‘old’ interface_residxs are covered by the mapping, this isn’t a problem (TODO except, perhaps, when plotting flareplots using the spare=”interface” option after re-topping)
- Parameters
top (
Topology
) – The new topology
mapping (indexable (array, dict, list)) – A mapping of old residue indices to new residue indices. Usually, it comes from aligning the old and the new topology using
mdciao.utils.sequence.maptops
.
deepcopy (bool, default is False) – Use
copy.deepcopy
on the attributes when creating the new ContactPair
.
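The index translation performed with the mapping can be illustrated on res_idxs_pairs alone. This hypothetical sketch only remaps indices (the helper name and the mapping values are made up); the real retop also rebuilds the ContactPair objects with the new topology.

```python
import numpy as np


def remap_res_idxs_pairs(res_idxs_pairs, mapping):
    """Translate old residue indices into new ones.

    mapping: indexable (dict, list or array), old index -> new index.
    A dict raises KeyError for residues the mapping does not cover.
    """
    return np.array([[mapping[ii], mapping[jj]] for ii, jj in res_idxs_pairs])


old_pairs = [[348, 353], [347, 353]]
mapping = {347: 10, 348: 11, 353: 16}  # hypothetical alignment result
print(remap_res_idxs_pairs(old_pairs, mapping))
```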
- Returns
CG
- Return type
-
save
(filename)¶ Save this
ContactGroup
as a pickle
- Parameters
filename (str) – filename
-
save_trajs
(prepend_filename, ext, output_dir='.', t_unit='ps', verbose=False, ctc_cutoff_Ang=None, self_descriptor='mdciaoCG')¶ Save time-traces to disk.
Filenames will be created based on the property
self.trajlabels
, but using only the basenames and prepending them with the string prepend_filename
If there is an anchor residue (i.e. this
ContactGroup
is a neighborhood), the anchor will be included in the filename, otherwise the string “contact_group” will be used. You can control the output directory using output_dir
If ctc_cutoff_Ang is given, the time-traces will be binarized (see
self.binarize_trajs
). Else, the distances themselves are stored.
- Parameters
prepend_filename (str) – Each filename will be prepended with this string
ext (str) – Extension, can be “xlsx” or anything
numpy.savetxt
can handle
output_dir (str, default is ".") – The output directory
t_unit (str, default is "ps") – Other units are “ns”, “mus”, and “ms”. The transformation happens internally
verbose (boolean, default is False) – Prints filenames
ctc_cutoff_Ang (float, default is None) – Use this cutoff and save bintrajs instead
self_descriptor (str, default is "mdciaoCG") – Saved filenames will be tagged with this descriptor
- Returns
- Return type
None
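To make the cutoff and time-unit options concrete, here is a minimal sketch of writing one contact’s time-trace with `numpy.savetxt`. Several things are assumptions, not mdciao’s code: the helper `save_trace`, the conversion table `PS_TO_UNIT`, the file name, distances being stored in nm while the cutoff is in Å, and a contact counting as formed when the distance is less than or equal to the cutoff.

```python
import os
import tempfile

import numpy as np

# Assumed factors to convert from picoseconds to the other units listed for t_unit
PS_TO_UNIT = {"ps": 1.0, "ns": 1e-3, "mus": 1e-6, "ms": 1e-9}


def save_trace(time_ps, dist_nm, filename, t_unit="ps", ctc_cutoff_Ang=None):
    """Hypothetical helper: write one contact's time-trace as a two-column text file."""
    time = np.asarray(time_ps, dtype=float) * PS_TO_UNIT[t_unit]
    values = np.asarray(dist_nm, dtype=float)
    if ctc_cutoff_Ang is not None:
        # Assumption: distances are in nm, the cutoff is in Angstrom, and
        # a contact counts as formed when the distance is <= cutoff
        values = (values <= ctc_cutoff_Ang / 10).astype(int)
    np.savetxt(filename, np.column_stack([time, values]), header=f"time ({t_unit}) value")
    return filename


fname = save_trace([0, 1000, 2000], [0.35, 0.50, 0.40],
                   os.path.join(tempfile.mkdtemp(), "example_trace.dat"),
                   t_unit="ns", ctc_cutoff_Ang=4.5)
data = np.loadtxt(fname)  # first column: time in ns, second column: 1/0 contact state
print(data)
```

Without `ctc_cutoff_Ang` the second column keeps the raw distances, mirroring the “Otherwise, the distances themselves are stored” behavior described above.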
The index of the anchor residue, i.e. the residue at the center of this neighborhood
Only populated if self.is_neighborhood is True, else returns None
- Returns
idx
- Return type
int
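In a neighborhood every residue pair shares the anchor residue, so one way to recover it is to intersect the residue sets of all pairs. The sketch below is an illustration under that assumption. The helper `shared_residue` is hypothetical and not necessarily how mdciao computes this property. Note that a group with a single pair is ambiguous and yields None here.

```python
def shared_residue(res_idxs_pairs):
    """Hypothetical helper: the residue index present in every pair, else None."""
    shared = set(res_idxs_pairs[0])
    for pair in res_idxs_pairs[1:]:
        # Keep only residues that also appear in this pair
        shared &= set(pair)
    # A neighborhood shares exactly one residue: the anchor
    return shared.pop() if len(shared) == 1 else None


print(shared_residue([(394, 10), (394, 20), (30, 394)]))  # 394
print(shared_residue([(1, 2), (3, 4)]))  # None
```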
-
property
stacked_time_traces
¶ All ContactPair time_traces stacked into a 2D np.array
- Returns
data – The array has shape (self.n_frames_total, self.n_ctcs)
- Return type
np.ndarray
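The shape (self.n_frames_total, self.n_ctcs) follows from concatenating each pair’s per-trajectory traces along time and then placing the pairs side by side as columns. The sketch below shows this with NumPy on made-up data (two contact pairs, two trajectories of 3 and 2 frames). It illustrates the layout only and is not mdciao’s code.

```python
import numpy as np

# Hypothetical data: 2 contact pairs, each with distance traces for 2 trajectories
# (3 frames in the first trajectory, 2 in the second)
ctc_trajs_per_pair = [
    [np.array([0.30, 0.35, 0.40]), np.array([0.50, 0.55])],
    [np.array([0.80, 0.75, 0.70]), np.array([0.60, 0.65])],
]

# Concatenate each pair's trajectories along time (rows of the intermediate array),
# then transpose so that each contact pair becomes one column
stacked = np.vstack([np.hstack(trajs) for trajs in ctc_trajs_per_pair]).T
print(stacked.shape)  # (5, 2): (n_frames_total, n_ctcs)
```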
-
property
time_arrays
¶ The time-arrays of each trajectory contained in this ContactGroup
- Returns
time_arrays – The units of these arrays will be whatever was given to the ContactPairs used to instantiate this ContactGroup
- Return type
list
-
property
time_max
¶ Maximum time-value of the ContactGroup
- Returns
time_max – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that is the time unit used in xtc files
- Return type
float
-
property
time_min
¶ Minimum time-value of the ContactGroup
- Returns
time_min – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that is the time unit used in xtc files
- Return type
float
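For time_arrays, time_max and time_min, a minimal sketch assuming the extremes are taken over all trajectories’ time arrays, with no unit conversion. The arrays are made-up values in ps, and the code is not mdciao’s implementation.

```python
import numpy as np

# Hypothetical time arrays (in ps) of two trajectories of different length
time_arrays = [np.array([0.0, 10.0, 20.0]),
               np.array([5.0, 15.0, 25.0, 35.0])]

# Extremes over all trajectories; units are those of the input arrays
time_max = max(ta.max() for ta in time_arrays)
time_min = min(ta.min() for ta in time_arrays)
print(time_min, time_max)  # 0.0 35.0
```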
-
to_new_ContactGroup
(CSVexpression, allow_multiple_matches=False, merge=True)¶ Creates a new
ContactGroup
from this one, using a CSV expression to filter for residues
- Parameters
CSVexpression (str) – CSV expression like “GLU30,K*” to select the residue-pairs of
self
for the new ContactGroup
. See mdciao.utils.residue_and_atom.find_AA
for the syntax of the expression.
allow_multiple_matches (bool, default is False) – If False, fail if any substring of the
CSVexpression
returns more than one residue. Protects from over-grabbing residues.
merge (bool, default is True) – Merge the selected residue-pairs into one single
ContactGroup
. If False, every substring of CSVexpression
returns its own ContactGroup
- Returns
newCG – If dict, it’s keyed with substrings of
CSVexpression
and valued with ContactGroups
- Return type
ContactGroup
or dict
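The interplay of CSVexpression, allow_multiple_matches and merge can be illustrated with a simplified filter over residue-index pairs. This is a sketch under loud assumptions. The helper `select_pairs` and the residue table are hypothetical. Matching uses `fnmatch.fnmatchcase` against a long (“GLU30”) and a short (“E30”) label, which only approximates the syntax of mdciao.utils.residue_and_atom.find_AA. The sketch returns lists of pairs rather than ContactGroup objects.

```python
from fnmatch import fnmatchcase

# Hypothetical residue labels: index -> (long label, short label)
residues = {0: ("GLU30", "E30"), 1: ("LYS31", "K31"), 2: ("LYS35", "K35"),
            3: ("ALA40", "A40"), 4: ("ASP50", "D50")}
res_idxs_pairs = [(0, 3), (1, 3), (2, 3), (3, 4)]


def select_pairs(csv_expression, residues, res_idxs_pairs,
                 allow_multiple_matches=False, merge=True):
    """Hypothetical helper: keep the pairs involving residues matched by csv_expression."""
    selection = {}
    for substring in csv_expression.split(","):
        substring = substring.strip()
        # A substring matches a residue if it matches either of its labels
        matches = [idx for idx, labels in residues.items()
                   if any(fnmatchcase(label, substring) for label in labels)]
        if len(matches) > 1 and not allow_multiple_matches:
            raise ValueError(f"{substring!r} matched {len(matches)} residues")
        selection[substring] = [pair for pair in res_idxs_pairs if set(pair) & set(matches)]
    if not merge:
        return selection  # one selection per substring
    merged = []
    for pairs in selection.values():
        merged.extend(pair for pair in pairs if pair not in merged)
    return merged


print(select_pairs("GLU30,K*", residues, res_idxs_pairs, allow_multiple_matches=True))
# [(0, 3), (1, 3), (2, 3)]
```

With the default allow_multiple_matches=False, the same expression raises, because “K*” grabs two residues.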
-
property
top
¶ The topology used to instantiate the ContactPairs in this ContactGroup
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.top
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>
- Returns
top
- Return type
mdtraj.Topology or None
-
property
topology
¶ The topology used to instantiate the ContactPairs in this ContactGroup
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.top
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>
- Returns
topology
- Return type
mdtraj.Topology or None
-
property
trajlabels
¶ List of trajectory labels
If labels were not passed, then labels like ‘traj 0’, ‘traj 1’ and so on are assigned. If
Trajectory
objects were passed, then the “mdtraj” descriptor will be used. If filenames were passed, then the labels are the filenames (basename, no path) without the extension.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.trajlabels
['gs-b2ar.noH.stride.5']
- Returns
trajlabels
- Return type
list
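Deriving labels from filenames can be sketched with `os.path`. Only the last extension is stripped, which is consistent with the label ‘gs-b2ar.noH.stride.5’ in the example above. The helper `make_trajlabels` and the path are hypothetical. The exact format of the “mdtraj” descriptor is not given here, so the sketch falls back to ‘traj N’ for anything that is not a filename.

```python
import os


def make_trajlabels(trajs, trajlabels=None):
    """Hypothetical helper: derive one label per trajectory."""
    if trajlabels is not None:
        # Explicitly passed labels take precedence
        return list(trajlabels)
    labels = []
    for ii, traj in enumerate(trajs):
        if isinstance(traj, str):
            # Filenames: keep the basename and strip only the last extension
            labels.append(os.path.splitext(os.path.basename(traj))[0])
        else:
            # Non-filename inputs (e.g. in-memory trajectories): generic label
            labels.append(f"traj {ii}")
    return labels


print(make_trajlabels(["/data/gs-b2ar.noH.stride.5.xtc"]))  # ['gs-b2ar.noH.stride.5']
```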