mdciao.contacts.ContactGroup
- class mdciao.contacts.ContactGroup(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)
Container for
ContactPair
-objects
This class is the second level of abstraction after
ContactPair
and provides methods to
perform operations on all the contact-pairs simultaneously and
plot/show/save the result of these operations
In many cases, the methods of
ContactGroup
thinly wrap and iterate around equally named methods of the ContactPair
-objects.
Note
Higher-level methods in the API, like those exposed by
mdciao.cli
will return ContactPair
or ContactGroup
objects already instantiated and ready to use. It is recommended to use those instead of individually calling ContactPair
or ContactGroup
.
- __init__(list_of_contact_objects, interface_fragments=None, top=None, name=None, neighbors_excluded=None, use_AA_when_conslab_is_missing=True, max_cutoff_Ang=None)
- Parameters:
list_of_contact_objects (list) – List of
ContactPair
objects. Will be accessible at ContactGroup.contact_pairs
.
interface_fragments (list of two iterables of indexes, default is None) – An interface is defined by two groups of residue indices.
This input doesn’t need to have all or any of the residue indices in res_idxs_pairs.
This input will be used to group the object’s own residue idxs present in res_idxs_pairs into the two groups of the interface. These two groups will be accessible through the attribute self.interface_residxs
It will remain accessible through the object’s equally named attribute self.interface_fragments
top (
Topology
, default is None) – The molecular topology associated with this object. Normally, the default behaviour is enough: it checks whether all ContactPairs of list_of_contact_objects share the same self.top and uses that one. If they have different topologies, the method fails, since you can’t instantiate a ContactGroup with ContactPairs from different topologies. In case the ContactPairs don’t have any topology at all (self.top is None for all ContactPairs) you can pass one here. Or, if they have one and you pass one here, it will be checked that the top provided here coincides with the ContactPairs’ shared topology
name (string, default is None) – Optional name you want to give this object. ATM it is only used for the title of
ContactGroup.plot_distance_distributions
when the object is not a neighborhood
neighbors_excluded (int, default is None) – The neighbors excluded when creating the underlying ContactPairs passed in list_of_contact_objects
max_cutoff_Ang (float, default is None) – Operations involving cutoffs higher than this will be forbidden and will raise ValueError. Prevents the user from asking for contact-frequencies that aren’t present in the ContactGroup
Methods
__init__(list_of_contact_objects[, ...]) – Parameters: list_of_contact_objects (list) – List of ContactPair objects.
archive([filename]) – Save this ContactGroup's list of ContactPairs as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup
binarize_trajs(ctc_cutoff_Ang[, ...]) – Binarize trajs
copy() – Copy this object by re-instantiating another ContactGroup object with the same attributes.
distribution_dicts([bins]) – Wraps around the method ContactGroup.distributions_of_distances and returns one distribution dict keyed by contact label
frequency_as_contact_matrix(ctc_cutoff_Ang) – Returns a symmetrical, square matrix of size top.n_residues containing the frequencies of the pairs in res_idxs_pairs, and those pairs only; the rest will be NaNs
frequency_as_contact_matrix_CG(ctc_cutoff_Ang) – Coarse-grained contact-matrix
frequency_dataframe(ctc_cutoff_Ang[, ...]) – Output a formatted dataframe with fields "label", "freq" and "sum", optionally dis-aggregated by type of contact by atom types
frequency_delta(otherCG, ctc_cutoff_Ang[, ...]) – Compute per-contact frequency differences between self and some other ContactGroup
frequency_dict_by_consensus_labels(...[, ...]) – Return frequencies as a dictionary of dictionaries keyed by consensus labels
frequency_dicts(ctc_cutoff_Ang[, sort_by_freq]) – Wraps around the method ContactPair.frequency_dict of each of the underlying ContactPairs and returns one frequency dict keyed by contact label
frequency_per_contact(ctc_cutoff_Ang[, ...]) – Frequency per contact over all trajs
frequency_per_traj(ctc_cutoff_Ang[, ...]) – Frequency per contact, per-trajectory, over all trajectories
frequency_spreadsheet(sheet1_dataframe, ...) – Write an Excel file with the DataFrame that is returned by self.frequency_dataframe.
frequency_str_ASCII_file(idf[, ascii_file]) – Create a string with the frequencies from a DataFrame
frequency_sum_per_residue_idx_dict(...[, ...]) – Dictionary of aggregated frequency_per_contact per residue index. Values larger than 1 are possible, e.g. if [0,1] and [0,2] are always formed (=1), then freqs_dict[0]=2
frequency_sum_per_residue_names(ctc_cutoff_Ang) – Aggregate the frequencies of frequency_per_contact by residue name, using the most informative names possible, see residx2resnamefragnamebest for more info on this
frequency_table(ctc_cutoff_Ang, fname[, ...]) – Print and/or save frequencies as a formatted table
frequency_to_bfactor(ctc_cutoff_Ang, ...[, ...]) – Save the contact frequency aggregated by residue to a pdb file
gen_ctc_labels(**kwargs) – Generate labels with different parameters
interface_frequency_matrix(ctc_cutoff_Ang[, ...]) – Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.
n_ctcs_timetraces(ctc_cutoff_Ang[, ...]) – Time-traces of the number of contacts, by summing over all contacts for each frame
plot_distance_distributions([bins, xlim, ...]) – Plot distance distributions for the distance trajectories of the contacts
plot_freqs_as_bars(ctc_cutoff_Ang[, ...]) – Plot contact frequencies as a bar plot
plot_freqs_as_flareplot(ctc_cutoff_Ang[, ...]) – Produce contact flareplots by wrapping around mdciao.flare.freqs2flare
plot_frequency_sums_as_bars(ctc_cutoff_Ang, ...) – Bar plot with per-residue sums of frequencies (called Sigma in mdciao)
plot_interface_frequency_matrix(ctc_cutoff_Ang) – Plot the interface_frequency_matrix
plot_neighborhood_freqs(ctc_cutoff_Ang[, ...]) – Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods
plot_timedep_ctcs([panelheight, ...]) – For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the time-trace of the overall number of contacts
plot_timedep_ctcs_matrix(ctc_cutoff_Ang[, ...]) – Per-trajectory time-traces of the formed contacts, shown as binary traces, i.e. formed or not formed.
plot_violins([sort_by, ctc_cutoff_Ang, ...]) – Plot residue-residue distances as violin plots (violinplot)
relabel_consensus([new_labels]) – Relabel any residue missing its consensus label to shortAA
Relative frequencies interaction-type (by atom-type) for all contact-pairs in the ContactGroup
repframes([scheme, ctc_cutoff_Ang, ...]) – Find representative frames for this ContactGroup
residx2ctcidx(idx) – Indices of the contacts and the position (0 or 1) in which the residue with residue idx appears
residx2resnamefragnamebest([fragsep, ...]) – Dictionary mapping residue indices to best possible residue+fragment label
retop(top, mapping[, deepcopy]) – Return a copy of this object with a different topology.
save(filename) – Save this ContactGroup as a pickle
save_trajs(prepend_filename, ext[, ...]) – Save time-traces to disk.
select_by_frames(frames) – Return a copy of this ContactGroup, but with a sub-selection of trajectories and frames.
select_by_residues([CSVexpression, ...]) – Return a copy of this ContactGroup, but with a sub-selection of ContactGroup.contact_pairs based on residues.
Break this ContactGroup (potentially containing many trajectories) into individual, per-trajectory ContactGroups
Attributes
The color associated with the fragment of the anchor residue
Label of the anchor residue of this neighborhood, including fragment
Label of the anchor residue (short) of this neighborhood, including fragment
List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.
Dictionary mapping consensus labels to residue names:
List of
ContactPair
objects composing this ContactGroup
List of simple labels (no fragment info) for the residue pairs in ContactPairs
List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs
List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs
Best possible fragment names for the residue pairs in ContactPairs
Two residue lists provided at initialization
Consensus labels of whatever residues
interface_residxs
holds.
Best possible residue@fragment string for the residues in
interface_residxs
The residues of self.res_idxs_pairs grouped into two lists, depending on which of the self.interface_fragments they belong to
Residue labels of whatever residues
interface_residxs
holds
Whether this ContactGroup can be interpreted as an interface.
Whether this ContactGroup is a neighborhood or not
Operations involving cutoffs higher than this will be forbidden and will raise ValueError.
Per-contact maximum values over all distance time-traces
Per-contact mean values over all distance time-traces
Per-contact minimum values over all distance time-traces
Per-contact modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces
The number of contact pairs (
mdciao.contacts.ContactPair
-objects) stored in this object
List of per-trajectory n_frames
Total number of frames
The number of trajectories contained in this ContactGroup
The name of this ContactGroup, given when creating it
The number of neighbors that were excluded when creating this ContactGroup
The colors associated with the fragments of the anchor partner residues
List of labels of the partner (not anchor) residues of this neighborhood, including fragment
List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment
Pairs of residue indices of the contacts in this object
Pairs of long residue names of the ContactPairs
Pairs of short residue names of the ContactPairs
Dictionary mapping residue indices to consensus labels:
Dictionary mapping residue indices to best possible fragment names
Dictionary mapping residue indices to short residue names:
Dictionary mapping residue indices to short residue names:
The index of the anchor residue, i.e. the residue at the center of this neighborhood.
All ContactPair time_traces stacked into a 2D np.array
The time-arrays of each trajectory contained in this ContactGroup
Maximum time-value of the ContactGroup
Minimum time-value of the ContactGroup
The topology used to instantiate the ContactPairs in this ContactGroup
The topology used to instantiate the ContactPairs in this ContactGroup
List of trajectory labels shared by all
ContactGroup.contact_pairs
.
- property anchor_fragment_color: str
The color associated with the fragment of the anchor residue
Two fragment colors were given to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_fragment_color
'tab:blue'
Returns:
color : str
- property anchor_res_and_fragment_str: str
Label of the anchor residue of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str
'LEU394@G.H5.26'
- Returns:
label
- Return type:
str
- property anchor_res_and_fragment_str_short: str
Label of the anchor residue (short) of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.anchor_res_and_fragment_str_short
'L394@G.H5.26'
- Returns:
label
- Return type:
str
- archive(filename=None, **kwargs)
Save this
ContactGroup
’s list of ContactPairs
as a list of dictionaries that can be used to re-instantiate an equivalent ContactGroup
The method
ContactGroup.save
creates a pickle that has a lot of redundant information
- Parameters:
filename (str, default is None) – Has to end in “npy”. Default is to return the dictionary
- Other Parameters:
kwargs (dict) – Optional parameters for
mdciao.contacts.ContactPair._serialized_as_dict
- Returns:
archive
- Return type:
dict
- binarize_trajs(ctc_cutoff_Ang, switch_off_Ang=None, order='contact')
Binarize trajs
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – Implements a linear switch-off from
ctc_cutoff_Ang
to ctc_cutoff_Ang + switch_off_Ang
. E.g. if the cutoff is 3 Ang and the switch is 1 Ang, then
3.0 -> 1.0
3.5 -> 0.5
4.0 -> 0.0
order (str, default is “contact”) – Sort first by contact, then by traj index. Alternative is “traj”, i.e. sort first by traj index, then by contact
TODO (change the name “binarize”)
- Returns:
bintrajs – If order is “traj”, each item of the list is a 2D np.ndarray of shape (Nt, n_ctcs), where Nt is the number of frames of that trajectory
- Return type:
list of boolean arrays
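The linear switch-off described for switch_off_Ang can be sketched in a few lines of plain Python. This is only an illustration of the mapping given in the example above (3.0 -> 1.0, 3.5 -> 0.5, 4.0 -> 0.0), not mdciao's implementation. The function name switched_contact_value is hypothetical, and treating distances at or below the cutoff as fully formed is an assumption based on the “<=” comparison documented for frequency_dicts.

```python
def switched_contact_value(distance_Ang, ctc_cutoff_Ang, switch_off_Ang=None):
    """Hypothetical helper: 1.0 if formed, 0.0 if broken, linear ramp in between."""
    if distance_Ang <= ctc_cutoff_Ang:
        return 1.0
    if not switch_off_Ang:
        # No switch requested (None or 0): plain hard cutoff
        return 0.0
    # Decays linearly from 1 at the cutoff to 0 at cutoff + switch
    value = 1.0 - (distance_Ang - ctc_cutoff_Ang) / switch_off_Ang
    return max(0.0, value)


distances = (2.5, 3.0, 3.5, 4.0, 4.5)
values = [switched_contact_value(d, 3.0, 1.0) for d in distances]
print(values)  # [1.0, 1.0, 0.5, 0.0, 0.0]
```

With switch_off_Ang=None the same helper reduces to a hard cutoff, which is the case where the result is truly binary.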
- property consensus_labels: list
List of pairs of labels derived from GPCR, CGN or other type of consensus nomenclature.
They were parsed at initialization
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensus_labels
[['G.H5.21', 'G.H5.26'], ['G.H5.26', '6.32'], ['G.H5.20', 'G.H5.26'], ['G.H5.26', '5.69'], ['G.H5.17', 'G.H5.26']]
- Returns:
consensus_labels
- Return type:
list
- property consensuslabel2resname: dict
Dictionary mapping consensus labels to residue names:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.consensuslabel2resname
{'G.H5.21': 'R389', 'G.H5.26': 'L394', '6.32': 'K270', 'G.H5.20': 'L388', '5.69': 'L230', 'G.H5.17': 'R385'}
- Returns:
consensuslabel2resname
- Return type:
dict
- property contact_pairs
List of
ContactPair
objects composing this ContactGroup
Gives direct access for (expert) users to manipulate, plot, or save individual
ContactPair
objects
The order of these
ContactPair
objects is the order of the list_of_contact_objects passed to this ContactGroup
at initialization.
- Returns:
contact_pairs – List of
ContactPair
objects
- Return type:
list
- copy()
Copy this object by re-instantiating another
ContactGroup
object with the same attributes.
In theory, self == self.copy() should hold, but self is self.copy() should not
- Returns:
CG
- Return type:
ContactGroup
- property ctc_labels: list
List of simple labels (no fragment info) for the residue pairs in ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels
['ARG389-LEU394', 'LEU394-LYS270', 'LEU388-LEU394', 'LEU394-LEU230', 'ARG385-LEU394']
Returns:
ctc_labels : list
- property ctc_labels_short: list
List of simple labels (no fragment info, short AAs) for the residue pairs in ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_short
['R389-L394', 'L394-K270', 'L388-L394', 'L394-L230', 'R385-L394']
Returns:
ctc_labels_short : list
- property ctc_labels_w_fragments_short_AA: list
List of labels (with fragment info, short AAs) for the residue pairs in ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.ctc_labels_w_fragments_short_AA
['R389@G.H5.21-L394@G.H5.26', 'L394@G.H5.26-K270@6.32', 'L388@G.H5.20-L394@G.H5.26', 'L394@G.H5.26-L230@5.69', 'R385@G.H5.17-L394@G.H5.26']
Returns:
ctc_labels_w_fragments_short_AA : list
- distribution_dicts(bins=10, **kwargs)
Wraps around the method
ContactGroup.distributions_of_distances
and returns one distribution dict keyed by contact label
- Parameters:
bins (int or sequence of scalars or str, optional, default is 10) – If bins is an int, it defines the number of equal-width bins in the given range (10, by default). If bins is a sequence, it defines a monotonically increasing array of bin edges, including the rightmost edge, allowing for non-uniform bin widths.
kwargs (dict) – Optional keyword arguments for
ContactPair.label_flex
, which are listed below
- Other Parameters:
AA_format (str, default is “short”) –
- Amino-acid format for the label, can be
“short”: A35@4.50
“long”: ALA35@4.50
“just_consensus”: 4.50 if consensus labels are present, else fail
“try_consensus”: 4.50 if consensus labels are present, else fallback to “short”
pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats
defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are
fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True
fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True
- Returns:
fdict
- Return type:
dictionary
- property fragment_names_best: list
Best possible fragment names for the residue pairs in ContactPairs
The fragment name will try to pick the consensus nomenclature. If no consensus label for the residue exists, the actual fragment names are used as fallback (which themselves fallback to the fragment index)
Only if no consensus label, no fragment name and no fragment indices are there, will this yield “None” as a string.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.fragment_names_best
[['G.H5.21', 'G.H5.26'], ['G.H5.26', '6.32'], ['G.H5.20', 'G.H5.26'], ['G.H5.26', '5.69'], ['G.H5.17', 'G.H5.26']]
Returns:
fragment_names_best : list
- frequency_as_contact_matrix(ctc_cutoff_Ang, switch_off_Ang=None)
Returns a symmetrical, square matrix of size
top
.n_residues containing the frequencies of the pairs in res_idxs_pairs
, and those pairs only; the rest will be NaNs
If
top
is None the method will fail.
Note
This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns:
mat
- Return type:
numpy.ndarray
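To make the shape and the NaN convention of this matrix concrete, here is a minimal sketch in plain Python. It is not mdciao's code: the helper name contact_matrix is hypothetical, and nested lists stand in for the numpy.ndarray that the method returns.

```python
import math


def contact_matrix(n_residues, res_idxs_pairs, freqs):
    """Hypothetical helper: symmetric n_residues x n_residues matrix, NaN where no pair exists."""
    nan = float("nan")
    mat = [[nan] * n_residues for _ in range(n_residues)]
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        # Fill both triangles so that the matrix is symmetric
        mat[i][j] = freq
        mat[j][i] = freq
    return mat


# Two contact pairs in a toy topology of four residues
mat = contact_matrix(4, [(0, 1), (1, 3)], [0.75, 0.2])
print(mat[0][1], mat[1][0])   # 0.75 0.75
print(math.isnan(mat[2][3]))  # True
```

Entries for pairs that were never part of the ContactGroup stay NaN rather than 0, which is why the result is not a full contact matrix.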
- frequency_as_contact_matrix_CG(ctc_cutoff_Ang, switch_off_Ang=None, fragments=None, fragment_names=None, consensus_labelers=None, verbose=False, sparse=False, interface=False, zero_freq=0.01, dec_round=3, return_fragments=False)
Coarse-grained contact-matrix
Frequencies of
self.frequency_per_contact
get coarse-grained into fragments. Fragment definitions come from fragments and/or from the consensus_labelers
. These definitions need to contain all residues in self.res_idxs_pairs
User-defined and consensus-derived fragment definitions get spliced together using
splice_orphan_fragments
. This might lead to sub-sets of the input fragments getting re-labeled as “subfrags” and residues not defined anywhere being labelled “orphans”. This leads to cumbersome fragment names (and can change in the future), but at least it’s “traceable” for the moment
If you want to have the fragment definitions, use
return_fragments
= True
Anytime some argument leads to a row/column being deleted from the output, the matrix is returned as an annotated
DataFrame
, to be able to provide rows/columns with names and keep track of their meaning
If
interface
is True and this ContactGroup
is indeed an interface, the matrix will be asymmetric.
If self.top is None the method will fail.
Note
This is NOT the full contact matrix unless all necessary residue pairs were used to construct this ContactGroup
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
fragments (dict) – The fragment definitions
fragment_names (iterable of strings, default is None) – The names of the fragments
consensus_labelers (list, default is None) – It has to contain
LabelerConsensus
-objects, where the fragments are obtained from.
verbose (bool, default is False) – Be verbose
sparse (bool, default is False) – Delete rows and columns where all elements are < zero_freq. Since the row/column indices lose their meaning this way, a DataFrame with named rows/columns is returned instead of an array. If no
fragment_names
are passed, some will be created.
interface (bool, default is False) – If True, an asymmetric matrix is reported, with rows and columns representing fragments on each side of the interface, respectively. Since this is done using
self.interface_residxs
, and not all input fragments are necessarily contained therein, interface=True introduces a sparsity, which makes the return type be a DataFrame (see above)
zero_freq (float, default is 0.01) – Only has effect when
sparse
is True. The cutoff for a frequency to be considered zero
dec_round (int, default is 3) – The number of decimals to round to when reporting results. It’s assumed the CG matrix doesn’t need much precision beyond this
return_fragments (bool, default is False) – Whether to return the fragments that the input produced.
- Returns:
mat (numpy.ndarray or
DataFrame
) – The coarse-grained contact matrix
fragments (dict) – The fragment definitions
- frequency_dataframe(ctc_cutoff_Ang, switch_off_Ang=None, atom_types=False, sort_by_freq=False, **ctc_fd_kwargs)
Output a formatted dataframe with fields “label”, “freq” and “sum”, optionally dis-aggregated by type of contact by atom types
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact
sort_by_freq (bool, default is False) – Sort by descending frequency value, default is to keep the order of self.contact_pairs
ctc_fd_kwargs (named optional arguments) – Optional parameters for
mdciao.ContactPair.frequency_dict
, which are listed below.
- Other Parameters:
AA_format (str, default is “short”) –
- Amino-acid format for the label, can be
“short”: A35@4.50
“long”: ALA35@4.50
“just_consensus”: 4.50 if consensus labels are present, else fail
“try_consensus”: 4.50 if consensus labels are present, else fallback to “short”
pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats
defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are
fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True
fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True
- Returns:
df
- Return type:
DataFrame
- frequency_delta(otherCG, ctc_cutoff_Ang, residuemap=None)
Compute per-contact frequency differences between self and some other
ContactGroup
The difference is defined as
\(\Delta_{AB} = freq_B - freq_A\),
i.e. the delta that occurs upon “reacting” from self to otherCG
No sanity checks are performed, residue indices are assumed to have the same meaning in both self and otherCG, unless residuemap is provided.
- Parameters:
otherCG (
ContactGroup
) – The ContactGroup to compute the difference with
ctc_cutoff_Ang (float) – The cutoff to use to compute the frequencies
residuemap (dict) – Maps residue indices of otherCG to residue indices of self, in case self and otherCG have different topologies.
>>> residuemap[0]=20
Means the residue with the index 0 in otherCG is the residue with the index 20 in this ContactGroup (self).
Residues of otherCG absent from residuemap are un-mappable to self and thus their associated frequencies are ignored, so beware of incomplete maps.
- Returns:
delta_freq (1D np.ndarray) – The value resulting from doing otherCG.frequency_per_contact(ctc_cutoff_Ang)-self.frequency_per_contact(ctc_cutoff_Ang)
res_idxs_pairs (2D np.ndarray of len(delta_freq)) – The res_idxs_pairs for the
delta_freq
values
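The role of residuemap can be illustrated with a small, self-contained sketch. It is not mdciao's implementation: the function frequency_delta_sketch and its dictionary inputs (residue-index pair -> frequency) are hypothetical stand-ins for the two ContactGroups. Treating a pair that is present in only one of the two groups as having frequency 0 in the other is an assumption of this sketch, since the text above does not say how such pairs are handled.

```python
def frequency_delta_sketch(freqs_self, freqs_other, residuemap=None):
    """Hypothetical helper: per-pair freq_other - freq_self, in the residue indices of self."""
    mapped_other = {}
    for (i, j), freq in freqs_other.items():
        if residuemap is not None:
            if i not in residuemap or j not in residuemap:
                # Un-mappable residues: their frequencies are ignored
                continue
            i, j = residuemap[i], residuemap[j]
        mapped_other[tuple(sorted((i, j)))] = freq
    sorted_self = {tuple(sorted(pair)): freq for pair, freq in freqs_self.items()}
    pairs = sorted(set(sorted_self) | set(mapped_other))
    # Assumption: a pair missing from one side counts as frequency 0 there
    delta = [mapped_other.get(p, 0.0) - sorted_self.get(p, 0.0) for p in pairs]
    return delta, pairs


freqs_self = {(20, 30): 0.9, (20, 40): 0.2}
freqs_other = {(0, 10): 0.5, (0, 99): 1.0}  # residue 99 has no entry in the map
delta_freq, res_idxs_pairs = frequency_delta_sketch(
    freqs_self, freqs_other, residuemap={0: 20, 10: 30}
)
print(res_idxs_pairs)  # [(20, 30), (20, 40)]
```

The pair (0, 99) of the other group silently drops out because residue 99 cannot be mapped, which is the situation the warning about incomplete maps refers to.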
- frequency_dict_by_consensus_labels(ctc_cutoff_Ang, switch_off_Ang=None, return_as_triplets=False, sort_by_interface=False, include_trilower=False)
Return frequencies as a dictionary of dictionaries keyed by consensus labels
Note
Will fail if not all residues have consensus labels
TODO this is very similar to
frequency_sum_per_residue_names
, look at the use case closely and try to unify both methods
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
return_as_triplets (bool, default is False) – Return the dictionary as a list of triplets, s.t. freq_dict[3.50][4.50]=.25 is returned as [[3.50,4.50,.25]]. Makes it easier to iterate through in other methods
sort_by_interface (bool, default is False) – Not implemented ATM, will raise NotImplementedError
include_trilower (bool, default is False) – Include the transposed indexes in the returned dictionary, s.t. the contact pair [3.50][4.50]=.25 also generates [4.50][3.50]=.25
- Returns:
freqs
- Return type:
dictionary of dictionaries or list of triplets (if return_as_triplets is True)
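The two output options, return_as_triplets and include_trilower, amount to simple reshaping of the nested dictionary. The sketch below shows both on a toy input. The helper names to_triplets and add_trilower are hypothetical and are not part of mdciao, and the consensus labels are written as strings here.

```python
def to_triplets(freq_dict):
    """Hypothetical helper: flatten {label1: {label2: freq}} into [[label1, label2, freq], ...]."""
    return [
        [label1, label2, freq]
        for label1, inner in freq_dict.items()
        for label2, freq in inner.items()
    ]


def add_trilower(freq_dict):
    """Hypothetical helper: also store each frequency under the transposed keys."""
    out = {label1: dict(inner) for label1, inner in freq_dict.items()}
    for label1, inner in freq_dict.items():
        for label2, freq in inner.items():
            out.setdefault(label2, {})[label1] = freq
    return out


freqs = {"3.50": {"4.50": 0.25}}
print(to_triplets(freqs))   # [['3.50', '4.50', 0.25]]
print(add_trilower(freqs))  # {'3.50': {'4.50': 0.25}, '4.50': {'3.50': 0.25}}
```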
- frequency_dicts(ctc_cutoff_Ang, sort_by_freq=False, **kwargs)
Wraps around the method
ContactPair.frequency_dict
of each of the underlying ContactPair
s and returns one frequency dict keyed by contact label
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
sort_by_freq (bool, default is False) – Sort by descending frequency. Default is to return in the same order as
ContactGroup._contacts
kwargs (dict) – Optional keyword arguments for
ContactPair.frequency_dict
- Returns:
fdict
- Return type:
dictionary
- frequency_per_contact(ctc_cutoff_Ang, switch_off_Ang=None)
Frequency per contact over all trajs
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns:
freqs
- Return type:
1D np.ndarray of len(n_ctcs)
- frequency_per_traj(ctc_cutoff_Ang, switch_off_Ang=None) → ndarray
Frequency per contact, per-trajectory, over all trajectories
Wraps around
mdciao.contacts.ContactPair.frequency_per_traj
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns:
freqs – Shape (n,m) is (self.n_trajs, self.n_ctcs)
- Return type:
np.ndarray
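The difference between the two frequency methods can be sketched with plain lists. This is an illustration rather than mdciao's code. It assumes that a frequency is the fraction of frames in which the distance is <= the cutoff, and that the overall frequency pools the frames of all trajectories; the function names and the nested-list layout (trajectory -> frame -> contact) are choices made for this example.

```python
def frequency_per_traj_sketch(trajs, ctc_cutoff_Ang):
    """Hypothetical helper: list of shape (n_trajs, n_ctcs) with per-trajectory frequencies."""
    freqs = []
    for traj in trajs:  # traj is a list of frames, each frame a list of n_ctcs distances
        n_ctcs = len(traj[0])
        freqs.append([
            sum(frame[c] <= ctc_cutoff_Ang for frame in traj) / len(traj)
            for c in range(n_ctcs)
        ])
    return freqs


def frequency_per_contact_sketch(trajs, ctc_cutoff_Ang):
    """Hypothetical helper: list of len n_ctcs, pooling the frames of all trajectories."""
    n_ctcs = len(trajs[0][0])
    n_frames_total = sum(len(traj) for traj in trajs)
    return [
        sum(frame[c] <= ctc_cutoff_Ang for traj in trajs for frame in traj) / n_frames_total
        for c in range(n_ctcs)
    ]


# Two trajectories (4 and 2 frames), two contacts, distances in Angstrom
trajs = [
    [[3.0, 5.0], [3.2, 3.4], [4.0, 3.1], [3.5, 6.0]],
    [[3.6, 3.0], [3.4, 3.0]],
]
per_traj = frequency_per_traj_sketch(trajs, 3.5)
overall = frequency_per_contact_sketch(trajs, 3.5)
print(per_traj)  # [[0.75, 0.5], [0.5, 1.0]]
```

Because the trajectories have different lengths, the pooled frequency (4 formed frames out of 6 for each contact) is not the plain average of the per-trajectory values.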
- frequency_spreadsheet(sheet1_dataframe, sheet2_dataframes, ctc_cutoff_Ang, fname_excel, sheet1_name='pairs by frequency', sheet2_name='residues by frequency')
Write an Excel file with the
DataFrame
that is returned by self.frequency_dataframe
.
- Parameters:
sheet1_dataframe (
DataFrame
) – Normally, these are pairwise frequencies
sheet2_dataframes (list) – Contains
DataFrame
objects with per-residue frequencies
ctc_cutoff_Ang (float) – The cutoff used
fname_excel (str) – The filename to save to
sheet1_name (str, default is “pairs by frequency”)
sheet2_name (str, default is “residues by frequency”)
- frequency_str_ASCII_file(idf, ascii_file=None)
Create a string with the frequencies from a
DataFrame
- Parameters:
idf (
DataFrame
) – A frequency table, typically generated by self.frequency_dataframe
ascii_file (str, default is None) – Instead of returning the formatted table as a string, provide a filename here and the frequencies will be written directly to it
- Returns:
freq_str
- Return type:
str or None
- frequency_sum_per_residue_idx_dict(ctc_cutoff_Ang, switch_off_Ang=None, sort_by_freq=True, return_array=False)
Dictionary of aggregated
frequency_per_contact
per residue index. Values larger than 1 are possible, e.g. if [0,1] and [0,2] are always formed (=1), then freqs_dict[0]=2
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
sort_by_freq (bool, default is True) – Sort the dictionary by descending order of frequency. If False, it will be sorted by residue index.
sort_by_freq
only has effect if return_array
is False
return_array (bool, default is False) – If True, the return value is not a dict but an array of len(self.top.n_residues). In this case, sort_by_freq doesn’t have any effect.
- Returns:
freqs_dict – If dict, keys are the residue indices present in
res_idxs_pairs
If array, idxs are the residue indices of self.top
- Return type:
dictionary or array
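The aggregation described above, including the case where a per-residue value exceeds 1, can be sketched as follows. This is a plain-Python illustration, not mdciao's implementation, and the function name frequency_sum_per_residue_idx is hypothetical.

```python
def frequency_sum_per_residue_idx(res_idxs_pairs, freqs, sort_by_freq=True):
    """Hypothetical helper: sum the per-contact frequencies over each participating residue."""
    sums = {}
    for (i, j), freq in zip(res_idxs_pairs, freqs):
        # Each contact contributes its frequency to both of its residues
        sums[i] = sums.get(i, 0.0) + freq
        sums[j] = sums.get(j, 0.0) + freq
    if sort_by_freq:
        items = sorted(sums.items(), key=lambda item: item[1], reverse=True)
    else:
        items = sorted(sums.items())  # sorted by residue index
    return dict(items)


# The example from the text: [0,1] and [0,2] are always formed
freqs_dict = frequency_sum_per_residue_idx([(0, 1), (0, 2)], [1.0, 1.0])
print(freqs_dict)  # {0: 2.0, 1: 1.0, 2: 1.0}
```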
- frequency_sum_per_residue_names(ctc_cutoff_Ang, switch_off_Ang=None, sort_by='freq', AA_format='short', list_by_interface=False, return_as_dataframe=False)
Aggregate the frequencies of
frequency_per_contact
by residue name, using the most informative names possible, see residx2resnamefragnamebest
for more info on this
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
sort_by (str or None, default is None) – The frequencies are returned by default in the order in which the
ContactPair
-objects are stored in the ContactGroup.contact_pairs
. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup
You can re-sort them for display purposes, leaving the original order untouched, via:
AA_format (str, default is ‘short’) – Use E30@3.50 instead of GLU30@3.50. Alternatives are:
“long”: GLU30@3.50
“just_consensus”: 3.50, fail if none is found
“try_consensus”: 3.50, fallback to “short” if none is found
list_by_interface (bool, default is False) – group the freq_dict by interface residues. Only has an effect if self.is_interface
return_as_dataframe (bool, default is False) – Return a
DataFrame
with the column names labels and freqs
- Returns:
res – list of dictionaries (or dataframes). If list_by_interface is True, then the list has two items, default (False) is to be of len=1
- Return type:
list
- frequency_table(ctc_cutoff_Ang, fname, switch_off_Ang=None, write_interface=True, sort_by_freq=False, **freq_dataframe_kwargs)
Print and/or save frequencies as a formatted table
Internally, it calls
frequency_spreadsheet
and/or frequency_str_ASCII_file
depending on the extension of fname
If you want a
DataFrame
use frequency_dataframe
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
fname (str or None) – Full path to the desired filename. Spreadsheet extensions are currently only ‘.xlsx’, all other extensions save to formatted ascii. None returns the formatted ascii string.
switch_off_Ang (float, default is None) – TODO
write_interface (bool, default is True) – Only has effect if self.is_interface is True. A second sheet will be added to the table where residues are sorted by interface membership and per-residue interface participation.
sort_by_freq (bool, default is False) – Only has effect if self.is_interface is True and
write_interface
is True. Sort the second sheet by descending order of frequencies. If False, residues are in ascending order within each member of the interface, as returned by self.interface_residxs
freq_dataframe_kwargs (dict) – Optional parameters for
self.frequency_dataframe
, which are listed below.
- Other Parameters:
switch_off_Ang (float, default is None) – TODO
atom_types (bool, default is False) – Include the relative frequency of atom-type-pairs involved in the contact
sort_by_freq (bool, default is False) – Sort by descending frequency value, default is to keep the order of self.contact_pairs
AA_format (str, default is “short”) –
- Amino-acid format for the label, can be
“short”: A35@4.50
“long”: ALA35@4.50
“just_consensus”: 4.50 if consensus labels are present, else fail
“try_consensus”: 4.50 if consensus labels are present, else fallback to “short”
pad_label (bool, default is True) – Pad the labels with whitespace so that stacked contact labels become easier-to-read in plain ascii formats
defrag (char, default is None) – Character to use when defragging the contact label, e.g. “@”. Default is to leave the labels as they are
fmt1 (str, default is “%-15s”) – Specify how the labels of res1 should be formatted. Only has effect if pad_label is True
fmt2 (str, default is “%-15s”) – Specify how the labels of res2 should be formatted. Only has effect if pad_label is True
- Returns:
table – If
fname
is None, then return the table as formatted string, using
- Return type:
None or str
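The effect of pad_label, fmt1 and fmt2 can be illustrated with plain Python string formatting. This is only a sketch: the labels, the frequencies, the " - " separator and the helper name format_rows are hypothetical and not taken from mdciao.

```python
# Hypothetical contact labels and frequencies (not produced by mdciao here)
labels = [("R389@G.H5.21", "L394@G.H5.26"), ("K270@6.32", "L394@G.H5.26")]
freqs = [0.91, 0.55]


def format_rows(labels, freqs, fmt1="%-15s", fmt2="%-15s"):
    """Left-justify both residue labels to a fixed width, then append the frequency."""
    rows = []
    for (res1, res2), freq in zip(labels, freqs):
        rows.append((fmt1 % res1) + " - " + (fmt2 % res2) + " %4.2f" % freq)
    return rows


rows = format_rows(labels, freqs)
for row in rows:
    print(row)
```

With "%-15s", every label occupies 15 characters, so stacked rows line up in plain ascii regardless of label length.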
- frequency_to_bfactor(ctc_cutoff_Ang, pdbfile, geom, interface_sign=False, verbose=True)
Save the contact frequency aggregated by residue to a pdb file
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
pdbfile (str) – The path to the pdbfile to save the geom
geom (mdtraj.Trajectory) – Has to have the same topology as self.top
interface_sign (bool, default is False) – Give the bfactor values of the members of the interface different signs, s.t. they appear with different colors in a visualizer
verbose (bool, default is True) – Inform of the file being saved
- Returns:
bfactors
- Return type:
1D np.array of len(self.top.n_atoms)
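The idea of mapping per-residue frequencies onto a per-atom bfactor array can be sketched with NumPy alone. The helper name residue_freqs_to_bfactors, the atom-to-residue map and the frequencies are hypothetical; how mdciao aggregates and writes the values internally is not shown here.

```python
import numpy as np


def residue_freqs_to_bfactors(atom_to_residue, residue_freqs, negative_residues=()):
    """Spread per-residue values over atoms; optionally flip the sign for one interface side."""
    atom_to_residue = np.asarray(atom_to_residue)
    bfactors = np.zeros(len(atom_to_residue), dtype=float)
    for res_idx, freq in residue_freqs.items():
        sign = -1.0 if res_idx in negative_residues else 1.0
        # All atoms of this residue get the same (signed) value
        bfactors[atom_to_residue == res_idx] = sign * freq
    return bfactors


# Hypothetical system: 6 atoms in 3 residues; residue 1 has no contacts
atom_to_residue = [0, 0, 1, 1, 1, 2]
residue_freqs = {0: 1.5, 2: 0.25}
bfactors = residue_freqs_to_bfactors(atom_to_residue, residue_freqs, negative_residues={2})
print(bfactors)
```

Giving one side of the interface negative values is what lets a diverging color scale in a visualizer separate the two sides.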
- gen_ctc_labels(**kwargs) list
Generate labels with different parameters
Wraps around mdciao.contacts.ContactPair.gen_label
- Parameters:
AA_format (str, default is “short”) –
- Options are:
“short”: “E30@3.50”
“long”: GLU30@3.50
“just_consensus”: 3.50, fail if none is found
“try_consensus”: 3.50, fallback to “short” if none is found
fragments (bool, default is False) – Include fragment information. Will get the “best” information available, i.e. consensus>fragname>fragindex. When trying to get consensus labels, this option is ignored, s.t. the full “E30@3.50” is returned regardless.
delete_anchor (bool, default is False) – Delete the anchor from the label
- Returns:
labels
- Return type:
list
- property interface_fragments: list
Two residue lists provided at initialization
They are supersets of the residues contained in self.interface_residxs
Empty lists mean no residues were found in the interface defined at initialization
- Returns:
interface_fragments
- Return type:
list
- interface_frequency_matrix(ctc_cutoff_Ang, switch_off_Ang=None)
Rectangular matrix of size (N,M) where N is the length of the first list of interface_residxs and M the length of the second list of interface_residxs.
Note
Pairs missing from res_idxs_pairs will be NaNs, to differentiate from those pairs that were present but have zero contact
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns:
mat
- Return type:
2D numpy.ndarray
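The distinction between NaN (pair never computed) and zero (pair computed, never in contact) can be sketched as follows. The helper name interface_matrix, the residue indices and the frequencies are hypothetical and only illustrate the layout described above.

```python
import numpy as np


def interface_matrix(interface_residxs, res_idxs_pairs, freqs):
    """Fill an (N, M) matrix with frequencies; pairs that were never computed stay NaN."""
    rows, cols = interface_residxs
    mat = np.full((len(rows), len(cols)), np.nan)
    for (r1, r2), freq in zip(res_idxs_pairs, freqs):
        if r1 in rows and r2 in cols:
            mat[rows.index(r1), cols.index(r2)] = freq
        elif r2 in rows and r1 in cols:
            # The pair was stored in the opposite order
            mat[rows.index(r2), cols.index(r1)] = freq
    return mat


# Hypothetical interface: residues 10, 11 on one side and 20, 21, 22 on the other
mat = interface_matrix([[10, 11], [20, 21, 22]], [(10, 20), (21, 11)], [0.8, 0.0])
print(mat)
```

Here the pair (21, 11) was computed and has frequency zero, whereas the four remaining entries are NaN because those pairs were never part of the input.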
- property interface_labels_consensus
Consensus labels of whatever residues interface_residxs holds.
If there are no consensus labels, the corresponding label is None
- property interface_residue_names_w_best_fragments_short
Best possible residue@fragment string for the residues in interface_residxs
In case none of consensus label > fragment name > fragment index is found, nothing is returned after the residue name
- property interface_residxs: list
The residues of self.res_idxs_pairs grouped into two lists, depending on what self.interface_fragments they belong to
Empty lists mean no residues were found in the interface defined at initialization
- Returns:
interface_residxs
- Return type:
list
- property interface_reslabels_short
Residue labels of whatever residues interface_residxs holds
- property is_interface
Whether this ContactGroup can be interpreted as an interface.
Note
If none of the residxs_pairs were found in the interface_residxs (both provided at initialization), this property will evaluate to False even if some indices were parsed
- property is_neighborhood: bool
Whether this ContactGroup is a neighborhood or not
When instantiating this ContactGroup, it is checked whether all the used ContactPair objects have a shared anchor_residue_idx attribute and whether self.neighbors_excluded is None. This means this ContactGroup is a neighborhood around the residue stored in the attribute self.shared_anchor_residue_index
- Other neighborhood-only attributes get populated, e.g.
self.anchor_res_and_fragment_str
self.anchor_res_and_fragment_str_short
self.partner_res_and_fragment_labels
self.partner_res_and_fragment_labels_short
self.partner_fragment_colors
self.anchor_fragment_color
Note that all these attributes will raise an Exception when called if self.is_neighborhood is False
- Returns:
is_neighborhood
- Return type:
bool
- property max_cutoff_Ang: float
Operations involving cutoffs higher than this will be forbidden and will raise ValueError.
- property maxima
Per-contact maximum values over all distance time-traces
- Returns:
maxima – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here
- Return type:
1D np.array of len(self.n_ctcs)
- property means
Per-contact mean values over all distance time-traces
- Returns:
means – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here
- Return type:
1D np.array of len(self.n_ctcs)
- property minima
Per-contact minimum values over all distance time-traces
- Returns:
minima – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here
- Return type:
1D np.array of len(self.n_ctcs)
- property modes
Per-contact modes (https://en.wikipedia.org/wiki/Mode_(statistics)) over all distance time-traces
Note
In order to quickly compute modes, residue-residue distances are multiplied by 1000 and rounded to integers, to be able to use numpy.bincount for speed. Then, the argmax(bincount) is returned
- Returns:
modes – No unit transformation is done, whatever was given at instantiation (most likely nanometers), is returned here
- Return type:
1D np.array of len(self.n_ctcs)
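The bincount trick described in the note can be sketched like this. The helper name mode_via_bincount and the sample distances are hypothetical, and dividing the result by the factor to return to the original units is an assumption of this sketch, not necessarily what mdciao does.

```python
import numpy as np


def mode_via_bincount(distances, factor=1000):
    """Mode of a 1D array of non-negative floats, binned at a resolution of 1/factor."""
    as_int = np.round(np.asarray(distances) * factor).astype(int)
    # bincount counts occurrences of each integer; argmax picks the most frequent one
    return np.argmax(np.bincount(as_int)) / factor


# Hypothetical distance time-trace, in nm
d = [0.3501, 0.3499, 0.3504, 0.4200, 0.3500]
mode = mode_via_bincount(d)
print(mode)
```

Rounding to integers makes the mode well defined for floating-point data, at the price of a fixed resolution (here 0.001 in the input units).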
- property n_ctcs: int
The number of contact pairs (mdciao.contacts.ContactPair-objects) stored in this object
- Returns:
n_ctcs
- Return type:
int
- n_ctcs_timetraces(ctc_cutoff_Ang, switch_off_Ang=None)
Time-traces of the number of contacts, obtained by summing over all contacts for each frame
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
- Returns:
nctc_trajs
- Return type:
list of 1D np.ndarrays
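For a single trajectory, the per-frame count can be sketched with NumPy. The helper name n_ctcs_timetrace and the data are hypothetical; storing distances in nm and using "<=" for the comparison are assumptions of this sketch, and switch_off_Ang is not considered.

```python
import numpy as np


def n_ctcs_timetrace(distances_nm, ctc_cutoff_Ang):
    """Number of formed contacts per frame for one trajectory.

    distances_nm has shape (n_frames, n_ctcs) and is assumed to be in nm.
    """
    formed = np.asarray(distances_nm) * 10 <= ctc_cutoff_Ang  # nm -> Angstrom
    return formed.sum(axis=1)


# Hypothetical trajectory: 3 frames, 2 contacts
distances_nm = [[0.30, 0.32],
                [0.34, 0.50],
                [0.60, 0.70]]
nctc = n_ctcs_timetrace(distances_nm, ctc_cutoff_Ang=3.5)
print(nctc)
```

Applying the same function to each trajectory in turn would give a list of 1D arrays, matching the return type documented above.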
- property n_frames: list
List of per-trajectory n_frames
- Returns:
n_frames
- Return type:
list
- property n_frames_total: int
Total number of frames
- Returns:
n_frames_total
- Return type:
int
- property n_trajs: int
The number of trajectories contained in this ContactGroup
- Returns:
n_trajs
- Return type:
int
- property name: str
The name of this ContactGroup, given when creating it
- Returns:
name
- Return type:
str
- property neighbors_excluded: int
The number of neighbors that were excluded when creating this ContactGroup
- Returns:
neighbors_excluded
- Return type:
int
- property partner_fragment_colors
The colors associated with the fragments of the partner (not anchor) residues
The fragment colors were given as pairs of values to the individual ContactPairs that were used to instantiate this ContactGroup. These colors might have been passed by the user themselves or given by default e.g. by mdciao.cli._parse_coloring_options. Check the defaults there.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_fragment_colors
['tab:blue', 'tab:blue', 'tab:blue', 'tab:blue', 'tab:blue']
or
>>> CG = mdciao.examples.ContactGroupL394(fragment_colors=["red","blue","yellow","orange","black"])
>>> CG.partner_fragment_colors
['red', 'orange', 'red', 'orange', 'red']
Note
These colors are not automatically used by self.plot_neighborhood_freqs or self.plot_freqs_as_bars unless passed as color=self.partner_fragment_colors
Will fail if self.is_neighborhood is False
- Returns:
color
- Return type:
str
- property partner_res_and_fragment_labels: list
List of labels of the partner (not anchor) residues of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels
['ARG389@G.H5.21', 'LYS270@6.32', 'LEU388@G.H5.20', 'LEU230@5.69', 'ARG385@G.H5.17']
- Returns:
labels
- Return type:
list
- property partner_res_and_fragment_labels_short: list
List of labels (short) of the partner (not anchor) residues of this neighborhood, including fragment
Will fail if self.is_neighborhood is False
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.partner_res_and_fragment_labels_short
['R389@G.H5.21', 'K270@6.32', 'L388@G.H5.20', 'L230@5.69', 'R385@G.H5.17']
- Returns:
labels
- Return type:
list
- plot_distance_distributions(bins=10, xlim=None, ax=None, shorten_AAs=False, ctc_cutoff_Ang=None, legend_sort=True, label_fontsize_factor=1, max_handles_per_row=4, defrag=None, smooth_bw=False, background=True) Axes
Plot distance distributions for the distance trajectories of the contacts
The title will try to get the name from self.name
- Parameters:
bins (int, default is 10) – How many bins to use for the distribution
xlim (iterable of two floats, default is None) – Limits of the x-axis. Outliers can stretch the scale, this forces it to a given range
ax (Axes, default is None) – One will be created if None is passed
shorten_AAs (bool, default is False) – Use amino-acid one-letter codes
ctc_cutoff_Ang (float, default is None) – Include in the legend of the plot how much of the distribution is below this cutoff. A vertical line will be drawn at this x-value
legend_sort (boolean, default is True) – Sort the legend in descending order of frequency. Only has an effect when ctc_cutoff_Ang is not None
label_fontsize_factor (int, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor
max_handles_per_row (int, default is 4) – legend control
defrag (char, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label
smooth_bw (bool or float) – If True, smooth the histogram using a Gaussian-kernel-density estimation with an estimator bandwidth of .5 Angstrom. If float, use this value as estimator bandwidth, check matplotlib.mlab.GaussianKDE for more info. If False, don’t smooth
background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors
True: use a faint version of color
False: don’t plot any background
color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.pyplot.colors understands
- Returns:
ax
- Return type:
Axes
- plot_freqs_as_bars(ctc_cutoff_Ang, title_label=None, switch_off_Ang=None, xlim=None, ax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, lower_cutoff_val=None, plot_atomtypes=False, sort_by=None, sum_freqs=True, total_freq=None, defrag=None, cumsum=False)
Plot contact frequencies as a bar plot
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail
switch_off_Ang (float, default is None) – TODO
xlim (float, default is None) – The right limit of the x-axis. +.5 will be added to this number to accommodate some padding around the bars. If None, it’s chosen automatically
ax (Axes, default is None) – Draw into this axis. If None is passed, then one will be created
color (color-like (str or RGB triple) or list thereof, default is “tab:blue”) – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed in the order of the self.res_idx_pairs. It will get re-sorted according to sort_by, s.t. residues always have the same color no matter the order
shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30” to “E30”
label_fontsize_factor (float, default is 1) – Labels will be written in a fontsize rcParams[“font.size”] * label_fontsize_factor
lower_cutoff_val (float, default is None) – Only plot frequencies above this value. Default is to plot all
plot_atomtypes (bool, default is False) – Use stripe-patterns to inform about the types of interactions (sidechain, backbone, etc)
sort_by (str or None, default is None) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in the ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via this parameter.
sum_freqs (bool, default is True) – Inform, in the legend and in the title, about the sum of frequencies/bar-heights being plotted
total_freq (float, default is None) – Add a line to the title informing about the fraction of the total_freq that’s being plotted in the figure. Only has an effect if sum_freqs is True
defrag (str, default is None) – Delete fragment labels from the residue labels, “G30@frag1”->”G30”. If None, don’t delete the fragment label
cumsum (bool, default is False) – Plot the cumulative frequency (aka cumsum, as in numpy.cumsum) as a faint dotted line in the graph. This quantity:
Is normalized to 1 s.t. the summed frequencies numerically coincide with the y-axis limit
Sums over all available frequencies in this ContactGroup, regardless of the value of truncate, which hides some of these. I.e. it might be that you don’t see the cumulative frequency fully arrive at 1 if some small contributions have been truncated
- Returns:
ax
- Return type:
- plot_freqs_as_flareplot(ctc_cutoff_Ang, fragments=None, fragment_names=None, fragment_colors=None, consensus_maps=None, SS=None, scheme='auto', **kwargs_freqs2flare)
Produce contact flareplots by wrapping around mdciao.flare.freqs2flare
Note
The logic to assign fragments and colors can lead to unexpected behavior in cases where too much guess-work has to be done. If a particular combination of fragments and colors is desired but not achievable through this method, it is highly recommended that the user uses mdciao.flare.freqs2flare directly and experiments there with parameter combinations. It is also a good idea to check out the notebook called “Controlling Flareplots”
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
fragments (string or list of iterables, default is None) – The way the topology is fragmented. Default is to put all residues in one fragment. This optarg can modify the behaviour of scheme=’all’, since residues absent from fragments will not be plotted, see below. If string, it will be passed as method to mdciao.fragments.get_fragments, to get the fragments on the fly.
fragment_names (list of strings, default is None) – The fragment names, at least len(fragments)
fragment_colors (None or list of color-likes) – Will be used to give the fragments their colors, needs to be color-like and of len(fragments)
consensus_maps (list, default is None) –
- The items of this list are either:
- indexables containing the consensus labels (strings) themselves. They need to be “gettable” by residue index, i.e. dict, list or array. Typically, one generates these maps by using mdciao.nomenclature.LabelerConsensus.top2labels.
- mdciao.nomenclature.LabelerConsensus-objects. When these objects are passed, their mdciao.nomenclature.LabelerConsensus.top2labels and mdciao.nomenclature.LabelerConsensus.top2fragments are called on-the-fly, generating not only the consensus labels but also the consensus fragments (i.e. subdomains) to further fragment the topology into sub-domains, like TM6 or G.H5. If fragments are parsed, they will be made compatible with the consensus fragments.
If you want the consensus labels but not the sub-fragmentation, simply use the first option.
SS (secondary structure information, default is None) – Whether and how to include information about secondary structure. Can be many things:
- triple of ints (CP_idx, traj_idx, frame_idx)
Go to contact group CP_idx, trajectory traj_idx and grab this frame to compute the SS. Will read xtcs when necessary or otherwise directly grab it from a mdtraj.Trajectory in case it was passed. Ignores potential stride values. See ContactPair.time_traces for more info
- True
same as [0,0,0]
- None or False
Do nothing
- mdtraj.Trajectory
Use this geometry to compute the SS
- string
Path to a filename, of which only the first frame will be read. The SS will be computed from there. Reading is first attempted without topology information (works for e.g. .pdb, .gro, .h5) and, when this fails, self.top is passed (e.g. for .xtc, .dcd)
- array_like
Use the SS from here, s.t. ss_inf[idx] gives the SS-info for the residue with that idx
scheme (str, default is ‘auto’) –
- How to decide which residues to plot
- ‘all’
plot as many residues as possible. E.g., if a self.topology is present, plot all its residues. This can be modified with fragments, see above. Using ‘all’ without any fragments means that the topology won’t be separated into interface fragments, even if it is an interface. Given that some of the topology (which the user insists on plotting) might not have been assigned to either side of the interface, it’s unclear how to proceed here.
- ‘interface’:
use only the fragments in self.interface_fragments. Will only work if self.is_interface is True
- ‘auto’
Uses self.is_interface to decide. If True, scheme is set to ‘interface’. If False, e.g. a residue neighborhood or a site, then scheme is set to ‘all’
- ‘interface_sparse’:
like ‘interface’, but using the input fragments to break self.interface_fragments (which are only two, by definition) further down into other fragments. Of these, show only the ones where at least one residue participates in the interface. If fragments is None, scheme=’interface’ and scheme=’interface_sparse’ are the same thing.
- ‘residues’:
plot only the residues present in self.res_idxs_pairs
- ‘residues_sparse’ :
plot only the residues that have a non-zero frequency
- ‘consensus_sparse’:
like ‘interface_sparse’, but leaving out sub-domains not participating in the interface with any contacts. For this, the consensus_maps need to be actual LabelerConsensus-objects
kwargs_freqs2flare (dict) – Optional keyword arguments for mdciao.flare.freqs2flare. Note that many of these kwargs will be overwritten internally by this method, mostly to accommodate the scheme+fragment+color combinations, but not only (please see the note above). These are the kwargs that this method manipulates internally and might be overwritten: top, ss_array, fragments, fragment_names, colors
Note that some of the values in kwargs_freqs2flare (in particular sparse_residues) might alter (with or w/o conflict) the scheme option. The full list of optional arguments is listed below
- Other Parameters:
sparse_residues (boolean, default is False) – Show only those residues that appear in the initial res_idxs_pairs
Note
There is a development option for this argument where a residue list is passed, meaning, show these residues regardless of any other option that has been passed. Perhaps this changes in the future.
sparse_fragments (boolean, default is False) – Same as sparse_residues, but with fragments. When sparse_residues isn’t False, this option has no effect.
exclude_neighbors (int, default is 1) – Do not show contacts where the partners are separated by these many residues. If no top is passed, the neighborhood-condition is checked using residue serial-numbers, assuming the molecule only has one long peptidic-chain.
freq_cutoff (float, default is 0) – Contact frequencies lower than this value will not be shown
ax (Axes) – Parse an axis to draw on, otherwise one will be created using panelsize. In case you want to re-use the same circle of residues as a background to plot different sets of freqs, YOU HAVE TO USE THE SAME fragments and sparse values on all calls, else the bezier lines will be placed erroneously.
center (np.ndarray, default is [0,0]) – In axis units, where the flareplot will be centered around
r (float, default is 1) – In axis units, the radius of the flareplot
mute_fragments (iterable of integers, default is None) – Curves involving these fragments will be hidden. Fragments are expressed as indices of fragments
anchor_fragments (iterable of integers, default is None) – Curves not involving these fragments will not be shown, i.e. it is the complement of mute_fragments. Both cannot be passed simultaneously.
panelsize (float, default is 10) – Size in inches of the panel (=figsize in matplotlib). Will be ignored if a pre-existing axis object is parsed
angle_offset (float, default is 0) – In degrees, where the flareplot “begins”. Zero is xy = [1,0]
highlight_residxs (iterable of ints, default is None) – Show the labels for these residues in red
select_residxs (iterable of ints, default is None) – Only the residues here can be connected with a Bezier curve
fontsize (float, default is None) – Currently, the fontsize is internally computed as a function of the dotsize, since the space available for the labels is determined by the dotsize. There are plans for user control in the future, but until then NotImplementedError will be thrown
shortenAAs (boolean, default is True) – Use short AA-codes, e.g. E30 for GLU30. Only has effect if a topology is parsed
aa_offset (int, default is 0) – Add this number to the resSeq value
markersize (float, default is None) – The size of the dots. It is internally optimized to have adjacent dots fill the available space without overlapping among them. There are plans for user control in the future, but until then NotImplementedError will be thrown
bezier_linecolor (color-like, default is ‘k’) – The color of the bezier curves connecting the residues. Can be a character, string or RGB value (not RGBA)
plot_curves_only (bool, default is False) – Only plot the curves connecting the dots, but not the dots themselves or any other annotation (labels, fragment names or SS information). The same caution as for ax applies.
textlabels (bool or array_like, default is True) – How to label the residue dots. Gets passed directly to mdciao.flare.circle_plot_residues. Options are:
True: the dots representing the residues will get a label automatically, either their serial index or the residue name, e.g. GLU30, if a top was passed.
False: no labeling
array_like : will be passed as replacement_labels to mdciao.flare.add_fragmented_residue_labels
padding (iterable of len 3, default is [1,1,1]) – The padding, expressed as empty dot positions. Each number is used for:
the beginning of the flareplot, before the first residue
between fragments
at the end of the plot, after the last residue
lw (float, default is None) – Line width of the contact lines
signed_colors (dict, default is None) – Provide a color dictionary, e.g. {-1:”b”, +1:”r”} to give different colors to positive and negative alpha values. If None, defaults to bezier_linecolor
subplot (bool, default is False) – If True, the method checks if ax is the last axis in a figure (=all other panels have been already drawn) and then transfers the last plot’s fontsizes and linewidths to panels (if possible). It will help produce more homogeneous plots when heuristics about font-sizing fail
aura (iterable, default is None) – Scalar array (positive or negative), indexed with residue indices, e.g. RMSF, SASA, degree of conservation etc. It will be drawn as an aura around the flareplot.
coarse_grain (bool, default is False) – If True, will use the fragment definitions of fragments and/or sparse_fragments to coarse grain the frequencies into per-fragment frequencies and show them as a chord-diagram wrapping around freqs2chord. Check there for more info
normalize_to_sigma (bool or float, default is False) – Only used if coarse_grain is True. Allows for scaling the arc occupied by the chords to a particular sigma value. This is explained in detail in the documentation of freqs2chord.
- Returns:
ifig (Figure)
ax (Axes)
flareplot_attrs (dict) – Flareplot attributes as dictionary containing matplotlib objects (texts, dots, curves etc) for further manipulation and fine tuning of the plot if necessary. See the returned values of mdciao.flare.freqs2flare for more information.
- plot_frequency_sums_as_bars(ctc_cutoff_Ang, title_str, switch_off_Ang=None, xmax=None, ax=None, shorten_AAs=False, label_fontsize_factor=1, lower_cutoff_val=0, bar_width_in_inches=0.75, list_by_interface=False, sort_by='freq', interface_vline=False)
Bar plot with per-residue sums of frequencies (called Sigma in mdciao)
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
title_str (str) – The title of the plot
switch_off_Ang (float, default is None) – TODO
xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5
ax (Axes, default is None) – If None, one will be created, else draw here
shorten_AAs (boolean, default is False) – Unused ATM
label_fontsize_factor (float, default is 1) – Some control over fontsizes when plotting a high number of bars
lower_cutoff_val (float, default is 0) – Do not show sums of freqs lower than this value
bar_width_in_inches (float, default is .75) – If no ax is parsed, this controls that the drawn figure always has a size proportional to the number of frequencies being shown. Allows for combining multiple subplots with different number of bars in one figure with all bars equally wide regardless of the subplot
list_by_interface (boolean, default is False) – Separate residues by interface
sort_by (str or None, default is “freq”) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in the ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via this parameter.
interface_vline (bool, default is False) – Plot a vertical line visually separating both interfaces
- Returns:
ax
- Return type:
- plot_interface_frequency_matrix(ctc_cutoff_Ang, switch_off_Ang=None, transpose=False, label_type='best', **kwargs_plot_matrix)
Plot the interface_frequency_matrix
The first group of interface_residxs are the row indices, shown in the y-axis top-to-bottom (since imshow is used to plot). The second group of interface_residxs are the column indices, shown in the x-axis left-to-right
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
transpose (bool, default is False) – Transpose the contact matrix in the plot
label_type (str, default is “best”) – Best tries resname@consensus(>fragname>fragidx) Alternatives are “residue” or “consensus”, but “consensus” alone might lead to empty labels since it is not guaranteed that all residues of the interface have consensus labels
kwargs_plot_matrix (dict, default is None) – Optional keyword arguments for
mdciao.plots.plot_matrix
, listed below.
- Other Parameters:
pixelsize (int, default is 1) – The size in inches of the pixel representing the contact. Ultimately controls the size of the figure, because figsize = _np.array(mat.shape)*pixelsize
grid (boolean, default is False) – overlap a grid of dashed lines
cmap (str, default is ‘binary’) – What matplotlib.cmap to use
colorbar (boolean, default is False) – whether to use a colorbar
- Returns:
ax (Axes)
fig (matplotlib.pyplot.Figure)
- plot_neighborhood_freqs(ctc_cutoff_Ang, switch_off_Ang=None, color='tab:blue', xmax=None, ax=None, shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, plot_atomtypes=False, sort_by=None)
Wrapper around ContactGroup.plot_freqs_as_bars for plotting neighborhoods
#TODO perhaps get rid of the wrapper altogether. ATM it would break the API
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use
switch_off_Ang (float, default is None) – TODO
color (color-like (str or RGB triple) or list thereof, default is “tab:blue”) – The color for the bars. If string or RGB array, all bars will have this color. If list, it’s assumed in the order of the self.res_idx_pairs. It will get re-sorted according to sort_by, s.t. residues always have the same color no matter the order
xmax (int, default is None) – Default behaviour is to go to n_ctcs, use this parameter to homogenize different calls to this function over different contact groups, s.t. each subplot has equal xlimits
ax (Axes, default is None) – Axes to plot into, if None, one will be created
shorten_AAs (bool, default is False) – Shorten residue names from “GLU30”->”E30”
label_fontsize_factor (float, default is 1) – Fontsize for the tilted labels and the legend, as fraction [0,1] of the default value in rcParams[“font.size”]
sum_freqs (bool, default is True) – Add the sum of frequencies of the represented (and only those) frequencies
plot_atomtypes (bool, default is False) – Add stripes to frequency bars to include the atom-types (backbone, sidechain, etc)
sort_by (str or None, default is None) – The frequencies are by default plotted in the order in which the ContactPair-objects are stored in the ContactGroup.contact_pairs. This order depends on the ctc_cutoff_Ang originally used to instantiate this ContactGroup. You can re-sort them for display purposes, leaving the original order untouched, via this parameter.
- Returns:
ax
- Return type:
- plot_timedep_ctcs(panelheight=3, plot_N_ctcs=True, pop_N_ctcs=False, skip_timedep=False, ctc_cutoff_Ang=None, sort_by_freq=False, **plot_timetrace_kwargs)
For each trajectory, plot the time-traces of all the contacts (one per panel) and/or the timetrace of the overall number of contacts
In order for the number of contacts to be plotted, ctc_cutoff_Ang should be provided.
- Parameters:
panelheight (float, default is 3) – The height of the per-contact panels, in inches
plot_N_ctcs (bool, default is True) – Add an extra panel at the bottom of the figure containing the number of formed contacts for each frame for each trajectory. A valid cutoff has to be passed along in plot_contact_kwargs, otherwise this has no effect
pop_N_ctcs (bool, default is False) – Put the panel with the number of contacts in a separate figure. A valid cutoff has to be passed along in plot_contact_kwargs, otherwise this has no effect
skip_timedep (bool, default is False) – Skip plotting the individual timetraces and plot only the time trace of overall formed contacts. This sets pop_N_ctcs to True internally
ctc_cutoff_Ang (float, default is None) – The cutoff to use, in Angstrom
sort_by_freq (bool, default is False) – Sort by descending frequency. Default is to plot in the same order as ContactGroup._contacts, which will be in descending order of frequencies with the cutoff used originally to compute this ContactGroup. Only works if a ctc_cutoff_Ang is provided.
plot_timetrace_kwargs (dict) – Optional parameters for mdciao.contacts.ContactPair.plot_timetrace, which are documented below:
- Other Parameters:
ax (None, Axes) – The axis where to plot the timetrace. Default is to plot on the current axis, and if there’s no current axes, a new one will be created. If a new one is created, it’ll have the default width and height, you have to change it afterwards or create it beforehand with your desired size.
color_scheme (list, default is None) – Pass a list of colors, each one should be understandable by matplotlib.colors.is_color_like
n_smooth_hw (int, default is 0) – Size, in frames, of half the window size of the smoothing window
dt (float, default is 1) – How many units of t_unit one frame represents
background (bool, or color-like, (str, hex, rgb), default is True) – When smoothing, the original curve can appear in the background in different colors
True: use a faint version of color
False: don’t plot any background
color-like: use this color for the background, can be: str, hex, rgba, anything matplotlib.colors.is_color_like understands
shorten_AAs (bool, default is False) – Whether to shorten the AA labels
t_unit (str, default is ‘ps’) – The time unit with which to label the x-axis
ylim_Ang (float or “auto”) – The limit in Angstrom of the y-axis
max_handles_per_row (int, default is 4) – How many handles (entries) each row of the legend can have
- Returns:
list_of_figs – The wanted figure(s)
- Return type:
list
Note
The keywords plot_N_ctcs, pop_N_ctcs, and skip_timedep allow this method to both include or totally exclude the total number of contacts and/or the time-traces in the figure. This might change in the future, it was coded this way to avoid breaking the command_line tools API. Also note that some combinations will produce an empty return!
- plot_timedep_ctcs_matrix(ctc_cutoff_Ang, inches_per_contact=0.35, figsize=None, panelwidth=10, color='lightblue', shorten_AAs=True, dt=1, t_unit='ps', grid=True, show_freqs=True, anchor=None, bookends=True, defrag=None, ctc_control=None, sort_by='freq', lower_cutoff_val=0, n_smooth_hw=0) tuple
Per-trajectory time-traces of the formed contacts, shown as binary traces, i.e. formed or not formed.
Each trajectory gets displayed in its own panel.
Note
Contacts are shown in descending order of contact-frequency, as obtained using ctc_cutoff_Ang, over the whole dataset. Expect different orders when changing ctc_cutoff_Ang.
- Parameters:
ctc_cutoff_Ang (float) – The cutoff to use, in Angstrom
inches_per_contact (float, default is 0.35) – The height, in inches, that each contact will take up on the whole plot. Making this number too small to make the figure look flatter might squeeze contact-labels vertically; try using panelwidth instead.
figsize (tuple, default is None) – Default behavior is to set the size of the figure automatically as
height, width = self.n_trajs * self.n_ctcs * inches_per_contact, panelwidth
s.t. figure sizes are consistent across systems and number of contacts. However, you can override this behavior by setting the figsize yourself here.
panelwidth (float, default is 10) – The width of the figure, in inches
color (any color-like, default is “lightblue”) – The color assigned to the formed contacts
shorten_AAs (bool, default is True) – Whether to use short versions of residue names
dt (float, default is 1) – How many units of t_unit one frame represents
t_unit (str, default is “ps”) – The time unit with which to label the x-axis
grid (boolean, default is True) – Overlap a grid of faint dashed lines on x and y ticks
show_freqs (bool, default is True) – Use the right-hand side y-axis to annotate each contact with its contact-frequency. When multiple trajectories are plotted, the label includes per-trajectory frequency and overall frequency.
anchor (str, default is None) – This string will be deleted from the contact labels, leaving only the partner-residue to identify the contact. The final anchor label will be that of the deleted keys (allows for keeping e.g. pre-existing consensus nomenclature). No consistency-checks are carried out, i.e. use at your own risk (plus it looks ugly, somehow).
bookends (bool, default is True) – Indicate the beginning and end of each trajectory with a faint dashed line, to differentiate non formed contacts from simply absent trajectory data. Only has effect if trajectories have different starting or ending timestamps.
defrag (bool, default is None) – Whether or not to include the fragment information in the contact labels
ctc_control (None, float or int, default None) – Control the number of contacts that gets plotted. Default is to show all regardless of their frequency value.
If integer, interpret directly as number of contacts to be shown, e.g. ctc_control = 5 means show the 5 most frequent contacts (regardless of how many others there might be).
If float, it must be in [0,1]. It is interpreted as the fraction of the total aggregated frequency to keep over the whole dataset, i.e. ctc_control=.75 means show contacts until 75% of all aggregated frequency is shown. The aggregate is computed on the frequencies that have not been truncated by lower_cutoff_val.
If None show all contacts regardless of their frequency.
This parameter will be ignored if sort_by is different from “freq”, as it is only meaningful if contacts are sorted in descending order of frequency.
The difference between None and 1.0 (100% of overall frequency) is that ctc_control = None will still show zero-frequency contacts, whereas ctc_control = 1.0 won’t, since 100% of overall frequency is achieved without the zero-frequency contacts.
sort_by (str, default is freq) – Default is to sort contacts by descending order of frequency. Alternatively, you can sort them by residue number by passing “residue” or “numeric” here
lower_cutoff_val (float, default is 0) – Hide contacts with frequencies lower than this value.
n_smooth_hw (int, default is 0) – Half-window size for smoothing the time-traces before computing the contact
- Returns:
fig (
Figure
) – The figure with the plots
plotted_freqs (dict) – A dictionary keyed with the plotted contact labels and valued with the plotted overall frequencies. Keys are sorted in the same order as plotted.
plotted_trajs (list) – The binary trajectories, as plotted, i.e. each item of this list is a np.ndarray of shape (len(plotted_freqs), n_frames_i), where i is the trajectory index. The order of the rows is the same as the order of the keys in plotted_freqs.
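The ctc_control rules described above for plot_timedep_ctcs_matrix can be illustrated with a small, self-contained sketch. The helper n_ctcs_to_show is hypothetical and not part of mdciao; it only mirrors the documented behaviour (None, int, or float in [0,1]), and mdciao's own handling of ties and boundary values may differ.

```python
def n_ctcs_to_show(freqs, ctc_control=None):
    """Hypothetical helper, not part of mdciao: how many contacts get shown."""
    # Work on frequencies sorted in descending order, as in the plot
    freqs = sorted(freqs, reverse=True)
    if ctc_control is None:
        # Show everything, including zero-frequency contacts
        return len(freqs)
    if isinstance(ctc_control, int):
        # Integer: a number of contacts, capped by what is available
        return min(ctc_control, len(freqs))
    # Float: fraction of the aggregated frequency to reach
    if not 0 <= ctc_control <= 1:
        raise ValueError("a float ctc_control must be in [0, 1]")
    target = ctc_control * sum(freqs)
    running, n = 0.0, 0
    for freq in freqs:
        if running >= target:
            break
        running += freq
        n += 1
    return n


freqs = [1.0, 0.5, 0.5, 0.0]
print(n_ctcs_to_show(freqs, None))  # 4, the zero-frequency contact is kept
print(n_ctcs_to_show(freqs, 1.0))   # 3, 100% is reached without it
print(n_ctcs_to_show(freqs, 0.75))  # 2
print(n_ctcs_to_show(freqs, 2))     # 2
```

The difference between None and 1.0 shows up in the first two calls: only None keeps the contact with zero frequency.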
- plot_violins(sort_by=False, ctc_cutoff_Ang=None, truncate_at_mean=None, zero_freq=0.01, switch_off_Ang=None, ax=None, title_label=None, xmax=None, color='tab:blue', shorten_AAs=False, label_fontsize_factor=1, sum_freqs=True, defrag=None, stride=1)
Plot residue-residue distances as violin plots
violinplot
The default behaviour is to plot all residue pairs in the order in which the
ContactPair
-objects are stored in the ContactGroup. You can check this order in self.res_idxs_pairs. This order typically depends on the original ctc_cutoff_Ang used to instantiate this ContactGroup, which might not carry the same meaning here.
For more than 50 contacts or so, violin plots take some time to compute, because a Gaussian-kernel-density estimation is done for each residue pair.
Also, plots with many residue pairs simply might be difficult to read.
Hence, to control the number of shown contacts, you can use these parameters, sorted somewhat hierarchically
sort_by
ctc_cutoff_Ang
truncate_at_mean
zero_freq
Please check their documentation below.
Finally, if the plots still take too long to compute/show for the desired number of violins, try reducing the amount of data by using stride > 1
- Parameters:
sort_by (iterable of ints, str, boolean, or int, default is False) –
- Can be different things:
- iterable of ints
Strongest selection. Show only these residue pairs, in this order. Indices are intended as self.res_idxs_pairs indices. All other parameters are ignored.
- str “numeric” or “residue”
Sort by ascending residue number
- boolean False
Don’t sort, i.e. use the order in self.contact_pairs
- boolean True
Sort. There’s two options for sorting, depending on the value of ctc_cutoff_Ang (more below)
- sort by distance means, ascending: ctc_cutoff_Ang is None
- sort by contact-frequencies, descending: ctc_cutoff_Ang is a float
For contacts with zero frequency, fall back on ascending distance means. This means that frequent contacts will be displayed first (sorted by frequency, high to low), followed by infrequent ones (sorted by mean distance, short to long).
- int n
Like True but up to n contacts at most. Other parameters like truncate_at_mean can reduce this number automatically
ctc_cutoff_Ang (opt, default is None) – If provided, contact-frequencies will be computed and shown in the contact-labels. Additionally, if
sort_by
is True or int, then the violins are sorted by contact-frequency in the plot.
truncate_at_mean (float, default is None) – Don’t show violins with mean values higher than this (in Angstrom). This has no effect on contacts in which the mean is above the cutoff BUT the frequency is > zero_freq. This case is very common, since a contact can be formed at small distances but broken at very large ones, s.t. the mean (or median) values are meaningless.
zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero and not shown. For this parameter to have effect, you need a
ctc_cutoff_Ang
switch_off_Ang (float, default is None) – TODO
ax (None or
Axes
, default is None) – The axis to plot into. If None, one will be created.
title_label (str, default is None) – If None, the method will default to self.name. If self.name is also None, the method will fail.
xmax (float, default is None) – X-axis will extend from -.5 to xmax+.5
color (iterable (list or dict), or str, default is “tab:blue”) –
list, the colors will be reordered so that the same residue pair always gets the same color, regardless of the order in which they appear. This way you can track a violin across different sorting orders
str, it has to be a matplotlib color or a case-sensitive matplotlib colormap name https://matplotlib.org/stable/tutorials/colors/colormaps.html
dict, keys are integers and values are colors. This is the best option when
sort_by
is an iterable of ints, e.g. [ii,jj], because you can pass only those colors here as {ii:”red”,jj:”blue”}
If None, the ‘tab10’ colormap (tableau) is chosen
shorten_AAs (bool, default is False) – Shorten residue labels from “GLU30”->”E30”
label_fontsize_factor (float, default is 1) – Labels will use the fontsize rcParams[“font.size”]*label_fontsize_factor
sum_freqs (bool, default is True) – Whether to sum per-contact frequencies and place them in the label as \(\Sigma\) values
defrag (char, default is None) – Whether to leave out the fragment affiliation, e.g. “GLU30@3.50” w/ defrag=”@” appears as “GLU30” only
stride (int, default is 1) – Stride the data down by this much, in case the computation of the violins takes too long
- Returns:
ax (
Axes
)
order (np.ndarray) – Indices of the plotted residue pairs, in the order in which they were plotted. It is the result of the combination of the above selection parameters.
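The ordering that sort_by=True produces in plot_violins can be sketched as follows. The helper violin_order is hypothetical and not part of mdciao; it assumes per-contact mean distances and, optionally, contact frequencies are already available, and it covers only the ordering, not the hiding of violins via truncate_at_mean.

```python
def violin_order(means, freqs=None, zero_freq=1e-2):
    """Hypothetical helper, not part of mdciao: plotting order for sort_by=True."""
    idxs = range(len(means))
    if freqs is None:
        # No ctc_cutoff_Ang given: ascending mean distance
        return sorted(idxs, key=lambda i: means[i])
    # Frequencies below zero_freq are treated as zero
    formed = [i for i in idxs if freqs[i] >= zero_freq]
    broken = [i for i in idxs if freqs[i] < zero_freq]
    # Formed contacts by descending frequency, the rest by ascending mean
    return (sorted(formed, key=lambda i: -freqs[i])
            + sorted(broken, key=lambda i: means[i]))


means = [4.0, 8.0, 3.0, 6.0]   # mean distances in Angstrom
freqs = [0.9, 0.0, 0.5, 0.0]   # contact frequencies for some cutoff
print(violin_order(means))         # [2, 0, 3, 1]
print(violin_order(means, freqs))  # [0, 2, 3, 1]
```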
- relabel_consensus(new_labels=None)
Relabel any residue missing its consensus label to shortAA
Alternative (or additional) labels can be given as a dictionary.
- Parameters:
new_labels (dict) – keyed with shortAA-codes and valued with the new desired labels
Warning
For expert use only. The changes in consensus labels propagate down to the low-level attribute
Residues.consensus_labels
of the
Residues
objects underlying each of the ContactPairs in this ContactGroup.
- relative_frequency_formed_atom_pairs_overall_trajs(ctc_cutoff_Ang, switch_off_Ang=None, **kwargs) list
Relative frequencies by interaction type (i.e. by atom type) for all contact-pairs in the ContactGroup
“Relative” means that they will sum up to 1 regardless of the contact’s frequency
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.relative_frequency_formed_atom_pairs_overall_trajs(4)
[{'SC-SC': 0.62, 'SC-BB': 0.21, 'BB-BB': 0.09, 'BB-SC': 0.08}
{'BB-BB': 0.74, 'SC-SC': 0.26}
{'SC-SC': 1.0}
{'BB-SC': 0.59, 'SC-SC': 0.41}
{'BB-SC': 0.73, 'SC-SC': 0.27}]
- Parameters:
ctc_cutoff_Ang (float) – Cutoff in Angstrom. The comparison operator is “<=”
switch_off_Ang (float, default is None) – TODO
kwargs (dict) – Optional parameters for
mdciao.ContactPair.relative_frequency_of_formed_atom_pairs_overall_trajs
, which are listed below.
- Other Parameters:
keep_resname (bool, default is False) – Keep the atom’s residue name in its descriptor. Only makes sense if aggregate_by_atomtype is False
aggregate_by_atomtype (bool, default is True) – Aggregate the frequencies of the contact by the atom types involved. Atom types are backbone, sidechain or other (BB, SC, X)
min_freq (float, default is .05) – Do not report relative frequencies below this cutoff, e.g. “BB-BB”:.9, “BB-SC”:0.03, “SC-SC”:0.03, “SC-BB”:0.03 gets reported as “BB-BB”:.9
- Returns:
refreq_dicts – List of dictionaries with the relative frequencies, keyed by the atom types (or atoms) involved in the contact. The order is the same as in
self.ctc_labels
- Return type:
list
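A minimal sketch of how relative frequencies and the min_freq filter interact, assuming hypothetical per-type counts of formed atom pairs for one contact. The helper relative_freqs is not part of mdciao; following the min_freq example above, the surviving values are not renormalized after filtering.

```python
def relative_freqs(counts, min_freq=0.05):
    """Hypothetical helper, not part of mdciao."""
    total = sum(counts.values())
    # "Relative": the values sum to 1 regardless of the contact's frequency
    rel = {key: n / total for key, n in counts.items()}
    # Report in descending order and drop entries below min_freq
    ranked = sorted(rel.items(), key=lambda kv: -kv[1])
    return {key: val for key, val in ranked if val >= min_freq}


# Made-up counts of frames in which each atom-type pair was the closest one
counts = {"BB-SC": 3, "BB-BB": 90, "SC-SC": 3, "SC-BB": 4}
print(relative_freqs(counts))                # {'BB-BB': 0.9}
print(relative_freqs(counts, min_freq=0.0))  # all four types, summing to 1
```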
- repframes(scheme='mode', ctc_cutoff_Ang=None, return_traj=False, show_violins=False, n_frames=1, verbose=True)
Find representative frames for this
ContactGroup
A “representative frame” means, in this context, a frame that minimizes the average distance to the modes (or means) of the residue-residue distances contained in this object.
Please note that “representative” can have other meanings in other contexts. Here, it’s just a way to pick frames/geometries that will most likely resemble most of what is also seen in the distributions, barplots, violinplots, and flareplots.
Please also note that minimizing averages has its own limitations and might not always yield the best result. However, it is the easiest and quickest to implement. Feel free to use any of Sklearn’s great regression tools under constraints to get a better “representative”.
- Parameters:
scheme (str, default is “mode”) – Four options:
“mode” : minimize average distance to the most likely distance, i.e. to the mode, i.e. to the distance values at which the distributions (
plot_distance_distributions
or
plot_violins
) peak. You can check the mode values in
modes
“mean” : minimize average distance to the mean values of the distances. You can check the means in
means
“min” : minimize average distance to the minimum values of the distances. You can check the minima in
minima
“max” : minimize average distance to the maximum values of the distances. You can check the maxima in
maxima
ctc_cutoff_Ang (float, default is None) – THIS IS EXPERIMENTAL If given, the contact frequencies will be used as weights when computing the average. In cases with many contacts, many of them broken, this might help
return_traj (bool, default is False) – If True, try to return also the
Trajectory
objects. Will fail if that is not possible because the original files aren’t accessible (or there weren’t any).
show_violins (bool, default is False) – Superimpose the distance values as dots on top of a violin plot, created by using
plot_violins
n_frames (int, default is 1) – The number of representative frames to return
verbose (bool, default is True) – Inform of the frames that are being selected
- Returns:
frames (list) – A list of n_frames tuples, each tuple containing the trajectory and frame index that minimize RMSDd.
RMSDd (np.ndarray) – A 1D array containing the root-mean-square-deviation (in Angstrom) over distances (not positions) of the returned frames to the computed reference as specified by the scheme. This mean is weighted by the contact frequencies in case a ctc_cutoff_Ang was given. Should always be in ascending order, i.e. the frames are sorted from closest to furthest from the reference.
values (np.ndarray) – A 2D array of shape(n_frames, n_ctcs) containing the distance values of the frames in Angstrom
trajs (
Trajectory
) – A
Trajectory
with n_frames frames. Only if return_traj=True
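The idea behind repframes can be sketched in plain Python for the unweighted “mean” scheme. The function repframes_sketch is hypothetical and not the mdciao implementation; it assumes each trajectory is given as a list of frames and each frame as a list of residue-residue distances in Angstrom.

```python
import math


def repframes_sketch(trajs, n_frames=1):
    """Hypothetical helper, not part of mdciao (scheme="mean", no weights)."""
    # Flatten to (traj_idx, frame_idx, distances) records
    records = [(ti, fi, frame)
               for ti, traj in enumerate(trajs)
               for fi, frame in enumerate(traj)]
    n_ctcs = len(records[0][2])
    # Reference: the mean of each residue-residue distance over all frames
    ref = [sum(rec[2][c] for rec in records) / len(records)
           for c in range(n_ctcs)]

    def rmsd(dists):
        # Root-mean-square deviation over distances, not positions
        return math.sqrt(sum((d - m) ** 2 for d, m in zip(dists, ref)) / n_ctcs)

    # Closest frames first
    ranked = sorted(records, key=lambda rec: rmsd(rec[2]))[:n_frames]
    frames = [(ti, fi) for ti, fi, _ in ranked]
    return frames, [rmsd(d) for _, _, d in ranked], [d for _, _, d in ranked]


trajs = [[[3.0, 4.0], [5.0, 6.0]],  # trajectory 0: two frames, two contacts
         [[4.0, 5.0]]]              # trajectory 1: one frame
frames, RMSDd, values = repframes_sketch(trajs, n_frames=2)
print(frames)  # [(1, 0), (0, 0)]
print(RMSDd)   # [0.0, 1.0]
```

As in the documented return values, the frames come back as (trajectory index, frame index) tuples sorted by ascending RMSDd.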
- property res_idxs_pairs: ndarray
Pairs of residue indices of the contacts in this object
- Returns:
res_idxs_pairs
- Return type:
np.ndarray
- property residue_names_long: list
Pairs of long residue names of the ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_long
[['ARG389', 'LEU394'], ['LEU394', 'LYS270'], ['LEU388', 'LEU394'], ['LEU394', 'LEU230'], ['ARG385', 'LEU394']]
- Returns:
residue_names_long
- Return type:
list
- property residue_names_short: list
Pairs of short residue names of the ContactPairs
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residue_names_short
[['R389', 'L394'], ['L394', 'K270'], ['L388', 'L394'], ['L394', 'L230'], ['R385', 'L394']]
- Returns:
residue_names_short
- Return type:
list
- property residx2consensuslabel: dict
Dictionary mapping residue indices to consensus labels:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2consensuslabel
{348: 'G.H5.21', 353: 'G.H5.26', 972: '6.32', 347: 'G.H5.20', 957: '5.69', 344: 'G.H5.17'}
- Returns:
residx2consensuslabel
- Return type:
dict
- residx2ctcidx(idx)
Indices of the contacts and the position (0 or 1) in which the residue with residue index
idx
appears
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.res_idxs_pairs
array([[348, 353],
       [353, 972],
       [347, 353],
       [353, 957],
       [344, 353]])
>>> CG.residx2ctcidx(347)
array([[2, 0]])
- Parameters:
idx (int) – A residue index
- Returns:
ctc_idxs – The first index is the contact index, the second the pair index (0 or 1)
- Return type:
2D np.ndarray of shape (N,2)
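The lookup performed by residx2ctcidx can be reproduced on plain lists. The function below is a hypothetical re-implementation for illustration only, not mdciao code; it returns nested lists rather than an np.ndarray and uses the residue pairs from the example above.

```python
def residx2ctcidx_sketch(res_idxs_pairs, idx):
    """Hypothetical re-implementation for illustration, not mdciao code."""
    # Each hit is [contact index, position of the residue in the pair]
    return [[ctc_idx, pos]
            for ctc_idx, pair in enumerate(res_idxs_pairs)
            for pos, residx in enumerate(pair)
            if residx == idx]


pairs = [[348, 353], [353, 972], [347, 353], [353, 957], [344, 353]]
print(residx2ctcidx_sketch(pairs, 347))  # [[2, 0]]
print(residx2ctcidx_sketch(pairs, 353))  # [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1]]
```

Residue 353 takes part in all five contacts, which is why the second call returns N=5 rows.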
- property residx2fragnamebest: dict
Dictionary mapping residue indices to best possible fragment names
“best” means consensus label > fragment name > fragment index
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2fragnamebest
{348: 'G.H5.21', 353: 'G.H5.26', 972: '6.32', 347: 'G.H5.20', 957: '5.69', 344: 'G.H5.17'}
- Returns:
residx2fragnamebest
- Return type:
dict
- residx2resnamefragnamebest(fragsep='@', shorten_AAs=True) dict
Dictionary mapping residue indices to best possible residue+fragment label
“best” means consensus label > fragment name > fragment index
- Parameters:
fragsep (str, default is “@”) – The str or char to separate residue labels from fragment labels, “A30@frag1”
shorten_AAs (bool, default is True) – Whether to use short residue names
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamefragnamebest()
{344: 'R385@G.H5.17', 347: 'L388@G.H5.20', 348: 'R389@G.H5.21', 353: 'L394@G.H5.26', 957: 'L230@5.69', 972: 'K270@6.32'}
- Returns:
residx2resnamefragnamebest
- Return type:
dict
- property residx2resnamelong: dict
Dictionary mapping residue indices to long residue names:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnamelong
{348: 'ARG389', 353: 'LEU394', 972: 'LYS270', 347: 'LEU388', 957: 'LEU230', 344: 'ARG385'}
- Returns:
residx2resnamelong
- Return type:
dict
- property residx2resnameshort
Dictionary mapping residue indices to short residue names:
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.residx2resnameshort
{348: 'R389', 353: 'L394', 972: 'K270', 347: 'L388', 957: 'L230', 344: 'R385'}
- Returns:
residx2resnameshort
- Return type:
dict
- retop(top, mapping, deepcopy=False)
Return a copy of this object with a different topology.
Uses the
mapping
to generate new residue-indices where necessary, using the rest of the attributes (time-traces, labels, colors, fragments…) as they were.
Wraps thinly around
mdciao.contacts.ContactPair.retop
Note
When re-topping interfaces, those residues of the ‘old’ interface_fragments which are not covered by the
mapping
will be missing in the ‘new’ interface_fragments. However, the new interface is guaranteed to have at least all the ‘new’ interface_residxs mapped. So, as long as the ‘old’ interface_residxs are covered by the mapping, this isn’t a problem (TODO except, perhaps, when plotting flareplots using the spare=”interface” option after re-topping)
- Parameters:
top (
Topology
) – The new topology
mapping (indexable (array, dict, list)) – A mapping of old residue indices to new residue indices. Usually, comes from aligning the old and the new topology using
mdciao.utils.sequence.maptops
.
deepcopy (bool, default is False) – Use
copy.deepcopy
on the attributes when creating the new ContactPair.
- Returns:
CG
- Return type:
- save(filename)
Save this
ContactGroup
as a pickle
- Parameters:
filename (str) – filename
- save_trajs(prepend_filename, ext, output_dir='.', t_unit='ps', verbose=False, ctc_cutoff_Ang=None, self_descriptor='mdciaoCG')
Save time-traces to disk.
Filenames will be created based on the property
self.trajlabels
, but using only the basenames and prepending with the string
prepend_filename
If there is an anchor residue (i.e. this
ContactGroup
is a neighborhood), the anchor will be included in the filename, otherwise the string “contact_group” will be used. You can control the output directory using
output_dir
If a ctc_cutoff_Ang is given, the time-traces will be binarized (see
self.binarize_trajs
). Else, the distances themselves are stored.
- Parameters:
prepend_filename (str) – Each filename will be prepended with this string
ext (str) – Extension, can be “xlsx” or anything
numpy.savetxt
can handle
output_dir (str, default is “.”) – The output directory
t_unit (str, default is “ps”) – Other units are “ns”, “mus”, and “ms”. The transformation happens internally
verbose (boolean, default is False) – Prints filenames
ctc_cutoff_Ang (float, default is None) – Use this cutoff and save bintrajs instead
self_descriptor (str, default is “mdciaoCG”) – Saved filenames will be tagged with this descriptor
- Return type:
None
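What “binarized” means for the saved time-traces can be sketched as below. The helper binarize is hypothetical and not part of mdciao; it assumes the distances are already in Angstrom and that the comparison is “<=”, as documented for ctc_cutoff_Ang elsewhere in this class.

```python
def binarize(distances_Ang, ctc_cutoff_Ang):
    """Hypothetical helper, not part of mdciao: 1 where formed, else 0."""
    return [int(d <= ctc_cutoff_Ang) for d in distances_Ang]


trace = [3.2, 4.0, 4.1, 6.5]        # made-up distances, in Angstrom
binary = binarize(trace, 4.0)
print(binary)                       # [1, 1, 0, 0]
print(sum(binary) / len(binary))    # 0.5, the contact frequency
```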
- select_by_frames(frames) ContactPair
Return a copy of this ContactGroup, but with a sub-selection of trajectories and frames. The returned ContactGroup has the same ContactPairs as the original.
- Parameters:
frames (int, dict, or iterable of pairs) – Control what frames of the trajectory data get used in the returned ContactGroups. Several modes of input are possible.
integer n: select the first n frames of each trajectory. If n is negative, then select the last n frames of each trajectory. If a trajectory has fewer than n frames, all frames are selected.
dict: keyed with trajectory indices, valued with a list of trajectory frames. E.g. if frames = {2 : [101,100], 0: [10, 20]}, then the new ContactGroup has two trajectories which consist of old trajectories 2 and 0, with the frames 101,100 and 10,20, respectively. The output order corresponds to the input order both in terms of keys and values of the input dictionary.
list of pairs of integers: individual frames of individual trajectories merged into a single ContactGroup, e.g.
>>> frames = [[i,j],
>>>           [k,l],
>>>           [m,n]]
- means the new ContactGroup has three frames
frame j of trajectory i
frame l of trajectory k
frame n of trajectory m
- Returns:
newCG – A new ContactGroup, equivalent to the original one but with only those trajectories and frames selected by frames
- Return type:
Note
Any trajectory filenames used to instantiate the original ContactGroup, which are stored in
ContactGroup.trajlabels
, are NOT passed onto the newCG returned by this method. This is because frame-indices of the time-traces contained in the newCG most likely do not correspond to the frame-indices of those original filenames. However, the methods of newCG are not aware of this and things like
ContactGroup.repframes
would otherwise return the wrong frames. Hence, the newCG always gets
mdtraj.Trajectory
objects as traj input and accordingly has [“mdtraj.00”, “mdtraj.01”…] as trajlabels. The same principle applies to the order of trajectories, i.e. if you reorder trajectories by passing a dict to frames, the newCG is not aware of the fact that these trajectories had a previous order. newCG has them stored (and readily available) as
Trajectory
objects and calls them [“mdtraj.00”, “mdtraj.01”…].
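The integer and dict modes of frames described for select_by_frames can be sketched on plain lists. The helper select_frames is hypothetical and not part of mdciao; it covers only these two input modes and represents each trajectory as a list of frame values.

```python
def select_frames(trajs, frames):
    """Hypothetical helper, not part of mdciao: int and dict modes only."""
    if isinstance(frames, int):
        # First n frames (n > 0) or last |n| frames (n < 0) of each trajectory;
        # slicing returns the whole trajectory when it is shorter than |n|
        return [t[:frames] if frames >= 0 else t[frames:] for t in trajs]
    # dict: output follows the order of the keys and of each list of frames
    return [[trajs[ti][fi] for fi in fis] for ti, fis in frames.items()]


trajs = [list(range(5)), list(range(10, 13))]  # 5 frames and 3 frames
print(select_frames(trajs, 4))                 # [[0, 1, 2, 3], [10, 11, 12]]
print(select_frames(trajs, -2))                # [[3, 4], [11, 12]]
print(select_frames(trajs, {1: [2, 0], 0: [4]}))  # [[12, 10], [4]]
```

The last call reorders the trajectories, which is the situation in which the returned object no longer carries the original trajectory labels.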
- select_by_residues(CSVexpression=None, residue_indices=None, residue_pairs=None, allow_multiple_matches=False, merge=True, keep_interface=True, n_residues=1)
Return a copy of this ContactGroup, but with a sub-selection of ContactGroup.contact_pairs based on residues. The returned ContactGroup has the same trajectories and frames as the original.
The filtering of ContactPairs is done using CSVexpression, residue_indices, or residue_pairs so that:
* one residue match per ContactPair is enough, or
* both residues of the ContactPair need to match
for the ContactPair to be selected for the new ContactGroup. See n_residues for more info.
CSVexpression, residue_indices, and residue_pairs are mutually exclusive, only one of them can be not None.
- Parameters:
CSVexpression (str or None, default is None) – CSV expression like “GLU30,K*,3.50” to select the residue-pairs of
self
for the new ContactGroup. See
mdciao.utils.residue_and_atom.find_AA
for the syntax of the expression.
residue_indices (list, default is None) – Input your selection via zero-indexed residue indices of self.top.
residue_pairs (list, default is None) – Input your selection via pairs of zero-indexed residue indices of self.top. Sets n_residues automatically to two.
allow_multiple_matches (bool, default is False) – If False, fail if the substrings of the
CSVexpression
return more than one residue. Protects from over-grabbing residues. Only has effect if CSVexpression is used, since residue_indices matches are unique.
merge (bool, default is True) – Merge the selected residue-pairs into one single ContactGroup. If False, every sub-string of
CSVexpression
returns its own ContactGroup.
keep_interface (bool, default is True) – If self.is_interface and merge are both True, then the returned ContactGroup will also be an interface itself.
n_residues (int, default is 1) – Number of residue-matches that a ContactPair has to have to be selected for the new ContactGroup. By default, one residue alone is enough. Using n_residues = 2 selects only ContactPairs where both residues match against CSVexpression, residue_indices, or residue_pairs. This is useful when trying to keep interface properties. Any n_residues value other than 1 or 2 will raise an error.
- Returns:
newCG – If dict, it’s keyed with substrings of CSVexpression and valued with ContactGroups
- Return type:
ContactGroup or dict
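The effect of n_residues in select_by_residues can be sketched with residue indices only. The helper select_pairs is hypothetical and not part of mdciao; it returns the indices of the residue pairs that would be kept when the selection is given as residue_indices.

```python
def select_pairs(res_idxs_pairs, residue_indices, n_residues=1):
    """Hypothetical helper, not part of mdciao: indices of the kept pairs."""
    if n_residues not in (1, 2):
        raise ValueError("n_residues must be 1 or 2")
    wanted = set(residue_indices)
    # Keep a pair if at least n_residues of its two residues are selected
    return [i for i, pair in enumerate(res_idxs_pairs)
            if sum(r in wanted for r in pair) >= n_residues]


pairs = [[348, 353], [353, 972], [347, 353], [353, 957], [344, 353]]
print(select_pairs(pairs, [347, 353]))                # [0, 1, 2, 3, 4]
print(select_pairs(pairs, [347, 353], n_residues=2))  # [2]
```

With n_residues=1 the shared residue 353 alone pulls in every pair, whereas n_residues=2 keeps only the pair in which both residues were selected.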
The index of the anchor residue, i.e. the residue at the center of this neighborhood
Only populated if self.is_neighborhood is True, else returns None
- Returns:
idx
- Return type:
int
- property stacked_time_traces
All ContactPair time_traces stacked into a 2D np.array
- Returns:
data – The array is of shape(self.n_frames_total, self.n_ctcs)
- Return type:
np.ndarray
- property time_arrays: list
The time-arrays of each trajectory contained in this ContactGroup
- Returns:
time_arrays – The units of these arrays will be whatever was given to the ContactPairs used to instantiate this ContactGroup
- Return type:
list
- property time_max: float
Maximum time-value of the ContactGroup
- Returns:
time_max – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that’s how time arrays are stored in xtc files
- Return type:
float
- property time_min: float
Minimum time-value of the ContactGroup
- Returns:
time_min – Its units will be whatever was given to the ContactPairs used to instantiate this ContactGroup. The most frequent case is “ps”, since that’s how time arrays are stored in xtc files
- Return type:
float
- to_ContactGroups_per_traj() dict
Break this ContactGroup (potentially containing many trajectories) into individual, per-trajectory ContactGroups
- Returns:
CGs – The dictionary is keyed with each of the original
self.trajlabels
, and valued with ContactGroups that only contain information regarding that single trajectory.
- Return type:
dict
Note
The attribute
mdciao.contacts.ContactGroup.trajlabels
of the returned, n-th CG will necessarily only contain one trajectory label. In case the original labels were strings containing pathnames, that name will coincide with the n-th original trajlabel. On the contrary, in case it contained a placeholder name created on-the-fly (e.g. ‘mdtraj.01’) because no pathnames were originally known, but rather
mdtraj.Trajectory
objects were passed as trajs, that placeholder-name gets re-set to mdtraj.00 since each returned CG only “knows” one traj and it’s necessarily the first one.
- property top
The topology used to instantiate the ContactPairs in this ContactGroup
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.top
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>
- Returns:
top
- Return type:
Topology or None
- property topology
The topology used to instantiate the ContactPairs in this ContactGroup
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.topology
<mdtraj.Topology with 1 chains, 1044 residues, 8384 atoms, 8502 bonds at 0x7efdae47e990>
- Returns:
topology
- Return type:
Topology or None
- property trajlabels: list
List of trajectory labels shared by all
ContactGroup.contact_pairs
.
If
Trajectory
objects were passed originally to the underlying
ContactGroup.contact_pairs
, then [“mdtraj.00”, “mdtraj.01”,…] descriptors will be used. If filenames were passed, then the trajlabels are the filenames (basename only, no directories) without the extension. If no labels and no trajectories were passed, then labels like [“traj 0”, “traj 1”,…] are used.
>>> CG = mdciao.examples.ContactGroupL394()
>>> CG.trajlabels
['gs-b2ar.noH.stride.5']
- Returns:
trajlabels
- Return type:
list
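How a filename turns into a trajlabel (basename without the extension) can be sketched with the standard library. The helper trajlabel_from_filename is hypothetical and not part of mdciao, and the directory and the .xtc extension in the example path are assumptions chosen to match the label shown above.

```python
import os.path


def trajlabel_from_filename(filename):
    """Hypothetical helper, not part of mdciao: basename without extension."""
    # Drop the directories first, then only the last extension
    return os.path.splitext(os.path.basename(filename))[0]


# Made-up path; only the resulting label appears in the documentation
print(trajlabel_from_filename("/some/dir/gs-b2ar.noH.stride.5.xtc"))
# gs-b2ar.noH.stride.5
```

Only the last extension is removed, so the dots inside the name are preserved.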