mdciao.plots.compare_violins
- mdciao.plots.compare_violins(groups, colors=None, ctc_cutoff_Ang=None, fontsize=16, mutations_dict={}, legend_rows=4, AA_format='short', defrag='@', anchor=None, ymax=None, key_separator='-', sort_by='mean', figsize=None, panelheight_inches=5, inch_per_contacts=1, zero_freq=0.01, remove_identities=False, identity_cutoff=1, representatives=None)
Plot all distance distributions of several ContactGroup objects together using violin plots.
Contacts across different groups are grouped together by matching their contact labels, since the residue indices might differ across groups. To achieve this:
- “K30-D40” is considered equivalent to “D40-K30”; use key_separator to change this.
- “K30-D40” is considered equivalent to “K30-E40” if mutations_dict={"E40":"D40"} is passed.
- “K30@3.50-D40” is considered equivalent to “K30-D40” if you defragment your labels using defrag="@".
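The three matching rules can be illustrated with a small, self-contained sketch. The helper `unify_label` is hypothetical and is not mdciao's implementation (the library delegates this to unify_freq_dicts); it only mimics the behaviour described here, assuming residues are sorted alphabetically to make the label order-independent.

```python
def unify_label(label, key_separator="-", mutations_dict=None, defrag="@"):
    """Return a canonical form of a contact label (illustrative sketch only)."""
    mutations_dict = mutations_dict or {}
    if key_separator is None:
        # Without a separator the label is not split, so nothing is unified
        return label
    cleaned = []
    for res in label.split(key_separator):
        if defrag is not None:
            res = res.split(defrag)[0]       # "K30@3.50" -> "K30"
        res = mutations_dict.get(res, res)   # "E40" -> "D40" if requested
        cleaned.append(res)
    # Sorting makes "K30-D40" and "D40-K30" collapse onto the same key
    return key_separator.join(sorted(cleaned))


print(unify_label("K30-D40") == unify_label("D40-K30"))
print(unify_label("K30-E40", mutations_dict={"E40": "D40"}) == unify_label("K30-D40"))
print(unify_label("K30@3.50-D40") == unify_label("K30-D40"))
```

Under these assumptions all three comparisons print True, whereas passing key_separator=None keeps “K30-D40” and “D40-K30” apart.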
- Parameters:
groups (dictionary or list of ContactGroup objects) – The keys are the system/setup descriptors, e.g. “WT”, “MUT” etc. If a list is passed, keys will be generated on the fly: “mdcCG 0, mdcCG 1…”
colors (iterable (list or dict), or str, default is None) –
If list, the colors will be assigned in the same order as groups.
If dict, it has to have the same keys as groups.
If str, it has to be a case-sensitive colormap name of matplotlib: https://matplotlib.org/stable/tutorials/colors/colormaps.html
If None, the ‘tab10’ colormap (tableau) is chosen if 10 or fewer colors are needed, and ‘tab20’ if more than 10 are needed.
ctc_cutoff_Ang (float, default is None) – If provided, draw a horizontal line across the panel at this distance value. It will also be used by representatives, which passes it on to mdciao.contacts.ContactGroup.repframes; see below for more info.
fontsize (int, default is 16) – Will be used in rcParams[“font.size”].
panelheight_inches (int, default is 5) – The height of the panel, in inches. Determines the figure size if figsize is None, else has no effect.
inch_per_contacts (int, default is 1) – How many inches each contact pair is given in the panel. Determines the figure size if figsize is None, else has no effect.
figsize (None or iterable of len 2, default is None) – Figure size (x,y), in inches. If None, one will be created using panelheight_inches and inch_per_contacts. If you are transposing the figure using vertical_plot, you do not have to invert this parameter to (y,x) here; it is done automatically.
mutations_dict (dictionary, default is {}) – A mutation dictionary that allows plotting together residues that would otherwise be identified as different contacts. If there were two mutations, e.g. A30K and D35A, the mutation dictionary will be {“A30”:“K30”, “D35”:“A35”}. You can also use this parameter for correcting indexing offsets, e.g. {“GDP395”:“GDP”, “GDP396”:“GDP”}.
legend_rows (int, default is 4) – The maximum number of rows per column of the legend. If you have 10 systems, legend_rows=5 means you’ll get two columns, legend_rows=2 means you’ll get five.
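The arithmetic behind legend_rows can be written down in one line. This is a sketch under the assumption that the legend uses the smallest number of columns that keeps every column within legend_rows entries; the function name `legend_columns` is made up for illustration.

```python
import math


def legend_columns(n_systems, legend_rows=4):
    """Smallest number of legend columns with at most legend_rows entries each."""
    return math.ceil(n_systems / legend_rows)


print(legend_columns(10, legend_rows=5))  # two columns
print(legend_columns(10, legend_rows=2))  # five columns
```

With the default legend_rows=4, ten systems would need three columns under this assumption.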
AA_format (str, default is “short”) – See frequency_dict for more info.
defrag (str, default is “@”) – See unify_freq_dicts for more info.
anchor (str, default is None) – When str, e.g. “L394”, that residue is eliminated from the contact labels. It is also checked that all ContactGroup objects are indeed neighborhoods sharing this anchor, i.e., some sanity checks are carried out.
ymax (float, default is None) – Maximum value of the y-axis; the default is to set it automatically.
key_separator (str, default is “-”) – How each contact label separates the pair of residues, e.g. the “-” in “ALA50-GLU30”. If you set this to None, the label won’t be split before matching, and “ALA50-GLU30” will be different from “GLU30-ALA50”.
sort_by (str or list, default is ‘mean’) – By default, the violins are sorted by ascending order of mean distance, i.e. from most “formed” on the left of the plot to least “formed” on the right of the plot. However, for each residue pair, this mean is an average over the distance in all the different groups, so some heterogeneity is expected. Alternatively, you can sort using the contact labels, regardless of the distance values. Note that for this, string comparisons between contact labels will take place, and that contact labels are altered by key_separator to unify them across different groups. Try setting key_separator to None if you see unexpected behavior, although this might have other side effects (see unify_freq_dicts). sort_by can be a:
- str ‘residue’ or ‘numeric’
Sort by ascending residue sequence index (resSeq), which will be inferred from each contact label, e.g. 30 for “GLU30@3.50”. See gen_ctc_labels for more info on how they are generated. Internally, the order is generated via lexsort_ctc_labels. If you want to reverse or alter this ascending default order, we recommend using lexsort_ctc_labels before calling compare_violins and using its output (labels) as a list argument for sort_by. Also note that residue indices as contained in res_idx_pairs
- str ‘keep’
Sort using the same order of the labels as in the first contact group.
- str ‘consensus’
Sort following consensus nomenclature (GPCR, CGN or KLIFS).
- list
A list of contact labels, e.g. [“GLU30-ALA30”, “ARG131@3.50-TYR20”]. Only these residue pairs (in this order) will be shown, regardless of what other pairs are contained in the groups. It assumes the user knows what contacts are present and can come up with a meaningful list. Not all labels need to be in all groups, nor do all groups have to contain all labels, but at least one label needs to match; otherwise the method will fail.
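To make the ‘residue’ ordering concrete, here is a rough sketch of how residue sequence indices could be inferred from labels and used as a sort key. It is not lexsort_ctc_labels; `resseq_key` is a hypothetical helper that assumes labels of the form “GLU30@3.50-ALA50” with the default key_separator and defrag characters.

```python
import re


def resseq_key(label, key_separator="-", defrag="@"):
    """Sort key: tuple of residue sequence indices found in a contact label."""
    key = []
    for res in label.split(key_separator):
        name = res.split(defrag)[0]        # drop the fragment: "GLU30@3.50" -> "GLU30"
        match = re.search(r"\d+", name)    # first run of digits is taken as resSeq
        key.append(int(match.group()) if match else float("inf"))
    return tuple(key)


labels = ["ARG131@3.50-TYR20", "GLU30@3.50-ALA50", "GLU30-ALA35"]
by_residue = sorted(labels, key=resseq_key)
print(by_residue)
```

A list ordered this way, or its reverse via `list(reversed(by_residue))`, is the kind of explicit label list that sort_by accepts.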
zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero and not shown if they are zero for the same residue pair across all groups. For this parameter to have an effect, you need to provide a ctc_cutoff_Ang.
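The filtering rule can be sketched on a plain dictionary of frequencies. The nested layout freqs[system][label] and the function name `drop_zero_pairs` are assumptions made for illustration; the sketch only encodes the rule stated here: a residue pair is dropped when its frequency is below zero_freq in every group.

```python
def drop_zero_pairs(freqs, zero_freq=0.01):
    """Keep only labels whose frequency reaches zero_freq in at least one group."""
    all_labels = []
    for per_group in freqs.values():
        for label in per_group:
            if label not in all_labels:
                all_labels.append(label)
    # A label missing from a group counts as frequency 0.0 there
    keep = [
        label for label in all_labels
        if any(per_group.get(label, 0.0) >= zero_freq for per_group in freqs.values())
    ]
    return {
        system: {label: f for label, f in per_group.items() if label in keep}
        for system, per_group in freqs.items()
    }


freqs = {
    "WT": {"K30-D40": 0.9, "A10-L20": 0.005},
    "MUT": {"K30-D40": 0.0, "A10-L20": 0.0},
}
filtered = drop_zero_pairs(freqs)
print(filtered)
```

Here “K30-D40” survives in both groups because it is formed in “WT”, while “A10-L20” is below the threshold everywhere and is removed.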
remove_identities (bool, default is False) – If True, the contacts where freq[sys][ctc] >= identity_cutoff across all systems will not be plotted nor considered in the sum over contacts. Only has an effect if ctc_cutoff_Ang is not None.
identity_cutoff (float, default is 1) – If remove_identities is True, use this value to define what is considered an identity, s.t. contacts with values of e.g. .95 can also be removed. Only has an effect if ctc_cutoff_Ang is not None.
representatives (bool, int, dict, default is None) – Include information about representative values in the plot. This can be done in several ways. The easiest is to let this method call mdciao.contacts.ContactGroup.repframes internally. This will locate representative frames, extract their residue-residue distance values and plot them as small dots on top of the violins. When possible, the geometries corresponding to these frames will also be returned. Alternatively, the user can directly input a dictionary of Trajectory objects (representative or not) for which the residue-residue distance values will be computed and plotted, or, even more directly, input a number of values (representative or not) to be plotted. This last type of input (dictionary with Trajectory objects or arrays of values) can be 1) mixed (some groups get values, some trajectories) and 2) incomplete (groups w/o entry in representatives simply won’t get “dots” shown).
Check the docs of mdciao.contacts.ContactGroup.repframes to find out what is meant by “representative”.
This is what each type of input does:
- boolean True:
Calls mdciao.contacts.ContactGroup.repframes with the method’s default parameters.
- int > 0:
Calls mdciao.contacts.ContactGroup.repframes with the parameter n_frames set to this integer. This parameter controls how many representatives are extracted and subsequently plotted.
- dict of parameters:
A dictionary with explicit values for the optional parameters of mdciao.contacts.ContactGroup.repframes, usually n_frames (an int) and scheme (“mean” or “mode”), depending on what you mean by “representative”. Check the method’s documentation for more info. The value passed as ctc_cutoff_Ang will also be passed.
- dict of Trajectory objects:
Has to have the same keys as groups. No checks are done whether these objects match the actual molecular topologies of groups, so beware of potential mismatches here. Typically, these frames come from having used mdciao.contacts.ContactGroup.repframes with return_traj=True.
- dict containing np.ndarrays of shape (M, N):
M is the number of values and N is the number of contacts. M can have different values for each of the groups, and N needs to match n_ctcs of each group and be in the same order as the group’s res_idxs_pairs. Rearrangements due to sort_by will sort this array automatically; it just has to be in the order of res_idxs_pairs initially (no other checks are done).
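The first three input types (True, a positive int, a dict of parameters) all end up as keyword arguments for a repframes call. The sketch below shows one way such a dispatch could look. It is not mdciao's code: `repframes_kwargs` is a hypothetical helper, it does not cover dictionaries of Trajectory objects or arrays, and it assumes the cutoff is forwarded under the keyword name ctc_cutoff_Ang whenever one is given.

```python
def repframes_kwargs(representatives, ctc_cutoff_Ang=None):
    """Translate `representatives` into keyword arguments for a repframes-like call.

    Returns None when no representatives are requested.
    """
    if representatives is None or representatives is False:
        return None
    if representatives is True:
        # Must be checked before the int branch: in Python, True is also an int
        kwargs = {}
    elif isinstance(representatives, int):
        if representatives <= 0:
            raise ValueError("an integer value for representatives must be > 0")
        kwargs = {"n_frames": representatives}
    elif isinstance(representatives, dict):
        kwargs = dict(representatives)  # copy, so the caller's dict stays untouched
    else:
        raise TypeError(f"unsupported type {type(representatives).__name__}")
    if ctc_cutoff_Ang is not None:
        kwargs["ctc_cutoff_Ang"] = ctc_cutoff_Ang  # assumed keyword name
    return kwargs


print(repframes_kwargs(True))
print(repframes_kwargs(3))
print(repframes_kwargs({"n_frames": 2, "scheme": "mean"}, ctc_cutoff_Ang=4.0))
```

The order of the checks matters: testing for True before testing for int avoids treating the boolean as n_frames=1.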
- Returns: