mdciao.plots.compare_violins
-
mdciao.plots.compare_violins(groups, colors=None, ctc_cutoff_Ang=None, fontsize=16, mutations_dict={}, legend_rows=4, AA_format='short', defrag='@', anchor=None, ymax=None, key_separator='-', sort_by='mean', figsize=None, panelheight_inches=5, inch_per_contacts=1, zero_freq=0.01, remove_identities=False, identity_cutoff=1, representatives=None)

Plot all distance-distributions of several ContactGroups together using violinplots.

Contacts across different groups are grouped together by matching their contact labels, since the residue indices might differ across groups. To achieve this:
- “K30-D40” is considered equivalent to “D40-K30”; use key_separator to change this.
- “K30-D40” is considered equivalent to “K30-E40” if mutations_dict={"E40":"D40"} is passed.
- “K30@3.50-D40” is considered equivalent to “K30-D40” if you defragment your labels using defrag="@".
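The three matching rules above can be illustrated with a small, self-contained sketch. `unify_label` is a hypothetical helper written for this page, not mdciao's implementation; in particular, the order in which defragmenting and renaming are applied, and sorting the two residues alphabetically to make the key order-independent, are assumptions.

```python
# Hypothetical helper, NOT mdciao's implementation: a sketch of the
# label-matching rules (order, mutations, defragmenting) described above.
def unify_label(label, key_separator="-", mutations=None, defrag="@"):
    """Return a normalized key for a contact label such as "K30@3.50-E40"."""
    mutations = mutations or {}
    if key_separator is None:
        # No splitting: in this sketch the label is compared as-is,
        # so "ALA50-GLU30" and "GLU30-ALA50" stay different
        return label
    residues = []
    for res in label.split(key_separator):
        if defrag is not None:
            # Drop fragment/consensus info, e.g. "K30@3.50" -> "K30"
            res = res.split(defrag)[0]
        # Rename mutated (or offset) residues, e.g. "E40" -> "D40"
        res = mutations.get(res, res)
        residues.append(res)
    # Sort the residues so that "K30-D40" and "D40-K30" give the same key
    return key_separator.join(sorted(residues))


print(unify_label("K30@3.50-E40", mutations={"E40": "D40"}))
```

Sorting the residue names is just one way of making the key independent of the pair's order; a sorted tuple would work equally well.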
- Parameters
groups (dictionary or list of ContactGroup-objects) – If dict, the keys are the system/setup descriptors, e.g. “WT”, “MUT” etc. If list, keys will be generated on the fly: “mdcCG 0”, “mdcCG 1”, …
colors (iterable (list or dict), or str, default is None) – If list, the colors will be assigned in the same order as groups. If dict, it has to have the same keys as groups. If str, it has to be a case-sensitive matplotlib colormap name: https://matplotlib.org/stable/tutorials/colors/colormaps.html. If None, the ‘tab10’ colormap (tableau) is chosen.
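The rules for groups and colors can be sketched as follows. `normalize_groups_and_colors` and the `TAB10` list are hypothetical, written for illustration only. `TAB10` holds the hex codes of matplotlib's ‘tab10’ palette, and cycling through them when there are more than ten groups is an assumption. Resolving a colormap name (the str case) would need matplotlib and is left out.

```python
# Hypothetical helper, not part of mdciao: a sketch of how the `groups` and
# `colors` arguments could be normalized into two dicts with the same keys.
TAB10 = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
         "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]


def normalize_groups_and_colors(groups, colors=None):
    if isinstance(groups, list):
        # List input: keys are generated on the fly, "mdcCG 0", "mdcCG 1", ...
        groups = {f"mdcCG {ii}": group for ii, group in enumerate(groups)}
    keys = list(groups)
    if colors is None:
        # ASSUMPTION: cycle through the 'tab10' hex codes if there are >10 groups
        colors = [TAB10[ii % len(TAB10)] for ii in range(len(keys))]
    if isinstance(colors, str):
        # A matplotlib colormap name would be resolved here (not covered)
        raise NotImplementedError("colormap names are not handled in this sketch")
    if isinstance(colors, dict):
        if set(colors) != set(keys):
            raise ValueError("colors must have the same keys as groups")
        # Re-order the colors so they follow the order of the groups
        return groups, {key: colors[key] for key in keys}
    colors = list(colors)
    if len(colors) < len(keys):
        raise ValueError("not enough colors for the number of groups")
    # Any other iterable: assign colors in the same order as the groups
    return groups, dict(zip(keys, colors))
```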
ctc_cutoff_Ang (float, default is None) – If provided, draw a horizontal line across the panel at this distance value.
fontsize (int, default is 16) – Will be used in rcParams["font.size"].
panelheight_inches (int, default is 5) – The height of the panel, in inches. Determines the figure size if figsize is None, else has no effect.
inch_per_contacts (int, default is 1) – How many inches each contact-pair is given in the panel. Determines the figure size if figsize is None, else has no effect.
figsize (None or iterable of len 2, default is None) – Figure size (x,y), in inches. If None, one will be created using panelheight_inches and inch_per_contacts. If you are transposing the figure using vertical_plot, you do not have to invert (y,x) this parameter here; it is done automatically.
mutations_dict (dictionary, default is {}) – A mutation dictionary that allows plotting together residues that would otherwise be identified as different contacts. If there were two mutations, e.g. A30K and D35A, the mutation dictionary would be {"A30":"K30", "D35":"A35"}. You can also use this parameter for correcting indexing offsets, e.g. {"GDP395":"GDP", "GDP396":"GDP"}.
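The interplay between panelheight_inches, inch_per_contacts and figsize can be sketched as below. `default_figsize` is hypothetical, and the formula (width = number of contact pairs × inch_per_contacts, height = panelheight_inches) is an assumption inferred from the parameter descriptions, not taken from mdciao's code.

```python
# Hypothetical helper; the formula below is an ASSUMPTION about how the
# default size is derived, not mdciao's actual code.
def default_figsize(n_contacts, panelheight_inches=5, inch_per_contacts=1,
                    figsize=None, vertical=False):
    if figsize is None:
        # One slot of `inch_per_contacts` inches per contact pair along x,
        # and a panel of `panelheight_inches` inches along y
        figsize = (n_contacts * inch_per_contacts, panelheight_inches)
    width, height = figsize
    # For a transposed (vertical) plot, (x, y) is swapped automatically
    return (height, width) if vertical else (width, height)
```

An explicit figsize always wins over the two per-panel parameters, which is why they "have no effect" once figsize is given.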
legend_rows (int, default is 4) – The maximum number of rows per column of the legend. If you have 10 systems, legend_rows=5 means you’ll get two columns, legend_rows=2 means you’ll get five.
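The arithmetic behind legend_rows is a ceiling division, as a minimal sketch shows. `legend_columns` is a hypothetical helper; that mdciao computes the column count exactly this way is an assumption.

```python
import math


def legend_columns(n_systems, legend_rows=4):
    # Each legend column holds at most `legend_rows` entries, so the
    # number of columns is the ceiling of n_systems / legend_rows
    return math.ceil(n_systems / legend_rows)


print(legend_columns(10, 5), legend_columns(10, 2))
```

A value like this is what one would pass as `ncol` to matplotlib's `Axes.legend`.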
AA_format (str, default is "short") – See frequency_dict for more info.
defrag (str, default is "@") – See unify_freq_dicts for more info.
anchor (str, default is None) – When str, e.g. “L394”, that residue is eliminated from the contact labels. It is also checked that all ContactGroup-objects are indeed neighborhoods sharing this anchor, i.e., some sanity checks are carried out.
ymax (float, default is None) – Maximum value of the y-axis. By default, it is set automatically.
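What the anchor parameter does to a single contact label can be sketched as follows, assuming already-defragmented labels and the default key_separator. `strip_anchor` is a hypothetical helper, not an mdciao function, and its error check is only a stand-in for the sanity checks mentioned above.

```python
# Hypothetical helper, not an mdciao function: remove the anchor residue
# from a neighborhood contact label such as "K30-L394".
def strip_anchor(label, anchor, key_separator="-"):
    residues = label.split(key_separator)
    if anchor not in residues:
        # Sanity check: every contact of a neighborhood must involve the anchor
        raise ValueError(f"{label!r} does not contain anchor {anchor!r}")
    residues.remove(anchor)
    return key_separator.join(residues)


print(strip_anchor("K30-L394", "L394"))
```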
key_separator (str, default is "-") – How each contact label separates the pair of residues, e.g. “ALA50-GLU30”. If you set this to None, the label won’t be split before matching, and “ALA50-GLU30” will be different from “GLU30-ALA50”.
sort_by (str or list, default is 'mean') –
By default, the violins are sorted by ascending order of mean distance, i.e. from most “formed” on the left of the plot to least “formed” on the right of the plot. However, for each residue pair, this mean is an average over the distances in all the different groups, so some heterogeneity is expected. Alternatively, you can sort using the contact labels, regardless of the distance values. Note that, for this, string comparisons between contact labels will take place, and that contact labels are altered by key_separator to unify them across different groups.
Try setting key_separator to None if you see unexpected behavior, although this might have other side effects (see mdciao.utils.str_and_dict.unify_freq_dicts). sort_by can be a:
- str: ‘residue’. Sort by ascending residue sequence index (resSeq), which will be inferred from each contact label, e.g. 30 for “GLU30@3.50”. See gen_ctc_labels for more info on how they are generated. Internally, the order is generated via lexsort_ctc_labels. If you want to reverse or alter this ascending default order, we recommend using lexsort_ctc_labels before calling compare_violins and using its output (sorted_ctc_labels) as a list argument for sort_by. Also note that residue indices as contained in res_idx_pairs
- list: a list of contact labels, e.g. [“GLU30-ALA30”, “ARG131@3.50-TYR20”]. Only these residue pairs (in this order) will be shown, regardless of what other pairs are contained in the groups. It assumes the user knows what contacts are present and can come up with a meaningful list. Not all labels need to be in all groups, nor do all groups have to contain all labels, but at least one label needs to match, otherwise the method will fail.
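The three sort_by modes can be sketched on plain Python data. `sort_labels` is hypothetical and uses an assumed data layout ({group_key: {contact_label: [distances]}}, with labels already unified). Averaging per-group means for 'mean', and taking the first run of digits as the resSeq for 'residue', are assumptions. mdciao itself uses lexsort_ctc_labels for the latter.

```python
import re
from statistics import mean


def sort_labels(dists, sort_by="mean"):
    """Order contact labels for plotting (hypothetical sketch).

    dists: {group_key: {contact_label: [distance values]}}, with labels
    assumed to be already unified across groups.
    """
    # Collect all labels, keeping first-seen order
    all_labels = []
    for per_group in dists.values():
        for label in per_group:
            if label not in all_labels:
                all_labels.append(label)

    if isinstance(sort_by, str):
        if sort_by == "mean":
            def key(label):
                # ASSUMPTION: average of the per-group means, over the
                # groups that contain this label
                return mean([mean(d[label]) for d in dists.values() if label in d])
        elif sort_by == "residue":
            def key(label):
                # "GLU30@3.50-ALA50" -> (30, 50); assumes the first run of
                # digits in each residue name is its resSeq
                return tuple(int(re.search(r"\d+", res).group())
                             for res in label.split("-"))
        else:
            raise ValueError(f"unknown sort_by {sort_by!r}")
        return sorted(all_labels, key=key)

    # A list: show only the requested labels, in the requested order
    shown = [label for label in sort_by if label in all_labels]
    if not shown:
        raise ValueError("none of the labels in sort_by are present in the groups")
    return shown
```

The list branch mirrors the recommendation above: compute an order once, reverse or edit it as needed, and pass the resulting list as sort_by.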
zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero and not shown if they are zero for the same residue pair across all groups. For this parameter to have effect, you need a ctc_cutoff_Ang.
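Why zero_freq needs ctc_cutoff_Ang becomes clearer in a sketch: a contact frequency only exists once a cutoff is given. Both helpers below are hypothetical and use the same assumed {group_key: {contact_label: [distances]}} layout. Counting a frame as "in contact" when its distance is strictly below the cutoff is an assumption.

```python
def contact_frequency(distances, ctc_cutoff_Ang):
    # Fraction of frames in which the distance is below the cutoff
    # (ASSUMPTION: strict "<"; the exact comparison is not documented here)
    return sum(d < ctc_cutoff_Ang for d in distances) / len(distances)


def pairs_to_show(dists, ctc_cutoff_Ang, zero_freq=1e-2):
    """dists: {group_key: {contact_label: [distances]}} (hypothetical layout)."""
    labels = {label for per_group in dists.values() for label in per_group}
    keep = set()
    for label in labels:
        freqs = [contact_frequency(d[label], ctc_cutoff_Ang)
                 for d in dists.values() if label in d]
        # Hide the pair only if its frequency is "zero" in every group
        if any(f >= zero_freq for f in freqs):
            keep.add(label)
    return keep
```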
remove_identities (bool, default is False) – If True, the contacts where freq[sys][ctc] >= identity_cutoff across all systems will neither be plotted nor considered in the sum over contacts.
identity_cutoff (float, default is 1) – If remove_identities is True, use this value to define what is considered an identity, such that contacts with values of e.g. 0.95 can also be removed.
representatives (anything (bool, int, dict, list), default is None) –
Plot, with a small dot on top of the violins, the values of the residue-residue distances of representative geometries. The representative geometries can be passed directly as a dict of Trajectory objects, or extracted on-the-fly by calling the mdciao.contacts.ContactGroup.repframes method of each of the groups. Check the docs of mdciao.contacts.ContactGroup.repframes to find out what is meant by “representative”. This is what each type of input does:
- boolean True: Calls mdciao.ContactGroup.repframes with the method’s default parameters and plots the result.
- int > 0: Calls mdciao.ContactGroup.repframes with the parameter n_frames set to this integer. This parameter controls how many representatives are extracted and subsequently plotted.
- dict of parameters: A dictionary with explicit values for the optional parameters of mdciao.contacts.ContactGroup.repframes, usually n_frames (an int) and scheme (“mean” or “mode”), depending on what you mean by “representative”. Check the method’s documentation for more info.
- dict of Trajectory objects: Has to have the same keys as groups. No checks are done whether these objects match the actual molecular topologies of groups, so beware of potential mismatches here. Typically, these frames come from having used mdciao.contacts.ContactGroup.repframes with return_traj=True.
- dict of dicts containing values: not implemented yet.
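The type-based dispatch of representatives can be sketched as below. `repframes_kwargs` is a hypothetical helper that only builds the keyword arguments (n_frames, scheme) one would hand to a repframes call. It assumes every dict is a dict of parameters; a dict of Trajectory objects, which skips repframes altogether, is not covered here.

```python
def repframes_kwargs(representatives):
    """Map the `representatives` argument to keyword arguments for a
    repframes-like call (hypothetical sketch); None means "plot nothing"."""
    if representatives is None or representatives is False:
        return None
    # bool must be tested before int: in Python, True is also an int
    if representatives is True:
        return {}
    if isinstance(representatives, int):
        if representatives <= 0:
            raise ValueError("an int value for representatives must be > 0")
        return {"n_frames": representatives}
    if isinstance(representatives, dict):
        # ASSUMPTION: treated as a dict of optional parameters,
        # e.g. {"n_frames": 5, "scheme": "mode"}
        return dict(representatives)
    raise TypeError(f"unsupported type {type(representatives).__name__}")
```

Checking `True` before `int` matters because `isinstance(True, int)` is true in Python, so a plain int check would turn `True` into `n_frames=1`.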
- Returns