mdciao.plots.compare_violins

mdciao.plots.compare_violins(groups, colors=None, ctc_cutoff_Ang=None, fontsize=16, mutations_dict={}, legend_rows=4, AA_format='short', defrag='@', anchor=None, ymax=None, key_separator='-', sort_by='mean', figsize=None, panelheight_inches=5, inch_per_contacts=1, zero_freq=0.01, remove_identities=False, identity_cutoff=1, representatives=None)

Plot all distance-distributions of several ContactGroups together using violinplots

Contacts across different groups are grouped together by matching their contact labels, since the residue indices might differ across groups. To achieve this:

  • “K30-D40” is considered equivalent to “D40-K30”; use key_separator to change this.

  • “K30-D40” is considered equivalent to “K30-E40” if a mutations_dict={"E40":"D40"} is passed

  • “K30@3.50-D40” is considered equivalent to “K30-D40” if you defragment your labels using defrag="@"
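The three matching rules above can be illustrated with a minimal sketch. The helper `unify_label` is hypothetical and written only for this example; it is not part of mdciao and does not reproduce the library's actual matching code, and the order in which the rules are applied here is an assumption.

```python
def unify_label(label, key_separator="-", mutations_dict=None, defrag="@"):
    """Return a canonical form of a contact label (hypothetical helper, not mdciao API)."""
    mutations_dict = mutations_dict or {}
    # Simplification: without a separator the label is left untouched,
    # so "ALA50-GLU30" and "GLU30-ALA50" stay different
    if key_separator is None:
        return label
    residues = label.split(key_separator)
    if defrag is not None:
        # Drop the fragment part: "K30@3.50" -> "K30"
        residues = [res.split(defrag)[0] for res in residues]
    # Keys are replaced by their values: {"E40": "D40"} turns "E40" into "D40"
    residues = [mutations_dict.get(res, res) for res in residues]
    # Sorting the pair makes "K30-D40" and "D40-K30" identical
    return key_separator.join(sorted(residues))


labels = ["K30-D40", "D40-K30", "K30-E40", "K30@3.50-D40"]
unified = {unify_label(lab, mutations_dict={"E40": "D40"}) for lab in labels}
print(unified)  # all four labels collapse to one: {'D40-K30'}
```

With `key_separator=None`, the sketch returns each label unchanged, which mirrors the documented behavior that the pair is then not split before matching.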

Parameters
  • groups (dictionary or list of ContactGroup-objects) – The keys are the system/setup descriptors, e.g. “WT”, “MUT” etc. If a list is passed, keys are generated on the fly: “mdcCG 0, mdcCG 1…”

  • colors (iterable (list or dict), or str, default is None) –

    TODO: I could set the default to “tab10”, but then it’d be hard coded in a lot places

  • ctc_cutoff_Ang (float, default is None) – If provided, draw a horizontal line across the panel at this distance value.

  • fontsize (int, default is 16) – Will be used in rcParams["font.size"]

  • panelheight_inches (int, default is 5) – The height of the panel, in inches. Determines the figure size if figsize is None, else has no effect

  • inch_per_contacts (int, default is 1) – How many inches each contact-pair is given in the panel. Determines the figure size if figsize is None, else has no effect

  • figsize (None or iterable of len 2, default is None) – Figure size (x,y), in inches. If None, one will be created using panelheight_inches and inch_per_contacts. If you are transposing the figure using vertical_plot, you do not have to swap this parameter to (y,x); this is done automatically.

  • mutations_dict (dictionary, default is {}) – A mutation dictionary that allows residues that would otherwise be identified as different contacts to be plotted together. If there were two mutations, e.g. A30K and D35A, the mutation dictionary will be {“A30”:“K30”, “D35”:“A35”}. You can also use this parameter for correcting indexing offsets, e.g. {“GDP395”:“GDP”, “GDP396”:“GDP”}

  • legend_rows (int, default is 4) – The maximum number of rows per column of the legend. If you have 10 systems, legend_rows=5 means you’ll get two columns, legend_rows=2 means you’ll get five.

  • AA_format (str, default is "short") – see frequency_dict for more info

  • defrag (str, default is "@") – see unify_freq_dicts for more info

  • anchor (str, default is None) – When str, e.g. “L394”, that residue is eliminated from the contact-labels. It is also checked that all ContactGroup-objects are indeed neighborhoods sharing this anchor, i.e., some sanity checks are carried out

  • ymax (float, default is None) – Maximum value of the y-axis, default is to set it automatically

  • key_separator (str, default is "-") – How each contact label separates the pair of residues, “ALA50-GLU30”. If you set this to None, it means the label won’t be separated before matching and “ALA50-GLU30” will be different from “GLU30-ALA50”.

  • sort_by (str or list, default is 'mean') –

    By default, the violins are sorted by ascending order of mean distance, i.e. from most “formed” on the left of the plot to least “formed” on the right of the plot. However, for each residue pair, this mean is an average over the distance in all the different groups, so some heterogeneity is expected. Alternatively, you can sort using the contact labels, regardless of the distance values. Note that for this, string comparisons between contact-labels will take place, and that contact-labels are altered by key_separator to unify them across different groups. Try setting key_separator to None if you see unexpected behavior, although this might have other side effects (see mdciao.utils.str_and_dict.unify_freq_dicts). sort_by can be a:

    • str : ‘residue’ Sort by ascending residue sequence index (resSeq), which will be inferred from each contact label, e.g. 30 for “GLU30@3.50”. See gen_ctc_labels for more info on how they are generated. Internally, the order is generated via lexsort_ctc_labels. If you want to reverse or alter this ascending default order, we recommend using lexsort_ctc_labels before calling compare_violins and use its output (sorted_ctc_labels) as a list argument for sort_by. Also note that residue indices as contained in res_idx_pairs

    • list : a list of contact labels, e.g. [“GLU30-ALA30”, “ARG131@3.50-TYR20”]. Only these residue pairs (in this order) will be shown, regardless of what other pairs are contained in the groups. It assumes the user knows what contacts are present and can come up with a meaningful list. Not all labels need to be present in all groups, but at least one label needs to match, otherwise the method will fail

  • zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero, and a residue pair will not be shown if its frequency is zero across all groups. For this parameter to have an effect, you need a ctc_cutoff_Ang

  • remove_identities (bool, default is False) – If True, the contacts where freq[sys][ctc] >= identity_cutoff across all systems will not be plotted nor considered in the sum over contacts

  • identity_cutoff (float, default is 1) – If remove_identities, use this value to define what is considered an identity, so that contacts with values of e.g. .95 can also be removed

  • representatives (anything (bool, int, dict, list) default is None) –

    Plot, with a small dot on top of the violins, the values of the residue-residue distances of representative geometries. The representative geometries can be parsed directly as a dict of Trajectory objects, or extracted on-the-fly by calling the mdciao.contacts.ContactGroup.repframes method of each of the groups. Check the docs of mdciao.contacts.ContactGroup.repframes to find out what is meant with “representative”. This is what each type of input does:

    • boolean True: Calls mdciao.contacts.ContactGroup.repframes with the method’s default parameters and plots the result

    • int > 0: Calls mdciao.contacts.ContactGroup.repframes with the parameter n_frames set to this integer. This parameter controls how many representatives are extracted and subsequently plotted.

    • dict of parameters: A dictionary with explicit values for the optional parameters of mdciao.contacts.ContactGroup.repframes, usually n_frames (an int) and scheme (“mean” or “mode”), depending on what you mean by “representative”. Check the method’s documentation for more info.

    • dict of Trajectory objects: Has to have the same keys as groups. No checks are done whether these objects match the actual molecular topologies of groups, so beware of potential mismatches here. Typically, these frames come from having used mdciao.contacts.ContactGroup.repframes with return_traj=True.

    • dict of dicts containing values #TODO not implemented yet
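The default sorting (sort_by='mean') and the filtering controlled by zero_freq, remove_identities and identity_cutoff can be sketched on toy data. Both helpers below are hypothetical and written only for this illustration; they are not mdciao functions. Treating a label that is missing from a group as frequency 0 is an assumption of this sketch.

```python
from statistics import mean


def labels_sorted_by_mean(mean_dists):
    """Order labels by their mean distance averaged over all groups (ascending)."""
    labels = sorted({lab for per_group in mean_dists.values() for lab in per_group})

    def across_groups(label):
        # Average only over the groups that actually contain this label
        return mean(d[label] for d in mean_dists.values() if label in d)

    return sorted(labels, key=across_groups)


def labels_to_plot(freqs, zero_freq=0.01, remove_identities=False, identity_cutoff=1):
    """Drop labels that are 'zero' (or 'identity') in every group."""
    labels = sorted({lab for per_group in freqs.values() for lab in per_group})
    kept = []
    for lab in labels:
        # Assumption: a label absent from a group counts as frequency 0
        values = [per_group.get(lab, 0.0) for per_group in freqs.values()]
        if all(v < zero_freq for v in values):
            continue  # zero across all groups
        if remove_identities and all(v >= identity_cutoff for v in values):
            continue  # always formed in all groups
        kept.append(lab)
    return kept


# Toy numbers, made up for the example (distances in Angstrom)
mean_dists = {"WT": {"D40-K30": 3.0, "A50-E30": 4.0},
              "MUT": {"D40-K30": 3.4, "A50-E30": 8.0}}
freqs = {"WT": {"D40-K30": 1.0, "A50-E30": 0.6, "G10-L20": 0.0},
         "MUT": {"D40-K30": 1.0, "A50-E30": 0.005, "G10-L20": 0.003}}

order = labels_sorted_by_mean(mean_dists)
shown = labels_to_plot(freqs)
shown_no_identities = labels_to_plot(freqs, remove_identities=True)
```

Here "A50-E30" is kept even though it is below zero_freq in "MUT", because it is not zero in "WT"; only "G10-L20", which is below zero_freq in both groups, is dropped.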

Returns

  • fig (Figure)

  • ax (Axes)

  • labels (list) – The list of plotted labels, in the order they are plotted
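The arithmetic behind legend_rows and the default figure size can be sketched as follows. The two helpers are hypothetical and not part of mdciao. The legend formula follows the example given for legend_rows; the figure-size formula (width as number of contacts times inch_per_contacts, height as panelheight_inches) is an assumption based on the parameter descriptions, not a statement about the library's internals.

```python
import math


def legend_columns(n_systems, legend_rows=4):
    """Number of legend columns needed when each column holds at most legend_rows entries."""
    return math.ceil(n_systems / legend_rows)


def default_figsize(n_contacts, panelheight_inches=5, inch_per_contacts=1):
    """Assumed (x, y) figure size in inches when figsize is None."""
    return (n_contacts * inch_per_contacts, panelheight_inches)


cols_5 = legend_columns(10, legend_rows=5)   # 10 systems, 5 rows per column
cols_2 = legend_columns(10, legend_rows=2)   # 10 systems, 2 rows per column
size = default_figsize(8)                    # 8 contact pairs, default parameters
```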