mdciao.plots.compare_violins

mdciao.plots.compare_violins(groups, colors=None, ctc_cutoff_Ang=None, fontsize=16, mutations_dict={}, legend_rows=4, AA_format='short', defrag='@', anchor=None, ymax=None, key_separator='-', sort_by='mean', figsize=None, panelheight_inches=5, inch_per_contacts=1, zero_freq=0.01, remove_identities=False, identity_cutoff=1, representatives=None)

Plot all distance distributions of several ContactGroups together using violinplots

Contacts across different groups are grouped together by matching their contact labels, since the residue indices might differ across groups. To achieve this:

  • “K30-D40” is considered equivalent to “D40-K30”; use key_separator to change this.

  • “K30-D40” is considered equivalent to “K30-E40” if a mutations_dict={"E40":"D40"} is passed

  • “K30@3.50-D40” is considered equivalent to “K30-D40” if you defragment your labels using defrag="@"
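The three matching rules above can be pictured as reducing every label to a canonical form before comparing. The following is a minimal sketch of that idea, not mdciao's actual implementation; the helper name normalize_label and the order in which the rules are applied are assumptions made for illustration.

```python
def normalize_label(label, key_separator="-", defrag="@", mutations_dict=None):
    """Return a canonical form of a contact label (illustrative helper, not part of mdciao)."""
    mutations_dict = mutations_dict or {}
    if key_separator is None:
        # Without a separator the label is not split, so "K30-D40" and
        # "D40-K30" stay different. This sketch leaves the label untouched.
        return label
    cleaned = []
    for residue in label.split(key_separator):
        if defrag is not None:
            # Drop the fragment part of the residue: "K30@3.50" -> "K30"
            residue = residue.split(defrag)[0]
        # Map a mutated residue name onto a common name: "E40" -> "D40"
        cleaned.append(mutations_dict.get(residue, residue))
    # Sorting the two residues makes the label independent of their order
    return key_separator.join(sorted(cleaned))


reference = normalize_label("K30-D40")
same_after_swap = normalize_label("D40-K30") == reference
same_after_mutation = normalize_label("K30-E40", mutations_dict={"E40": "D40"}) == reference
same_after_defrag = normalize_label("K30@3.50-D40") == reference
```

All three comparisons evaluate to True in this sketch, mirroring the three bullet points.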

Parameters:
  • groups (dictionary or list of ContactGroup-objects) – The keys are the system/setup descriptors, e.g. “WT”, “MUT” etc. If list, keys will be generated on the fly “mdcCG 0, mdcCG 1…”

  • colors (iterable (list or dict), or str, default is None) –

    • If list, the colors will be assigned in the same order of groups.

    • If dict, has to have the same keys as groups.

    • If str, it has to be a case-sensitive colormap-name of matplotlib: https://matplotlib.org/stable/tutorials/colors/colormaps.html

    • If None, the ‘tab10’ colormap (tableau) is chosen if 10 or fewer colors are needed, and ‘tab20’ if more than 10 are needed.
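The rules for groups and colors can be summarized in a few lines. This is a hedged sketch of the described behavior only: the function resolve_groups_and_colors is hypothetical, it returns a colormap name rather than actual colors, and the real function may validate its input differently.

```python
def resolve_groups_and_colors(groups, colors=None):
    """Illustrative helper (not part of mdciao) mirroring the rules described above."""
    # A list of groups gets keys generated on the fly: "mdcCG 0", "mdcCG 1", and so on
    if not isinstance(groups, dict):
        groups = {f"mdcCG {i}": group for i, group in enumerate(groups)}
    keys = list(groups)
    if colors is None:
        # Pick the default colormap by the number of colors needed
        colors = "tab10" if len(keys) <= 10 else "tab20"
    if isinstance(colors, str):
        # A colormap name; turning it into colors is left to matplotlib
        return groups, colors
    if isinstance(colors, dict):
        if set(colors) != set(keys):
            raise ValueError("colors must have the same keys as groups")
        return groups, {key: colors[key] for key in keys}
    # Any other iterable: assign colors in the same order as the groups
    return groups, dict(zip(keys, colors))


groups, colors = resolve_groups_and_colors({"WT": None, "MUT": None}, colors=["red", "blue"])
```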

  • ctc_cutoff_Ang (float, default is None) – If provided, draw a horizontal line across the panel at this distance value. It will also be used by representatives to be passed onto mdciao.contacts.ContactGroup.repframes, see below for more info.

  • fontsize (int, default is 16) – Will be used in rcParams[“font.size”]

  • panelheight_inches (int, default is 5) – The height of the panel, in inches. Determines the figure size if figsize is None, else has no effect

  • inch_per_contacts (int, default is 1) – How many inches each contact-pair is given in the panel. Determines the figure size if figsize is None, else has no effect

  • figsize (None or iterable of len 2, default is None) – Figure size (x,y), in inches. If None, one will be created using panelheight_inches and inch_per_contacts. If you are transposing the figure using vertical_plot, you do not have to invert (y,x) this parameter here, it is done automatically.
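As a rough illustration of how the three size parameters interact, the sketch below assumes the width is simply the number of contact pairs times inch_per_contacts. That formula, and the helper name panel_figsize, are assumptions; the library may, for instance, enforce a minimum width.

```python
def panel_figsize(n_contacts, figsize=None, panelheight_inches=5, inch_per_contacts=1):
    """Illustrative helper (not part of mdciao): derive (x, y) in inches."""
    if figsize is not None:
        # An explicit figsize wins; the other two parameters have no effect
        return tuple(figsize)
    # Assumed rule: one slot of inch_per_contacts inches per contact pair
    return (n_contacts * inch_per_contacts, panelheight_inches)


default_size = panel_figsize(8)
explicit_size = panel_figsize(8, figsize=(10, 4))
```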

  • mutations_dict (dictionary, default is {}) – A mutation dictionary that allows residues that would otherwise be identified as different contacts to be plotted together. If there were two mutations, e.g. A30K and D35A, the mutation dictionary will be {“A30”:“K30”, “D35”:“A35”}. You can also use this parameter for correcting indexing offsets, e.g. {“GDP395”:“GDP”, “GDP396”:“GDP”}

  • legend_rows (int, default is 4) – The maximum number of rows per column of the legend. If you have 10 systems, legend_rows=5 means you’ll get two columns, legend_rows=2 means you’ll get five.
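The relation between legend_rows and the number of legend columns is a ceiling division. A small sketch (the helper legend_columns is hypothetical) reproduces the two cases mentioned above:

```python
import math


def legend_columns(n_systems, legend_rows=4):
    """Illustrative helper: columns needed when each holds at most legend_rows entries."""
    return math.ceil(n_systems / legend_rows)


two_columns = legend_columns(10, legend_rows=5)
five_columns = legend_columns(10, legend_rows=2)
```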

  • AA_format (str, default is “short”) – see frequency_dict for more info

  • defrag (str, default is “@”) – see unify_freq_dicts for more info

  • anchor (str, default is None) – When str, e.g. “L394”, that residue is eliminated from the contact-labels. It is also checked that all ContactGroup-objects are indeed neighborhoods sharing this anchor, i.e., some sanity checks are carried out

  • ymax (float, default is None) – Maximum value of the y-axis, default is to set it automatically

  • key_separator (str, default is “-”) – How each contact label separates the pair of residues, “ALA50-GLU30”. If you set this to None, it means the label won’t be separated before matching and “ALA50-GLU30” will be different from “GLU30-ALA50”.

  • sort_by (str or list, default is ‘mean’) – By default, the violins are sorted by ascending mean distance, i.e. from most “formed” on the left of the plot to least “formed” on the right. However, for each residue pair, this mean is an average over the distances in all the different groups, so some heterogeneity is expected. Alternatively, you can sort using the contact labels, regardless of the distance values. Note that this involves string comparisons between contact labels, and that contact labels are altered by key_separator to unify them across different groups. Try setting key_separator to None if you see unexpected behavior, although this might have other side effects (see unify_freq_dicts). sort_by can be a:

    • str, ‘residue’ or ‘numeric’

      Sort by ascending residue sequence index (resSeq), which will be inferred from each contact label, e.g. 30 for “GLU30@3.50”. See gen_ctc_labels for more info on how they are generated. Internally, the order is generated via lexsort_ctc_labels. If you want to reverse or alter this ascending default order, we recommend using lexsort_ctc_labels before calling compare_violins and use its output (labels) as a list argument for sort_by. Also note that residue indices as contained in res_idx_pairs

    • str, ‘keep’

      Sort using the same order of the labels as in the first contact group

    • str, ‘consensus’

      Sort following consensus nomenclature (GPCR, CGN or KLIFS)

    • list, a list of contact labels,

      e.g. [“GLU30-ALA30”, “ARG131@3.50-TYR20”]. Only these residue pairs (in this order) will be shown, regardless of what other pairs are contained in the groups. It assumes the user knows what contacts are present and can come up with a meaningful list. Not all labels need to be present in all groups, but at least one label needs to match; otherwise the method will fail.
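To make some of the sort_by options concrete, here is a simplified sketch covering ‘mean’, ‘residue’/‘numeric’ and the list form. It is not how mdciao implements sorting (which goes through lexsort_ctc_labels): the function sort_labels, the nested-dictionary data layout and the choice of averaging the per-group means are all assumptions for illustration.

```python
import re
from statistics import mean


def sort_labels(distances, sort_by="mean"):
    """distances: {label: {group_key: [distance values]}} (hypothetical layout)."""
    labels = list(distances)
    if isinstance(sort_by, list):
        # Show only the requested labels, in the requested order
        shown = [label for label in sort_by if label in distances]
        if not shown:
            raise ValueError("none of the requested labels is present")
        return shown
    if sort_by == "mean":
        # Average, over groups, of each group's mean distance for the pair
        return sorted(labels, key=lambda lab: mean(mean(v) for v in distances[lab].values()))
    if sort_by in ("residue", "numeric"):
        def res_seqs(label):
            # "ARG131@3.50-TYR20" -> (131, 20); the "@..." part is dropped first
            return tuple(int(re.search(r"\d+", res.split("@")[0]).group())
                         for res in label.split("-"))
        return sorted(labels, key=res_seqs)
    raise ValueError(f"sort_by={sort_by!r} is not covered by this sketch")


distances = {
    "GLU30-ALA50": {"WT": [3.0, 3.2], "MUT": [6.0, 6.4]},
    "ARG131@3.50-TYR20": {"WT": [2.5, 2.7], "MUT": [2.9, 3.1]},
    "LYS10-ASP40": {"WT": [8.0, 8.0]},
}
by_mean = sort_labels(distances, "mean")
by_residue = sort_labels(distances, "residue")
by_list = sort_labels(distances, ["LYS10-ASP40", "SER1-THR2"])
```

In the list case, the label missing from the data is skipped while the matching one is kept, in line with the requirement that at least one label matches.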

  • zero_freq (float, default is 1e-2) – Frequencies below this number will be considered zero, and a residue pair is not shown if its frequency is zero across all groups. For this parameter to have an effect, you need a ctc_cutoff_Ang.

  • remove_identities (bool, default is False) – If True, the contacts where freq[sys][ctc] >= identity_cutoff across all systems will not be plotted nor considered in the sum over contacts. Only has an effect if ctc_cutoff_Ang is not None.

  • identity_cutoff (float, default is 1) – If remove_identities, use this value to define what is considered an identity, s.t. contacts with values e.g. .95 can also be removed. Only has an effect if ctc_cutoff_Ang is not None.
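The parameters zero_freq, remove_identities and identity_cutoff all act on contact frequencies, which is why they need ctc_cutoff_Ang. The sketch below assumes that a frequency is the fraction of frames with a distance at or below the cutoff, and that a pair missing from a group counts as frequency 0; both assumptions and both helper names are illustrative, not mdciao's code.

```python
def contact_frequency(distances_Ang, ctc_cutoff_Ang):
    """Fraction of frames in which the distance is at or below the cutoff."""
    return sum(d <= ctc_cutoff_Ang for d in distances_Ang) / len(distances_Ang)


def pairs_to_plot(freqs, zero_freq=0.01, remove_identities=False, identity_cutoff=1):
    """freqs: {group_key: {label: frequency}} (hypothetical layout)."""
    # Collect all labels, keeping the order of first appearance
    labels = []
    for group in freqs.values():
        for label in group:
            if label not in labels:
                labels.append(label)
    kept = []
    for label in labels:
        values = [group.get(label, 0.0) for group in freqs.values()]
        if all(v < zero_freq for v in values):
            continue  # effectively zero in every group
        if remove_identities and all(v >= identity_cutoff for v in values):
            continue  # always formed in every group
        kept.append(label)
    return kept


freqs = {
    "WT": {"A10-B20": 1.0, "C30-D40": 0.0, "E50-F60": 0.4},
    "MUT": {"A10-B20": 0.97, "C30-D40": 0.005, "E50-F60": 0.0},
}
default_pairs = pairs_to_plot(freqs)
strict_identity = pairs_to_plot(freqs, remove_identities=True)
relaxed_identity = pairs_to_plot(freqs, remove_identities=True, identity_cutoff=0.95)
```

With identity_cutoff=1 the pair "A10-B20" survives because its frequency in "MUT" is 0.97; lowering the cutoff to 0.95 removes it.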

  • representatives (bool, int, dict, default is None) – Include information about representative values in the plot. This can be done in several ways. Easiest is to let this method call mdciao.contacts.ContactGroup.repframes internally. This will locate representative frames, extract their residue-residue distance values and plot them as small dots on top of the violins. When possible, also the geometries corresponding to these frames will be returned. Alternatively, the user can directly input a dictionary of Trajectory objects (representative or not) for which the residue-residue distance values will be computed and plotted, or even more direct, input a number of values (representative or not) to be plotted. This last type of input (dictionary with Trajectory objects or arrays of values) can be 1) mixed (some groups get values, some trajectories) and 2) incomplete (groups w/o entry in representatives simply won’t get “dots” shown).

    Check the docs of mdciao.contacts.ContactGroup.repframes to find out what is meant with “representative”.

    This is what each type of input does:

    • boolean True:

      Calls mdciao.contacts.ContactGroup.repframes with the method’s default parameters.

    • int > 0:

      Calls mdciao.contacts.ContactGroup.repframes with the parameter n_frames set to this integer. This parameter controls how many representatives are extracted and subsequently plotted.

    • dict of parameters:

      A dictionary with explicit values for the optional parameters of mdciao.contacts.ContactGroup.repframes, usually n_frames (an int) and scheme (“mean” or “mode”), depending on what you mean by “representative”. Check the method’s documentation for more info. The value passed as ctc_cutoff_Ang will also be passed.

    • dict of Trajectory objects:

      Has to have the same keys as groups. No checks are done whether these objects match the actual molecular topologies of groups, so beware of potential mismatches here. Typically, these frames come from having used mdciao.contacts.ContactGroup.repframes with return_traj=True

    • dict containing np.ndarrays of shape (M, N):

      M is the number of values and N is the number of contacts. M can have different values for each of the groups, and N needs to match the n_ctcs of each group and be in the same order as group.res_idxs_pairs. Rearrangements due to sort_by will reorder this array automatically; it just has to be in the order of res_idxs_pairs initially (no other checks are done).
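Two aspects of the input types above lend themselves to a short sketch: how True, an int or a dict of parameters could be turned into keyword arguments for repframes, and how the columns of an (M, N) array follow the plotted order. Both helpers are hypothetical and only illustrate the description; in particular, the keyword name used for the cutoff is an assumption, and plain nested lists stand in for np.ndarrays.

```python
def repframes_kwargs(representatives, ctc_cutoff_Ang=None):
    """Illustrative helper: translate True / int / dict of parameters into kwargs."""
    # bool is a subclass of int in Python, so True has to be tested first
    if representatives is True:
        kwargs = {}  # rely on the method's default parameters
    elif isinstance(representatives, int) and not isinstance(representatives, bool) \
            and representatives > 0:
        kwargs = {"n_frames": representatives}
    elif isinstance(representatives, dict):
        kwargs = dict(representatives)
    else:
        raise TypeError("input type not covered by this sketch")
    # The cutoff is forwarded as well (keyword name assumed here)
    kwargs["ctc_cutoff_Ang"] = ctc_cutoff_Ang
    return kwargs


def reorder_columns(values, group_labels, plotted_labels):
    """values: M rows of N entries, with columns ordered like group_labels."""
    column_of = {label: i for i, label in enumerate(group_labels)}
    order = [column_of[label] for label in plotted_labels if label in column_of]
    return [[row[i] for i in order] for row in values]


kwargs = repframes_kwargs(3, ctc_cutoff_Ang=4.0)
reordered = reorder_columns([[1, 2, 3], [4, 5, 6]],
                            ["A10-B20", "C30-D40", "E50-F60"],
                            ["E50-F60", "A10-B20"])
```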

Returns:

  • fig (Figure)

  • ax (Axes)

  • labels (list) – The list of plotted labels, in the order they are plotted

  • repframes (dict) – Will only be returned if representatives was not None. The representative frames for each group according to the parameters of representatives
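Because the number of returned values depends on representatives, a thin wrapper can make the unpacking explicit. This is a usage sketch under stated assumptions: the wrapper compare_wt_and_mut is hypothetical, the ContactGroup objects in groups are assumed to have been built elsewhere, and the cutoff of 4.0 Å is an arbitrary illustrative value.

```python
def compare_wt_and_mut(groups, with_representatives=False):
    """Call compare_violins and always hand back four values (hypothetical wrapper)."""
    import mdciao  # imported here so the sketch can be defined without mdciao installed

    out = mdciao.plots.compare_violins(
        groups,                      # e.g. {"WT": ContactGroup, "MUT": ContactGroup}
        ctc_cutoff_Ang=4.0,          # arbitrary value, chosen for illustration
        sort_by="mean",
        representatives=True if with_representatives else None,
    )
    if with_representatives:
        # Four values are returned when representatives is not None
        fig, ax, labels, repframes = out
    else:
        fig, ax, labels = out
        repframes = None
    return fig, ax, labels, repframes
```

Returning None for repframes in the three-value case keeps downstream code free of length checks on the returned tuple.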