mdciao.plots.compare_groups_of_contacts

mdciao.plots.compare_groups_of_contacts(groups, colors=None, mutations_dict=None, width=0.2, ax=None, figsize=(10, 5), fontsize=16, anchor=None, plot_singles=False, ctc_cutoff_Ang=None, AA_format='short', defrag='@', per_residue=False, title='comparison', distro=False, interface=False, n_cols=1, sharex=False, **kwargs_plot_unified_freq_dicts)

Compare contact groups across different systems using different plots and strategies

Parameters:
  • groups (iterable (list or dict)) – The contact groups. If dict, the keys will be used as names for the contact groups, e.g. “WT”, “MUT” etc.; if list, the keys will be auto-generated. The values can be:

    • ContactGroup objects

    • dictionaries where the keys are residue-pairs (one-letter codes, no fragment info, as in ContactGroup.ctc_labels_short) and the values are contact frequencies in [0,1]

    • ASCII files with the contact frequencies in the first column and labels in the second and/or third column, see frequency_str_ASCII_file and freq_ascii2dict

    • .xlsx files with the header in the second row, containing at least the column-names “label” and “freqs”

    Note

    If a ContactGroup is passed, then a ctc_cutoff_Ang needs to be passed along, otherwise frequencies cannot be computed on-the-fly

  • colors (iterable (list or dict), or str, default is None) –

  • mutations_dict (dictionary, default is None) – A mutation dictionary that makes it possible to plot together residues that would otherwise be identified as different contacts. If there were two mutations, e.g. A30K and D35A, the mutation dictionary will be {“A30”:“K30”, “D35”:“A35”}. You can also use this parameter for correcting indexing offsets, e.g. {“GDP395”:“GDP”, “GDP396”:“GDP”}.
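
The renaming described for mutations_dict can be pictured with a small pure-Python sketch. It is not mdciao's implementation: the helper apply_mutations, the "RES1-RES2" label format and the frequencies are assumptions made for illustration only.

```python
def apply_mutations(freqs, mutations_dict):
    """Rename residues in 'RES1-RES2' labels (hypothetical helper, not mdciao code)."""
    renamed = {}
    for label, freq in freqs.items():
        residues = label.split("-")
        # Residues missing from mutations_dict keep their original name
        new_label = "-".join(mutations_dict.get(res, res) for res in residues)
        renamed[new_label] = freq
    return renamed

# Made-up frequencies for a wild-type and a mutant system
wt = {"A30-R131": 0.9, "D35-GDP395": 0.4}
mut = {"K30-R131": 0.7, "A35-GDP396": 0.5}
mutations_dict = {"A30": "K30", "D35": "A35", "GDP395": "GDP", "GDP396": "GDP"}

wt_renamed = apply_mutations(wt, mutations_dict)
mut_renamed = apply_mutations(mut, mutations_dict)
```

After renaming, both systems share the labels "K30-R131" and "A35-GDP", so their bars can be drawn side by side.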

  • width (float, default is .2) – The width of the bars

  • ax (Axes or array thereof, default is None) – The default is to let the method draw its own figure and axis, but you can pass a pre-existing axis here. If distro is False, only one axis is needed, so you can pass the axis object directly here. If distro is True, a subplot is needed, where each panel contains the distributions of each contact. Hence, pass an array of axes if distro is True. See mdciao.plots.plot_unified_distro_dicts for more info (in particular ax_array).

  • figsize (tuple, default is (10,5)) – The figure size in inches, in case it is instantiated automatically by not passing an ax

  • fontsize (float, default is 16) – The fontsize to use

  • anchor (str, default is None) – This string will be deleted from the contact labels, leaving only the partner-residue to identify the contact. The deletion takes place after the mutations_dict has been applied. The final anchor label will be that of the deleted keys (allows for keeping e.g. pre-existing consensus nomenclature). No consistency-checks are carried out, i.e. use at your own risk
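
As a rough picture of what anchor does to the labels, here is a minimal sketch. The helper strip_anchor and the "RES1-RES2" label format are assumptions, and the real method may handle edge cases differently.

```python
def strip_anchor(freqs, anchor):
    """Drop the anchor residue from each label (hypothetical helper, not mdciao code)."""
    stripped = {}
    for label, freq in freqs.items():
        # Keep only the partner residue(s), whichever side the anchor is on
        partners = [res for res in label.split("-") if res != anchor]
        stripped["-".join(partners)] = freq
    return stripped

# Made-up frequencies, with labels already "mutated" by mutations_dict
anchored = strip_anchor({"K30-GDP": 0.9, "GDP-A35": 0.4}, anchor="GDP")
```

Each contact is then identified only by the residue that partners with the anchor.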

  • plot_singles (bool, default is False) – Produce one extra figure with as many subplots as systems in dictionary_of_groups, where each system is plotted separately. The labels used will have been already “mutated” using mutations_dict and “anchored” using anchor. This plot is temporary and cannot be saved.

  • ctc_cutoff_Ang (float, default is None) – Needed value to compute frequencies on-the-fly if the input was using ContactGroup objects

  • AA_format (str, default is “short”) – see frequency_dict for more info

  • defrag (str, default is “@”) – see unify_freq_dicts for more info

  • per_residue (bool, default is False) – Unify dictionaries by residue and not by pairs. If True, remove_identities is set to False automatically when calling plot_unified_freq_dicts

  • title (str, default is “comparison”) – The title for the plot

  • distro (bool, default is False) – Instead of plotting contact frequencies, plot contact distributions

  • interface (bool, default is False) – Sorts the residues into interface fragments. Will fail if the passed groups don’t have self.is_interface==True. It enforces a per-residue view, plotting a single bar per residue indicating how many contacts that residue participates in. See below ‘sort_by’ for how these residues get sorted within their respective interface fragments.

  • n_cols (int, default is 1) – Only has effect if distro is True. The number of columns in the multi-panel figure with the per-contact distributions.

  • sharex (bool, or string, default is False) – Only has effect if distro is True. Can be True or “col”, for sharing the x-axis across columns. See subplots for more info. Only has an effect if ax is None.

  • kwargs_plot_unified_freq_dicts (dict) – Optional arguments for plot_unified_freq_dicts. Some of them will be overwritten, e.g. if interface or per_residue are True, then remove_identities or sort_by get set internally for consistency. The optional parameters of plot_unified_freq_dicts are:

Other Parameters:
  • colordict (dict, default is None.) – What color each system gets. Default is some sane matplotlib values

  • width (None or float, default is .2) – Width of each bar in the plot. If None, .8/len(freqs) will be used, leaving a .1 gap of free space between contacts.

  • ax (Axes, default is None) – Plot into this axis, else create one using figsize.

  • figsize (iterable of len 2) – Figure size (x,y), in inches. If None, one will be created using panelheight_inches and inch_per_contacts. If you are transposing the figure using vertical_plot, you do not have to invert (y,x) this parameter here, it is done automatically.

  • panelheight_inches (int, default is 5) – The height of the panel, in inches. Determines the figure size if figsize is None, else has no effect

  • inch_per_contacts (int, default is 1) – How many inches each contact-pair is given in the panel. Determines the figure size if figsize is None, else has no effect

  • fontsize (int, default is 16) – Will be used in matplotlib.rcParams["font.size"] # TODO be less invasive

  • sort_by (str or list of strings, default is “mean”) – If str, the property by which to sort the contacts. If list, the list of contact labels in the order in which they will be shown. If str, the possibilities are

    • “mean” sort (descending) by mean frequency over all systems, making most frequent contacts appear on the left/top of the plot.

    • “std” sort (descending) by per-contact standard deviation over all systems, making the contacts with most different values appear on top. This highlights more “deviant” contacts and might hence be more informative than “mean” in cases where a lot of contacts have similar frequencies (high or low). If this option is activated, a faint dotted line is incorporated into the plot that marks the std for each contact group

    • “keep” keep the contacts in whatever order they have in the first dictionary

    • “numeric” sort (ascending) the contacts by the first number that appears in the contact labels, e.g. “30” if the label is “GLU30@3.50-GDP”. You can use this to order by resSeq if the AA to sort by is the first one of the pair. Contact labels without numbers in them will be sorted alphabetically at the end of the labels with numbers.

    • “residue” alias for “numeric”

    • list of contact-labels : sort in the order established by this list. What will actually be plotted is the intersection of this list and the available contact labels of freqs after other parameters like lower_cutoff_val or identity_cutoff have taken effect, e.g. if a contact-label is discarded because of lower_cutoff_val, adding the label to this list won’t have any effect.
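
The sort_by options above can be illustrated with a short sketch. This is not the library's code: sort_labels and first_number are hypothetical helpers, the frequencies are made up, contacts missing in a system are assumed to count as 0, and the population standard deviation is assumed for “std”.

```python
import re
from statistics import mean, pstdev

def first_number(label):
    """First integer appearing in a label, or None if the label has no digits."""
    match = re.search(r"\d+", label)
    return int(match.group()) if match else None

def sort_labels(freqs_by_system, sort_by="mean"):
    # Collect labels in order of first appearance across systems
    labels = []
    for freqs in freqs_by_system.values():
        for label in freqs:
            if label not in labels:
                labels.append(label)
    # Assumption: a contact absent from a system has frequency 0
    values = {lab: [f.get(lab, 0.0) for f in freqs_by_system.values()] for lab in labels}
    if sort_by == "mean":
        return sorted(labels, key=lambda lab: mean(values[lab]), reverse=True)
    if sort_by == "std":
        return sorted(labels, key=lambda lab: pstdev(values[lab]), reverse=True)
    if sort_by in ("numeric", "residue"):
        numbered = sorted((lab for lab in labels if first_number(lab) is not None),
                          key=first_number)
        unnumbered = sorted(lab for lab in labels if first_number(lab) is None)
        return numbered + unnumbered
    if sort_by == "keep":
        return labels
    # Otherwise sort_by is a list: keep only the labels that are available
    return [lab for lab in sort_by if lab in values]

freqs_by_system = {
    "WT":  {"ARG131@5.58-GDP": 1.0, "GLU30@3.50-GDP": 0.6, "MG-GDP": 0.2},
    "MUT": {"ARG131@5.58-GDP": 0.9, "GLU30@3.50-GDP": 0.2, "MG-GDP": 0.2},
}
by_mean = sort_labels(freqs_by_system, "mean")
by_std = sort_labels(freqs_by_system, "std")
by_number = sort_labels(freqs_by_system, "numeric")
```

With these numbers, “mean” puts ARG131 first because it is the most frequent contact, whereas “std” puts GLU30 first because it differs most between the two systems.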

  • lower_cutoff_val (float, default is 0) – Hide contacts with small values. “values” changes meaning depending on sort_by. If sort_by is any of

    • “mean”, “keep”, “numeric”, “residue” or a list, then the contacts where all systems have frequencies lower than this value are hidden.

    • “std”, then the contacts where the standard deviation across systems itself is lower than this value are hidden. This hides contacts where all systems are similar, regardless of whether they’re all around 1, around .5 or around 0
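
The two meanings of lower_cutoff_val can be sketched as a single predicate. The function is_hidden is a hypothetical helper, and the use of the population standard deviation is an assumption.

```python
from statistics import pstdev

def is_hidden(values, lower_cutoff_val=0.0, sort_by="mean"):
    """True if a contact with these per-system frequencies would be hidden (sketch only)."""
    if sort_by == "std":
        # Hide contacts whose systems barely differ from each other
        return pstdev(values) < lower_cutoff_val
    # Hide contacts that are infrequent in every system
    return all(v < lower_cutoff_val for v in values)

hidden_low = is_hidden([0.05, 0.02], lower_cutoff_val=0.1)                    # low everywhere
kept_one_high = is_hidden([0.05, 0.50], lower_cutoff_val=0.1)                 # one system above
hidden_similar = is_hidden([0.95, 0.97], lower_cutoff_val=0.1, sort_by="std") # nearly identical
```

Under these assumptions, the default lower_cutoff_val=0 hides nothing, since neither a frequency nor a standard deviation can be negative.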

  • remove_identities (bool, default is False) – If True, the contacts where freq[sys][ctc] >= identity_cutoff across all systems will be neither plotted nor considered in the sum over contacts. TODO: the word identity might be confusing

  • vertical_plot (bool, default is False) – Plot the bars vertically in descending sort_by instead of horizontally (better for large number of frequencies)

  • identity_cutoff (float, default is 1) – If remove_identities is True, use this value to define what is considered an identity, s.t. contacts with values of e.g. .95 can also be removed. TODO: consider merging both identity parameters into one that is None or float
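
How remove_identities and identity_cutoff interact can be sketched as follows. The helper is_identity is hypothetical and the frequencies are made up.

```python
def is_identity(values, identity_cutoff=1.0):
    """True if every system reaches identity_cutoff for this contact (sketch only)."""
    return all(v >= identity_cutoff for v in values)

always_formed = is_identity([1.0, 1.0])                         # removed with the default cutoff
nearly_always = is_identity([0.96, 0.99])                       # kept with the default cutoff
relaxed = is_identity([0.96, 0.99], identity_cutoff=0.95)       # removed with a relaxed cutoff
```

Lowering identity_cutoff to .95 therefore also removes contacts that are formed almost all of the time in every system.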

  • assign_w_color (boolean, default is False) – Color the text of the contact-labels according to the following criterion.

    • If all frequencies are below the lower_cutoff_val except for one system, then the label adopts the color of this system and gets prepended with a “+” sign.

    • If all frequencies are above the lower_cutoff_val except for one system, then the label adopts the color of this system and gets prepended with a “-” sign

    For more details see the paragraph “Visual Aides” of this notebook
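
The coloring criterion of assign_w_color can be pictured with the sketch below. The helper label_marker is hypothetical, and two details are assumptions: strict comparisons against lower_cutoff_val, and the “+” rule taking precedence when both rules apply (which can only happen with two systems).

```python
def label_marker(freqs_per_system, lower_cutoff_val):
    """Return (sign, system) for a contact label, or None (hypothetical helper)."""
    above = [name for name, f in freqs_per_system.items() if f > lower_cutoff_val]
    below = [name for name, f in freqs_per_system.items() if f < lower_cutoff_val]
    n_systems = len(freqs_per_system)
    if len(above) == 1 and len(below) == n_systems - 1:
        return "+", above[0]   # contact present in one system only
    if len(below) == 1 and len(above) == n_systems - 1:
        return "-", below[0]   # contact missing in one system only
    return None

# Made-up frequencies for three systems
gained = label_marker({"WT": 0.05, "MUT1": 0.80, "MUT2": 0.02}, lower_cutoff_val=0.1)
lost = label_marker({"WT": 0.90, "MUT1": 0.03, "MUT2": 0.70}, lower_cutoff_val=0.1)
```

In the first case the label would take the color of MUT1 and a “+” prefix; in the second it would take the color of MUT1 and a “-” prefix.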

  • title (str, default is None) – The title of the plot, if any

  • legend_rows (int, default is 4) – The maximum number of rows per column of the legend. If you have 10 systems, legend_rows=5 means you’ll get two columns, legend_rows=2 means you’ll get five.
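
The column counts quoted for legend_rows follow from a ceiling division, sketched here with a hypothetical helper (not the library's own code):

```python
import math

def legend_columns(n_systems, legend_rows=4):
    """Number of legend columns needed so that no column exceeds legend_rows entries."""
    return math.ceil(n_systems / legend_rows)

two_columns = legend_columns(10, legend_rows=5)
five_columns = legend_columns(10, legend_rows=2)
```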

  • verbose_legend (bool, default is True) – Verbose legends inform about contacts that were in the input but have been left out of the plot. Contacts are left out if they are:

    • above the identity_cutoff or

    • below the lower_cutoff_val

    They will appear in the verbose legend as “+ A.a + B.b”, respectively denoting the missing contacts that are “a(bove)” and “b(elow)” with their respective sums “A” and “B”.

  • half_sigma (bool, default is False) – When True, instead of showing Sigma=20, Sigma = 2x10 will be shown. If a ContactGroup has a Sigma=10 normally, when showing per-residue values, that number doubles, because each contact is shown two times. Hence, showing half-sigma allows to “keep” the number 10 in the legend, even though the shown Sigma is 20
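
Why Sigma doubles in the per-residue view can be checked with a small sketch. The helper per_residue_sums, the "RES1-RES2" label format and the frequencies are assumptions for illustration.

```python
def per_residue_sums(pair_freqs):
    """Sum the pair frequencies each residue takes part in (hypothetical helper)."""
    totals = {}
    for label, freq in pair_freqs.items():
        # Every pair contributes its frequency to both of its residues
        for res in label.split("-"):
            totals[res] = totals.get(res, 0.0) + freq
    return totals

pair_freqs = {"A30-R131": 0.5, "A30-Y391": 1.0, "R131-Y391": 0.25}
sigma_pairs = sum(pair_freqs.values())
sigma_residues = sum(per_residue_sums(pair_freqs).values())
```

Each contact is counted once for each of its two residues, so sigma_residues is exactly twice sigma_pairs; half_sigma=True writes this as 2 x Sigma in the legend.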

Returns:

  • myfig (Figure) – Figure with the comparison plot

  • freqs (dictionary) – Unified frequency dictionaries, including mutations and anchor

  • plotted_freqs (dictionary) – Like freqs but sorted and purged according to the user-defined input options, s.t. it represents the plotted values
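
A minimal usage sketch, assuming mdciao is installed and that plain frequency dictionaries with one-letter residue-pair labels are used as input. The labels and numbers are made up, and the wrapper compare_wt_vs_mut is a hypothetical name.

```python
# Hypothetical frequency dictionaries (made-up numbers, for illustration only)
groups = {
    "WT":  {"A30-R131": 0.9, "D35-Y391": 0.4},
    "MUT": {"K30-R131": 0.7, "A35-Y391": 0.5},
}

def compare_wt_vs_mut(groups):
    # Imported here so the sketch can be read and loaded without mdciao installed
    import mdciao
    myfig, freqs, plotted_freqs = mdciao.plots.compare_groups_of_contacts(
        groups,
        mutations_dict={"A30": "K30", "D35": "A35"},  # plot mutated residues together
        title="WT vs MUT",
    )
    return myfig, freqs, plotted_freqs
```

Calling compare_wt_vs_mut(groups) should return the figure together with the unified and the plotted frequency dictionaries. No ctc_cutoff_Ang is passed because the inputs are already frequencies rather than ContactGroup objects.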