mdciao.utils.residue_and_atom

Deal with residues, atoms, and their names, mostly. The function residues_from_descriptors is probably the most elaborate and highest-level one.

Functions

AAtype(res[, return_color, typecolors])

Residue types, optionally color coded

atom_type(aa[, no_BB_no_SC])

Return a string BB or SC for backbone or sidechain atom.

find_AA(AA_pattern, top[, extra_columns, …])

Residue matching with UNIX-shell patterns

find_CA(res[, CA_name, CA_dict])

Return the CA atom (or something equivalent) for this residue

get_SS(SS[, top])

Try to guess what type of input for secondary-structure computation the user wants, and compute it

int_from_AA_code(key)

Returns the integer part from a residue name, None if there isn’t one

name_from_AA(key)

Return the residue name from a string

parse_and_list_AAs_input(AAs, top[, map_conlab])

Helper method to print information regarding AA descriptors

rangeexpand_residues2residxs(range_as_str, …)

Generalized range-expander from residue descriptors.

residue_line(item_desc, residue, frag_idx[, …])

Return a string that describes the residue

residues_from_descriptors(…[, …])

Returns residue idxs based on a list of residue descriptors.

shorten_AA(AA[, substitute_fail, keep_index])

Return the short name of an AA, e.g. TRP30 to W

top2lsd(top[, substitute_fail, extra_columns])

Return a list of per-residue attributes as dictionaries

mdciao.utils.residue_and_atom.AAtype(res, return_color=False, typecolors={'NA': 'purple', 'hydrophobic': 'gray', 'negative': 'red', 'polar': 'green', 'positive': 'blue', 'special': 'gray'})

Residue types, optionally color coded

The types are:

  • “positive”: “ARG HIS LYS”

  • “negative”: “ASP GLU”

  • “polar”: “SER THR ASN GLN”

  • “special”: “CYS GLY PRO”

  • “hydrophobic”: “ALA ILE LEU MET PHE TRP TYR VAL”

Parameters
  • res (str or Residue) –

  • return_color (bool, default is False) – Return the color associated with the type (positive:blue, negative:red, etc) rather than type itself

  • typecolors (dict) – The map of types to colors

Returns

rtype – Either the type or the color

Return type

str
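The classification above can be sketched with plain dictionaries. This is only an illustration and not mdciao's implementation: the helper name aa_type_sketch is hypothetical, it takes a three-letter residue name rather than a Residue object, and the fallback to "NA" for unlisted residues is an assumption based on the "NA" key of typecolors.

```python
# Sketch only: a plain-dictionary stand-in, not mdciao's implementation.
_AA_TYPES = {
    "positive": "ARG HIS LYS",
    "negative": "ASP GLU",
    "polar": "SER THR ASN GLN",
    "special": "CYS GLY PRO",
    "hydrophobic": "ALA ILE LEU MET PHE TRP TYR VAL",
}
_TYPECOLORS = {"NA": "purple", "hydrophobic": "gray", "negative": "red",
               "polar": "green", "positive": "blue", "special": "gray"}


def aa_type_sketch(resname, return_color=False, typecolors=_TYPECOLORS):
    """Hypothetical helper: classify a three-letter residue name."""
    rtype = "NA"  # assumption: anything not listed falls back to "NA"
    for key, names in _AA_TYPES.items():
        if resname in names.split():
            rtype = key
            break
    # Return either the type itself or the color mapped to it
    return typecolors[rtype] if return_color else rtype


print(aa_type_sketch("GLU"))                     # negative
print(aa_type_sketch("LYS", return_color=True))  # blue
```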

mdciao.utils.residue_and_atom.atom_type(aa, no_BB_no_SC='X')

Return a string BB or SC for backbone or sidechain atom.

Parameters
  • aa (mdtraj.core.topology.Atom object) –

  • no_BB_no_SC (str, default is X) – Return this string if aa is neither BB nor SC

Returns

aatype

Return type

str

mdciao.utils.residue_and_atom.find_AA(AA_pattern, top, extra_columns=None, return_df=False)

Residue matching with UNIX-shell patterns

Similar to the shell command “ls”, using POSIX-style wildcards as shown in the examples or here: https://docs.python.org/3/library/fnmatch.html

Any other attribute that’s passed as extra_columns will be matched as explained below, e.g. “3.50” to get one residue in the GPCR-nomenclature or “3.*” to get the whole TM-helix 3

The examples use ‘*’ as wildcard, but ‘?’ (as in ‘ls’) also works

Examples

  • ‘PRO’ : returns all PROs, matching via the attribute “name”

  • ‘P’ : returns all PROs, matching via the attribute “code”

  • ‘P*’ : returns all PROs, PHEs and any other residue that starts with “P”, either in “name” or in “code”

  • ‘PRO39’ : returns PRO39, matching via full residue name (long)

  • ‘P39’ : returns PRO39, matching via full residue name (short)

  • ‘PRO3*’ : returns all PROs with sequence indices that start with 3, e.g. ‘PRO39, PRO323, PRO330’ etc

  • ‘3’ : returns all residues with sequence indices 3

  • ‘3*’ : returns all residues with sequence indices that start with 3

Parameters
  • AA_pattern (str or int) –

  • top (Topology) –

  • return_df (bool, default is False) – Return the full DataFrame of the matching residues

Returns

AAs – List of serial residue indices, s.t. top.residue(idx) would return the wanted residue. With return_df, you can get the full DataFrame of the matching residues.

Return type

list or DataFrame
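The matching rules from the examples can be imitated with the standard-library fnmatch module. The following is a minimal sketch under stated assumptions, not find_AA itself: the residue list of (name, code, sequence index) tuples stands in for a Topology, and the helper name find_aa_sketch is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for a topology: (name, code, sequence index) per residue
residues = [("PRO", "P", 3), ("GLU", "E", 30), ("PRO", "P", 39),
            ("PHE", "F", 40), ("PRO", "P", 323)]


def find_aa_sketch(pattern, residues):
    """Return the serial indices of the residues matching the pattern."""
    pattern = str(pattern)  # allow ints such as 3
    hits = []
    for idx, (name, code, res_seq) in enumerate(residues):
        # Attributes tried in turn: name, code, long and short full name, index
        candidates = (name, code, f"{name}{res_seq}", f"{code}{res_seq}",
                      str(res_seq))
        if any(fnmatchcase(cand, pattern) for cand in candidates):
            hits.append(idx)
    return hits


print(find_aa_sketch("P39", residues))    # [2]
print(find_aa_sketch("PRO3*", residues))  # [0, 2, 4]
print(find_aa_sketch("3*", residues))     # [0, 1, 2, 4]
```

fnmatchcase is used instead of fnmatch so that the comparison stays case-sensitive on every platform.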

mdciao.utils.residue_and_atom.find_CA(res, CA_name='CA', CA_dict=None)

Return the CA atom (or something equivalent) for this residue

Parameters
  • res (mdtraj.Residue object) –

  • CA_name (str, default is "CA") – The name by which you identify the CA. This overrules anything that’s parsed in the CA_dict, i.e. if the residue you are passing has both an atom “CA” and an entry in the CA_dict, the “CA” atom will be returned.

  • CA_dict (dict, default is None) – You can provide a dictionary keyed with residue names and valued with strings that identify a “CA”-equivalent atom (e.g. in ligands). If None, the default _CA_rules are used: _CA_rules = {"GDP": "C1", "P0G": "C12"}
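The precedence described for CA_name and CA_dict can be sketched as follows. This is an illustration only: the helper find_ca_sketch is hypothetical, it works on a residue name plus a list of atom names instead of an mdtraj.Residue, and raising a ValueError when nothing is found is an assumption.

```python
# Default rules as quoted in the description of CA_dict
_CA_RULES = {"GDP": "C1", "P0G": "C12"}


def find_ca_sketch(res_name, atom_names, CA_name="CA", CA_dict=None):
    """Hypothetical helper: name of the CA (or CA-equivalent) atom."""
    # An atom called CA_name wins over any entry in the dictionary
    if CA_name in atom_names:
        return CA_name
    rules = _CA_RULES if CA_dict is None else CA_dict
    equivalent = rules.get(res_name)
    if equivalent in atom_names:
        return equivalent
    # Assumption: signal failure with an exception
    raise ValueError(f"No CA or CA-equivalent atom found for {res_name}")


print(find_ca_sketch("GLU", ["N", "CA", "C", "O"]))  # CA
print(find_ca_sketch("GDP", ["C1", "O1", "N1"]))     # C1
```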

mdciao.utils.residue_and_atom.get_SS(SS, top=None)

Try to guess what type of input for secondary-structure computation the user wants, and compute it

Parameters
  • SS (secondary structure information) –

    Can be many things:

    • triple of ints (CP_idx, traj_idx, frame_idx) Tuple representing a ContactPair, a trajectory and a frame. Nothing happens, the tuple is returned as is and handled externally by the ContactGroup that called this method. See the docs there for more info

    • True Same as [0,0,0]

    • None or False Do nothing

    • mdtraj.Trajectory Use this geometry to compute the SS

    • string Path to a filename, of which only the first frame will be read. The SS will be computed from there. Reading the file is first attempted without topology information (e.g. .pdb, .gro, .h5 will work) and, if this fails, the top is passed (e.g. .xtc, .dcd)

    • array_like Use the SS from here, s.t. ss_inf[idx] gives the SS-info for the residue with that idx

  • top (Topology, default is None) –

Returns

  • from_tuple (bool) – Whether the information should be taken from a tuple or not

  • ss_array (np.ndarray or None)

mdciao.utils.residue_and_atom.int_from_AA_code(key)

Returns the integer part from a residue name, None if there isn’t one

Parameters

key (string) – Residue name passed as a string, example “GLU30”

Returns

Integer part of the residue id, e.g. 30 if the input is “GLU30”

Return type

int
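The described behavior can be reproduced with a short regular expression. This is a sketch and not the library's code: the helper name int_from_code_sketch is hypothetical, and reading only the trailing digits is an assumption.

```python
import re


def int_from_code_sketch(key):
    """Hypothetical helper: trailing integer of a residue name, or None."""
    match = re.search(r"(\d+)$", key)  # assumption: only trailing digits count
    return int(match.group(1)) if match else None


print(int_from_code_sketch("GLU30"))  # 30
print(int_from_code_sketch("GLU"))    # None
```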

mdciao.utils.residue_and_atom.name_from_AA(key)

Return the residue name from a string

Parameters

key (string or mdtraj.Topology.Residue object) – Residue name passed as a string, example “GLU30” or as residue object

Returns

name – Name of the residue, like “GLU” for “GLU30” or “E” for “E30”

Return type

str
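A minimal sketch of the same idea, stripping the trailing digits from the descriptor. The helper name name_from_code_sketch is hypothetical, and converting the input with str() relies on the assumption that a residue object prints as something like "GLU30".

```python
import re


def name_from_code_sketch(key):
    """Hypothetical helper: residue name without its trailing index."""
    # str() lets the sketch accept any object that prints like "GLU30"
    return re.sub(r"\d+$", "", str(key))


print(name_from_code_sketch("GLU30"))  # GLU
print(name_from_code_sketch("E30"))    # E
```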

mdciao.utils.residue_and_atom.parse_and_list_AAs_input(AAs, top, map_conlab=None)

Helper method to print information regarding AA descriptors

Parameters
  • AAs (None or str) – Comma-separated AA descriptors, e.g. ‘GLU30,GLU*,GDP’, i.e. anything that find_AA can read

  • top (Topology) – Topology where the AAs live

  • map_conlab (dict, list or array, default is None) – maps residue indices to consensus labels

mdciao.utils.residue_and_atom.rangeexpand_residues2residxs(range_as_str, fragments, top, interpret_as_res_idxs=False, sort=False, **residues_from_descriptors_kwargs)

Generalized range-expander from residue descriptors.

Residue descriptors can be anything that find_AA understands. Expanding a range means getting “2-5,7” as input and returning “2,3,4,5,7”

To disambiguate descriptors, a fragment definition and a topology are needed

Note

The input (= compressed range) is very flexible and accepts mixed descriptors and wildcards, e.g. GLU*,ARG*,GDP*,LEU394,380-385 is a valid range.

Wildcards use the full resnames, i.e. E* is NOT equivalent to GLU*

Be aware, though, that wildcards are very powerful and easily “grab” a lot of residues, leading to long calculations and large outputs.

See find_AA for more on residue descriptors

Parameters
  • range_as_str (string, int or iterable of ints) –

  • fragments (list of iterable of residue indices) –

  • top (Topology object) –

  • interpret_as_res_idxs (bool, default is False) – If True, indices without residue names (“380-385”) will be interpreted as residue indices, not residue sequential indices

  • sort (bool) – sort the expanded range on return

  • residues_from_descriptors_kwargs – Optional parameters for residues_from_descriptors

Returns

residxs_out – List of unique residue indices

Return type

list
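The purely numeric part of range expansion (“2-5,7” becoming “2,3,4,5,7”) can be sketched as below. This covers integers only; residue names, wildcards, fragments and the topology are deliberately left out, and the helper name expand_int_range is hypothetical.

```python
def expand_int_range(range_as_str):
    """Hypothetical helper: expand "2-5,7" into [2, 3, 4, 5, 7]."""
    expanded = []
    for item in range_as_str.split(","):
        item = item.strip()
        if "-" in item:
            # "2-5" covers both ends of the interval
            start, stop = (int(part) for part in item.split("-"))
            expanded.extend(range(start, stop + 1))
        else:
            expanded.append(int(item))
    # Keep first occurrences only, so the output holds unique indices
    return list(dict.fromkeys(expanded))


print(expand_int_range("2-5,7"))  # [2, 3, 4, 5, 7]
```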

mdciao.utils.residue_and_atom.residue_line(item_desc, residue, frag_idx, consensus_maps=None, fragment_names=None, table=False)

Return a string that describes the residue

Can be used just to inform or to help disambiguate:

0.0) GLU10 in fragment 0 with residue index 6 (CGN: G.HN.27)
…
1.0) GLU10 in fragment 1 with residue index 363

Parameters
  • item_desc (str) – Description for the item of the list, “1.0” or “3.2”

  • residue (Residue) –

  • frag_idx (int) – Fragment index

  • fragment_names (list, default is None) – Fragment names

  • consensus_maps (dict of indexables, default is None) – Dictionary of dictionaries. Lower-level dicts are keyed with residue indices and valued with additional residue names. Higher-level keys can be whatever. Use case is e.g. if “R131” needs to be disambiguated because it pops up in many fragments. You can pass {"GPCR": {895: "3.50", …}} here and that label will be displayed next to the residue.

  • table (bool, default is False) – Assume a header has already been printed out and print the line with the inline tags

Returns

istr – An informative string about this residue that can be used to disambiguate via the unique item descriptor, e.g. 3.1) GLU122 in fragment 3 with residue index 852 (: 3.41)

Return type

str

mdciao.utils.residue_and_atom.residues_from_descriptors(residue_descriptors, fragments, top, pick_this_fragment_by_default=None, fragment_names=None, additional_resnaming_dicts=None, extra_string_info='', just_inform=False)

Returns residue idxs based on a list of residue descriptors.

Fragments are needed to better identify residues. If a residue is present in multiple fragments, the user can dis-ambiguate or pick all residue idxs matching the residue_descriptor

Because of this (one descriptor can match more than one residue) the return values are not necessarily of the same length as residue_descriptors

Parameters
  • residue_descriptors (string or list of strings) – AAs of the form of “GLU30” or “E30” or 30, can be mixed

  • fragments (iterable of iterables of integers) – The integers in the iterables of ‘fragments’ represent residue indices of that fragment

  • top (Topology) –

  • pick_this_fragment_by_default (None or integer) – Pick this fragment without asking in case of ambiguity. If None, the user will be prompted

  • fragment_names – List of strings providing informative names for the input fragments

  • additional_resnaming_dicts (dict of dicts, default is None) – Dictionary of dictionaries. Lower-level dicts are keyed with residue indices and valued with additional residue names. Higher-level keys can be whatever. Use case is e.g. if “R131” needs to be disambiguated because it pops up in many fragments. You can pass {"GPCR": {895: "3.50", …}} here and that label will be displayed next to the residue. mdciao.cli methods use this.

  • just_inform (bool, default is False) – Just inform about the AAs, don’t ask for a selection

  • extra_string_info (string, default is '') – Any additional info to be printed in case of ambiguity

Returns

  • residxs (list) – List of the residue indices (integers) that have been selected

  • fragidxs (list) – The list of fragments where the residues are

mdciao.utils.residue_and_atom.shorten_AA(AA, substitute_fail=None, keep_index=False)

Return the short name of an AA, e.g. TRP30 to W, by trying to use either the mdtraj.Topology.Residue.code attribute or mdtraj’s internal AA dictionary

Parameters
  • AA (Residue or a str) – The residue in question

  • substitute_fail (str, default is None) –

    If there is no .code attribute, different options are there depending on the value of this parameter

    • None : throw an exception when no short code is found (default)

    • ’long’ : keep the residue’s long name, i.e. do nothing

    • ’c’: any alphabetic character, as long as it is of len=1

    • 0 : the first alphabetic character in the residue’s name

  • keep_index (bool, default is False) – If True return “W30” for “TRP30”, instead of returning just “W”

Returns

code – A string representing this AA using the short code

Return type

str
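The substitute_fail and keep_index options can be illustrated with a small sketch. It is not mdciao's implementation: the helper name shorten_aa_sketch and the excerpt dictionary _THREE_TO_ONE are hypothetical, the input is a plain string, and raising KeyError for unknown residues is an assumption about the kind of exception.

```python
import re

# Hypothetical excerpt of a three-letter to one-letter table
_THREE_TO_ONE = {"TRP": "W", "GLU": "E", "PRO": "P", "TYR": "Y"}


def shorten_aa_sketch(aa, substitute_fail=None, keep_index=False):
    """Hypothetical helper: short code of a descriptor such as "TRP30"."""
    # Split the descriptor into a name and an optional trailing index
    name, index = re.fullmatch(r"(.*?)(\d*)", aa).groups()
    code = _THREE_TO_ONE.get(name)
    if code is None:
        if substitute_fail is None:
            raise KeyError(f"No short code for {name}")  # assumed exception
        elif substitute_fail == "long":
            code = name  # keep the long name
        elif substitute_fail == 0:
            code = next(ch for ch in name if ch.isalpha())
        elif isinstance(substitute_fail, str) and len(substitute_fail) == 1:
            code = substitute_fail  # user-provided single character
        else:
            raise ValueError(f"Cannot use substitute_fail={substitute_fail!r}")
    return code + index if keep_index else code


print(shorten_aa_sketch("TRP30"))                        # W
print(shorten_aa_sketch("TRP30", keep_index=True))       # W30
print(shorten_aa_sketch("GDP395", substitute_fail="X"))  # X
```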

mdciao.utils.residue_and_atom.top2lsd(top, substitute_fail='X', extra_columns=None)

Return a list of per-residue attributes as dictionaries

Use DataFrame on the return value for a nice table

Parameters
  • top (Topology) –

  • substitute_fail (str, None, int, default is "X") –

    If there is no .code attribute, different options are there depending on the value of this parameter

    • None : throw an exception when no short code is found

    • ’long’ : keep the residue’s long name, i.e. do nothing

    • ’c’: any alphabetic character, as long as it is of len=1

    • 0 : the first alphabetic character in the residue’s name

  • extra_columns (dictionary of indexables) – Any other column you want to include in the DataFrame

Returns

lsd – List of dictionaries, one per residue; pass it to DataFrame for a table

Return type

list