mdciao.utils.residue_and_atom
Deal with residues, atoms, and their names, mostly.
The function residues_from_descriptors is probably the most elaborate and highest-level one.
Functions
| AAtype | Residue types, optionally color coded |
| atom_type | Return a string BB or SC for backbone or sidechain atom. |
| find_AA | Residue matching with UNIX-shell patterns |
| find_CA | Return the CA atom (or something equivalent) for this residue |
| get_SS | Try to guess what type of input for secondary-structure computation the user wants, and compute it |
| int_from_AA_code | Returns the integer part from a residue name, None if there isn't |
| name_from_AA | Return the residue name from a string |
| parse_and_list_AAs_input | Helper method to print information regarding AA descriptors |
| rangeexpand_residues2residxs | Generalized range-expander from residue descriptors. |
| residue_line | Return a string that describes the residue |
| residues_from_descriptors | Returns residue idxs based on a list of residue descriptors. |
| shorten_AA | Return the short name of an AA, e.g. TRP30 to W |
| top2lsd | Return a list of per-residue attributes as dictionaries |
- mdciao.utils.residue_and_atom.AAtype(res, return_color=False, typecolors={'NA': 'purple', 'hydrophobic': 'gray', 'negative': 'red', 'polar': 'green', 'positive': 'blue', 'special': 'gray'})
Residue types, optionally color coded
The types are:
* “positive”: “ARG HIS LYS”
* “negative”: “ASP GLU”
* “polar”: “SER THR ASN GLN”
* “special”: “CYS GLY PRO”
* “hydrophobic”: “ALA ILE LEU MET PHE TRP TYR VAL”
- Parameters:
res (str or Residue)
return_color (bool, default is False) – Return the color associated with the type (positive: blue, negative: red, etc.) rather than the type itself
typecolors (dict) – The map of types to colors
- Returns:
rtype – Either the type or the color
- Return type:
str
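The type lookup described above can be sketched with a plain dictionary. This is only an illustration built from the type lists and the default typecolors in the signature; the name aatype_sketch and the fallback to "NA" for unlisted residues are assumptions, not mdciao's implementation.

```python
# Illustrative sketch only; not mdciao's code.
TYPES = {
    "positive": "ARG HIS LYS",
    "negative": "ASP GLU",
    "polar": "SER THR ASN GLN",
    "special": "CYS GLY PRO",
    "hydrophobic": "ALA ILE LEU MET PHE TRP TYR VAL",
}
TYPECOLORS = {"NA": "purple", "hydrophobic": "gray", "negative": "red",
              "polar": "green", "positive": "blue", "special": "gray"}


def aatype_sketch(resname, return_color=False, typecolors=TYPECOLORS):
    """Classify a three-letter residue name.

    Names not found in TYPES map to "NA" (an assumption of this sketch).
    """
    rtype = next((t for t, names in TYPES.items() if resname in names.split()), "NA")
    return typecolors[rtype] if return_color else rtype


print(aatype_sketch("ARG"))
print(aatype_sketch("ASP", return_color=True))
```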
- mdciao.utils.residue_and_atom.atom_type(aa, no_BB_no_SC='X')
Return a string BB or SC for backbone or sidechain atom.
- Parameters:
aa (mdtraj.core.topology.Atom object)
no_BB_no_SC (str, default is X) – Return this string if aa is neither BB nor SC
- Returns:
aatype
- Return type:
str
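A rough sketch of the BB/SC classification, working on atom names instead of mdtraj Atom objects. The backbone name set (N, CA, C, O), the in_protein_residue flag, and the function name are assumptions made for illustration; the real function takes an Atom object and may decide differently.

```python
# Assumed backbone definition for this sketch.
BACKBONE_NAMES = {"N", "CA", "C", "O"}


def atom_type_sketch(atom_name, in_protein_residue=True, no_BB_no_SC="X"):
    """Return "BB", "SC", or the no_BB_no_SC placeholder."""
    if not in_protein_residue:
        # Atoms of ligands, ions, etc. are neither backbone nor sidechain here.
        return no_BB_no_SC
    return "BB" if atom_name in BACKBONE_NAMES else "SC"


print(atom_type_sketch("CA"))
print(atom_type_sketch("CB"))
print(atom_type_sketch("C1", in_protein_residue=False))
```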
- mdciao.utils.residue_and_atom.find_AA(AA_pattern, top, extra_columns=None, return_df=False)
Residue matching with UNIX-shell patterns
Similar to the shell command “ls”, using POSIX-style wildcards as shown in the examples or here: https://docs.python.org/3/library/fnmatch.html
Any other attribute that’s passed as extra_columns will be matched as explained below, e.g. “3.50” to get one residue in the GPCR-nomenclature or “3.*” to get the whole TM-helix 3.
The examples use ‘*’ as wildcard, but ‘?’ (as in ‘ls’) also works.
Examples
‘PRO’ : returns all PROs, matching via the attribute “name”
‘P’ : returns all PROs, matching via the attribute “code”
‘P*’ : returns all PROs, PHEs and any other residue that starts with “P”, either in “name” or in “code”
‘PRO39’ : returns PRO39, matching via full residue name (long)
‘P39’ : returns PRO39, matching via full residue name (short)
‘PRO3*’ : returns all PROs with sequence indices that start with 3, e.g. ‘PRO39, PRO323, PRO330’ etc
‘3’ : returns all residues with sequence index 3
‘3*’ : returns all residues with sequence indices that start with 3
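The matching rules in these examples can be approximated with the standard-library fnmatch module. The sketch below uses a toy list of (name, code, sequence index) tuples in place of a topology; RESIDUES and find_aa_sketch are hypothetical names, and the real find_AA may differ in details such as extra_columns handling.

```python
from fnmatch import fnmatchcase

# Toy residues as (name, code, sequence index); a stand-in for a topology.
RESIDUES = [("PRO", "P", 39), ("PHE", "F", 40), ("GLU", "E", 3),
            ("PRO", "P", 323), ("LEU", "L", 30)]


def find_aa_sketch(pattern, residues=RESIDUES):
    """Return list positions of residues with at least one label matching the pattern."""
    hits = []
    for idx, (name, code, seq) in enumerate(residues):
        # Labels tried per residue: name, code, long and short full name, sequence index.
        labels = (name, code, f"{name}{seq}", f"{code}{seq}", str(seq))
        if any(fnmatchcase(label, pattern) for label in labels):
            hits.append(idx)
    return hits


print(find_aa_sketch("P*"))
print(find_aa_sketch("PRO3*"))
print(find_aa_sketch("3"))
```

Trying several labels per residue is what lets a single pattern such as ‘P39’ or ‘3*’ work without the caller saying which attribute to match.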
- mdciao.utils.residue_and_atom.find_CA(res, CA_name='CA', CA_dict=None)
Return the CA atom (or something equivalent) for this residue
- Parameters:
res (mdtraj.Residue object)
CA_name (str, default is “CA”) – The name by which you identify the CA. This overrules anything that’s specified in the CA_dict, i.e. if the residue you are passing has both an atom “CA” and an entry in the CA_dict, the “CA” atom will be returned.
CA_dict (dict, default is None) – You can provide a dictionary keyed with residue names and valued with strings that identify a “CA”-equivalent atom (e.g. in ligands). If None, the default _CA_rules are used: _CA_rules = {“GDP”: “C1”, “P0G”: “C12”}
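The precedence between CA_name and CA_dict can be sketched as follows, using a residue name and a list of atom names instead of an mdtraj Residue. The function name is hypothetical, and raising ValueError when nothing is found is an assumption of this sketch; the documentation above does not state what the real function does in that case.

```python
# Default rules as quoted in the documentation above.
_CA_RULES = {"GDP": "C1", "P0G": "C12"}


def find_ca_sketch(res_name, atom_names, CA_name="CA", CA_dict=None):
    """Return the name of the CA (or CA-equivalent) atom of a residue."""
    rules = _CA_RULES if CA_dict is None else CA_dict
    # An atom named CA_name always wins over an entry in the rules.
    if CA_name in atom_names:
        return CA_name
    if res_name in rules and rules[res_name] in atom_names:
        return rules[res_name]
    # Assumption: fail loudly when no CA-like atom can be identified.
    raise ValueError(f"No CA-equivalent atom found for residue {res_name}")


print(find_ca_sketch("GLU", ["N", "CA", "C", "O"]))
print(find_ca_sketch("GDP", ["C1", "N1", "O6"]))
```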
- mdciao.utils.residue_and_atom.get_SS(SS, top=None)
Try to guess what type of input for secondary-structure computation the user wants, and compute it
- Parameters:
SS (secondary structure information) – Can be many things:
* triple of ints (CP_idx, traj_idx, frame_idx): Nothing happens, the tuple is returned as is and handled externally by the ContactGroup that called this method. The tuple identifies a ContactPair, a trajectory, and a frame by their indices. See the docs there for more info
* True: same as [0,0,0]
* None or False: Do nothing
* mdtraj.Trajectory: Use this geometry to compute the SS
* string: Path to a file, of which only the first frame will be read. The SS will be computed from there. Reading is first attempted without topology information (e.g. .pdb, .gro, .h5 will work), and when this fails, the top will be passed (e.g. .xtc, .dcd)
* array_like: Use the SS from here, s.t. ss_inf[idx] gives the SS-info for the residue with that idx
top (Topology, default is None)
- Returns:
from_tuple (bool) – Whether the information should be taken from a tuple or not
ss_array (np.ndarray or None)
- mdciao.utils.residue_and_atom.int_from_AA_code(key)
Returns the integer part from a residue name, None if there isn’t
- Parameters:
key (string) – Residue name passed as a string, example “GLU30”
- Returns:
Integer part of the residue id, e.g. 30 if the input is “GLU30”
- Return type:
int
- mdciao.utils.residue_and_atom.name_from_AA(key) → str
Return the residue name from a string
- Parameters:
key (string or mdtraj.Topology.Residue object) – Residue name passed as a string, e.g. “GLU30”, or as a residue object
- Returns:
name – Name of the residue, like “GLU” for “GLU30” or “E” for “E30”
- Return type:
str
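Taken together, int_from_AA_code and name_from_AA split a residue label into its integer and name parts. A minimal regular-expression sketch of that split is shown here for string input only; the function names are hypothetical, and treating only the trailing digits as the integer part is an assumption that may not match the library in edge cases.

```python
import re


def int_from_label(key):
    """Return the trailing integer of a label such as "GLU30", or None if there is none."""
    match = re.search(r"\d+$", key)
    return int(match.group()) if match else None


def name_from_label(key):
    """Return the label without its trailing integer, e.g. "GLU" for "GLU30"."""
    return re.sub(r"\d+$", "", key)


print(int_from_label("GLU30"), name_from_label("GLU30"))
print(int_from_label("GLU"), name_from_label("E30"))
```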
- mdciao.utils.residue_and_atom.parse_and_list_AAs_input(AAs, top, map_conlab=None)
Helper method to print information regarding AA descriptors
- mdciao.utils.residue_and_atom.rangeexpand_residues2residxs(range_as_str, fragments, top, interpret_as_res_idxs=False, sort=False, **residues_from_descriptors_kwargs)
Generalized range-expander from residue descriptors.
Residue descriptors can be anything that find_AA understands. Expanding a range means getting “2-5,7” as input and returning “2,3,4,5,7”.
To disambiguate descriptors, a fragment definition and a topology are needed.
Note
The input (= compressed range) is very flexible and accepts mixed descriptors and wildcards, e.g. GLU*,ARG*,GDP*,LEU394,380-385 is a valid range.
Expressions starting with “-” are exclusions, e.g. “GLU*,-GLU30” will select all GLUs except GLU30.
Wildcards use the full resnames, i.e. “E*” is NOT equivalent to “GLU*”
Expressions leading to empty ranges raise ValueError.
Be aware, though, that wildcards are very powerful and easily “grab” a lot of residues, leading to long calculations and large outputs.
See find_AA for more on residue descriptors.
- Parameters:
range_as_str (string, int or iterable of ints)
fragments (list of iterable of residue indices)
top (Topology object)
interpret_as_res_idxs (bool, default is False) – If True, values without residue names (“380-385”) will be interpreted as residue indices, not as residue sequence indices
sort (bool) – sort the expanded range on return
residues_from_descriptors_kwargs – Optional parameters for residues_from_descriptors, which are listed below
- Other Parameters:
pick_this_fragment_by_default (None or integer) – Pick this fragment without asking in case of ambiguity. If None, the user will be prompted
fragment_names – list of strings providing informative names for the input fragments
additional_resnaming_dicts (dict of dicts, default is None) – Dictionary of dictionaries. Lower-level dicts are keyed with residue indices and valued with additional residue names. Higher-level keys can be whatever. Use case is e.g. if “R131” needs to be disambiguated because it pops up in many fragments. You can pass {“GPCR”: {895: “3.50”, …}} here and that label will be displayed next to the residue. mdciao.cli methods use this.
just_inform (bool, default is False) – Just inform about the AAs, don’t ask for a selection
extra_string_info (str) – string with any additional info to be printed in case of ambiguity
- Return type:
residxs_out = list of unique residue indices
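The purely numeric part of the range expansion (“2-5,7” to 2,3,4,5,7) can be sketched in a few lines. This hypothetical helper handles only integers and integer ranges; residue-name descriptors, wildcards, and “-” exclusions, which the real function resolves against fragments and a topology, are deliberately left out.

```python
def expand_int_ranges(range_as_str, sort=False):
    """Expand a string like "2-5,7" into a list of unique integers."""
    expanded = []
    for token in range_as_str.split(","):
        token = token.strip()
        if "-" in token:
            # "a-b" is inclusive on both ends.
            start, stop = (int(part) for part in token.split("-"))
            values = range(start, stop + 1)
        else:
            values = [int(token)]
        for value in values:
            if value not in expanded:  # keep the result unique
                expanded.append(value)
    return sorted(expanded) if sort else expanded


print(expand_int_ranges("2-5,7"))
print(expand_int_ranges("7,2-5", sort=True))
```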
- mdciao.utils.residue_and_atom.residue_line(item_desc, residue, frag_idx, consensus_maps=None, fragment_names=None, table=False)
Return a string that describes the residue
Can be used just to inform or to help disambiguate:
>>> 0.0) LEU45 with residue index 41 in fragment 0 ( CGN: LEU45@G.S1.6)
>>> 3.0) LEU45 with residue index 775 in fragment 3 ( GPCR: LEU45@1.44x44)
- Parameters:
item_desc (str) – Description for the item of the list, “1.0” or “3.2”
residue (Residue)
frag_idx (int) – Fragment index
fragment_names (list, default is None) – Fragment names
consensus_maps (dict of indexables, default is None) – Dictionary of dictionaries. Lower-level dicts are keyed with residue indices and valued with additional residue names. Higher-level keys can be whatever. Use case is e.g. if “R131” needs to be disambiguated because it pops up in many fragments. You can pass {“GPCR”: {895: “3.50”, …}} here and that label will be displayed next to the residue.
table (bool, default is False) – Assume a header has been already printed out and print the line with the inline tags
- Returns:
istr – An informative string about this residue, that can be used to dis-ambiguate via the unique item descriptor, e.g: 3.1) GLU122 in fragment 3 with residue index 852 (: 3.41)
- Return type:
str
- mdciao.utils.residue_and_atom.residues_from_descriptors(residue_descriptors, fragments, top, pick_this_fragment_by_default=None, fragment_names=None, additional_resnaming_dicts=None, extra_string_info='', just_inform=False)
Returns residue idxs based on a list of residue descriptors.
Fragments are needed to better identify residues. If a residue is present in multiple fragments, the user can disambiguate or pick all residue idxs matching the residue_descriptor.
Because of this (one descriptor can match more than one residue), the return values are not necessarily of the same length as residue_descriptors.
- Parameters:
residue_descriptors (string or list of strings) – AAs of the form of “GLU30” or “E30” or 30, can be mixed
fragments (iterable of iterables of integers) – The integers in the iterables of ‘fragments’ represent residue indices of that fragment
top (Topology)
pick_this_fragment_by_default (None or integer) – Pick this fragment without asking in case of ambiguity. If None, the user will be prompted
fragment_names – list of strings providing informative names for the input fragments
additional_resnaming_dicts (dict of dicts, default is None) – Dictionary of dictionaries. Lower-level dicts are keyed with residue indices and valued with additional residue names. Higher-level keys can be whatever. Use case is e.g. if “R131” needs to be disambiguated because it pops up in many fragments. You can pass {“GPCR”: {895: “3.50”, …}} here and that label will be displayed next to the residue. mdciao.cli methods use this.
just_inform (bool, default is False) – Just inform about the AAs, don’t ask for a selection
extra_string_info (str) – string with any additional info to be printed in case of ambiguity
- Returns:
residxs (list) – lists of integers that have been selected
fragidxs (list) – The list of fragments where the residues are
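The role of the fragments in resolving an ambiguous descriptor can be illustrated with a small stand-alone sketch. It assumes the candidate residue indices for one descriptor are already known; the function name and the toy fragments are hypothetical, and where the real function would prompt the user, this sketch simply keeps all candidates.

```python
def pick_by_fragment(candidates, fragments, pick_this_fragment_by_default=None):
    """Return (residxs, fragidxs) for the residue indices matching one descriptor."""
    # Map every residue index to the index of the fragment that contains it.
    frag_of = {res: i for i, frag in enumerate(fragments) for res in frag}
    fragidxs = [frag_of[res] for res in candidates]
    if len(candidates) > 1 and pick_this_fragment_by_default is not None:
        # Ambiguity: keep only the candidates located in the preferred fragment.
        kept = [(res, frag) for res, frag in zip(candidates, fragidxs)
                if frag == pick_this_fragment_by_default]
        candidates = [res for res, _ in kept]
        fragidxs = [frag for _, frag in kept]
    return list(candidates), fragidxs


# Four toy fragments; "LEU45" is assumed to match residue indices 41 and 775.
fragments = [range(0, 100), range(100, 400), range(400, 700), range(700, 900)]
print(pick_by_fragment([41, 775], fragments))
print(pick_by_fragment([41, 775], fragments, pick_this_fragment_by_default=3))
```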
- mdciao.utils.residue_and_atom.shorten_AA(AA, substitute_fail=None, keep_index=False)
Return the short name of an AA, e.g. TRP30 to W, by trying to use either the mdtraj.Topology.Residue.code attribute or mdtraj’s internal AA dictionary
- Parameters:
AA (Residue or a str) – The residue in question
substitute_fail (str, default is None) – If there is no .code attribute, there are different options depending on the value of this parameter
* None : throw an exception when no short code is found (default)
* ‘long’ : keep the residue’s long name, i.e. do nothing
* ‘c’: any alphabetic character, as long as it is of len=1
* 0 : the first alphabetic character in the residue’s name
keep_index (bool, default is False) – If True return “W30” for “TRP30”, instead of returning just “W”
- Returns:
code – A string representing this AA using the short code
- Return type:
str
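The substitute_fail options can be sketched for string input with a deliberately truncated three-to-one table. The names THREE_TO_ONE and shorten_sketch, the KeyError raised for None, and the way keep_index combines with the fallbacks are assumptions of this sketch rather than documented behavior.

```python
import re

# Truncated lookup table, enough for the example; not mdtraj's dictionary.
THREE_TO_ONE = {"TRP": "W", "TYR": "Y", "GLU": "E", "ARG": "R"}


def shorten_sketch(AA, substitute_fail=None, keep_index=False):
    """Shorten a label such as "TRP30" to "W" (or "W30" with keep_index)."""
    # Split the label into a name and an optional trailing index.
    name, index = re.fullmatch(r"(.*?)(\d*)", AA).groups()
    if name in THREE_TO_ONE:
        code = THREE_TO_ONE[name]
    elif substitute_fail is None:
        raise KeyError(f"No short code for {name}")  # assumed exception type
    elif substitute_fail == "long":
        code = name  # keep the long name
    elif substitute_fail == 0:
        code = next(ch for ch in name if ch.isalpha())  # first alphabetic character
    elif isinstance(substitute_fail, str) and len(substitute_fail) == 1:
        code = substitute_fail  # user-provided single character
    else:
        raise ValueError(f"Unsupported substitute_fail: {substitute_fail!r}")
    return code + index if keep_index else code


print(shorten_sketch("TRP30"))
print(shorten_sketch("TRP30", keep_index=True))
print(shorten_sketch("GDP395", substitute_fail="X"))
```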
- mdciao.utils.residue_and_atom.top2lsd(top, substitute_fail='X', extra_columns=None)
Return a list of per-residue attributes as dictionaries
Use DataFrame on the return value for a nice table
- Parameters:
top (Topology)
substitute_fail (str, None, int, default is “X”) – If there is no .code attribute, there are different options depending on the value of this parameter
None : throw an exception when no short code is found
‘long’ : keep the residue’s long name, i.e. do nothing
‘c’: any alphabetic character, as long as it is of len=1
0 : the first alphabetic character in the residue’s name
extra_columns (dictionary of indexables) – Any other columns you want to include in the DataFrame, e.g. {“GPCR” : [None, None, …, 3.50, 3.51, …], “CGN” : [G.H5.25, None, None, …]}. If the values are lists, they should be len=top.n_residues; if dicts, the dicts don’t need to cover all residues of top, e.g.
{“GPCR” : {200 : “3.50”, 201 : “3.51”},
“CGN” : {0 : “G.H5.25”}}
- Returns:
df
- Return type: