mdciao.nomenclature.AlignerConsensus
-
class mdciao.nomenclature.AlignerConsensus(tops, maps=None, CL: mdciao.nomenclature.nomenclature.LabelerConsensus = None)
Use consensus labels for multiple sequence alignment.
Instead of doing an actual multiple sequence alignment, we can exploit the existing consensus labels to align positions across very different (=low sequence identity) topologies/sequences.
For example (edited table):
consensus  3CAP    3SN6
3.50x50    ARG135  ARG131
3.51x51    TYR136  TYR132
3.52x52    VAL137  PHE133
Or, for the corresponding atom indices (edited table):
consensus  3CAP  3SN6
3.50x50    1065  7835
3.51x51    1076  7846
3.52x52    1088  7858
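The idea behind these tables can be sketched in plain Python, without mdciao or pandas. This is only an illustration of aligning by shared consensus labels, not the class's actual implementation: the helper name `align_by_consensus` is hypothetical, and the toy maps are keyed by the residue names of the first table instead of by residue indices.

```python
def align_by_consensus(maps):
    """Return {consensus_label: {system_key: residue or None}}.

    `maps` is {system_key: {residue: consensus_label}}. A residue is None
    wherever a system's map does not contain that consensus label.
    """
    # Collect every consensus label seen in any system, sorted alphabetically
    labels = sorted({lab for m in maps.values() for lab in m.values() if lab is not None})
    # Invert each map so that residues can be looked up by label
    inverted = {key: {lab: res for res, lab in m.items()} for key, m in maps.items()}
    return {lab: {key: inverted[key].get(lab) for key in maps} for lab in labels}


# Toy maps built from the residue names in the first table above
maps = {
    "3CAP": {"ARG135": "3.50x50", "TYR136": "3.51x51", "VAL137": "3.52x52"},
    "3SN6": {"ARG131": "3.50x50", "TYR132": "3.51x51", "PHE133": "3.52x52"},
}

table = align_by_consensus(maps)
for label, row in table.items():
    print(label, row["3CAP"], row["3SN6"])
```

No sequence alignment is computed at any point: two residues end up in the same row only because their maps assign them the same consensus label.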
-
__init__(tops, maps=None, CL: mdciao.nomenclature.nomenclature.LabelerConsensus = None)
- Parameters
tops (dict) – Dictionary of Topology objects. Keys can be arbitrary identifiers that distinguish the tops, like different PDB IDs or system setups (WT vs MUT). These keys will be used throughout the object, in this order.
maps (dict, default is None) – Dictionary of dictionaries, each mapping the residue indices of one of the tops to consensus labels. Typically, each map comes from invoking top2labels on the respective top.
CL (LabelerConsensus, default is None) – If provided, it is assumed that this object can operate on all tops, because all the sequences in tops are a good match with the descriptor (e.g. a UniProt Accession Code) used to create the CL. Hence, the maps can be generated on-the-fly using this object. Note, however, that in this case the tops likely share very high sequence similarity, so a normal sequence alignment works as well.
Methods
AAresSeq_match([patterns, keys])  Filter self.AAresSeq to the rows where all consensus labels are present.
CAidxs_match([patterns, keys])    Filter self.CAidxs to the rows where all consensus labels are present.
__init__(tops[, maps, CL])        tops: Dictionary of Topology objects.
residxs_match([patterns, keys])   Filter self.residxs to the rows where all consensus labels are present.
Attributes
AAresSeq  The DataFrame containing the alignment based on consensus labels
CAidxs    The DataFrame containing the alignment based on consensus labels
keys      The keys with which the tops and/or the maps were given at input
maps      The dictionaries mapping residue indices of the tops to consensus labels.
residxs   The DataFrame containing the alignment based on consensus labels
tops      The topologies given at input
-
property AAresSeq
The DataFrame containing the alignment based on consensus labels.
‘AAresSeq’ means labels like ‘ARG130’ and so on.
Currently, rows are sorted alphabetically by consensus label, which works well for GPCR but less so for CGN or KLIFS (this will change soon).
Will have NaNs where residues weren’t found, i.e. where a given map didn’t contain that consensus label.
- Returns
df
- Return type
-
AAresSeq_match(patterns=None, keys=None) → pandas.core.frame.DataFrame
Filter self.AAresSeq to the rows where all consensus labels are present.
You can filter by consensus label using patterns and by system using keys.
- Parameters
patterns (str, default is None) –
A list in CSV-format of patterns to be matched by the consensus labels. Matches are done using Unix filename pattern matching and allow for exclusion, e.g.
"H*,-H8" will include all TMs but not H8
"G.S*" will include all beta-sheets
keys (list, default is None) – If only a subset of columns needs to match, provide them here as a list of strings. If None, all columns (except filter_on) will be used.
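The pattern syntax described above (comma-separated Unix filename patterns, with a leading "-" marking an exclusion) can be sketched with Python's standard `fnmatch` module. This is a minimal sketch under assumptions: `match_labels` is a hypothetical helper, and whether mdciao itself uses `fnmatch`, or treats case the same way, is not stated here.

```python
from fnmatch import fnmatchcase


def match_labels(labels, patterns):
    """Keep the labels matching any include pattern and no exclude pattern.

    `patterns` is a CSV string; a pattern prefixed with "-" is an exclusion.
    """
    include, exclude = [], []
    for pat in patterns.split(","):
        pat = pat.strip()
        if pat.startswith("-"):
            exclude.append(pat[1:])   # drop the "-" marker
        elif pat:
            include.append(pat)
    return [
        lab for lab in labels
        if any(fnmatchcase(lab, p) for p in include)
        and not any(fnmatchcase(lab, p) for p in exclude)
    ]


# Consensus labels taken from the tables in the class description
labels = ["3.50x50", "3.51x51", "3.52x52"]
selected = match_labels(labels, "3.5*,-3.52*")
print(selected)
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform, which matters if labels differ only by case.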
- Returns
df
- Return type
-
property CAidxs
The DataFrame containing the alignment based on consensus labels.
Indices are zero-based atom indices of the respective tops.
Currently, rows are sorted alphabetically by consensus label, which works well for GPCR but less so for CGN or KLIFS (this will change soon).
Will have NaNs where residues weren’t found, i.e. where a given map didn’t contain that consensus label.
- Returns
df
- Return type
-
CAidxs_match(patterns=None, keys=None) → pandas.core.frame.DataFrame
Filter self.CAidxs to the rows where all consensus labels are present.
You can filter by consensus label using patterns and by system using keys.
- Parameters
patterns (str, default is None) –
A list in CSV-format of patterns to be matched by the consensus labels. Matches are done using Unix filename pattern matching and allow for exclusion, e.g.
"H*,-H8" will include all TMs but not H8
"G.S*" will include all beta-sheets
keys (list, default is None) – If only a subset of columns needs to match, provide them here as a list of strings. If None, all columns (except filter_on) will be used.
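One way to read the keys argument is that a row is kept only if every selected column has a value in it. The plain-Python sketch below illustrates that reading. `rows_with_all_present` is a hypothetical helper, the "toy" column and its values are made up to create a gap, and returning all columns of a kept row is an assumption about the behavior, not something stated here.

```python
def rows_with_all_present(table, keys=None):
    """Keep the rows in which every selected column has a value.

    `table` is {consensus_label: {system_key: index or None}}.
    If `keys` is None, all columns must be present.
    """
    kept = {}
    for label, row in table.items():
        columns = list(row) if keys is None else keys
        if all(row[k] is not None for k in columns):
            kept[label] = row
    return kept


# 3CAP and 3SN6 values come from the atom-index table in the class
# description; the "toy" column is hypothetical and has one missing entry.
table = {
    "3.50x50": {"3CAP": 1065, "3SN6": 7835, "toy": None},
    "3.51x51": {"3CAP": 1076, "3SN6": 7846, "toy": 11},
    "3.52x52": {"3CAP": 1088, "3SN6": 7858, "toy": 23},
}

print(list(rows_with_all_present(table)))
print(list(rows_with_all_present(table, keys=["3CAP", "3SN6"])))
```

Restricting keys to the two complete columns keeps the 3.50x50 row, which is dropped when all three columns have to match.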
- Returns
df
- Return type
-
property keys
The keys with which the tops and/or the maps were given at input
-
property maps
The dictionaries mapping residue indices of the tops to consensus labels.
These maps were either given at input or created on-the-fly with the provided LabelerConsensus.
-
property residxs
The DataFrame containing the alignment based on consensus labels.
Indices are zero-based residue indices of the respective tops.
Currently, rows are sorted alphabetically by consensus label, which works well for GPCR but less so for CGN or KLIFS (this will change soon).
Will have NaNs where residues weren’t found, i.e. where a given map didn’t contain that consensus label.
- Returns
df
- Return type
-
residxs_match(patterns=None, keys=None) → pandas.core.frame.DataFrame
Filter self.residxs to the rows where all consensus labels are present.
You can filter by consensus label using patterns and by system using keys.
- Parameters
patterns (str, default is None) –
A list in CSV-format of patterns to be matched by the consensus labels. Matches are done using Unix filename pattern matching and allow for exclusion, e.g.
"H*,-H8" will include all TMs but not H8
"G.S*" will include all beta-sheets
keys (list, default is None) – If only a subset of columns needs to match, provide them here as a list of strings. If None, all columns (except filter_on) will be used.
- Returns
df
- Return type
-
property tops
The topologies given at input
-