mdciao.nomenclature.AlignerConsensus

class mdciao.nomenclature.AlignerConsensus(tops, maps=None, CL: mdciao.nomenclature.nomenclature.LabelerConsensus = None)

Use consensus labels for multiple sequence alignment

Instead of doing an actual multiple sequence alignment, we can exploit existing consensus labels to align positions across very different topologies/sequences, i.e. those with low sequence identity.

For example (edited table):

consensus  3CAP    3SN6
3.50x50    ARG135  ARG131
3.51x51    TYR136  TYR132
3.52x52    VAL137  PHE133

Or, for residue atom indices (edited table):

consensus  3CAP  3SN6
3.50x50    1065  7835
3.51x51    1076  7846
3.52x52    1088  7858
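
The idea behind the tables above can be sketched without any sequence alignment: if each system provides a lookup from consensus label to residue, aligning amounts to joining those lookups on the label. The snippet below is a minimal illustration in plain Python, not mdciao's implementation; the name align_by_label and the label2res layout are assumptions made for this sketch, and the values are copied from the first example table.

```python
# Hypothetical per-system lookups: consensus label -> residue name.
# Values are copied from the example table; the layout is an assumption.
label2res = {
    "3CAP": {"3.50x50": "ARG135", "3.51x51": "TYR136", "3.52x52": "VAL137"},
    "3SN6": {"3.50x50": "ARG131", "3.51x51": "TYR132", "3.52x52": "PHE133"},
}


def align_by_label(label2res):
    """Return one row per consensus label: (label, value for each system)."""
    keys = list(label2res)  # systems keep their input order
    all_labels = sorted(set().union(*(m.keys() for m in label2res.values())))
    # dict.get returns None when a system lacks a label (stand-in for NaN)
    return [(lab, *(label2res[k].get(lab) for k in keys)) for lab in all_labels]


rows = align_by_label(label2res)
for row in rows:
    print(*row)
```

Positions that carry the same label end up in the same row, regardless of how different the residue numbering or the sequences are.
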
__init__(tops, maps=None, CL: mdciao.nomenclature.nomenclature.LabelerConsensus = None)
Parameters
  • tops (dict) – Dictionary of Topology objects. Keys can be arbitrary identifiers to distinguish among the tops, like different PDB IDs or system setups (WT vs MUT). These keys will be used throughout the object, in this order.

  • maps (dict, default is None) – Dictionary of dictionaries, each mapping the residue indices of one of the tops to consensus labels. Typically, each map comes from invoking top2labels on the respective top.

  • CL (LabelerConsensus, default is None) – If provided, it is assumed that this object can operate on all tops, because all the sequences in tops are a good match with the descriptor (e.g. a UniProt Accession Code) used to create the CL. Hence, the maps can be generated on-the-fly using this object. Note, however, that in this case the tops likely have very high sequence similarity, so a normal sequence alignment would work as well.
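
To make the expected shape of maps concrete, here is a small sketch that builds a dictionary of dictionaries keyed like tops. It assumes, purely for illustration, that the labels for each system are available as a per-residue list with None where no label applies; the helper labels_to_map and the keys "A" and "B" are hypothetical and not part of mdciao.

```python
def labels_to_map(labels):
    """Turn a per-residue list of labels into {residue_index: label}, skipping None."""
    return {idx: lab for idx, lab in enumerate(labels) if lab is not None}


# Hypothetical label lists; the list position is the zero-based residue index
labels_A = [None, "3.50x50", "3.51x51", "3.52x52"]
labels_B = ["3.50x50", "3.51x51", "3.52x52", None]

# One inner dictionary per system, using the same keys as in tops
maps = {"A": labels_to_map(labels_A), "B": labels_to_map(labels_B)}
print(maps["A"])
print(maps["B"])
```
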

Methods

AAresSeq_match([patterns, keys])

Filter the self.AAresSeq to the rows where all consensus labels are present.

CAidxs_match([patterns, keys])

Filter the self.CAidxs to the rows where all consensus labels are present.

__init__(tops[, maps, CL])

Build the aligner from tops (a dictionary of Topology objects) and, optionally, maps or a CL.

residxs_match([patterns, keys])

Filter the self.residxs to the rows where all consensus labels are present.

Attributes

AAresSeq

The DataFrame containing the alignment based on consensus labels

CAidxs

The DataFrame containing the alignment based on consensus labels

keys

The keys with which the tops and/or the maps were given at input

maps

The dictionaries mapping residue indices of the tops to consensus labels.

residxs

The DataFrame containing the alignment based on consensus labels

tops

The topologies given at input

property AAresSeq

The DataFrame containing the alignment based on consensus labels

‘AAresSeq’ means labels like ‘ARG130’ and so on.

Rows are currently sorted alphabetically by consensus label, which works well for GPCR but less so for CGN and KLIFS (this will change soon).

Contains NaNs where a residue wasn’t found, i.e. where a given map didn’t contain that consensus label.

Returns

df

Return type

DataFrame

AAresSeq_match(patterns=None, keys=None) → pandas.core.frame.DataFrame

Filter the self.AAresSeq to the rows where all consensus labels are present.

You can filter by consensus label using patterns and by system using keys.

Parameters
  • patterns (str, default is None) –

    A list in CSV-format of patterns to be matched by the consensus labels. Matches are done using Unix filename pattern matching and allow for exclusion, e.g.

    • "H*,-H8" will include all TMs but not H8

    • "G.S*" will include all beta-sheets

  • keys (list, default is None) – If only a sub-set of columns need to match, provide them here as list of strings. If None, all columns (except filter_on) will be used.
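
The pattern syntax described above can be approximated with Python's standard fnmatch module. The sketch below is an assumption about the semantics (a label is kept if it matches any inclusion pattern and no exclusion pattern prefixed with "-"); mdciao's own matching may differ in details, and both match_labels and the CGN-style labels are made up for this example.

```python
from fnmatch import fnmatchcase


def match_labels(labels, patterns):
    """Keep labels matching any inclusion pattern and no '-'-prefixed exclusion pattern."""
    pats = [p.strip() for p in patterns.split(",") if p.strip()]
    include = [p for p in pats if not p.startswith("-")]
    exclude = [p[1:] for p in pats if p.startswith("-")]
    return [
        lab
        for lab in labels
        if any(fnmatchcase(lab, p) for p in include)
        and not any(fnmatchcase(lab, p) for p in exclude)
    ]


# Hypothetical CGN-style consensus labels
labels = ["G.S1.2", "G.S2.3", "G.H5.25", "G.H5.26", "G.HN.30"]

print(match_labels(labels, "G.S*"))         # ['G.S1.2', 'G.S2.3']
print(match_labels(labels, "G.H*,-G.H5*"))  # ['G.HN.30']
```

fnmatchcase is used instead of fnmatch so that matching stays case-sensitive on every platform.
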

Returns

df

Return type

DataFrame

property CAidxs

The DataFrame containing the alignment based on consensus labels

Indices are zero-based atom indices of the respective tops.

Rows are currently sorted alphabetically by consensus label, which works well for GPCR but less so for CGN and KLIFS (this will change soon).

Contains NaNs where a residue wasn’t found, i.e. where a given map didn’t contain that consensus label.

Returns

df

Return type

DataFrame
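
One typical use of such a table is to extract parallel lists of atom indices for positions present in both systems. The sketch below works on plain tuples rather than on the actual DataFrame; the first three rows reuse the values of the atom-index example table in the class description, while the fourth row (label and index) is hypothetical and only there to show a missing entry, with None standing in for NaN.

```python
# Rows of a CAidxs-like table: (consensus label, index in 3CAP, index in 3SN6)
rows = [
    ("3.50x50", 1065, 7835),
    ("3.51x51", 1076, 7846),
    ("3.52x52", 1088, 7858),
    ("3.53x53", 1095, None),  # hypothetical row: label missing in 3SN6
]

# Keep only positions present in both systems, then split into parallel lists
matched = [(a, b) for _, a, b in rows if a is not None and b is not None]
idxs_3CAP = [a for a, _ in matched]
idxs_3SN6 = [b for _, b in matched]

print(idxs_3CAP)
print(idxs_3SN6)
```

Parallel lists like these could then be handed to a structural superposition, for example as the atom_indices and ref_atom_indices arguments of mdtraj's Trajectory.superpose.
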

CAidxs_match(patterns=None, keys=None) → pandas.core.frame.DataFrame

Filter the self.CAidxs to the rows where all consensus labels are present.

You can filter by consensus label using patterns and by system using keys.

Parameters
  • patterns (str, default is None) –

    A list in CSV-format of patterns to be matched by the consensus labels. Matches are done using Unix filename pattern matching and allow for exclusion, e.g.

    • "H*,-H8" will include all TMs but not H8

    • "G.S*" will include all beta-sheets

  • keys (list, default is None) – If only a sub-set of columns need to match, provide them here as list of strings. If None, all columns (except filter_on) will be used.

Returns

df

Return type

DataFrame

property keys

The keys with which the tops and/or the maps were given at input

property maps

The dictionaries mapping residue indices of the tops to consensus labels.

These maps were either given at input or created on-the-fly with the provided LabelerConsensus

property residxs

The DataFrame containing the alignment based on consensus labels

Indices are zero-based residue indices of the respective tops.

Rows are currently sorted alphabetically by consensus label, which works well for GPCR but less so for CGN and KLIFS (this will change soon).

Contains NaNs where a residue wasn’t found, i.e. where a given map didn’t contain that consensus label.

Returns

df

Return type

DataFrame

residxs_match(patterns=None, keys=None) → pandas.core.frame.DataFrame

Filter the self.residxs to the rows where all consensus labels are present.

You can filter by consensus label using patterns and by system using keys.

Parameters
  • patterns (str, default is None) –

    A list in CSV-format of patterns to be matched by the consensus labels. Matches are done using Unix filename pattern matching and allow for exclusion, e.g.

    • "H*,-H8" will include all TMs but not H8

    • "G.S*" will include all beta-sheets

  • keys (list, default is None) – If only a sub-set of columns need to match, provide them here as list of strings. If None, all columns (except filter_on) will be used.
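
The effect of keys can be illustrated with a small row filter: a row survives only if every selected column has a value. This is a sketch of the idea in plain Python under assumed semantics, not the method's actual code; rows_present, the system keys "A", "B", "C" and the indices are hypothetical, and whether the real method also drops the unselected columns is not shown here.

```python
def rows_present(rows, columns, keys=None):
    """Keep rows (dicts) that have a value in every column of keys (default: all columns)."""
    keys = columns if keys is None else keys
    return [r for r in rows if all(r.get(k) is not None for k in keys)]


columns = ["A", "B", "C"]  # hypothetical system keys
rows = [
    {"consensus": "3.50x50", "A": 10, "B": 12, "C": 9},
    {"consensus": "3.51x51", "A": 11, "B": 13, "C": None},
    {"consensus": "3.52x52", "A": None, "B": 14, "C": 11},
]

# All columns must be present
print([r["consensus"] for r in rows_present(rows, columns)])
# Only columns "A" and "B" must be present
print([r["consensus"] for r in rows_present(rows, columns, keys=["A", "B"])])
```

Restricting keys to a subset typically keeps more rows, since a gap in an unselected system no longer disqualifies a position.
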

Returns

df

Return type

DataFrame

property tops

The topologies given at input