mdciao.fragments.get_fragments

mdciao.fragments.get_fragments(top, method='lig_resSeq+', fragment_breaker_fullresname=None, atoms=False, verbose=True, join_fragments=None, maxjump=500, salt=['Na+', 'Cl-', 'Na', 'Cl'], water=True, **kwargs_residues_from_descriptors)

Group residues of a molecular topology into fragments using different methods.

Water and ions get their own fragment by default, except for the methods None, ‘chains’, and any method involving bonds.

Parameters:
  • top (Topology or str) – When str, path to filename

  • method (str, default is ‘lig_resSeq+’) – The method passed will be the basis for creating fragments. Check the following options with the example sequence

    “…-A27,Lig28,K29-…-W40,D45-…-W50,CYSP51,GDP52”

    • ‘resSeq’

      breaks at jumps in resSeq entry:

      […A27,Lig28,K29,…,W40],[D45,…,W50,CYSP51,GDP52]

    • ‘resSeq+’

      breaks only at negative jumps in resSeq:

      […A27,Lig28,K29,…,W40,D45,…,W50,CYSP51,GDP52]

    • ‘bonds’

      breaks when residues are not connected by bonds, ignores resSeq:

      […A27],[Lig28],[K29,…,W40],[D45,…,W50],[CYSP51],[GDP52]

      Notice that because the phosphorylated CYSP51 didn’t get a bond in the topology, it’s considered a ligand.

    • ‘resSeq_bonds’

      breaks at resSeq jumps and at missing bonds

    • ‘lig_resSeq+’

      Like ‘resSeq+’, but puts any non-AA residue into its own fragment:

      […A27],[Lig28],[K29,…,W40],[D45,…,W50,CYSP51],[GDP52]

      See also maxjump.

    • ‘chains’

      breaks into chains of the PDB file/entry

    • None or ‘None’

      all residues are in one fragment, fragment 0

  • fragment_breaker_fullresname (list) – List of full residue names at which fragments are broken. For example, [GLU30] breaks [R1, R2, …, GLU30, …, R10, R11] into [R1, R2, …], [GLU30, …, R10, R11]

  • atoms (boolean, optional) – Instead of returning residue indices, return atom indices

  • join_fragments (list of lists) – After getting the fragments with method, join these fragments again. This is meant for hard cases where no method gets it right and some post-processing is needed. Duplicate entries in any inner list will be removed. A fragment index cannot appear in more than one inner list; otherwise an exception is raised

  • verbose (boolean, optional) – Be verbose

  • salt (list, default is [“Na+”, “Cl-”, “Na”, “Cl”]) – Residues that match these residue names and have only one atom will be put together in the last fragment. Use salt = [] to deactivate. Doesn’t apply to the methods None, ‘chains’, or any method involving bonds

  • water (bool, default is True) – Put water in its own fragment. Doesn’t apply to the methods None, ‘chains’, or any method involving bonds

  • maxjump (int or None, default is 500) – The maximum allowed positive sequence jump in the ‘resSeq+’ methods, i.e. don’t join ALA500 with GLU551 even though the jump in sequence is positive. None means no limit for positive jumps

  • kwargs_residues_from_descriptors (optional) – additional arguments, see residues_from_descriptors
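
The difference between ‘resSeq’, ‘resSeq+’ and the effect of maxjump can be illustrated with a small, self-contained sketch. This is not mdciao's implementation: the helper name split_on_resseq_jumps and its only_negative argument are hypothetical, and the sketch works on a plain list of resSeq integers instead of a topology. It assumes that ‘resSeq’ breaks whenever consecutive entries do not differ by exactly 1, and that ‘resSeq+’ breaks at negative jumps or at positive jumps larger than maxjump.

```python
from typing import List, Optional


def split_on_resseq_jumps(resseqs: List[int],
                          only_negative: bool = False,
                          maxjump: Optional[int] = None) -> List[List[int]]:
    """Group residue indices into fragments by looking at resSeq jumps.

    Hypothetical helper for illustration only (not part of mdciao).
    only_negative=False mimics 'resSeq', only_negative=True mimics 'resSeq+'.
    """
    fragments: List[List[int]] = []
    for idx, resseq in enumerate(resseqs):
        if idx == 0:
            fragments.append([idx])
            continue
        jump = resseq - resseqs[idx - 1]
        if only_negative:
            # Assumption: positive jumps are tolerated up to maxjump
            too_far = maxjump is not None and jump > maxjump
            starts_new = jump < 0 or too_far
        else:
            # Assumption: any jump other than +1 starts a new fragment
            starts_new = jump != 1
        if starts_new:
            fragments.append([idx])
        else:
            fragments[-1].append(idx)
    return fragments


# resSeq values of the example sequence: 27..40, then 45..52
example = list(range(27, 41)) + list(range(45, 53))
strict = split_on_resseq_jumps(example)                      # two fragments
lenient = split_on_resseq_jumps(example, only_negative=True,
                                maxjump=500)                 # one fragment
```

With a hypothetical maxjump of 50, the resSeq values [499, 500, 551, 552] are split between 500 and 551 because the jump of 51 exceeds the limit, whereas maxjump=None keeps them together.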

Other Parameters:
  • pick_this_fragment_by_default (None or integer) – Pick this fragment without asking in case of ambiguity. If None, the user will be prompted

  • fragment_names – list of strings providing informative names for the input fragments

  • additional_resnaming_dicts (dict of dicts, default is None) – Dictionary of dictionaries. Lower-level dicts are keyed with residue indices and valued with additional residue names. Higher-level keys can be anything. A use case is, e.g., when “R131” needs to be disambiguated because it appears in many fragments. You can pass {“GPCR”: {895: “3.50”, …}} here and that label will be displayed next to the residue. mdciao.cli methods use this.

  • just_inform (bool, default is False) – Just inform about the AAs, don’t ask for a selection

  • extra_string_info (str) – String with any additional info to be printed in case of ambiguity

Returns:

Each array in the list holds the residue indices (or the atom indices, if atoms is True) of one fragment. The fragments do not overlap, and their union contains all indices

Return type:

List of integer arrays
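
The rules stated for join_fragments and for the returned list can be sketched in plain Python. This is an illustration under stated assumptions, not mdciao's code: join_fragment_groups and is_partition are hypothetical helpers, fragments are plain lists rather than integer arrays, the exception type (ValueError) is a guess, and placing the merged fragment at the position of its lowest-numbered member is an arbitrary choice made for this sketch.

```python
from typing import List, Sequence


def join_fragment_groups(fragments: Sequence[Sequence[int]],
                         join_fragments: Sequence[Sequence[int]]) -> List[List[int]]:
    """Merge the fragments listed together in each inner list of join_fragments.

    Hypothetical helper for illustration only (not part of mdciao).
    """
    seen = set()
    groups = []
    for inner in join_fragments:
        unique = sorted(set(inner))  # duplicates within an inner list are dropped
        overlap = seen.intersection(unique)
        if overlap:
            # A fragment index may appear in only one inner list
            raise ValueError(f"fragment indices {sorted(overlap)} appear more than once")
        seen.update(unique)
        groups.append(unique)

    joined: List[List[int]] = []
    for frag_idx, frag in enumerate(fragments):
        if frag_idx not in seen:
            joined.append(list(frag))
            continue
        group = next(g for g in groups if frag_idx in g)
        if frag_idx == group[0]:
            # Assumption: the merged fragment sits where its first member was
            joined.append(sorted(i for member in group for i in fragments[member]))
    return joined


def is_partition(fragments: Sequence[Sequence[int]], n_items: int) -> bool:
    """Check that fragments do not overlap and together cover 0..n_items-1."""
    flat = [i for frag in fragments for i in frag]
    return len(flat) == len(set(flat)) and set(flat) == set(range(n_items))


fragments = [[0, 1, 2], [3], [4, 5], [6]]
joined = join_fragment_groups(fragments, [[0, 2, 2]])  # the repeated 2 is ignored
```

Here fragments 0 and 2 are merged while fragments 1 and 3 are left untouched, and the result still satisfies the no-overlap and full-coverage properties described under Returns.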