mdc_neighborhoods.py

Analyse residue neighborhoods using a distance cutoff. Residue-residue contacts are reported in terms of their overall frequencies and time-traces. A number of files containing plots, tables and data will be generated.

usage: mdc_neighborhoods.py [-h] [-r RESIDUES] [--ctc_cutoff_Ang CTC_CUTOFF_ANG] [--serial_idxs] [--stride STRIDE] [-cc CTC_CONTROL] [--n_nearest N_NEAREST]
                            [--chunksize_in_frames CHUNKSIZE_IN_FRAMES] [--n_smooth_hw N_SMOOTH_HW] [--nlist_cutoff_Ang NLIST_CUTOFF_ANG] [-fr FRAGMENTS [FRAGMENTS ...]]
                            [--fragment_names FRAGMENT_NAMES] [-nf] [--fragment_colors FRAGMENT_COLORS] [--no-sort] [--no-pbc] [-tx TABLE_EXT] [-gx GRAPHIC_EXT] [-GPCR GPCR_UNIPROT]
                            [--save_nomenclature] [-GGN CGN_PDB] [-od OUTPUT_DIR] [-o OUTPUT_DESC] [--t_unit T_UNIT] [--curve_color CURVE_COLOR] [--background BACKGROUND]
                            [--graphic_dpi GRAPHIC_DPI] [-sa] [-nsf] [-nt] [-st] [-d] [--n_cols N_COLS] [--n_jobs N_JOBS] [--pop_N_ctcs] [--ylim_Ang YLIM_ANG] [-ni] [-s SWITCH_OFF_ANG]
                            [-at] [--naive_bonds] [--scheme SCHEME]
                            topology trajectories [trajectories ...]

Positional Arguments

topology

Topology file

trajectories

trajectory file(s)
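As an illustration of how the positional and named arguments combine, the following minimal sketch assembles a command line from Python. The file names (prot.pdb, run1.xtc, run2.xtc) and the output directory are hypothetical placeholders; only flags that appear in the usage string above are used.

```python
import shlex

# Hypothetical input files; replace with your own topology and trajectories.
topology = "prot.pdb"
trajectories = ["run1.xtc", "run2.xtc"]

# Only flags listed in the usage string are used here.
cmd = [
    "mdc_neighborhoods.py", topology, *trajectories,
    "-r", "GLU*,LEU394",        # residues of interest
    "--ctc_cutoff_Ang", "4",    # contact cutoff in Angstrom
    "-od", "results",           # output directory (hypothetical name)
]

# shlex.join quotes the wildcard so that the shell does not expand it.
command_line = shlex.join(cmd)
print(command_line)

# To actually run the tool (requires it to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Quoting the value of -r matters on most shells, because an unquoted * could be expanded against file names in the working directory.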

Named Arguments

-r, --residues

The residues of interest, as comma-separated values without spaces. The input is very flexible and accepts mixed descriptors and wildcards, e.g. “GLU*,ARG*,GDP*,LEU394,380-385” is a valid input. Numbers are interpreted as a residue’s sequence number (394 in LEU394), unless --serial_idxs is passed as an option.
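To make the descriptor syntax concrete, here is a minimal sketch of how such a string could be matched against a list of residues. It is not the tool's own parser: the function select_residues and the toy residue list are hypothetical, and the matching rules (shell-style wildcards on the residue label, plain numbers and ranges on the sequence number) are assumptions based on the description above.

```python
from fnmatch import fnmatchcase


def select_residues(descriptor, residues):
    """Return sorted positions in `residues` matched by `descriptor`.

    residues: list of (name, resSeq) tuples, e.g. ("LEU", 394).
    """
    picked = set()
    for token in descriptor.split(","):
        parts = token.split("-")
        for idx, (name, seq) in enumerate(residues):
            if token.isdigit():
                # Plain number: compare against the sequence number.
                hit = seq == int(token)
            elif len(parts) == 2 and all(p.isdigit() for p in parts):
                # Range such as 380-385, both ends included.
                hit = int(parts[0]) <= seq <= int(parts[1])
            else:
                # Name or wildcard such as GLU* or LEU394.
                hit = fnmatchcase(f"{name}{seq}", token)
            if hit:
                picked.add(idx)
    return sorted(picked)


toy = [("GLU", 30), ("ARG", 31), ("LEU", 394),
       ("ALA", 380), ("GLY", 381), ("GDP", 400)]
print(select_residues("GLU*,GDP*,LEU394,380-385", toy))
```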

--ctc_cutoff_Ang, -co

The cutoff distance between two residues for them to be considered in contact. Default is 3.5 Angstrom.

Default: 3.5
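The cutoff turns a distance time-trace into a contact frequency. The sketch below shows the idea on a toy trace. The function name contact_frequency is hypothetical, and treating a distance exactly equal to the cutoff as a contact is an assumption.

```python
def contact_frequency(distances_ang, cutoff_ang=3.5):
    """Fraction of frames in which the distance is within the cutoff."""
    if not distances_ang:
        raise ValueError("need at least one frame")
    # Assumption: a distance equal to the cutoff counts as a contact.
    in_contact = [d <= cutoff_ang for d in distances_ang]
    return sum(in_contact) / len(in_contact)


# Toy residue-residue distances (in Angstrom) over four frames.
trace = [3.0, 3.4, 3.6, 5.0]
print(contact_frequency(trace))       # default cutoff of 3.5 Angstrom
print(contact_frequency(trace, 4.0))  # a more permissive cutoff
```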

--serial_idxs

Interpret the indices of --residues not as their sequence idxs (e.g. 30 for GLU30), but as their serial order in the topology (e.g. 0 for GLU30 if GLU30 is the first residue in the topology). Default is False.

Default: False
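The difference between sequence and serial indices can be illustrated with a toy topology. This is a sketch only; the list toy_topology and the function lookup are hypothetical and do not reflect how the tool stores residues.

```python
# Toy topology: the serial index is the list position (zero-indexed),
# the second tuple entry is the residue's sequence number (resSeq).
toy_topology = [("GLU", 30), ("ARG", 31), ("LEU", 394)]


def lookup(number, topology, serial_idxs=False):
    """Return the residues addressed by `number` under either convention."""
    if serial_idxs:
        return [topology[number]]
    # A sequence number may match several residues (e.g. in several chains).
    return [res for res in topology if res[1] == number]


print(lookup(30, toy_topology))                   # by sequence number
print(lookup(0, toy_topology, serial_idxs=True))  # by serial order
```

Both calls address GLU30 here, while lookup(0, toy_topology) returns an empty list because no residue has sequence number 0.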

--stride

Stride down the input trajectory files by this factor. Default is 1.

Default: 1

-cc, --ctc_control

Control the number of reported contacts. Can be an integer (keep the first n contacts) or a float representing a fraction [0,1] of the total number of contacts. Default is 5.

Default: 5
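The two modes of this option can be sketched as follows. The integer case follows the description directly. For the float case, the sketch assumes that "total number of contacts" means the sum of all contact frequencies, and that contacts are kept until their running sum reaches the requested fraction of it; this reading is an assumption, not confirmed by the text above. The function select_contacts is hypothetical.

```python
def select_contacts(freqs, ctc_control=5):
    """Keep the most frequent contacts according to `ctc_control`."""
    ranked = sorted(freqs, reverse=True)
    if isinstance(ctc_control, int):
        # Integer: keep the first n contacts.
        return ranked[:ctc_control]
    # Float (assumed reading): keep contacts until their summed frequency
    # reaches this fraction of the total summed frequency.
    target = ctc_control * sum(ranked)
    kept, running = [], 0.0
    for freq in ranked:
        if running >= target:
            break
        kept.append(freq)
        running += freq
    return kept


toy_freqs = [0.25, 1.0, 0.5, 0.25]
print(select_contacts(toy_freqs, 3))     # three most frequent contacts
print(select_contacts(toy_freqs, 0.75))  # contacts making up 75% of the sum
```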

--n_nearest, -nn

Ignore this many nearest neighbors when computing neighbor lists. ‘Near’ means ‘connected by this many bonds’. Default is 4.

Default: 4

--chunksize_in_frames

Trajectories are read in chunks of this size. Helps with big files and/or large number of contacts when you run into memory problems. Default is 10000

Default: 10000

--n_smooth_hw, -ns

Number of frames in one half of the averaging window for the time-traces. Default is 0, which means no averaging.

Default: 0
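A half-window of n frames corresponds to a full window of 2n+1 frames centered on each frame. The sketch below shows a plain moving average with that convention. How the tool treats the first and last n frames is not stated above; dropping them, as done here, is an assumption, and the function smooth is hypothetical.

```python
def smooth(trace, n_smooth_hw=0):
    """Centered moving average with a half-window of `n_smooth_hw` frames."""
    if n_smooth_hw == 0:
        return list(trace)  # no averaging
    window = 2 * n_smooth_hw + 1
    # Assumption: frames without a full window on both sides are dropped.
    return [
        sum(trace[i - n_smooth_hw:i + n_smooth_hw + 1]) / window
        for i in range(n_smooth_hw, len(trace) - n_smooth_hw)
    ]


toy_trace = [1.0, 2.0, 3.0, 4.0, 5.0]
print(smooth(toy_trace, n_smooth_hw=1))
```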

--nlist_cutoff_Ang

Cutoff for the initial neighborlist. Only atoms that are within this distance in the original reference (the topology file) are considered potential neighbors of the --residues, so that unnecessary distances (e.g. between the receptor’s N-terminus and the G-protein) are not even computed. Default is 15 Angstrom.

Default: 15

-fr, --fragments

How to sub-divide the topology into fragments. Several options are possible. Taking the example sequence: …-A27,Lig28,K29-…-W40,D45-…-W50,CYSP51,GDP52

  • ‘resSeq’

    breaks at jumps in resSeq entry: […A27,Lig28,K29,…,W40],[D45,…,W50,CYSP51,GDP52]

  • ‘resSeq+’

    breaks only at negative jumps in resSeq: […A27,Lig28,K29,…,W40,D45,…,W50,CYSP51,GDP52]

  • ‘bonds’

    breaks when AAs are not connected by bonds, ignores resSeq: […A27][Lig28],[K29,…,W40],[D45,…,W50],[CYSP51],[GDP52]. Notice that because phosphorylated CYSP51 didn’t get a bond in the topology, it’s considered a ligand.

  • ‘resSeq_bonds’

    breaks both at resSeq jumps and at missing bonds

  • ‘lig_resSeq+’

    like resSeq+, but puts any non-AA residue into its own fragment: […A27][Lig28],[K29,…,W40],[D45,…,W50,CYSP51],[GDP52]

  • ‘chains’

    breaks into chains of the PDB file/entry

  • None or ‘None’

    all residues are in one fragment (fragment 0). Can be harmless or potentially dangerous if residue labels are repeated.

  • ‘consensus’

    if any consensus nomenclature is provided, ask the user for definitions using consensus labels

  • 0-10,15,14 20,21,30-50 51 (example, advanced users only)

    input arbitrary fragments via their residue serial indices (zero-indexed) using space as separator. Not recommended.

If you are unsure of any of these options, use the command line tool mdc_fragments.py on your topology file.

Default: [‘lig_resSeq+’]
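The difference between the ‘resSeq’ and ‘resSeq+’ heuristics can be sketched on a toy list of sequence numbers. This is not the tool's implementation: the function split_by_resSeq is hypothetical, and a "jump" is assumed to be any step other than +1 between consecutive residues.

```python
def split_by_resSeq(resSeqs, only_negative_jumps=False):
    """Split a list of sequence numbers into fragments.

    only_negative_jumps=False mimics 'resSeq' (break at any jump),
    only_negative_jumps=True mimics 'resSeq+' (break only when resSeq drops).
    """
    if not resSeqs:
        return []
    fragments = [[resSeqs[0]]]
    for prev, curr in zip(resSeqs, resSeqs[1:]):
        jump = curr - prev
        # Assumption: any step other than +1 counts as a jump.
        is_break = jump < 0 if only_negative_jumps else jump != 1
        if is_break:
            fragments.append([])
        fragments[-1].append(curr)
    return fragments


# Toy sequence: a gap between 30 and 45, then numbering restarts at 1.
toy_resSeqs = [27, 28, 29, 30, 45, 46, 47, 1, 2]
print(split_by_resSeq(toy_resSeqs))                            # 'resSeq'-like
print(split_by_resSeq(toy_resSeqs, only_negative_jumps=True))  # 'resSeq+'-like
```

The gap between 30 and 45 splits the fragment only in the first call; the restart of the numbering splits it in both.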

--fragment_names

Name of the fragments. Leave empty if you want them automatically named. Otherwise, give a quoted list of strings separated by commas, e.g. ‘TM1, TM2, TM3,’

Default: “”

-nf, --no-fragments

Do not use fragments. Default is to use them

Default: True

--fragment_colors

Comma-separated values of the fragment colors. If only one value is given, that color is used for all fragments. Any matplotlib color can be used. Default is ‘tab:blue’. Why ‘tab’? See https://matplotlib.org/3.1.1/tutorials/colors/colors.html.

Default: “tab:blue”

--no-sort

Don’t sort the residues by their index. Default is to sort them.

Default: True

--no-pbc

Do not consider periodic boundary conditions when computing distances. Default is to consider them

Default: True

-tx, --table_ext

Extension for tabled files (.dat, .txt, .xlsx, .ods). Default is ‘.dat’

Default: “dat”

-gx, --graphic_ext

Extension of the output graphics, default is .pdf

Default: “.pdf”

-GPCR, --GPCR_uniprot

Look for Ballesteros-Weinstein definitions in the GPCRdb using a uniprot code, e.g. adrb2_human. See https://gpcrdb.org/services/ for more details. Default is None.

Default: “None”

--save_nomenclature

Save available nomenclature definitions to disk so that they can be accessed locally in later uses. Default is False

Default: False

-GGN, --CGN_PDB

PDB code for a consensus G-protein nomenclature

Default: “None”

-od, --output_dir

Directory to which the results are written. Default is ‘.’

Default: “.”

-o, --output_desc

Descriptor for output files. Default is neighborhood

Default: “neighborhood”

--t_unit

Unit used for the temporal axis, default is ns.

Default: “ns”

--curve_color

Type of color used for the curves. Default is auto. Alternatives are ‘P’ or ‘H’

Default: “auto”

--background, -bg

Type of background when using smoothing windows. Default (True) is to use the unsmoothed curve’s color. A color string e.g. ‘g’ or ‘red’ or ‘gray’ also works, as does an RGB string ‘0.5, 1., 0.5’. Use False for no color.

Default: True

--graphic_dpi

Dots per Inch (DPI) of the graphic output. Only has an effect for bitmap outputs. Default is 150.

Default: 150

-sa, --short_AAs

Use one-letter amino acid names when possible, e.g. K145 instead of Lys145. Default is False.

Default: False

-nsf, --no-same_fragment

Don’t allow contact partners in the same fragment. Default is to allow it.

Default: True

-nt, --no-time-trace

Don’t plot the time-traces of the contacts. Default is to plot them.

Default: True

-st, --save-trajs

Save trajectory data, default is not to save it.

Default: False

-d, --distribution

Plot distance distributions instead of contact bar plots. Default is False.

Default: False

--n_cols

Number of columns of the overall plot. Default is 4.

Default: 4

--n_jobs

Number of processors to use. The parallelization is done over trajectories and not over contacts, so setting n_jobs > n_trajs will not have any additional effect.

Default: 1

--pop_N_ctcs

Separate the plot with the total number of contacts from the time-trace plot. Default is False.

Default: False

--ylim_Ang

Limit in Angstrom of the y-axis of the time-traces. Default is 10. Switch to any other float or ‘auto’ for automatic scaling

Default: “10”

-ni, -no-interactive

Try not to be interactive. This can make wrong choices for the user, advanced only.

Default: False

-s, --switch_off_Ang

Use a linear switch-off instead of a crisp one. Default is None.
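One common way to define a linear switch-off is sketched below: the contact counts fully up to the cutoff, then its weight decays linearly to zero over the switch-off distance. Whether the tool uses exactly this functional form is an assumption; the function contact_weight is hypothetical.

```python
def contact_weight(d, cutoff=3.5, switch_off=None):
    """Weight of a contact at distance `d` (all values in Angstrom)."""
    if switch_off is None:
        # Crisp cutoff: contact or no contact.
        return 1.0 if d <= cutoff else 0.0
    if d <= cutoff:
        return 1.0
    if d >= cutoff + switch_off:
        return 0.0
    # Assumed form: linear decay from 1 at `cutoff` to 0 at `cutoff + switch_off`.
    return 1.0 - (d - cutoff) / switch_off


print(contact_weight(4.0))                  # crisp cutoff
print(contact_weight(4.0, switch_off=1.0))  # halfway through the switch-off
```

With a crisp cutoff, a distance of 4.0 Angstrom contributes nothing; with a switch-off of 1.0 Angstrom it still contributes half a contact under this assumed form.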

-at, --atomtypes

Add the atom-types to the frequency bars by ‘hatching’ them:

  • ‘--’ is sidechain-sidechain
  • ‘|’ is backbone-backbone
  • ‘\’ is backbone-sidechain
  • ‘/’ is sidechain-backbone

Default is False.

Default: False

--naive_bonds, -nb

Build naive, linear bonds between protein residues of the same fragment if mdtraj can’t build them automatically. For more info check mdciao.utils.bonds.top2residue_bond_matrix_naive

Default: False

--scheme

Type of scheme for computing distance between residues. Choices are {‘ca’, ‘closest’, ‘closest-heavy’, ‘sidechain’, ‘sidechain-heavy’}. See mdtraj documentation for more info

Default: “closest-heavy”