mdciao.utils.COM

Functions related to Center-Of-Mass (COM) computations

Functions

geom2COMdist(geom, residue_pairs[, ...])

Returns the time-trace of the distances between pairs of residues' centers-of-mass (COM)

geom2COMxyz(igeom[, residue_idxs])

Returns the time-trace of per-residue center-of-masses (COMs)

geom2max_residue_radius(geom[, ...])

Per-residue maximum radius, i.e. the maximum distance between any atom of the residue and the residue's center of mass.

mdciao.utils.COM.geom2COMdist(geom, residue_pairs, subtract_max_radii=False, low_mem=True, periodic=True, per_residue_unwrap=True) ndarray

Returns the time-trace of the distances between pairs of residues’ centers-of-mass (COM)

The option subtract_max_radii can be used to produce a time-dependent lower bound on the distance between any atoms of each residue pair, i.e. a lower bound on the pairwise-residue distance. This lower bound can then be used to discard any contact between some pairs residues in any frame of geom, without having to compute all pairwise atom distances between the residue_pairs.

Please see below the periodic and per_residue_unwrap for how periodic-boundary-conditions affect this calculation.

Parameters:
  • geom (mdtraj.Trajectory)

  • residue_pairs (iterable of pairs of integers) – zero-indexed pairs of residues

  • subtract_max_radii (bool, default is False) – Subtract the sum of maximum radii (as computed by geom2max_residue_radius) of both residues of each residue pair from the residue-residue COM distance, effectively producing a lower bound for the residue-residue mindist. Please note that this option can produce negative values, meaning the (very normal situation) of two residue COMs being closer to each than the sum their max-radii.

  • low_mem (bool, default is True) – Try to use less memory by using each residue’s maximum radius for all frames (instead of a frame-dependent value) when subtracting pairs of radii from the COM distances. This results in an even lower value for lower bound of the COM-distances, which is itself still a lower-bond. For the purposes of cutoff-thresholding, this “overshooting” has little consequence, see the note below for a benchmark case.

  • periodic (bool, default is True) – Compute COM distances under the minimum image convention. Please see the warning in COMs_xyz wrt residues being split across periodic boundaries into adjacent periodic images and the impact on COM computation.

  • per_residue_unwrap (bool, default is True) – Unwrap residues individually for the purpose of computing the residue COM. This is to deal with the warning in geom2COMxyz. The original coordinates in geom remain unchanged. Since periodic boundary conditions are necessarily used when unwrapping, and the unwrapping doesn’t make the entire geom whole, but only residues, the method will fail if per_residue_unwrap is True but periodic is False, alerting the user of what they are trying to do. See the note below for information on the overhead induced by the per_residue_unwrap.

Note

In a benchmark case of 6K frames and ~100K residue_pairs, after using subtract_max_radii=True, we checked for residue_pairs with a lower-bound for their residue-residue distance smaller than a given cutoff, deeming them potential contacts. As expected, thresholding the values obtained with low_mem=True resulted in longer lists of potential contacts (between 20%-30% longer depending on the cutoff), and accordingly longer computation times when calling mdtraj.compute_contacts on the kept residue_pairs. However, those “longer” lists (and times) are not significant at all when expressed as a percentage of the initial residue_pairs:

  • Using a cutoff of 1 Angstrom when thresholding the mindist values obtained with low_mem=False discarded 99.46% of the original residue_pairs, whereas using the low_mem=True values meant discarding “only” 99.29%.

  • Using a cutoff of 6 Angstrom when thresholding the mindist values obtained with low_mem=False discarded 98.3% of the original residue_pairs, whereas using the low_mem=True values meant discarding “only” 98%.

So there is still a huge reduction in the number of potential contact-pairs when thresholding, from ca 100K residue pairs to between 500 and 2K. For this benchmark system, low_mem=True used ~10GBs of memory and low_mem=False used ~15GB. There’s a table included as a comment in the source code of the method showing the benchmark numbers.

Note

Just how much overhead the per-residue unwrapping adds is hard to know a-priori. In our benchmark system, it can be between 5% or 25% of the overall computation time. It simply depends on how many residues are split across PBCs in how many frames, which depends highly on what post-processing steps the trajectory has undergone. * The 5% overhead is when no residues have to be unwrapped because the pre-processing

has taken care of this, e.g. with gmx trjconv -pbc whole. The 5% is just the time used to check whether the residues need to be unwrapped or not.

  • The 25% overhead is for a plausible worst-case, when the split across PBCs

affects a high number of residues. The system is effectively halved at the plane where most residues are affected simultaneously by the split.

  • For the highly-constructed, and physically inplausible

situation where all are residues need to be unwrapped, the overhead is 70%. This is very unlikely, because it implies that all residues have some atoms just a few Angstrom away from the boundary

In essence, you can always leave per_residue_unwrap=True on unless significant slowdown is noticed. Even better, if you notice a significant slowdown, why not pre-process your trajectory to be whole and centered in the box (which is sometimes called unwrapping), and then there’s no need to use per_residue_unwrap. Note, even in that case a decision needs to be made whether to use PBCs when computing distances between residue COMs

Returns:

COMs_array – contains the time-traces of the distance between residue COMs, optionally minus the sum of each pair of residue radii. Please note that this option can produce negative values, meaning the residue-spheres actually intersect at some point.

Return type:

np.ndarray of shape(geom.n_frames, len(res_pairs))

mdciao.utils.COM.geom2COMxyz(igeom, residue_idxs=None)

Returns the time-trace of per-residue center-of-masses (COMs)

Warning

In some cases, molecules may have residues split across the periodic boundaries, i.e. all atoms are within the periodic unit cell, but atoms of the same residue are scattered in different places inside the cell, near the walls. Then, the residue COM might fall between these fragments, far away from any of the residues’ atoms. In that case, the residue COM is not a useful measure for the residue’s approximate position.

Parameters:
  • igeom (mdtraj.Trajectory)

  • residue_idxs (iterable, default is None) – Residues for which the center of mass will be computed. Default is to compute all residues.

Returns:

COMs

Return type:

numpy.ndarray of shape (igeom.n_frames, len(residue_idxs),3)

mdciao.utils.COM.geom2max_residue_radius(geom, residue_idxs=None, res_COMs=None) ndarray

Per-residue maximum radius, i.e. the maximum distance between any atom of the residue and the residue’s center of mass

Warning

No periodic-boundary-conditions are taken into account, i.e. residues are assumed “whole” or “unwrapped”, and then the mass-weighted average of a residues’s atoms’ cartesian coordinates are computed. If your residues are not “whole”, i.e. atoms o the same residue, then the residue COM might be meaningless (check the warning in geom2COMxyz) and so will be the maximum residue radius.

Parameters:
  • geom (mdtraj.Trajectory)

  • residue_idxs (iterable of ints, default is None) – The indices of the residues for which the residue radius will be computed. If None, all residues will be considered. If residue_idxs is None, then res_COM has to be None as well, else you would be providing res_COMs any information on what residues they belong to.

  • res_COMs (np.ndarray, default is None) – The time-traces of the xyz coordinates of the residue COMs. Has to have shape(geom.n_frames, len(residue_idxs), 3). It will be computed on the fly if None is provided.

Returns:

r – Shape (igeom.n_frames, len(residue_idxs))

Return type:

numpy.ndarray