mdciao.fragments.match_fragments

mdciao.fragments.match_fragments(seq0, seq1, frags0=None, frags1=None, probe=None, verbose=False, shortest=3)

Align fragments of seq0 and seq1 pairwise and return a matrix of scores.

The score is the absolute number of matches between two fragments. Depending on how informed the user is about the topologies, their fragments, their similarities, and what the user is trying to do, this absolute measure can be just right or highly misleading, e.g.:

two fragments of ~500 AAs each can score 20 matches “easily”, without this being meaningful

two fragments of 11 AAs each having 10 matches between them are almost identical

however, in absolute terms, the first case has a higher score

If you know what you’re doing, you can specify which of the two sequences is the probe, so that the score is divided by the length of the probe’s fragment. E.g., probe=1 means that you are interested in finding out whether fragments of seq1 appear in fragments of seq0 (the ‘target’ sequence), regardless of how long the target fragments are. The score is then normalized to lie between 0 and 1, where 1 means the entire probe fragment was found in the target fragment, no matter how long the probe or the target were.
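The effect of the normalization can be sketched with plain NumPy, using the two cases from the list above. This is only an illustration of the arithmetic, not mdciao's implementation: the helper normalize_scores is hypothetical, the fragments are made-up index lists, and applying the shortest cutoff to the probe fragments is an assumption based on the parameter description.

```python
import numpy as np

def normalize_scores(abs_scores, frags0, frags1, probe, shortest=3):
    """Hypothetical helper (not part of mdciao): divide absolute match
    counts by the length of the probe's fragments."""
    scores = np.array(abs_scores, dtype=float)
    # probe=0 -> seq0's fragments are the probe, probe=1 -> seq1's
    probe_frags = frags0 if probe == 0 else frags1
    lengths = np.array([len(f) for f in probe_frags], dtype=float)
    # Assumption: probe fragments shorter than `shortest` yield NaN
    lengths[lengths < shortest] = np.nan
    if probe == 0:
        return scores / lengths[:, np.newaxis]  # normalize each row
    return scores / lengths[np.newaxis, :]      # normalize each column

# Made-up fragments: one of 500 residues and one of 11 residues per sequence
frags0 = [list(range(500)), list(range(500, 511))]
frags1 = [list(range(500)), list(range(500, 511))]

# Made-up absolute scores: 20 matches between the long fragments,
# 10 matches between the short ones
abs_scores = [[20, 0],
              [0, 10]]

rel_scores = normalize_scores(abs_scores, frags0, frags1, probe=1)
print(rel_scores)
```

In absolute terms the long pair wins (20 vs. 10 matches), whereas after normalization the short pair scores 10/11 and the long pair only 20/500.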

Parameters:

seq0 (str or Topology)
seq1 (str or Topology)
frags0 (list or None, default is None) – If None, get_fragments will be called with the default options to generate a fragment list.
frags1 (list or None, default is None) – If None, get_fragments will be called with the default options to generate a fragment list.
probe (int, default is None) – If None, scores are absolute numbers. If 0, the scores are divided by seq0’s fragment length. If 1, by seq1’s fragment length. In these cases, the score is always between 0 and 1, regardless of how long the probe and the target fragments are.
shortest (int, default is 3) – Fragments of len < shortest won’t produce a score but a np.NaN, so that the scores don’t get hijacked by very small probe fragments, which will always yield relatively good scores. Absolute scores (probe=None) are not affected by this.
verbose (bool, default is False) – Be verbose. Affects all methods called by this method as well.

Returns:

score (2D np.ndarray of shape(len(frags0),len(frags1))) – Will be between 0 and 1 if a probe is specified
frags0 (list) – The fragments that were either provided or generated on the fly. Their indices are the row-indices of score
frags1 (list) – The fragments that were either provided or generated on the fly. Their indices are the column-indices of score
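
As a minimal sketch of how the returned matrix can be read, the snippet below picks, for each fragment of seq0 (rows), the best-scoring fragment of seq1 (columns). The score matrix here is a hand-made stand-in with invented values, not actual output of match_fragments. np.nanargmax is used so that NaN entries (fragments shorter than shortest) are skipped.

```python
import numpy as np

# Hand-made stand-in for a returned `score` with 2 fragments in frags0 (rows)
# and 3 fragments in frags1 (columns); the values are invented.
# The last column mimics a seq1 fragment shorter than `shortest`.
score = np.array([[0.1, 0.9, np.nan],
                  [0.8, 0.2, np.nan]])

# Column index of the best-matching frags1 entry for every row (frags0 entry).
# Note: np.nanargmax raises a ValueError if a whole row is NaN.
best = np.nanargmax(score, axis=1)

for row_idx, col_idx in enumerate(best):
    print(f"frags0[{row_idx}] best matches frags1[{col_idx}] "
          f"(score {score[row_idx, col_idx]:.2f})")
```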