mdciao.utils.lists

Miscellaneous operations on list or list-like objects

Functions

assert_min_len(input_iterable[, min_len])

Checks if an iterable satisfies the criteria of minimum length (default minimum length is 2).

assert_no_intersection(list_of_lists_of_integers)

Assert that no two lists contain the same integer(s)

contiguous_ranges(list_in)

For every unique entry in list_in return the contiguous ranges in list

does_not_contain_strings(iterable)

Checks if iterable has any string element, returns False if it contains at least one string

exclude_same_fragments_from_residx_pairlist(...)

If the members of the pair belong to the same fragment, exclude them from pairlist.

find_parent_list(sublists, parent_lists)

For each sublist, return the index of the parent list

force_iterable(var)

Forces var to be iterable, if not already

hash_list(ilist)

Try to hash all the objects of a list (regardless of type) into one hash

idx_at_fraction(val_desc_order, frac)

Index of val_desc_order where np.cumsum(val)/np.sum(val)>= frac for the first time

in_what_N_fragments(idxs, fragment_list)

For each element of idxs, return the index of "fragments" in which it appears

in_what_fragment(residx, ...[, fragment_names])

For the residue index, returns the name (if provided) or the index of the "fragment" in which it appears

is_iterable(var)

Checks if the input is an iterable or not

join_lists(lists, idxs_of_lists_to_join)

Provided a list of lists, join them following idxs_of_lists_to_join

put_this_idx_first_in_pair(idx, pair)

Returns the original pair if the value already appears first, else returns reversed pair

rangeexpand(txt)

For a given integer range or multiple integer ranges, returns a list of individual integers.

re_warp(array_in, lengths)

Return iterable array_in as a list of arrays, each one with the length specified in lengths

remove_from_lists(list_of_lists, remove_these)

Wraps safely around numpy.setdiff1d not returning empty lists

unique_list_of_iterables_by_tuple_hashing(ilist)

Returns the unique entries (if there are duplicates) from a list of iterables.

unique_product_w_intersection(a1, a2)

Fast way to create the product of two intersecting sets without repeated/unwanted pairs

window_average_fast(input_array_y[, ...])

Returns the moving average using numpy.convolve

mdciao.utils.lists.assert_min_len(input_iterable, min_len=2)

Checks if an iterable satisfies the criteria of minimum length. (Default minimum length is 2).

Parameters:
  • input_iterable (numpy array, list of list) – example np.zeros((2,1,1)) or [[1,2],[3,4]] when min_len = 2

  • min_len (int) – minimum length which the iterable should satisfy (Default is 2)

Return type:

Prints an error if an item within the iterable has fewer elements than min_len

mdciao.utils.lists.assert_no_intersection(list_of_lists_of_integers, word='iterables')

Assert that no two lists contain the same integer(s)

Parameters:

list_of_lists_of_integers (list of lists) – Empty lists are considered not intersecting and won’t raise AssertionError, though this is an interesting read: https://www.coopertoons.com/education/emptyclass_intersection/emptyclass_union_intersection.html

Return type:

Raises AssertionError if inner lists have the same integer, else no output
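The check described above can be illustrated with a minimal pure-Python sketch. It is not mdciao's implementation: the name check_no_intersection is hypothetical, and the way the word argument is used in the error message is an assumption.

```python
from itertools import combinations


def check_no_intersection(list_of_lists, word="iterables"):
    """Raise AssertionError if any two inner lists share an element."""
    for (ii, l1), (jj, l2) in combinations(enumerate(list_of_lists), 2):
        shared = set(l1) & set(l2)
        # Empty lists never share anything, so they never trigger the assertion
        assert not shared, f"{word} {ii} and {jj} intersect at {sorted(shared)}"


check_no_intersection([[1, 2], [3, 4], []])  # passes silently

try:
    check_no_intersection([[1, 2], [2, 3]])
except AssertionError as err:
    print(err)
```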

mdciao.utils.lists.contiguous_ranges(list_in)

For every unique entry in list_in return the contiguous ranges in list

Parameters:

list_in (list)

Returns:

ranges – The keys are the unique entries of list_in, the values are the ranges in which each entry appears

Return type:

dict
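A minimal sketch of the idea, assuming that each range is reported as the list of consecutive positions at which an entry appears. The exact format of the values returned by mdciao is not stated above, so this representation and the name contiguous_ranges_sketch are assumptions.

```python
def contiguous_ranges_sketch(list_in):
    """Map each unique entry to the runs of consecutive positions it occupies."""
    ranges = {}
    for pos, value in enumerate(list_in):
        runs = ranges.setdefault(value, [])
        if pos > 0 and list_in[pos - 1] == value:
            runs[-1].append(pos)  # the previous entry was the same: extend the run
        else:
            runs.append([pos])    # otherwise start a new run
    return ranges


print(contiguous_ranges_sketch(["a", "a", "b", "a"]))
# {'a': [[0, 1], [3]], 'b': [[2]]}
```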

mdciao.utils.lists.does_not_contain_strings(iterable)

Checks if iterable has any string element, returns False if it contains at least one string

Parameters:

iterable (integer, float, string or any combination thereof)

Returns:

True if iterable does not contain any string, else False

Return type:

boolean

mdciao.utils.lists.exclude_same_fragments_from_residx_pairlist(pairlist, fragments, return_excluded_idxs=False)

If the members of the pair belong to the same fragment, exclude them from pairlist.

Parameters:
  • pairlist (list of iterables) – each iterable within the list should be a pair.

  • fragments (list of iterables) – each inner list should have residue indexes that form a fragment

  • return_excluded_idxs (boolean) – True if index of excluded pair is needed as an output. (Default is False).

Returns:

pairs that don’t belong to the same fragment, or index of the excluded pairs if return_excluded_idxs is True

Return type:

list
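The filtering can be sketched in pure Python as follows. This is an illustration under the assumption that every residue index belongs to exactly one fragment; the name exclude_same_fragment_pairs is hypothetical and the code is not taken from mdciao.

```python
def exclude_same_fragment_pairs(pairlist, fragments, return_excluded_idxs=False):
    """Drop pairs whose two members sit in the same fragment."""
    # Look-up table: residue index -> index of the fragment that contains it
    frag_of = {res: ii for ii, frag in enumerate(fragments) for res in frag}
    kept, excluded = [], []
    for ii, (r1, r2) in enumerate(pairlist):
        if frag_of[r1] == frag_of[r2]:
            excluded.append(ii)
        else:
            kept.append([r1, r2])
    return excluded if return_excluded_idxs else kept


pairs = [[0, 1], [0, 3], [2, 3]]
fragments = [[0, 1], [2, 3]]
print(exclude_same_fragment_pairs(pairs, fragments))        # [[0, 3]]
print(exclude_same_fragment_pairs(pairs, fragments, True))  # [0, 2]
```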

mdciao.utils.lists.find_parent_list(sublists, parent_lists)

For each sublist, return the index of the parent list

Parameters:
  • sublists (list of iterables)

  • parent_lists (list of iterables)

Returns:

  • parents_by_child (list) – A list of len(sublists) with indices indicating which element of parent_lists each sublist is a subset of. If a sublist doesn’t have a parent, its parent is None

  • child_by_parent (dict) – A dictionary keyed by parent idx and valued with idxs of their children
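A small sketch of how the two return values relate to each other. It assumes that the first matching parent wins and that parents without children get no key in the dictionary; neither detail is stated above, and find_parents is a hypothetical name.

```python
def find_parents(sublists, parent_lists):
    """Return (parents_by_child, child_by_parent) using subset tests."""
    parents_by_child = []
    child_by_parent = {}
    for child_idx, sub in enumerate(sublists):
        parent = None
        for parent_idx, par in enumerate(parent_lists):
            if set(sub) <= set(par):  # sub is a subset of par
                parent = parent_idx
                child_by_parent.setdefault(parent_idx, []).append(child_idx)
                break
        parents_by_child.append(parent)  # stays None for orphans
    return parents_by_child, child_by_parent


print(find_parents([[1, 2], [5], [9]], [[0, 1, 2, 3], [4, 5, 6]]))
# ([0, 1, None], {0: [0], 1: [1]})
```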

mdciao.utils.lists.force_iterable(var)

Forces var to be iterable, if not already

Parameters:

var (integer, float, string, list)

Returns:

var as iterable

Return type:

iterable

mdciao.utils.lists.hash_list(ilist)

Try to hash all the objects of a list (regardless of type) into one hash

Parameters:
  • ilist (anything)

Returns:

hashed object

mdciao.utils.lists.idx_at_fraction(val_desc_order, frac)

Index of val_desc_order where np.cumsum(val)/np.sum(val)>= frac for the first time

Parameters:
  • val_desc_order (array like of floats) – The values that determine the sum of which a fraction will be taken. They have to be in descending order

  • frac (float) – The target fraction of sum(val) that is needed

Returns:

n – Index of val where the fraction is attained for the first time. For the number of entries of val needed to attain the fraction, just use n+1

Return type:

int
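The criterion np.cumsum(val)/np.sum(val) >= frac can be reproduced with the standard library alone. The sketch below is an illustration with a hypothetical name, not the library code; raising ValueError when the fraction is never reached is an assumption.

```python
from itertools import accumulate


def idx_at_fraction_sketch(val_desc_order, frac):
    """First index n at which the running sum reaches frac of the total."""
    total = sum(val_desc_order)
    for n, running in enumerate(accumulate(val_desc_order)):
        if running / total >= frac:
            return n
    raise ValueError("frac is larger than 1, the fraction is never attained")


vals = [4, 3, 2, 1]  # cumulative fractions: 0.4, 0.7, 0.9, 1.0
print(idx_at_fraction_sketch(vals, 0.5))  # 1, i.e. the first n+1 = 2 entries are needed
print(idx_at_fraction_sketch(vals, 0.8))  # 2
```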

mdciao.utils.lists.in_what_N_fragments(idxs, fragment_list)

For each element of idxs, return the index of “fragments” in which it appears

Parameters:
  • idxs (integer, float, or iterable thereof)

  • fragment_list (iterable of iterables) – iterable of iterables containing integers or floats

Returns:

list of length len(idxs) containing an iterable with the indices of ‘fragments’ in which that index appears

Return type:

list
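Because fragments are allowed to overlap here, each index can map to several fragments or to none. A minimal sketch with a hypothetical name, assuming a scalar input is simply wrapped into a one-element list:

```python
def in_what_n_fragments_sketch(idxs, fragment_list):
    """For each index, list the positions of all fragments that contain it."""
    if not isinstance(idxs, (list, tuple)):
        idxs = [idxs]  # assumption: scalars are treated as one-element lists
    return [[ii for ii, frag in enumerate(fragment_list) if idx in frag]
            for idx in idxs]


fragments = [[0, 1], [1, 2, 3], [3]]
print(in_what_n_fragments_sketch([1, 3, 9], fragments))  # [[0, 1], [1, 2], []]
print(in_what_n_fragments_sketch(0, fragments))          # [[0]]
```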

mdciao.utils.lists.in_what_fragment(residx, list_of_nonoverlapping_lists_of_residxs, fragment_names=None)

For the residue index, returns the name (if provided) or the index of the “fragment” in which it appears

Parameters:
  • residx (int) – residue index

  • list_of_nonoverlapping_lists_of_residxs (list) – list of integer list of non overlapping ids

  • fragment_names ((optional) list of strings) – fragment names for each list in list_of_nonoverlapping_lists_of_residxs

Returns:

the name (if fragment_names is provided), otherwise the index of the “fragment” in which the residue index appears

Return type:

integer or string
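For non-overlapping fragments the look-up reduces to finding the single fragment that contains the residue. The sketch below uses a hypothetical name; returning None for a residue found in no fragment is an assumption, since that case is not described above.

```python
def in_what_fragment_sketch(residx, fragments, fragment_names=None):
    """Index (or name, if names are given) of the fragment holding residx."""
    for ii, frag in enumerate(fragments):
        if residx in frag:
            return ii if fragment_names is None else fragment_names[ii]
    return None  # assumption: residue not found in any fragment


fragments = [[0, 1, 2], [3, 4]]
print(in_what_fragment_sketch(3, fragments))                  # 1
print(in_what_fragment_sketch(3, fragments, ["TM1", "TM2"]))  # TM2
```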

mdciao.utils.lists.is_iterable(var)

Checks if the input is an iterable or not

Parameters:

var (integer, float, string, list)

Returns:

Returns ‘True’ if var is iterable else False

Return type:

boolean

mdciao.utils.lists.join_lists(lists, idxs_of_lists_to_join)

Provided a list of lists, join them following idxs_of_lists_to_join

Parameters:
  • lists (iterable of iterables) – The lists to be joined

  • idxs_of_lists_to_join (iterable of iterables containing integers) –

    The lists to join. These 3 things will be done before using this array
    • remove duplicate entries in each iterable

    • sort the entries in each iterable by ascending order

    • assert there is no overlap between iterables

Returns:

joined_lists – lists joined following the criterion of idxs_of_lists_to_join. Once the new iterables have been created by joining the initial iterables, they will be re-ordered by ascending first element

Return type:

iterable of iterables
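The three preparation steps and the final re-ordering can be sketched as below. This is an illustration only: the name is hypothetical, the input lists are assumed to be non-empty, and keeping the lists that no group mentions is an assumption.

```python
def join_lists_sketch(lists, idxs_of_lists_to_join):
    """Join the lists named in each group, then order by first element."""
    # Steps 1 and 2: remove duplicates and sort the indices of each group
    groups = [sorted(set(group)) for group in idxs_of_lists_to_join]
    # Step 3: no index may appear in more than one group
    flat = [ii for group in groups for ii in group]
    assert len(flat) == len(set(flat)), "groups to join overlap"
    joined = [[x for ii in group for x in lists[ii]] for group in groups]
    # Assumption: lists not named in any group are kept unchanged
    untouched = [list(l) for ii, l in enumerate(lists) if ii not in flat]
    # Re-order the result by ascending first element
    return sorted(joined + untouched, key=lambda l: l[0])


print(join_lists_sketch([[0, 1], [2, 3], [4, 5], [6]], [[2, 0, 0]]))
# [[0, 1, 4, 5], [2, 3], [6]]
```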

mdciao.utils.lists.put_this_idx_first_in_pair(idx, pair)

Returns the original pair if the value already appears first, else returns reversed pair

Parameters:
  • idx – value which needs to be brought in the first place (not the index but value itself)

  • pair (list) – pair of values as a list

Return type:

pair
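A short sketch of the behaviour with a hypothetical name. What happens when the value is absent from the pair is not stated above; raising ValueError in that case is an assumption.

```python
def put_first_sketch(idx, pair):
    """Return pair with the value idx in first position."""
    if pair[0] == idx:
        return pair        # already first: return the original pair
    if pair[1] == idx:
        return pair[::-1]  # otherwise return the reversed pair
    raise ValueError(f"{idx} is not in {pair}")  # assumption, see lead-in


print(put_first_sketch(3, [1, 3]))  # [3, 1]
print(put_first_sketch(1, [1, 3]))  # [1, 3]
```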

mdciao.utils.lists.rangeexpand(txt)

For a given integer range or multiple integer ranges, returns a list of individual integers. Example: “1-2,3-4” will return [1,2,3,4]

Parameters:

txt (string) – string of integers or integer range separated by “,”

Returns:

list of integers

Return type:

list
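The expansion can be sketched with plain string handling. The version below uses a hypothetical name, only handles non-negative integers, and is not necessarily how mdciao parses the text.

```python
def rangeexpand_sketch(txt):
    """Expand a string such as '1-2,3-4' into the list of its integers."""
    out = []
    for token in txt.split(","):
        token = token.strip()
        if "-" in token:
            start, stop = token.split("-")
            out.extend(range(int(start), int(stop) + 1))  # ranges are inclusive
        else:
            out.append(int(token))
    return out


print(rangeexpand_sketch("1-2,3-4"))  # [1, 2, 3, 4]
print(rangeexpand_sketch("1,3-5"))    # [1, 3, 4, 5]
```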

mdciao.utils.lists.re_warp(array_in, lengths)

Return iterable array_in as a list of arrays, each one with the length specified in lengths

Parameters:
  • array_in (any iterable) – Iterable to be re_warped

  • lengths (int or iterable of integers) – Lengths of the individual elements of the returned array. If only one int is passed, all lengths will be that int. Special cases:

    • more lengths than needed are passed: the last elements of the returned value are empty until all lengths have been used

    • fewer lengths than array_in could take: only the lengths specified are returned in the warped list, the rest is not returned

Returns:

warped

Return type:

list
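The two special cases are easiest to see with slicing. The sketch below has a hypothetical name and returns plain lists rather than arrays; for a single int it assumes that as many chunks as needed to cover array_in are produced.

```python
def re_warp_sketch(array_in, lengths):
    """Cut array_in into consecutive chunks of the given lengths."""
    items = list(array_in)
    if isinstance(lengths, int):
        # Assumption: enough chunks of this size to cover all items
        n_chunks = -(-len(items) // lengths)  # ceiling division
        lengths = [lengths] * n_chunks
    warped, start = [], 0
    for length in lengths:
        warped.append(items[start:start + length])  # slicing past the end gives []
        start += length
    return warped


print(re_warp_sketch(range(6), [2, 4]))     # [[0, 1], [2, 3, 4, 5]]
print(re_warp_sketch(range(3), [2, 2, 2]))  # [[0, 1], [2], []]
print(re_warp_sketch(range(6), [2, 2]))     # [[0, 1], [2, 3]]
```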

mdciao.utils.lists.remove_from_lists(list_of_lists, remove_these)

Wraps safely around numpy.setdiff1d not returning empty lists

Parameters:
  • list_of_lists (iterable of iterables)

  • remove_these (iterable)

Returns:

clean_list

Return type:

list
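A standard-library sketch of the same idea, with a hypothetical name. Like numpy.setdiff1d, it returns the sorted unique values of each list after removal, and lists that end up empty are left out of the result.

```python
def remove_from_lists_sketch(list_of_lists, remove_these):
    """Remove values from every inner list and drop lists that become empty."""
    remove = set(remove_these)
    clean_list = []
    for inner in list_of_lists:
        rest = sorted(set(inner) - remove)  # sorted unique values, as setdiff1d gives
        if rest:
            clean_list.append(rest)
    return clean_list


print(remove_from_lists_sketch([[1, 2, 3], [4], [5, 6]], [4, 5]))
# [[1, 2, 3], [6]]
```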

mdciao.utils.lists.unique_list_of_iterables_by_tuple_hashing(ilist, return_idxs=False, ignore_order=False)

Returns the unique entries (if there are duplicates) from a list of iterables.

Default is to take order into account, i.e. [[0,1],[1,0]] are considered different iterables

If ilist contains non-iterables, they will be turned into iterables, s.t. 1==[1]==np.array(1) and ‘A’==[‘A’]. They will also be returned as iterables

Parameters:
  • ilist (list of iterables) – list of iterables with redundant entries (redundant in the list, not in entries)

  • return_idxs (boolean) – ‘True’ if required to return indices instead of unique list. (Default is False).

  • ignore_order (bool, default is False) – ignore order, s.t. [0,1] and [1,0] are considered equal. Only the first instance ([0,1]) is kept

Returns:

result – list of unique iterables or indices of ‘ilist’ where the unique entries are

Return type:

list
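The effect of return_idxs and ignore_order can be illustrated with a set of tuples. The code below is a simplified sketch with a hypothetical name that only handles lists, tuples and scalars; it is not the library implementation.

```python
def unique_by_tuple_hashing_sketch(ilist, return_idxs=False, ignore_order=False):
    """Keep the first occurrence of each entry, comparing entries as tuples."""
    seen, uniques, idxs = set(), [], []
    for ii, item in enumerate(ilist):
        # Non-iterables are turned into one-element lists, so 1 == [1]
        as_list = list(item) if isinstance(item, (list, tuple)) else [item]
        key = tuple(sorted(as_list)) if ignore_order else tuple(as_list)
        if key not in seen:
            seen.add(key)
            uniques.append(as_list)
            idxs.append(ii)
    return idxs if return_idxs else uniques


data = [[0, 1], [1, 0], [0, 1], 2, [2]]
print(unique_by_tuple_hashing_sketch(data))                     # [[0, 1], [1, 0], [2]]
print(unique_by_tuple_hashing_sketch(data, return_idxs=True))   # [0, 1, 3]
print(unique_by_tuple_hashing_sketch(data, ignore_order=True))  # [[0, 1], [2]]
```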

mdciao.utils.lists.unique_product_w_intersection(a1, a2)

Fast way to create the product of two intersecting sets without repeated/unwanted pairs

Consider that

>>> list(itertools.product([0,1,2,3],[2,3,4,5]))
[(0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5), (3, 2), (3, 3), (3, 4), (3, 5)]

has the repeated/unwanted pairs (2,2), (3,3), (3,2), which need to be taken out a posteriori by comparing pairs.

The unique_list_of_iterables_by_tuple_hashing method also accepts arrays (since pairlists may not necessarily have been generated as tuples, but also as np.arrays), so the arrays need to be cast into tuples before hashing, and there is one comparison per pair (the cost grows quadratically)

>>> a1 = np.arange(200)
>>> a2 = np.arange(195,300)
>>> pairs = np.array(list(itertools.product(a1,a2)))
>>> %timeit mdciao.utils.lists.unique_list_of_iterables_by_tuple_hashing(pairs)
2.83 s ± 170 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Whereas

>>> %timeit mdciao.utils.lists.unique_product_w_intersection(a1,a2)
47 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For reference

>>> %timeit list(itertools.product(a1,a2))
783 µs ± 5.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I.e. clearly, for non-intersecting sets a1 and a2 without unwanted/repeated pairs, it’s always better to use itertools.product directly

Parameters:
  • a1 (iterable) – The integers of the set1

  • a2 (iterable) – The integers of the set2

Returns:

pairlist – The pairlist product of a1 and a2 without self-pairs (ii,ii) and with only (ii,jj) (not also (jj,ii))

Return type:

np.ndarray
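What the returned pairlist contains can be shown with a naive reference version. The sketch below has a hypothetical name, returns a list of tuples instead of an np.ndarray, and makes no attempt to be fast; it only reproduces the filtering rule on the small example from the docstring.

```python
from itertools import product


def unique_product_sketch(a1, a2):
    """Product of a1 and a2 without (ii, ii) and without mirrored repeats."""
    seen, pairs = set(), []
    for ii, jj in product(a1, a2):
        key = frozenset((ii, jj))  # (ii, jj) and (jj, ii) share the same key
        if ii != jj and key not in seen:
            seen.add(key)
            pairs.append((ii, jj))
    return pairs


pairs = unique_product_sketch([0, 1, 2, 3], [2, 3, 4, 5])
print(len(pairs))       # 13: the 16 raw pairs minus (2,2), (3,3) and (3,2)
print((2, 3) in pairs)  # True
print((3, 2) in pairs)  # False
```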

mdciao.utils.lists.window_average_fast(input_array_y, half_window_size=2)

Returns the moving average using numpy.convolve

Parameters:
  • input_array_y (array) – numpy array for which moving average should be calculated

  • half_window_size (int) – the actual window size will be 2 * half_window_size + 1. Example: when half_window_size = 2, the moving average will use a window of 5

Return type:

array
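The window arithmetic can be checked with a pure-Python moving average. The sketch below has a hypothetical name and does not call numpy.convolve; it assumes that only positions with a complete window are averaged, which may differ from how the library treats the array boundaries.

```python
def window_average_sketch(y, half_window_size=2):
    """Moving average over windows of 2 * half_window_size + 1 points."""
    window = 2 * half_window_size + 1
    # Assumption: only full windows are averaged, so the output is shorter than y
    return [sum(y[ii:ii + window]) / window
            for ii in range(len(y) - window + 1)]


y = [1, 2, 3, 4, 5, 6, 7]
print(window_average_sketch(y, half_window_size=1))  # [2.0, 3.0, 4.0, 5.0, 6.0]
print(window_average_sketch(y))                      # [3.0, 4.0, 5.0]
```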