arkouda.alignment

Exceptions

NonUniqueError

Inappropriate argument value (of correct type).

Functions

align(*args)

Map multiple arrays of sparse identifiers to a common 0-up index.

find(query, space[, all_occurrences, remove_missing])

Return indices of query items in a search list of items.

in1d_intervals(vals, intervals[, symmetric])

Test each value for membership in any of a set of half-open (pythonic) intervals.

interval_lookup(keys, values, arguments[, fillvalue, ...])

Apply a function defined over intervals to an array of arguments.

is_cosorted(arrays)

Return True iff the arrays are cosorted, i.e., if the arrays were columns in a table then the rows are sorted.

left_align(left, right)

Map two arrays of sparse identifiers to the 0-up index set implied by the left array, discarding values from right that do not appear in left.

lookup(keys, values, arguments[, fillvalue])

Apply the function defined by the mapping keys --> values to arguments.

right_align(left, right)

Map two arrays of sparse values to the 0-up index set implied by the right array, discarding values from left that do not appear in right.

search_intervals(vals, intervals[, tiebreak, hierarchical])

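Given an array of query vals and non-overlapping, closed intervals, return the index of the best (see tiebreak) interval containing each query value, or -1 if not present in any interval.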

unsqueeze(p)

zero_up(vals)

Map an array of sparse values to 0-up indices.

Module Contents

exception arkouda.alignment.NonUniqueError[source]

Bases: ValueError

Inappropriate argument value (of correct type).

arkouda.alignment.align(*args)[source]

Map multiple arrays of sparse identifiers to a common 0-up index.

Parameters:

*args (pdarrays or sequences of pdarrays) – Arrays to map to dense index

Returns:

aligned – Arrays with values replaced by 0-up indices

Return type:

list of pdarrays
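
The mapping can be pictured with a plain-Python sketch. This is not arkouda code: `align_sketch` is a hypothetical helper operating on lists, and it assumes that dense indices are assigned in sorted order of the distinct values across all inputs.

```python
def align_sketch(*arrays):
    """Replace each value with its rank among the distinct values of all inputs."""
    # Build one shared index over the union of every input's values
    # (assumption: indices follow sorted order of the distinct values).
    distinct = sorted(set().union(*arrays))
    index = {value: i for i, value in enumerate(distinct)}
    return [[index[v] for v in arr] for arr in arrays]


a = [100, 7, 42]
b = [42, 42, 900]
print(align_sketch(a, b))  # [[2, 0, 1], [1, 1, 3]]
```

Because both outputs share one index, equal identifiers in different inputs receive the same dense value (42 maps to 1 in both lists).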

arkouda.alignment.find(query, space, all_occurrences=False, remove_missing=False)[source]

Return indices of query items in a search list of items.

Parameters:
  • query ((sequence of) array-like) – The items to search for. If multiple arrays, each “row” is an item.

  • space ((sequence of) array-like) – The set of items in which to search. Must have same shape/dtype as query.

  • all_occurrences (bool) – Controls how duplicate items in the search space are handled. If True, return every occurrence found as a SegArray; otherwise return only the first occurrence of each item as a pdarray. Defaults to False. Finding all occurrences is not yet supported on sequences of arrays.

  • remove_missing (bool) – If all_occurrences is True, remove_missing is automatically enabled. If False, return -1 for any items in query not found in space. If True, remove these and only return indices of items that are found.

Returns:

indices – For each item in query, its index in space. If all_occurrences is False, the return is a pdarray holding the first index at which each query value appears in space. If all_occurrences is True, the return is a SegArray holding every such index, and remove_missing is enabled automatically. If remove_missing is True, missing values are excluded; otherwise they are reported as -1.

Return type:

pdarray or SegArray

Examples

>>> select_from = ak.arange(10)
>>> arr1 = select_from[ak.randint(0, select_from.size, 20, seed=10)]
>>> arr2 = select_from[ak.randint(0, select_from.size, 20, seed=11)]
# remove some values to ensure we have some values
# which don't appear in the search space
>>> arr2 = arr2[arr2 != 9]
>>> arr2 = arr2[arr2 != 3]

# find with defaults (all_occurrences and remove_missing both False)
>>> ak.find(arr1, arr2)
array([-1 -1 -1 0 1 -1 -1 -1 2 -1 5 -1 8 -1 5 -1 -1 11 5 0])

# set remove_missing to True, only difference from default
# is missing values are excluded
>>> ak.find(arr1, arr2, remove_missing=True)
array([0 1 2 5 8 5 11 5 0])

# set both remove_missing and all_occurrences to True, missing values
# will be empty segments
>>> ak.find(arr1, arr2, remove_missing=True, all_occurrences=True).to_list()
[[], [], [], [0, 4], [1, 3, 10],
 [], [], [], [2, 6, 12, 13], [],
 [5, 7], [], [8, 9, 14], [], [5, 7],
 [], [], [11, 15], [5, 7], [0, 4]]

arkouda.alignment.in1d_intervals(vals, intervals, symmetric=False)[source]

Test each value for membership in any of a set of half-open (pythonic) intervals.

Parameters:
  • vals (pdarray(int, float)) – Values to test for membership in intervals

  • intervals (2-tuple of pdarrays) – Non-overlapping, half-open intervals, as a tuple of (lower_bounds_inclusive, upper_bounds_exclusive)

  • symmetric (bool) – If True, also return boolean pdarray indicating which intervals contained one or more query values.

Returns:

  • pdarray(bool) – Array of same length as <vals>, True if corresponding value is included in any of the half-open ranges [low[i], high[i]).

  • pdarray(bool) (if symmetric=True) – Array of same length as number of intervals, True if corresponding interval contains any of the values in <vals>.

Notes

First return array is equivalent to the following:

((vals >= intervals[0][0]) & (vals < intervals[1][0])) |
((vals >= intervals[0][1]) & (vals < intervals[1][1])) |
...
((vals >= intervals[0][-1]) & (vals < intervals[1][-1]))

But much faster when testing many ranges.

Second (optional) return array is equivalent to:

((intervals[0] <= vals[0]) & (intervals[1] > vals[0])) |
((intervals[0] <= vals[1]) & (intervals[1] > vals[1])) |
...
((intervals[0] <= vals[-1]) & (intervals[1] > vals[-1]))

But much faster when vals is non-trivial size.
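
The two equivalences in the Notes can be written out as a brute-force plain-Python sketch. `in1d_intervals_sketch` is a hypothetical helper on lists; it only models the results and says nothing about how arkouda computes them.

```python
def in1d_intervals_sketch(vals, intervals, symmetric=False):
    """Brute-force model of half-open interval membership."""
    lows, highs = intervals
    pairs = list(zip(lows, highs))
    # True where a value falls in any [low, high) interval.
    val_hit = [any(lo <= v < hi for lo, hi in pairs) for v in vals]
    if not symmetric:
        return val_hit
    # True where an interval contains at least one of the values.
    interval_hit = [any(lo <= v < hi for v in vals) for lo, hi in pairs]
    return val_hit, interval_hit


vals = [0, 5, 10, 12]
intervals = ([0, 10, 30], [5, 20, 40])  # [0, 5), [10, 20), [30, 40)
print(in1d_intervals_sketch(vals, intervals))
# [True, False, True, True]
print(in1d_intervals_sketch(vals, intervals, symmetric=True)[1])
# [True, True, False]
```

The value 5 is reported as absent because the upper bound of [0, 5) is exclusive.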

arkouda.alignment.interval_lookup(keys, values, arguments, fillvalue=-1, tiebreak=None, hierarchical=False)[source]

Apply a function defined over intervals to an array of arguments.

Parameters:
  • keys (2-tuple of (sequences of) pdarrays) – Tuple of closed intervals expressed as (lower_bounds_inclusive, upper_bounds_inclusive). Must have same dtype(s) as arguments.

  • values (pdarray) – Function value to return for each entry in keys.

  • arguments ((sequences of) pdarray) – Values to search for in intervals. If multiple arrays, each “row” is an item.

  • fillvalue (scalar) – Default value to return when argument is not in any interval.

  • tiebreak ((optional) pdarray, numeric) – When an argument is present in more than one key interval, the interval with the lowest tiebreak value will be chosen. If no tiebreak is given, the first valid key interval will be chosen.

Returns:

Value of function corresponding to the keys interval containing each argument, or fillvalue if argument not in any interval.

Return type:

pdarray
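
A plain-Python sketch of the documented behavior for one-dimensional keys may help. `interval_lookup_sketch` is a hypothetical helper on lists; it covers only single-array arguments and does not model the hierarchical option.

```python
def interval_lookup_sketch(keys, values, arguments, fillvalue=-1, tiebreak=None):
    """Brute-force model: evaluate a function defined on closed intervals."""
    lows, highs = keys
    n = len(values)
    # Without a tiebreak, the first matching interval wins.
    priority = tiebreak if tiebreak is not None else list(range(n))
    result = []
    for arg in arguments:
        matches = [i for i in range(n) if lows[i] <= arg <= highs[i]]
        if matches:
            best = min(matches, key=lambda i: priority[i])
            result.append(values[best])
        else:
            result.append(fillvalue)
    return result


keys = ([0, 10, 5], [9, 19, 12])  # [0, 9], [10, 19], [5, 12]
values = [100, 200, 300]
args = [3, 11, 25, 7]
print(interval_lookup_sketch(keys, values, args))
# [100, 200, -1, 100]
print(interval_lookup_sketch(keys, values, args, tiebreak=[2, 1, 0]))
# [100, 300, -1, 300]
```

In the second call the overlapping interval [5, 12] carries the lowest tiebreak value, so it is preferred for the arguments 11 and 7.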

arkouda.alignment.is_cosorted(arrays)[source]

Return True iff the arrays are cosorted, i.e., if the arrays were columns in a table then the rows are sorted.

Parameters:

arrays (list-like of pdarrays) – Arrays to check for cosortedness

Returns:

True iff arrays are cosorted.

Return type:

bool

Raises:
  • ValueError – Raised if arrays are not the same length

  • TypeError – Raised if arrays is not a list-like of pdarrays
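
The table analogy can be made concrete with a plain-Python sketch. `is_cosorted_sketch` is a hypothetical helper on lists, and it assumes that "sorted" means non-decreasing lexicographic order of the rows.

```python
def is_cosorted_sketch(arrays):
    """Treat the arrays as table columns and test whether the rows are sorted."""
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("arrays must all have the same length")
    # Each row is a tuple; tuples compare lexicographically.
    rows = list(zip(*arrays))
    return all(rows[i] <= rows[i + 1] for i in range(len(rows) - 1))


print(is_cosorted_sketch([[1, 1, 2], [5, 6, 0]]))  # True
print(is_cosorted_sketch([[1, 1, 2], [6, 5, 0]]))  # False
```

In the second call the first column is sorted on its own, but the rows (1, 6) and (1, 5) are out of order once the second column breaks the tie.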

arkouda.alignment.left_align(left, right)[source]

Map two arrays of sparse identifiers to the 0-up index set implied by the left array, discarding values from right that do not appear in left.

arkouda.alignment.lookup(keys, values, arguments, fillvalue=-1)[source]

Apply the function defined by the mapping keys --> values to arguments.

Parameters:
  • keys ((sequence of) array-like) – The domain of the function. Entries must be unique (if a sequence of arrays is given, each row is treated as a tuple-valued entry).

  • values (pdarray) – The range of the function. Must be same length as keys.

  • arguments ((sequence of) array-like) – The arguments on which to evaluate the function. Must have same dtype (or tuple of dtypes, for a sequence) as keys.

  • fillvalue (scalar) – The default value to return for arguments not in keys.

Returns:

evaluated – The result of evaluating the function over arguments.

Return type:

pdarray

Notes

While the values cannot be Strings (or other complex objects), the same result can be achieved by passing an arange as the values, then using the return as indices into the desired object.

Examples

# Lookup numbers by two-word name
>>> keys1 = ak.array(['twenty' for _ in range(5)])
>>> keys2 = ak.array(['one', 'two', 'three', 'four', 'five'])
>>> values = ak.array([21, 22, 23, 24, 25])
>>> args1 = ak.array(['twenty', 'thirty', 'twenty'])
>>> args2 = ak.array(['four', 'two', 'two'])
>>> ak.lookup([keys1, keys2], values, [args1, args2])
array([24, -1, 22])

# Other direction requires an intermediate index
>>> revkeys = values
>>> revindices = ak.arange(values.size)
>>> revargs = ak.array([24, 21, 22])
>>> idx = ak.lookup(revkeys, revindices, revargs)
>>> keys1[idx], keys2[idx]
(array(['twenty', 'twenty', 'twenty']),
array(['four', 'one', 'two']))
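
The tuple-valued key semantics can be modeled with an ordinary dictionary. The sketch below is plain Python, not arkouda: `lookup_sketch` is a hypothetical helper that takes sequences of lists and raises a plain ValueError on duplicate keys as a stand-in for the uniqueness requirement.

```python
def lookup_sketch(keys, values, arguments, fillvalue=-1):
    """Dictionary model of lookup for sequences of arrays."""
    # A sequence of arrays is read row-wise, so each key is a tuple.
    key_rows = list(zip(*keys))
    arg_rows = list(zip(*arguments))
    if len(set(key_rows)) != len(key_rows):
        raise ValueError("keys must be unique")
    mapping = dict(zip(key_rows, values))
    return [mapping.get(row, fillvalue) for row in arg_rows]


keys1 = ["twenty"] * 5
keys2 = ["one", "two", "three", "four", "five"]
values = [21, 22, 23, 24, 25]
args1 = ["twenty", "thirty", "twenty"]
args2 = ["four", "two", "two"]
print(lookup_sketch([keys1, keys2], values, [args1, args2]))  # [24, -1, 22]
```

The row ('thirty', 'two') is not in the domain, so it receives the fill value, matching the first example above.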

arkouda.alignment.right_align(left, right)[source]

Map two arrays of sparse values to the 0-up index set implied by the right array, discarding values from left that do not appear in right.

Parameters:
  • left (pdarray or a sequence of pdarrays) – Left-hand identifiers

  • right (pdarray or a sequence of pdarrays) – Right-hand identifiers that define the index

Returns:

  • keep (pdarray, bool) – Logical index of left-hand values that survived

  • aligned ((pdarray, pdarray)) – Left and right arrays with values replaced by 0-up indices
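
The two return values can be illustrated with a plain-Python sketch. `right_align_sketch` is a hypothetical helper on lists, and it assumes that the 0-up index follows the sorted order of the distinct right-hand values.

```python
def right_align_sketch(left, right):
    """Model of right_align: the right array defines the index set."""
    # Assumption: dense indices follow sorted order of distinct right values.
    index = {v: i for i, v in enumerate(sorted(set(right)))}
    # Logical index of left-hand values that also appear in right.
    keep = [v in index for v in left]
    aligned_left = [index[v] for v in left if v in index]
    aligned_right = [index[v] for v in right]
    return keep, (aligned_left, aligned_right)


left = [30, 10, 99, 20]
right = [20, 30, 30, 10]
keep, (aligned_left, aligned_right) = right_align_sketch(left, right)
print(keep)           # [True, True, False, True]
print(aligned_left)   # [2, 0, 1]
print(aligned_right)  # [1, 2, 2, 0]
```

The value 99 does not appear in right, so it is flagged False in keep and dropped from the aligned left array.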

arkouda.alignment.search_intervals(vals, intervals, tiebreak=None, hierarchical=True)[source]

Given an array of query vals and non-overlapping, closed intervals, return the index of the best (see tiebreak) interval containing each query value, or -1 if not present in any interval.

Parameters:
  • vals ((sequence of) pdarray(int, uint, float)) – Values to search for in intervals. If multiple arrays, each “row” is an item.

  • intervals (2-tuple of (sequences of) pdarrays) – Non-overlapping, closed intervals, as a tuple of (lower_bounds_inclusive, upper_bounds_inclusive). Must have same dtype(s) as vals.

  • tiebreak ((optional) pdarray, numeric) – When a value is present in more than one interval, the interval with the lowest tiebreak value will be chosen. If no tiebreak is given, the first containing interval will be chosen.

  • hierarchical (boolean) – When True, sequences of pdarrays will be treated as components specifying a single dimension (i.e. hierarchical). When False, sequences of pdarrays will be treated as specifying multi-dimensional intervals.

Returns:

idx – Index of interval containing each query value, or -1 if not found

Return type:

pdarray(int64)

Notes

The return idx satisfies the following condition:

present = idx > -1
((intervals[0][idx[present]] <= vals[present]) &
 (intervals[1][idx[present]] >= vals[present])).all()

Examples

>>> starts = (ak.array([0, 5]), ak.array([0, 11]))
>>> ends = (ak.array([5, 9]), ak.array([10, 20]))
>>> vals = (ak.array([0, 0, 2, 5, 5, 6, 6, 9]), ak.array([0, 20, 1, 5, 15, 0, 12, 30]))
>>> ak.search_intervals(vals, (starts, ends), hierarchical=False)
array([0 -1 0 0 1 -1 1 -1])
>>> ak.search_intervals(vals, (starts, ends))
array([0 0 0 0 1 1 1 -1])
>>> bi_starts = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in starts])
>>> bi_ends = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in ends])
>>> bi_vals = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in vals])
>>> bi_starts, bi_ends, bi_vals
(array(["0" "92233720368547758091"]),
array(["92233720368547758090" "166020696663385964564"]),
array(["0" "20" "36893488147419103233" "92233720368547758085" "92233720368547758095"
"110680464442257309696" "110680464442257309708" "166020696663385964574"]))
>>> ak.search_intervals(bi_vals, (bi_starts, bi_ends))
array([0 0 0 0 1 1 1 -1])
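
The difference between the two hierarchical settings can be reproduced with a brute-force plain-Python sketch. `search_intervals_sketch` is a hypothetical helper on lists; it assumes hierarchical comparison behaves like lexicographic tuple comparison, and it omits tiebreak by always returning the first containing interval.

```python
def search_intervals_sketch(vals, intervals, hierarchical=True):
    """Brute-force model of closed-interval search over multi-component values."""
    starts, ends = intervals
    lows = list(zip(*starts))
    highs = list(zip(*ends))
    queries = list(zip(*vals))

    def contains(lo, hi, q):
        if hierarchical:
            # Tuples compare lexicographically, like digits of one number.
            return lo <= q <= hi
        # Each component must fall inside its own closed range.
        return all(l <= x <= h for l, x, h in zip(lo, q, hi))

    result = []
    for q in queries:
        hits = [i for i, (lo, hi) in enumerate(zip(lows, highs))
                if contains(lo, hi, q)]
        result.append(hits[0] if hits else -1)
    return result


starts = ([0, 5], [0, 11])
ends = ([5, 9], [10, 20])
vals = ([0, 0, 2, 5, 5, 6, 6, 9], [0, 20, 1, 5, 15, 0, 12, 30])
print(search_intervals_sketch(vals, (starts, ends), hierarchical=False))
# [0, -1, 0, 0, 1, -1, 1, -1]
print(search_intervals_sketch(vals, (starts, ends)))
# [0, 0, 0, 0, 1, 1, 1, -1]
```

Both outputs agree with the first two examples above. The query (0, 20) shows the distinction: component-wise, 20 lies outside [0, 10], but as a single hierarchical value (0, 20) sorts between (0, 0) and (5, 10).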

arkouda.alignment.unsqueeze(p)[source]

arkouda.alignment.zero_up(vals)[source]

Map an array of sparse values to 0-up indices.

Parameters:

vals (pdarray) – Array to map to dense index

Returns:

aligned – Array with values replaced by 0-up indices

Return type:

pdarray
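
As a plain-Python sketch of the idea (not arkouda code), `zero_up_sketch` below is a hypothetical helper on lists that assumes indices are assigned in sorted order of the distinct values.

```python
def zero_up_sketch(vals):
    """Replace each value with its rank among the distinct values."""
    # Assumption: dense indices follow sorted order of the distinct values.
    index = {v: i for i, v in enumerate(sorted(set(vals)))}
    return [index[v] for v in vals]


print(zero_up_sketch([1000, 5, 5, 77]))  # [2, 0, 0, 1]
```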