arkouda.alignment

Module Contents

Functions

align(*args)

Map multiple arrays of sparse identifiers to a common 0-up index.

find(query, space)

Return indices of query items in a search list of items (-1 if not found).

in1d_intervals(vals, intervals[, symmetric])

Test each value for membership in any of a set of half-open (pythonic) intervals.

interval_lookup(keys, values, arguments[, fillvalue, ...])

Apply a function defined over intervals to an array of arguments.

is_cosorted(arrays)

Return True iff the arrays are cosorted, i.e., if the arrays were columns in a table then the rows are sorted.

left_align(left, right)

Map two arrays of sparse identifiers to the 0-up index set implied by the left array, discarding values from right that do not appear in left.

lookup(keys, values, arguments[, fillvalue])

Apply the function defined by the mapping keys --> values to arguments.

right_align(left, right)

Map two arrays of sparse values to the 0-up index set implied by the right array, discarding values from left that do not appear in right.

search_intervals(vals, intervals[, tiebreak, hierarchical])

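Given an array of query vals and non-overlapping, closed intervals, return the index of the best (see tiebreak) interval containing each query value, or -1 if not present in any interval.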

unsqueeze(p)

zero_up(vals)

Map an array of sparse values to 0-up indices.

exception arkouda.alignment.NonUniqueError[source]

Bases: ValueError

Inappropriate argument value (of correct type).

arkouda.alignment.align(*args)[source]

Map multiple arrays of sparse identifiers to a common 0-up index.

Parameters:

*args (pdarrays or sequences of pdarrays) – Arrays to map to dense index

Returns:

aligned – Arrays with values replaced by 0-up indices

Return type:

list of pdarrays
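
A minimal pure-Python sketch of the mapping described above, offered as an illustration rather than arkouda's implementation. Plain lists stand in for pdarrays, `align_sketch` is a hypothetical name, and the sketch assumes that indices are assigned in ascending order of the distinct values across all inputs.

```python
def align_sketch(*args):
    """Map several lists of sparse identifiers to one shared 0-up index.

    Assumption: indices follow the ascending order of the distinct values.
    """
    # Collect every distinct identifier seen in any input.
    universe = sorted(set().union(*args))
    # Assign each identifier its dense position.
    index_of = {value: i for i, value in enumerate(universe)}
    return [[index_of[v] for v in arr] for arr in args]


a = [100, 7, 100, 42]
b = [42, 9000, 7]
aligned_a, aligned_b = align_sketch(a, b)
```

Equal identifiers in different inputs receive the same index, which is what makes the result usable as a common key.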

arkouda.alignment.find(query, space)[source]

Return indices of query items in a search list of items (-1 if not found).

Parameters:
  • query ((sequence of) array-like) – The items to search for. If multiple arrays, each “row” is an item.

  • space ((sequence of) array-like) – The set of items in which to search. Must have same shape/dtype as query.

Returns:

indices – For each item in query, its index in space or -1 if not found.

Return type:

pdarray, int64
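
The behaviour can be sketched in pure Python as follows. This is an illustration only: `find_sketch` is a hypothetical name, lists stand in for pdarrays, and the sketch assumes that items in space are unique (if they are not, it keeps the first position).

```python
def find_sketch(query, space):
    """Return, for each query item, its position in space, or -1 if absent.

    Assumption: items in space are unique; if not, the first position wins.
    """
    position = {}
    for i, item in enumerate(space):
        position.setdefault(item, i)
    return [position.get(item, -1) for item in query]


space = [10, 30, 20, 40]
query = [20, 99, 10]
indices = find_sketch(query, space)

# Multi-array form: each "row" across the arrays is one item.
space_rows = list(zip(["a", "a", "b"], [1, 2, 1]))
query_rows = list(zip(["b", "a"], [1, 3]))
row_indices = find_sketch(query_rows, space_rows)
```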

arkouda.alignment.in1d_intervals(vals, intervals, symmetric=False)[source]

Test each value for membership in any of a set of half-open (pythonic) intervals.

Parameters:
  • vals (pdarray(int, float)) – Values to test for membership in intervals

  • intervals (2-tuple of pdarrays) – Non-overlapping, half-open intervals, as a tuple of (lower_bounds_inclusive, upper_bounds_exclusive)

  • symmetric (bool) – If True, also return a boolean pdarray indicating which intervals contained one or more query values.

Returns:

  • pdarray(bool) – Array of the same length as <vals>, True if the corresponding value v satisfies intervals[0][i] <= v < intervals[1][i] for some i.

  • pdarray(bool) (if symmetric=True) – Array of same length as number of intervals, True if corresponding interval contains any of the values in <vals>.

Notes

First return array is equivalent to the following:

((vals >= intervals[0][0]) & (vals < intervals[1][0])) | ((vals >= intervals[0][1]) & (vals < intervals[1][1])) | … ((vals >= intervals[0][-1]) & (vals < intervals[1][-1]))

But much faster when testing many ranges.

Second (optional) return array is equivalent to:

((intervals[0] <= vals[0]) & (intervals[1] > vals[0])) | ((intervals[0] <= vals[1]) & (intervals[1] > vals[1])) | … ((intervals[0] <= vals[-1]) & (intervals[1] > vals[-1]))

But much faster when vals is non-trivial size.
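
Both return arrays can be reproduced with a small pure-Python reference. This is a sketch, not the arkouda algorithm: `in1d_intervals_sketch` is a hypothetical name, lists stand in for pdarrays, and the binary search assumes the intervals are sorted by lower bound as well as non-overlapping.

```python
from bisect import bisect_right


def in1d_intervals_sketch(vals, intervals, symmetric=False):
    """Half-open membership test: lower[i] <= v < upper[i].

    Assumption: intervals are non-overlapping and sorted by lower bound.
    """
    lower, upper = intervals
    hit_vals = []
    hit_intervals = [False] * len(lower)
    for v in vals:
        # Last interval whose lower bound is <= v (or -1 if none).
        i = bisect_right(lower, v) - 1
        inside = i >= 0 and v < upper[i]
        hit_vals.append(inside)
        if inside:
            hit_intervals[i] = True
    return (hit_vals, hit_intervals) if symmetric else hit_vals


lower = [0, 10, 20]
upper = [5, 15, 25]
in_any, touched = in1d_intervals_sketch([0, 5, 12, 30], (lower, upper), symmetric=True)
```

The value 5 is rejected because the upper bound is exclusive, and the third interval is not marked because no query value falls inside it.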

arkouda.alignment.interval_lookup(keys, values, arguments, fillvalue=-1, tiebreak=None, hierarchical=False)[source]

Apply a function defined over intervals to an array of arguments.

Parameters:
  • keys (2-tuple of (sequences of) pdarrays) – Tuple of closed intervals expressed as (lower_bounds_inclusive, upper_bounds_inclusive). Must have same dtype(s) as arguments.

  • values (pdarray) – Function value to return for each entry in keys.

  • arguments ((sequences of) pdarray) – Values to search for in intervals. If multiple arrays, each “row” is an item.

  • fillvalue (scalar) – Default value to return when argument is not in any interval.

  • tiebreak ((optional) pdarray, numeric) – When an argument is present in more than one key interval, the interval with the lowest tiebreak value will be chosen. If no tiebreak is given, the first valid key interval will be chosen.

Returns:

Value of function corresponding to the keys interval containing each argument, or fillvalue if argument not in any interval.

Return type:

pdarray
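
As a rough illustration of the single-array case, the function can be read as a step function over closed intervals. The sketch below is pure Python with a hypothetical name (`interval_lookup_sketch`); it ignores tiebreak and hierarchical and assumes the first containing interval wins.

```python
def interval_lookup_sketch(keys, values, arguments, fillvalue=-1):
    """Evaluate a step function defined on closed intervals [low, high].

    Assumption: with no tiebreak, the first containing interval wins.
    """
    lows, highs = keys
    result = []
    for x in arguments:
        out = fillvalue
        for low, high, value in zip(lows, highs, values):
            if low <= x <= high:
                out = value
                break
        result.append(out)
    return result


keys = ([0, 10, 20], [9, 19, 29])
values = [100, 200, 300]
evaluated = interval_lookup_sketch(keys, values, [5, 19, 30, 20])
```

The argument 19 matches because the upper bound is inclusive, and 30 falls outside every interval, so it receives fillvalue.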

arkouda.alignment.is_cosorted(arrays)[source]

Return True iff the arrays are cosorted, i.e., if the arrays were columns in a table then the rows are sorted.

Parameters:

arrays (list-like of pdarrays) – Arrays to check for cosortedness

Returns:

True iff arrays are cosorted.

Return type:

bool

Raises:
  • ValueError – Raised if arrays are not the same length

  • TypeError – Raised if arrays is not a list-like of pdarrays
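
A pure-Python sketch of the check, assuming rows are compared lexicographically with the first array as the most significant column. `is_cosorted_sketch` is a hypothetical name and lists stand in for pdarrays.

```python
def is_cosorted_sketch(arrays):
    """Return True iff the rows formed by zipping the arrays are non-decreasing.

    Assumption: rows are compared lexicographically, first array most significant.
    """
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("arrays must all have the same length")
    rows = list(zip(*arrays))
    return all(prev <= curr for prev, curr in zip(rows, rows[1:]))


sorted_case = is_cosorted_sketch([[1, 1, 2], [5, 7, 0]])
unsorted_case = is_cosorted_sketch([[1, 1, 2], [7, 5, 0]])
```

In the second call the first column ties on the first two rows, so the second column decides, and 7 followed by 5 breaks the order.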

arkouda.alignment.left_align(left, right)[source]

Map two arrays of sparse identifiers to the 0-up index set implied by the left array, discarding values from right that do not appear in left.

arkouda.alignment.lookup(keys, values, arguments, fillvalue=-1)[source]

Apply the function defined by the mapping keys --> values to arguments.

Parameters:
  • keys ((sequence of) array-like) – The domain of the function. Entries must be unique (if a sequence of arrays is given, each row is treated as a tuple-valued entry).

  • values (pdarray) – The range of the function. Must be same length as keys.

  • arguments ((sequence of) array-like) – The arguments on which to evaluate the function. Must have same dtype (or tuple of dtypes, for a sequence) as keys.

  • fillvalue (scalar) – The default value to return for arguments not in keys.

Returns:

evaluated – The result of evaluating the function over arguments.

Return type:

pdarray

Notes

While the values cannot be Strings (or other complex objects), the same result can be achieved by passing an arange as the values, then using the return as indices into the desired object.

Examples

# Lookup numbers by two-word name
>>> keys1 = ak.array(['twenty' for _ in range(5)])
>>> keys2 = ak.array(['one', 'two', 'three', 'four', 'five'])
>>> values = ak.array([21, 22, 23, 24, 25])
>>> args1 = ak.array(['twenty', 'thirty', 'twenty'])
>>> args2 = ak.array(['four', 'two', 'two'])
>>> ak.lookup([keys1, keys2], values, [args1, args2])
array([24, -1, 22])

# Other direction requires an intermediate index
>>> revkeys = values
>>> revindices = ak.arange(values.size)
>>> revargs = ak.array([24, 21, 22])
>>> idx = ak.lookup(revkeys, revindices, revargs)
>>> keys1[idx], keys2[idx]
(array(['twenty', 'twenty', 'twenty']), array(['four', 'one', 'two']))
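
For readers without a running arkouda server, the same two lookups can be mimicked with a dictionary. This is a pure-Python sketch under stated assumptions: `lookup_sketch` is a hypothetical name, lists and tuples stand in for pdarrays and Strings, and duplicate keys are rejected with a plain ValueError.

```python
def lookup_sketch(keys, values, arguments, fillvalue=-1):
    """Evaluate the mapping keys --> values at each argument."""
    mapping = dict(zip(keys, values))
    if len(mapping) != len(keys):
        raise ValueError("keys must be unique")
    return [mapping.get(arg, fillvalue) for arg in arguments]


# Tuple-valued keys: zip the component arrays into rows.
keys = list(zip(["twenty"] * 5, ["one", "two", "three", "four", "five"]))
values = [21, 22, 23, 24, 25]
args = list(zip(["twenty", "thirty", "twenty"], ["four", "two", "two"]))
evaluated = lookup_sketch(keys, values, args)

# Reverse direction via an intermediate index, as described in the Notes.
idx = lookup_sketch(values, range(len(values)), [24, 21, 22])
names = [keys[i] for i in idx]
```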

arkouda.alignment.right_align(left, right)[source]

Map two arrays of sparse values to the 0-up index set implied by the right array, discarding values from left that do not appear in right.

Parameters:
  • left (pdarray or a sequence of pdarrays) – Left-hand identifiers

  • right (pdarray or a sequence of pdarrays) – Right-hand identifiers that define the index

Returns:

  • keep (pdarray, bool) – Logical index of left-hand values that survived

  • aligned ((pdarray, pdarray)) – Left and right arrays with values replaced by 0-up indices
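
The two return values can be illustrated with a pure-Python sketch. `right_align_sketch` is a hypothetical name, lists stand in for pdarrays, and the sketch assumes that indices follow the ascending order of the distinct values in right.

```python
def right_align_sketch(left, right):
    """Index both arrays by the distinct values of right.

    Assumption: indices follow the ascending order of right's distinct values.
    """
    index_of = {value: i for i, value in enumerate(sorted(set(right)))}
    # Logical index of left-hand values that also appear in right.
    keep = [value in index_of for value in left]
    aligned_left = [index_of[v] for v, k in zip(left, keep) if k]
    aligned_right = [index_of[v] for v in right]
    return keep, (aligned_left, aligned_right)


left = [500, 7, 42, 7]
right = [42, 7, 9000]
keep, (aligned_left, aligned_right) = right_align_sketch(left, right)
```

Here 500 is discarded because it never appears in right, so the aligned left array is shorter than the input; keep records which positions survived.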

arkouda.alignment.search_intervals(vals, intervals, tiebreak=None, hierarchical=True)[source]

Given an array of query vals and non-overlapping, closed intervals, return the index of the best (see tiebreak) interval containing each query value, or -1 if not present in any interval.

Parameters:
  • vals ((sequence of) pdarray(int, uint, float)) – Values to search for in intervals. If multiple arrays, each “row” is an item.

  • intervals (2-tuple of (sequences of) pdarrays) – Non-overlapping, closed intervals, as a tuple of (lower_bounds_inclusive, upper_bounds_inclusive). Must have same dtype(s) as vals.

  • tiebreak ((optional) pdarray, numeric) – When a value is present in more than one interval, the interval with the lowest tiebreak value will be chosen. If no tiebreak is given, the first containing interval will be chosen.

  • hierarchical (boolean) – When True, sequences of pdarrays are treated as components specifying a single dimension (i.e. hierarchical). When False, sequences of pdarrays are treated as specifying multi-dimensional intervals.

Returns:

idx – Index of interval containing each query value, or -1 if not found

Return type:

pdarray(int64)

Notes

The return idx satisfies the following condition:

present = idx > -1
((intervals[0][idx[present]] <= vals[present]) &
 (intervals[1][idx[present]] >= vals[present])).all()
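
The difference between the two settings of hierarchical can be sketched with a brute-force pure-Python reference, using the same data as the Examples section. This is an illustration, not arkouda's algorithm: `search_intervals_sketch` is a hypothetical name, tuples of lists stand in for sequences of pdarrays, tiebreak is ignored, and the first containing interval is returned.

```python
def search_intervals_sketch(vals, intervals, hierarchical=True):
    """Index of the first closed interval containing each row, or -1.

    vals and each bound are tuples of equal-length lists (one per component).
    """
    starts, ends = intervals
    rows = list(zip(*vals))
    lo_rows = list(zip(*starts))
    hi_rows = list(zip(*ends))
    idx = []
    for row in rows:
        found = -1
        for i, (lo, hi) in enumerate(zip(lo_rows, hi_rows)):
            if hierarchical:
                # Components form one key, compared lexicographically.
                inside = lo <= row <= hi
            else:
                # Each component is its own dimension.
                inside = all(low <= x <= high for low, x, high in zip(lo, row, hi))
            if inside:
                found = i
                break
        idx.append(found)
    return idx


starts = ([0, 5], [0, 11])
ends = ([5, 9], [10, 20])
vals = ([0, 0, 2, 5, 5, 6, 6, 9], [0, 20, 1, 5, 15, 0, 12, 30])
multi_dim = search_intervals_sketch(vals, (starts, ends), hierarchical=False)
hier = search_intervals_sketch(vals, (starts, ends))
```

The row (0, 20) shows the contrast: as a hierarchical key it lies between (0, 0) and (5, 10), but as a two-dimensional point its second coordinate falls outside [0, 10].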

Examples

>>> starts = (ak.array([0, 5]), ak.array([0, 11]))
>>> ends = (ak.array([5, 9]), ak.array([10, 20]))
>>> vals = (ak.array([0, 0, 2, 5, 5, 6, 6, 9]), ak.array([0, 20, 1, 5, 15, 0, 12, 30]))
>>> ak.search_intervals(vals, (starts, ends), hierarchical=False)
array([0 -1 0 0 1 -1 1 -1])
>>> ak.search_intervals(vals, (starts, ends))
array([0 0 0 0 1 1 1 -1])
>>> bi_starts = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in starts])
>>> bi_ends = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in ends])
>>> bi_vals = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in vals])
>>> bi_starts, bi_ends, bi_vals
(array(["0" "92233720368547758091"]),
array(["92233720368547758090" "166020696663385964564"]),
array(["0" "20" "36893488147419103233" "92233720368547758085" "92233720368547758095"
"110680464442257309696" "110680464442257309708" "166020696663385964574"]))
>>> ak.search_intervals(bi_vals, (bi_starts, bi_ends))
array([0 0 0 0 1 1 1 -1])

arkouda.alignment.unsqueeze(p)[source]

arkouda.alignment.zero_up(vals)[source]

Map an array of sparse values to 0-up indices.

Parameters:

vals (pdarray) – Array to map to dense index

Returns:

aligned – Array with values replaced by 0-up indices

Return type:

pdarray
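
A minimal pure-Python sketch of the mapping, assuming indices follow the ascending order of the distinct values. `zero_up_sketch` is a hypothetical name and a list stands in for the pdarray.

```python
def zero_up_sketch(vals):
    """Replace sparse values with dense 0-up indices.

    Assumption: indices follow the ascending order of the distinct values.
    """
    index_of = {value: i for i, value in enumerate(sorted(set(vals)))}
    return [index_of[v] for v in vals]


dense = zero_up_sketch([1000, 3, 1000, 77, 3])
```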