arkouda.join

Module Contents

Functions

compute_join_size(→ Tuple[int, int])

Compute the internal size of a hypothetical join between a and b.

gen_ranges(starts, ends[, stride, return_lengths])

Generate a segmented array of variable-length, contiguous ranges between pairs of start- and end-points.

join_on_eq_with_dt(...)

Performs an inner-join on equality between two integer arrays where the time-window predicate is also true.

arkouda.join.compute_join_size(a: arkouda.pdarrayclass.pdarray, b: arkouda.pdarrayclass.pdarray) → Tuple[int, int]

Compute the internal size of a hypothetical join between a and b. Returns both the number of elements and number of bytes required for the join.
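To make the "number of elements" concrete, here is a minimal NumPy sketch. It assumes that the element count means the number of index pairs (i, j) with a[i] == b[j]; that reading is an assumption, not something this page states. The helper name join_element_count is hypothetical and is not part of arkouda. The byte count is not modelled, because the per-element size is not documented here.

```python
import numpy as np

def join_element_count(a, b):
    """Count index pairs (i, j) with a[i] == b[j].

    Hypothetical NumPy model of the element count, not arkouda code.
    """
    # Unique values and how often each occurs on either side.
    ua, ca = np.unique(np.asarray(a), return_counts=True)
    ub, cb = np.unique(np.asarray(b), return_counts=True)
    # Positions of the shared values inside ua and ub.
    _, ia, ib = np.intersect1d(ua, ub, assume_unique=True, return_indices=True)
    # A value occurring m times in a and n times in b yields m * n pairs.
    return int((ca[ia] * cb[ib]).sum())

n_pairs = join_element_count([1, 1, 2, 3], [1, 2, 2, 4])
print(n_pairs)
```

In this example the value 1 contributes 2 × 1 pairs and the value 2 contributes 1 × 2 pairs, so a small input can already produce a join larger than either array.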

arkouda.join.gen_ranges(starts, ends, stride=1, return_lengths=False)

Generate a segmented array of variable-length, contiguous ranges between pairs of start- and end-points.

Parameters:
  • starts (pdarray, int64) – The start value of each range

  • ends (pdarray, int64) – The end value (exclusive) of each range

  • stride (int, optional) – Difference between successive elements of each range. Default 1.

  • return_lengths (bool, optional) – Whether or not to return the lengths of each segment. Default False.

Returns:

  • segments (pdarray, int64) – The starting index of each range in the resulting array

  • ranges (pdarray, int64) – The actual ranges, flattened into a single array

  • lengths (pdarray, int64) – The lengths of each segment. Only returned if return_lengths=True.
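The relationship between segments, ranges, and lengths can be illustrated with a small NumPy model. This is a sketch of the documented return values only, assuming each range behaves like numpy.arange(start, end, stride); gen_ranges_reference is a hypothetical helper and says nothing about how arkouda computes the result on the server.

```python
import numpy as np

def gen_ranges_reference(starts, ends, stride=1, return_lengths=False):
    """NumPy model of the documented outputs (hypothetical, not arkouda code)."""
    starts = np.asarray(starts, dtype=np.int64)
    ends = np.asarray(ends, dtype=np.int64)
    # One range per (start, end) pair; the end value is exclusive.
    pieces = [np.arange(s, e, stride, dtype=np.int64) for s, e in zip(starts, ends)]
    lengths = np.array([len(p) for p in pieces], dtype=np.int64)
    # Each segment begins where the previous ranges stop in the flat array.
    segments = np.cumsum(lengths) - lengths
    ranges = np.concatenate(pieces) if pieces else np.empty(0, dtype=np.int64)
    if return_lengths:
        return segments, ranges, lengths
    return segments, ranges

segments, ranges, lengths = gen_ranges_reference(
    [0, 10, 20], [3, 12, 24], return_lengths=True
)
print(segments)  # start offset of each range within `ranges`
print(ranges)    # all ranges, flattened
print(lengths)   # number of elements in each range
```

Here the k-th range occupies ranges[segments[k] : segments[k] + lengths[k]], which is why returning lengths is optional: it can be recovered from segments and the total size.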

arkouda.join.join_on_eq_with_dt(a1: arkouda.pdarrayclass.pdarray, a2: arkouda.pdarrayclass.pdarray, t1: arkouda.pdarrayclass.pdarray, t2: arkouda.pdarrayclass.pdarray, dt: int | numpy.int64, pred: str, result_limit: int | numpy.int64 = 1000) → Tuple[arkouda.pdarrayclass.pdarray, arkouda.pdarrayclass.pdarray]

Performs an inner-join on equality between two integer arrays, keeping only the pairs for which the time-window predicate is also true.

Parameters:
  • a1 (pdarray, int64) – pdarray to be joined

  • a2 (pdarray, int64) – pdarray to be joined

  • t1 (pdarray, int64) – timestamps in milliseconds corresponding to the a1 pdarray

  • t2 (pdarray, int64) – timestamps in milliseconds corresponding to the a2 pdarray

  • dt (Union[int,np.int64]) – time delta

  • pred (str) – time-window predicate; one of ‘true_dt’, ‘abs_dt’, or ‘pos_dt’

  • result_limit (Union[int,np.int64]) – size limit for returned result. Default 1000.

Returns:

  • result_array_one (pdarray, int64) – a1 indices where a1 == a2

  • result_array_two (pdarray, int64) – a2 indices where a2 == a1

Raises:
  • TypeError – Raised if a1, a2, t1, or t2 is not a pdarray, or if dt or result_limit is not an int

  • ValueError – Raised if the dtype of a1, a2, t1, or t2 is not int64, if pred is not ‘true_dt’, ‘abs_dt’, or ‘pos_dt’, or if result_limit is < 0
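The sketch below models the join with NumPy on small in-memory arrays. Several points are assumptions that this page does not confirm: that ‘true_dt’ applies no time filter, that ‘abs_dt’ means |t2 - t1| <= dt, that ‘pos_dt’ means 0 <= t2 - t1 <= dt, that result_limit simply truncates the output, and that pairs come back in row-major order. The helper name join_on_eq_with_dt_reference is hypothetical; check the predicate semantics against the arkouda source before relying on them.

```python
import numpy as np

def join_on_eq_with_dt_reference(a1, a2, t1, t2, dt, pred, result_limit=1000):
    """Brute-force NumPy model of the join (hypothetical, not arkouda code)."""
    if pred not in ("true_dt", "abs_dt", "pos_dt"):
        raise ValueError("pred must be 'true_dt', 'abs_dt', or 'pos_dt'")
    if result_limit < 0:
        raise ValueError("result_limit must be >= 0")
    a1, a2, t1, t2 = (np.asarray(x, dtype=np.int64) for x in (a1, a2, t1, t2))

    # Compare every a1[i] with every a2[j] (fine for small inputs only).
    equal = a1[:, None] == a2[None, :]
    delta = t2[None, :] - t1[:, None]

    # ASSUMED predicate semantics; see the lead-in above.
    if pred == "abs_dt":
        window = np.abs(delta) <= dt
    elif pred == "pos_dt":
        window = (delta >= 0) & (delta <= dt)
    else:  # "true_dt": no time restriction
        window = np.ones_like(equal)

    i, j = np.nonzero(equal & window)
    # ASSUMED: the limit truncates the list of matching pairs.
    return i[:result_limit], j[:result_limit]

a1, a2 = [1, 2, 3], [3, 1, 1]
t1, t2 = [100, 200, 300], [310, 90, 150]
abs_i, abs_j = join_on_eq_with_dt_reference(a1, a2, t1, t2, dt=50, pred="abs_dt")
pos_i, pos_j = join_on_eq_with_dt_reference(a1, a2, t1, t2, dt=50, pred="pos_dt")
print(abs_i, abs_j)
print(pos_i, pos_j)
```

Under these assumptions, the pair (a1[0], a2[1]) has t2 - t1 = -10, so it passes ‘abs_dt’ but not ‘pos_dt’, which is the only difference between the two results in this example.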