Array Set Operations

Following numpy.lib.arraysetops, arkouda supports parallel, distributed set operations using pdarray objects.

The unique function effectively converts a pdarray to a set:

arkouda.unique(pda: groupable, return_groups: bool = False, assume_sorted: bool = False, return_indices: bool = False) → groupable | Tuple[groupable, pdarray, pdarray, int]

Find the unique elements of an array.

Returns the unique elements of an array, sorted if the values are integers. There is an optional output in addition to the unique elements: the number of times each unique value comes up in the input array.

Parameters:
  • pda ((list of) pdarray, Strings, or Categorical) – Input array.

  • return_groups (bool, optional) – If True, also return grouping information for the array.

  • return_indices (bool, optional) – Only applicable if return_groups is True. If True, return the unique key indices along with the other grouping information.

  • assume_sorted (bool, optional) – If True, assume pda is sorted and skip the sorting step.

Returns:

  • unique ((list of) pdarray, Strings, or Categorical) – The unique values. If input dtype is int64, return values will be sorted.

  • permutation (pdarray, optional) – Permutation that groups equivalent values together (only when return_groups=True)

  • segments (pdarray, optional) – The offset of each group in the permuted array (only when return_groups=True)

Raises:
  • TypeError – Raised if pda is not a pdarray or Strings object

  • RuntimeError – Raised if the pdarray or Strings dtype is unsupported

Notes

For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.
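The check described in this note can be modelled in plain Python. This is only a sketch of the idea under simple assumptions (a local list instead of a distributed pdarray); the helper name unique_with_fast_path is hypothetical and is not part of arkouda.

```python
def unique_with_fast_path(values):
    """Pure-Python model of the sorted/unique check (hypothetical helper)."""
    pairs = list(zip(values, values[1:]))
    is_sorted = all(x <= y for x, y in pairs)
    if is_sorted and all(x < y for x, y in pairs):
        # Sorted with no equal neighbours: already unique, nothing to do.
        return list(values)
    if not is_sorted:
        # Fall back to sorting so that equal values become adjacent.
        values = sorted(values)
    # Keep the first element of each run of equal values.
    return [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]


already_unique = unique_with_fast_path([1, 2, 3])      # fast path, no sort
needs_sort = unique_with_fast_path([3, 2, 1, 1, 2, 3])  # sorted, then deduplicated
```

Both calls return [1, 2, 3]; only the second one pays for a sort.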

Examples

>>> A = ak.array([3, 2, 1, 1, 2, 3])
>>> ak.unique(A)
array([1, 2, 3])
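
To illustrate how the optional permutation and segments outputs relate to the unique values, here is a pure-Python model. It assumes a stable sort, so the order of indices inside each group may differ from what arkouda actually returns; the helper name unique_with_groups is hypothetical.

```python
def unique_with_groups(values):
    """Pure-Python model of unique values plus grouping information."""
    # Stable argsort: indices that reorder `values` so equal values are adjacent.
    permutation = sorted(range(len(values)), key=lambda i: values[i])
    permuted = [values[i] for i in permutation]
    # A segment starts wherever the permuted value differs from its predecessor.
    segments = [i for i, v in enumerate(permuted) if i == 0 or v != permuted[i - 1]]
    unique = [permuted[s] for s in segments]
    return unique, permutation, segments


unique, permutation, segments = unique_with_groups([3, 2, 1, 1, 2, 3])
# unique      == [1, 2, 3]
# permutation == [2, 3, 1, 4, 0, 5]
# segments    == [0, 2, 4]
```

Each entry of segments is the offset in the permuted array at which a new group begins, so group k covers positions segments[k] up to (but not including) segments[k + 1].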

arkouda.in1d(pda1: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], pda2: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], assume_unique: bool = False, symmetric: bool = False, invert: bool = False) → pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical]

Test whether each element of a 1-D array is also present in a second array.

Returns a boolean array the same length as pda1 that is True where an element of pda1 is in pda2 and False otherwise.

Supports multi-level inputs: when pda1 and pda2 are lists of arrays, tests membership of the rows of pda1 in the set of rows of pda2.

Parameters:
  • pda1 (list of pdarrays, pdarray, Strings, or Categorical) – Rows are the elements for which to test membership in pda2.

  • pda2 (list of pdarrays, pdarray, Strings, or Categorical) – Rows are the elements of the set in which to test membership.

  • assume_unique (bool) – If True, assume the rows of pda1 and pda2 are each unique and sorted. By default, they are sorted and made unique explicitly.

  • symmetric (bool) – If True, return both in1d(pda1, pda2) and in1d(pda2, pda1). Applies when pda1 and pda2 are single items.

  • invert (bool, optional) – If True, the values in the returned array are inverted (that is, False where an element of pda1 is in pda2 and True otherwise). Default is False. ak.in1d(pda1, pda2, invert=True) is equivalent to (but faster than) ~ak.in1d(pda1, pda2).

Returns:

True for each row in pda1 that is contained in pda2.

Return type:

pdarray, bool

Notes

Only works for pdarrays of int64 or float64 dtype, Strings, or Categorical.
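
The multi-level behaviour and the invert flag can be modelled in plain Python. This sketch assumes that a "row" is the tuple of values at one position across all input arrays; in1d_rows is a hypothetical helper that uses local lists, not arkouda's distributed implementation.

```python
def in1d_rows(cols_a, cols_b, invert=False):
    """Pure-Python model of multi-level membership (hypothetical helper)."""
    # Each "row" is the tuple of values found at one position across all columns.
    rows_b = set(zip(*cols_b))
    return [(row in rows_b) != invert for row in zip(*cols_a)]


a = [[1, 2, 3], [10, 20, 30]]  # rows: (1, 10), (2, 20), (3, 30)
b = [[3, 1], [30, 99]]         # rows: (3, 30), (1, 99)

hits = in1d_rows(a, b)                 # [False, False, True]
misses = in1d_rows(a, b, invert=True)  # [True, True, False]
```

Note that (1, 10) is not a member even though 1 appears in the first array of b: membership is decided on whole rows, not per array.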

arkouda.union1d(pda1: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], pda2: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical]) → pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical]

Find the union of two arrays or of two lists of arrays.

Return the unique, sorted array of values that are in either of the two input arrays.

Parameters:
  • pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects

  • pda2 (pdarray/List) – Input array/sequence of groupable objects

Returns:

Unique, sorted union of the input arrays.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either pda1 or pda2 is not a pdarray

  • RuntimeError – Raised if the dtype of either array is not supported

Notes

ak.union1d is not supported for bool or float64 pdarrays

Examples

>>> # 1D Example
>>> ak.union1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([-2, -1, 0, 1, 2])
>>> # Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.union1d(multia, multib)
[array([1, 2, 2, 3, 4, 4, 5, 5]), array([1, 2, 5, 3, 2, 4, 4, 5]), array([1, 2, 4, 3, 5, 4, 2, 5])]
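
The multi-array result above is easier to read once the inputs are viewed as rows. The following pure-Python sketch reproduces it with plain lists, assuming rows are compared and sorted as tuples; union_rows is a hypothetical helper and not arkouda's implementation.

```python
def union_rows(cols_a, cols_b):
    """Pure-Python model of a multi-array union (hypothetical helper)."""
    # Turn each list of columns into a set of row tuples, then merge and sort.
    rows = sorted(set(zip(*cols_a)) | set(zip(*cols_b)))
    # Transpose the sorted rows back into one list per column.
    return [list(col) for col in zip(*rows)]


a = [1, 2, 3, 4, 5]
b = [1, 5, 3, 4, 2]
c = [1, 4, 3, 2, 5]
d = [1, 2, 3, 5, 4]

result = union_rows([a, a, a], [b, c, d])
# result == [[1, 2, 2, 3, 4, 4, 5, 5],
#            [1, 2, 5, 3, 2, 4, 4, 5],
#            [1, 2, 4, 3, 5, 4, 2, 5]]
```

Reading the three output lists position by position gives the eight distinct rows (1, 1, 1), (2, 2, 2), (2, 5, 4), (3, 3, 3), (4, 2, 5), (4, 4, 4), (5, 4, 2), (5, 5, 5).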

arkouda.intersect1d(pda1: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], pda2: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], assume_unique: bool = False) → pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical]

Find the intersection of two arrays.

Return the sorted, unique values that are in both of the input arrays.

Parameters:
  • pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects

  • pda2 (pdarray/List) – Input array/sequence of groupable objects

  • assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.

Returns:

Sorted 1D array/List of sorted pdarrays of common and unique elements.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either pda1 or pda2 is not a pdarray

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.intersect1d is not supported for bool or float64 pdarrays

Examples

>>> # 1D Example
>>> ak.intersect1d(ak.array([1, 3, 4, 3]), ak.array([3, 1, 2, 1]))
array([1, 3])
>>> # Multi-Array Example
>>> a = ak.arange(5)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.intersect1d(multia, multib)
[array([1, 3]), array([1, 3]), array([1, 3])]
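
A multi-array intersection is not the same as intersecting each pair of arrays separately. The sketch below contrasts the two on the data from the example, using plain Python lists; intersect_rows is a hypothetical helper that assumes rows are compared as tuples.

```python
def intersect_rows(cols_a, cols_b):
    """Pure-Python model of a multi-array intersection (hypothetical helper)."""
    rows = sorted(set(zip(*cols_a)) & set(zip(*cols_b)))
    return [list(col) for col in zip(*rows)]


a = [0, 1, 2, 3, 4]
b = [1, 5, 3, 4, 2]
c = [1, 4, 3, 2, 5]
d = [1, 2, 3, 5, 4]

# Row-wise: only the rows (1, 1, 1) and (3, 3, 3) occur on both sides.
row_wise = intersect_rows([a, a, a], [b, c, d])
# row_wise == [[1, 3], [1, 3], [1, 3]]

# Array-by-array: each pair shares the values 1, 2, 3 and 4.
per_array = [sorted(set(x) & set(y)) for x, y in zip([a, a, a], [b, c, d])]
# per_array == [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
```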

arkouda.setdiff1d(pda1: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], pda2: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], assume_unique: bool = False) → pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical]

Find the set difference of two arrays.

Return the sorted, unique values in pda1 that are not in pda2.

Parameters:
  • pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects

  • pda2 (pdarray/List) – Input array/sequence of groupable objects

  • assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.

Returns:

Sorted 1D array/List of sorted pdarrays of values in pda1 that are not in pda2.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either pda1 or pda2 is not a pdarray

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setdiff1d is not supported for bool or float64 pdarrays

Examples

>>> a = ak.array([1, 2, 3, 2, 4, 1])
>>> b = ak.array([3, 4, 5, 6])
>>> ak.setdiff1d(a, b)
array([1, 2])
>>> # Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setdiff1d(multia, multib)
[array([2, 4, 5]), array([2, 4, 5]), array([2, 4, 5])]
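
Unlike union and intersection, set difference depends on argument order. A minimal pure-Python model of the 1D case makes this visible; setdiff is a hypothetical helper built on Python sets, not arkouda's implementation.

```python
def setdiff(values_a, values_b):
    """Pure-Python model of a 1D set difference (hypothetical helper)."""
    return sorted(set(values_a) - set(values_b))


a = [1, 2, 3, 2, 4, 1]
b = [3, 4, 5, 6]

a_minus_b = setdiff(a, b)  # [1, 2]
b_minus_a = setdiff(b, a)  # [5, 6]
```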

arkouda.setxor1d(pda1: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], pda2: pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical], assume_unique: bool = False) → pdarray | Strings | Categorical | Sequence[pdarray | Strings | Categorical]

Find the set exclusive-or (symmetric difference) of two arrays.

Return the sorted, unique values that are in only one (not both) of the input arrays.

Parameters:
  • pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects

  • pda2 (pdarray/List) – Input array/sequence of groupable objects

  • assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.

Returns:

Sorted 1D array/List of sorted pdarrays of unique values that are in only one of the input arrays.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either pda1 or pda2 is not a pdarray

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setxor1d is not supported for bool or float64 pdarrays

Examples

>>> a = ak.array([1, 2, 3, 2, 4])
>>> b = ak.array([2, 3, 5, 7, 5])
>>> ak.setxor1d(a, b)
array([1, 4, 5, 7])
>>> # Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setxor1d(multia, multib)
[array([2, 2, 4, 4, 5, 5]), array([2, 5, 2, 4, 4, 5]), array([2, 4, 5, 4, 2, 5])]
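
The symmetric difference can be expressed through the other set operations: it is both the union of the two one-sided differences and the union minus the intersection. The following pure-Python sketch checks this on the 1D example; setxor is a hypothetical helper built on Python sets, not arkouda's implementation.

```python
def setxor(values_a, values_b):
    """Pure-Python model of a 1D symmetric difference (hypothetical helper)."""
    set_a, set_b = set(values_a), set(values_b)
    # Values only in a, together with values only in b.
    return sorted((set_a - set_b) | (set_b - set_a))


a = [1, 2, 3, 2, 4]
b = [2, 3, 5, 7, 5]

xor = setxor(a, b)  # [1, 4, 5, 7]

# Equivalent formulation: union minus intersection.
union_minus_intersection = sorted((set(a) | set(b)) - (set(a) & set(b)))
```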