Array Set Operations

Following numpy.lib.arraysetops, arkouda supports parallel, distributed set operations using pdarray objects.

The unique function effectively converts a pdarray to a set:

arkouda.unique(pda, return_groups=False, assume_sorted=False, return_indices=False)

Find the unique elements of an array.

Returns the unique elements of an array, sorted if the values are integers. In addition to the unique elements, grouping information for the input array can optionally be returned (see return_groups).

Parameters:
  • pda ((list of) pdarray, Strings, or Categorical) – Input array.

  • return_groups (bool, optional) – If True, also return grouping information for the array.

  • assume_sorted (bool, optional) – If True, assume pda is sorted and skip the sorting step.

  • return_indices (bool, optional) – Only applicable if return_groups is True. If True, return the unique key indices along with the other grouping information.

Return type:

Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]], Tuple[Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]], pdarray, pdarray, int]]

Returns:

  • unique ((list of) pdarray, Strings, or Categorical) – The unique values. If input dtype is int64, return values will be sorted.

  • permutation (pdarray, optional) – Permutation that groups equivalent values together (only when return_groups=True)

  • segments (pdarray, optional) – The offset of each group in the permuted array (only when return_groups=True)

Raises:
  • TypeError – Raised if pda is not a pdarray or Strings object

  • RuntimeError – Raised if the pdarray or Strings dtype is unsupported

Notes

For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.
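
How the unique values, the permutation, and the segments relate when return_groups=True can be modeled in plain Python. This is only a sketch of the semantics described above, not arkouda's implementation: the helper name unique_with_groups is hypothetical, it works on ordinary lists, and the permutation arkouda returns may order equal values differently.

```python
def unique_with_groups(values):
    """Plain-Python model of unique(values, return_groups=True)."""
    # Stable argsort: permutation[i] is the position in `values` of the
    # i-th element once equal values have been grouped together.
    permutation = sorted(range(len(values)), key=lambda i: values[i])
    permuted = [values[i] for i in permutation]
    # A group starts at offset 0 and wherever the permuted value changes.
    segments = [
        i for i in range(len(permuted))
        if i == 0 or permuted[i] != permuted[i - 1]
    ]
    unique = [permuted[s] for s in segments]
    return unique, permutation, segments


values = [3, 2, 1, 1, 2, 3]
unique, permutation, segments = unique_with_groups(values)
print(unique)       # [1, 2, 3]
print(permutation)  # [2, 3, 1, 4, 0, 5]
print(segments)     # [0, 2, 4]
```

In this model, group k occupies positions segments[k] up to (but not including) segments[k + 1] of the permuted array.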

Examples

>>> A = ak.array([3, 2, 1, 1, 2, 3])
>>> ak.unique(A)
array([1, 2, 3])

arkouda.in1d(A, B, assume_unique=False, symmetric=False, invert=False)

Test whether each element of a 1-D array is also present in a second array.

Returns a boolean array the same length as A that is True where an element of A is in B and False otherwise.

Supports multi-level inputs, i.e. testing whether rows of A are in the set of rows of B when A and B are each given as a list of pdarrays. Note, however, that multi-dimensional pdarrays are not supported.
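
The multi-level case can be pictured as membership testing on row tuples. The following is a minimal plain-Python sketch of that idea, assuming each input is a list of equal-length columns; in1d_rows is a hypothetical helper, not part of arkouda.

```python
def in1d_rows(a_columns, b_columns):
    """Model of multi-level in1d: each input is a list of equal-length columns."""
    # zip(*columns) turns parallel columns into row tuples.
    b_rows = set(zip(*b_columns))
    return [row in b_rows for row in zip(*a_columns)]


a_columns = [[1, 2, 3], [10, 20, 30]]  # rows: (1, 10), (2, 20), (3, 30)
b_columns = [[1, 3], [10, 99]]         # rows: (1, 10), (3, 99)
result = in1d_rows(a_columns, b_columns)
print(result)  # [True, False, False]
```

A row matches only if every level matches: (3, 30) is not found even though 3 appears in the first column of the second input.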

Parameters:
  • A (list of pdarrays, pdarray, Strings, or Categorical) – Entries will be tested for membership in B

  • B (list of pdarrays, pdarray, Strings, or Categorical) – The set of elements in which to test membership

  • assume_unique (bool, optional, defaults to False) – If True, assume rows of A and B are each unique and sorted. By default, they are sorted and deduplicated explicitly.

  • symmetric (bool, optional, defaults to False) – If True, return both in1d(A, B) and in1d(B, A) when A and B are single items.

  • invert (bool, optional, defaults to False) – If True, the values in the returned array are inverted (that is, False where an element of A is in B and True otherwise). ak.in1d(a, b, invert=True) is equivalent to (but is faster than) ~ak.in1d(a, b).

Returns:

True for each row in A that is contained in B

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if either A or B is not a pdarray, Strings, or Categorical object, or if both are pdarrays and either has rank > 1, or if invert is not a bool

  • RuntimeError – Raised if the dtype of either array is not supported

Examples

>>> ak.in1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([False True False])
>>> ak.in1d(ak.array(['one','two']),ak.array(['two', 'three','four','five']))
array([False True])

Notes

in1d can be considered an element-wise function version of the Python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a.

ak.in1d is not supported for bool or float64 pdarrays.
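
The element-wise equivalence and the effect of invert can be checked with a small plain-Python model. This is a sketch on ordinary lists, assuming hashable items; in1d_model is a hypothetical name and says nothing about how arkouda computes the result.

```python
def in1d_model(a, b, invert=False):
    """Model of in1d for 1-D sequences of hashable items."""
    members = set(b)  # build the lookup set once
    # `!= invert` flips every answer when invert is True.
    return [(item in members) != invert for item in a]


a = [-1, 0, 1]
b = [-2, 0, 2]
mask = in1d_model(a, b)
inverted = in1d_model(a, b, invert=True)
print(mask)      # [False, True, False]
print(inverted)  # [True, False, True]
```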

arkouda.union1d(A, B)

Find the union of two arrays or of two lists of arrays.

Return the unique, sorted array of values that are in either of the two input arrays.

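Parameters:
  • A (pdarray/groupable) – Input array or list of input arrays

  • B (pdarray/groupable) – Input array or list of input arrays
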
Returns:

Unique, sorted union of the input arrays.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a groupable

  • RuntimeError – Raised if the dtype of either input is not supported

Examples

1D Example

>>> ak.union1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([-2 -1 0 1 2])

Multi-Array Example

>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.union1d(multia, multib)
[array([1 2 2 3 4 4 5 5]), array([1 2 5 3 2 4 4 5]), array([1 2 4 3 5 4 2 5])]
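
The individual output arrays above are not sorted on their own because the multi-array case operates on rows. A plain-Python sketch of that reading, assuming rows are formed position by position and ordered lexicographically (union_rows is a hypothetical helper, not arkouda's implementation), reproduces the output shown:

```python
def union_rows(a_columns, b_columns):
    """Model of multi-array union1d: union of row tuples, sorted by row."""
    rows = sorted(set(zip(*a_columns)) | set(zip(*b_columns)))
    # Transpose the sorted rows back into one list per column.
    return [list(column) for column in zip(*rows)]


a = [1, 2, 3, 4, 5]
b = [1, 5, 3, 4, 2]
c = [1, 4, 3, 2, 5]
d = [1, 2, 3, 5, 4]
result = union_rows([a, a, a], [b, c, d])
for column in result:
    print(column)
# [1, 2, 2, 3, 4, 4, 5, 5]
# [1, 2, 5, 3, 2, 4, 4, 5]
# [1, 2, 4, 3, 5, 4, 2, 5]
```

Reading the three columns position by position gives the rows (1, 1, 1), (2, 2, 2), (2, 5, 4), (3, 3, 3), and so on, which are in sorted order.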

arkouda.intersect1d(A, B, assume_unique=False)

Find the intersection of two arrays.

Return the sorted, unique values that are in both of the input arrays.

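Parameters:
  • A (pdarray/groupable) – Input array or list of input arrays

  • B (pdarray/groupable) – Input array or list of input arrays

  • assume_unique (bool, optional, defaults to False) – If True, the input arrays are both assumed to be unique, which can speed up the calculation
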
Returns:

Sorted 1D array/List of sorted pdarrays of common and unique elements.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a groupable

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Examples

1D Example

>>> ak.intersect1d(ak.array([1, 3, 4, 3]), ak.array([3, 1, 2, 1]))
array([1 3])

Multi-Array Example

>>> a = ak.arange(5)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.intersect1d(multia, multib)
[array([1 3]), array([1 3]), array([1 3])]

arkouda.setdiff1d(A, B, assume_unique=False)

Find the set difference of two arrays.

Return the sorted, unique values in A that are not in B.

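Parameters:
  • A (pdarray/groupable) – Input array or list of input arrays

  • B (pdarray/groupable) – Input array or list of input arrays whose values are removed from A

  • assume_unique (bool, optional, defaults to False) – If True, the input arrays are both assumed to be unique, which can speed up the calculation
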
Returns:

Sorted 1D array/List of sorted pdarrays of values in A that are not in B.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a pdarray

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setdiff1d is not supported for bool pdarrays.

Examples

1D Example

>>> a = ak.array([1, 2, 3, 2, 4, 1])
>>> b = ak.array([3, 4, 5, 6])
>>> ak.setdiff1d(a, b)
array([1 2])

Multi-Array Example

>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setdiff1d(multia, multib)
[array([2 4 5]), array([2 4 5]), array([2 4 5])]

arkouda.setxor1d(A, B, assume_unique=False)

Find the set exclusive-or (symmetric difference) of two arrays.

Return the sorted, unique values that are in only one (not both) of the input arrays.

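Parameters:
  • A (pdarray/groupable) – Input array or list of input arrays

  • B (pdarray/groupable) – Input array or list of input arrays

  • assume_unique (bool, optional, defaults to False) – If True, the input arrays are both assumed to be unique, which can speed up the calculation
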
Returns:

Sorted 1D array/List of sorted pdarrays of unique values that are in only one of the input arrays.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a groupable

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Examples

1D Example

>>> a = ak.array([1, 2, 3, 2, 4])
>>> b = ak.array([2, 3, 5, 7, 5])
>>> ak.setxor1d(a,b)
array([1 4 5 7])

Multi-Array Example

>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setxor1d(multia, multib)
[array([2 2 4 4 5 5]), array([2 5 2 4 4 5]), array([2 4 5 4 2 5])]
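
The symmetric difference is related to the other set operations by two standard identities: it equals the union of the two one-sided differences, and also the union minus the intersection. The sketch below checks both on the 1D inputs used above, with Python's built-in set type standing in for pdarrays; the function names are illustrative stand-ins, not arkouda calls.

```python
def union(a, b):
    return sorted(set(a) | set(b))

def intersect(a, b):
    return sorted(set(a) & set(b))

def setdiff(a, b):
    return sorted(set(a) - set(b))

def setxor(a, b):
    return sorted(set(a) ^ set(b))


a = [1, 2, 3, 2, 4]
b = [2, 3, 5, 7, 5]

xor = setxor(a, b)
# Identity 1: values only in a, together with values only in b.
via_diffs = union(setdiff(a, b), setdiff(b, a))
# Identity 2: everything in either input, minus what is in both.
via_union = setdiff(union(a, b), intersect(a, b))

print(xor)        # [1, 4, 5, 7]
print(via_diffs)  # [1, 4, 5, 7]
print(via_union)  # [1, 4, 5, 7]
```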