Array Set Operations
Following numpy.lib.arraysetops, arkouda supports parallel, distributed set operations using pdarray objects.
The unique function effectively converts a pdarray to a set:
- arkouda.unique(pda, return_groups=False, assume_sorted=False, return_indices=False)
Find the unique elements of an array.
Returns the unique elements of an array, sorted if the values are integers. If return_groups is True, grouping information (a permutation and segment offsets) is returned in addition to the unique elements.
- Parameters:
pda ((list of) pdarray, Strings, or Categorical) – Input array.
return_groups (bool, optional) – If True, also return grouping information for the array.
assume_sorted (bool, optional) – If True, assume pda is sorted and skip the sorting step.
return_indices (bool, optional) – Only applicable if return_groups is True. If True, also return the unique key indices along with the other grouping outputs.
- Return type:
Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]], Tuple[Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]], pdarray, pdarray, int]]
- Returns:
unique ((list of) pdarray, Strings, or Categorical) – The unique values. If input dtype is int64, return values will be sorted.
permutation (pdarray, optional) – Permutation that groups equivalent values together (only when return_groups=True)
segments (pdarray, optional) – The offset of each group in the permuted array (only when return_groups=True)
- Raises:
TypeError – Raised if pda is not a pdarray or Strings object
RuntimeError – Raised if the pdarray or Strings dtype is unsupported
Notes
For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.
Examples
>>> A = ak.array([3, 2, 1, 1, 2, 3])
>>> ak.unique(A)
array([1, 2, 3])
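The permutation and segments outputs described under Returns can be easier to picture with a small model. The following is a plain-Python sketch of what those outputs mean, not arkouda's implementation; the helper name unique_with_groups is hypothetical, and the order of indices within a group in arkouda's actual permutation may differ.

```python
def unique_with_groups(values):
    """Plain-Python model of the outputs of unique(..., return_groups=True)."""
    # Stable argsort: a permutation that places equal values next to each other.
    permutation = sorted(range(len(values)), key=lambda i: values[i])
    unique_values = []
    segments = []
    for offset, idx in enumerate(permutation):
        if not unique_values or values[idx] != unique_values[-1]:
            unique_values.append(values[idx])
            segments.append(offset)  # start of a new group in the permuted order
    return unique_values, permutation, segments


A = [3, 2, 1, 1, 2, 3]
uniq, perm, segs = unique_with_groups(A)
print(uniq)                  # [1, 2, 3]
print([A[i] for i in perm])  # [1, 1, 2, 2, 3, 3]
print(segs)                  # [0, 2, 4]
```

Each entry of segs is the offset in the permuted array where the group for the corresponding unique value begins.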
- arkouda.in1d(A, B, assume_unique=False, symmetric=False, invert=False)
Test whether each element of a 1-D array is also present in a second array.
Returns a boolean array the same length as A that is True where an element of A is in B and False otherwise.
Supports multi-level input, i.e., testing whether rows of A are in the set of rows of B when each is given as a list of pdarrays. Note that multi-dimensional pdarrays are not supported.
- Parameters:
A (list of pdarrays, pdarray, Strings, or Categorical) – Entries will be tested for membership in B
B (list of pdarrays, pdarray, Strings, or Categorical) – The set of elements in which to test membership
assume_unique (bool, optional, defaults to False) – If True, assume the rows of A and B are each unique and sorted. By default, they are sorted and de-duplicated explicitly.
symmetric (bool, optional, defaults to False) – If True, return in1d(A, B), in1d(B, A) when A and B are single items.
invert (bool, optional, defaults to False) – If True, the values in the returned array are inverted (that is, False where an element of A is in B and True otherwise). ak.in1d(a, b, invert=True) is equivalent to (but faster than) ~ak.in1d(a, b).
- Returns:
True for each row in a that is contained in b
- Return type:
pdarray, bool
- Raises:
TypeError – Raised if either A or B is not a pdarray, Strings, or Categorical object, or if both are pdarrays and either has rank > 1, or if invert is not a bool
RuntimeError – Raised if the dtype of either array is not supported
Examples
>>> ak.in1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([False True False])
>>> ak.in1d(ak.array(['one','two']),ak.array(['two', 'three','four','five']))
array([False True])
Notes
in1d can be considered an element-wise function version of the Python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a.
ak.in1d is not supported for bool or float64 pdarrays.
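The logical equivalence stated in the notes can be written out in plain Python. This is only a reference model of the semantics (including invert and the multi-level, row-wise case), under the assumption that a row is the tuple formed by taking one element from each column; in1d_reference is a hypothetical helper, not part of arkouda.

```python
def in1d_reference(a, b, invert=False):
    """Element-wise membership test modeled with a Python set."""
    members = set(b)
    # Comparing against `invert` flips each result when invert is True.
    return [(item in members) != invert for item in a]


print(in1d_reference([-1, 0, 1], [-2, 0, 2]))               # [False, True, False]
print(in1d_reference([-1, 0, 1], [-2, 0, 2], invert=True))  # [True, False, True]

# Multi-level input: zip parallel columns into row tuples, then test rows.
rows_a = list(zip([1, 2, 3], [10, 20, 30]))
rows_b = list(zip([3, 1], [30, 99]))
print(in1d_reference(rows_a, rows_b))                       # [False, False, True]
```

The row (1, 10) is not a member even though 1 appears in the first column of rows_b, because membership is decided on whole rows.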
- arkouda.union1d(A, B)
Find the union of two arrays or two lists of arrays.
Return the unique, sorted array of values that are in either of the two input arrays.
- Parameters:
A (list of pdarrays, pdarray, Strings, or Categorical)
B (list of pdarrays, pdarray, Strings, or Categorical)
- Returns:
Unique, sorted union of the input arrays.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either A or B is not a groupable
RuntimeError – Raised if the dtype of either input is not supported
Examples
1D Example
>>> ak.union1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([-2 -1 0 1 2])
Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.union1d(multia, multib)
[array([1 2 2 3 4 4 5 5]), array([1 2 5 3 2 4 4 5]), array([1 2 4 3 5 4 2 5])]
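The multi-array output is easier to read once each position across the input lists is viewed as one row. The sketch below models that interpretation in plain Python, assuming rows are compared as whole tuples and the result is ordered lexicographically; union_rows is a hypothetical helper and does not reflect how arkouda computes the result.

```python
def union_rows(multi_a, multi_b):
    """Model a multi-array union: each row is one tuple across the columns."""
    rows = set(zip(*multi_a)) | set(zip(*multi_b))
    ordered = sorted(rows)  # lexicographic order over row tuples
    # Transpose back to one list per column.
    return [list(col) for col in zip(*ordered)]


a = [1, 2, 3, 4, 5]
b = [1, 5, 3, 4, 2]
c = [1, 4, 3, 2, 5]
d = [1, 2, 3, 5, 4]
result = union_rows([a, a, a], [b, c, d])
for column in result:
    print(column)
# [1, 2, 2, 3, 4, 4, 5, 5]
# [1, 2, 5, 3, 2, 4, 4, 5]
# [1, 2, 4, 3, 5, 4, 2, 5]
```

Reading the three columns downward gives the eight distinct rows, for example (2, 2, 2) from the first input and (2, 5, 4) from the second.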
- arkouda.intersect1d(A, B, assume_unique=False)
Find the intersection of two arrays.
Return the sorted, unique values that are in both of the input arrays.
- Parameters:
A (list of pdarrays, pdarray, Strings, or Categorical)
B (list of pdarrays, pdarray, Strings, or Categorical)
assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
- Returns:
Sorted 1D array/List of sorted pdarrays of common and unique elements.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either A or B is not a groupable
RuntimeError – Raised if the dtype of either pdarray is not supported
Examples
1D Example
>>> ak.intersect1d(ak.array([1, 3, 4, 3]), ak.array([3, 1, 2, 1]))
array([1 3])
Multi-Array Example
>>> a = ak.arange(5)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.intersect1d(multia, multib)
[array([1 3]), array([1 3]), array([1 3])]
- arkouda.setdiff1d(A, B, assume_unique=False)
Find the set difference of two arrays.
Return the sorted, unique values in A that are not in B.
- Parameters:
A (list of pdarrays, pdarray, Strings, or Categorical)
B (list of pdarrays, pdarray, Strings, or Categorical)
assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
- Returns:
Sorted 1D array/List of sorted pdarrays of values in A that are not in B.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either A or B is not a pdarray
RuntimeError – Raised if the dtype of either pdarray is not supported
Notes
ak.setdiff1d is not supported for bool pdarrays
Examples
>>> a = ak.array([1, 2, 3, 2, 4, 1])
>>> b = ak.array([3, 4, 5, 6])
>>> ak.setdiff1d(a, b)
array([1 2])
Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setdiff1d(multia, multib)
[array([2 4 5]), array([2 4 5]), array([2 4 5])]
- arkouda.setxor1d(A, B, assume_unique=False)
Find the set exclusive-or (symmetric difference) of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
- Parameters:
A (list of pdarrays, pdarray, Strings, or Categorical)
B (list of pdarrays, pdarray, Strings, or Categorical)
assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
- Returns:
Sorted 1D array/List of sorted pdarrays of unique values that are in only one of the input arrays.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either A or B is not a groupable
RuntimeError – Raised if the dtype of either pdarray is not supported
Examples
>>> a = ak.array([1, 2, 3, 2, 4])
>>> b = ak.array([2, 3, 5, 7, 5])
>>> ak.setxor1d(a,b)
array([1 4 5 7])
Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setxor1d(multia, multib)
[array([2 2 4 4 5 5]), array([2 5 2 4 4 5]), array([2 4 5 4 2 5])]
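As a cross-check on the 1-D example, the symmetric difference can be modeled with Python's built-in sets and related to the other set operations on this page. This is a reference sketch of the semantics only; setxor_reference is a hypothetical helper, not an arkouda function.

```python
def setxor_reference(a, b):
    """Sorted values that appear in exactly one of the two inputs."""
    return sorted(set(a) ^ set(b))


a = [1, 2, 3, 2, 4]
b = [2, 3, 5, 7, 5]
xor = setxor_reference(a, b)
print(xor)  # [1, 4, 5, 7]

# The same result expressed through union, intersection, and difference.
union = set(a) | set(b)
common = set(a) & set(b)
via_union = sorted(union - common)
via_diffs = sorted((set(a) - set(b)) | (set(b) - set(a)))
print(via_union == xor, via_diffs == xor)  # True True
```

Both identities hold for any pair of inputs, so either form can serve as a sanity check when comparing results of the individual operations.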