Array Set Operations
Following numpy.lib.arraysetops, arkouda supports parallel, distributed set operations using pdarray objects.
The unique function effectively converts a pdarray to a set:
- arkouda.unique(pda, return_groups=False, assume_sorted=False, return_indices=False)
Find the unique elements of an array.
Returns the unique elements of an array, sorted if the values are integers. There is an optional output in addition to the unique elements: the number of times each unique value comes up in the input array.
- Parameters:
pda ((list of) pdarray, Strings, or Categorical) – Input array.
return_groups (bool, optional) – If True, also return grouping information for the array.
assume_sorted (bool, optional) – If True, assume pda is sorted and skip the sorting step.
return_indices (bool, optional) – Only applicable if return_groups is True. If True, return unique key indices along with other groups
- Return type:
Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]], Tuple[Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]], pdarray, pdarray, int]]
- Returns:
unique ((list of) pdarray, Strings, or Categorical) – The unique values. If input dtype is int64, return values will be sorted.
permutation (pdarray, optional) – Permutation that groups equivalent values together (only when return_groups=True)
segments (pdarray, optional) – The offset of each group in the permuted array (only when return_groups=True)
- Raises:
TypeError – Raised if pda is not a pdarray or Strings object
RuntimeError – Raised if the pdarray or Strings dtype is unsupported
Notes
For integer arrays, this function checks to see whether pda is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort pda.
Examples
>>> A = ak.array([3, 2, 1, 1, 2, 3])
>>> ak.unique(A)
array([1, 2, 3])
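The relationship between the three outputs documented for return_groups=True (unique values, permutation, segments) can be modeled in plain Python. This is only a sketch of the semantics, not arkouda's implementation: the helper name unique_with_groups is hypothetical, it works on Python lists rather than pdarrays, and the order of indices inside each group in arkouda's actual permutation may differ.

```python
def unique_with_groups(values):
    """Pure-Python model of unique values plus grouping information.

    Hypothetical helper for illustration; not part of arkouda.
    """
    # Stable argsort: indices that reorder `values` so equal items are adjacent.
    permutation = sorted(range(len(values)), key=lambda i: values[i])
    unique_vals = []
    segments = []
    for pos, idx in enumerate(permutation):
        # A new group starts wherever the permuted value changes.
        if not unique_vals or values[idx] != unique_vals[-1]:
            unique_vals.append(values[idx])
            segments.append(pos)
    return unique_vals, permutation, segments


A = [3, 2, 1, 1, 2, 3]
uniq, perm, segs = unique_with_groups(A)
print(uniq)  # [1, 2, 3]
print(perm)  # [2, 3, 1, 4, 0, 5]
print(segs)  # [0, 2, 4]
```

Under this model, the members of group k are the permuted values from offset segs[k] up to (but not including) segs[k + 1], or to the end of the array for the last group.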
- arkouda.in1d(pda1, pda2, assume_unique=False, symmetric=False, invert=False)
Test whether each element of a 1-D array is also present in a second array.
Returns a boolean array the same length as pda1 that is True where an element of pda1 is in pda2 and False otherwise.
Supports multi-level inputs: when given lists of arrays, tests membership of the rows of pda1 in the set of rows of pda2.
- Parameters:
pda1 (list of pdarrays, pdarray, Strings, or Categorical) – Rows are elements for which to test membership in pda2.
pda2 (list of pdarrays, pdarray, Strings, or Categorical) – Rows are elements of the set in which to test membership.
assume_unique (bool) – If True, assume the rows of pda1 and pda2 are each unique and sorted. By default, they are sorted and deduplicated explicitly.
symmetric (bool) – If True, return both in1d(pda1, pda2) and in1d(pda2, pda1) when pda1 and pda2 are single items.
invert (bool, optional) – If True, the values in the returned array are inverted (that is, False where an element of pda1 is in pda2 and True otherwise). Default is False. ak.in1d(pda1, pda2, invert=True) is equivalent to (but is faster than) ~ak.in1d(pda1, pda2).
- Return type:
Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]]
- Returns:
pdarray, bool – True for each row in pda1 that is contained in pda2.
Notes
Only works for pdarrays of int64 or float64 dtype, Strings, or Categorical.
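The multi-level behaviour described above, where each row across a list of arrays is treated as one element, can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not arkouda code: in1d_rows is a hypothetical helper, the inputs are Python lists standing in for lists of pdarrays, and only the invert flag is modeled.

```python
def in1d_rows(a_cols, b_cols, invert=False):
    """Model of multi-level membership (hypothetical helper, not arkouda).

    Each argument is a list of equal-length columns; row i is the tuple
    formed by taking element i from every column.
    """
    a_rows = list(zip(*a_cols))
    b_set = set(zip(*b_cols))
    # `!= invert` flips every result when invert is True.
    return [(row in b_set) != invert for row in a_rows]


a = [[1, 2, 3], [10, 20, 30]]   # rows: (1, 10), (2, 20), (3, 30)
b = [[3, 1, 9], [30, 99, 10]]   # rows: (3, 30), (1, 99), (9, 10)

mask = in1d_rows(a, b)
inverted = in1d_rows(a, b, invert=True)
print(mask)      # [False, False, True]
print(inverted)  # [True, True, False]
```

Note that (1, 10) is not a member even though 1 and 10 each appear somewhere in b: membership is decided per row, not per column.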
- arkouda.union1d(pda1, pda2)
Find the union of two arrays or two lists of arrays.
Return the unique, sorted array of values that are in either of the two input arrays.
- Parameters:
pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects
pda2 (pdarray/List) – Input array/sequence of groupable objects
- Returns:
Unique, sorted union of the input arrays.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either pda1 or pda2 is not a pdarray
RuntimeError – Raised if the dtype of either array is not supported
Notes
ak.union1d is not supported for bool or float64 pdarrays
Examples
>>> # 1D Example
>>> ak.union1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([-2, -1, 0, 1, 2])
>>> # Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.union1d(multia, multib)
[array([1, 2, 2, 3, 4, 4, 5, 5]), array([1, 2, 5, 3, 2, 4, 4, 5]), array([1, 2, 4, 3, 5, 4, 2, 5])]
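The multi-array result is easier to read once each position across the input arrays is viewed as a single row. The following pure-Python sketch models that interpretation with the same values as the example; union_rows is a hypothetical helper operating on Python lists, and it is not how arkouda computes the result.

```python
def union_rows(cols_a, cols_b):
    """Row-wise union of two lists of columns (hypothetical, not arkouda)."""
    # Turn columns into row tuples, take the set union, sort lexicographically.
    rows = sorted(set(zip(*cols_a)) | set(zip(*cols_b)))
    # Transpose the sorted rows back into one list per column.
    return [list(col) for col in zip(*rows)]


a = list(range(1, 6))
b = [1, 5, 3, 4, 2]
c = [1, 4, 3, 2, 5]
d = [1, 2, 3, 5, 4]

result = union_rows([a, a, a], [b, c, d])
for column in result:
    print(column)
# [1, 2, 2, 3, 4, 4, 5, 5]
# [1, 2, 5, 3, 2, 4, 4, 5]
# [1, 2, 4, 3, 5, 4, 2, 5]
```

Reading the three columns position by position gives the eight distinct rows (1, 1, 1), (2, 2, 2), (2, 5, 4), (3, 3, 3), (4, 2, 5), (4, 4, 4), (5, 4, 2), and (5, 5, 5), which is why the individual columns are not themselves sorted or unique.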
- arkouda.intersect1d(pda1, pda2, assume_unique=False)
Find the intersection of two arrays.
Return the sorted, unique values that are in both of the input arrays.
- Parameters:
pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects
pda2 (pdarray/List) – Input array/sequence of groupable objects
assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
- Returns:
Sorted 1D array/List of sorted pdarrays of common and unique elements.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either pda1 or pda2 is not a pdarray
RuntimeError – Raised if the dtype of either pdarray is not supported
Notes
ak.intersect1d is not supported for bool or float64 pdarrays
Examples
>>> # 1D Example
>>> ak.intersect1d(ak.array([1, 3, 4, 3]), ak.array([3, 1, 2, 1]))
array([1, 3])
>>> # Multi-Array Example
>>> a = ak.arange(5)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.intersect1d(multia, multib)
[array([1, 3]), array([1, 3]), array([1, 3])]
- arkouda.setdiff1d(pda1, pda2, assume_unique=False)
Find the set difference of two arrays.
Return the sorted, unique values in pda1 that are not in pda2.
- Parameters:
pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects
pda2 (pdarray/List) – Input array/sequence of groupable objects
assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
- Returns:
Sorted 1D array/List of sorted pdarrays of values in pda1 that are not in pda2.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either pda1 or pda2 is not a pdarray
RuntimeError – Raised if the dtype of either pdarray is not supported
Notes
ak.setdiff1d is not supported for bool or float64 pdarrays
Examples
>>> # 1D Example
>>> a = ak.array([1, 2, 3, 2, 4, 1])
>>> b = ak.array([3, 4, 5, 6])
>>> ak.setdiff1d(a, b)
array([1, 2])
>>> # Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setdiff1d(multia, multib)
[array([2, 4, 5]), array([2, 4, 5]), array([2, 4, 5])]
- arkouda.setxor1d(pda1, pda2, assume_unique=False)
Find the set exclusive-or (symmetric difference) of two arrays.
Return the sorted, unique values that are in only one (not both) of the input arrays.
- Parameters:
pda1 (pdarray/Sequence[pdarray, Strings, Categorical]) – Input array/Sequence of groupable objects
pda2 (pdarray/List) – Input array/sequence of groupable objects
assume_unique (bool) – If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False.
- Returns:
Sorted 1D array/List of sorted pdarrays of unique values that are in only one of the input arrays.
- Return type:
pdarray/groupable
- Raises:
TypeError – Raised if either pda1 or pda2 is not a pdarray
RuntimeError – Raised if the dtype of either pdarray is not supported
Notes
ak.setxor1d is not supported for bool or float64 pdarrays
Examples
>>> # 1D Example
>>> a = ak.array([1, 2, 3, 2, 4])
>>> b = ak.array([2, 3, 5, 7, 5])
>>> ak.setxor1d(a, b)
array([1, 4, 5, 7])
>>> # Multi-Array Example
>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setxor1d(multia, multib)
[array([2, 2, 4, 4, 5, 5]), array([2, 5, 2, 4, 4, 5]), array([2, 4, 5, 4, 2, 5])]
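The symmetric difference is the union of the two one-sided set differences, and that identity reproduces both results in the example. The sketch below is a pure-Python model with hypothetical helper names (setdiff_rows, setxor_rows) working on Python lists; it illustrates the semantics only and makes no claim about arkouda's internal algorithm.

```python
def setdiff_rows(cols_a, cols_b):
    """Rows present in cols_a but not in cols_b (hypothetical helper)."""
    return set(zip(*cols_a)) - set(zip(*cols_b))


def setxor_rows(cols_a, cols_b):
    """Rows present in exactly one input, sorted, returned column-wise."""
    rows = sorted(setdiff_rows(cols_a, cols_b) | setdiff_rows(cols_b, cols_a))
    return [list(col) for col in zip(*rows)]


# 1D case: a single array is modeled as a one-column input.
xor_1d = setxor_rows([[1, 2, 3, 2, 4]], [[2, 3, 5, 7, 5]])
print(xor_1d)  # [[1, 4, 5, 7]]

# Multi-array case, same values as the example.
a = list(range(1, 6))
b = [1, 5, 3, 4, 2]
c = [1, 4, 3, 2, 5]
d = [1, 2, 3, 5, 4]
xor_multi = setxor_rows([a, a, a], [b, c, d])
for column in xor_multi:
    print(column)
# [2, 2, 4, 4, 5, 5]
# [2, 5, 2, 4, 4, 5]
# [2, 4, 5, 4, 2, 5]
```

The rows (1, 1, 1) and (3, 3, 3) occur in both inputs, so they are the only ones dropped from the result.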