arkouda.groupbyclass ==================== .. py:module:: arkouda.groupbyclass Classes ------- .. autoapisummary:: arkouda.groupbyclass.GROUPBY_REDUCTION_TYPES arkouda.groupbyclass.GroupBy Functions --------- .. autoapisummary:: arkouda.groupbyclass.broadcast arkouda.groupbyclass.unique Module Contents --------------- .. py:class:: GROUPBY_REDUCTION_TYPES frozenset() -> empty frozenset object frozenset(iterable) -> frozenset object Build an immutable unordered collection of unique elements. .. py:method:: copy(*args, **kwargs) Return a shallow copy of a set. .. py:method:: difference(*args, **kwargs) Return the difference of two or more sets as a new set. (i.e. all elements that are in this set but not the others.) .. py:method:: intersection(*args, **kwargs) Return the intersection of two sets as a new set. (i.e. all elements that are in both sets.) .. py:method:: isdisjoint(*args, **kwargs) Return True if two sets have a null intersection. .. py:method:: issubset(*args, **kwargs) Report whether another set contains this set. .. py:method:: issuperset(*args, **kwargs) Report whether this set contains another set. .. py:method:: symmetric_difference(*args, **kwargs) Return the symmetric difference of two sets as a new set. (i.e. all elements that are in exactly one of the sets.) .. py:method:: union(*args, **kwargs) Return the union of sets as a new set. (i.e. all elements that are in either set.) .. py:class:: GroupBy Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array. :param keys: The array to group by value, or if list, the column arrays to group by row :type keys: (list of) pdarray, Strings, or Categorical :param assume_sorted: If True, assume keys is already sorted (Default: False) :type assume_sorted: bool .. attribute:: nkeys The number of key arrays (columns) :type: int .. attribute:: size The length of the input array(s), i.e. number of rows :type: int .. 
attribute:: permutation The permutation that sorts the keys array(s) by value (row) :type: pdarray .. attribute:: unique_keys The unique values of the keys array(s), in grouped order :type: (list of) pdarray, Strings, or Categorical .. attribute:: ngroups The length of the unique_keys array(s), i.e. number of groups :type: int .. attribute:: segments The start index of each group in the grouped array(s) :type: pdarray .. attribute:: logger Used for all logging operations :type: ArkoudaLogger .. attribute:: dropna If True, and the groupby keys contain NaN values, the NaN values together with the corresponding row will be dropped. Otherwise, the rows corresponding to NaN values will be kept. :type: bool (default=True) :raises TypeError: Raised if keys is a pdarray with a dtype other than int64 .. rubric:: Notes Integral pdarrays, Strings, and Categoricals are natively supported, but float64 and bool arrays are not. For a user-defined class to be groupable, it must inherit from pdarray and define or overload the grouping API: 1) a ._get_grouping_keys() method that returns a list of pdarrays that can be (co)argsorted. 2) (Optional) a .group() method that returns the permutation that groups the array If the input is a single array with a .group() method defined, method 2 will be used; otherwise, method 1 will be used. .. py:method:: AND(values: pdarray) -> Tuple[Union[pdarray, List[Union[pdarray, Strings]]], pdarray] Bitwise AND of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise AND reduction on each group. 
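The per-group bitwise AND described above can be illustrated without a running server. The following is a minimal pure-Python sketch, not Arkouda code: ``grouped_and`` is a hypothetical helper, and returning the unique keys in sorted order is an assumption made for the illustration.

```python
from functools import reduce
from operator import and_


def grouped_and(keys, values):
    """Hypothetical helper: bitwise AND of ``values`` within each key group."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    groups = {}
    for k, v in zip(keys, values):
        # Collect the values that share a key.
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)  # assumption: sorted order for the illustration
    result = [reduce(and_, groups[k]) for k in unique_keys]
    return unique_keys, result


keys = [1, 0, 1, 0, 1]
values = [0b1110, 0b0111, 0b1011, 0b0101, 0b1111]
unique_keys, result = grouped_and(keys, values)
print(unique_keys, result)  # [0, 1] [5, 10]
```

Group 0 reduces ``0b0111 & 0b0101`` to ``0b0101`` and group 1 reduces ``0b1110 & 0b1011 & 0b1111`` to ``0b1010``.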
:param values: The values to group and reduce with AND :type values: pdarray, int64 :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **result** (*pdarray, int64*) -- Bitwise AND of values in segments corresponding to keys :raises TypeError: Raised if the values array is not a pdarray or if the pdarray dtype is not int64 :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if AND is not supported for the values dtype .. py:method:: OR(values: pdarray) -> Tuple[Union[pdarray, List[Union[pdarray, Strings]]], pdarray] Bitwise OR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise OR reduction on each group. :param values: The values to group and reduce with OR :type values: pdarray, int64 :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **result** (*pdarray, int64*) -- Bitwise OR of values in segments corresponding to keys :raises TypeError: Raised if the values array is not a pdarray or if the pdarray dtype is not int64 :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if OR is not supported for the values dtype .. py:method:: Reductions(*args, **kwargs) frozenset() -> empty frozenset object frozenset(iterable) -> frozenset object Build an immutable unordered collection of unique elements. .. py:method:: XOR(values: pdarray) -> Tuple[Union[pdarray, List[Union[pdarray, Strings]]], pdarray] Bitwise XOR of values in each segment. Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise XOR reduction on each group. 
:param values: The values to group and reduce with XOR :type values: pdarray, int64 :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **result** (*pdarray, int64*) -- Bitwise XOR of values in segments corresponding to keys :raises TypeError: Raised if the values array is not a pdarray or if the pdarray dtype is not int64 :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if XOR is not supported for the values dtype .. py:method:: aggregate(values: groupable, operator: str, skipna: bool = True, ddof: int_scalars = 1) -> Tuple[groupable, groupable] Using the permutation stored in the GroupBy instance, group another array of values and apply a reduction to each group's values. :param values: The values to group and reduce :type values: pdarray :param operator: The name of the reduction operator to use :type operator: str :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :param ddof: "Delta Degrees of Freedom" used in calculating std :type ddof: int_scalars :returns: * **unique_keys** (*groupable*) -- The unique keys, in grouped order * **aggregates** (*groupable*) -- One aggregate value per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if the requested operator is not supported for the values dtype .. 
rubric:: Examples >>> keys = ak.arange(0, 10) >>> vals = ak.linspace(-1, 1, 10) >>> g = ak.GroupBy(keys) >>> g.aggregate(vals, 'sum') (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777768, -0.55555555555555536, -0.33333333333333348, -0.11111111111111116, 0.11111111111111116, 0.33333333333333348, 0.55555555555555536, 0.77777777777777768, 1])) >>> g.aggregate(vals, 'min') (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777779, -0.55555555555555558, -0.33333333333333337, -0.11111111111111116, 0.11111111111111116, 0.33333333333333326, 0.55555555555555536, 0.77777777777777768, 1])) .. py:method:: all(values: pdarray) -> Tuple[Union[pdarray, List[Union[pdarray, Strings]]], pdarray] Using the permutation stored in the GroupBy instance, group another array of values and perform an "and" reduction on each group. :param values: The values to group and reduce with "and" :type values: pdarray, bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_all** (*pdarray, bool*) -- One bool per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray or if the pdarray dtype is not bool :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if all is not supported for the values dtype .. py:method:: any(values: pdarray) -> Tuple[Union[pdarray, List[Union[pdarray, Strings]]], pdarray] Using the permutation stored in the GroupBy instance, group another array of values and perform an "or" reduction on each group. 
:param values: The values to group and reduce with "or" :type values: pdarray, bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_any** (*pdarray, bool*) -- One bool per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray or if the pdarray dtype is not bool :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. py:method:: argmax(values: pdarray) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first maximum of each group's values. :param values: The values to group and find argmax :type values: pdarray :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_argmaxima** (*pdarray, int64*) -- One index per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object or if argmax is not supported for the values dtype :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. rubric:: Notes The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance. .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.argmax(b) (array([2, 3, 4]), array([9, 3, 2])) .. py:method:: argmin(values: pdarray) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first minimum of each group's values. 
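To make the "first minimum" and "original index" semantics concrete, here is a minimal pure-Python sketch. It is not Arkouda code: ``grouped_argmin`` is a hypothetical helper, and sorted key order is an assumption. It reuses the data from this method's Examples.

```python
def grouped_argmin(keys, values):
    """Hypothetical helper: index of the first minimum of ``values`` per key."""
    best = {}  # key -> index (into the original arrays) of the first minimum so far
    for i, (k, v) in enumerate(zip(keys, values)):
        # Strict "<" keeps the earliest index when a group has tied minima.
        if k not in best or v < values[best[k]]:
            best[k] = i
    unique_keys = sorted(best)  # assumption: sorted order for the illustration
    return unique_keys, [best[k] for k in unique_keys]


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
print(grouped_argmin(a, b))  # ([2, 3, 4], [5, 4, 2])
```

Each returned index points into ``b`` as passed in, so ``b[5]``, ``b[4]`` and ``b[2]`` are the group minima.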
:param values: The values to group and find argmin :type values: pdarray :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_argminima** (*pdarray, int64*) -- One index per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object or if argmin is not supported for the values dtype :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if argmin is not supported for the values dtype .. rubric:: Notes The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance. .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.argmin(b) (array([2, 3, 4]), array([5, 4, 2])) .. py:method:: attach(user_defined_name: str) -> GroupBy Return the GroupBy object that was registered in the Arkouda server under the given name using register(). :param user_defined_name: User-defined name under which the GroupBy object was registered :type user_defined_name: str :returns: The GroupBy object created by re-attaching to the corresponding server components :rtype: GroupBy :raises RegistrationError: Raised if user_defined_name is not registered .. seealso:: :obj:`register`, :obj:`is_registered`, :obj:`unregister`, :obj:`unregister_groupby_by_name` .. py:method:: broadcast(values: Union[pdarray, Strings], permute: bool = True) -> Union[pdarray, Strings] Fill each group's segment with a constant value. :param values: The values to put in each group's segment :type values: pdarray, Strings :param permute: If True (default), permute broadcast values back to the ordering of the original array on which GroupBy was called. 
If False, the broadcast values are grouped by value. :type permute: bool :returns: The broadcasted values :rtype: pdarray, Strings :raises TypeError: Raised if values is not a pdarray object :raises ValueError: Raised if the values array does not have one value per segment .. rubric:: Notes This function is a sparse analog of ``np.broadcast``. If a GroupBy object represents a sparse matrix (tensor), then this function takes a (dense) column vector and replicates each value to the non-zero elements in the corresponding row. .. rubric:: Examples >>> a = ak.array([0, 1, 0, 1, 0]) >>> values = ak.array([3, 5]) >>> g = ak.GroupBy(a) # By default, result is in original order >>> g.broadcast(values) array([3, 5, 3, 5, 3]) # With permute=False, result is in grouped order >>> g.broadcast(values, permute=False) array([3, 3, 3, 5, 5]) >>> a = ak.randint(1,5,10) >>> a array([3, 1, 4, 4, 4, 1, 3, 3, 2, 2]) >>> g = ak.GroupBy(a) >>> keys,counts = g.size() >>> g.broadcast(counts > 2) array([True False True True True False True True False False]) >>> g.broadcast(counts == 3) array([True False True True True False True True False False]) >>> g.broadcast(counts < 4) array([True True True True True True True True True True]) .. py:method:: build_from_components(user_defined_name: Optional[str] = None, **kwargs) -> GroupBy Function to build a new GroupBy object from component keys and permutation. :param user_defined_name: Passing a name will init the new GroupBy and assign it the given name :type user_defined_name: str (Optional) :param kwargs: Dictionary of components required for rebuilding the GroupBy. Expected keys are "orig_keys", "permutation", "unique_keys", and "segments" :type kwargs: dict :returns: The GroupBy object created by using the given components :rtype: GroupBy .. py:method:: count(values: pdarray) -> Tuple[groupable, pdarray] Count the number of elements in each group. NaN values will be excluded from the total. 
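The NaN-excluding behaviour of ``count`` can be sketched in plain Python. This is an illustration only, not Arkouda code: ``grouped_count`` is a hypothetical helper and sorted key order is an assumption. The data matches this method's Examples.

```python
import math


def grouped_count(keys, values):
    """Hypothetical helper: number of non-NaN ``values`` in each key group."""
    counts = {}
    for k, v in zip(keys, values):
        counts.setdefault(k, 0)  # a group with only NaN values still appears, with count 0
        if not (isinstance(v, float) and math.isnan(v)):
            counts[k] += 1
    unique_keys = sorted(counts)  # assumption: sorted order for the illustration
    return unique_keys, [counts[k] for k in unique_keys]


nan = float("nan")
a = [1, 0, -1, 1, 0, -1]
b = [1.0, nan, -1.0, nan, nan, -1.0]
print(grouped_count(a, b))  # ([-1, 0, 1], [2, 0, 1])
```

Key ``0`` has two rows but both values are NaN, so its count is 0, whereas a row count that ignores the values would report 2.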
:param values: The values to be counted by group (excluding NaN values). :type values: pdarray :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **counts** (*pdarray, int64*) -- The number of times each unique key appears (excluding NaN values). .. rubric:: Examples >>> a = ak.array([1, 0, -1, 1, 0, -1]) >>> a array([1 0 -1 1 0 -1]) >>> b = ak.array([1, np.nan, -1, np.nan, np.nan, -1], dtype = "float64") >>> b array([1.00000000000000000 nan -1.00000000000000000 nan nan -1.00000000000000000]) >>> g = ak.GroupBy(a) >>> keys,counts = g.count(b) >>> keys array([-1 0 1]) >>> counts array([2 0 1]) .. py:method:: first(values: groupable_element_type) -> Tuple[groupable, groupable_element_type] First value in each group. :param values: The values from which to take the first of each group :type values: pdarray-like :returns: * **unique_keys** (*(list of) pdarray-like*) -- The unique keys, in grouped order * **result** (*pdarray-like*) -- The first value of each group .. py:method:: from_return_msg(rep_msg) .. py:method:: head(values: groupable_element_type, n: int = 5, return_indices: bool = True) -> Tuple[groupable, groupable_element_type] Return the first n values from each group. :param values: The values from which to select, according to their group membership. :type values: (list of) pdarray-like :param n: Maximum number of items to return for each group. If the number of values in a group is less than n, all the values from that group will be returned. :type n: int, optional, default = 5 :param return_indices: If True, return the indices of the selected values. Otherwise, return the selected values. :type return_indices: bool, default True :returns: * **unique_keys** (*(list of) pdarray-like*) -- The unique keys, in grouped order * **result** (*pdarray-like*) -- The first n items of each group. If return_indices is True, the result contains indices; otherwise, it contains values. .. 
rubric:: Examples >>> a = ak.arange(10) %3 >>> a array([0 1 2 0 1 2 0 1 2 0]) >>> v = ak.arange(10) >>> v array([0 1 2 3 4 5 6 7 8 9]) >>> g = ak.GroupBy(a) >>> unique_keys, idx = g.head(v, 2, return_indices=True) >>> _, values = g.head(v, 2, return_indices=False) >>> unique_keys array([0 1 2]) >>> idx array([0 3 1 4 2 5]) >>> values array([0 3 1 4 2 5]) >>> v2 = -2 * ak.arange(10) >>> v2 array([0 -2 -4 -6 -8 -10 -12 -14 -16 -18]) >>> _, idx2 = g.head(v2, 2, return_indices=True) >>> _, values2 = g.head(v2, 2, return_indices=False) >>> idx2 array([0 3 1 4 2 5]) >>> values2 array([0 -6 -2 -8 -4 -10]) .. py:method:: is_registered() -> bool Return True if the object is contained in the registry :returns: Indicates if the object is contained in the registry :rtype: bool :raises RegistrationError: Raised if there's a server-side error or a mismatch of registered components .. seealso:: :obj:`register`, :obj:`attach`, :obj:`unregister`, :obj:`unregister_groupby_by_name` .. rubric:: Notes Objects registered with the server are immune to deletion until they are unregistered. .. py:method:: max(values: pdarray, skipna: bool = True) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the maximum of each group's values. :param values: The values to group and find maxima :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_maxima** (*pdarray*) -- One maximum per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object or if max is not supported for the values dtype :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if max is not supported for the values dtype .. 
rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.max(b) (array([2, 3, 4]), array([4, 4, 3])) .. py:method:: mean(values: pdarray, skipna: bool = True) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the mean of each group's values. :param values: The values to group and average :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_means** (*pdarray, float64*) -- One mean value per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. rubric:: Notes The return dtype is always float64. .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.mean(b) (array([2, 3, 4]), array([2.6666666666666665, 2.7999999999999998, 3])) .. py:method:: median(values: pdarray, skipna: bool = True) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the median of each group's values. 
:param values: The values to group and find median :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_medians** (*pdarray, float64*) -- One median value per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. rubric:: Notes The return dtype is always float64. .. rubric:: Examples >>> a = ak.randint(1,5,9) >>> a array([4 1 4 3 2 2 2 3 3]) >>> g = ak.GroupBy(a) >>> g.keys array([4 1 4 3 2 2 2 3 3]) >>> b = ak.linspace(-5,5,9) >>> b array([-5 -3.75 -2.5 -1.25 0 1.25 2.5 3.75 5]) >>> g.median(b) (array([1 2 3 4]), array([-3.75 1.25 3.75 -3.75])) .. py:method:: min(values: pdarray, skipna: bool = True) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the minimum of each group's values. :param values: The values to group and find minima :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_minima** (*pdarray*) -- One minimum per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object or if min is not supported for the values dtype :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if min is not supported for the values dtype .. 
rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.min(b) (array([2, 3, 4]), array([1, 1, 3])) .. py:method:: mode(values: groupable) -> Tuple[groupable, groupable] Most common value in each group. If a group is multi-modal, return the modal value that occurs first. :param values: The values from which to take the mode of each group :type values: (list of) pdarray-like :returns: * **unique_keys** (*(list of) pdarray-like*) -- The unique keys, in grouped order * **result** (*(list of) pdarray-like*) -- The most common value of each group .. py:method:: most_common(values) (Deprecated) See `GroupBy.mode()`. .. py:method:: nunique(values: groupable) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and return the number of unique values in each group. :param values: The values to group and find unique values :type values: pdarray, int64 :returns: * **unique_keys** (*groupable*) -- The unique keys, in grouped order * **group_nunique** (*groupable*) -- Number of unique values per unique key in the GroupBy instance :raises TypeError: Raised if the dtype(s) of values array(s) does/do not support the nunique method :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if nunique is not supported for the values dtype .. 
rubric:: Examples >>> data = ak.array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4]) >>> data array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4]) >>> labels = ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4]) >>> labels array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4]) >>> g = ak.GroupBy(labels) >>> g.keys array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4]) >>> g.nunique(data) (array([1, 2, 3, 4]), array([2, 2, 3, 1])) # Group (1,1,1) has values [3,4,3] -> 2 unique values, 3 and 4 # Group (2,2,2) has values [1,1,4] -> 2 unique values, 1 and 4 # Group (3,3,3) has values [3,4,1] -> 3 unique values # Group (4) has values [4] -> 1 unique value .. py:method:: objType(*args, **kwargs) str(object='') -> str str(bytes_or_buffer[, encoding[, errors]]) -> str Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to 'strict'. .. py:method:: prod(values: pdarray, skipna: bool = True) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the product of each group's values. :param values: The values to group and multiply :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_products** (*pdarray, float64*) -- One product per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array :raises RuntimeError: Raised if prod is not supported for the values dtype .. rubric:: Notes The return dtype is always float64. .. 
rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.prod(b) (array([2, 3, 4]), array([12, 108.00000000000003, 8.9999999999999982])) .. py:method:: register(user_defined_name: str) -> GroupBy Register this GroupBy object and underlying components with the Arkouda server :param user_defined_name: User-defined name the GroupBy is to be registered under; this will be the root name for underlying components :type user_defined_name: str :returns: The same GroupBy, which is now registered with the Arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that you cannot register two different GroupBys with the same name. :rtype: GroupBy :raises TypeError: Raised if user_defined_name is not a str :raises RegistrationError: If the server was unable to register the GroupBy with the user_defined_name .. seealso:: :obj:`unregister`, :obj:`attach`, :obj:`unregister_groupby_by_name`, :obj:`is_registered` .. rubric:: Notes Objects registered with the server are immune to deletion until they are unregistered. .. py:method:: sample(values: groupable, n=None, frac=None, replace=False, weights=None, random_state=None, return_indices=False, permute_samples=False) Return a random sample from each group. You can either specify the number of elements or the fraction of elements to be sampled. random_state can be used for reproducibility. :param values: The values from which to sample, according to their group membership. :type values: (list of) pdarray-like :param n: Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None. :type n: int, optional :param frac: Fraction of items to return. Cannot be used with n. 
:type frac: float, optional :param replace: Allow or disallow sampling of the value more than once. :type replace: bool, default False :param weights: Default None results in equal probability weighting. If passed a pdarray, it must have the same length as the groupby keys and will be used as sampling probabilities after normalization within each group. Weights must be non-negative with at least one positive element within each group. :type weights: pdarray, optional :param random_state: If int, seed for random number generator. If ak.random.Generator, use as given. :type random_state: int or ak.random.Generator, optional :param return_indices: If True, return the indices of the sampled values. Otherwise, return the sample values. :type return_indices: bool, default False :param permute_samples: If True, permute the samples according to group. Otherwise, keep samples in original order. :type permute_samples: bool, default False :returns: If return_indices is True, return the indices of the sampled values. Otherwise, return the sample values. :rtype: pdarray .. py:method:: size() -> Tuple[groupable, pdarray] Count the number of elements in each group, i.e. the number of times each key appears. This counts the total number of rows (including NaN values). :param none: :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **counts** (*pdarray, int64*) -- The number of times each unique key appears .. seealso:: :obj:`count` .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 2, 3, 1, 2, 4, 3, 4, 3, 4]) >>> g = ak.GroupBy(a) >>> keys,counts = g.size() >>> keys array([1, 2, 3, 4]) >>> counts array([1, 2, 4, 3]) .. py:method:: std(values: pdarray, skipna: bool = True, ddof: int_scalars = 1) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the standard deviation of each group's values. 
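The role of ``ddof`` in the per-group standard deviation can be sketched in plain Python. This is an illustration under stated assumptions, not Arkouda code: ``grouped_std`` is a hypothetical helper, sorted key order is assumed, and returning NaN when a group has ``ddof`` or fewer elements is a choice made for the sketch rather than documented behaviour. The data matches this method's Examples.

```python
import math


def grouped_std(keys, values, ddof=1):
    """Hypothetical helper: per-group standard deviation with divisor N - ddof."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)  # assumption: sorted order for the illustration
    stds = []
    for k in unique_keys:
        x = groups[k]
        n = len(x)
        mean = sum(x) / n
        sum_sq_dev = sum((xi - mean) ** 2 for xi in x)
        if n - ddof <= 0:
            stds.append(float("nan"))  # assumption: undefined for groups this small
        else:
            stds.append(math.sqrt(sum_sq_dev / (n - ddof)))
    return unique_keys, stds


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
unique_keys, stds = grouped_std(a, b)  # ddof=1, the documented default
print(unique_keys, stds)
```

With ``ddof=0`` the divisor becomes ``N`` and every non-constant group yields a smaller value than with the default ``ddof=1``.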
:param values: The values to group and find standard deviation :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :param ddof: "Delta Degrees of Freedom" used in calculating std :type ddof: int_scalars :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_stds** (*pdarray, float64*) -- One std value per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. rubric:: Notes The return dtype is always float64. The standard deviation is the square root of the average of the squared deviations from the mean, i.e., ``std = sqrt(mean((x - x.mean())**2))``. The average squared deviation is normally calculated as ``sum((x - x.mean())**2) / N``, where ``N = len(x)``. If, however, `ddof` is specified, the divisor ``N - ddof`` is used instead. In standard statistical practice, ``ddof=1`` provides an unbiased estimator of the variance of the infinite population. ``ddof=0`` provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ``ddof=1``, it will not be an unbiased estimate of the standard deviation per se. .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.std(b) (array([2 3 4]), array([1.5275252316519465 1.0954451150103321 0])) .. py:method:: sum(values: pdarray, skipna: bool = True) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and sum each group's values. 
:param values: The values to group and sum :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_sums** (*pdarray*) -- One sum per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. rubric:: Notes The grouped sum of a boolean ``pdarray`` returns integers. .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.sum(b) (array([2, 3, 4]), array([8, 14, 6])) .. py:method:: tail(values: groupable_element_type, n: int = 5, return_indices: bool = True) -> Tuple[groupable, groupable_element_type] Return the last n values from each group. :param values: The values from which to select, according to their group membership. :type values: (list of) pdarray-like :param n: Maximum number of items to return for each group. If the number of values in a group is less than n, all the values from that group will be returned. :type n: int, optional, default = 5 :param return_indices: If True, return the indices of the selected values. Otherwise, return the selected values. :type return_indices: bool, default True :returns: * **unique_keys** (*(list of) pdarray-like*) -- The unique keys, in grouped order * **result** (*pdarray-like*) -- The last n items of each group. If return_indices is True, the result contains indices; otherwise, it contains values. .. 
rubric:: Examples >>> a = ak.arange(10) % 3 >>> a array([0 1 2 0 1 2 0 1 2 0]) >>> v = ak.arange(10) >>> v array([0 1 2 3 4 5 6 7 8 9]) >>> g = ak.GroupBy(a) >>> unique_keys, idx = g.tail(v, 2, return_indices=True) >>> _, values = g.tail(v, 2, return_indices=False) >>> unique_keys array([0 1 2]) >>> idx array([6 9 4 7 5 8]) >>> values array([6 9 4 7 5 8]) >>> v2 = -2 * ak.arange(10) >>> v2 array([0 -2 -4 -6 -8 -10 -12 -14 -16 -18]) >>> _, idx2 = g.tail(v2, 2, return_indices=True) >>> _, values2 = g.tail(v2, 2, return_indices=False) >>> idx2 array([6 9 4 7 5 8]) >>> values2 array([-12 -18 -8 -14 -10 -16]) .. py:method:: to_hdf(prefix_path, dataset='groupby', mode='truncate', file_type='distribute') Save the GroupBy to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path. :param prefix_path: Directory and filename prefix that all output files will share :type prefix_path: str :param dataset: Name prefix for saved data within the HDF5 file :type dataset: str :param mode: By default, truncate (overwrite) output files, if they exist. If 'append', add data as a new column to existing files. :type mode: str {'truncate' | 'append'} :param file_type: Default: "distribute". When set to "single", the dataset is written to a single file. When set to "distribute", the dataset is written to one file per locale. This is only supported by HDF5 files and has no impact on Parquet files. :type file_type: str ("single" | "distribute") :returns: * *None* * *GroupBy is not currently supported by Parquet* .. py:method:: unique(values: groupable) Return the set of unique values in each group, as a SegArray. :param values: The values to unique :type values: (list of) pdarray-like :returns: * **unique_keys** (*(list of) pdarray-like*) -- The unique keys, in grouped order * **result** (*(list of) SegArray*) -- The unique values of each group :raises TypeError: Raised if values is or contains Strings or Categorical ..
py:method:: unregister() Unregister this GroupBy object in the arkouda server which was previously registered using register() and/or attached to using attach() :raises RegistrationError: If the object is already unregistered or if there is a server error when attempting to unregister .. seealso:: :obj:`register`, :obj:`attach`, :obj:`unregister_groupby_by_name`, :obj:`is_registered` .. rubric:: Notes Objects registered with the server are immune to deletion until they are unregistered. .. py:method:: unregister_groupby_by_name(user_defined_name: str) -> None Function to unregister GroupBy object by name which was registered with the arkouda server via register() :param user_defined_name: Name under which the GroupBy object was registered :type user_defined_name: str :raises TypeError: if user_defined_name is not a string :raises RegistrationError: if there is an issue attempting to unregister any underlying components .. seealso:: :obj:`register`, :obj:`unregister`, :obj:`attach`, :obj:`is_registered` .. py:method:: update_hdf(prefix_path: str, dataset: str = 'groupby', repack: bool = True) .. py:method:: var(values: pdarray, skipna: bool = True, ddof: int_scalars = 1) -> Tuple[groupable, pdarray] Using the permutation stored in the GroupBy instance, group another array of values and compute the variance of each group's values. 
:param values: The values to group and find variance :type values: pdarray :param skipna: boolean which determines if NANs should be skipped :type skipna: bool :param ddof: "Delta Degrees of Freedom" used in calculating var :type ddof: int_scalars :returns: * **unique_keys** (*(list of) pdarray or Strings*) -- The unique keys, in grouped order * **group_vars** (*pdarray, float64*) -- One var value per unique key in the GroupBy instance :raises TypeError: Raised if the values array is not a pdarray object :raises ValueError: Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array .. rubric:: Notes The return dtype is always float64. The variance is the average of the squared deviations from the mean, i.e., ``var = mean((x - x.mean())**2)``. The mean is normally calculated as ``x.sum() / N``, where ``N = len(x)``. If, however, `ddof` is specified, the divisor ``N - ddof`` is used instead. In standard statistical practice, ``ddof=1`` provides an unbiased estimator of the variance of a hypothetical infinite population. ``ddof=0`` provides a maximum likelihood estimate of the variance for normally distributed variables. .. rubric:: Examples >>> a = ak.randint(1,5,10) >>> a array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> g = ak.GroupBy(a) >>> g.keys array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2]) >>> b = ak.randint(1,5,10) >>> b array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4]) >>> g.var(b) (array([2 3 4]), array([2.333333333333333 1.2 0])) .. py:function:: broadcast(segments: pdarray, values: Union[pdarray, Strings], size: Union[int, np.int64, np.uint64] = -1, permutation: Union[pdarray, None] = None) Broadcast a dense column vector to the rows of a sparse matrix or grouped array. :param segments: Offsets of the start of each row in the sparse matrix or grouped array. Must be sorted in ascending order. 
:type segments: pdarray, int64 :param values: The values to broadcast, one per row (or group) :type values: pdarray, Strings :param size: The total number of nonzeros in the matrix. If permutation is given, this argument is ignored and the size is inferred from the permutation array. :type size: int :param permutation: The permutation to go from the original ordering of nonzeros to the ordering grouped by row. To broadcast values back to the original ordering, this permutation will be inverted. If no permutation is supplied, it is assumed that the original nonzeros were already grouped by row. In this case, the size argument must be given. :type permutation: pdarray, int64 :returns: The broadcast values, one per nonzero :rtype: pdarray, Strings :raises ValueError: - If segments and values are different sizes - If segments are empty - If number of nonzeros (either user-specified or inferred from permutation) is less than one .. rubric:: Examples >>> # Define a sparse matrix with 3 rows and 7 nonzeros >>> row_starts = ak.array([0, 2, 5]) >>> nnz = 7 # Broadcast the row number to each nonzero element >>> row_number = ak.arange(3) >>> ak.broadcast(row_starts, row_number, nnz) array([0 0 1 1 1 2 2]) # If the original nonzeros were in reverse order... >>> permutation = ak.arange(6, -1, -1) >>> ak.broadcast(row_starts, row_number, permutation=permutation) array([2 2 1 1 1 0 0]) .. py:function:: unique(pda: groupable, return_groups: bool = False, assume_sorted: bool = False, return_indices: bool = False) -> Union[groupable, Tuple[groupable, pdarray, pdarray, int]] Find the unique elements of an array. Returns the unique elements of an array, sorted if the values are integers. There is an optional output in addition to the unique elements: the number of times each unique value comes up in the input array. :param pda: Input array. :type pda: (list of) pdarray, Strings, or Categorical :param return_groups: If True, also return grouping information for the array. 
:type return_groups: bool, optional :param assume_sorted: If True, assume pda is sorted and skip sorting step :type assume_sorted: bool, optional :param return_indices: Only applicable if return_groups is True. If True, return unique key indices along with other groups :type return_indices: bool, optional :returns: * **unique** (*(list of) pdarray, Strings, or Categorical*) -- The unique values. If input dtype is int64, return values will be sorted. * **permutation** (*pdarray, optional*) -- Permutation that groups equivalent values together (only when return_groups=True) * **segments** (*pdarray, optional*) -- The offset of each group in the permuted array (only when return_groups=True) :raises TypeError: Raised if pda is not a pdarray or Strings object :raises RuntimeError: Raised if the pdarray or Strings dtype is unsupported .. rubric:: Notes For integer arrays, this function checks to see whether `pda` is sorted and, if so, whether it is already unique. This step can save considerable computation. Otherwise, this function will sort `pda`. .. rubric:: Examples >>> A = ak.array([3, 2, 1, 1, 2, 3]) >>> ak.unique(A) array([1, 2, 3])
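The ``permutation`` and ``segments`` outputs described above can be easier to follow with a small client-side analogue. The sketch below uses plain NumPy rather than Arkouda, so it runs without a server; the helper names ``unique_with_groups`` and ``broadcast_np`` are hypothetical and are not part of the Arkouda API. It illustrates the semantics only (a stable sort, group start offsets, and broadcasting a per-group value back to the original row order) and makes no claim about how the server implements them.

```python
import numpy as np


def unique_with_groups(values):
    """NumPy analogue (hypothetical helper) of unique values plus grouping info."""
    values = np.asarray(values)
    # A stable argsort places equal values next to each other.
    permutation = np.argsort(values, kind="stable")
    grouped = values[permutation]
    # A group starts at position 0 and wherever the sorted value changes.
    is_start = np.ones(grouped.size, dtype=bool)
    is_start[1:] = grouped[1:] != grouped[:-1]
    segments = np.flatnonzero(is_start)
    return grouped[segments], permutation, segments


def broadcast_np(segments, group_values, size, permutation=None):
    """NumPy analogue (hypothetical helper) of broadcasting one value per group."""
    if permutation is not None:
        size = permutation.size  # size is inferred from the permutation
    lengths = np.diff(np.append(segments, size))
    expanded = np.repeat(group_values, lengths)
    if permutation is None:
        return expanded
    # Invert the permutation to return to the original row ordering.
    out = np.empty_like(expanded)
    out[permutation] = expanded
    return out


# Keys and values taken from the sum/var examples in this module's docs.
a = np.array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
b = np.array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])

uniq, perm, segs = unique_with_groups(a)
counts = np.diff(np.append(segs, a.size))

# Grouped sum: reduce each contiguous segment of the permuted values.
group_sums = np.add.reduceat(b[perm], segs)

# Grouped variance with ddof=1: squared deviations from each row's group mean.
# This assumes every group has more than ddof rows, as is the case here.
row_means = broadcast_np(segs, group_sums / counts, b.size, perm)
sq_dev = (b - row_means) ** 2
group_vars = np.add.reduceat(sq_dev[perm], segs) / (counts - 1)

print(uniq, group_sums, group_vars)
```

With these inputs the grouped sums and variances agree with the ``g.sum(b)`` and ``g.var(b)`` examples shown for ``GroupBy``. The variance step also shows why ``broadcast`` takes a permutation: the per-group means have to be returned to the original row order before they can be subtracted from ``b``.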