arkouda.util

Notes

Objects registered with the server are immune to deletion until they are unregistered.

registered_name = None

reverse = False

special_objType = 'BitVector'

to_list(): Export data to a list of string-formatted bit vectors.

to_ndarray(): Export data to a numpy array of string-formatted bit vectors.

values

width = 64

arkouda.util.BitVectorizer(width=64, reverse=False)

Make a callback (i.e., a function) that can be called on an array to create a BitVector.

Parameters:

width (int) – The number of bit fields in the vector
reverse (bool) – If True, display bits from least significant (left) to most significant (right). By default, the most significant bit is the left-most bit.

Returns:

bitvectorizer – A function that takes an array and returns a BitVector instance

Return type:

callable
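The callback-factory idea can be illustrated without a server. The sketch below is a pure-Python analog under stated assumptions. `make_bitvectorizer` and `format_bits` are hypothetical names. The inputs are plain Python ints rather than pdarrays. The '0'/'1' rendering was chosen for illustration and is not taken from BitVector's actual output format.

```python
from typing import Callable, List


def format_bits(value: int, width: int = 64, reverse: bool = False) -> str:
    """Render the low `width` bits of `value` as '0'/'1' characters."""
    mask = (1 << width) - 1
    # Most significant bit on the left, zero-padded to `width` characters.
    bits = format(value & mask, f"0{width}b")
    # reverse=True puts the least significant bit on the left instead.
    return bits[::-1] if reverse else bits


def make_bitvectorizer(width: int = 64,
                       reverse: bool = False) -> Callable[[List[int]], List[str]]:
    """Hypothetical analog of BitVectorizer: capture settings, return a callback."""
    def bitvectorize(values: List[int]) -> List[str]:
        return [format_bits(v, width, reverse) for v in values]

    return bitvectorize


to_bits = make_bitvectorizer(width=4)
print(to_bits([1, 6, 15]))                                # ['0001', '0110', '1111']
print(make_bitvectorizer(width=4, reverse=True)([1, 3]))  # ['1000', '1100']
```

The factory captures `width` and `reverse` once, so the returned function can be applied repeatedly to new inputs with the same settings.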

class arkouda.util.Datetime(pda, unit: str = _BASE_UNIT)

Bases: _AbstractBaseTime

Represents a date and/or time.

Datetime is the Arkouda analog to pandas DatetimeIndex and other timeseries data types.

Parameters:

pda (int64 pdarray, pd.DatetimeIndex, pd.Series, or np.datetime64 array)
unit (str, default 'ns') –
For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like ‘sec’) are accepted.

Possible values:
- ‘weeks’ or ‘w’
- ‘days’ or ‘d’
- ‘hours’ or ‘h’
- ‘minutes’, ‘m’, or ‘t’
- ‘seconds’ or ‘s’
- ‘milliseconds’, ‘ms’, or ‘l’
- ‘microseconds’, ‘us’, or ‘u’
- ‘nanoseconds’, ‘ns’, or ‘n’
Unlike in pandas, units cannot be combined or mixed with integers.

Notes

The .values attribute is always in nanoseconds with int64 dtype.
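The sketch below illustrates the unit rules above. It assumes plain Python ints in place of an int64 pdarray. The unit string is matched case-insensitively, first against the listed aliases and then against prefixes of the full names. The input is then scaled to nanoseconds, which is the unit of .values. `normalize_unit` and `to_nanoseconds` are hypothetical helpers. Raising ValueError for an ambiguous prefix such as 'mi' (which could mean minutes, milliseconds, or microseconds) is an assumption, not documented Arkouda behavior.

```python
from typing import Dict, List

# Nanoseconds per unit, keyed by the full unit names listed above.
NS_PER_UNIT: Dict[str, int] = {
    "weeks": 7 * 24 * 3600 * 10**9,
    "days": 24 * 3600 * 10**9,
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}

# Short aliases taken from the list above.
ALIASES: Dict[str, str] = {
    "w": "weeks", "d": "days", "h": "hours",
    "m": "minutes", "t": "minutes", "s": "seconds",
    "ms": "milliseconds", "l": "milliseconds",
    "us": "microseconds", "u": "microseconds",
    "ns": "nanoseconds", "n": "nanoseconds",
}


def normalize_unit(unit: str) -> str:
    """Map a unit string to its full name (case-insensitive)."""
    u = unit.lower()
    if u in ALIASES:
        return ALIASES[u]
    matches = [name for name in NS_PER_UNIT if name.startswith(u)]
    if len(matches) != 1:
        # Assumption: unknown or ambiguous prefixes are rejected.
        raise ValueError(f"unrecognized or ambiguous unit: {unit!r}")
    return matches[0]


def to_nanoseconds(values: List[int], unit: str = "ns") -> List[int]:
    """Scale integer timestamps expressed in `unit` to integer nanoseconds."""
    factor = NS_PER_UNIT[normalize_unit(unit)]
    return [v * factor for v in values]


print(to_nanoseconds([1, 2], unit="Sec"))  # [1000000000, 2000000000]
```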

property date

property day

property day_of_week

property day_of_year

property dayofweek

property dayofyear

property hour

property is_leap_year

is_registered() → numpy.bool_

Return True iff the object is contained in the registry or is a component of a registered object.

Returns: Indicates if the object is contained in the registry
Return type: numpy.bool_
Raises: RegistrationError – Raised if there’s a server-side error or a mismatch of registered components

See also

register, attach, unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.

isocalendar()

property microsecond

property millisecond

property minute

property month

property nanosecond

register(user_defined_name)

Parameters:

user_defined_name (str) – User-defined name under which the Datetime is to be registered; this will be the root name for underlying components

Returns:

The same Datetime, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that you cannot register two different Datetimes with the same name.

Return type:

Datetime

Raises:

TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the Datetime with the user_defined_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property second

special_objType = 'Datetime'

sum(): Return the sum of all elements in the array.

supported_opeq

supported_with_datetime

supported_with_pdarray

supported_with_r_datetime

supported_with_r_pdarray

supported_with_r_timedelta

supported_with_timedelta

to_pandas(): Convert array to a pandas DatetimeIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.

See also

to_ndarray

unregister()

Unregister this Datetime object from the arkouda server. The object must have been previously registered using register() and/or attached to using attach().

Raises: RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

See also

register, attach, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property week

property weekday

property weekofyear

property year

class arkouda.util.GroupBy

Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array.

Parameters:

keys ((list of) pdarray, Strings, or Categorical) – The array to group by value, or if list, the column arrays to group by row
assume_sorted (bool) – If True, assume keys is already sorted (Default: False)

nkeys

The number of key arrays (columns)

Type: int

size

The length of the input array(s), i.e. number of rows

Type: int

permutation

The permutation that sorts the keys array(s) by value (row)

Type: pdarray

unique_keys

The unique values of the keys array(s), in grouped order

Type: (list of) pdarray, Strings, or Categorical

ngroups

The length of the unique_keys array(s), i.e. number of groups

Type: int

segments

The start index of each group in the grouped array(s)

Type: pdarray

logger

Used for all logging operations

Type: ArkoudaLogger

dropna

If True and the groupby keys contain NaN values, the rows whose keys are NaN are dropped. Otherwise, those rows are kept.

Type: bool (default=True)

Raises: TypeError – Raised if keys is a pdarray with a dtype other than int64

Notes

Integral pdarrays, Strings, and Categoricals are natively supported, but float64 and bool arrays are not.

For a user-defined class to be groupable, it must inherit from pdarray and define or overload the grouping API:

1. a ._get_grouping_keys() method that returns a list of pdarrays that can be (co)argsorted.

2. (Optional) a .group() method that returns the permutation that groups the array.

If the input is a single array with a .group() method defined, method 2 will be used; otherwise, method 1 will be used.
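The following single-key sketch in pure Python shows how permutation, unique_keys, segments, and ngroups relate to each other. `group_keys` is a hypothetical helper that uses a stable sort. Arkouda computes the same quantities on the server and may use a different sorting strategy.

```python
from typing import List, Tuple


def group_keys(keys: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return (permutation, unique_keys, segments) for a list of keys."""
    # permutation: indices that stably sort the keys by value.
    permutation = sorted(range(len(keys)), key=lambda i: keys[i])
    sorted_keys = [keys[i] for i in permutation]
    unique_keys: List[int] = []
    segments: List[int] = []
    for pos, k in enumerate(sorted_keys):
        # A new group starts wherever the sorted key changes.
        if pos == 0 or k != sorted_keys[pos - 1]:
            unique_keys.append(k)
            segments.append(pos)  # start offset of this group
    return permutation, unique_keys, segments


perm, ukeys, segs = group_keys([3, 1, 3, 2, 1])
print(perm, ukeys, segs)  # [1, 4, 3, 0, 2] [1, 2, 3] [0, 2, 3]
```

In this picture, ngroups is len(ukeys). Group i occupies positions segs[i] up to, but not including, segs[i + 1] of the permuted keys, and the last group runs to the end.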

AND(values: pdarray) → Tuple[pdarray | List[pdarray | Strings], pdarray]

Bitwise AND of values in each segment.

Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise AND reduction on each group.

Parameters:

values (pdarray, int64) – The values to group and reduce with AND

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
result (pdarray, int64) – Bitwise AND of values in segments corresponding to keys

Raises:

TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if AND is not supported for the values dtype

OR(values: pdarray) → Tuple[pdarray | List[pdarray | Strings], pdarray]

Bitwise OR of values in each segment.

Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise OR reduction on each group.

Parameters:

values (pdarray, int64) – The values to group and reduce with OR

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
result (pdarray, int64) – Bitwise OR of values in segments corresponding to keys

Raises:

TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if OR is not supported for the values dtype

Reductions

A frozenset containing the names of the reduction operators supported by aggregate().

XOR(values: pdarray) → Tuple[pdarray | List[pdarray | Strings], pdarray]

Bitwise XOR of values in each segment.

Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise XOR reduction on each group.

Parameters:

values (pdarray, int64) – The values to group and reduce with XOR

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
result (pdarray, int64) – Bitwise XOR of values in segments corresponding to keys

Raises:

TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if XOR is not supported for the values dtype
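AND, OR, and XOR follow one pattern: collect each group's values and fold them with the bitwise operator. In this pure-Python sketch, the grouping uses a dictionary instead of the stored permutation. Groups are reported in sorted key order, and `segmented_bitwise` is a hypothetical name.

```python
import operator
from functools import reduce
from typing import Callable, Dict, List, Tuple


def segmented_bitwise(keys: List[int], values: List[int],
                      op: Callable[[int, int], int]) -> Tuple[List[int], List[int]]:
    """Fold each key group's int values with a bitwise operator."""
    if len(keys) != len(values):
        raise ValueError("key array size does not match values size")
    groups: Dict[int, List[int]] = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    return unique_keys, [reduce(op, groups[k]) for k in unique_keys]


keys = [0, 1, 0, 1, 0]
vals = [0b1100, 0b0011, 0b1010, 0b0001, 0b1110]
print(segmented_bitwise(keys, vals, operator.and_))  # ([0, 1], [8, 1])
print(segmented_bitwise(keys, vals, operator.or_))   # ([0, 1], [14, 3])
print(segmented_bitwise(keys, vals, operator.xor))   # ([0, 1], [8, 2])
```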

aggregate(values: groupable, operator: str, skipna: bool = True, ddof: int_scalars = 1) → Tuple[groupable, groupable]

Using the permutation stored in the GroupBy instance, group another array of values and apply a reduction to each group’s values.

Parameters:

values (pdarray) – The values to group and reduce
operator (str) – The name of the reduction operator to use
skipna (bool) – If True, NaN values are skipped
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std

Returns:

unique_keys (groupable) – The unique keys, in grouped order
aggregates (groupable) – One aggregate value per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if the requested operator is not supported for the values dtype

Examples

>>> keys = ak.arange(0, 10)
>>> vals = ak.linspace(-1, 1, 10)
>>> g = ak.GroupBy(keys)
>>> g.aggregate(vals, 'sum')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777768,
-0.55555555555555536, -0.33333333333333348, -0.11111111111111116,
0.11111111111111116, 0.33333333333333348, 0.55555555555555536, 0.77777777777777768,
1]))
>>> g.aggregate(vals, 'min')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777779,
-0.55555555555555558, -0.33333333333333337, -0.11111111111111116, 0.11111111111111116,
0.33333333333333326, 0.55555555555555536, 0.77777777777777768, 1]))
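The sketch below is a rough model of how aggregate dispatches on the operator name and how skipna treats NaN values. The operator table is a small illustrative subset; the real list is GroupBy.Reductions. `grouped_aggregate` is hypothetical. Returning NaN for a group that has no non-NaN values is an assumption.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative subset of operator names; not the full GroupBy.Reductions set.
OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": lambda xs: float(sum(xs)),
    "min": min,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}


def grouped_aggregate(keys: List[int], values: List[float], operator: str,
                      skipna: bool = True) -> Tuple[List[int], List[float]]:
    """Apply the named reduction to each key group (hypothetical sketch)."""
    if operator not in OPERATORS:
        raise ValueError(f"operator {operator!r} is not supported")
    if len(keys) != len(values):
        raise ValueError("key array size does not match values size")
    groups: Dict[int, List[float]] = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    results: List[float] = []
    for k in unique_keys:
        xs = groups[k]
        if any(math.isnan(x) for x in xs):
            if not skipna:
                results.append(math.nan)  # NaN propagates when not skipping
                continue
            xs = [x for x in xs if not math.isnan(x)]
        # Assumption: a group left empty after dropping NaNs yields NaN.
        results.append(OPERATORS[operator](xs) if xs else math.nan)
    return unique_keys, results


nan = float("nan")
print(grouped_aggregate([0, 0, 1, 1], [1.0, nan, 2.0, 4.0], "sum"))  # ([0, 1], [1.0, 6.0])
```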

all(values: pdarray) → Tuple[pdarray | List[pdarray | Strings], pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and perform an “and” reduction on each group.

Parameters:

values (pdarray, bool) – The values to group and reduce with “and”

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_all (pdarray, bool) – One bool per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if all is not supported for the values dtype

any(values: pdarray) → Tuple[pdarray | List[pdarray | Strings], pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and perform an “or” reduction on each group.

Parameters:

values (pdarray, bool) – The values to group and reduce with “or”

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_any (pdarray, bool) – One bool per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
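all and any are the boolean counterparts of the bitwise reductions. Here is a minimal pure-Python sketch that lists groups in sorted key order. `grouped_all_any` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def grouped_all_any(keys: List[int],
                    flags: List[bool]) -> Tuple[List[int], List[bool], List[bool]]:
    """Return (unique_keys, per-group all, per-group any) for boolean values."""
    groups: Dict[int, List[bool]] = {}
    for k, f in zip(keys, flags):
        groups.setdefault(k, []).append(f)
    unique_keys = sorted(groups)
    return (unique_keys,
            [all(groups[k]) for k in unique_keys],
            [any(groups[k]) for k in unique_keys])


print(grouped_all_any([1, 2, 1, 2, 3], [True, False, True, True, False]))
# ([1, 2, 3], [True, False, False], [True, True, False])
```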

argmax(values: pdarray) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first maximum of each group’s values.

Parameters:

values (pdarray) – The values to group and find argmax

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_argmaxima (pdarray, int64) – One index per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object or if argmax is not supported for the values dtype
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmax(b)
(array([2, 3, 4]), array([9, 3, 2]))

argmin(values: pdarray) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first minimum of each group’s values.

Parameters:

values (pdarray) – The values to group and find argmin

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_argminima (pdarray, int64) – One index per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object or if argmin is not supported for the values dtype
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if argmin is not supported for the values dtype

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmin(b)
(array([2, 3, 4]), array([5, 4, 2]))
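The Notes above say the returned indices refer to the original array. The pure-Python sketch below reproduces the argmax example by remembering original positions. `grouped_argmax` is a hypothetical helper. It uses a strict comparison, so the first index is kept when values tie.

```python
from typing import Dict, List, Tuple


def grouped_argmax(keys: List[int], values: List[float]) -> Tuple[List[int], List[int]]:
    """Original-array index of the first maximum in each key group."""
    best: Dict[int, int] = {}
    for i, (k, v) in enumerate(zip(keys, values)):
        # Strict '>' keeps the earliest index when the maximum is tied.
        if k not in best or v > values[best[k]]:
            best[k] = i
    unique_keys = sorted(best)
    return unique_keys, [best[k] for k in unique_keys]


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
print(grouped_argmax(a, b))                # ([2, 3, 4], [9, 3, 2])
print(grouped_argmax(a, [-x for x in b]))  # argmin: ([2, 3, 4], [5, 4, 2])
```

Negating the values turns the first maximum into the first minimum, which is why the second call matches the argmin example.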

attach(user_defined_name: str) → GroupBy

Return a GroupBy object attached to the given name, which was previously registered with the arkouda server using register().

Parameters: user_defined_name (str) – User-defined name under which the GroupBy object was registered
Returns: The GroupBy object created by re-attaching to the corresponding server components
Return type: GroupBy
Raises: RegistrationError – if user_defined_name is not registered

broadcast(values: pdarray | Strings, permute: bool = True) → pdarray | Strings

Fill each group’s segment with a constant value.

Parameters:

values (pdarray, Strings) – The values to put in each group’s segment
permute (bool) – If True (default), permute broadcast values back to the ordering of the original array on which GroupBy was called. If False, the broadcast values are grouped by value.

Returns:

The broadcasted values

Return type:

pdarray, Strings

Raises:

TypeError – Raised if values is not a pdarray object
ValueError – Raised if the values array does not have one value per segment

Notes

This function is a sparse analog of np.broadcast. If a GroupBy object represents a sparse matrix (tensor), then this function takes a (dense) column vector and replicates each value to the non-zero elements in the corresponding row.

Examples

>>> a = ak.array([0, 1, 0, 1, 0])
>>> values = ak.array([3, 5])
>>> g = ak.GroupBy(a)
>>> # By default, result is in original order
>>> g.broadcast(values)
array([3, 5, 3, 5, 3])
>>> # With permute=False, result is in grouped order
>>> g.broadcast(values, permute=False)
array([3, 3, 3, 5, 5])
>>> a = ak.randint(1,5,10)
>>> a
array([3, 1, 4, 4, 4, 1, 3, 3, 2, 2])
>>> g = ak.GroupBy(a)
>>> keys,counts = g.size()
>>> g.broadcast(counts > 2)
array([True False True True True False True True False False])
>>> g.broadcast(counts == 3)
array([True False True True True False True True False False])
>>> g.broadcast(counts < 4)
array([True True True True True True True True True True])
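The pure-Python sketch below shows the two output orders. It assumes the group values are listed in sorted key order, as in the integer examples here. `broadcast` below is a hypothetical standalone function, not the GroupBy method.

```python
from typing import Dict, List, TypeVar

T = TypeVar("T")


def broadcast(keys: List[int], group_values: List[T], permute: bool = True) -> List[T]:
    """Copy each group's single value onto every row of that group.

    group_values[i] belongs to the i-th smallest unique key (an assumption
    that matches the integer examples on this page).
    """
    unique_keys = sorted(set(keys))
    if len(group_values) != len(unique_keys):
        raise ValueError("the values array must have one value per segment")
    value_of: Dict[int, T] = dict(zip(unique_keys, group_values))
    # permute=True: original row order; permute=False: grouped (sorted) order.
    row_keys = keys if permute else sorted(keys)
    return [value_of[k] for k in row_keys]


a = [0, 1, 0, 1, 0]
print(broadcast(a, [3, 5]))                 # [3, 5, 3, 5, 3]
print(broadcast(a, [3, 5], permute=False))  # [3, 3, 3, 5, 5]
```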

build_from_components(user_defined_name: str | None = None, **kwargs) → GroupBy

Build a new GroupBy object from component keys and permutation.

Parameters:

user_defined_name (str, optional) – Passing a name will initialize the new GroupBy and assign it the given name
kwargs (dict) – Dictionary of components required for rebuilding the GroupBy. Expected keys are “orig_keys”, “permutation”, “unique_keys”, and “segments”

Returns:

The GroupBy object created by using the given components

Return type:

GroupBy

count(values: pdarray) → Tuple[groupable, pdarray]

Count the number of elements in each group. NaN values will be excluded from the total.

Parameters:

values (pdarray) – The values to be counted by group (excluding NaN values).

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
counts (pdarray, int64) – The number of non-NaN values in each group.

Examples

>>> a = ak.array([1, 0, -1, 1, 0, -1])
>>> a
array([1 0 -1 1 0 -1])
>>> b = ak.array([1, np.nan, -1, np.nan, np.nan, -1], dtype = "float64")
>>> b
array([1.00000000000000000 nan -1.00000000000000000 nan nan -1.00000000000000000])
>>> g = ak.GroupBy(a)
>>> keys,counts = g.count(b)
>>> keys
array([-1 0 1])
>>> counts
array([2 0 1])
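count and size differ only in how they treat NaN values. The sketch below uses a hypothetical `count_and_size` helper. It reproduces the count example above and also reports sizes, which include NaN rows.

```python
import math
from typing import Dict, List, Tuple


def count_and_size(keys: List[int],
                   values: List[float]) -> Tuple[List[int], List[int], List[int]]:
    """Return (unique_keys, counts excluding NaN, sizes including NaN)."""
    counts: Dict[int, int] = {}
    sizes: Dict[int, int] = {}
    for k, v in zip(keys, values):
        sizes[k] = sizes.get(k, 0) + 1  # every row
        counts[k] = counts.get(k, 0) + (0 if math.isnan(v) else 1)  # non-NaN only
    unique_keys = sorted(sizes)
    return (unique_keys,
            [counts[k] for k in unique_keys],
            [sizes[k] for k in unique_keys])


nan = float("nan")
a = [1, 0, -1, 1, 0, -1]
b = [1.0, nan, -1.0, nan, nan, -1.0]
print(count_and_size(a, b))  # ([-1, 0, 1], [2, 0, 1], [2, 2, 2])
```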

first(values: groupable_element_type) → Tuple[groupable, groupable_element_type]

First value in each group.

Parameters:

values (pdarray-like) – The values from which to take the first of each group

Returns:

unique_keys ((list of) pdarray-like) – The unique keys, in grouped order
result (pdarray-like) – The first value of each group

from_return_msg(rep_msg)

head(values: groupable_element_type, n: int = 5, return_indices: bool = True) → Tuple[groupable, groupable_element_type]

Return the first n values from each group.

Parameters:

values ((list of) pdarray-like) – The values from which to select, according to their group membership.
n (int, optional, default = 5) – Maximum number of items to return for each group. If the number of values in a group is less than n, all the values from that group will be returned.
return_indices (bool, default True) – If True, return the indices of the selected values. Otherwise, return the selected values.

Returns:

unique_keys ((list of) pdarray-like) – The unique keys, in grouped order
result (pdarray-like) – The first n items of each group. If return_indices is True, the result contains indices; otherwise, it contains values.

Examples

>>> a = ak.arange(10) %3
>>> a
array([0 1 2 0 1 2 0 1 2 0])
>>> v = ak.arange(10)
>>> v
array([0 1 2 3 4 5 6 7 8 9])
>>> g = ak.GroupBy(a)
>>> unique_keys, idx = g.head(v, 2, return_indices=True)
>>> _, values = g.head(v, 2, return_indices=False)
>>> unique_keys
array([0 1 2])
>>> idx
array([0 3 1 4 2 5])
>>> values
array([0 3 1 4 2 5])

>>> v2 = -2 * ak.arange(10)
>>> v2
array([0 -2 -4 -6 -8 -10 -12 -14 -16 -18])
>>> _, idx2 = g.head(v2, 2, return_indices=True)
>>> _, values2 = g.head(v2, 2, return_indices=False)
>>> idx2
array([0 3 1 4 2 5])
>>> values2
array([0 -6 -2 -8 -4 -10])
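head and tail select rows by position within each group. The sketch below returns original-row indices, which corresponds to return_indices=True. Indexing the values with those indices gives the return_indices=False result. `head_indices` and `tail_indices` are hypothetical helpers, and groups are taken in sorted key order.

```python
from typing import Dict, List, Tuple


def _members(keys: List[int]) -> Dict[int, List[int]]:
    """Map each key to the original row indices of its group, in row order."""
    members: Dict[int, List[int]] = {}
    for i, k in enumerate(keys):
        members.setdefault(k, []).append(i)
    return members


def head_indices(keys: List[int], n: int = 5) -> Tuple[List[int], List[int]]:
    members = _members(keys)
    unique_keys = sorted(members)
    return unique_keys, [i for k in unique_keys for i in members[k][:n]]


def tail_indices(keys: List[int], n: int = 5) -> Tuple[List[int], List[int]]:
    members = _members(keys)
    unique_keys = sorted(members)
    # Start at max(len - n, 0) so that n == 0 selects nothing.
    return unique_keys, [i for k in unique_keys
                         for i in members[k][max(len(members[k]) - n, 0):]]


a = [i % 3 for i in range(10)]
v2 = [-2 * i for i in range(10)]
ukeys, idx = head_indices(a, 2)
print(ukeys, idx)            # [0, 1, 2] [0, 3, 1, 4, 2, 5]
print([v2[i] for i in idx])  # [0, -6, -2, -8, -4, -10]
print(tail_indices(a, 2))    # ([0, 1, 2], [6, 9, 4, 7, 5, 8])
```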

is_registered() → bool

Return True if the object is contained in the registry

Returns: Indicates if the object is contained in the registry
Return type: bool
Raises: RegistrationError – Raised if there’s a server-side error or a mismatch of registered components

Notes

Objects registered with the server are immune to deletion until they are unregistered.

max(values: pdarray, skipna: bool = True) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and return the maximum of each group’s values.

Parameters:

values (pdarray) – The values to group and find maxima
skipna (bool) – If True, NaN values are skipped

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_maxima (pdarray) – One maximum per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object or if max is not supported for the values dtype
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if max is not supported for the values dtype

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.max(b)
(array([2, 3, 4]), array([4, 4, 3]))

mean(values: pdarray, skipna: bool = True) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and compute the mean of each group’s values.

Parameters:

values (pdarray) – The values to group and average
skipna (bool) – If True, NaN values are skipped

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_means (pdarray, float64) – One mean value per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.mean(b)
(array([2, 3, 4]), array([2.6666666666666665, 2.7999999999999998, 3]))

median(values: pdarray, skipna: bool = True) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and compute the median of each group’s values.

Parameters:

values (pdarray) – The values to group and find median
skipna (bool) – If True, NaN values are skipped

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_medians (pdarray, float64) – One median value per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,9)
>>> a
array([4 1 4 3 2 2 2 3 3])
>>> g = ak.GroupBy(a)
>>> g.keys
array([4 1 4 3 2 2 2 3 3])
>>> b = ak.linspace(-5,5,9)
>>> b
array([-5 -3.75 -2.5 -1.25 0 1.25 2.5 3.75 5])
>>> g.median(b)
(array([1 2 3 4]), array([-3.75 1.25 3.75 -3.75]))
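The median example can be reproduced with Python's statistics module. `grouped_median` is a hypothetical helper. The float() cast mirrors the note that results are always float64.

```python
import statistics
from typing import Dict, List, Tuple


def grouped_median(keys: List[int], values: List[float]) -> Tuple[List[int], List[float]]:
    """Median of each key group, in sorted key order."""
    groups: Dict[int, List[float]] = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    # statistics.median averages the two middle values for even-sized groups.
    return unique_keys, [float(statistics.median(groups[k])) for k in unique_keys]


a = [4, 1, 4, 3, 2, 2, 2, 3, 3]
b = [-5 + 1.25 * i for i in range(9)]  # same values as ak.linspace(-5, 5, 9)
print(grouped_median(a, b))  # ([1, 2, 3, 4], [-3.75, 1.25, 3.75, -3.75])
```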

min(values: pdarray, skipna: bool = True) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and return the minimum of each group’s values.

Parameters:

values (pdarray) – The values to group and find minima
skipna (bool) – If True, NaN values are skipped

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_minima (pdarray) – One minimum per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object or if min is not supported for the values dtype
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if min is not supported for the values dtype

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.min(b)
(array([2, 3, 4]), array([1, 1, 3]))

mode(values: groupable) → Tuple[groupable, groupable]

Most common value in each group. If a group is multi-modal, return the modal value that occurs first.

Parameters:

values ((list of) pdarray-like) – The values from which to take the mode of each group

Returns:

unique_keys ((list of) pdarray-like) – The unique keys, in grouped order
result ((list of) pdarray-like) – The most common value of each group

most_common(values): (Deprecated) See GroupBy.mode().

nunique(values: groupable) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and return the number of unique values in each group.

Parameters:

values (pdarray, int64) – The values to group and find unique values

Returns:

unique_keys (groupable) – The unique keys, in grouped order
group_nunique (groupable) – Number of unique values per unique key in the GroupBy instance

Raises:

TypeError – Raised if the dtype(s) of the values array(s) do not support the nunique method
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if nunique is not supported for the values dtype

Examples

>>> data = ak.array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> data
array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> labels = ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> labels
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g = ak.GroupBy(labels)
>>> g.keys
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g.nunique(data)
(array([1, 2, 3, 4]), array([2, 2, 3, 1]))
#    Group (1,1,1) has values [3,4,3] -> there are 2 unique values 3&4
#    Group (2,2,2) has values [1,1,4] -> 2 unique values 1&4
#    Group (3,3,3) has values [3,4,1] -> 3 unique values
#    Group (4) has values [4] -> 1 unique value
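nunique amounts to counting the distinct values collected for each group. The pure-Python sketch below reproduces the example above using a hypothetical `grouped_nunique` helper.

```python
from typing import Dict, Hashable, List, Set, Tuple


def grouped_nunique(keys: List[int], values: List[Hashable]) -> Tuple[List[int], List[int]]:
    """Number of distinct values in each key group, in sorted key order."""
    seen: Dict[int, Set[Hashable]] = {}
    for k, v in zip(keys, values):
        seen.setdefault(k, set()).add(v)
    unique_keys = sorted(seen)
    return unique_keys, [len(seen[k]) for k in unique_keys]


data = [3, 4, 3, 1, 1, 4, 3, 4, 1, 4]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
print(grouped_nunique(labels, data))  # ([1, 2, 3, 4], [2, 2, 3, 1])
```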

objType

A str class attribute naming the object type.

prod(values: pdarray, skipna: bool = True) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and compute the product of each group’s values.

Parameters:

values (pdarray) – The values to group and multiply
skipna (bool) – If True, NaN values are skipped

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_products (pdarray, float64) – One product per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array
RuntimeError – Raised if prod is not supported for the values dtype

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.prod(b)
(array([2, 3, 4]), array([12, 108.00000000000003, 8.9999999999999982]))

register(user_defined_name: str) → GroupBy

Parameters:

user_defined_name (str) – User-defined name under which the GroupBy is to be registered; this will be the root name for underlying components

Returns:

The same GroupBy, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that you cannot register two different GroupBys with the same name.

Return type:

GroupBy

Raises:

TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the GroupBy with the user_defined_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

sample(values: groupable, n=None, frac=None, replace=False, weights=None, random_state=None, return_indices=False, permute_samples=False)

Return a random sample from each group. You can specify either the number of elements or the fraction of elements to be sampled. random_state can be used for reproducibility.

Parameters:

values ((list of) pdarray-like) – The values from which to sample, according to their group membership.
n (int, optional) – Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.
frac (float, optional) – Fraction of items to return. Cannot be used with n.
replace (bool, default False) – Allow or disallow sampling of the same value more than once.
weights (pdarray, optional) – Default None results in equal probability weighting. If passed a pdarray, then values must have the same length as the groupby keys and will be used as sampling probabilities after normalization within each group. Weights must be non-negative with at least one positive element within each group.
random_state (int or ak.random.Generator, optional) – If int, seed for random number generator. If ak.random.Generator, use as given.
return_indices (bool, default False) – If True, return the indices of the sampled values. Otherwise, return the sample values.
permute_samples (bool, default False) – If True, permute the samples according to group; otherwise, keep samples in original order.

Returns:

if return_indices is True, return the indices of the sampled values. Otherwise, return the sample values.

size() → Tuple[groupable, pdarray]

Count the number of elements in each group, i.e. the number of times each key appears. This counts the total number of rows (including NaN values).

Parameters:

none

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
counts (pdarray, int64) – The number of times each unique key appears

See also

count

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 2, 3, 1, 2, 4, 3, 4, 3, 4])
>>> g = ak.GroupBy(a)
>>> keys,counts = g.size()
>>> keys
array([1, 2, 3, 4])
>>> counts
array([1, 2, 4, 3])

std(values: pdarray, skipna: bool = True, ddof: int_scalars = 1) → Tuple[groupable, pdarray]

Using the permutation stored in the GroupBy instance, group another array of values and compute the standard deviation of each group’s values.

Parameters:

values (pdarray) – The values to group and find standard deviation
skipna (bool) – If True, NaN values are skipped
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_stds (pdarray, float64) – One std value per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)).

The average squared deviation is normally calculated as ((x - x.mean())**2).sum() / N, where N = len(x). This method instead divides by N - ddof, where ddof defaults to 1. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.std(b)
(array([2 3 4]), array([1.5275252316519465 1.0954451150103321 0]))
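The ddof rule in the Notes above can be written out directly. The sketch below uses hypothetical `std` and `grouped_std` helpers. They divide the sum of squared deviations by N - ddof and reproduce the example values. Returning NaN when N - ddof <= 0 is an assumption about the degenerate case.

```python
import math
from typing import Dict, List, Sequence, Tuple


def std(xs: Sequence[float], ddof: int = 1) -> float:
    """Standard deviation with divisor N - ddof."""
    n = len(xs)
    if n - ddof <= 0:
        return math.nan  # assumption: undefined, report NaN
    mean = sum(xs) / n
    squared_dev = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(squared_dev / (n - ddof))


def grouped_std(keys: List[int], values: List[float],
                ddof: int = 1) -> Tuple[List[int], List[float]]:
    """Per-group standard deviation, in sorted key order."""
    groups: Dict[int, List[float]] = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    return unique_keys, [std(groups[k], ddof) for k in unique_keys]


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
print(grouped_std(a, b))  # approximately ([2, 3, 4], [1.5275, 1.0954, 0.0])
```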

sum(values: pdarray, skipna: bool = True) → Tuple[groupable, pdarray][source]¶

Using the permutation stored in the GroupBy instance, group another array of values and sum each group’s values.

Parameters:

values (pdarray) – The values to group and sum
skipna (bool) – Whether NaN values should be skipped

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_sums (pdarray) – One sum per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The grouped sum of a boolean pdarray returns integers.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.sum(b)
(array([2, 3, 4]), array([8, 14, 6]))

tail(values: groupable_element_type, n: int = 5, return_indices: bool = True) → Tuple[groupable, groupable_element_type][source]¶

Return the last n values from each group.

Parameters:

values ((list of) pdarray-like) – The values from which to select, according to their group membership.
n (int, optional, default = 5) – Maximum number of items to return for each group. If the number of values in a group is less than n, all the values from that group will be returned.
return_indices (bool, default True) – If True, return the indices of the sampled values. Otherwise, return the selected values.

Returns:

unique_keys ((list of) pdarray-like) – The unique keys, in grouped order
result (pdarray-like) – The last n items of each group. If return_indices is True, the result contains indices; otherwise, it contains values.

Examples

>>> a = ak.arange(10) % 3
>>> a
array([0 1 2 0 1 2 0 1 2 0])
>>> v = ak.arange(10)
>>> v
array([0 1 2 3 4 5 6 7 8 9])
>>> g = ak.GroupBy(a)
>>> unique_keys, idx = g.tail(v, 2, return_indices=True)
>>> _, values = g.tail(v, 2, return_indices=False)
>>> unique_keys
array([0 1 2])
>>> idx
array([6 9 4 7 5 8])
>>> values
array([6 9 4 7 5 8])

>>> v2 = -2 * ak.arange(10)
>>> v2
array([0 -2 -4 -6 -8 -10 -12 -14 -16 -18])
>>> _, idx2 = g.tail(v2, 2, return_indices=True)
>>> _, values2 = g.tail(v2, 2, return_indices=False)
>>> idx2
array([6 9 4 7 5 8])
>>> values2
array([-12 -18 -8 -14 -10 -16])
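The per-group selection can be sketched in plain Python. This is a hypothetical helper named `grouped_tail`, not arkouda code. It collects each key's positions in their original order, keeps the last n, and then returns either those positions or the values at them. It assumes sorted unique keys, which matches the output above.

```python
def grouped_tail(keys, values, n=5, return_indices=True):
    """Hypothetical helper: last n positions (or values) of each group."""
    positions = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)
    unique_keys = sorted(positions)
    result = []
    for k in unique_keys:
        idx = positions[k]
        # Keeps all of idx when the group has fewer than n members.
        last = idx[max(len(idx) - n, 0):]
        result.extend(last if return_indices else [values[i] for i in last])
    return unique_keys, result


a = [i % 3 for i in range(10)]
v = list(range(10))
v2 = [-2 * i for i in range(10)]
print(grouped_tail(a, v, 2))
print(grouped_tail(a, v2, 2, return_indices=False))
```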

to_hdf(prefix_path, dataset='groupby', mode='truncate', file_type='distribute')[source]¶

Save the GroupBy to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files will share
dataset (str) – Name prefix for saved data within the HDF5 file
mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, add data as a new column to existing files.
file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This option applies only to HDF5 files and has no effect on Parquet files.

Returns:

None

Notes

GroupBy is not currently supported by Parquet.

unique(values: groupable)[source]¶

Return the set of unique values in each group, as a SegArray.

Parameters:

values ((list of) pdarray-like) – The values whose distinct elements are collected for each group

Returns:

unique_keys ((list of) pdarray-like) – The unique keys, in grouped order
result ((list of) SegArray) – The unique values of each group

Raises:

TypeError – Raised if values is or contains Strings or Categorical
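As an illustration of what "unique values in each group" means, here is a plain-Python sketch that returns a list of lists in place of a SegArray. `grouped_unique` is a hypothetical name. Sorting the keys and each group's values is a choice made for this sketch and is not a documented guarantee.

```python
def grouped_unique(keys, values):
    """Hypothetical helper: distinct values per key, as a list of lists."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, set()).add(v)
    unique_keys = sorted(groups)
    # Sorting each group's values is a choice made for this sketch.
    return unique_keys, [sorted(groups[k]) for k in unique_keys]


print(grouped_unique([1, 1, 2, 2, 2], [5, 5, 7, 6, 7]))
```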

unregister()[source]¶

Unregister this GroupBy object in the arkouda server which was previously registered using register() and/or attached to using attach()

Raises:: RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.

unregister_groupby_by_name(user_defined_name: str) → None[source]¶

Function to unregister GroupBy object by name which was registered with the arkouda server via register()

Parameters:

user_defined_name (str) – Name under which the GroupBy object was registered

Raises:

TypeError – if user_defined_name is not a string
RegistrationError – if there is an issue attempting to unregister any underlying components

update_hdf(prefix_path: str, dataset: str = 'groupby', repack: bool = True)[source]¶

var(values: pdarray, skipna: bool = True, ddof: int_scalars = 1) → Tuple[groupable, pdarray][source]¶

Using the permutation stored in the GroupBy instance, group another array of values and compute the variance of each group’s values.

Parameters:

values (pdarray) – The values to group and find variance
skipna (bool) – Whether NaN values should be skipped
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var

Returns:

unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order
group_vars (pdarray, float64) – One var value per unique key in the GroupBy instance

Raises:

TypeError – Raised if the values array is not a pdarray object
ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

The variance is the average of the squared deviations from the mean, i.e., var = mean((x - x.mean())**2).

The mean of the squared deviations is calculated as x.sum() / (N - ddof), where x = (a - a.mean())**2 for a group's values a and N = len(a). With ddof=0 the divisor is simply N; the default here is ddof=1. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.var(b)
(array([2 3 4]), array([2.333333333333333 1.2 0]))
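You can check the effect of ddof outside arkouda with Python's standard `statistics` module. Its `variance` divides by N - 1 (like ddof=1), and its `pvariance` divides by N (like ddof=0). The sketch below takes the values of key 3 from the example above and compares the two.

```python
import statistics

a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
group_3 = [bv for av, bv in zip(a, b) if av == 3]  # [3, 3, 4, 1, 3]

sample_var = statistics.variance(group_3)       # divisor N - 1, like ddof=1
population_var = statistics.pvariance(group_3)  # divisor N, like ddof=0
print(sample_var, population_var)
```

The sample variance matches the 1.2 reported above for key 3. The population variance is smaller, 0.96, because it divides the same sum of squared deviations (4.8) by 5 instead of 4.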

class arkouda.util.IPv4(values)[source]¶

Bases: arkouda.numpy.pdarrayclass.pdarray

Represent integers as IPv4 addresses.

Parameters:: values (pdarray, int64) – The integer IP addresses
Returns:: The same IP addresses
Return type:: IPv4

Notes

This class is a thin wrapper around pdarray that mostly affects how values are displayed to the user. Operators and methods will typically treat this class like an int64 pdarray.

export_uint()[source]¶

format(x)[source]¶: Format a single integer IP address as a string.

normalize(x)[source]¶: Take in an IP address as a string, integer, or IPAddress object, and convert it to an integer.
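The scalar conversions behind format and normalize can be approximated with Python's standard `ipaddress` module. This is a sketch of the conversion only, not arkouda's code. `format_ip` and `normalize_ip` are hypothetical names, and `ipaddress.IPv4Address` stands in for the "IPAddress object" mentioned above. The sketch handles one value at a time, whereas the IPv4 class works on whole pdarrays.

```python
import ipaddress


def format_ip(x: int) -> str:
    """Render a 32-bit integer as a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(x))


def normalize_ip(x) -> int:
    """Convert a dotted-quad string, integer, or IPv4Address to an integer."""
    if isinstance(x, ipaddress.IPv4Address):
        return int(x)
    if isinstance(x, str):
        return int(ipaddress.IPv4Address(x))
    return int(x)


print(format_ip(3232235777))       # 192.168.1.1
print(normalize_ip("10.0.0.1"))    # 167772161
```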

opeq(other, op)[source]¶

register(user_defined_name)[source]¶

Parameters:

user_defined_name (str) – user defined name the IPv4 is to be registered under, this will be the root name for underlying components

Returns:

The same IPv4 which is now registered with the arkouda server and has an updated name. This is an in-place modification, the original is returned to support a fluid programming style. Please note you cannot register two different IPv4s with the same name.

Return type:

IPv4

Raises:

TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the IPv4 with the user_defined_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

special_objType = 'IPv4'¶

to_hdf(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', file_type: str = 'distribute')[source]¶: Override of the pdarray to_hdf to store the special object type

to_list()[source]¶: Export array as a list of integers.

to_ndarray()[source]¶: Export array as a numpy array of integers.

update_hdf(prefix_path: str, dataset: str = 'array', repack: bool = True)[source]¶: Override the pdarray implementation so that the special object type will be used.

values¶

arkouda.util.Index¶

class arkouda.util.SegArray(segments, values, lengths=None, grouping=None)[source]¶

AND(x=None)[source]¶

OR(x=None)[source]¶

XOR(x=None)[source]¶

aggregate(op, x=None)[source]¶

all(x=None)[source]¶

any(x=None)[source]¶

append(other, axis=0)[source]¶

Append other to self, either vertically (axis=0, length of resulting SegArray increases), or horizontally (axis=1, each sub-array of other appends to the corresponding sub-array of self).

Parameters:

other (SegArray) – Array of sub-arrays to append
axis (0 or 1) – Whether to append vertically (0) or horizontally (1). If axis=1, other must be the same size as self.

Returns:

axis=0: New SegArray containing all sub-arrays of self followed by those of other
axis=1: New SegArray of the same length, with pairs of sub-arrays concatenated

Return type:

SegArray

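The difference between the two axes can be sketched on plain lists of lists. `append_subarrays` is a hypothetical helper that mirrors the behavior described above. It does not call arkouda.

```python
def append_subarrays(self_rows, other_rows, axis=0):
    """Hypothetical list-of-lists analogue of SegArray.append."""
    if axis == 0:
        # Vertical: the result has len(self_rows) + len(other_rows) sub-arrays.
        return [list(r) for r in self_rows] + [list(r) for r in other_rows]
    if axis == 1:
        # Horizontal: pairwise concatenation, so both sides need the same length.
        if len(self_rows) != len(other_rows):
            raise ValueError("axis=1 requires the same number of sub-arrays")
        return [list(x) + list(y) for x, y in zip(self_rows, other_rows)]
    raise ValueError("axis must be 0 or 1")


a = [[1, 2], [3]]
b = [[4], [5, 6]]
print(append_subarrays(a, b, axis=0))  # [[1, 2], [3], [4], [5, 6]]
print(append_subarrays(a, b, axis=1))  # [[1, 2, 4], [3, 5, 6]]
```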
append_single(x, prepend=False)[source]¶

Append a single value to each sub-array.

Parameters:: x (pdarray or scalar) – Single value to append to each sub-array
Returns:: Copy of original SegArray with values from x appended to each sub-array
Return type:: SegArray

argmax(x=None)[source]¶

argmin(x=None)[source]¶

classmethod attach(user_defined_name)[source]¶

Using the defined name, attach to a SegArray that has been registered to the Symbol Table

Parameters:: user_defined_name (str) – user defined name which the SegArray object was registered under
Returns:: The resulting SegArray
Return type:: SegArray
Raises:: RuntimeError – Raised if the server could not attach to the SegArray object

classmethod concat(x, axis=0, ordered=True)[source]¶

Concatenate a sequence of SegArrays

Parameters:

x (sequence of SegArray) – The SegArrays to concatenate
axis (0 or 1) – Select vertical (0) or horizontal (1) concatenation. If axis=1, all SegArrays must have same size.
ordered (bool) – Must be True. This option is present for compatibility only, because unordered concatenation is not yet supported.

Returns:

The input arrays joined into one SegArray

Return type:

SegArray

copy()[source]¶: Return a deep copy.

dtype¶

filter(filter, discard_empty: bool = False)[source]¶

Filter values out of the SegArray object

Parameters:

filter (pdarray, list, or value) – The value(s) to be filtered out of the SegArray
discard_empty (bool) – Defaults to False. When True, empty segments are removed from the returned SegArray

Return type:

SegArray

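The interaction between filtering and discard_empty can be sketched on a list of lists. `filter_subarrays` is a hypothetical helper. Here a Python list, tuple, or set stands in for a pdarray of values to remove.

```python
def filter_subarrays(rows, to_remove, discard_empty=False):
    """Hypothetical list-of-lists analogue of SegArray.filter."""
    # Accept a single value or a collection of values to drop.
    if isinstance(to_remove, (list, tuple, set)):
        remove = set(to_remove)
    else:
        remove = {to_remove}
    result = [[x for x in row if x not in remove] for row in rows]
    if discard_empty:
        result = [row for row in result if row]
    return result


rows = [[1, 2, 3], [2, 2], [4]]
print(filter_subarrays(rows, 2))                             # [[1, 3], [], [4]]
print(filter_subarrays(rows, [2, 4], discard_empty=True))    # [[1, 3]]
```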
classmethod from_multi_array(m)[source]¶

Construct a SegArray from a list of columns. This essentially transposes the input, resulting in an array of rows.

Parameters:: m (list of pdarray or Strings) – List of columns, the rows of which will form the sub-arrays of the output
Returns:: Array of rows of input
Return type:: SegArray

classmethod from_parts(segments, values, lengths=None, grouping=None) → SegArray[source]¶

DEPRECATED: Construct a SegArray object from its parts

Parameters:

segments (pdarray, int64) – Start index of each sub-array in the flattened values array
values (pdarray) – The flattened values of all sub-arrays
lengths (pdarray) – The length of each segment
grouping (GroupBy) – grouping of segments

Returns:

Data structure representing an array whose elements are variable-length arrays.

Return type:

SegArray

Notes

Keyword args ‘lengths’ and ‘grouping’ are not user-facing. They are used by the attach method.
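The segments/values representation can be illustrated in plain Python. Sub-array i spans values[segments[i]:segments[i+1]], and the last sub-array runs to the end of values. Equal consecutive offsets therefore produce empty sub-arrays. `split_segments` and `lengths_from_segments` are hypothetical helpers, not part of arkouda.

```python
def split_segments(segments, values):
    """Rebuild sub-arrays from start offsets into a flat values list."""
    ends = list(segments[1:]) + [len(values)]
    return [list(values[start:end]) for start, end in zip(segments, ends)]


def lengths_from_segments(segments, n_values):
    """Length of each sub-array, derived from consecutive offsets."""
    ends = list(segments[1:]) + [n_values]
    return [end - start for start, end in zip(segments, ends)]


print(split_segments([0, 4, 7], list(range(12))))
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
print(lengths_from_segments([0, 4, 7], 12))  # [4, 3, 5]
```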

classmethod from_return_msg(rep_msg) → SegArray[source]¶

get_jth(j, return_origins=True, compressed=False, default=0)[source]¶

Select the j-th element of each sub-array, where possible.

Parameters:

j (int) – The index of the value to get from each sub-array. If j is negative, it counts backwards from the end of each sub-array.
return_origins (bool) – If True, return a logical index indicating where j is in bounds
compressed (bool) – If False, return array is same size as self, with default value where j is out of bounds. If True, the return array only contains values where j is in bounds.
default (scalar) – When compressed=False, the value to return when j is out of bounds for the sub-array

Returns:

val (pdarray) – compressed=False: The j-th value of each sub-array where j is in bounds and the default value where j is out of bounds. compressed=True: The j-th values of only the sub-arrays where j is in bounds
origin_indices (pdarray, bool) – A Boolean array that is True where j is in bounds for the sub-array.

Notes

If values are Strings, only the compressed format is supported.
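Here is a sketch of the bounds logic, assuming Python-style negative indexing as the j parameter describes. `get_jth` is a hypothetical list-of-lists helper that mirrors the compressed and default options, not the SegArray method itself.

```python
def get_jth(rows, j, compressed=False, default=0):
    """Hypothetical list-of-lists analogue of SegArray.get_jth."""
    values, origins = [], []
    for row in rows:
        in_bounds = -len(row) <= j < len(row)  # negative j counts from the end
        origins.append(in_bounds)
        if in_bounds:
            values.append(row[j])
        elif not compressed:
            values.append(default)
    return values, origins


rows = [[1, 2, 3], [4], [5, 6]]
print(get_jth(rows, 1))                   # ([2, 0, 6], [True, False, True])
print(get_jth(rows, 1, compressed=True))  # ([2, 6], [True, False, True])
```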

get_length_n(n, return_origins=True)[source]¶

Return all sub-arrays of length n, as a list of columns.

Parameters:

n (int) – Length of sub-arrays to select
return_origins (bool) – Return a logical index indicating which sub-arrays are length n

Returns:

columns (list of pdarray) – An n-long list of pdarray, where each row is one of the n-long sub-arrays from the SegArray. The number of rows is the number of True values in the returned mask.
origin_indices (pdarray, bool) – Array of bool for each element of the SegArray, True where sub-array has length n.
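The "list of columns" output is a transpose of the selected rows. `get_length_n` below is a hypothetical list-based sketch of that shape and does not use arkouda.

```python
def get_length_n(rows, n):
    """Hypothetical analogue of SegArray.get_length_n: columns of the n-long rows."""
    mask = [len(row) == n for row in rows]
    selected = [row for row, keep in zip(rows, mask) if keep]
    # Transpose: column c holds the c-th element of every selected row.
    columns = [[row[c] for row in selected] for c in range(n)]
    return columns, mask


print(get_length_n([[1, 2], [3, 4, 5], [6, 7]], 2))
# ([[1, 6], [2, 7]], [True, False, True])
```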

get_ngrams(n, return_origins=True)[source]¶

Return all n-grams from all sub-arrays.

Parameters:

n (int) – Length of n-gram
return_origins (bool) – If True, return an int64 array indicating which sub-array each returned n-gram came from.

Returns:

ngrams (list of pdarray) – An n-long list of pdarrays, essentially a table where each row is an n-gram.
origin_indices (pdarray, int) – The index of the sub-array from which the corresponding n-gram originated
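A sub-array of length L yields L - n + 1 n-grams, or none if L < n. The hypothetical sketch below emits them sub-array by sub-array as n columns plus origin indices. That ordering is a choice made for this illustration, not a documented property of arkouda.

```python
def get_ngrams(rows, n):
    """Hypothetical analogue of SegArray.get_ngrams, returning columns plus origins."""
    columns = [[] for _ in range(n)]
    origins = []
    for origin, row in enumerate(rows):
        # range() is empty when len(row) < n, so short rows contribute nothing.
        for start in range(len(row) - n + 1):
            for c in range(n):
                columns[c].append(row[start + c])
            origins.append(origin)
    return columns, origins


print(get_ngrams([[1, 2, 3], [4, 5], [6]], 2))
# ([[1, 2, 4], [2, 3, 5]], [0, 0, 1])
```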

get_prefixes(n, return_origins=True, proper=True)[source]¶

Return all sub-array prefixes of length n (for sub-arrays that are at least n+1 long)

Parameters:

n (int) – Length of prefix
return_origins (bool) – If True, return a logical index indicating which sub-arrays were long enough to return an n-prefix
proper (bool) – If True, only return proper prefixes, i.e. from sub-arrays that are at least n+1 long. If False, allow the entire sub-array to be returned as a prefix.

Returns:

prefixes (list of pdarray) – An n-long list of pdarrays, essentially a table where each row is an n-prefix. The number of rows is the number of True values in the returned mask.
origin_indices (pdarray, bool) – Boolean array that is True where the sub-array was long enough to return an n-prefix, False otherwise.

get_suffixes(n, return_origins=True, proper=True)[source]¶

Return the n-long suffix of each sub-array, where possible

Parameters:

n (int) – Length of suffix
return_origins (bool) – If True, return a logical index indicating which sub-arrays were long enough to return an n-suffix
proper (bool) – If True, only return proper suffixes, i.e. from sub-arrays that are at least n+1 long. If False, allow the entire sub-array to be returned as a suffix.

Returns:

suffixes (list of pdarray) – An n-long list of pdarrays, essentially a table where each row is an n-suffix. The number of rows is the number of True values in the returned mask.
origin_indices (pdarray, bool) – Boolean array that is True where the sub-array was long enough to return an n-suffix, False otherwise.
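The `proper` flag only changes the minimum length: n + 1 when proper=True, and n otherwise. Here is a hypothetical list-based sketch of both methods. It assumes n >= 1 and returns n columns plus the mask.

```python
def _select(rows, n, proper, take):
    """Shared logic: filter by minimum length, slice each row, transpose."""
    min_len = n + 1 if proper else n
    mask = [len(row) >= min_len for row in rows]
    picked = [take(row) for row, keep in zip(rows, mask) if keep]
    columns = [[p[c] for p in picked] for c in range(n)]
    return columns, mask


def get_prefixes(rows, n, proper=True):
    return _select(rows, n, proper, lambda row: row[:n])


def get_suffixes(rows, n, proper=True):
    return _select(rows, n, proper, lambda row: row[len(row) - n:])


rows = [[1, 2, 3], [4, 5], [6]]
print(get_prefixes(rows, 2))                # ([[1], [2]], [True, False, False])
print(get_suffixes(rows, 2, proper=False))  # ([[2, 4], [3, 5]], [True, True, False])
```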

property grouping¶

hash() → Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]¶

Compute a 128-bit hash of each segment.

Returns:: A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.
Return type:: Tuple[pdarray,pdarray]
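To show how a 128-bit hash can be carried in two int64 arrays, here is a sketch that hashes one segment with BLAKE2b from Python's `hashlib` and splits the 16-byte digest into two signed 64-bit halves. BLAKE2b is a stand-in chosen for illustration. arkouda's server uses its own hash function, so these values will not match its output. `hash_segment` and `combine` are hypothetical names.

```python
import hashlib
import struct


def hash_segment(segment):
    """Return a 128-bit digest of an int64 segment as two signed 64-bit halves."""
    data = struct.pack(f"<{len(segment)}q", *segment)
    digest = hashlib.blake2b(data, digest_size=16).digest()
    first, second = struct.unpack("<2q", digest)
    return first, second


def combine(first, second):
    """Reassemble one 128-bit value by concatenating the two halves."""
    mask = (1 << 64) - 1
    return ((first & mask) << 64) | (second & mask)


first, second = hash_segment([1, 2, 3])
print(first, second, combine(first, second))
```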

intersect(other)[source]¶

Computes the intersection of 2 SegArrays.

Parameters:: other (SegArray) – SegArray to compute against
Returns:: Segments are the 1d intersections of the segments of self and other
Return type:: SegArray

See also

pdarraysetops.intersect1d

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.intersect(seg_b)
SegArray([
[1, 3],
[4]
])

is_registered() → bool[source]¶

Checks if the name of the SegArray object is registered in the Symbol Table

Returns:: True if the SegArray is registered, False if not
Return type:: bool

See also

register, unregister, attach

classmethod load(prefix_path, dataset='segarray', segment_name='segments', value_name='values')[source]¶

logger¶

max(x=None)[source]¶

mean(x=None)[source]¶

min(x=None)[source]¶

property nbytes¶

The size of the segarray in bytes.

Returns:: The size of the segarray in bytes.
Return type:: int

property non_empty¶

nunique(x=None)[source]¶

objType = 'SegArray'¶

prepend_single(x)[source]¶

prod(x=None)[source]¶

classmethod read_hdf(prefix_path, dataset='segarray')[source]¶

Load a saved SegArray from HDF5. All arguments must match what was supplied to SegArray.save()

Parameters:

prefix_path (str) – Directory and filename prefix
dataset (str) – Name prefix for saved data within the HDF5 files

Return type:

SegArray

register(user_defined_name)[source]¶

Parameters:: user_defined_name (str) – user defined name which this SegArray object will be registered under
Returns:: The same SegArray which is now registered with the arkouda server and has an updated name. This is an in-place modification, the original is returned to support a fluid programming style. Please note you cannot register two different SegArrays with the same name.
Return type:: SegArray
Raises:: RegistrationError – Raised if the server could not register the SegArray object

Notes

Objects registered with the server are immune to deletion until they are unregistered.

registered_name: str | None = None¶

remove_repeats(return_multiplicity=False)[source]¶

Condense sequences of repeated values within a sub-array to a single value.

Parameters:

return_multiplicity (bool) – If True, also return the number of times each value was repeated.

Returns:

norepeats (SegArray) – Sub-arrays with runs of repeated values replaced with single value
multiplicity (SegArray) – If return_multiplicity=True, this array contains the number of times each value in the returned SegArray was repeated in the original SegArray.
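Collapsing runs of repeated values is what `itertools.groupby` does within a single sequence. The hypothetical sketch below applies it to each sub-array of a list of lists and counts the length of each run.

```python
from itertools import groupby


def remove_repeats(rows, return_multiplicity=False):
    """Hypothetical analogue of SegArray.remove_repeats on a list of lists."""
    norepeats, multiplicity = [], []
    for row in rows:
        # groupby yields one group per run of consecutive equal values.
        runs = [(value, sum(1 for _ in run)) for value, run in groupby(row)]
        norepeats.append([value for value, _ in runs])
        multiplicity.append([count for _, count in runs])
    if return_multiplicity:
        return norepeats, multiplicity
    return norepeats


print(remove_repeats([[1, 1, 2, 2, 2, 1], [3]], return_multiplicity=True))
# ([[1, 2, 1], [3]], [[2, 3, 1], [1]])
```

A value that reappears after a different value starts a new run, which is why 1 appears twice in the first result.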

save(prefix_path, dataset='segarray', mode='truncate', file_type='distribute')[source]¶

DEPRECATED: Save the SegArray to HDF5. The object can be saved to a collection of files or to a single file.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create a new dataset in existing files.
file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This option applies only to HDF5 files and has no effect on Parquet files.

Returns:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the SegArray

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission.
Output files have names of the form <prefix_path>_LOCALE####, numbered by locale, when file_type=’distribute’. Otherwise, the file name will be prefix_path.
If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

See also

to_hdf, load

segments¶

set_jth(i, j, v)[source]¶

Set the j-th element of each sub-array in a subset.

Parameters:

i (pdarray, int) – Indices of sub-arrays to set j-th element
j (int) – Index of value to set in each sub-array. If j is negative, it counts backwards from the end of the sub-array.
v (pdarray or scalar) – The value(s) to set. If v is a pdarray, it must have same length as i.

Raises:

ValueError – If j is out of bounds in any of the sub-arrays specified by i.
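Here is a hypothetical in-place sketch on a list of lists. It checks every selected sub-array before writing anything, so an out-of-bounds j leaves the data untouched. Whether arkouda validates before writing in the same way is an assumption of this sketch.

```python
def set_jth(rows, i, j, v):
    """Hypothetical in-place analogue of SegArray.set_jth on a list of lists."""
    values = list(v) if isinstance(v, (list, tuple)) else [v] * len(i)
    if len(values) != len(i):
        raise ValueError("v must have the same length as i")
    # Check every selected sub-array before modifying anything.
    for idx in i:
        if not -len(rows[idx]) <= j < len(rows[idx]):
            raise ValueError(f"j={j} is out of bounds for sub-array {idx}")
    for idx, value in zip(i, values):
        rows[idx][j] = value


rows = [[1, 2, 3], [4, 5], [6]]
set_jth(rows, [0, 1], -1, 0)
print(rows)  # [[1, 2, 0], [4, 0], [6]]
```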

setdiff(other)[source]¶

Computes the set difference of 2 SegArrays.

Parameters:: other (SegArray) – SegArray to compute against
Returns:: Segments are the 1d set difference of the segments of self and other
Return type:: SegArray

See also

pdarraysetops.setdiff1d

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.setdiff(seg_b)
SegArray([
[2, 4],
[1, 3, 5]
])

setxor(other)[source]¶

Computes the symmetric difference of 2 SegArrays.

Parameters:: other (SegArray) – SegArray to compute against
Returns:: Segments are the 1d symmetric difference of the segments of self and other
Return type:: SegArray

See also

pdarraysetops.setxor1d

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.setxor(seg_b)
SegArray([
[2, 4, 5],
[1, 3, 5, 2]
])

size¶

sum(x=None)[source]¶

to_hdf(prefix_path, dataset='segarray', mode='truncate', file_type='distribute')[source]¶

Save the SegArray to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files will share
dataset (str) – Name prefix for saved data within the HDF5 file
mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, add data as a new column to existing files.
file_type (str ("single" | "distribute")) – Default: “distribute” When set to single, dataset is written to a single file. When distribute, dataset is written on a file per locale. This is only supported by HDF5 files and will have no impact of Parquet Files.

Return type:

None

See also

load

to_list()[source]¶

Convert the segarray into a list containing sub-arrays

Returns:: A list with the same sub-arrays (also list) as this segarray
Return type:: list

See also

to_ndarray

Examples

>>> segarr = ak.SegArray(ak.array([0, 4, 7]), ak.arange(12))
>>> segarr.to_list()
[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
>>> type(segarr.to_list())
<class 'list'>

to_ndarray()[source]¶

Convert the array into a numpy.ndarray containing sub-arrays

Returns:: A numpy ndarray with the same sub-arrays (also numpy.ndarray) as this array
Return type:: np.ndarray

See also

array, to_list

Examples

>>> segarr = ak.SegArray(ak.array([0, 4, 7]), ak.arange(12))
>>> segarr.to_ndarray()
array([array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9, 10, 11])], dtype=object)
>>> type(segarr.to_ndarray())
<class 'numpy.ndarray'>

to_parquet(prefix_path, dataset='segarray', mode: str = 'truncate', compression: str | None = None)[source]¶

Save the SegArray object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the object to its corresponding file.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str) – Deprecated. This parameter is kept to maintain functionality of other calls. Only ‘truncate’ is supported, which overwrites output files if they exist.

Returns:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the SegArray
ValueError – If mode is not ‘truncate’.

Notes

Append mode for Parquet has been deprecated. It was not implemented for SegArray.
The prefix_path must be visible to the arkouda server and the user must have write permission.
Output files have names of the form <prefix_path>_LOCALE####, numbered by locale.
If any of the output files already exist, they will be overwritten.
Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars)[source]¶

Sends a Segmented Array to a different Arkouda server

Parameters:

hostname (str) – The hostname where the Arkouda server intended to receive the Segmented Array is running.
port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). One port is opened per locale, in succession, so the range port..(port + numLocales - 1) is used. For example, for an Arkouda server with 4 nodes, if port 1234 is passed, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data. This port must match the port passed to the call to ak.receive_array().

Return type:

A message indicating a complete transfer

Raises:

ValueError – Raised if the op is not within the pdarray.BinOps set
TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
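The port arithmetic described for the port parameter amounts to one consecutive port per locale, starting at the given port. `transfer_ports` is a hypothetical helper that only shows which ports to keep open. It does not perform a transfer.

```python
def transfer_ports(port: int, num_locales: int) -> list:
    """Ports used when sending from a server with num_locales locales (one per locale)."""
    return list(range(port, port + num_locales))


print(transfer_ports(1234, 4))  # [1234, 1235, 1236, 1237]
```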

union(other)[source]¶

Computes the union of 2 SegArrays.

Parameters:: other (SegArray) – SegArray to compute against
Returns:: Segments are the 1d union of the segments of self and other
Return type:: SegArray

See also

pdarraysetops.union1d

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.union(seg_b)
SegArray([
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]
])
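The four segment-wise set operations (intersect, setdiff, setxor, union) can be sketched with Python sets applied pair by pair. `segment_setop` is a hypothetical helper. It sorts each result, so its ordering within a segment may differ from arkouda's, as the setxor example shows. With the inputs used in these examples it produces the same sets.

```python
def segment_setop(a_rows, b_rows, op):
    """Apply a Python set operation pairwise to sub-arrays, returning sorted lists."""
    if len(a_rows) != len(b_rows):
        raise ValueError("both inputs must have the same number of segments")
    ops = {
        "intersect": lambda x, y: x & y,
        "setdiff": lambda x, y: x - y,
        "setxor": lambda x, y: x ^ y,
        "union": lambda x, y: x | y,
    }
    return [sorted(ops[op](set(x), set(y))) for x, y in zip(a_rows, b_rows)]


seg_a = [[1, 2, 3, 1, 4], [3, 1, 4, 5]]
seg_b = [[1, 3, 3, 5], [2, 2, 4]]
for name in ("intersect", "setdiff", "setxor", "union"):
    print(name, segment_setop(seg_a, seg_b, name))
```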

unique(x=None)[source]¶

Return sub-arrays of unique values.

Parameters:: x (pdarray) – The values to deduplicate within each group. By default, the values of this SegArray’s sub-arrays.
Returns:: Same number of sub-arrays as original SegArray, but elements in sub-array are unique and in sorted order.
Return type:: SegArray

unregister()[source]¶

Unregister this SegArray object in the arkouda server which was previously registered using register() and/or attached to using attach()

Return type:: None
Raises:: RuntimeError – Raised if the server could not unregister the SegArray object from the Symbol Table

Notes

Objects registered with the server are immune to deletion until they are unregistered.

See also

register, attach, is_registered

static unregister_segarray_by_name(user_defined_name)[source]¶

Using the defined name, remove the registered SegArray object from the Symbol Table

Parameters:: user_defined_name (str) – user defined name which the SegArray object was registered under
Return type:: None
Raises:: RuntimeError – Raised if the server could not unregister the SegArray object from the Symbol Table

update_hdf(prefix_path: str, dataset: str = 'segarray', repack: bool = True)[source]¶

Overwrite the dataset with the name provided with this SegArray object. If the dataset does not exist it is added.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files
repack (bool) – Default: True. HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting it to False yields better performance, but causes file sizes to grow.

Return type:

None

Raises:

RuntimeError – Raised if a server-side error is thrown saving the SegArray

Notes

If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.
If the dataset provided does not exist, it will be added
Because HDF5 deletes do not release memory, this will create a copy of the file with the new data

valsize¶

values¶

class arkouda.util.Strings(strings_pdarray: arkouda.numpy.pdarrayclass.pdarray, bytes_size: arkouda.numpy.dtypes.int_scalars)[source]¶

Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions.

entry¶

Encapsulation of a Segmented Strings array contained on the arkouda server. This is a composite of

offsets array: starting indices for each string

bytes array: raw bytes of all strings joined by nulls

Type:: pdarray

size¶

The number of strings in the array

Type:: int_scalars

nbytes¶

The total number of bytes in all strings

Type:: int_scalars

ndim¶

The rank of the array (currently only rank 1 arrays supported)

Type:: int_scalars

shape¶

The sizes of each dimension of the array

Type:: tuple

dtype¶

The dtype is ak.str

Type:: dtype

logger¶

Used for all logging operations

Type:: ArkoudaLogger

Notes

Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string and (2) bytes, which contains the raw bytes of all strings, delimited by nulls.
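The offsets-plus-bytes layout can be illustrated with ordinary Python bytes. This sketch assumes that every string, including the last, is followed by a null byte and encoded as UTF-8. Those are assumptions of the illustration. `to_offsets_and_bytes` and `from_offsets_and_bytes` are hypothetical helpers, not arkouda functions.

```python
def to_offsets_and_bytes(strings):
    """Pack strings into start offsets plus one null-delimited UTF-8 byte buffer."""
    offsets, buf = [], bytearray()
    for s in strings:
        offsets.append(len(buf))
        buf.extend(s.encode("utf-8"))
        buf.append(0)  # null delimiter after every string (assumption)
    return offsets, bytes(buf)


def from_offsets_and_bytes(offsets, buf):
    """Recover the strings; each one ends just before the null byte preceding the next offset."""
    ends = list(offsets[1:]) + [len(buf)]
    return [buf[start:end - 1].decode("utf-8") for start, end in zip(offsets, ends)]


offsets, buf = to_offsets_and_bytes(["ab", "", "c"])
print(offsets, buf)  # [0, 3, 4] b'ab\x00\x00c\x00'
```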

BinOps¶

astype(dtype: numpy.dtype | str) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Cast values of Strings object to provided dtype

Parameters:: dtype (np.dtype or str) – Dtype to cast to
Returns:: An arkouda pdarray with values converted to the specified data type
Return type:: ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is this Strings object.

static attach(user_defined_name: str) → Strings[source]¶

Return a Strings object attached to the name under which it was registered in the arkouda server via register()

Parameters:: user_defined_name (str) – user defined name which the Strings object was registered under
Returns:: the Strings object registered with user_defined_name in the arkouda server
Return type:: Strings object
Raises:: TypeError – Raised if user_defined_name is not a str

See also

register, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

cached_regex_patterns() → List[source]¶: Returns the regex patterns for which Match objects have been cached

capitalize() → Strings[source]¶

Return a new Strings in which each string has its first letter capitalized and the remaining letters lowercased.

Returns:: Strings from the original replaced with the capitalized equivalent.
Return type:: Strings
Raises:: RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper, Strings.title

Examples

>>> strings = ak.array([f'StrINgS aRe Here {i}' for i in range(5)])
>>> strings
array(['StrINgS aRe Here 0', 'StrINgS aRe Here 1', 'StrINgS aRe Here 2', 'StrINgS aRe Here 3', 'StrINgS aRe Here 4'])
>>> strings.capitalize()
array(['Strings are here 0', 'Strings are here 1', 'Strings are here 2', 'Strings are here 3', 'Strings are here 4'])

contains(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Check whether each element contains the given substring.

Parameters:

substr (bytes or str_scalars) – The substring in the form of string or byte array to search for
regex (bool, default=False) – Indicates whether substr is a regular expression Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

True for elements that contain substr, False otherwise

Return type:

pdarray, bool

Raises:

TypeError – Raised if the substr parameter is not bytes or str_scalars
ValueError – Raised if substr is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True True True True True])
>>> strings.contains(r'string \d', regex=True)
array([True True True True True])

decode(fromEncoding: str, toEncoding: str = 'UTF-8') → Strings[source]¶

Return a new Strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding

Parameters:

fromEncoding (str) – The current encoding of the strings object
toEncoding (str, default="UTF-8") – The encoding that the strings will be converted to; defaults to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

dtype¶

encode(toEncoding: str, fromEncoding: str = 'UTF-8') → Strings[source]¶

Return a new strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding

Parameters:

toEncoding (str) – The encoding that the strings will be converted to
fromEncoding (str, default="UTF-8") – The current encoding of the strings object; defaults to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

endswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Check whether each element ends with the given substring.

Parameters:

substr (bytes or str_scalars) – The suffix to search for
regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

True for elements that end with substr, False otherwise

Return type:

pdarray, bool

Raises:

TypeError – Raised if the substr parameter is not bytes or str_scalars
ValueError – Raised if substr is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True True True True True])
>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith(r'ing \d', regex=True)
array([True True True True True])

entry: arkouda.numpy.pdarrayclass.pdarray¶

equals(other: Any) → arkouda.numpy.dtypes.bool_scalars[source]¶

Whether Strings are the same size and all entries are equal.

Parameters:: other (Any) – object to compare.
Returns:: True if the Strings are the same, o.w. False.
Return type:: bool

Examples

>>> import arkouda as ak
>>> ak.connect()
>>> s = ak.array(["a", "b", "c"])
>>> s_cpy = ak.array(["a", "b", "c"])
>>> s.equals(s_cpy)
True
>>> s2 = ak.array(["a", "x", "c"])
>>> s.equals(s2)
False

find_locations(pattern: bytes | arkouda.numpy.dtypes.str_scalars) → Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]¶

Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches

Parameters:

pattern (bytes or str_scalars) – The regex pattern used to find matches

Returns:

pdarray, int64 – For each original string, the number of pattern matches
pdarray, int64 – The start positions of pattern matches
pdarray, int64 – The lengths of pattern matches

Raises:

TypeError – Raised if the pattern parameter is not bytes or str_scalars
ValueError – Raised if pattern is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

See also

Strings.findall, Strings.match

Examples

>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> num_matches, starts, lens = strings.find_locations(r'\d')
>>> num_matches
array([2 2 2 2 2])
>>> starts
array([0 9 0 9 0 9 0 9 0 9])
>>> lens
array([1 1 1 1 1 1 1 1 1 1])
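For readers checking what the three returned arrays contain, the following pure-Python sketch reproduces the example above on a plain list. It is an illustration under stated assumptions, not arkouda's implementation: `find_locations_py` is a hypothetical name, and Python's `re` module stands in for the server's re2 engine (the two agree on simple patterns such as `\d`, but not on every feature). Start positions are relative to each string, and the starts and lengths from all strings are concatenated in order.

```python
import re
from typing import List, Tuple


def find_locations_py(strings: List[str], pattern: str) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical pure-Python analogue of Strings.find_locations."""
    num_matches: List[int] = []  # number of matches per original string
    starts: List[int] = []       # start of every match, flattened across strings
    lens: List[int] = []         # length of every match, flattened across strings
    for s in strings:
        count = 0
        # finditer yields non-overlapping matches from left to right
        for m in re.finditer(pattern, s):
            count += 1
            starts.append(m.start())
            lens.append(m.end() - m.start())
        num_matches.append(count)
    return num_matches, starts, lens


strings = [f'{i} string {i}' for i in range(1, 6)]
num_matches, starts, lens = find_locations_py(strings, r'\d')
print(num_matches)  # [2, 2, 2, 2, 2]
print(starts)       # [0, 9, 0, 9, 0, 9, 0, 9, 0, 9]
print(lens)         # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Because `num_matches` says how many consecutive entries of `starts` and `lens` belong to each string, a running sum over it recovers the per-string boundaries.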

findall(pattern: bytes | arkouda.numpy.dtypes.str_scalars, return_match_origins: bool = False) → Strings | Tuple[source]¶

Return a new Strings containing all non-overlapping matches of pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches
return_match_origins (bool, default=False) – If True, return a pdarray containing the index of the original string each pattern match is from

Returns:

Strings – Strings object containing only pattern matches
pdarray, int64 (optional) – The index of the original string each pattern match is from

Raises:

TypeError – Raised if the pattern parameter is not bytes or str_scalars
ValueError – Raised if pattern is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

See also

Strings.find_locations

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.findall('_+', return_match_origins=True)
(array(['_', '___', '____', '__', '___', '____', '___']), array([0 0 1 3 3 3 3]))
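A minimal pure-Python sketch of the `return_match_origins=True` behavior, again using Python's `re` module as a stand-in for re2; `findall_with_origins` is a hypothetical helper, not part of arkouda. It uses `re.finditer` with `group(0)` rather than `re.findall`, because `re.findall` returns capture groups instead of whole matches when the pattern contains groups.

```python
import re
from typing import List, Tuple


def findall_with_origins(strings: List[str], pattern: str) -> Tuple[List[str], List[int]]:
    """Return every non-overlapping match plus the index of the string it came from."""
    matches: List[str] = []
    origins: List[int] = []
    for idx, s in enumerate(strings):
        for m in re.finditer(pattern, s):
            matches.append(m.group(0))  # the whole match, even if the pattern has groups
            origins.append(idx)         # which original string produced it
    return matches, origins


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
matches, origins = findall_with_origins(strings, '_+')
print(matches)  # ['_', '___', '____', '__', '___', '____', '___']
print(origins)  # [0, 0, 1, 3, 3, 3, 3]
```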

flatten() → Strings[source]¶

Return a copy of the array collapsed into one dimension.

Returns:: A copy of the input array, flattened to one dimension.
Return type:: Strings

Note

As multidimensional Strings are not currently supported, flatten on a Strings object will always return itself.

static from_parts(offset_attrib: arkouda.numpy.pdarrayclass.pdarray | str, bytes_attrib: arkouda.numpy.pdarrayclass.pdarray | str) → Strings[source]¶

Factory method for creating a Strings object from an Arkouda server response where the arrays are separate components.

Parameters:

offset_attrib (pdarray or str) – the array containing the offsets
bytes_attrib (pdarray or str) – the array containing the string values

Returns:

object representing a segmented strings array on the server

Return type:

Strings

Raises:

RuntimeError – Raised if there’s an error converting a server-returned str-descriptor

Notes

This factory method is used when we construct the parts of a Strings object on the client side and transfer the offsets and bytes separately to the server. This results in two entries in the symbol table, and we need to instruct the server to assemble them into a composite entity.

static from_return_msg(rep_msg: str) → Strings[source]¶

Factory method for creating a Strings object from an Arkouda server response message

Parameters:: rep_msg (str) – Server response message currently of form created name type size ndim shape itemsize+created bytes.size 1234
Returns:: object representing a segmented strings array on the server
Return type:: Strings
Raises:: RuntimeError – Raised if there’s an error converting a server-returned str-descriptor

Notes

We really don’t have an itemsize because these are variable length strings. In the future we could probably use this position to store the total bytes.

fullmatch(pattern: bytes | arkouda.numpy.dtypes.str_scalars) → arkouda.match.Match[source]¶

Returns a match object where elements match only if the whole string matches the regular expression pattern

Parameters:: pattern (bytes or str_scalars) – Regex used to find matches
Returns:: Match object where elements match only if the whole string matches the regular expression pattern
Return type:: Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.fullmatch('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=False; matched=False>

get_bytes() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Getter for the bytes component (uint8 pdarray) of this Strings.

Returns:: Pdarray of bytes of the string accessed
Return type:: pdarray, uint8

Example

>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_bytes()
[111 110 101 0 116 119 111 0 116 104 114 101 101 0]

get_lengths() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Return the length of each string in the array.

Returns:: The length of each string
Return type:: pdarray, int
Raises:: RuntimeError – Raised if there is a server-side error thrown

get_offsets() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Getter for the offsets component (int64 pdarray) of this Strings.

Returns:: Pdarray of offsets of the string accessed
Return type:: pdarray, int64

Example

>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_offsets()
[0 4 8]
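The two getters above expose the same underlying layout. Judging from these examples, each string is stored as its bytes followed by a null terminator (0), all strings are concatenated into one uint8 buffer, and each offset is the index where a string begins. The sketch below rebuilds that layout in pure Python under that assumption; `to_parts` and `byte_lengths` are hypothetical helpers, and UTF-8 encoding of non-ASCII text is also an assumption.

```python
from typing import List, Tuple


def to_parts(strings: List[str]) -> Tuple[List[int], List[int]]:
    """Concatenate null-terminated UTF-8 strings; return (bytes, offsets)."""
    data: List[int] = []
    offsets: List[int] = []
    for s in strings:
        offsets.append(len(data))       # where this string starts in the buffer
        data.extend(s.encode('utf-8'))  # the string's own bytes
        data.append(0)                  # null terminator
    return data, offsets


def byte_lengths(offsets: List[int], total_bytes: int) -> List[int]:
    """Length of each string in bytes, excluding its null terminator."""
    ends = offsets[1:] + [total_bytes]
    return [end - start - 1 for start, end in zip(offsets, ends)]


data, offsets = to_parts(['one', 'two', 'three'])
print(data)     # [111, 110, 101, 0, 116, 119, 111, 0, 116, 104, 114, 101, 101, 0]
print(offsets)  # [0, 4, 8]
print(byte_lengths(offsets, len(data)))  # [3, 3, 5]
```

For ASCII input like this, byte lengths equal the character counts that get_lengths reports; the sketch makes no claim about how arkouda counts multi-byte characters.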

get_prefixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) → Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]¶

Return the n-long prefix of each string, where possible

Parameters:

n (int_scalars) – Length of prefix
return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-prefix
proper (bool, default=True) – If True, only return proper prefixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a prefix.

Returns:

prefixes (Strings) – The array of n-character prefixes; the number of elements is the number of True values in the returned mask.
origin_indices (pdarray, bool) – Boolean array that is True where the string was long enough to return an n-character prefix, False otherwise.

get_suffixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) → Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]¶

Return the n-long suffix of each string, where possible

Parameters:

n (int_scalars) – Length of suffix
return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-suffix
proper (bool, default=True) – If True, only return proper suffixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a suffix.

Returns:

suffixes (Strings) – The array of n-character suffixes; the number of elements is the number of True values in the returned mask.
origin_indices (pdarray, bool) – Boolean array that is True where the string was long enough to return an n-character suffix, False otherwise.
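The interaction between `n`, `proper`, and the returned mask can be sketched in pure Python. `get_affixes` is a hypothetical helper covering both get_prefixes and get_suffixes through a `suffix` flag; it assumes, as the parameter descriptions state, that a proper affix requires a string at least n+1 characters long and that only strings passing the length check contribute an element to the result.

```python
from typing import List, Tuple


def get_affixes(strings: List[str], n: int, proper: bool = True,
                suffix: bool = False) -> Tuple[List[str], List[bool]]:
    """Hypothetical analogue of get_prefixes/get_suffixes with return_origins=True."""
    # A proper affix must leave at least one character of the string behind
    min_len = n + 1 if proper else n
    mask = [len(s) >= min_len for s in strings]
    # s[len(s) - n:] is used instead of s[-n:] so that n == 0 yields ''
    affixes = [s[len(s) - n:] if suffix else s[:n]
               for s, ok in zip(strings, mask) if ok]
    return affixes, mask


words = ['a', 'ab', 'abc']
print(get_affixes(words, 2))                             # (['ab'], [False, False, True])
print(get_affixes(words, 2, proper=False))               # (['ab', 'ab'], [False, True, True])
print(get_affixes(words, 2, proper=False, suffix=True))  # (['ab', 'bc'], [False, True, True])
```

As in the documented return values, the number of affixes always equals the number of True entries in the mask.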

group() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.

Returns:: The permutation that groups the array by value
Return type:: pdarray

See also

GroupBy, unique

Notes

If the arkouda server is compiled with “-sSegmentedString.useHash=true”, then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the “useHash” parameter is false, then a full sort is performed.

Raises:: RuntimeError – Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message
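The hash-based mode described in the Notes only promises contiguous blocks, not sorted order. The pure-Python sketch below shows the difference; a Python dict plays the role of the hash table, which is a simplification of arkouda's distributed 128-bit hashing, and `group_permutation` and `is_grouped` are hypothetical helpers.

```python
from typing import Dict, List


def group_permutation(strings: List[str]) -> List[int]:
    """Return indices such that equal strings occupy one contiguous block."""
    buckets: Dict[str, List[int]] = {}
    for i, s in enumerate(strings):
        buckets.setdefault(s, []).append(i)  # dicts keep first-insertion order
    perm: List[int] = []
    for indices in buckets.values():
        perm.extend(indices)
    return perm


def is_grouped(values: List[str]) -> bool:
    """True if every distinct value appears in exactly one contiguous run."""
    seen = set()
    prev: object = object()  # sentinel that never equals a string
    for v in values:
        if v != prev:
            if v in seen:
                return False
            seen.add(v)
            prev = v
    return True


data = ['b', 'a', 'b', 'c', 'a']
perm = group_permutation(data)
grouped = [data[i] for i in perm]
print(perm)     # [0, 2, 1, 4, 3]
print(grouped)  # ['b', 'b', 'a', 'a', 'c']  (grouped, but not sorted)
```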

hash() → Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]¶

Compute a 128-bit hash of each string.

Returns:: A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.
Return type:: Tuple[pdarray,pdarray]

Notes

The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible.
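Python's standard library has no SipHash128, so the sketch below swaps in `hashlib.blake2b` with a 16-byte digest purely to illustrate the return layout: a 128-bit value split into two int64 halves that can be concatenated back together. It also evaluates the standard birthday (union) bound behind the "negligible" claim: for k uniformly distributed 128-bit hashes, the probability of any collision is at most k(k-1)/2 / 2^128. The helper names are hypothetical, and the values produced will not match arkouda's hashes.

```python
import hashlib
from typing import Tuple

MASK64 = 2**64 - 1


def hash128_pair(s: str) -> Tuple[int, int]:
    """Split a 128-bit digest into two signed 64-bit integers (int64 range)."""
    digest = hashlib.blake2b(s.encode('utf-8'), digest_size=16).digest()
    hi = int.from_bytes(digest[:8], 'big', signed=True)
    lo = int.from_bytes(digest[8:], 'big', signed=True)
    return hi, lo


def combine(hi: int, lo: int) -> int:
    """Concatenate the two halves back into one unsigned 128-bit integer."""
    return ((hi & MASK64) << 64) | (lo & MASK64)


def collision_bound(k: int, bits: int = 128) -> float:
    """Union bound on P(any collision) among k uniform `bits`-bit hashes."""
    return k * (k - 1) / 2 / 2**bits


hi, lo = hash128_pair('arkouda')
full = int.from_bytes(hashlib.blake2b(b'arkouda', digest_size=16).digest(), 'big')
print(combine(hi, lo) == full)  # True
print(collision_bound(10**15))  # about 1.5e-09
```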

property inferred_type: str¶: Return a string of the type inferred from the values.

info() → str[source]¶

Returns a JSON formatted string containing information about all components of self

Parameters:: None
Returns:: JSON string containing information about all components of self
Return type:: str

is_registered() → numpy.bool_[source]¶

Return True iff the object is contained in the registry

Parameters:: None
Returns:: Indicates if the object is contained in the registry
Return type:: bool
Raises:: RuntimeError – Raised if there’s a server-side error thrown

isalnum() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings is alphanumeric.

Returns:: True for elements that are alphanumeric, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_alnum = ak.array([f'%Strings {i}' for i in range(3)])
>>> alnum = ak.array([f'Strings{i}' for i in range(3)])
>>> strings = ak.concatenate([not_alnum, alnum])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'Strings0', 'Strings1', 'Strings2'])
>>> strings.isalnum()
array([False False False True True True])

isalpha() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings is alphabetic. This means there is at least one character, and all the characters are alphabetic.

Returns:: True for elements that are alphabetic, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_alpha = ak.array([f'%Strings {i}' for i in range(3)])
>>> alpha = ak.array(['StringA','StringB','StringC'])
>>> strings = ak.concatenate([not_alpha, alpha])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'StringA', 'StringB', 'StringC'])
>>> strings.isalpha()
array([False False False True True True])

isdecimal() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings has all decimal characters.

Returns:: True for elements that are decimals, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isdigit

Examples

>>> not_decimal = ak.array([f'Strings {i}' for i in range(3)])
>>> decimal = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_decimal, decimal])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdecimal()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdecimal()
array([False True False False False])

isdigit() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings has all digit characters.

Returns:: True for elements that are digits, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_digit = ak.array([f'Strings {i}' for i in range(3)])
>>> digit = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_digit, digit])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdigit()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdigit()
array([False True True True False])
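On the special-character inputs above, Python's built-in `str.isdecimal` and `str.isdigit` give the same answers as the two examples, which makes the distinction easy to check locally: decimal characters are ordinary base-10 digits (Unicode category Nd), while digit characters additionally include forms such as superscripts and subscripts. This is a comparison on these inputs only, not a claim that the server uses Python's definitions.

```python
special = ["3.14", "0", "²", "2³₇", "2³x₇"]

# Decimal: every character must be a base-10 digit (Unicode category Nd)
decimal = [s.isdecimal() for s in special]
# Digit: also accepts superscript/subscript digits such as '²' and '₇'
digit = [s.isdigit() for s in special]

print(decimal)  # [False, True, False, False, False]
print(digit)    # [False, True, True, True, False]
```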

isempty() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings is empty.

Returns:: True for elements that are the empty string, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_empty = ak.array([f'Strings {i}' for i in range(3)])
>>> empty = ak.array(['' for i in range(3)])
>>> strings = ak.concatenate([not_empty, empty])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '', '', ''])
>>> strings.isempty()
array([False False False True True True])

islower() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings is entirely lowercase

Returns:: True for elements that are entirely lowercase, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isupper

Examples

>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.islower()
array([True True True False False False])

isspace() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i has all whitespace characters (‘ ’, ‘\t’, ‘\n’, ‘\v’, ‘\f’, ‘\r’).

Returns:: True for elements that are whitespace, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_space = ak.array([f'Strings {i}' for i in range(3)])
>>> space = ak.array([' ', '\t', '\n', '\v', '\f', '\r', ' \t\n\v\f\r'])
>>> strings = ak.concatenate([not_space, space])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', ' ', '\u0009', '\n', '\u000B', '\u000C', '\u000D', ' \u0009\n\u000B\u000C\u000D'])
>>> strings.isspace()
array([False False False True True True True True True True])

istitle() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings is titlecase

Returns:: True for elements that are titlecase, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

Strings.islower, Strings.isupper

Examples

>>> mixed = ak.array([f'sTrINgs {i}' for i in range(3)])
>>> title = ak.array([f'Strings {i}' for i in range(3)])
>>> strings = ak.concatenate([mixed, title])
>>> strings
array(['sTrINgs 0', 'sTrINgs 1', 'sTrINgs 2', 'Strings 0', 'Strings 1', 'Strings 2'])
>>> strings.istitle()
array([False False False True True True])

isupper() → arkouda.numpy.pdarrayclass.pdarray[source]¶

Returns a boolean pdarray where index i indicates whether string i of the Strings is entirely uppercase

Returns:: True for elements that are entirely uppercase, False otherwise
Return type:: pdarray, bool
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

Strings.islower

Examples

>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.isupper()
array([False False False True True True])

logger¶

lower() → Strings[source]¶

Returns a new Strings with all uppercase characters from the original replaced with their lowercase equivalent

Returns:: Strings with all uppercase characters from the original replaced with their lowercase equivalent
Return type:: Strings
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

Strings.upper

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.lower()
array(['strings 0', 'strings 1', 'strings 2', 'strings 3', 'strings 4'])

lstick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '') → Strings[source]¶

Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:

other (Strings) – The strings to join onto self’s strings
delimiter (bytes or str_scalars, default="") – String inserted between self and other

Returns:

The array of joined strings, as other + self

Return type:

Strings

Raises:

TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance
RuntimeError – Raised if there is a server-side error thrown

See also

stick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])

match(pattern: bytes | arkouda.numpy.dtypes.str_scalars) → arkouda.match.Match[source]¶

Returns a match object where elements match only if the beginning of the string matches the regular expression pattern

Parameters:: pattern (bytes or str_scalars) – Regex used to find matches
Returns:: Match object where elements match only if the beginning of the string matches the regular expression pattern
Return type:: Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.match('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=True, span=(0, 2); matched=False>

objType = 'Strings'¶

peel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) → Tuple[Strings, Strings][source]¶

Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:

delimiter (bytes or str_scalars) – The separator where the split will occur
times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters
includeDelimiter (bool, default=False) – If true, append the delimiter to the end of the first return array. By default, the delimiter is omitted from both return arrays.
keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.
fromRight (bool, default=False) – If true, peel from the right instead of the left (see also rpeel)
regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

left: Strings: The field(s) peeled from the start of each string (unless fromRight is true)
right: Strings: The remainder of each string after peeling (unless fromRight is true)

Return type:

Tuple[Strings, Strings]

Raises:

TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool
ValueError – Raised if times is < 1 or if delimiter is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

See also

rpeel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
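The effect of `times`, `includeDelimiter`, and `keepPartial` can be reproduced with a short pure-Python sketch that matches the four examples above. `peel_py` is a hypothetical helper with snake_case stand-ins for those parameters: it handles only a literal (non-regex) delimiter and peels from the left. How arkouda treats `includeDelimiter` for strings kept by `keepPartial` is not documented here, so the sketch simply leaves those strings unchanged.

```python
from typing import List, Tuple


def peel_py(strings: List[str], delimiter: str, times: int = 1,
            include_delimiter: bool = False,
            keep_partial: bool = False) -> Tuple[List[str], List[str]]:
    if times < 1:
        raise ValueError("times must be >= 1")
    left: List[str] = []
    right: List[str] = []
    for s in strings:
        parts = s.split(delimiter)
        if len(parts) - 1 >= times:
            # Enough delimiters: the first `times` fields go left, the rest go right
            head = delimiter.join(parts[:times])
            if include_delimiter:
                head += delimiter
            left.append(head)
            right.append(delimiter.join(parts[times:]))
        elif keep_partial:
            left.append(s)   # too few delimiters: keep the string in the first array
            right.append("")
        else:
            left.append("")  # too few delimiters: default puts it in the second array
            right.append(s)
    return left, right


s = ['a.b', 'c.d', 'e.f.g']
print(peel_py(s, '.'))                                # (['a', 'c', 'e'], ['b', 'd', 'f.g'])
print(peel_py(s, '.', times=2))                       # (['', '', 'e.f'], ['a.b', 'c.d', 'g'])
print(peel_py(s, '.', times=2, keep_partial=True))    # (['a.b', 'c.d', 'e.f'], ['', '', 'g'])
```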

pretty_print_info() → None[source]¶

Prints information about all components of self in a human-readable format

Parameters:: None
Return type:: None

purge_cached_regex_patterns() → None[source]¶: Purges the cached regex patterns

regex_split(pattern: bytes | arkouda.numpy.dtypes.str_scalars, maxsplit: int = 0, return_segments: bool = False) → Strings | Tuple[source]¶

Returns a new Strings split by the occurrences of pattern. If maxsplit is nonzero, at most maxsplit splits occur

Parameters:

pattern (bytes or str_scalars) – Regex used to split strings into substrings
maxsplit (int, default=0) – The max number of pattern match occurrences in each element to split. The default maxsplit=0 splits on all occurrences
return_segments (bool, default=False) – If True, return mapping of original strings to first substring in return array.

Returns:

Strings – Substrings with pattern matches removed
pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.regex_split('_+', maxsplit=2, return_segments=True)
(array(['1', '2', '', '', '', '3', '', '4', '5____6___7', '']), array([0 3 5 6 9]))
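A pure-Python sketch of `return_segments=True`, using `re.split` as a stand-in for the server's re2 engine; `regex_split_py` is a hypothetical helper. Python's `re.split` also treats `maxsplit=0` as "split on every match", but it would insert capture-group text into the output if the pattern contained groups, so the sketch assumes a group-free pattern such as `_+`.

```python
import re
from typing import List, Tuple


def regex_split_py(strings: List[str], pattern: str, maxsplit: int = 0) -> Tuple[List[str], List[int]]:
    pieces: List[str] = []
    segments: List[int] = []
    for s in strings:
        segments.append(len(pieces))  # index of this string's first substring
        # maxsplit=0 means split on every match; otherwise at most maxsplit splits
        pieces.extend(re.split(pattern, s, maxsplit=maxsplit))
    return pieces, segments


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
pieces, segments = regex_split_py(strings, '_+', maxsplit=2)
print(pieces)    # ['1', '2', '', '', '', '3', '', '4', '5____6___7', '']
print(segments)  # [0, 3, 5, 6, 9]
```

Each entry of `segments` marks where a string's pieces begin, so the pieces of string i run from `segments[i]` up to the next entry (or the end of `pieces`).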

register(user_defined_name: str) → Strings[source]¶

Register this Strings object with a user defined name in the arkouda server so it can be attached to later using Strings.attach(). This is an in-place operation: registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time.

Parameters:

user_defined_name (str) – user defined name which the Strings object is to be registered under

Returns:

The same Strings object, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different objects with the same name.

Return type:

Strings

Raises:

TypeError – Raised if user_defined_name is not a str
RegistrationError – Raised if the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name.

See also

attach, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

registered_name: str | None = None¶

rpeel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False) → Tuple[Strings, Strings][source]¶

Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:

delimiter (bytes or str_scalars) – The separator where the split will occur
times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters
includeDelimiter (bool, default=False) – If true, prepend the delimiter to the start of the first return array. By default, the delimiter is omitted from both return arrays.
keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.
regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

left: Strings: The remainder of the string after peeling
right: Strings: The field(s) that were peeled from the right of each string

Return type:

Tuple[Strings, Strings]

Raises:

TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64
ValueError – Raised if times is < 1 or if delimiter is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

See also

peel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))

Compared against peel

>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))

save(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None, file_format: Literal['HDF5', 'Parquet'] = 'HDF5', file_type: Literal['single', 'distribute'] = 'distribute') → str[source]¶

DEPRECATED: Save the Strings object to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 also supports saving to a single file, in which case the file name is exactly the one provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array
mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.
save_offsets (bool, default=True) – If True, instruct the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read. This is not supported for Parquet files.
compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files
file_format ({"HDF5", "Parquet"}, default = "HDF5") – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.
file_type ({"single", "distribute"}, default = "distribute") – By default, distribute the dataset over one file per locale. If ‘single’, save the dataset to one file.

Returns:

String message indicating result of save operation

Return type:

str

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an HDF5 group: one for the string characters and one for the segments corresponding to the start of each string; (2) the HDF5 group is named via the dataset parameter; (3) Parquet files do not store the segments, only the values.

search(pattern: bytes | arkouda.numpy.dtypes.str_scalars) → arkouda.match.Match[source]¶

Returns a match object with the first location in each element where pattern produces a match. Elements match if any part of the string matches the regular expression pattern

Parameters:: pattern (bytes or str_scalars) – Regex used to find matches
Returns:: Match object where elements match if any part of the string matches the regular expression pattern
Return type:: Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+')
<ak.Match object: matched=True, span=(1, 2); matched=True, span=(0, 4);
matched=False; matched=True, span=(0, 2); matched=False>
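The three matching modes differ only in where the pattern must match: match anchors it at the start of the string, fullmatch at both ends, and search allows it anywhere. Python's `re.match`, `re.fullmatch`, and `re.search` follow the same rules, so the sketch below reproduces the spans shown in the match, fullmatch, and search examples. `match_spans` is a hypothetical helper, `None` stands for `matched=False`, and Python's `re` is only a stand-in for re2.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

MATCHERS: Dict[str, Callable] = {
    "match": re.match,          # pattern must match at the start
    "fullmatch": re.fullmatch,  # pattern must match the whole string
    "search": re.search,        # first match anywhere in the string
}


def match_spans(strings: List[str], pattern: str, mode: str) -> List[Optional[Tuple[int, int]]]:
    matcher = MATCHERS[mode]
    spans: List[Optional[Tuple[int, int]]] = []
    for s in strings:
        m = matcher(pattern, s)
        spans.append(m.span() if m else None)
    return spans


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
print(match_spans(strings, '_+', "match"))      # [None, (0, 4), None, (0, 2), None]
print(match_spans(strings, '_+', "fullmatch"))  # [None, (0, 4), None, None, None]
print(match_spans(strings, '_+', "search"))     # [(1, 2), (0, 4), None, (0, 2), None]
```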

size: arkouda.numpy.dtypes.int_scalars¶

split(delimiter: str, return_segments: bool = False, regex: bool = False) → Strings | Tuple[source]¶

Unpack delimiter-joined substrings into a flat array.

Parameters:

delimiter (str) – Characters used to split strings into substrings
return_segments (bool, default=False) – If True, also return mapping of original strings to first substring in return array.
regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

Strings – Flattened substrings with delimiters removed
pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array

See also

peel, rpeel

Examples

>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.split('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, mapping = orig.split('|', return_segments=True)
>>> mapping
array([0 2 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_split, under_map = under.split('_+', return_segments=True, regex=True)
>>> under_split
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0 2 5])

startswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Check whether each element starts with the given substring.

Parameters:

substr (bytes or str_scalars) – The prefix to search for
regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds).

Returns:

True for elements that start with substr, False otherwise

Return type:

pdarray, bool

Raises:

TypeError – Raised if the substr parameter is not bytes or str_scalars
ValueError – Raised if substr is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True True True True True])
>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith(r'\d str', regex=True)
array([True True True True True])

stick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '', toLeft: bool = False) → Strings[source]¶

Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:

other (Strings) – The strings to join onto self’s strings
delimiter (bytes or str_scalars, default="") – String inserted between self and other
toLeft (bool, default=False) – If True, join other strings to the left of self. By default, other is joined to the right of self.

Returns:

The array of joined strings

Return type:

Strings

Raises:

TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance
RuntimeError – Raised if there is a server-side error thrown

See also

lstick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])

strip(chars: bytes | arkouda.numpy.dtypes.str_scalars | None = '') → Strings[source]¶

Returns a new Strings object with all leading and trailing occurrences of characters contained in chars removed. The chars argument is a string specifying the set of characters to be removed. If omitted, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.

Parameters:: chars (bytes or str_scalars, optional) – The set of characters to be removed
Returns:: Strings object with the leading and trailing characters matching the set of characters in the chars argument removed
Return type:: Strings
Raises:: RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array(['Strings ', '  StringS  ', 'StringS   '])
>>> s = strings.strip()
>>> s
array(['Strings', 'StringS', 'StringS'])

>>> strings = ak.array(['Strings 1', '1 StringS  ', '  1StringS  12 '])
>>> s = strings.strip(' 12')
>>> s
array(['Strings', 'StringS', 'StringS'])
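
Because chars is treated as a set of characters, the semantics match Python's built-in str.strip. Applying it element-wise in plain Python reproduces the second example above; this is a local illustration, not a call into arkouda.

```python
strings = ['Strings 1', '1 StringS  ', '  1StringS  12 ']

# Each of ' ', '1', and '2' is removed from both ends, in any order or combination.
stripped = [s.strip(' 12') for s in strings]
print(stripped)  # ['Strings', 'StringS', 'StringS']
```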

sub(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) → Strings[source]¶

Return new Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl. If count is nonzero, at most count substitutions occur in each element.

Parameters:

pattern (bytes or str_scalars) – The regex to substitute
repl (bytes or str_scalars) – The substring to replace pattern matches with
count (int, default=0) – The max number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl

Returns:

Strings with pattern matches replaced

Return type:

Strings

Raises:

TypeError – Raised if pattern or repl are not bytes or str_scalars
ValueError – Raised if pattern is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

See also

Strings.subn

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.sub(pattern='_+', repl='-', count=2)
array(['1-2-', '-', '3', '-4-5____6___7', ''])

subn(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) → Tuple[source]¶

Perform the same operation as sub(), but return a tuple (new_Strings, number_of_substitutions).

Parameters:

pattern (bytes or str_scalars) – The regex to substitute
repl (bytes or str_scalars) – The substring to replace pattern matches with
count (int, default=0) – The max number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl

Returns:

Strings – Strings with pattern matches replaced
pdarray, int64 – The number of substitutions made for each element of Strings

Raises:

TypeError – Raised if pattern or repl are not bytes or str_scalars
ValueError – Raised if pattern is not a valid regex
RuntimeError – Raised if there is a server-side error thrown

See also

Strings.sub

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.subn(pattern='_+', repl='-', count=2)
(array(['1-2-', '-', '3', '-4-5____6___7', '']), array([2 1 0 2 0]))
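
The count semantics of sub and subn mirror Python's re.subn applied to each element separately, so the per-element counts in the example can be reproduced locally. The helper subn_elementwise is hypothetical, and Python's re engine stands in for arkouda's server-side regex support.

```python
import re
from typing import List, Tuple


def subn_elementwise(strings: List[str], pattern: str, repl: str, count: int = 0) -> Tuple[List[str], List[int]]:
    """Apply re.subn to every element; count=0 means replace all matches."""
    replaced: List[str] = []
    counts: List[int] = []
    for s in strings:
        new_s, n = re.subn(pattern, repl, s, count=count)
        replaced.append(new_s)
        counts.append(n)
    return replaced, counts


print(subn_elementwise(['1_2___', '____', '3', '__4___5____6___7', ''], '_+', '-', count=2))
# (['1-2-', '-', '3', '-4-5____6___7', ''], [2, 1, 0, 2, 0])
```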

title() → Strings[source]¶

Returns a new Strings object with each element converted to its titlecase equivalent.

Returns:: Strings with each element converted to its titlecase equivalent.
Return type:: Strings
Raises:: RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.title()
array(['Strings 0', 'Strings 1', 'Strings 2', 'Strings 3', 'Strings 4'])

to_csv(prefix_path: str, dataset: str = 'strings_array', col_delim: str = ',', overwrite: bool = False) → str[source]¶

Write Strings to CSV file(s). File will contain a single column with the Strings data. All CSV Files written by Arkouda include a header denoting data types of the columns. Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing bytes as uint(8).

Parameters:

prefix_path (str) – The filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.
dataset (str, default="strings_array") – Column name to save the Strings under.
col_delim (str, default=",") – Value used to separate columns within the file. Be sure that this value does not appear in your dataset.
overwrite (bool, default=False) – If True, any existing files matching your provided prefix_path will be overwritten. If False, an error will be returned if existing files are found.

Returns:

response message

Return type:

str

Raises:

ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist
RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is true this may be raised if no values are returned from the server.
TypeError – Raised if we receive an unknown arkouda_type returned from the server

Notes

CSV format is not currently supported by load/load_all operations.
The column delimiter is expected to be the same for column names and data.
Be sure that column delimiters are not found within your data.
All CSV files must delimit rows using newline (\n) at this time.

to_hdf(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, file_type: Literal['single', 'distribute'] = 'distribute') → str[source]¶

Save the Strings object to HDF5. The object can be saved to a collection of files or a single file.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – The name of the Strings dataset to be written
mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.
save_offsets (bool, default=True) – If True, instruct the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.
file_type ({"single", "distribute"}, default = "distribute") – If "distribute", distribute the dataset over one file per locale. If "single", save the dataset to one file.

Returns:

String message indicating result of save operation

Return type:

str

Raises:

RuntimeError – Raised if a server-side error is thrown saving the Strings object

Notes

Parquet files do not store the segments, only the values.
Strings state is saved as two datasets within an HDF5 group named via the dataset parameter: one for the string characters and one for the segments corresponding to the start of each string.
The prefix_path must be visible to the arkouda server and the user must have write permission.
Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales - 1, for file_type=‘distribute’. Otherwise, the file name will be prefix_path.
If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
Any file extension can be used. The file I/O does not rely on the extension to determine the file format.
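
When appending, it can help to check which files already exist. The hypothetical helper below lists the names to_hdf would be expected to produce. It assumes the locale number is zero-padded to four digits, as suggested by the _LOCALE#### suffix described for to_csv; verify against the files your server actually writes.

```python
from typing import List


def expected_output_files(prefix_path: str, num_locales: int, file_type: str = "distribute") -> List[str]:
    """Hypothetical helper: list the file names to_hdf is expected to write."""
    if file_type == "single":
        return [prefix_path]
    # Assumption: locale numbers are zero-padded to four digits, per the _LOCALE#### pattern.
    return [f"{prefix_path}_LOCALE{i:04d}" for i in range(num_locales)]


print(expected_output_files("/tmp/strings", 2))
# ['/tmp/strings_LOCALE0000', '/tmp/strings_LOCALE0001']
```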

See also

to_hdf

to_list() → list[source]¶

Convert the SegString to a list, transferring data from the arkouda server to Python. If the SegString exceeds a built-in size limit, a RuntimeError is raised.

Returns:: A list with the same strings as this SegString
Return type:: list

Notes

The number of bytes in the array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but proceed with caution.

See also

to_ndarray

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_list()
['hello', 'my', 'world']
>>> type(a.to_list())
<class 'list'>

to_ndarray() → numpy.ndarray[source]¶

Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.

Returns:: A numpy ndarray with the same strings as this array
Return type:: np.ndarray

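Notes

The number of bytes in the array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but proceed with caution.
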
See also

array, to_list

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>

to_parquet(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None) → str[source]¶

Save the Strings object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – Name of the dataset to create in files (must not already exist)
mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create a new dataset in existing files.
compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files

Returns:

String message indicating result of save operation

Return type:

str

Raises:

RuntimeError – Raised if a server-side error is thrown saving the Strings object

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission.
Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales - 1.
‘append’ write mode is supported, but is not efficient.
If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars) → str | memoryview[source]¶

Send a Strings object to a different Arkouda server.

Parameters:

hostname (str) – The hostname where the Arkouda server intended to receive the Strings object is running.
port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). Arkouda opens numLocales ports in succession, covering port through port + numLocales - 1 (e.g., for an Arkouda server running on 4 nodes with port 1234, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Returns:

A message indicating a complete transfer

Return type:

str or memoryview

Raises:

ValueError – Raised if the op is not within the pdarray.BinOps set
TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
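
The port range consumed by transfer follows directly from the port description: one port per locale, starting at port. The helper transfer_ports is hypothetical and only does the arithmetic, for example when deciding which ports must be open between the two servers.

```python
from typing import List


def transfer_ports(port: int, num_locales: int) -> List[int]:
    """Ports used when sending from a server with num_locales locales (one port per locale)."""
    return list(range(port, port + num_locales))


print(transfer_ports(1234, 4))  # [1234, 1235, 1236, 1237]
```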

unregister() → None[source]¶

Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach().

Return type:: None
Raises:: RuntimeError – Raised if the server could not find the internal name/symbol to remove

See also

register, attach

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

static unregister_strings_by_name(user_defined_name: str) → None[source]¶

Unregister a Strings object in the arkouda server previously registered via register().

Parameters:: user_defined_name (str) – The registered name of the Strings object

update_hdf(prefix_path: str, dataset: str = 'strings_array', save_offsets: bool = True, repack: bool = True) → str[source]¶

Overwrite the dataset with the name provided with this Strings object. If the dataset does not exist, it is added.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – Name of the dataset to create in files
save_offsets (bool, default=True) – If True, instruct the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.
repack (bool, default=True) – HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to False will yield better performance, but will cause file sizes to expand.

Returns:

Success message if successful

Return type:

str

Raises:

RuntimeError – Raised if a server-side error is thrown saving the Strings object

Notes

If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.
If the dataset provided does not exist, it will be added

upper() → Strings[source]¶

Returns a new Strings with all lowercase characters from the original replaced with their uppercase equivalent.

Returns:: Strings with all lowercase characters from the original replaced with their uppercase equivalent
Return type:: Strings
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

Strings.lower

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.upper()
array(['STRINGS 0', 'STRINGS 1', 'STRINGS 2', 'STRINGS 3', 'STRINGS 4'])

class arkouda.util.Timedelta(pda, unit: str = _BASE_UNIT)[source]¶

Bases: _AbstractBaseTime

Represents a duration, the difference between two dates or times.

Timedelta is the Arkouda equivalent of pandas.TimedeltaIndex.

Parameters:

pda (int64 pdarray, pd.TimedeltaIndex, pd.Series, or np.timedelta64 array)
unit (str, default 'ns') –
For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like ‘sec’) are accepted.

Possible values:
- ‘weeks’ or ‘w’
- ‘days’ or ‘d’
- ‘hours’ or ‘h’
- ‘minutes’, ‘m’, or ‘t’
- ‘seconds’ or ‘s’
- ‘milliseconds’, ‘ms’, or ‘l’
- ‘microseconds’, ‘us’, or ‘u’
- ‘nanoseconds’, ‘ns’, or ‘n’
Unlike in pandas, units cannot be combined or mixed with integers.

Notes

The .values attribute is always in nanoseconds with int64 dtype.
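
The unit handling described above can be modeled in plain Python: resolve the (case-insensitive) unit string, then scale integer inputs to nanoseconds, the unit of .values. The tables restate the documented aliases; the helper names are hypothetical, and rejecting ambiguous prefixes such as ‘mi’ is an assumption of this sketch rather than documented arkouda behavior.

```python
from typing import Dict, List

# Nanoseconds per unit; .values is always stored in nanoseconds.
NS_PER_UNIT: Dict[str, int] = {
    "weeks": 7 * 24 * 3600 * 10**9,
    "days": 24 * 3600 * 10**9,
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}

# Abbreviations listed in the documentation.
ALIASES: Dict[str, str] = {
    "w": "weeks", "d": "days", "h": "hours", "m": "minutes", "t": "minutes",
    "s": "seconds", "ms": "milliseconds", "l": "milliseconds",
    "us": "microseconds", "u": "microseconds", "ns": "nanoseconds", "n": "nanoseconds",
}


def normalize_unit(unit: str) -> str:
    u = unit.lower()  # units are not case-sensitive
    if u in ALIASES:
        return ALIASES[u]
    # Prefixes of full names (like 'sec') are accepted.
    matches = [name for name in NS_PER_UNIT if name.startswith(u)]
    if len(matches) != 1:
        # Assumption: unknown or ambiguous prefixes are rejected.
        raise ValueError(f"unrecognized or ambiguous unit: {unit!r}")
    return matches[0]


def to_nanoseconds(values: List[int], unit: str) -> List[int]:
    factor = NS_PER_UNIT[normalize_unit(unit)]
    return [v * factor for v in values]


print(to_nanoseconds([1, 2], "sec"))  # [1000000000, 2000000000]
```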

abs()[source]¶: Absolute value of time interval.

property components¶

property days¶

is_registered() → numpy.bool_[source]¶

Return True iff the object is contained in the registry or is a component of a registered object.

Returns:: Indicates if the object is contained in the registry
Return type:: numpy.bool
Raises:: RegistrationError – Raised if there’s a server-side error or a mis-match of registered components

See also

register, attach, unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property microseconds¶

property nanoseconds¶

register(user_defined_name)[source]¶

Parameters:

user_defined_name (str) – user defined name the timedelta is to be registered under, this will be the root name for underlying components

Returns:

The same Timedelta which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different Timedeltas with the same name.

Return type:

Timedelta

Raises:

TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the timedelta with the user_defined_name

See also

unregister, attach, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property seconds¶

special_objType = 'Timedelta'¶

std(ddof: arkouda.numpy.dtypes.int_scalars = 0)[source]¶: Returns the standard deviation as a pd.Timedelta object.

sum()[source]¶: Return the sum of all elements in the array.

supported_opeq¶

supported_with_datetime¶

supported_with_pdarray¶

supported_with_r_datetime¶

supported_with_r_pdarray¶

supported_with_r_timedelta¶

supported_with_timedelta¶

to_pandas()[source]¶: Convert array to a pandas TimedeltaIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.

See also

to_ndarray

total_seconds()[source]¶

unregister()[source]¶

Unregister this timedelta object in the arkouda server which was previously registered using register() and/or attached to using attach().

Raises:: RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

See also

register, attach, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

arkouda.util.arange(*args, **kwargs) → arkouda.numpy.pdarrayclass.pdarray[source]¶

arange([start,] stop[, stride,] dtype=int64)

Create a pdarray of consecutive integers within the interval [start, stop). If only one arg is given then arg is the stop parameter. If two args are given, then the first arg is start and second is stop. If three args are given, then the first arg is start, second is stop, third is stride.

The return value is cast to type dtype.

Parameters:

start (int_scalars, optional) – Starting value (inclusive)
stop (int_scalars) – Stopping value (exclusive)
stride (int_scalars, optional) – The difference between consecutive elements. The default stride is 1; if stride is specified, start must also be specified.
dtype (np.dtype, type, or str) – The target dtype to cast values to
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays

Returns:

Integers from start (inclusive) to stop (exclusive) by stride

Return type:

pdarray, dtype

Raises:

TypeError – Raised if start, stop, or stride is not an int object
ZeroDivisionError – Raised if stride == 0

See also

linspace, zeros, ones, randint

Notes

Negative strides result in decreasing values. Currently, only int64 pdarrays can be created with this method. For float64 arrays, use the linspace method.

Examples

>>> ak.arange(0, 5, 1)
array([0 1 2 3 4])

>>> ak.arange(5, 0, -1)
array([5 4 3 2 1])

>>> ak.arange(0, 10, 2)
array([0 2 4 6 8])

>>> ak.arange(-5, -10, -1)
array([-5 -6 -7 -8 -9])
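
The positional-argument convention of arange matches Python's built-in range. The hypothetical arange_model below resolves the arguments locally to show which values an int64 call would produce; it is not arkouda code and ignores dtype and max_bits.

```python
from typing import List


def arange_model(*args: int) -> List[int]:
    """Resolve arange's positional arguments like Python's range, returning a list."""
    if len(args) == 1:
        start, stop, stride = 0, args[0], 1
    elif len(args) == 2:
        start, stop, stride = args[0], args[1], 1
    elif len(args) == 3:
        start, stop, stride = args
    else:
        raise TypeError("arange_model expects 1 to 3 positional arguments")
    if stride == 0:
        # The documentation lists ZeroDivisionError; Python's range would raise ValueError.
        raise ZeroDivisionError("stride must be nonzero")
    return list(range(start, stop, stride))


print(arange_model(5))             # [0, 1, 2, 3, 4]
print(arange_model(5, 0, -1))      # [5, 4, 3, 2, 1]
print(arange_model(-5, -10, -1))   # [-5, -6, -7, -8, -9]
```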

arkouda.util.attach(name: str)[source]¶

arkouda.util.attach_all(names: list)[source]¶

Attach to all objects registered with the names provided.

Parameters:: names (list) – List of names to attach to
Return type:: dict

arkouda.util.broadcast(segments: pdarray, values: pdarray | Strings, size: int | np.int64 | np.uint64 = -1, permutation: pdarray | None = None)[source]¶

Broadcast a dense column vector to the rows of a sparse matrix or grouped array.

Parameters:

segments (pdarray, int64) – Offsets of the start of each row in the sparse matrix or grouped array. Must be sorted in ascending order.
values (pdarray, Strings) – The values to broadcast, one per row (or group)
size (int) – The total number of nonzeros in the matrix. If permutation is given, this argument is ignored and the size is inferred from the permutation array.
permutation (pdarray, int64) – The permutation to go from the original ordering of nonzeros to the ordering grouped by row. To broadcast values back to the original ordering, this permutation will be inverted. If no permutation is supplied, it is assumed that the original nonzeros were already grouped by row. In this case, the size argument must be given.

Returns:

The broadcast values, one per nonzero

Return type:

pdarray, Strings

Raises:

ValueError –

If segments and values are different sizes
If segments are empty
If number of nonzeros (either user-specified or inferred from permutation) is less than one

Examples

>>> # Define a sparse matrix with 3 rows and 7 nonzeros
>>> row_starts = ak.array([0, 2, 5])
>>> nnz = 7
>>> # Broadcast the row number to each nonzero element
>>> row_number = ak.arange(3)
>>> ak.broadcast(row_starts, row_number, nnz)
array([0 0 1 1 1 2 2])
>>> # If the original nonzeros were in reverse order...
>>> permutation = ak.arange(6, -1, -1)
>>> ak.broadcast(row_starts, row_number, permutation=permutation)
array([2 2 1 1 1 0 0])
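
The documented semantics, including the error conditions, can be modeled in plain Python. In this sketch (broadcast_model is a hypothetical name), permutation[j] is taken to be the original position of the j-th nonzero in row-grouped order; that interpretation reproduces both results in the example above. Requiring the first segment to start at 0 is this sketch's simplification. According to the notes on ak.util.expand, that function performs the same operation with its arguments reordered.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def broadcast_model(segments: Sequence[int], values: Sequence[T], size: int = -1,
                    permutation: Optional[Sequence[int]] = None) -> List[T]:
    if permutation is not None:
        size = len(permutation)  # size is inferred from the permutation
    if len(segments) != len(values):
        raise ValueError("segments and values must be the same size")
    if len(segments) == 0:
        raise ValueError("segments must not be empty")
    if size < 1:
        raise ValueError("number of nonzeros must be at least one")
    if segments[0] != 0:
        raise ValueError("this sketch assumes the first segment starts at 0")
    # Repeat values[i] over the half-open range [segments[i], segments[i + 1]).
    bounds = list(segments) + [size]
    grouped: List[T] = []
    for i, v in enumerate(values):
        grouped.extend([v] * (bounds[i + 1] - bounds[i]))
    if permutation is None:
        return grouped
    # Scatter grouped[j] back to original position permutation[j] (inverting the permutation).
    result = list(grouped)  # placeholder copy; every slot is overwritten below
    for j, p in enumerate(permutation):
        result[p] = grouped[j]
    return result


print(broadcast_model([0, 2, 5], [0, 1, 2], 7))  # [0, 0, 1, 1, 1, 2, 2]
print(broadcast_model([0, 2, 5], [0, 1, 2], permutation=list(range(6, -1, -1))))  # [2, 2, 1, 1, 1, 0, 0]
```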

arkouda.util.broadcast_dims(sa: Sequence[int], sb: Sequence[int]) → Tuple[int, Ellipsis][source]¶

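Determine the shape that results from broadcasting two arrays with shapes sa and sb, following the algorithm at https://data-apis.org/array-api/latest/API_specification/broadcasting.html#algorithm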
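
The broadcasting rule is short: align the shapes on their trailing dimensions, treat missing leading dimensions as size 1, and require each pair of sizes to be equal or to include a 1. The sketch below implements that rule as described in the array API standard; broadcast_shapes is a hypothetical name, not necessarily how arkouda implements broadcast_dims.

```python
from typing import Sequence, Tuple


def broadcast_shapes(sa: Sequence[int], sb: Sequence[int]) -> Tuple[int, ...]:
    """Compute the broadcast shape of two array shapes, aligning trailing dimensions."""
    n = max(len(sa), len(sb))
    result = []
    for i in range(1, n + 1):
        # Missing leading dimensions behave as size 1.
        da = sa[-i] if i <= len(sa) else 1
        db = sb[-i] if i <= len(sb) else 1
        if da == 1:
            result.append(db)
        elif db == 1 or da == db:
            result.append(da)
        else:
            raise ValueError(f"shapes {tuple(sa)} and {tuple(sb)} are not broadcast-compatible")
    return tuple(reversed(result))


print(broadcast_shapes((5, 1, 4), (3, 1)))  # (5, 3, 4)
```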

arkouda.util.coargsort(arrays: Sequence[arkouda.numpy.strings.Strings | arkouda.numpy.pdarrayclass.pdarray | arkouda.categorical.Categorical], algorithm: SortingAlgorithm = SortingAlgorithm.RadixSortLSD) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Return the permutation that groups the rows (left-to-right), if the input arrays are treated as columns. The permutation sorts numeric columns, but not strings/Categoricals – strings/Categoricals are grouped, but not ordered.

Parameters:

arrays (Sequence of Strings, pdarray, or Categorical) – The columns (int64, uint64, float64, Strings, or Categorical) to sort by row
algorithm (SortingAlgorithm, default=SortingAlgorithm.RadixSortLSD) – The algorithm to be used for sorting the arrays.

Returns:

The indices that permute the rows to grouped order

Return type:

pdarray of int64

Raises:

ValueError – Raised if the pdarrays are not of the same size or if the parameter is not an Iterable containing pdarrays, Strings, or Categoricals

See also

argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive. Starts with the last array and moves forward. This sort operates directly on numeric types, but for Strings, it operates on a hash. Thus, while grouping of equivalent strings is guaranteed, lexicographic ordering of the groups is not. For Categoricals, coargsort sorts based on Categorical.codes which guarantees grouping of equivalent categories but not lexicographic ordering of those groups.

Examples

>>> a = ak.array([0, 1, 0, 1])
>>> b = ak.array([1, 1, 0, 0])
>>> perm = ak.coargsort([a, b])
>>> perm
array([2 0 3 1])
>>> a[perm]
array([0 0 1 1])
>>> b[perm]
array([0 1 0 1])
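
The least-significant-first strategy described in the notes can be sketched with Python's stable sort: sort by the last column first, then by each earlier column, so ties keep the order established by the later columns. This substitutes a comparison sort for arkouda's radix sort and orders strings lexicographically, whereas arkouda only guarantees grouping for Strings and Categoricals. coargsort_model is a hypothetical name.

```python
from typing import List, Sequence


def coargsort_model(arrays: Sequence[Sequence]) -> List[int]:
    """Least-significant-first stable sorting over columns; returns a row permutation."""
    n = len(arrays[0])
    if any(len(col) != n for col in arrays):
        raise ValueError("all arrays must have the same size")
    perm = list(range(n))
    # Start with the last column; each stable pass preserves the order from earlier passes.
    for col in reversed(arrays):
        perm = sorted(perm, key=col.__getitem__)
    return perm


print(coargsort_model([[0, 1, 0, 1], [1, 1, 0, 0]]))  # [2, 0, 3, 1]
```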

arkouda.util.concatenate(items, ordered=True)[source]¶

arkouda.util.convert_bytes(nbytes: arkouda.numpy.dtypes.int_scalars, unit: Literal['B', 'KB', 'MB', 'GB'] = 'B') → arkouda.numpy.dtypes.numeric_scalars[source]¶

Convert the number of bytes to KB, MB, or GB.

Parameters:

nbytes (int_scalars) – The number of bytes to convert
unit ({"B", "KB", "MB", "GB"}, default="B") – Unit to return. One of {‘B’, ‘KB’, ‘MB’, ‘GB’}.

Return type:

numeric_scalars
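
A minimal sketch of the conversion, assuming binary multiples (1 KB = 1024 bytes); the divisor arkouda uses is an assumption here, and convert_bytes_model is a hypothetical name. The result is a float for any unit other than ‘B’.

```python
from typing import Union


def convert_bytes_model(nbytes: int, unit: str = "B") -> Union[int, float]:
    # Assumption: binary multiples (1 KB = 1024 B).
    factors = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}
    if unit not in factors:
        raise ValueError(f"unit must be one of {sorted(factors)}")
    return nbytes if unit == "B" else nbytes / factors[unit]


print(convert_bytes_model(3 * 1024**2, "MB"))  # 3.0
```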

arkouda.util.convert_if_categorical(values)[source]¶: Convert a Categorical array to Strings for display.

arkouda.util.create_pdarray(repMsg: str, max_bits=None) → pdarray[source]¶

Return a pdarray instance pointing to an array created by the arkouda server. The user should not call this function directly.

Parameters:

repMsg (str) – Space-delimited string containing the pdarray name, datatype, size, dimension, shape, and itemsize

Returns:

A pdarray with the same attributes and data as the server-side array described by repMsg

Return type:

pdarray

Raises:

ValueError – If there’s an error in parsing the repMsg parameter into the six values needed to create the pdarray instance
RuntimeError – Raised if a server-side error is thrown in the process of creating the pdarray instance

arkouda.util.dtype(dtype)[source]¶

Create a data type object.

Parameters:: dtype (object) – Object to be converted to a data type object.
Return type:: type

arkouda.util.enrich_inplace(data, keynames, aggregations, **kwargs)[source]¶

arkouda.util.expand(size: int | numpy.int64 | numpy.uint64, segs: arkouda.numpy.pdarrayclass.pdarray, vals: arkouda.numpy.pdarrayclass.pdarray) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Expand an array with values placed into the indicated segments.

Parameters:

size (int, np.int64, or np.uint64) – The size of the array to be expanded
segs (pdarray) – The indices where the values should be placed
vals (pdarray) – The values to be placed in each segment

Returns:

The expanded array.

Return type:

arkouda.numpy.pdarrayclass.pdarray or arkouda.numpy.strings.Strings

Notes

This function (with different order of arguments) is now in arkouda proper as ak.broadcast. It is retained here for backwards compatibility.

arkouda.util.float_scalars¶

arkouda.util.generic_concat(items, ordered=True)[source]¶

arkouda.util.get_callback(x)[source]¶

arkouda.util.identity(x)[source]¶

arkouda.util.int_scalars¶

arkouda.util.invert_permutation(perm: arkouda.numpy.pdarrayclass.pdarray) → arkouda.numpy.pdarrayclass.pdarray[source]¶

Find the inverse of a permutation array.

Parameters:: perm (pdarray) – The permutation array.
Returns:: The inverse of the permutation array.
Return type:: pdarray
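
Inverting a permutation is a single scatter: if perm maps i to perm[i], the inverse maps perm[i] back to i. The plain-Python sketch below (invert_permutation_model is a hypothetical name) applies this to the permutation from the coargsort example; rejecting inputs that are not permutations is this sketch's choice, not documented arkouda behavior.

```python
from typing import List, Sequence


def invert_permutation_model(perm: Sequence[int]) -> List[int]:
    if sorted(perm) != list(range(len(perm))):
        raise ValueError("input is not a permutation of 0..n-1")
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i  # perm sends i to p, so the inverse sends p back to i
    return inv


perm = [2, 0, 3, 1]
inv = invert_permutation_model(perm)
print(inv)  # [1, 3, 0, 2]
```

Indexing a permuted array with the inverse restores the original order: for a list a, applying perm and then inv returns a.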

arkouda.util.is_float(arry: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical | arkouda.series.Series | arkouda.index.Index) → bool[source]¶

Check if the dtype of the given array is float.

Parameters:: arry (pdarray, Strings, Categorical, Series, or Index) – The input pdarray, Strings, Categorical, Series, or Index object.
Returns:: True if the dtype of arry is of type float, False otherwise.
Return type:: bool

Examples

>>> data = ak.array([1.0, 2, 3, 4, np.nan])
>>> ak.util.is_float(data)
True

>>> data2 = ak.arange(5)
>>> ak.util.is_float(data2)
False

arkouda.util.is_int(arry: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical | arkouda.series.Series | arkouda.index.Index) → bool[source]¶

Check if the dtype of the given array is int.

Parameters:: arry (pdarray, Strings, Categorical, Series, or Index) – The input pdarray, Strings, Categorical, Series, or Index object.
Returns:: True if the dtype of arry is of type int, False otherwise.
Return type:: bool

Examples

>>> data = ak.array([1.0, 2, 3, 4, np.nan])
>>> ak.util.is_int(data)
False

>>> data2 = ak.arange(5)
>>> ak.util.is_int(data2)
True

arkouda.util.is_numeric(arry: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical | arkouda.series.Series | arkouda.index.Index) → bool[source]¶

Check if the dtype of the given array is numeric.

Parameters:: arry (pdarray, Strings, Categorical, Series, or Index) – The input pdarray, Strings, Categorical, Series, or Index object.
Returns:: True if the dtype of arry is numeric, False otherwise.
Return type:: bool

Examples

>>> data = ak.array([1, 2, 3, 4, 5])
>>> ak.util.is_numeric(data)
True

>>> strings = ak.array(["a", "b", "c"])
>>> ak.util.is_numeric(strings)
False

arkouda.util.is_registered(name: str, as_component: bool = False) → bool[source]¶

Determine if the name provided is associated with a registered object.

Parameters:

name (str) – The name to check for in the registry
as_component (bool, default=False) – When True, the name will be checked to determine if it is registered as a component of a registered object

Return type:

bool

arkouda.util.list_registry(detailed: bool = False)[source]¶

Return a dict containing the names of all registered objects and their components.

Parameters:: detailed (bool, default=False) – If True, return details of registry objects. Currently includes the object type for any objects
Returns:: Dict containing keys “Components” and “Objects”.
Return type:: dict
Raises:: RuntimeError – Raised if there’s a server-side error thrown

arkouda.util.map(values: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical, mapping: dict | arkouda.Series) → arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]¶

Map values of an array according to an input mapping.

Parameters:

values (pdarray, Strings, or Categorical) – The values to be mapped.
mapping (dict or Series) – The mapping correspondence.

Returns:

A new array with the values mapped by the mapping correspondence. When the input Series has Categorical values, the return Series will have Strings values. Otherwise, the return type will match the input type.

Return type:

pdarray or Strings

Raises:

TypeError – Raised if mapping is not of type dict or arkouda.Series. Raised if values is not of type pdarray, Categorical, or Strings.

Examples

>>> import arkouda as ak
>>> ak.connect()
>>> a = ak.array([2, 3, 2, 3, 4])
>>> a
array([2 3 2 3 4])
>>> ak.util.map(a, {4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
array([30.00000000000000000 5.00000000000000000 30.00000000000000000
5.00000000000000000 25.00000000000000000])
>>> s = ak.Series(ak.array(["a","b","c","d"]), index = ak.array([4,2,1,3]))
>>> ak.util.map(a, s)
array(['b', 'd', 'b', 'd', 'a'])

arkouda.util.most_common(g, values)[source]¶

arkouda.util.numeric_scalars¶

class arkouda.util.pdarray(name: str, mydtype: numpy.dtype | str, size: arkouda.numpy.dtypes.int_scalars, ndim: arkouda.numpy.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.numpy.dtypes.int_scalars, max_bits: int | None = None)[source]¶

The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name¶

The server-side identifier for the array

Type:: str

dtype¶

The element type of the array

Type:: dtype

size¶

The number of elements in the array

Type:: int_scalars

ndim¶

The rank of the array (currently only rank 1 arrays supported)

Type:: int_scalars

shape¶

A list or tuple containing the sizes of each dimension of the array

Type:: Sequence[int]

itemsize¶

The size in bytes of each element

Type:: int_scalars

BinOps¶

OpEqOps¶

all(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) → arkouda.numpy.dtypes.bool_scalars | pdarray[source]¶: Return True iff all elements of the array evaluate to True.

any(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) → arkouda.numpy.dtypes.bool_scalars | pdarray[source]¶: Return True iff any element of the array evaluates to True.

argmax(axis: int | None | None = None, keepdims: bool = False) → numpy.int64 | numpy.uint64 | pdarray[source]¶: Return the index of the first occurrence of the array max value.

argmaxk(k: arkouda.numpy.dtypes.int_scalars) → pdarray[source]¶

Finds the indices corresponding to the maximum “k” values.

Parameters:: k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns:: Indices corresponding to the maximum k values, sorted
Return type:: pdarray, int
Raises:: TypeError – Raised if pda is not a pdarray

argmin(axis: int | None | None = None, keepdims: bool = False) → numpy.int64 | numpy.uint64 | pdarray[source]¶: Return the index of the first occurrence of the array min value

argmink(k: arkouda.numpy.dtypes.int_scalars) → pdarray[source]¶

Find the indices corresponding to the minimum “k” values.

Parameters:: k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns:: Indices corresponding to the minimum k values from pda
Return type:: pdarray, int
Raises:: TypeError – Raised if pda is not a pdarray
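The index-returning variants argmaxk and argmink can be modeled in plain Python. This is a local sketch, not arkouda code: the functions below are hypothetical stand-ins, and ordering the result by ascending value (smallest of the k first) with ties broken by a stable sort is an assumption.

```python
# Pure-Python sketch of argmaxk/argmink semantics (not arkouda code).
# Assumption: results are ordered by ascending value; ties keep original order.


def _order_by_value(values):
    """Indices of values, sorted by the value each index points to (stable)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def argmaxk(values, k):
    """Indices of the k largest values."""
    return _order_by_value(values)[-k:] if k > 0 else []


def argmink(values, k):
    """Indices of the k smallest values."""
    return _order_by_value(values)[:k]


A = [10, 5, 1, 3, 7, 2, 9, 0]
print(argmaxk(A, 3))  # [4, 6, 0] -> values 7, 9, 10
print(argmink(A, 3))  # [7, 2, 5] -> values 0, 1, 2
```

The `k > 0` guard matters because `lst[-0:]` would return the whole list in Python.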

astype(dtype) → pdarray[source]¶

Cast values of pdarray to provided dtype

Parameters:: dtype (np.dtype or str) – Dtype to cast to
Returns:: An arkouda pdarray with values converted to the specified data type
Return type:: ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

static attach(user_defined_name: str) → pdarray[source]¶

Return a pdarray attached to a name that was registered in the arkouda server using register().

Parameters:: user_defined_name (str) – The user-defined name under which the array was registered
Returns:: pdarray which is bound to the corresponding server side component which was registered with user_defined_name
Return type:: pdarray
Raises:: TypeError – Raised if user_defined_name is not a str

See also

register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()

bigint_to_uint_arrays() → List[pdarray][source]¶

Creates a list of uint pdarrays from a bigint pdarray. The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.

Returns:: A list of uint pdarrays where: The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.
Return type:: List[pdarray]
Raises:: RuntimeError – Raised if there is a server-side error thrown

See also

pdarraycreation.bigint_from_uint_arrays

Examples

>>> a = ak.arange(2**64, 2**64 + 5)
>>> a
array(["18446744073709551616" "18446744073709551617" "18446744073709551618"
"18446744073709551619" "18446744073709551620"])

>>> a.bigint_to_uint_arrays()
[array([1 1 1 1 1]), array([0 1 2 3 4])]
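The limb layout (highest 64 bits first) can be reproduced with Python integers to check results like the one above. This is a sketch, not the arkouda algorithm: to_uint64_limbs and from_uint64_limbs are hypothetical helpers, the inputs are assumed non-negative, and the number of limbs is taken from the widest value.

```python
# Pure-Python sketch of splitting big integers into 64-bit limbs,
# most significant limb first (hypothetical helpers, not arkouda code).
# Assumptions: values are non-negative; limb count is set by the widest value.

MASK64 = (1 << 64) - 1


def to_uint64_limbs(values):
    """Return a list of limb lists; limbs[0] holds the highest 64 bits of each value."""
    widest = max((v.bit_length() for v in values), default=0)
    n_limbs = max(1, -(-widest // 64))  # ceiling division, at least one limb
    return [
        [(v >> (64 * (n_limbs - 1 - j))) & MASK64 for v in values]
        for j in range(n_limbs)
    ]


def from_uint64_limbs(limbs):
    """Inverse of to_uint64_limbs: recombine limbs (highest first) into integers."""
    result = [0] * len(limbs[0])
    for limb in limbs:
        result = [(r << 64) | x for r, x in zip(result, limb)]
    return result


a = [2**64 + i for i in range(5)]
limbs = to_uint64_limbs(a)
print(limbs)  # [[1, 1, 1, 1, 1], [0, 1, 2, 3, 4]]
assert from_uint64_limbs(limbs) == a  # round trip restores the original values
```

The round trip plays the role that pdarraycreation.bigint_from_uint_arrays plays for the server-side arrays.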

clz() → pdarray[source]¶: Count the number of leading zeros in each element. See ak.clz.

corr(y: pdarray) → numpy.float64[source]¶

Compute the correlation between self and y using the Pearson correlation coefficient.

Parameters:

y (pdarray) – Other pdarray used to calculate correlation

Returns:

The scalar correlation of the two arrays

Return type:

np.float64

Raises:

TypeError – Raised if y is not a pdarray instance
RuntimeError – Raised if a server-side error is thrown

cov(y: pdarray) → numpy.float64[source]¶

Compute the covariance between self and y.

Parameters:

y (pdarray) – Other pdarray used to calculate covariance

Returns:

The scalar covariance of the two arrays

Return type:

np.float64

Raises:

TypeError – Raised if y is not a pdarray instance
RuntimeError – Raised if a server-side error is thrown
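For checking corr and cov results on small inputs, the textbook formulas can be computed locally. This is a reference sketch, not arkouda code; the sample (n - 1) denominator used for cov is an assumption (it is numpy.cov's default), so confirm the normalization before comparing covariances. The Pearson coefficient does not depend on that choice, because the factors cancel.

```python
# Pure-Python reference for Pearson correlation and covariance (not arkouda code).
# Assumption: cov uses the sample (n - 1) denominator.
import math


def cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def corr(x, y):
    # Pearson r = cov(x, y) / (std(x) * std(y)); the (n - 1) factors cancel
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(cov(x, y))   # about 3.333 (= 10 / 3)
print(corr(x, y))  # about 1.0: y is a positive linear function of x
```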

ctz() → pdarray[source]¶: Count the number of trailing zeros in each element. See ak.ctz.

dtype¶

equals(other) → arkouda.numpy.dtypes.bool_scalars[source]¶

Whether pdarrays are the same size and all entries are equal.

Parameters:: other (object) – object to compare.
Returns:: True if the pdarrays are the same, o.w. False.
Return type:: bool

Examples

>>> import arkouda as ak
>>> ak.connect()
>>> a = ak.array([1, 2, 3])
>>> a_cpy = ak.array([1, 2, 3])
>>> a.equals(a_cpy)
True
>>> a2 = ak.array([1, 2, 5])
>>> a.equals(a2)
False

fill(value: arkouda.numpy.dtypes.numeric_scalars) → None[source]¶

Fill the array (in place) with a constant value.

Parameters:: value (numeric_scalars)
Raises:: TypeError – Raised if value is not an int, int64, float, or float64

flatten()[source]¶

Return a copy of the array collapsed into one dimension.

Returns:: A copy of the input array, flattened to one dimension.
Return type:: pdarray

format_other(other) → str[source]¶

Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly.

Parameters:: other (object) – The scalar to be cast to the pdarray.dtype
Returns:: The string representation of other after it is cast to this pdarray's dtype
Return type:: str
Raises:: TypeError – Raised if the other parameter cannot be converted to Numpy dtype

property inferred_type: str | None¶: Return a string of the type inferred from the values.

info() → str[source]¶

Returns a JSON formatted string containing information about all components of self

Parameters:: None
Returns:: JSON string containing information about all components of self
Return type:: str

is_registered() → numpy.bool_[source]¶

Return True iff the object is contained in the registry

Parameters:: None
Returns:: Indicates if the object is contained in the registry
Return type:: bool
Raises:: RuntimeError – Raised if a server-side error is thrown

Note

This will return True if the object is registered itself or as a component of another object

is_sorted(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) → arkouda.numpy.dtypes.bool_scalars | pdarray[source]¶

Return True iff the array is monotonically non-decreasing.

Parameters:

None

Returns:

Indicates if the array is monotonically non-decreasing

Return type:

bool

Raises:

TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if a server-side error is thrown
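"Monotonically non-decreasing" means that equal neighbours are allowed, which is the easiest detail to get wrong. A one-line pure-Python check for a 1-D sequence (a local sketch, not the arkouda implementation) makes the rule concrete:

```python
# Pure-Python check for "monotonically non-decreasing" on a 1-D sequence.
def is_sorted(values):
    # Every element must be <= its successor; ties are allowed
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_sorted([1, 2, 2, 3]))  # True: ties do not break non-decreasing order
print(is_sorted([1, 3, 2]))     # False
```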

itemsize¶

max(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) → arkouda.numpy.dtypes.numpy_scalars | pdarray[source]¶: Return the maximum value of the array.

property max_bits¶

maxk(k: arkouda.numpy.dtypes.int_scalars) → pdarray[source]¶

Compute the maximum “k” values.

Parameters:: k (int_scalars) – The desired count of maximum values to be returned by the output.
Returns:: The maximum k values from pda
Return type:: pdarray, int
Raises:: TypeError – Raised if pda is not a pdarray

mean() → numpy.float64[source]¶: Return the mean of the array.

min(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) → arkouda.numpy.dtypes.numpy_scalars | pdarray[source]¶: Return the minimum value of the array.

mink(k: arkouda.numpy.dtypes.int_scalars) → pdarray[source]¶

Compute the minimum “k” values.

Parameters:: k (int_scalars) – The desired count of minimum values to be returned by the output.
Returns:: The minimum k values from pda
Return type:: pdarray, int
Raises:: TypeError – Raised if pda is not a pdarray
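The value-returning variants maxk and mink can be modeled with the standard library's heapq module. This is a local sketch, not arkouda code; returning the k values in ascending order is an assumption about arkouda's output ordering.

```python
# Pure-Python sketch of maxk/mink (not arkouda code).
# Assumption: the k selected values are returned in ascending order.
import heapq


def maxk(values, k):
    """The k largest values, smallest first."""
    return sorted(heapq.nlargest(k, values))


def mink(values, k):
    """The k smallest values, smallest first."""
    return sorted(heapq.nsmallest(k, values))


A = [10, 5, 1, 3, 7, 2, 9, 0]
print(maxk(A, 3))  # [7, 9, 10]
print(mink(A, 3))  # [0, 1, 2]
```

heapq.nlargest and heapq.nsmallest avoid sorting the whole input, which mirrors why a dedicated top-k operation exists instead of a full sort.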

name¶

property nbytes¶

The size of the pdarray in bytes.

Returns:: The size of the pdarray in bytes.
Return type:: int

ndim¶

objType = 'pdarray'¶

opeq(other, op)[source]¶

parity() → pdarray[source]¶: Find the parity (XOR of all bits) in each element. See ak.parity.

popcount() → pdarray[source]¶: Find the population (number of bits set) in each element. See ak.popcount.
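The per-element bit operations (clz, ctz, popcount, parity) can be checked on a single value with Python integers. This is a sketch, not arkouda code: it assumes each element is treated as an unsigned 64-bit word, and returning 64 for clz(0) and ctz(0) is a convention chosen here.

```python
# Pure-Python sketches of per-element bit operations on one 64-bit value.
# Assumptions: values are treated as uint64; clz(0) == ctz(0) == 64 by convention here.
WIDTH = 64
MASK = (1 << WIDTH) - 1


def popcount(x):
    """Number of set bits."""
    return bin(x & MASK).count("1")


def parity(x):
    """XOR of all bits, i.e. popcount modulo 2."""
    return popcount(x) & 1


def clz(x):
    """Leading zeros in a 64-bit word."""
    return WIDTH - (x & MASK).bit_length()


def ctz(x):
    """Trailing zeros in a 64-bit word."""
    x &= MASK
    if x == 0:
        return WIDTH
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit


v = 0b10110000  # 176
print(popcount(v), parity(v), clz(v), ctz(v))  # 3 1 56 4
```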

pretty_print_info() → None[source]¶

Prints information about all components of self in a human readable format

Parameters:: None
Return type:: None

prod(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) → arkouda.numpy.dtypes.numpy_scalars | pdarray[source]¶: Return the product of all elements in the array. Return value is always a np.float64 or np.int64.

register(user_defined_name: str) → pdarray[source]¶

Register this pdarray with a user-defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation: registering a pdarray more than once updates the name in the registry and removes the previously registered name. A name can only be registered to one pdarray at a time.

Parameters:

user_defined_name (str) – user defined name array is to be registered under

Returns:

The same pdarray, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluid programming style. Note that you cannot register two different pdarrays with the same name.

Return type:

pdarray

Raises:

TypeError – Raised if user_defined_name is not a str
RegistrationError – Raised if the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.

See also

attach, unregister, is_registered, list_registry, unregister_pdarray_by_name

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
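The registration rules above (one name per object, one object per name, re-registering releases the old name) can be illustrated with an in-memory toy model. This is NOT the arkouda client or server: ToyRegistry and ToyRegistrationError are hypothetical stand-ins, and object IDs are plain strings for illustration.

```python
# Toy, in-memory model of the registration rules described above.
# ToyRegistry and ToyRegistrationError are hypothetical, not arkouda APIs.


class ToyRegistrationError(Exception):
    pass


class ToyRegistry:
    def __init__(self):
        self._by_name = {}  # registered name -> object id
        self._by_obj = {}   # object id -> registered name

    def register(self, obj_id, name):
        owner = self._by_name.get(name)
        if owner is not None and owner != obj_id:
            # A name can only be registered to one object at a time
            raise ToyRegistrationError(f"{name!r} is already registered")
        old = self._by_obj.get(obj_id)
        if old is not None and old != name:
            # Re-registering under a new name releases the old name
            del self._by_name[old]
        self._by_name[name] = obj_id
        self._by_obj[obj_id] = name

    def attach(self, name):
        return self._by_name[name]

    def unregister(self, name):
        obj_id = self._by_name.pop(name)
        del self._by_obj[obj_id]

    def is_registered(self, obj_id):
        return obj_id in self._by_obj


reg = ToyRegistry()
reg.register("id_a", "my_zeros")
reg.register("id_a", "renamed")   # "my_zeros" is released
reg.register("id_b", "my_zeros")  # so another object may now take it
try:
    reg.register("id_c", "renamed")  # still held by id_a
except ToyRegistrationError as err:
    print("error:", err)
reg.unregister("renamed")
print(reg.is_registered("id_a"))  # False
```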

registered_name: str | None = None¶

reshape(*shape)[source]¶

Gives a new shape to an array without changing its data.

Parameters:: shape (int, tuple of ints, or pdarray) – The new shape should be compatible with the original shape.
Returns:: a pdarray with the same data, reshaped to the new shape
Return type:: pdarray

rotl(other) → pdarray[source]¶: Rotate bits left by <other>.

rotr(other) → pdarray[source]¶: Rotate bits right by <other>.
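Bit rotation differs from a shift in that the bits pushed out of one end re-enter at the other. A pure-Python sketch for a single value (not arkouda code) shows this; it assumes 64-bit elements and a rotation count taken modulo 64.

```python
# Pure-Python sketch of 64-bit bit rotation for a single value.
# Assumptions: elements are 64 bits wide; the rotation count wraps modulo 64.
WIDTH = 64
MASK = (1 << WIDTH) - 1


def rotl(x, n):
    n %= WIDTH
    x &= MASK
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def rotr(x, n):
    n %= WIDTH
    x &= MASK
    return ((x >> n) | (x << (WIDTH - n))) & MASK


print(rotl(1 << 63, 1))        # 1: the top bit wraps around to bit 0
print(rotr(1, 1) == 1 << 63)   # True: bit 0 wraps around to the top bit
```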

save(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None, file_format: str = 'HDF5', file_type: str = 'distribute') → str[source]¶

DEPRECATED. Save the pdarray to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 also supports single-file output, in which case the file name is exactly the one provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:

prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str {'truncate', 'append'}) – By default, truncate (overwrite) output files if they exist. If ‘append’, attempt to create a new dataset in existing files.
compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files
file_format (str {'HDF5', 'Parquet'}) – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.
file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This is only supported for HDF5 files and has no effect on Parquet files.

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray
ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append
TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string

See also

save_all, load, read, to_parquet, to_hdf

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission. Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales - 1. If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

Previously, all files saved in Parquet format were saved with a .parquet file extension. To load such a file, call load with the extension included, as if you had saved the file with that extension; try this if an older file is not being found. Any file extension can be used; the file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension: writes numLocales HDF5 files named
>>> # cwd/path/prefix_LOCALE####, where #### is replaced by each locale number
>>> a.save('path/prefix', dataset='array')
>>> # Saving with an extension (HDF5): writes numLocales HDF5 files named
>>> # cwd/path/prefix_LOCALE####.h5
>>> a.save('path/prefix.h5', dataset='array')
>>> # Saving with an extension (Parquet): writes numLocales Parquet files named
>>> # cwd/path/prefix_LOCALE####.parquet
>>> a.save('path/prefix.parquet', dataset='array', file_format='Parquet')
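To predict which files a distributed save produces (for example, before cleaning them up), the naming pattern can be computed locally. expected_save_filenames is a hypothetical helper, not part of arkouda; it assumes the locale suffix is inserted before any extension and zero-padded to four digits, following the #### placeholder in the examples.

```python
# Hypothetical helper listing the per-locale file names for file_type="distribute".
# Assumptions: suffix is _LOCALE#### (4-digit, zero-padded), placed before the extension.
import os


def expected_save_filenames(prefix_path, num_locales):
    stem, ext = os.path.splitext(prefix_path)  # "path/prefix.h5" -> ("path/prefix", ".h5")
    return [f"{stem}_LOCALE{i:04d}{ext}" for i in range(num_locales)]


print(expected_save_filenames("path/prefix.h5", 2))
# ['path/prefix_LOCALE0000.h5', 'path/prefix_LOCALE0001.h5']
```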

property shape¶

Return the shape of an array.

Returns:: The elements of the shape tuple give the lengths of the corresponding array dimensions.
Return type:: tuple of int

size¶

slice_bits(low, high) → pdarray[source]¶

Returns a pdarray containing only bits from low to high of self.

Bits are zero-indexed and the slice is inclusive on both ends, so slicing the bottom 64 bits is pda.slice_bits(0, 63).
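The inclusive, zero-indexed slice amounts to a shift followed by a mask of width high - low + 1. A pure-Python sketch for one non-negative integer (not the arkouda implementation; slice_bits here is a local function):

```python
# Pure-Python sketch of an inclusive, zero-indexed bit slice on one integer.
def slice_bits(x, low, high):
    """Return bits low..high (both inclusive) of non-negative x, shifted down to bit 0."""
    if not 0 <= low <= high:
        raise ValueError("require 0 <= low <= high")
    width = high - low + 1
    return (x >> low) & ((1 << width) - 1)


big = 2**64 + 5
print(slice_bits(big, 0, 63))       # 5: the bottom 64 bits
print(slice_bits(big, 64, 64))      # 1: bit 64 alone
print(slice_bits(0b110110, 1, 3))   # 3: bits 3..1 are 0, 1, 1
```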

Parameters:

low (int) – The lowest bit included in the slice (inclusive). Bits are zero-indexed, so the first bit is 0.
high (int) – The highest bit included in the slice (inclusive)

Returns:

A new pdarray containing the bits of self from low to high

Return type: