arkouda.numpy

Submodules

Attributes

Exceptions

RegistrationError

Error/Exception used when the Arkouda Server cannot register an object

Classes

ARKOUDA_SUPPORTED_BOOLS

Built-in immutable sequence.

ARKOUDA_SUPPORTED_DTYPES

Built-in immutable sequence.

ARKOUDA_SUPPORTED_FLOATS

Built-in immutable sequence.

ARKOUDA_SUPPORTED_INTS

Built-in immutable sequence.

ARKOUDA_SUPPORTED_NUMBERS

Built-in immutable sequence.

BoolDType

DType class corresponding to the scalar type and dtype of the same name.

ByteDType

DType class corresponding to the scalar type and dtype of the same name.

BytesDType

DType class corresponding to the scalar type and dtype of the same name.

CLongDoubleDType

DType class corresponding to the scalar type and dtype of the same name.

Complex128DType

DType class corresponding to the scalar type and dtype of the same name.

Complex64DType

DType class corresponding to the scalar type and dtype of the same name.

DType

An enumeration.

DTypeObjects

frozenset() -> empty frozenset object

DTypes

frozenset() -> empty frozenset object

DateTime64DType

DType class corresponding to the scalar type and dtype of the same name.

Datetime

Represents a date and/or time.

Enum

Generic enumeration.

ErrorMode

Generic enumeration.

Float16DType

DType class corresponding to the scalar type and dtype of the same name.

Float32DType

DType class corresponding to the scalar type and dtype of the same name.

Float64DType

DType class corresponding to the scalar type and dtype of the same name.

GroupBy

Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array.

Int16DType

DType class corresponding to the scalar type and dtype of the same name.

Int32DType

DType class corresponding to the scalar type and dtype of the same name.

Int64DType

DType class corresponding to the scalar type and dtype of the same name.

Int8DType

DType class corresponding to the scalar type and dtype of the same name.

IntDType

DType class corresponding to the scalar type and dtype of the same name.

LongDType

DType class corresponding to the scalar type and dtype of the same name.

LongDoubleDType

DType class corresponding to the scalar type and dtype of the same name.

LongLongDType

DType class corresponding to the scalar type and dtype of the same name.

NUMBER_FORMAT_STRINGS

dict() -> new empty dictionary

NumericDTypes

frozenset() -> empty frozenset object

ObjectDType

DType class corresponding to the scalar type and dtype of the same name.

ScalarDTypes

frozenset() -> empty frozenset object

SegArray

SeriesDTypes

dict() -> new empty dictionary

ShortDType

DType class corresponding to the scalar type and dtype of the same name.

StrDType

DType class corresponding to the scalar type and dtype of the same name.

Strings

Represents an array of strings whose data resides on the arkouda server.

TimeDelta64DType

DType class corresponding to the scalar type and dtype of the same name.

Timedelta

Represents a duration, the difference between two dates or times.

UByteDType

DType class corresponding to the scalar type and dtype of the same name.

UInt16DType

DType class corresponding to the scalar type and dtype of the same name.

UInt32DType

DType class corresponding to the scalar type and dtype of the same name.

UInt64DType

DType class corresponding to the scalar type and dtype of the same name.

UInt8DType

DType class corresponding to the scalar type and dtype of the same name.

UIntDType

DType class corresponding to the scalar type and dtype of the same name.

ULongDType

DType class corresponding to the scalar type and dtype of the same name.

ULongLongDType

DType class corresponding to the scalar type and dtype of the same name.

UShortDType

DType class corresponding to the scalar type and dtype of the same name.

Union

Union type; Union[X, Y] means either X or Y.

VoidDType

DType class corresponding to the scalar type and dtype of the same name.

akbool

Boolean type (True or False), stored as a byte.

akint64

Signed integer type, compatible with Python int and C long.

akuint64

Unsigned integer type, compatible with C unsigned long.

all_scalars

The central part of internal API.

annotations

bigint

Datatype for representing integers of variable size.

bitType

Unsigned integer type, compatible with C unsigned long.

bool_

Boolean type (True or False), stored as a byte.

bool_scalars

The central part of internal API.

complex128

Complex number type composed of two double-precision floating-point

complex64

Complex number type composed of two single-precision floating-point

float16

Half-precision floating-point number type.

float32

Single-precision floating-point number type, compatible with C float.

float64

Double-precision floating-point number type, compatible with Python float

float_scalars

The central part of internal API.

int16

Signed integer type, compatible with C short.

int32

Signed integer type, compatible with C int.

int64

Signed integer type, compatible with Python int and C long.

int8

Signed integer type, compatible with C char.

intTypes

frozenset() -> empty frozenset object

int_scalars

The central part of internal API.

numeric_and_bool_scalars

The central part of internal API.

numeric_scalars

The central part of internal API.

numpy_scalars

The central part of internal API.

pdarray

The basic arkouda array class. This class contains only the

str_

A unicode string.

str_scalars

The central part of internal API.

uint16

Unsigned integer type, compatible with C unsigned short.

uint32

Unsigned integer type, compatible with C unsigned int.

uint64

Unsigned integer type, compatible with C unsigned long.

uint8

Unsigned integer type, compatible with C unsigned char.

Functions

abs(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise absolute value of the array.

arange(→ arkouda.numpy.pdarrayclass.pdarray)

arange([start,] stop[, stride,] dtype=int64)

arccos(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse cosine of the array. The result is between 0 and pi.

arccosh(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse hyperbolic cosine of the array.

arcsin(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse sine of the array. The result is between -pi/2 and pi/2.

arcsinh(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse hyperbolic sine of the array.

arctan(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse tangent of the array. The result is between -pi/2 and pi/2.

arctan2(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse tangent of the array pair. The result chosen is the

arctanh(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise inverse hyperbolic tangent of the array.

argmaxk(→ pdarray)

Find the indices corresponding to the k maximum values of an array.

argmink(→ pdarray)

Find the indices corresponding to the k minimum values of an array.

argsort(→ arkouda.numpy.pdarrayclass.pdarray)

Return the permutation that sorts the array.

array(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Convert a Python or NumPy Iterable to a pdarray or Strings object, sending

array_equal(→ bool)

Compares two pdarrays for equality.

attach(name)

attach_all(names)

Attach to all objects registered with the names provided.

attach_pdarray(→ pdarray)

class method to return a pdarray attached to the registered name in the arkouda

bigint_from_uint_arrays(arrays[, max_bits])

Create a bigint pdarray from an iterable of uint pdarrays.

broadcast(segments, values[, size, permutation])

Broadcast a dense column vector to the rows of a sparse matrix or grouped array.

broadcast_dims(→ Tuple[int, Ellipsis])

Algorithm to determine shape of broadcasted PD array given two array shapes

broadcast_to_shape(→ pdarray)

Create a "broadcasted" array (of rank 'nd') by copying an array into an

can_cast(→ bool)

Returns True if cast between data types can occur according to the casting rule.

cast(→ Union[Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Cast an array to another dtype.

ceil(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise ceiling of the array.

clear(→ None)

Send a clear message to clear all unregistered data from the server symbol table

clip(→ arkouda.numpy.pdarrayclass.pdarray)

Clip (limit) the values in an array to a given range [lo,hi]

clz(→ pdarray)

Count leading zeros for each integer in an array.

coargsort(→ arkouda.numpy.pdarrayclass.pdarray)

Return the permutation that groups the rows (left-to-right), if the

concatenate(...)

Concatenate a list or tuple of pdarray or Strings objects into

corr(→ numpy.float64)

Return the correlation between x and y

cos(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise cosine of the array.

cosh(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise hyperbolic cosine of the array.

count_nonzero(→ numpy.int64)

Compute the nonzero count of a given array. 1D case only, for now.

cov(→ numpy.float64)

Return the covariance of x and y

create_pdarray(→ pdarray)

Return a pdarray instance pointing to an array created by the arkouda server.

ctz(→ pdarray)

Count trailing zeros for each integer in an array.

cumprod(→ arkouda.numpy.pdarrayclass.pdarray)

Return the cumulative product over the array.

cumsum(→ arkouda.numpy.pdarrayclass.pdarray)

Return the cumulative sum over the array.

date_range([start, end, periods, freq, tz, normalize, ...])

Creates a fixed frequency Datetime range. Alias for

deg2rad(→ arkouda.numpy.pdarrayclass.pdarray)

Converts angles element-wise from degrees to radians.

delete(→ arkouda.numpy.pdarrayclass.pdarray)

Return a copy of 'arr' with elements along the specified axis removed.

divmod(→ Tuple[pdarray, pdarray])

dot(→ Union[arkouda.numpy.dtypes.numpy_scalars, pdarray])

Returns the sum of the elementwise product of two arrays of the same size (the dot product) or

dtype(dtype)

Create a data type object.

exp(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise exponential of the array.

expm1(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise exponential of the array minus one.

eye(→ arkouda.numpy.pdarrayclass.pdarray)

Return a pdarray with zeros everywhere except along a diagonal, which is all ones.

flip(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Reverse an array's values along a particular axis or axes.

floor(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise floor of the array.

fmod(→ pdarray)

Returns the element-wise remainder of division.

from_series(...)

Converts a Pandas Series to an Arkouda pdarray or Strings object. If

full(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Create a pdarray filled with fill_value.

full_like(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Create a pdarray filled with fill_value of the same size and dtype as an existing

getArkoudaLogger(→ ArkoudaLogger)

A convenience method for instantiating an ArkoudaLogger that retrieves the

get_byteorder(→ str)

Get a concrete byteorder (turns '=' into '<' or '>') on the client.

get_server_byteorder(→ str)

Get the server's byteorder

hash(→ Union[Tuple[arkouda.numpy.pdarrayclass.pdarray, ...)

Return an element-wise hash of the array or list of arrays.

histogram(→ Tuple[arkouda.numpy.pdarrayclass.pdarray, ...)

Compute a histogram of evenly spaced bins over the range of an array.

histogram2d(...)

Compute the bi-dimensional histogram of two data samples with evenly spaced bins

histogramdd(...)

Compute the multidimensional histogram of data in sample with evenly spaced bins.

in1d(→ arkouda.groupbyclass.groupable)

Test whether each element of a 1-D array is also present in a second array.

indexof1d(→ arkouda.numpy.pdarrayclass.pdarray)

Return indices of query items in a search list of items. Items not found will be excluded.

intersect1d(...)

Find the intersection of two arrays.

isSupportedBool(num)

Whether a scalar is an arkouda supported boolean dtype.

isSupportedDType(→ bool)

Whether a scalar is an arkouda supported dtype.

isSupportedFloat(num)

Whether a scalar is an arkouda supported float dtype.

isSupportedInt(num)

Whether a scalar is an arkouda supported integer dtype.

isSupportedNumber(num)

Whether a scalar is an arkouda supported numeric dtype.

is_registered(→ bool)

Determine if the name provided is associated with a registered Object

isfinite(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise isfinite check applied to the array.

isinf(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise isinf check applied to the array.

isnan(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise isnan check applied to the array.

linspace(→ arkouda.numpy.pdarrayclass.pdarray)

Create a pdarray of linearly-spaced floats in a closed interval.

log(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise natural log of the array.

log10(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise base 10 log of the array.

log1p(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise natural log of one plus the array.

log2(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise base 2 log of the array.

matmul(→ arkouda.numpy.pdarrayclass.pdarray)

Compute the product of two matrices.

maxk(→ pdarray)

Find the k maximum values of an array.

mean(→ numpy.float64)

Return the mean of the array.

median(→ numpy.float64)

Compute the median of a given array. 1d case only, for now.

mink(→ pdarray)

Find the k minimum values of an array.

mod(→ pdarray)

Returns the element-wise remainder of division.

ones(→ arkouda.numpy.pdarrayclass.pdarray)

Create a pdarray filled with ones.

ones_like(→ arkouda.numpy.pdarrayclass.pdarray)

Create a one-filled pdarray of the same size and dtype as an existing

parity(→ pdarray)

Find the bit parity (XOR of all bits) for each integer in an array.

popcount(→ pdarray)

Find the population (number of bits set) for each integer in an array.

power(→ pdarray)

Raises an array to a power. If where is given, the operation will only take place in the positions

promote_to_common_dtype(→ Tuple[Any, ...)

Promote a list of pdarrays to a common dtype.

putmask(→ None)

Overwrites elements of A with elements from B based upon a mask array.

rad2deg(→ arkouda.numpy.pdarrayclass.pdarray)

Converts angles element-wise from radians to degrees.

randint(→ arkouda.numpy.pdarrayclass.pdarray)

Generate a pdarray of randomized int, float, or bool values in a

random_strings_lognormal(→ arkouda.numpy.strings.Strings)

Generate random strings with log-normally distributed lengths and

random_strings_uniform(→ arkouda.numpy.strings.Strings)

Generate random strings with lengths uniformly distributed between

register_all(data)

Register all objects in the provided dictionary

repeat(→ arkouda.numpy.pdarrayclass.pdarray)

Repeat each element of an array after itself.

resolve_scalar_dtype(→ str)

Try to infer what dtype arkouda_server should treat val as.

rotl(→ pdarray)

Rotate bits of <x> to the left by <rot>.

rotr(→ pdarray)

Rotate bits of <x> to the right by <rot>.

round(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise rounding of the array.

scalar_array(→ arkouda.numpy.pdarrayclass.pdarray)

Create a pdarray from a single scalar value.

segarray(segments, values[, lengths, grouping])

Alias for the from_parts function. Prevents user from needing to call ak.SegArray constructor

setdiff1d(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Find the set difference of two arrays.

setxor1d(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Find the set exclusive-or (symmetric difference) of two arrays.

shape(→ Tuple)

Return the shape of an array.

sign(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise sign of the array.

sin(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise sine of the array.

sinh(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise hyperbolic sine of the array.

sort(→ arkouda.numpy.pdarrayclass.pdarray)

Return a sorted copy of the array. Only sorts numeric arrays;

sqrt(→ pdarray)

Takes the square root of array. If where is given, the operation will only take place in

square(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise square of the array.

squeeze(→ arkouda.numpy.pdarrayclass.pdarray)

Remove degenerate (size one) dimensions from an array.

standard_normal(→ arkouda.numpy.pdarrayclass.pdarray)

Draw real numbers from the standard normal distribution.

std(→ numpy.float64)

Return the standard deviation of values in the array. The standard

tan(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise tangent of the array.

tanh(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise hyperbolic tangent of the array.

tile(→ arkouda.numpy.pdarrayclass.pdarray)

Construct an array by repeating A the number of times given by reps.

timedelta_range([start, end, periods, freq, name, closed])

Return a fixed frequency TimedeltaIndex, with day as the default

transpose(→ arkouda.numpy.pdarrayclass.pdarray)

Compute the transpose of a matrix.

tril(→ arkouda.numpy.pdarrayclass.pdarray)

Return a copy of the pda with the upper triangle zeroed out

triu(→ arkouda.numpy.pdarrayclass.pdarray)

Return a copy of the pda with the lower triangle zeroed out

trunc(→ arkouda.numpy.pdarrayclass.pdarray)

Return the element-wise truncation of the array.

uniform(size[, low, high, seed])

Generate a pdarray with uniformly distributed random float values

union1d(→ arkouda.groupbyclass.groupable)

Find the union of two arrays/List of Arrays.

unregister(→ str)

unregister_all(names)

Unregister all names provided

unregister_pdarray_by_name(→ None)

Unregister a named pdarray in the arkouda server which was previously

value_counts(→ tuple[arkouda.groupbyclass.groupable, ...)

Count the occurrences of the unique values of an array.

var(→ numpy.float64)

Return the variance of values in the array.

vecdot(→ arkouda.numpy.pdarrayclass.pdarray)

Compute the generalized dot product of two vectors along the given axis.

vstack(→ arkouda.numpy.pdarrayclass.pdarray)

Stack a sequence of arrays vertically (row-wise).

where(→ Union[arkouda.numpy.pdarrayclass.pdarray, ...)

Returns an array with elements chosen from A and B based upon a

zeros(→ arkouda.numpy.pdarrayclass.pdarray)

Create a pdarray filled with zeros.

zeros_like(→ arkouda.numpy.pdarrayclass.pdarray)

Create a zero-filled pdarray of the same size and dtype as an existing
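
The bit-level helpers listed above (clz, ctz, popcount, parity, rotl, rotr) operate element-wise on integer arrays. As a rough illustration of what each one computes for a single value, the sketch below uses plain Python integers and assumes a fixed width of 64 bits. The function bodies are illustrative stand-ins rather than arkouda's implementation, and edge cases (such as the result for zero) may differ on the server.

```python
WIDTH = 64                 # assumed bit width for this sketch
MASK = (1 << WIDTH) - 1    # keeps values within WIDTH bits


def popcount(x):
    """Number of bits set in the WIDTH-bit representation of x."""
    return bin(x & MASK).count("1")


def parity(x):
    """XOR of all bits: 1 if an odd number of bits are set, else 0."""
    return popcount(x) & 1


def clz(x):
    """Count of leading zero bits."""
    return WIDTH - (x & MASK).bit_length()


def ctz(x):
    """Count of trailing zero bits (WIDTH when x is zero in this sketch)."""
    x &= MASK
    return WIDTH if x == 0 else (x & -x).bit_length() - 1


def rotl(x, rot):
    """Rotate the bits of x to the left by rot positions."""
    rot %= WIDTH
    x &= MASK
    return ((x << rot) | (x >> (WIDTH - rot))) & MASK


def rotr(x, rot):
    """Rotate the bits of x to the right by rot positions."""
    rot %= WIDTH
    x &= MASK
    return ((x >> rot) | (x << (WIDTH - rot))) & MASK


print(popcount(0b1011))  # 3
print(clz(1))            # 63
print(ctz(8))            # 3
print(rotl(1, 1))        # 2
```

Rotating left and then right by the same amount returns the original value, which is a quick sanity check for any rotation helper.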

Package Contents

class arkouda.numpy.ARKOUDA_SUPPORTED_BOOLS

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

class arkouda.numpy.ARKOUDA_SUPPORTED_DTYPES

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

class arkouda.numpy.ARKOUDA_SUPPORTED_FLOATS

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

class arkouda.numpy.ARKOUDA_SUPPORTED_INTS

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

class arkouda.numpy.ARKOUDA_SUPPORTED_NUMBERS

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

count(value, /)

Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.
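
As a rough illustration of how these tuple-valued constants are typically consumed, the sketch below uses a hypothetical stand-in tuple, SUPPORTED_NUMBERS, built from Python built-ins. The actual members of ARKOUDA_SUPPORTED_NUMBERS are not listed in this reference and are likely different. The sketch shows the inherited count and index methods and the common pattern of passing such a tuple to isinstance.

```python
# Hypothetical stand-in for a constant such as ARKOUDA_SUPPORTED_NUMBERS.
# The real tuple's members are not shown in this reference.
SUPPORTED_NUMBERS = (int, float)


def is_supported_number(value):
    """Return True if value is an instance of any type in the tuple."""
    # isinstance accepts a tuple of types directly.
    return isinstance(value, SUPPORTED_NUMBERS)


# count and index are the ordinary tuple methods documented above.
print(SUPPORTED_NUMBERS.count(float))   # occurrences of float in the tuple
print(SUPPORTED_NUMBERS.index(float))   # position of the first float entry

try:
    SUPPORTED_NUMBERS.index(complex)    # complex is not a member
except ValueError as err:
    print("not present:", err)

print(is_supported_number(2.5), is_supported_number("2.5"))
```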

class arkouda.numpy.BoolDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.ByteDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.BytesDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.CLongDoubleDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Complex128DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Complex64DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.DType

An enumeration.

BIGINT(*args, **kwargs)

An enumeration.

BOOL(*args, **kwargs)

An enumeration.

COMPLEX128(*args, **kwargs)

An enumeration.

COMPLEX64(*args, **kwargs)

An enumeration.

FLOAT(*args, **kwargs)

An enumeration.

FLOAT32(*args, **kwargs)

An enumeration.

FLOAT64(*args, **kwargs)

An enumeration.

INT(*args, **kwargs)

An enumeration.

INT16(*args, **kwargs)

An enumeration.

INT32(*args, **kwargs)

An enumeration.

INT64(*args, **kwargs)

An enumeration.

INT8(*args, **kwargs)

An enumeration.

STR(*args, **kwargs)

An enumeration.

UINT(*args, **kwargs)

An enumeration.

UINT16(*args, **kwargs)

An enumeration.

UINT32(*args, **kwargs)

An enumeration.

UINT64(*args, **kwargs)

An enumeration.

UINT8(*args, **kwargs)

An enumeration.

name(*args, **kwargs)

The name of the Enum member.

value(*args, **kwargs)

The value of the Enum member.
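
The members above are ordinary enum.Enum members, so name and value behave as they do for any Python enumeration. The sketch below uses a hypothetical three-member enumeration, MiniDType, to show both attributes and the two lookup styles. The string values assigned here are assumptions for illustration and are not taken from this reference.

```python
from enum import Enum


class MiniDType(Enum):
    # Hypothetical subset; the string values are assumptions for illustration.
    INT64 = "int64"
    FLOAT64 = "float64"
    BOOL = "bool"


member = MiniDType.FLOAT64
print(member.name)    # the identifier used in the class body
print(member.value)   # the value assigned to that identifier

# Look a member up by value (call syntax) or by name (index syntax).
assert MiniDType("int64") is MiniDType.INT64
assert MiniDType["BOOL"] is MiniDType.BOOL

# Enumerations are iterable in definition order.
print([m.name for m in MiniDType])
```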

class arkouda.numpy.DTypeObjects

frozenset() -> empty frozenset object

frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

copy(*args, **kwargs)

Return a shallow copy of a set.

difference(*args, **kwargs)

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

intersection(*args, **kwargs)

Return the intersection of two sets as a new set.

(i.e. all elements that are in both sets.)

isdisjoint(*args, **kwargs)

Return True if two sets have a null intersection.

issubset(*args, **kwargs)

Report whether another set contains this set.

issuperset(*args, **kwargs)

Report whether this set contains another set.

symmetric_difference(*args, **kwargs)

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

union(*args, **kwargs)

Return the union of sets as a new set.

(i.e. all elements that are in either set.)

class arkouda.numpy.DTypes

frozenset() -> empty frozenset object

frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

copy(*args, **kwargs)

Return a shallow copy of a set.

difference(*args, **kwargs)

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

intersection(*args, **kwargs)

Return the intersection of two sets as a new set.

(i.e. all elements that are in both sets.)

isdisjoint(*args, **kwargs)

Return True if two sets have a null intersection.

issubset(*args, **kwargs)

Report whether another set contains this set.

issuperset(*args, **kwargs)

Report whether this set contains another set.

symmetric_difference(*args, **kwargs)

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

union(*args, **kwargs)

Return the union of sets as a new set.

(i.e. all elements that are in either set.)
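
Because DTypeObjects and DTypes are frozensets, the usual set queries apply and the collections cannot be mutated. The following sketch uses two hypothetical frozensets of dtype names, ALL_DTYPES and NUMERIC_DTYPES, whose contents are assumed for illustration only. It shows membership tests and a few of the methods documented above.

```python
# Hypothetical stand-ins; the real contents of DTypes and NumericDTypes
# are not listed in this reference.
ALL_DTYPES = frozenset({"bool", "int64", "uint64", "float64", "str"})
NUMERIC_DTYPES = frozenset({"int64", "uint64", "float64"})


def is_numeric(name):
    """Membership test against the numeric subset."""
    return name in NUMERIC_DTYPES


print(is_numeric("int64"), is_numeric("str"))

# Every numeric dtype name is also a dtype name.
print(NUMERIC_DTYPES.issubset(ALL_DTYPES))

# Names that are dtypes but not numeric.
print(sorted(ALL_DTYPES.difference(NUMERIC_DTYPES)))

# No overlap between the numeric names and {"str"}.
print(NUMERIC_DTYPES.isdisjoint({"str"}))

# frozenset has no mutating methods such as add.
print(hasattr(ALL_DTYPES, "add"))
```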

class arkouda.numpy.DateTime64DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Datetime(pda, unit: str = _BASE_UNIT)

Bases: _AbstractBaseTime

Represents a date and/or time.

Datetime is the Arkouda analog to pandas DatetimeIndex and other timeseries data types.

Parameters:
  • pda (int64 pdarray, pd.DatetimeIndex, pd.Series, or np.datetime64 array)

  • unit (str, default 'ns') –

    For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like ‘sec’) are accepted.

    Possible values:

    • 'weeks' or 'w'

    • 'days' or 'd'

    • 'hours' or 'h'

    • 'minutes', 'm', or 't'

    • 'seconds' or 's'

    • 'milliseconds', 'ms', or 'l'

    • 'microseconds', 'us', or 'u'

    • 'nanoseconds', 'ns', or 'n'

    Unlike in pandas, units cannot be combined or mixed with integers.
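
The unit rules above (case-insensitive, short aliases, and prefixes of the full names) can be sketched in plain Python. The helper names ns_factor and to_nanoseconds are hypothetical, and the way ambiguous prefixes are rejected here is an assumption; arkouda's own resolution logic may differ.

```python
# Nanoseconds per unit, keyed by the full unit names listed above.
NS_PER_UNIT = {
    "weeks": 7 * 24 * 3600 * 10**9,
    "days": 24 * 3600 * 10**9,
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}

# Short aliases listed above.
ALIASES = {
    "w": "weeks", "d": "days", "h": "hours",
    "m": "minutes", "t": "minutes", "s": "seconds",
    "ms": "milliseconds", "l": "milliseconds",
    "us": "microseconds", "u": "microseconds",
    "ns": "nanoseconds", "n": "nanoseconds",
}


def ns_factor(unit):
    """Return nanoseconds per unit (hypothetical helper)."""
    key = unit.lower()                      # units are not case-sensitive
    if key in ALIASES:
        return NS_PER_UNIT[ALIASES[key]]
    # Accept a prefix of a full name only when it identifies one unit.
    matches = [name for name in NS_PER_UNIT if name.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"unrecognized or ambiguous unit: {unit!r}")
    return NS_PER_UNIT[matches[0]]


def to_nanoseconds(count, unit):
    """Convert an integer count in the given unit to nanoseconds."""
    return count * ns_factor(unit)


print(to_nanoseconds(2, "sec"))   # prefix of 'seconds'
print(to_nanoseconds(3, "MS"))    # alias, upper case
```

A combined specification such as '2h' matches neither an alias nor a prefix, so this sketch raises ValueError for it, in line with the note that units cannot be mixed with integers.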

Notes

The .values attribute is always in nanoseconds with int64 dtype.

property date
property day
property day_of_week
property day_of_year
property dayofweek
property dayofyear
property hour
property is_leap_year
is_registered() → numpy.bool_

Return True iff the object is contained in the registry or is a component of a registered object.

Returns:

Indicates if the object is contained in the registry

Return type:

numpy.bool

Raises:

RegistrationError – Raised if there’s a server-side error or a mismatch of registered components

Notes

Objects registered with the server are immune to deletion until they are unregistered.

isocalendar()
property microsecond
property millisecond
property minute
property month
property nanosecond
register(user_defined_name)

Register this Datetime object and underlying components with the Arkouda server

Parameters:

user_defined_name (str) – User-defined name the Datetime is to be registered under. This will be the root name for underlying components.

Returns:

The same Datetime, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different Datetimes with the same name.

Return type:

Datetime

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – If the server was unable to register the Datetimes with the user_defined_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property second
special_objType = 'Datetime'
sum()

Return sum of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])

Notes

Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))
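
To make the axis and keepdims semantics concrete without a running server, here is a minimal pure-Python sketch for two-dimensional input held in nested lists. The helper sum_2d is hypothetical and only mirrors the behavior described above; it is not how arkouda computes the result.

```python
def sum_2d(rows, axis=None, keepdims=False):
    """Sum a list of equal-length rows over everything, axis 0, or axis 1."""
    if axis is None:
        total = sum(sum(row) for row in rows)
        # keepdims retains both (now size-one) dimensions.
        return [[total]] if keepdims else total
    if axis == 1:
        sums = [sum(row) for row in rows]            # one value per row
        return [[s] for s in sums] if keepdims else sums
    if axis == 0:
        sums = [sum(col) for col in zip(*rows)]      # one value per column
        return [sums] if keepdims else sums
    raise ValueError("axis must be None, 0, or 1 in this sketch")


data = [[1, 2, 3], [5, 4, 3]]
print(sum_2d(data))                          # 18
print(sum_2d(data, axis=1))                  # [6, 12]
print(sum_2d(data, axis=0))                  # [6, 6, 6]
print(sum_2d(data, axis=1, keepdims=True))   # [[6], [12]]
```

The axis=1 result matches the last documented example above, where the same 2 by 3 input yields the row sums 6 and 12.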

supported_opeq
supported_with_datetime
supported_with_pdarray
supported_with_r_datetime
supported_with_r_pdarray
supported_with_r_timedelta
supported_with_timedelta
to_pandas()

Convert array to a pandas DatetimeIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.

See also

to_ndarray

unregister()

Unregister this Datetime object, previously registered using register() and/or attached to using attach(), from the arkouda server.

Raises:

RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.
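
The register, is_registered, and unregister methods together describe a simple lifecycle. The sketch below models that lifecycle with an in-memory dictionary so it can run without a server. ToyRegistry and the local RegistrationError class are hypothetical stand-ins, and the server's actual bookkeeping is not described in this reference.

```python
class RegistrationError(Exception):
    """Local stand-in for the exception of the same name."""


class ToyRegistry:
    """In-memory model of a name-to-object registry (not the arkouda server)."""

    def __init__(self):
        self._entries = {}

    def register(self, obj, name):
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        # Two different objects cannot share one registered name.
        if name in self._entries and self._entries[name] is not obj:
            raise RegistrationError(f"name already in use: {name!r}")
        self._entries[name] = obj
        return obj  # returning the object allows call chaining

    def is_registered(self, obj):
        return any(entry is obj for entry in self._entries.values())

    def unregister(self, obj):
        names = [n for n, entry in self._entries.items() if entry is obj]
        if not names:
            raise RegistrationError("object is not registered")
        for n in names:
            del self._entries[n]


registry = ToyRegistry()
first, second = [1, 2, 3], [4, 5, 6]

registry.register(first, "timestamps")
print(registry.is_registered(first))    # True

try:
    registry.register(second, "timestamps")
except RegistrationError as err:
    print("rejected:", err)

registry.unregister(first)
print(registry.is_registered(first))    # False
```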

property week
property weekday
property weekofyear
property year
class arkouda.numpy.Enum

Generic enumeration.

Derive from this class to define new enumerations.

class arkouda.numpy.ErrorMode

Bases: enum.Enum

Generic enumeration.

Derive from this class to define new enumerations.

ignore = 'ignore'
return_validity = 'return_validity'
strict = 'strict'
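
One plausible use of the three modes is to control what happens when a value cannot be converted. The sketch below re-declares ErrorMode locally with the member values shown above and applies it in a hypothetical parse_ints helper. The helper, its fill parameter, and the exact behavior of each mode are assumptions for illustration; this reference does not state how arkouda interprets each mode.

```python
from enum import Enum


class ErrorMode(Enum):
    # Local copy of the three members listed above.
    strict = "strict"
    ignore = "ignore"
    return_validity = "return_validity"


def parse_ints(strings, errors=ErrorMode.strict, fill=0):
    """Convert strings to ints, handling failures according to errors."""
    values, valid = [], []
    for text in strings:
        try:
            values.append(int(text))
            valid.append(True)
        except ValueError:
            if errors is ErrorMode.strict:
                raise                      # fail on the first bad value
            values.append(fill)            # substitute a placeholder
            valid.append(False)
    if errors is ErrorMode.return_validity:
        return values, valid               # also report which entries parsed
    return values


raw = ["1", "x", "3"]
print(parse_ints(raw, ErrorMode.ignore))            # [1, 0, 3]
print(parse_ints(raw, ErrorMode.return_validity))   # ([1, 0, 3], [True, False, True])
print(ErrorMode("strict") is ErrorMode.strict)      # True
```
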
class arkouda.numpy.Float16DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Float32DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Float64DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.GroupBy[source]

Group an array or list of arrays by value, usually in preparation for aggregating the within-group values of another array.

Parameters:
  • keys ((list of) pdarray, Strings, or Categorical) – The array to group by value, or if list, the column arrays to group by row

  • assume_sorted (bool) – If True, assume keys is already sorted (Default: False)

nkeys

The number of key arrays (columns)

Type:

int

size[source]

The length of the input array(s), i.e. number of rows

Type:

int

permutation

The permutation that sorts the keys array(s) by value (row)

Type:

pdarray

unique_keys

The unique values of the keys array(s), in grouped order

Type:

(list of) pdarray, Strings, or Categorical

ngroups

The length of the unique_keys array(s), i.e. number of groups

Type:

int

segments

The start index of each group in the grouped array(s)

Type:

pdarray

logger

Used for all logging operations

Type:

ArkoudaLogger

dropna

If True, and the groupby keys contain NaN values, the NaN values together with the corresponding row will be dropped. Otherwise, the rows corresponding to NaN values will be kept.

Type:

bool (default=True)

Raises:

TypeError – Raised if keys is a pdarray with a dtype other than int64

Notes

Integral pdarrays, Strings, and Categoricals are natively supported, but float64 and bool arrays are not.

For a user-defined class to be groupable, it must inherit from pdarray and define or overload the grouping API:

  1. a ._get_grouping_keys() method that returns a list of pdarrays that can be (co)argsorted.

  2. (Optional) a .group() method that returns the permutation that groups the array

If the input is a single array with a .group() method defined, method 2 will be used; otherwise, method 1 will be used.
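
As a rough illustration of how the attributes above relate to one another, the following plain-Python sketch derives a permutation, segments, and unique_keys from a small key list. It does not call Arkouda. The helper name group_components is hypothetical, and the sketch assumes groups come out in sorted key order via a stable sort; the server may compute its permutation differently.

```python
def group_components(keys):
    """Emulate GroupBy bookkeeping for one key list (illustrative only)."""
    # Stable argsort: brings equal keys together, ties stay in original order.
    permutation = sorted(range(len(keys)), key=lambda i: keys[i])
    grouped = [keys[i] for i in permutation]
    # A segment starts wherever the grouped key changes.
    segments = [
        i for i in range(len(grouped))
        if i == 0 or grouped[i] != grouped[i - 1]
    ]
    unique_keys = [grouped[s] for s in segments]
    return permutation, segments, unique_keys


keys = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
permutation, segments, unique_keys = group_components(keys)
ngroups = len(unique_keys)  # number of groups
nrows = len(keys)           # plays the role of the documented "size" attribute
```

Each entry of segments is the offset, within the grouped (permuted) ordering, at which the corresponding unique key begins.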

AND(values: pdarray) Tuple[pdarray | List[pdarray | Strings], pdarray][source]

Bitwise AND of values in each segment.

Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise AND reduction on each group.

Parameters:

values (pdarray, int64) – The values to group and reduce with AND

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • result (pdarray, int64) – Bitwise AND of values in segments corresponding to keys

Raises:
  • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if AND is not supported for the values dtype

OR(values: pdarray) Tuple[pdarray | List[pdarray | Strings], pdarray][source]

Bitwise OR of values in each segment.

Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise OR reduction on each group.

Parameters:

values (pdarray, int64) – The values to group and reduce with OR

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • result (pdarray, int64) – Bitwise OR of values in segments corresponding to keys

Raises:
  • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if OR is not supported for the values dtype

Reductions(*args, **kwargs)

frozenset() -> empty frozenset object frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

XOR(values: pdarray) Tuple[pdarray | List[pdarray | Strings], pdarray][source]

Bitwise XOR of values in each segment.

Using the permutation stored in the GroupBy instance, group another array of values and perform a bitwise XOR reduction on each group.

Parameters:

values (pdarray, int64) – The values to group and reduce with XOR

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • result (pdarray, int64) – Bitwise XOR of values in segments corresponding to keys

Raises:
  • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not int64

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if XOR is not supported for the values dtype
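
The bitwise reductions documented above (AND, OR, XOR) all follow the same pattern: gather the values belonging to each key, then fold them with a bitwise operator. The sketch below emulates that pattern in plain Python. The function segmented_reduce is a hypothetical stand-in, not an Arkouda call, and it assumes keys are reported in sorted order.

```python
import operator
from functools import reduce


def segmented_reduce(keys, values, op):
    """Fold the values of each group with a binary operator (illustrative only)."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    return unique_keys, [reduce(op, groups[k]) for k in unique_keys]


keys = [0, 1, 0, 1, 0]
values = [0b110, 0b101, 0b011, 0b100, 0b111]

and_result = segmented_reduce(keys, values, operator.and_)
or_result = segmented_reduce(keys, values, operator.or_)
xor_result = segmented_reduce(keys, values, operator.xor)
```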

aggregate(values: groupable, operator: str, skipna: bool = True, ddof: int_scalars = 1) Tuple[groupable, groupable][source]

Using the permutation stored in the GroupBy instance, group another array of values and apply a reduction to each group’s values.

Parameters:
  • values (pdarray) – The values to group and reduce

  • operator (str) – The name of the reduction operator to use

  • skipna (bool) – Whether NaN values should be skipped

  • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std

Returns:

  • unique_keys (groupable) – The unique keys, in grouped order

  • aggregates (groupable) – One aggregate value per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if the requested operator is not supported for the values dtype

Examples

>>> keys = ak.arange(0, 10)
>>> vals = ak.linspace(-1, 1, 10)
>>> g = ak.GroupBy(keys)
>>> g.aggregate(vals, 'sum')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777768,
-0.55555555555555536, -0.33333333333333348, -0.11111111111111116,
0.11111111111111116, 0.33333333333333348, 0.55555555555555536, 0.77777777777777768,
1]))
>>> g.aggregate(vals, 'min')
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([-1, -0.77777777777777779,
-0.55555555555555558, -0.33333333333333337, -0.11111111111111116, 0.11111111111111116,
0.33333333333333326, 0.55555555555555536, 0.77777777777777768, 1]))
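
To make the operator lookup and the ValueError condition concrete, here is a minimal plain-Python sketch of a name-based dispatch. The REDUCERS table is an illustrative subset chosen for this example and is not the actual content of GroupBy.Reductions; the aggregate function below is a stand-in, not the library method.

```python
# Illustrative subset only; the real GroupBy.Reductions set is not reproduced here.
REDUCERS = {"sum": sum, "min": min, "max": max}


def aggregate(keys, values, operator):
    """Apply a named reduction to each group's values (illustrative only)."""
    if operator not in REDUCERS:
        raise ValueError(f"unsupported reduction: {operator!r}")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    return unique_keys, [REDUCERS[operator](groups[k]) for k in unique_keys]


sums = aggregate([0, 1, 0, 1], [1, 2, 3, 4], "sum")
maxima = aggregate([0, 1, 0, 1], [1, 2, 3, 4], "max")
```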
all(values: pdarray) Tuple[pdarray | List[pdarray | Strings], pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and perform an “and” reduction on each group.

Parameters:

values (pdarray, bool) – The values to group and reduce with “and”

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_all (pdarray, bool) – One bool per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if all is not supported for the values dtype

any(values: pdarray) Tuple[pdarray | List[pdarray | Strings], pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and perform an “or” reduction on each group.

Parameters:

values (pdarray, bool) – The values to group and reduce with “or”

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_any (pdarray, bool) – One bool per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray or if the pdarray dtype is not bool

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

argmax(values: pdarray) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first maximum of each group’s values.

Parameters:

values (pdarray) – The values to group and find argmax

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_argmaxima (pdarray, int64) – One index per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object or if argmax is not supported for the values dtype

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmax(b)
(array([2, 3, 4]), array([9, 3, 2]))
argmin(values: pdarray) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and return the location of the first minimum of each group’s values.

Parameters:

values (pdarray) – The values to group and find argmin

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_argminima (pdarray, int64) – One index per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object or if argmin is not supported for the values dtype

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if argmin is not supported for the values dtype

Notes

The returned indices refer to the original values array as passed in, not the permutation applied by the GroupBy instance.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.argmin(b)
(array([2, 3, 4]), array([5, 4, 2]))
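
Both argmax and argmin report positions in the original values array rather than positions within the grouped ordering. The plain-Python sketch below emulates that behaviour on the data from the examples above. The function grouped_arg is hypothetical and assumes keys are reported in sorted order.

```python
def grouped_arg(keys, values, better):
    """Index (into the original array) of the first best value per group."""
    best = {}
    for i, (k, v) in enumerate(zip(keys, values)):
        # Strict comparison keeps the first occurrence when there are ties.
        if k not in best or better(v, values[best[k]]):
            best[k] = i
    unique_keys = sorted(best)
    return unique_keys, [best[k] for k in unique_keys]


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]

argmax_result = grouped_arg(a, b, lambda new, old: new > old)
argmin_result = grouped_arg(a, b, lambda new, old: new < old)
```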
attach(user_defined_name: str) GroupBy[source]

Function to return a GroupBy object attached to the registered name in the arkouda server which was registered using register()

Parameters:

user_defined_name (str) – user defined name which GroupBy object was registered under

Returns:

The GroupBy object created by re-attaching to the corresponding server components

Return type:

GroupBy

Raises:

RegistrationError – if user_defined_name is not registered

broadcast(values: pdarray | Strings, permute: bool = True) pdarray | Strings[source]

Fill each group’s segment with a constant value.

Parameters:
  • values (pdarray, Strings) – The values to put in each group’s segment

  • permute (bool) – If True (default), permute broadcast values back to the ordering of the original array on which GroupBy was called. If False, the broadcast values are grouped by value.

Returns:

The broadcasted values

Return type:

pdarray, Strings

Raises:
  • TypeError – Raised if values is not a pdarray object

  • ValueError – Raised if the values array does not have one value per segment

Notes

This function is a sparse analog of np.broadcast. If a GroupBy object represents a sparse matrix (tensor), then this function takes a (dense) column vector and replicates each value to the non-zero elements in the corresponding row.

Examples

>>> a = ak.array([0, 1, 0, 1, 0])
>>> values = ak.array([3, 5])
>>> g = ak.GroupBy(a)
# By default, result is in original order
>>> g.broadcast(values)
array([3, 5, 3, 5, 3])
# With permute=False, result is in grouped order
>>> g.broadcast(values, permute=False)
array([3, 3, 3, 5, 5])
>>> a = ak.randint(1,5,10)
>>> a
array([3, 1, 4, 4, 4, 1, 3, 3, 2, 2])
>>> g = ak.GroupBy(a)
>>> keys,counts = g.size()
>>> g.broadcast(counts > 2)
array([True False True True True False True True False False])
>>> g.broadcast(counts == 3)
array([True False True True True False True True False False])
>>> g.broadcast(counts < 4)
array([True True True True True True True True True True])
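
The effect of the permute flag can be emulated in plain Python as follows. This is a sketch under the assumption that groups are ordered by sorted key; broadcast here is a hypothetical stand-in for the method, not the Arkouda implementation.

```python
def broadcast(keys, values, permute=True):
    """Replicate one value per group across that group's rows (illustrative only)."""
    unique_keys = sorted(set(keys))
    if len(values) != len(unique_keys):
        raise ValueError("values must contain exactly one entry per group")
    per_key = dict(zip(unique_keys, values))
    if permute:
        # Original row order: look up each row's key.
        return [per_key[k] for k in keys]
    # Grouped order: rows of the first group, then the second, and so on.
    return [per_key[k] for k in sorted(keys)]


keys = [0, 1, 0, 1, 0]
original_order = broadcast(keys, [3, 5])
grouped_order = broadcast(keys, [3, 5], permute=False)
```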
build_from_components(user_defined_name: str | None = None, **kwargs) GroupBy[source]

Build a new GroupBy object from component keys and permutation.

Parameters:
  • user_defined_name (str, optional) – Passing a name will initialize the new GroupBy and assign it the given name

  • kwargs (dict) – Dictionary of components required for rebuilding the GroupBy. Expected keys are “orig_keys”, “permutation”, “unique_keys”, and “segments”

Returns:

The GroupBy object created by using the given components

Return type:

GroupBy

count(values: pdarray) Tuple[groupable, pdarray][source]

Count the number of elements in each group. NaN values will be excluded from the total.

Parameters:

values (pdarray) – The values to be counted by group (excluding NaN values).

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • counts (pdarray, int64) – The number of times each unique key appears (excluding NaN values).

Examples

>>> a = ak.array([1, 0, -1, 1, 0, -1])
>>> a
array([1 0 -1 1 0 -1])
>>> b = ak.array([1, np.nan, -1, np.nan, np.nan, -1], dtype = "float64")
>>> b
array([1.00000000000000000 nan -1.00000000000000000 nan nan -1.00000000000000000])
>>> g = ak.GroupBy(a)
>>> keys,counts = g.count(b)
>>> keys
array([-1 0 1])
>>> counts
array([2 0 1])
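
The difference between count (NaN values excluded) and size (every row included) can be sketched in plain Python using the data from the example above. The helper size_and_count is hypothetical and only emulates the documented semantics.

```python
import math


def size_and_count(keys, values):
    """Per-group row totals and non-NaN totals (illustrative only)."""
    sizes, counts = {}, {}
    for k, v in zip(keys, values):
        sizes[k] = sizes.get(k, 0) + 1
        counts[k] = counts.get(k, 0) + (0 if math.isnan(v) else 1)
    unique_keys = sorted(sizes)
    return (
        unique_keys,
        [sizes[k] for k in unique_keys],
        [counts[k] for k in unique_keys],
    )


a = [1, 0, -1, 1, 0, -1]
b = [1.0, math.nan, -1.0, math.nan, math.nan, -1.0]
group_keys, group_sizes, group_counts = size_and_count(a, b)
```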
first(values: groupable_element_type) Tuple[groupable, groupable_element_type][source]

First value in each group.

Parameters:

values (pdarray-like) – The values from which to take the first of each group

Returns:

  • unique_keys ((list of) pdarray-like) – The unique keys, in grouped order

  • result (pdarray-like) – The first value of each group

from_return_msg(rep_msg)[source]
head(values: groupable_element_type, n: int = 5, return_indices: bool = True) Tuple[groupable, groupable_element_type][source]

Return the first n values from each group.

Parameters:
  • values ((list of) pdarray-like) – The values from which to select, according to their group membership.

  • n (int, optional, default = 5) – Maximum number of items to return for each group. If the number of values in a group is less than n, all the values from that group will be returned.

  • return_indices (bool, default True) – If True, return the indices of the selected values. Otherwise, return the selected values.

Returns:

  • unique_keys ((list of) pdarray-like) – The unique keys, in grouped order

  • result (pdarray-like) – The first n items of each group. If return_indices is True, the result contains indices; otherwise, it contains values.

Examples

>>> a = ak.arange(10) %3
>>> a
array([0 1 2 0 1 2 0 1 2 0])
>>> v = ak.arange(10)
>>> v
array([0 1 2 3 4 5 6 7 8 9])
>>> g = ak.GroupBy(a)
>>> unique_keys, idx = g.head(v, 2, return_indices=True)
>>> _, values = g.head(v, 2, return_indices=False)
>>> unique_keys
array([0 1 2])
>>> idx
array([0 3 1 4 2 5])
>>> values
array([0 3 1 4 2 5])
>>> v2 =  -2 * ak.arange(10)
>>> v2
array([0 -2 -4 -6 -8 -10 -12 -14 -16 -18])
>>> _, idx2 = g.head(v2, 2, return_indices=True)
>>> _, values2 = g.head(v2, 2, return_indices=False)
>>> idx2
array([0 3 1 4 2 5])
>>> values2
array([0 -6 -2 -8 -4 -10])
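
A plain-Python emulation of the selection rule may help clarify how return_indices changes the result. The function head below is a hypothetical stand-in that assumes groups are reported in sorted key order; it is not the Arkouda implementation.

```python
def head(keys, values, n=5, return_indices=True):
    """First n rows of each group, as indices or as values (illustrative only)."""
    positions = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)
    unique_keys = sorted(positions)
    # Groups shorter than n simply contribute all of their rows.
    idx = [i for k in unique_keys for i in positions[k][:n]]
    if return_indices:
        return unique_keys, idx
    return unique_keys, [values[i] for i in idx]


a = [i % 3 for i in range(10)]
v2 = [-2 * i for i in range(10)]
head_keys, idx2 = head(a, v2, 2, return_indices=True)
_, values2 = head(a, v2, 2, return_indices=False)
```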
is_registered() bool[source]

Return True if the object is contained in the registry

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RegistrationError – Raised if there’s a server-side error or a mismatch of registered components

Notes

Objects registered with the server are immune to deletion until they are unregistered.

max(values: pdarray, skipna: bool = True) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and return the maximum of each group’s values.

Parameters:
  • values (pdarray) – The values to group and find maxima

  • skipna (bool) – Whether NaN values should be skipped

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_maxima (pdarray) – One maximum per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object or if max is not supported for the values dtype

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if max is not supported for the values dtype

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.max(b)
(array([2, 3, 4]), array([4, 4, 3]))
mean(values: pdarray, skipna: bool = True) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and compute the mean of each group’s values.

Parameters:
  • values (pdarray) – The values to group and average

  • skipna (bool) – Whether NaN values should be skipped

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_means (pdarray, float64) – One mean value per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.mean(b)
(array([2, 3, 4]), array([2.6666666666666665, 2.7999999999999998, 3]))
median(values: pdarray, skipna: bool = True) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and compute the median of each group’s values.

Parameters:
  • values (pdarray) – The values to group and find median

  • skipna (bool) – Whether NaN values should be skipped

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_medians (pdarray, float64) – One median value per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,9)
>>> a
array([4 1 4 3 2 2 2 3 3])
>>> g = ak.GroupBy(a)
>>> g.keys
array([4 1 4 3 2 2 2 3 3])
>>> b = ak.linspace(-5,5,9)
>>> b
array([-5 -3.75 -2.5 -1.25 0 1.25 2.5 3.75 5])
>>> g.median(b)
(array([1 2 3 4]), array([-3.75 1.25 3.75 -3.75]))
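
The grouped median can be emulated with the standard library to check the output above by hand. This sketch uses statistics.median on each group and does not model skipna; grouped_median is a hypothetical helper, not an Arkouda function.

```python
import statistics


def grouped_median(keys, values):
    """Median of each group's values, always returned as float (illustrative only)."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    return unique_keys, [float(statistics.median(groups[k])) for k in unique_keys]


a = [4, 1, 4, 3, 2, 2, 2, 3, 3]
b = [-5 + 1.25 * i for i in range(9)]  # same values as ak.linspace(-5, 5, 9)
median_keys, medians = grouped_median(a, b)
```

For a group with an even number of values, such as key 4 here, the median is the mean of the two middle values.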
min(values: pdarray, skipna: bool = True) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and return the minimum of each group’s values.

Parameters:
  • values (pdarray) – The values to group and find minima

  • skipna (bool) – Whether NaN values should be skipped

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_minima (pdarray) – One minimum per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object or if min is not supported for the values dtype

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if min is not supported for the values dtype

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.min(b)
(array([2, 3, 4]), array([1, 1, 3]))
mode(values: groupable) Tuple[groupable, groupable][source]

Most common value in each group. If a group is multi-modal, return the modal value that occurs first.

Parameters:

values ((list of) pdarray-like) – The values from which to take the mode of each group

Returns:

  • unique_keys ((list of) pdarray-like) – The unique keys, in grouped order

  • result ((list of) pdarray-like) – The most common value of each group

most_common(values)[source]

(Deprecated) See GroupBy.mode().

nunique(values: groupable) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and return the number of unique values in each group.

Parameters:

values (pdarray, int64) – The values to group and find unique values

Returns:

  • unique_keys (groupable) – The unique keys, in grouped order

  • group_nunique (groupable) – Number of unique values per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the dtype(s) of values array(s) does/do not support the nunique method

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if nunique is not supported for the values dtype

Examples

>>> data = ak.array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> data
array([3, 4, 3, 1, 1, 4, 3, 4, 1, 4])
>>> labels = ak.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> labels
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g = ak.GroupBy(labels)
>>> g.keys
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
>>> g.nunique(data)
(array([1, 2, 3, 4]), array([2, 2, 3, 1]))
#    Group (1,1,1) has values [3,4,3] -> there are 2 unique values 3&4
#    Group (2,2,2) has values [1,1,4] -> 2 unique values 1&4
#    Group (3,3,3) has values [3,4,1] -> 3 unique values
#    Group (4) has values [4] -> 1 unique value
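
The per-group counts explained in the comments above can be reproduced with a short plain-Python sketch. The helper grouped_nunique is hypothetical and only emulates the documented result.

```python
def grouped_nunique(keys, values):
    """Number of distinct values seen in each group (illustrative only)."""
    seen = {}
    for k, v in zip(keys, values):
        seen.setdefault(k, set()).add(v)
    unique_keys = sorted(seen)
    return unique_keys, [len(seen[k]) for k in unique_keys]


data = [3, 4, 3, 1, 1, 4, 3, 4, 1, 4]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4]
nunique_result = grouped_nunique(labels, data)
```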
objType(*args, **kwargs)

str(object=’’) -> str str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to sys.getdefaultencoding(). errors defaults to ‘strict’.

prod(values: pdarray, skipna: bool = True) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and compute the product of each group’s values.

Parameters:
  • values (pdarray) – The values to group and multiply

  • skipna (bool) – Whether NaN values should be skipped

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_products (pdarray, float64) – One product per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

  • RuntimeError – Raised if prod is not supported for the values dtype

Notes

The return dtype is always float64.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.prod(b)
(array([2, 3, 4]), array([12, 108.00000000000003, 8.9999999999999982]))
register(user_defined_name: str) GroupBy[source]

Register this GroupBy object and underlying components with the Arkouda server

Parameters:

user_defined_name (str) – User-defined name the GroupBy is to be registered under; this will be the root name for underlying components

Returns:

The same GroupBy, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that you cannot register two different GroupBys with the same name.

Return type:

GroupBy

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – If the server was unable to register the GroupBy with the user_defined_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

sample(values: groupable, n=None, frac=None, replace=False, weights=None, random_state=None, return_indices=False, permute_samples=False)[source]

Return a random sample from each group. You can specify either the number of elements or the fraction of elements to be sampled. random_state can be used for reproducibility.

Parameters:
  • values ((list of) pdarray-like) – The values from which to sample, according to their group membership.

  • n (int, optional) – Number of items to return for each group. Cannot be used with frac and must be no larger than the smallest group unless replace is True. Default is one if frac is None.

  • frac (float, optional) – Fraction of items to return. Cannot be used with n.

  • replace (bool, default False) – Allow or disallow sampling of the value more than once.

  • weights (pdarray, optional) – Default None results in equal probability weighting. If passed a pdarray, then values must have the same length as the groupby keys and will be used as sampling probabilities after normalization within each group. Weights must be non-negative with at least one positive element within each group.

  • random_state (int or ak.random.Generator, optional) – If int, seed for random number generator. If ak.random.Generator, use as given.

  • return_indices (bool, default False) – If True, return the indices of the sampled values. Otherwise, return the sample values.

  • permute_samples (bool, default False) – If True, permute the samples according to group. Otherwise, keep samples in original order.

Returns:

If return_indices is True, return the indices of the sampled values. Otherwise, return the sample values.

Return type:

pdarray
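
As a rough picture of per-group sampling with a fixed seed, the sketch below draws n row indices from every group using Python's random module. It covers only n, replace, and a seed; frac, weights, and permute_samples are not modeled. The function grouped_sample is hypothetical, and its output order (grouped by sorted key) is an assumption of this sketch rather than documented behaviour.

```python
import random


def grouped_sample(keys, n=1, replace=False, seed=None):
    """Draw n row indices from every group (illustrative only)."""
    rng = random.Random(seed)
    positions = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)
    sampled = []
    for k in sorted(positions):
        pool = positions[k]
        if replace:
            sampled.extend(rng.choices(pool, k=n))
        else:
            if n > len(pool):
                raise ValueError("n exceeds the smallest group size without replacement")
            sampled.extend(rng.sample(pool, n))
    return sampled


keys = [3, 1, 4, 4, 4, 1, 3, 3, 2, 2]
idx = grouped_sample(keys, n=2, seed=0)
idx_again = grouped_sample(keys, n=2, seed=0)
```

Reusing the same seed yields the same indices, which mirrors the reproducibility role that random_state plays in the description above.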

size() Tuple[groupable, pdarray][source]

Count the number of elements in each group, i.e. the number of times each key appears. This counts the total number of rows (including NaN values).

Parameters:

none

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • counts (pdarray, int64) – The number of times each unique key appears

See also

count

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 2, 3, 1, 2, 4, 3, 4, 3, 4])
>>> g = ak.GroupBy(a)
>>> keys,counts = g.size()
>>> keys
array([1, 2, 3, 4])
>>> counts
array([1, 2, 4, 3])
std(values: pdarray, skipna: bool = True, ddof: int_scalars = 1) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and compute the standard deviation of each group’s values.

Parameters:
  • values (pdarray) – The values to group and find standard deviation

  • skipna (bool) – Whether NaN values should be skipped

  • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_stds (pdarray, float64) – One std value per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)).

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.std(b)
(array([2 3 4]), array([1.5275252316519465 1.0954451150103321 0]))
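
The role of ddof described in the notes can be checked by hand with a small plain-Python sketch that divides the summed squared deviations by N - ddof. The helper grouped_std is hypothetical, does not model skipna, and assumes every group has more than ddof values.

```python
import math


def grouped_std(keys, values, ddof=1):
    """Standard deviation per group with divisor N - ddof (illustrative only)."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    unique_keys = sorted(groups)
    stds = []
    for k in unique_keys:
        x = groups[k]
        mean = sum(x) / len(x)
        squared_dev = sum((v - mean) ** 2 for v in x)
        stds.append(math.sqrt(squared_dev / (len(x) - ddof)))
    return unique_keys, stds


a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
std_keys, sample_stds = grouped_std(a, b, ddof=1)
_, population_stds = grouped_std(a, b, ddof=0)
```

With ddof=0 the divisor is larger, so each group with any spread gets a smaller value than with ddof=1.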
sum(values: pdarray, skipna: bool = True) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and sum each group’s values.

Parameters:
  • values (pdarray) – The values to group and sum

  • skipna (bool) – Whether NaN values should be skipped

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_sums (pdarray) – One sum per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The grouped sum of a boolean pdarray returns integers.

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.sum(b)
(array([2, 3, 4]), array([8, 14, 6]))
tail(values: groupable_element_type, n: int = 5, return_indices: bool = True) Tuple[groupable, groupable_element_type][source]

Return the last n values from each group.

Parameters:
  • values ((list of) pdarray-like) – The values from which to select, according to their group membership.

  • n (int, optional, default = 5) – Maximum number of items to return for each group. If the number of values in a group is less than n, all the values from that group will be returned.

  • return_indices (bool, default True) – If True, return the indices of the selected values. Otherwise, return the selected values.

Returns:

  • unique_keys ((list of) pdarray-like) – The unique keys, in grouped order

  • result (pdarray-like) – The last n items of each group. If return_indices is True, the result contains indices; otherwise, it contains values.

Examples

>>> a = ak.arange(10) %3
>>> a
array([0 1 2 0 1 2 0 1 2 0])
>>> v = ak.arange(10)
>>> v
array([0 1 2 3 4 5 6 7 8 9])
>>> g = ak.GroupBy(a)
>>> unique_keys, idx = g.tail(v, 2, return_indices=True)
>>> _, values = g.tail(v, 2, return_indices=False)
>>> unique_keys
array([0 1 2])
>>> idx
array([6 9 4 7 5 8])
>>> values
array([6 9 4 7 5 8])
>>> v2 =  -2 * ak.arange(10)
>>> v2
array([0 -2 -4 -6 -8 -10 -12 -14 -16 -18])
>>> _, idx2 = g.tail(v2, 2, return_indices=True)
>>> _, values2 = g.tail(v2, 2, return_indices=False)
>>> idx2
array([6 9 4 7 5 8])
>>> values2
array([-12 -18 -8 -14 -10 -16])
to_hdf(prefix_path, dataset='groupby', mode='truncate', file_type='distribute')[source]

Save the GroupBy to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files will share

  • dataset (str) – Name prefix for saved data within the HDF5 file

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, add data as a new column to existing files.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This is only supported by HDF5 files and has no impact on Parquet files.

Returns:

None

Notes

GroupBy is not currently supported by Parquet

unique(values: groupable)[source]

Return the set of unique values in each group, as a SegArray.

Parameters:

values ((list of) pdarray-like) – The values from which to find the unique elements of each group

Returns:

  • unique_keys ((list of) pdarray-like) – The unique keys, in grouped order

  • result ((list of) SegArray) – The unique values of each group

Raises:

TypeError – Raised if values is or contains Strings or Categorical

unregister()[source]

Unregister this GroupBy object in the arkouda server which was previously registered using register() and/or attached to using attach()

Raises:

RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.

unregister_groupby_by_name(user_defined_name: str) None[source]

Function to unregister GroupBy object by name which was registered with the arkouda server via register()

Parameters:

user_defined_name (str) – Name under which the GroupBy object was registered

Raises:
  • TypeError – if user_defined_name is not a string

  • RegistrationError – if there is an issue attempting to unregister any underlying components

update_hdf(prefix_path: str, dataset: str = 'groupby', repack: bool = True)[source]
var(values: pdarray, skipna: bool = True, ddof: int_scalars = 1) Tuple[groupable, pdarray][source]

Using the permutation stored in the GroupBy instance, group another array of values and compute the variance of each group’s values.

Parameters:
  • values (pdarray) – The values to group and find variance

  • skipna (bool) – boolean which determines if NANs should be skipped

  • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var

Returns:

  • unique_keys ((list of) pdarray or Strings) – The unique keys, in grouped order

  • group_vars (pdarray, float64) – One var value per unique key in the GroupBy instance

Raises:
  • TypeError – Raised if the values array is not a pdarray object

  • ValueError – Raised if the key array size does not match the values size or if the operator is not in the GroupBy.Reductions array

Notes

The return dtype is always float64.

The variance is the average of the squared deviations from the mean, i.e., var = mean((x - x.mean())**2).

The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
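
The effect of ddof can be sketched in plain Python. The helper grouped_var below is hypothetical and list-based; it is not the server-side computation. Returning NaN when N - ddof is not positive is an assumption made for this sketch.

```python
def grouped_var(keys, values, ddof=1):
    """Hypothetical model: per-group variance with divisor N - ddof."""
    groups = {}
    for k, x in zip(keys, values):
        groups.setdefault(k, []).append(x)
    unique_keys = sorted(groups)
    variances = []
    for k in unique_keys:
        xs = groups[k]
        mean = sum(xs) / len(xs)
        squared_dev = sum((x - mean) ** 2 for x in xs)
        divisor = len(xs) - ddof
        # Assumption: a non-positive divisor yields NaN in this sketch.
        variances.append(squared_dev / divisor if divisor > 0 else float("nan"))
    return unique_keys, variances

a = [3, 3, 4, 3, 3, 2, 3, 2, 4, 2]
b = [3, 3, 3, 4, 1, 1, 3, 3, 3, 4]
print(grouped_var(a, b))          # sample variance (ddof=1)
print(grouped_var(a, b, ddof=0))  # population variance (ddof=0)
```

For key 2 the grouped values are 1, 3, 4, so the squared deviations sum to 14/3; dividing by N - 1 = 2 gives 7/3, while ddof=0 divides by 3 and gives 14/9.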

Examples

>>> a = ak.randint(1,5,10)
>>> a
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> g = ak.GroupBy(a)
>>> g.keys
array([3, 3, 4, 3, 3, 2, 3, 2, 4, 2])
>>> b = ak.randint(1,5,10)
>>> b
array([3, 3, 3, 4, 1, 1, 3, 3, 3, 4])
>>> g.var(b)
(array([2 3 4]), array([2.333333333333333 1.2 0]))
class arkouda.numpy.Int16DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Int32DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Int64DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Int8DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.IntDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

arkouda.numpy.LEN_SUFFIX = '_lengths'
class arkouda.numpy.LongDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.LongDoubleDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.LongLongDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.NUMBER_FORMAT_STRINGS

dict() -> new empty dictionary

dict(mapping) -> new dictionary initialized from a mapping object’s (key, value) pairs

dict(iterable) -> new dictionary initialized as if via:

d = {}
for k, v in iterable:
    d[k] = v

dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)

clear(*args, **kwargs)

D.clear() -> None. Remove all items from D.

copy(*args, **kwargs)

D.copy() -> a shallow copy of D

fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items(*args, **kwargs)

D.items() -> a set-like object providing a view on D’s items

keys(*args, **kwargs)

D.keys() -> a set-like object providing a view on D’s keys

pop(*args, **kwargs)

D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

If key is not found, default is returned if given, otherwise KeyError is raised

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update(*args, **kwargs)

D.update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values(*args, **kwargs)

D.values() -> an object providing a view on D’s values

class arkouda.numpy.NumericDTypes

frozenset() -> empty frozenset object

frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

copy(*args, **kwargs)

Return a shallow copy of a set.

difference(*args, **kwargs)

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

intersection(*args, **kwargs)

Return the intersection of two sets as a new set.

(i.e. all elements that are in both sets.)

isdisjoint(*args, **kwargs)

Return True if two sets have a null intersection.

issubset(*args, **kwargs)

Report whether another set contains this set.

issuperset(*args, **kwargs)

Report whether this set contains another set.

symmetric_difference(*args, **kwargs)

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

union(*args, **kwargs)

Return the union of sets as a new set.

(i.e. all elements that are in either set.)

class arkouda.numpy.ObjectDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

exception arkouda.numpy.RegistrationError[source]

Bases: Exception

Error/Exception used when the Arkouda Server cannot register an object

arkouda.numpy.SEG_SUFFIX = '_segments'
class arkouda.numpy.ScalarDTypes

frozenset() -> empty frozenset object

frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

copy(*args, **kwargs)

Return a shallow copy of a set.

difference(*args, **kwargs)

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

intersection(*args, **kwargs)

Return the intersection of two sets as a new set.

(i.e. all elements that are in both sets.)

isdisjoint(*args, **kwargs)

Return True if two sets have a null intersection.

issubset(*args, **kwargs)

Report whether another set contains this set.

issuperset(*args, **kwargs)

Report whether this set contains another set.

symmetric_difference(*args, **kwargs)

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

union(*args, **kwargs)

Return the union of sets as a new set.

(i.e. all elements that are in either set.)

class arkouda.numpy.SegArray(segments, values, lengths=None, grouping=None)[source]
AND(x=None)[source]
OR(x=None)[source]
XOR(x=None)[source]
aggregate(op, x=None)[source]
all(x=None)[source]
any(x=None)[source]
append(other, axis=0)[source]

Append other to self, either vertically (axis=0, length of resulting SegArray increases), or horizontally (axis=1, each sub-array of other appends to the corresponding sub-array of self).

Parameters:
  • other (SegArray) – Array of sub-arrays to append

  • axis (0 or 1) – Whether to append vertically (0) or horizontally (1). If axis=1, other must be same size as self.

Returns:

  • axis=0: New SegArray containing all sub-arrays

  • axis=1: New SegArray of same length, with pairs of sub-arrays concatenated

Return type:

SegArray
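
The two append directions can be illustrated with nested Python lists standing in for a SegArray. The helper append_rows is hypothetical and only models the shapes described above; the error raised for mismatched sizes is an assumption of this sketch.

```python
def append_rows(a, b, axis=0):
    """Hypothetical list-of-lists model of vertical/horizontal append."""
    if axis == 0:
        # Vertical: the result has len(a) + len(b) sub-arrays.
        return [list(r) for r in a] + [list(r) for r in b]
    if axis == 1:
        # Horizontal: sub-arrays are concatenated pairwise.
        if len(a) != len(b):
            raise ValueError("axis=1 requires the same number of sub-arrays")
        return [list(x) + list(y) for x, y in zip(a, b)]
    raise ValueError("axis must be 0 or 1")

a = [[1, 2], [3]]
b = [[4], [5, 6]]
print(append_rows(a, b, axis=0))  # [[1, 2], [3], [4], [5, 6]]
print(append_rows(a, b, axis=1))  # [[1, 2, 4], [3, 5, 6]]
```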

append_single(x, prepend=False)[source]

Append a single value to each sub-array.

Parameters:

x (pdarray or scalar) – Single value to append to each sub-array

Returns:

Copy of original SegArray with values from x appended to each sub-array

Return type:

SegArray

argmax(x=None)[source]
argmin(x=None)[source]
classmethod attach(user_defined_name)[source]

Using the defined name, attach to a SegArray that has been registered to the Symbol Table

Parameters:

user_defined_name (str) – user defined name which the SegArray object was registered under

Returns:

The resulting SegArray

Return type:

SegArray

Raises:

RuntimeError – Raised if the server could not attach to the SegArray object

classmethod concat(x, axis=0, ordered=True)[source]

Concatenate a sequence of SegArrays

Parameters:
  • x (sequence of SegArray) – The SegArrays to concatenate

  • axis (0 or 1) – Select vertical (0) or horizontal (1) concatenation. If axis=1, all SegArrays must have same size.

  • ordered (bool) – Must be True. This option is present for compatibility only, because unordered concatenation is not yet supported.

Returns:

The input arrays joined into one SegArray

Return type:

SegArray

copy()[source]

Return a deep copy.

dtype
filter(filter, discard_empty: bool = False)[source]

Filter values out of the SegArray object

Parameters:
  • filter (pdarray, list, or value) – The value/s to be filtered out of the SegArray

  • discard_empty (bool) – Defaults to False. When True, empty segments are removed from the returned SegArray

Return type:

SegArray
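
A list-based sketch of the filtering behavior, under the assumption that a scalar filter is treated as a one-element collection. The helper filter_rows is hypothetical and not part of Arkouda.

```python
def filter_rows(rows, unwanted, discard_empty=False):
    """Hypothetical model: drop every occurrence of the unwanted value(s)."""
    if isinstance(unwanted, (list, tuple, set)):
        drop = set(unwanted)
    else:
        drop = {unwanted}  # assumption: a scalar acts as a single value
    kept = [[v for v in r if v not in drop] for r in rows]
    if discard_empty:
        kept = [r for r in kept if r]  # remove segments that became empty
    return kept

rows = [[1, 2, 1], [1], [3, 4]]
print(filter_rows(rows, 1))                      # [[2], [], [3, 4]]
print(filter_rows(rows, 1, discard_empty=True))  # [[2], [3, 4]]
```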

classmethod from_multi_array(m)[source]

Construct a SegArray from a list of columns. This essentially transposes the input, resulting in an array of rows.

Parameters:

m (list of pdarray or Strings) – List of columns, the rows of which will form the sub-arrays of the output

Returns:

Array of rows of input

Return type:

SegArray
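
The transposition performed by from_multi_array can be pictured with plain lists. The helper rows_from_columns is hypothetical; it assumes all columns have the same length.

```python
def rows_from_columns(columns):
    """Hypothetical model: turn a list of equal-length columns into rows."""
    return [list(row) for row in zip(*columns)]

columns = [[1, 2, 3], [4, 5, 6]]
print(rows_from_columns(columns))  # [[1, 4], [2, 5], [3, 6]]
```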

classmethod from_parts(segments, values, lengths=None, grouping=None) SegArray[source]

DEPRECATED Construct a SegArray object from its parts

Parameters:
  • segments (pdarray, int64) – Start index of each sub-array in the flattened values array

  • values (pdarray) – The flattened values of all sub-arrays

  • lengths (pdarray) – The length of each segment

  • grouping (GroupBy) – grouping of segments

Returns:

Data structure representing an array whose elements are variable-length arrays.

Return type:

SegArray

Notes

Keyword args ‘lengths’ and ‘grouping’ are not user-facing. They are used by the attach method.
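
How segments and values together describe variable-length sub-arrays can be sketched with plain lists. The helper split_segments is hypothetical; it assumes segments is sorted and that the last sub-array runs to the end of values.

```python
def split_segments(segments, values):
    """Hypothetical model: recover sub-arrays and their lengths."""
    # Each sub-array ends where the next one starts; the last ends at len(values).
    ends = list(segments[1:]) + [len(values)]
    rows = [list(values[s:e]) for s, e in zip(segments, ends)]
    lengths = [e - s for s, e in zip(segments, ends)]
    return rows, lengths

rows, lengths = split_segments([0, 4, 7], list(range(12)))
print(rows)     # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
print(lengths)  # [4, 3, 5]
```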

classmethod from_return_msg(rep_msg) SegArray[source]
get_jth(j, return_origins=True, compressed=False, default=0)[source]

Select the j-th element of each sub-array, where possible.

Parameters:
  • j (int) – The index of the value to get from each sub-array. If j is negative, it counts backwards from the end of each sub-array.

  • return_origins (bool) – If True, return a logical index indicating where j is in bounds

  • compressed (bool) – If False, return array is same size as self, with default value where j is out of bounds. If True, the return array only contains values where j is in bounds.

  • default (scalar) – When compressed=False, the value to return when j is out of bounds for the sub-array

Returns:

  • val (pdarray) – compressed=False: The j-th value of each sub-array where j is in bounds and the default value where j is out of bounds. compressed=True: The j-th values of only the sub-arrays where j is in bounds

  • origin_indices (pdarray, bool) – A Boolean array that is True where j is in bounds for the sub-array.

Notes

If values are Strings, only the compressed format is supported.
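
The interplay of compressed and default can be modeled on nested lists. The helper get_jth below is a hypothetical stand-in that always returns both outputs; it is not the Arkouda method.

```python
def get_jth(rows, j, compressed=False, default=0):
    """Hypothetical model: j-th element of each sub-array, where in bounds."""
    # Negative j counts from the end, so the valid range is [-len, len).
    in_bounds = [-len(r) <= j < len(r) for r in rows]
    if compressed:
        vals = [r[j] for r, ok in zip(rows, in_bounds) if ok]
    else:
        vals = [r[j] if ok else default for r, ok in zip(rows, in_bounds)]
    return vals, in_bounds

rows = [[1, 2, 3], [4], []]
print(get_jth(rows, 1))                    # ([2, 0, 0], [True, False, False])
print(get_jth(rows, -1, compressed=True))  # ([3, 4], [True, True, False])
```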

get_length_n(n, return_origins=True)[source]

Return all sub-arrays of length n, as a list of columns.

Parameters:
  • n (int) – Length of sub-arrays to select

  • return_origins (bool) – Return a logical index indicating which sub-arrays are length n

Returns:

  • columns (list of pdarray) – An n-long list of pdarray, where each row is one of the n-long sub-arrays from the SegArray. The number of rows is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Array of bool for each element of the SegArray, True where sub-array has length n.

get_ngrams(n, return_origins=True)[source]

Return all n-grams from all sub-arrays.

Parameters:
  • n (int) – Length of n-gram

  • return_origins (bool) – If True, return an int64 array indicating which sub-array each returned n-gram came from.

Returns:

  • ngrams (list of pdarray) – An n-long list of pdarrays, essentially a table where each row is an n-gram.

  • origin_indices (pdarray, int) – The index of the sub-array from which the corresponding n-gram originated
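
The column-oriented layout of the returned n-grams can be sketched as follows. The helper get_ngrams is a hypothetical list-based model, not the Arkouda method.

```python
def get_ngrams(rows, n):
    """Hypothetical model: n columns in which each row position is one n-gram."""
    columns = [[] for _ in range(n)]
    origins = []
    for idx, r in enumerate(rows):
        # A sub-array of length m contributes m - n + 1 n-grams (none if m < n).
        for start in range(len(r) - n + 1):
            for k in range(n):
                columns[k].append(r[start + k])
            origins.append(idx)
    return columns, origins

rows = [[1, 2, 3], [4, 5], [6]]
print(get_ngrams(rows, 2))  # ([[1, 2, 4], [2, 3, 5]], [0, 0, 1])
```

Reading across the columns, the 2-grams are (1, 2), (2, 3), and (4, 5); the third sub-array is too short to contribute.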

get_prefixes(n, return_origins=True, proper=True)[source]

Return all sub-array prefixes of length n (for sub-arrays that are at least n+1 long)

Parameters:
  • n (int) – Length of prefix

  • return_origins (bool) – If True, return a logical index indicating which sub-arrays were long enough to return an n-prefix

  • proper (bool) – If True, only return proper prefixes, i.e. from sub-arrays that are at least n+1 long. If False, allow the entire sub-array to be returned as a prefix.

Returns:

  • prefixes (list of pdarray) – An n-long list of pdarrays, essentially a table where each row is an n-prefix. The number of rows is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Boolean array that is True where the sub-array was long enough to return an n-prefix, False otherwise.

get_suffixes(n, return_origins=True, proper=True)[source]

Return the n-long suffix of each sub-array, where possible

Parameters:
  • n (int) – Length of suffix

  • return_origins (bool) – If True, return a logical index indicating which sub-arrays were long enough to return an n-suffix

  • proper (bool) – If True, only return proper suffixes, i.e. from sub-arrays that are at least n+1 long. If False, allow the entire sub-array to be returned as a suffix.

Returns:

  • suffixes (list of pdarray) – An n-long list of pdarrays, essentially a table where each row is an n-suffix. The number of rows is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Boolean array that is True where the sub-array was long enough to return an n-suffix, False otherwise.
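
The effect of proper on get_prefixes and get_suffixes can be modeled on nested lists. Both helpers below are hypothetical sketches that always return the mask; they are not the Arkouda methods.

```python
def _columns(kept, n):
    # Transpose the kept n-long rows into n columns.
    return [[r[k] for r in kept] for k in range(n)]

def get_prefixes(rows, n, proper=True):
    """Hypothetical model: first n items of each sufficiently long sub-array."""
    min_len = n + 1 if proper else n
    mask = [len(r) >= min_len for r in rows]
    kept = [r[:n] for r, ok in zip(rows, mask) if ok]
    return _columns(kept, n), mask

def get_suffixes(rows, n, proper=True):
    """Hypothetical model: last n items of each sufficiently long sub-array."""
    min_len = n + 1 if proper else n
    mask = [len(r) >= min_len for r in rows]
    kept = [r[len(r) - n:] for r, ok in zip(rows, mask) if ok]
    return _columns(kept, n), mask

rows = [[1, 2, 3], [4, 5], [6]]
print(get_prefixes(rows, 2))                # ([[1], [2]], [True, False, False])
print(get_suffixes(rows, 2, proper=False))  # ([[2, 4], [3, 5]], [True, True, False])
```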

property grouping
hash() Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Compute a 128-bit hash of each segment.

Returns:

A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.

Return type:

Tuple[pdarray,pdarray]
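
If a single 128-bit integer per segment is needed on the client, the two 64-bit halves can be combined as sketched below. The helper combine_hash is hypothetical, and treating the first array as the high 64 bits is an assumption; the description above does not state the order.

```python
MASK64 = (1 << 64) - 1

def combine_hash(first, second):
    """Hypothetical helper: join two 64-bit values into one 128-bit integer."""
    # Masking maps negative int64 values to their unsigned 64-bit bit pattern.
    # Assumption: `first` supplies the high 64 bits.
    return ((first & MASK64) << 64) | (second & MASK64)

print(hex(combine_hash(1, 2)))
```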

intersect(other)[source]

Computes the intersection of 2 SegArrays.

Parameters:

other (SegArray) – SegArray to compute against

Returns:

Segments are the 1d intersections of the segments of self and other

Return type:

SegArray

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.intersect(seg_b)
SegArray([
[1, 3],
[4]
])
is_registered() bool[source]

Checks if the name of the SegArray object is registered in the Symbol Table

Returns:

True if the SegArray is registered, False if not

Return type:

bool

classmethod load(prefix_path, dataset='segarray', segment_name='segments', value_name='values')[source]
logger
max(x=None)[source]
mean(x=None)[source]
min(x=None)[source]
property nbytes

The size of the segarray in bytes.

Returns:

The size of the segarray in bytes.

Return type:

int

property non_empty
nunique(x=None)[source]
objType = 'SegArray'
prepend_single(x)[source]
prod(x=None)[source]
classmethod read_hdf(prefix_path, dataset='segarray')[source]

Load a saved SegArray from HDF5. All arguments must match what was supplied to SegArray.save()

Parameters:
  • prefix_path (str) – Directory and filename prefix

  • dataset (str) – Name prefix for saved data within the HDF5 files

Return type:

SegArray

register(user_defined_name)[source]

Register this SegArray object and underlying components with the Arkouda server

Parameters:

user_defined_name (str) – user defined name which this SegArray object will be registered under

Returns:

The same SegArray, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that two different SegArrays cannot be registered under the same name.

Return type:

SegArray

Raises:

RegistrationError – Raised if the server could not register the SegArray object

Notes

Objects registered with the server are immune to deletion until they are unregistered.

registered_name: str | None = None
remove_repeats(return_multiplicity=False)[source]

Condense sequences of repeated values within a sub-array to a single value.

Parameters:

return_multiplicity (bool) – If True, also return the number of times each value was repeated.

Returns:

  • norepeats (SegArray) – Sub-arrays with runs of repeated values replaced with single value

  • multiplicity (SegArray) – If return_multiplicity=True, this array contains the number of times each value in the returned SegArray was repeated in the original SegArray.
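
Run condensing and the accompanying multiplicities can be sketched with itertools.groupby on nested lists. The helper remove_repeats is a hypothetical model that always returns both outputs; it is not the Arkouda method.

```python
from itertools import groupby

def remove_repeats(rows):
    """Hypothetical model: collapse runs of equal adjacent values per sub-array."""
    norepeats, multiplicity = [], []
    for r in rows:
        # groupby yields one (value, run) pair per run of equal adjacent items.
        runs = [(value, sum(1 for _ in run)) for value, run in groupby(r)]
        norepeats.append([value for value, _ in runs])
        multiplicity.append([count for _, count in runs])
    return norepeats, multiplicity

rows = [[1, 1, 2, 2, 2, 1], [3], []]
print(remove_repeats(rows))
```

Only adjacent repeats are condensed, so the trailing 1 in the first sub-array is kept as its own run.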

save(prefix_path, dataset='segarray', mode='truncate', file_type='distribute')[source]

DEPRECATED Save the SegArray to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This is only supported by HDF5 files and has no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the SegArray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

See also

to_hdf, load

segments
set_jth(i, j, v)[source]

Set the j-th element of each sub-array in a subset.

Parameters:
  • i (pdarray, int) – Indices of sub-arrays to set j-th element

  • j (int) – Index of value to set in each sub-array. If j is negative, it counts backwards from the end of the sub-array.

  • v (pdarray or scalar) – The value(s) to set. If v is a pdarray, it must have same length as i.

Raises:

ValueError – If j is out of bounds in any of the sub-arrays specified by i.
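
A list-based sketch of the update rule, including the bounds check. The helper set_jth is hypothetical; checking every target before writing anything is an assumption of this sketch, not a documented guarantee.

```python
def set_jth(rows, i, j, v):
    """Hypothetical model: set element j in the sub-arrays selected by i."""
    vals = list(v) if isinstance(v, (list, tuple)) else [v] * len(i)
    if len(vals) != len(i):
        raise ValueError("v must be a scalar or have the same length as i")
    # Assumption: validate all targets first so a failure changes nothing.
    for idx in i:
        if not -len(rows[idx]) <= j < len(rows[idx]):
            raise ValueError(f"j={j} is out of bounds for sub-array {idx}")
    for idx, val in zip(i, vals):
        rows[idx][j] = val
    return rows

rows = [[1, 2], [3, 4, 5], [6]]
print(set_jth(rows, [0, 1], -1, [9, 8]))  # [[1, 9], [3, 4, 8], [6]]
```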

setdiff(other)[source]

Computes the set difference of 2 SegArrays.

Parameters:

other (SegArray) – SegArray to compute against

Returns:

Segments are the 1d set difference of the segments of self and other

Return type:

SegArray

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.setdiff(seg_b)
SegArray([
[2, 4],
[1, 3, 5]
])
setxor(other)[source]

Computes the symmetric difference of 2 SegArrays.

Parameters:

other (SegArray) – SegArray to compute against

Returns:

Segments are the 1d symmetric difference of the segments of self and other

Return type:

SegArray

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.setxor(seg_b)
SegArray([
[2, 4, 5],
[1, 3, 5, 2]
])
size
sum(x=None)[source]
to_hdf(prefix_path, dataset='segarray', mode='truncate', file_type='distribute')[source]

Save the SegArray to HDF5. The result is a collection of HDF5 files, one file per locale of the arkouda server, where each filename starts with prefix_path.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files will share

  • dataset (str) – Name prefix for saved data within the HDF5 file

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, add data as a new column to existing files.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This is only supported by HDF5 files and has no impact on Parquet files.

Return type:

None

See also

load

to_list()[source]

Convert the segarray into a list containing sub-arrays

Returns:

A list with the same sub-arrays (also list) as this segarray

Return type:

list

See also

to_ndarray

Examples

>>> segarr = ak.SegArray(ak.array([0, 4, 7]), ak.arange(12))
>>> segarr.to_list()
[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
>>> type(segarr.to_list())
list
to_ndarray()[source]

Convert the array into a numpy.ndarray containing sub-arrays

Returns:

A numpy ndarray with the same sub-arrays (also numpy.ndarray) as this array

Return type:

np.ndarray

See also

array, to_list

Examples

>>> segarr = ak.SegArray(ak.array([0, 4, 7]), ak.arange(12))
>>> segarr.to_ndarray()
array([array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9, 10, 11])])
>>> type(segarr.to_ndarray())
numpy.ndarray
to_parquet(prefix_path, dataset='segarray', mode: str = 'truncate', compression: str | None = None)[source]

Save the SegArray object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the object to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str) – Deprecated. Parameter kept to maintain functionality of other calls. Only Truncate supported. By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

Return type:

string message indicating result of save operation

Raises:
  • RuntimeError – Raised if a server-side error is thrown saving the SegArray

  • ValueError – If write mode is not Truncate.

Notes

  • Append mode for Parquet has been deprecated. It was not implemented for SegArray.

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars)[source]

Sends a Segmented Array to a different Arkouda server

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the Segmented Array is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, so it uses ports port through port+numLocales-1 (e.g., when running an Arkouda server on 4 nodes with 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Return type:

A message indicating a complete transfer

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype

union(other)[source]

Computes the union of 2 SegArrays.

Parameters:

other (SegArray) – SegArray to compute against

Returns:

Segments are the 1d union of the segments of self and other

Return type:

SegArray

Examples

>>> a = [1, 2, 3, 1, 4]
>>> b = [3, 1, 4, 5]
>>> c = [1, 3, 3, 5]
>>> d = [2, 2, 4]
>>> seg_a = ak.segarray(ak.array([0, len(a)]), ak.array(a+b))
>>> seg_b = ak.segarray(ak.array([0, len(c)]), ak.array(c+d))
>>> seg_a.union(seg_b)
SegArray([
[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5]
])
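
The segment-wise set operations (intersect, setdiff, setxor, union) can be modeled with Python sets applied pairwise to sub-arrays. The helper segment_setop is hypothetical; it returns each result in sorted order, which may differ from the element order the server produces.

```python
def segment_setop(rows_a, rows_b, op):
    """Hypothetical model: apply a set operation to matching sub-arrays."""
    ops = {
        "intersect": set.intersection,
        "union": set.union,
        "setdiff": set.difference,
        "setxor": set.symmetric_difference,
    }
    # Sorting is a choice made for this sketch, not a documented guarantee.
    return [sorted(ops[op](set(x), set(y))) for x, y in zip(rows_a, rows_b)]

seg_a = [[1, 2, 3, 1, 4], [3, 1, 4, 5]]
seg_b = [[1, 3, 3, 5], [2, 2, 4]]
for name in ("intersect", "setdiff", "setxor", "union"):
    print(name, segment_setop(seg_a, seg_b, name))
```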
unique(x=None)[source]

Return sub-arrays of unique values.

Parameters:

x (pdarray) – The values to unique, per group. By default, the values of this SegArray’s sub-arrays.

Returns:

Same number of sub-arrays as original SegArray, but elements in sub-array are unique and in sorted order.

Return type:

SegArray

unregister()[source]

Unregister this SegArray object in the arkouda server which was previously registered using register() and/or attached to using attach()

Return type:

None

Raises:

RuntimeError – Raised if the server could not unregister the SegArray object from the Symbol Table

Notes

Objects registered with the server are immune to deletion until they are unregistered.

static unregister_segarray_by_name(user_defined_name)[source]

Using the defined name, remove the registered SegArray object from the Symbol Table

Parameters:

user_defined_name (str) – user defined name which the SegArray object was registered under

Return type:

None

Raises:

RuntimeError – Raised if the server could not unregister the SegArray object from the Symbol Table

update_hdf(prefix_path: str, dataset: str = 'segarray', repack: bool = True)[source]

Overwrite the dataset with the name provided with this SegArray object. If the dataset does not exist it is added.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files

  • repack (bool) – Default: True. HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to False will yield better performance, but will cause file sizes to expand.

Return type:

None

Raises:

RuntimeError – Raised if a server-side error is thrown saving the SegArray

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added

  • Because HDF5 deletes do not release memory, this will create a copy of the file with the new data

valsize
values
class arkouda.numpy.SeriesDTypes

dict() -> new empty dictionary

dict(mapping) -> new dictionary initialized from a mapping object’s (key, value) pairs

dict(iterable) -> new dictionary initialized as if via:

d = {}
for k, v in iterable:
    d[k] = v

dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)

clear(*args, **kwargs)

D.clear() -> None. Remove all items from D.

copy(*args, **kwargs)

D.copy() -> a shallow copy of D

fromkeys(iterable, value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items(*args, **kwargs)

D.items() -> a set-like object providing a view on D’s items

keys(*args, **kwargs)

D.keys() -> a set-like object providing a view on D’s keys

pop(*args, **kwargs)

D.pop(k[,d]) -> v, remove specified key and return the corresponding value.

If key is not found, default is returned if given, otherwise KeyError is raised

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update(*args, **kwargs)

D.update([E, ]**F) -> None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]

If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v

In either case, this is followed by: for k in F: D[k] = F[k]

values(*args, **kwargs)

D.values() -> an object providing a view on D’s values

class arkouda.numpy.ShortDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

arkouda.numpy.SortingAlgorithm
class arkouda.numpy.StrDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Strings(strings_pdarray: arkouda.numpy.pdarrayclass.pdarray, bytes_size: arkouda.numpy.dtypes.int_scalars)[source]

Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions.

entry

Encapsulation of a Segmented Strings array contained on the arkouda server. This is a composite of

  • offsets array: starting indices for each string

  • bytes array: raw bytes of all strings joined by nulls

Type:

pdarray

size

The number of strings in the array

Type:

int_scalars

nbytes

The total number of bytes in all strings

Type:

int_scalars

ndim

The rank of the array (currently only rank 1 arrays supported)

Type:

int_scalars

shape

The sizes of each dimension of the array

Type:

tuple

dtype

The dtype is ak.str

Type:

dtype

logger

Used for all logging operations

Type:

ArkoudaLogger

Notes

Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string and (2) bytes, which contains the raw bytes of all strings, delimited by nulls.
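The two-array layout described in these notes can be imitated in plain Python. This is a client-side sketch only, not arkouda's implementation: `to_segments` and `from_segments` are hypothetical helper names, and UTF-8 encoding is assumed.

```python
from typing import List, Tuple


def to_segments(strings: List[str]) -> Tuple[List[int], List[int]]:
    """Build (offsets, raw bytes) for a list of strings, null-terminating each one."""
    offsets: List[int] = []
    raw: List[int] = []
    for s in strings:
        offsets.append(len(raw))        # starting index of this string
        raw.extend(s.encode("utf-8"))   # the string's bytes (UTF-8 is an assumption)
        raw.append(0)                   # null delimiter after every string
    return offsets, raw


def from_segments(offsets: List[int], raw: List[int]) -> List[str]:
    """Recover the strings by slicing between consecutive offsets."""
    ends = offsets[1:] + [len(raw)]
    # b - 1 drops the trailing null byte of each segment before decoding.
    return [bytes(raw[a:b - 1]).decode("utf-8") for a, b in zip(offsets, ends)]


offsets, raw = to_segments(["one", "two", "three"])
roundtrip = from_segments(offsets, raw)
```

For `['one', 'two', 'three']` the sketch produces the same offsets and bytes as the get_offsets and get_bytes examples in this section.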

BinOps
astype(dtype: numpy.dtype | str) arkouda.numpy.pdarrayclass.pdarray[source]

Cast values of Strings object to provided dtype

Parameters:

dtype (np.dtype or str) – Dtype to cast to

Returns:

An arkouda pdarray with values converted to the specified data type

Return type:

ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

static attach(user_defined_name: str) Strings[source]

Return the Strings object attached to a name in the arkouda server that was previously registered using register()

Parameters:

user_defined_name (str) – user defined name which the Strings object was registered under

Returns:

the Strings object registered with user_defined_name in the arkouda server

Return type:

Strings object

Raises:

TypeError – Raised if user_defined_name is not a str

See also

register, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

cached_regex_patterns() List[source]

Returns the regex patterns for which Match objects have been cached

capitalize() Strings[source]

Returns a new Strings in which the first letter of each string is capitalized and the remaining letters are lowercase.

Returns:

Strings from the original replaced with the capitalized equivalent.

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper, Strings.title

Examples

>>> strings = ak.array([f'StrINgS aRe Here {i}' for i in range(5)])
>>> strings
array(['StrINgS aRe Here 0', 'StrINgS aRe Here 1', 'StrINgS aRe Here 2', 'StrINgS aRe Here 3', 'StrINgS aRe Here 4'])
>>> strings.capitalize()
array(['Strings are here 0', 'Strings are here 1', 'Strings are here 2', 'Strings are here 3', 'Strings are here 4'])
contains(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element contains the given substring.

Parameters:
  • substr (bytes or str_scalars) – The substring in the form of string or byte array to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads/lookbehinds are not supported)

Returns:

True for elements that contain substr, False otherwise

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True True True True True])
>>> strings.contains(r'string \d', regex=True)
array([True True True True True])
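The difference between the literal and regex modes can be sketched on the client side with Python's standard library. This is only an analogy: the `contains` function below is a hypothetical stand-in, and Python's `re` module is used in place of re2 (unlike re2, `re` accepts lookaheads and lookbehinds).

```python
import re
from typing import List


def contains(strings: List[str], substr: str, regex: bool = False) -> List[bool]:
    """Element-wise substring test; with regex=True, substr is treated as a pattern."""
    if regex:
        pattern = re.compile(substr)  # raises re.error if substr is not a valid pattern
        return [pattern.search(s) is not None for s in strings]
    return [substr in s for s in strings]


strings = [f"{i} string {i}" for i in range(1, 6)]
plain = contains(strings, "string")
with_regex = contains(strings, r"string \d", regex=True)
# In literal mode the backslash sequence is not interpreted, so nothing matches.
literal_backslash = contains(strings, r"string \d")
```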
decode(fromEncoding: str, toEncoding: str = 'UTF-8') Strings[source]

Return a new Strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding

Parameters:
  • fromEncoding (str) – The current encoding of the strings object

  • toEncoding (str, default="UTF-8") – The encoding that the strings will be converted to; defaults to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

dtype
encode(toEncoding: str, fromEncoding: str = 'UTF-8') Strings[source]

Return a new Strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding

Parameters:
  • toEncoding (str) – The encoding that the strings will be converted to

  • fromEncoding (str, default="UTF-8") – The current encoding of the Strings object; defaults to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

endswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element ends with the given substring.

Parameters:
  • substr (bytes or str_scalars) – The suffix to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads/lookbehinds are not supported)

Returns:

True for elements that end with substr, False otherwise

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True True True True True])
>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith(r'ing \d', regex=True)
array([True True True True True])
entry: arkouda.numpy.pdarrayclass.pdarray
equals(other: Any) arkouda.numpy.dtypes.bool_scalars[source]

Whether Strings are the same size and all entries are equal.

Parameters:

other (Any) – object to compare.

Returns:

True if the Strings are the same, otherwise False.

Return type:

bool

Examples

>>> import arkouda as ak
>>> ak.connect()
>>> s = ak.array(["a", "b", "c"])
>>> s_cpy = ak.array(["a", "b", "c"])
>>> s.equals(s_cpy)
True
>>> s2 = ak.array(["a", "x", "c"])
>>> s.equals(s2)
False
find_locations(pattern: bytes | arkouda.numpy.dtypes.str_scalars) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches

Parameters:

pattern (bytes or str_scalars) – The regex pattern used to find matches

Returns:

  • pdarray, int64 – For each original string, the number of pattern matches

  • pdarray, int64 – The start positions of pattern matches

  • pdarray, int64 – The lengths of pattern matches

Raises:
  • TypeError – Raised if the pattern parameter is not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> num_matches, starts, lens = strings.find_locations(r'\d')
>>> num_matches
array([2 2 2 2 2])
>>> starts
array([0 9 0 9 0 9 0 9 0 9])
>>> lens
array([1 1 1 1 1 1 1 1 1 1])
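Because starts and lens are flat arrays covering all strings, the per-string counts are what tie each match back to its origin. The following pure-Python sketch mimics the three return values with the standard `re` module and then regroups the flat starts by count. The names `find_locations` and `split_by_counts` are hypothetical helpers, and positions are counted in Python characters, which coincide with bytes only for ASCII input.

```python
import re
from itertools import accumulate
from typing import List, Tuple


def find_locations(strings: List[str], pattern: str) -> Tuple[List[int], List[int], List[int]]:
    """Return (matches per string, flat match starts, flat match lengths)."""
    compiled = re.compile(pattern)
    num_matches: List[int] = []
    starts: List[int] = []
    lens: List[int] = []
    for s in strings:
        matches = list(compiled.finditer(s))
        num_matches.append(len(matches))
        starts.extend(m.start() for m in matches)
        lens.extend(m.end() - m.start() for m in matches)
    return num_matches, starts, lens


def split_by_counts(flat: List[int], counts: List[int]) -> List[List[int]]:
    """Regroup a flat list into one sub-list per original string."""
    ends = list(accumulate(counts))   # running total marks where each group ends
    begins = [0] + ends[:-1]
    return [flat[b:e] for b, e in zip(begins, ends)]


strings = [f"{i} string {i}" for i in range(1, 6)]
num_matches, starts, lens = find_locations(strings, r"\d")
starts_per_string = split_by_counts(starts, num_matches)
```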
findall(pattern: bytes | arkouda.numpy.dtypes.str_scalars, return_match_origins: bool = False) Strings | Tuple[source]

Return a new Strings containing all non-overlapping matches of pattern

Parameters:
  • pattern (bytes or str_scalars) – Regex used to find matches

  • return_match_origins (bool, default=False) – If True, return a pdarray containing the index of the original string each pattern match is from

Returns:

  • Strings – Strings object containing only pattern matches

  • pdarray, int64 (optional) – The index of the original string each pattern match is from

Raises:
  • TypeError – Raised if the pattern parameter is not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.findall('_+', return_match_origins=True)
(array(['_', '___', '____', '__', '___', '____', '___']), array([0 0 1 3 3 3 3]))
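The relationship between the matches and their origin indices can be reproduced with a short pure-Python sketch. It assumes Python's `re` as a stand-in for re2, and `findall_with_origins` is a hypothetical name rather than an arkouda function.

```python
import re
from typing import List, Tuple


def findall_with_origins(strings: List[str], pattern: str) -> Tuple[List[str], List[int]]:
    """Collect every non-overlapping match plus the index of the string it came from."""
    compiled = re.compile(pattern)
    matches: List[str] = []
    origins: List[int] = []
    for i, s in enumerate(strings):
        for m in compiled.finditer(s):
            matches.append(m.group(0))
            origins.append(i)   # strings with no match contribute nothing
    return matches, origins


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
matches, origins = findall_with_origins(strings, '_+')
```

Strings without any match (indices 2 and 4 here) never appear in the origins list, which is why it can be shorter than the input and skip values.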
flatten() Strings[source]

Return a copy of the array collapsed into one dimension.

Return type:

A copy of the input array, flattened to one dimension.

Note

As multidimensional Strings are not currently supported, flatten on a Strings object will always return itself.

static from_parts(offset_attrib: arkouda.numpy.pdarrayclass.pdarray | str, bytes_attrib: arkouda.numpy.pdarrayclass.pdarray | str) Strings[source]

Factory method for creating a Strings object from an Arkouda server response where the arrays are separate components.

Parameters:
  • offset_attrib (pdarray or str) – the array containing the offsets

  • bytes_attrib (pdarray or str) – the array containing the string values

Returns:

object representing a segmented strings array on the server

Return type:

Strings

Raises:

RuntimeError – Raised if there’s an error converting a server-returned str-descriptor

Notes

This factory method is used when we construct the parts of a Strings object on the client side and transfer the offsets and bytes separately to the server. This results in two entries in the symbol table, and we need to instruct the server to assemble them into a composite entity.

static from_return_msg(rep_msg: str) Strings[source]

Factory method for creating a Strings object from an Arkouda server response message

Parameters:

rep_msg (str) – Server response message currently of form created name type size ndim shape itemsize+created bytes.size 1234

Returns:

object representing a segmented strings array on the server

Return type:

Strings

Raises:

RuntimeError – Raised if there’s an error converting a server-returned str-descriptor

Notes

We really don’t have an itemsize because these are variable length strings. In the future we could probably use this position to store the total bytes.

fullmatch(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.match.Match[source]

Returns a match object where elements match only if the whole string matches the regular expression pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match only if the whole string matches the regular expression pattern

Return type:

Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.fullmatch('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=False; matched=False>
get_bytes() arkouda.numpy.pdarrayclass.pdarray[source]

Getter for the bytes component (uint8 pdarray) of this Strings.

Returns:

Pdarray of bytes of the string accessed

Return type:

pdarray, uint8

Example

>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_bytes()
[111 110 101 0 116 119 111 0 116 104 114 101 101 0]
get_lengths() arkouda.numpy.pdarrayclass.pdarray[source]

Return the length of each string in the array.

Returns:

The length of each string

Return type:

pdarray, int

Raises:

RuntimeError – Raised if there is a server-side error thrown

get_offsets() arkouda.numpy.pdarrayclass.pdarray[source]

Getter for the offsets component (int64 pdarray) of this Strings.

Returns:

Pdarray of offsets of the string accessed

Return type:

pdarray, int64

Example

>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_offsets()
[0 4 8]
get_prefixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Return the n-long prefix of each string, where possible

Parameters:
  • n (int_scalars) – Length of prefix

  • return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-prefix

  • proper (bool, default=True) – If True, only return proper prefixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a prefix.

Returns:

  • prefixes (Strings) – The array of n-character prefixes; the number of elements is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Boolean array that is True where the string was long enough to return an n-character prefix, False otherwise.
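The interaction between n, proper, and the returned mask can be sketched in plain Python. This is an illustration of the documented semantics, not arkouda code: `get_prefixes` here is a hypothetical list-based helper, and lengths are measured in Python characters (an assumption).

```python
from typing import List, Tuple


def get_prefixes(strings: List[str], n: int, proper: bool = True) -> Tuple[List[str], List[bool]]:
    """Return the n-long prefixes of sufficiently long strings, plus the mask of origins."""
    # A proper prefix must be shorter than the string it comes from.
    min_len = n + 1 if proper else n
    mask = [len(s) >= min_len for s in strings]
    prefixes = [s[:n] for s, keep in zip(strings, mask) if keep]
    return prefixes, mask


words = ["abc", "ab", "a"]
proper_prefixes, proper_mask = get_prefixes(words, 2)
any_prefixes, any_mask = get_prefixes(words, 2, proper=False)
```

The number of prefixes always equals the number of True values in the mask, so the mask can be used to index the original array and line the prefixes up with their sources.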

get_suffixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Return the n-long suffix of each string, where possible

Parameters:
  • n (int_scalars) – Length of suffix

  • return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-suffix

  • proper (bool, default=True) – If True, only return proper suffixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a suffix.

Returns:

  • suffixes (Strings) – The array of n-character suffixes; the number of elements is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Boolean array that is True where the string was long enough to return an n-character suffix, False otherwise.

group() arkouda.numpy.pdarrayclass.pdarray[source]

Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.

Returns:

The permutation that groups the array by value

Return type:

pdarray

See also

GroupBy, unique

Notes

If the arkouda server is compiled with “-sSegmentedString.useHash=true”, then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the “useHash” parameter is false, then a full sort is performed.

Raises:

RuntimeError – Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message
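The difference between a grouping permutation and a sorting permutation can be illustrated in plain Python. This is a sketch under stated assumptions: BLAKE2b with a 16-byte digest stands in for the server's 128-bit hash, and `group_by_sort`, `group_by_hash`, and `is_grouped` are hypothetical helper names.

```python
import hashlib
from typing import List


def group_by_sort(strings: List[str]) -> List[int]:
    """Permutation from a full sort: equal strings are contiguous and blocks are ordered."""
    return sorted(range(len(strings)), key=lambda i: strings[i])


def group_by_hash(strings: List[str]) -> List[int]:
    """Permutation from sorting 128-bit hashes: contiguous blocks, arbitrary block order."""
    def digest(s: str) -> bytes:
        # Stand-in hash (assumption); the server-side hash function differs.
        return hashlib.blake2b(s.encode("utf-8"), digest_size=16).digest()
    return sorted(range(len(strings)), key=lambda i: digest(strings[i]))


def is_grouped(strings: List[str], perm: List[int]) -> bool:
    """Check that every distinct value occupies exactly one contiguous block."""
    seen = set()
    previous = None
    for i in perm:
        current = strings[i]
        if current != previous:
            if current in seen:
                return False   # value reappeared after its block ended
            seen.add(current)
            previous = current
    return True


data = ["b", "a", "c", "a", "b", "a"]
sort_perm = group_by_sort(data)
hash_perm = group_by_hash(data)
```

Both permutations satisfy the grouping guarantee, but only the sort-based one also leaves the blocks in lexicographic order.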

hash() Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Compute a 128-bit hash of each string.

Returns:

A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.

Return type:

Tuple[pdarray,pdarray]

Notes

The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible.
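The "negligible" claim can be checked with a birthday-style union bound. The sketch below assumes hash values are uniformly distributed over 128 bits, which is an idealization; `collision_bound` is a hypothetical helper name.

```python
from fractions import Fraction


def collision_bound(n: int, bits: int = 128) -> Fraction:
    """Union bound on the chance that any two of n uniform `bits`-bit hashes collide.

    There are n*(n-1)/2 pairs, and each pair collides with probability 2**-bits.
    """
    return Fraction(n * (n - 1), 2 * 2**bits)


# Bound for 10**15 strings, converted to a float for readability.
p = float(collision_bound(10**15))
```

Under that assumption, the bound for 10**15 strings is on the order of 1e-9.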

property inferred_type: str

Return a string of the type inferred from the values.

info() str[source]

Returns a JSON formatted string containing information about all components of self

Parameters:

None

Returns:

JSON string containing information about all components of self

Return type:

str

is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry

Parameters:

None

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RuntimeError – Raised if there’s a server-side error thrown

isalnum() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is alphanumeric.

Returns:

True for elements that are alphanumeric, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_alnum = ak.array([f'%Strings {i}' for i in range(3)])
>>> alnum = ak.array([f'Strings{i}' for i in range(3)])
>>> strings = ak.concatenate([not_alnum, alnum])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'Strings0', 'Strings1', 'Strings2'])
>>> strings.isalnum()
array([False False False True True True])
isalpha() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is alphabetic. This means there is at least one character, and all the characters are alphabetic.

Returns:

True for elements that are alphabetic, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_alpha = ak.array([f'%Strings {i}' for i in range(3)])
>>> alpha = ak.array(['StringA','StringB','StringC'])
>>> strings = ak.concatenate([not_alpha, alpha])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'StringA', 'StringB', 'StringC'])
>>> strings.isalpha()
array([False False False True True True])
isdecimal() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings has all decimal characters.

Returns:

True for elements that are decimals, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isdigit

Examples

>>> not_decimal = ak.array([f'Strings {i}' for i in range(3)])
>>> decimal = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_decimal, decimal])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdecimal()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdecimal()
array([False True False False False])
isdigit() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings has all digit characters.

Returns:

True for elements that are digits, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_digit = ak.array([f'Strings {i}' for i in range(3)])
>>> digit = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_digit, digit])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdigit()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdigit()
array([False True True True False])
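The distinction between isdecimal and isdigit for superscript and subscript characters mirrors Python's built-in `str.isdecimal` and `str.isdigit`. The snippet below uses only those built-ins on the same inputs as the examples above; it runs on the client and involves no arkouda calls.

```python
special = ["3.14", "0", "²", "2³₇", "2³x₇"]

# isdecimal accepts only characters usable to form base-10 numbers (e.g. '0'-'9').
decimal_flags = [s.isdecimal() for s in special]

# isdigit additionally accepts digit-like characters such as superscripts and subscripts.
digit_flags = [s.isdigit() for s in special]
```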
isempty() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is empty.

Returns:

True for elements that are the empty string, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_empty = ak.array([f'Strings {i}' for i in range(3)])
>>> empty = ak.array(['' for i in range(3)])
>>> strings = ak.concatenate([not_empty, empty])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '', '', ''])
>>> strings.isempty()
array([False False False True True True])
islower() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is entirely lowercase

Returns:

True for elements that are entirely lowercase, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isupper

Examples

>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.islower()
array([True True True False False False])
isspace() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i has all whitespace characters (‘ ’, ‘\t’, ‘\n’, ‘\v’, ‘\f’, ‘\r’).

Returns:

True for elements that are whitespace, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_space = ak.array([f'Strings {i}' for i in range(3)])
>>> space = ak.array([' ', '\t', '\n', '\v', '\f', '\r', ' \t\n\v\f\r'])
>>> strings = ak.concatenate([not_space, space])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', ' ', '\u0009', '\n', '\u000B', '\u000C', '\u000D', ' \u0009\n\u000B\u000C\u000D'])
>>> strings.isspace()
array([False False False True True True True True True True])
istitle() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is titlecase

Returns:

True for elements that are titlecase, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> mixed = ak.array([f'sTrINgs {i}' for i in range(3)])
>>> title = ak.array([f'Strings {i}' for i in range(3)])
>>> strings = ak.concatenate([mixed, title])
>>> strings
array(['sTrINgs 0', 'sTrINgs 1', 'sTrINgs 2', 'Strings 0', 'Strings 1', 'Strings 2'])
>>> strings.istitle()
array([False False False True True True])
isupper() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is entirely uppercase

Returns:

True for elements that are entirely uppercase, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.islower

Examples

>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.isupper()
array([False False False True True True])
logger
lower() Strings[source]

Returns a new Strings with all uppercase characters from the original replaced with their lowercase equivalent

Returns:

Strings with all uppercase characters from the original replaced with their lowercase equivalent

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.upper

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.lower()
array(['strings 0', 'strings 1', 'strings 2', 'strings 3', 'strings 4'])
lstick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '') Strings[source]

Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • other (Strings) – The strings to join onto self’s strings

  • delimiter (bytes or str_scalars, default="") – String inserted between self and other

Returns:

The array of joined strings, as other + self

Return type:

Strings

Raises:
  • TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance

  • RuntimeError – Raised if there is a server-side error thrown

See also

stick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])
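The ordering of operands in lstick (other first, then self) is easy to get backwards, so here is a list-based sketch of the element-wise join. The `lstick` function below is a hypothetical pure-Python stand-in, and raising ValueError on mismatched lengths is an assumption made for the sketch, not documented behavior.

```python
from typing import List


def lstick(self_strings: List[str], other: List[str], delimiter: str = "") -> List[str]:
    """Join each element of `other` onto the left of the matching element of `self_strings`."""
    if len(self_strings) != len(other):
        # Assumption for this sketch only: require equal lengths.
        raise ValueError("both arrays must have the same number of strings")
    return [o + delimiter + s for s, o in zip(self_strings, other)]


s = ['a', 'c', 'e']
t = ['b', 'd', 'f']
joined = lstick(s, t, delimiter='.')
```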
match(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.match.Match[source]

Returns a match object where elements match only if the beginning of the string matches the regular expression pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match only if the beginning of the string matches the regular expression pattern

Return type:

Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.match('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=True, span=(0, 2); matched=False>
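The contrast between match (anchored at the start) and fullmatch (anchored at both ends) can be seen with Python's `re` module on the same inputs. This is a client-side analogy only; the `spans` helper is hypothetical, and `None` plays the role of matched=False.

```python
import re
from typing import Callable, List, Optional, Tuple

strings = ['1_2___', '____', '3', '__4___5____6___7', '']
pattern = re.compile('_+')


def spans(method: Callable[[str], Optional[re.Match]]) -> List[Optional[Tuple[int, int]]]:
    """Apply a matching method to every string and record the span, or None if unmatched."""
    result = []
    for s in strings:
        m = method(s)
        result.append(m.span() if m else None)
    return result


match_spans = spans(pattern.match)          # must match at the beginning of the string
fullmatch_spans = spans(pattern.fullmatch)  # must match the entire string
```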
objType = 'Strings'
peel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) Tuple[Strings, Strings][source]

Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • delimiter (bytes or str_scalars) – The separator where the split will occur

  • times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters

  • includeDelimiter (bool, default=False) – If true, append the delimiter to the end of each string in the first return array. By default, the delimiter is dropped from both return arrays (see the examples).

  • keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.

  • fromRight (bool, default=False) – If true, peel from the right instead of the left (see also rpeel)

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads/lookbehinds are not supported)

Returns:

left: Strings

The field(s) peeled from the beginning of each string (unless fromRight is true)

right: Strings

The remainder of each string after peeling (unless fromRight is true)

Return type:

Tuple[Strings, Strings]

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool

  • ValueError – Raised if times is < 1 or if delimiter is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

rpeel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
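The way times, includeDelimiter, and keepPartial interact can be sketched for the left-to-right, non-regex case in plain Python. This is an illustration inferred from the examples above, not arkouda's implementation; `peel_one` and `peel` are hypothetical helper names.

```python
from typing import List, Tuple


def peel_one(s: str, delimiter: str, times: int = 1,
             include_delimiter: bool = False, keep_partial: bool = False) -> Tuple[str, str]:
    """Split s at the `times`-th occurrence of delimiter, scanning from the left."""
    pos = -1
    resume = 0
    for _ in range(times):
        pos = s.find(delimiter, resume)
        if pos < 0:
            # Fewer than `times` delimiters: the whole string goes to one side.
            return (s, "") if keep_partial else ("", s)
        resume = pos + len(delimiter)
    left_end = resume if include_delimiter else pos
    return s[:left_end], s[resume:]


def peel(strings: List[str], delimiter: str, **kwargs) -> Tuple[List[str], List[str]]:
    """Apply peel_one element-wise and return the (left, right) arrays."""
    pairs = [peel_one(s, delimiter, **kwargs) for s in strings]
    return [left for left, _ in pairs], [right for _, right in pairs]


s = ['a.b', 'c.d', 'e.f.g']
default = peel(s, '.')
with_delim = peel(s, '.', include_delimiter=True)
twice = peel(s, '.', times=2)
twice_partial = peel(s, '.', times=2, keep_partial=True)
```

The four calls correspond one-to-one to the four examples above and produce the same pairs of arrays.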
pretty_print_info() None[source]

Prints information about all components of self in a human readable format

Parameters:

None

Return type:

None

purge_cached_regex_patterns() None[source]

Purges cached regex patterns

regex_split(pattern: bytes | arkouda.numpy.dtypes.str_scalars, maxsplit: int = 0, return_segments: bool = False) Strings | Tuple[source]

Returns a new Strings split by the occurrences of pattern. If maxsplit is nonzero, at most maxsplit splits occur

Parameters:
  • pattern (bytes or str_scalars) – Regex used to split strings into substrings

  • maxsplit (int, default=0) – The maximum number of pattern match occurrences in each element to split on. The default maxsplit=0 splits on all occurrences

  • return_segments (bool, default=False) – If True, return mapping of original strings to first substring in return array.

Returns:

  • Strings – Substrings with pattern matches removed

  • pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.regex_split('_+', maxsplit=2, return_segments=True)
(array(['1', '2', '', '', '', '3', '', '4', '5____6___7', '']), array([0 3 5 6 9]))
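The segments array returned with return_segments=True records where each original string's substrings begin in the flat result. A pure-Python sketch with `re.split` reproduces the example; `regex_split` here is a hypothetical list-based helper, and Python's `re` is assumed as a stand-in for re2.

```python
import re
from typing import List, Tuple


def regex_split(strings: List[str], pattern: str, maxsplit: int = 0) -> Tuple[List[str], List[int]]:
    """Split every string on pattern and record where each string's pieces start."""
    compiled = re.compile(pattern)
    parts: List[str] = []
    segments: List[int] = []
    for s in strings:
        segments.append(len(parts))   # index of this string's first substring
        parts.extend(compiled.split(s, maxsplit=maxsplit))
    return parts, segments


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
parts, segments = regex_split(strings, '_+', maxsplit=2)
# Pieces belonging to the fourth string: from its segment start up to the next one.
fourth = parts[segments[3]:segments[4]]
```

Leading or trailing matches produce empty substrings, which is why `'____'` contributes two empty strings to the result.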
register(user_defined_name: str) Strings[source]

Register this Strings object with a user-defined name in the arkouda server so it can be attached to later using Strings.attach(). This is an in-place operation: registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time.

Parameters:

user_defined_name (str) – user defined name which the Strings object is to be registered under

Returns:

The same Strings object, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different objects with the same name.

Return type:

Strings

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – Raised if the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name.

See also

attach, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

registered_name: str | None = None
rpeel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False) Tuple[Strings, Strings][source]

Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • delimiter (bytes or str_scalars) – The separator where the split will occur

  • times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters

  • includeDelimiter (bool, default=False) – If true, prepend the delimiter to the start of the first return array. By default, it is appended to the end of the second return array.

  • keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads/lookbehinds are not supported)

Returns:

left: Strings

The remainder of the string after peeling

right: Strings

The field(s) that were peeled from the right of each string

Return type:

Tuple[Strings, Strings]

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64

  • ValueError – Raised if times is < 1 or if delimiter is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

peel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))

Compared against peel

>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
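The partitioning rule can be mimicked on plain Python lists. The sketch below is not Arkouda code and does not contact a server: rpeel_sketch is a hypothetical helper that reproduces only the default behavior (includeDelimiter=False, keepPartial=False, regex=False) as described above. Placing an empty string in the second list for strings with too few delimiters is an assumption.

```python
def rpeel_sketch(strings, delimiter, times=1):
    """Pure-Python illustration of rpeel's documented default behavior."""
    if times < 1:
        raise ValueError("times must be >= 1")
    left, right = [], []
    for s in strings:
        # Split from the right at most `times` times.
        parts = s.rsplit(delimiter, times)
        if len(parts) <= times:
            # Fewer than `times` delimiters: by default the whole string
            # stays in the first list (assumed: the second gets '').
            left.append(s)
            right.append("")
        else:
            left.append(parts[0])
            right.append(delimiter.join(parts[1:]))
    return left, right


print(rpeel_sketch(["a.b", "c.d", "e.f.g"], "."))
print(rpeel_sketch(["a.b", "c.d", "e.f.g"], ".", times=2))
```

With times=2 only 'e.f.g' has enough delimiters to be peeled, so the other two strings remain whole in the first list.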
save(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None, file_format: Literal['HDF5', 'Parquet'] = 'HDF5', file_type: Literal['single', 'distribute'] = 'distribute') str[source]

DEPRECATED. Save the Strings object to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 also supports single files, in which case the file name is exactly the one provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.

  • save_offsets (bool, default=True) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read. This is not supported for Parquet files.

  • compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files

  • file_format ({"HDF5", "Parquet"}, default = "HDF5") – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.

  • file_type ({"single", "distribute"}, default = "distribute") – With "distribute" (the default), the dataset is distributed over one file per locale. With "single", the dataset is saved to one file.

Return type:

String message indicating result of save operation

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string, (2) the hdf5 group is named via the dataset parameter. (3) Parquet files do not store the segments, only the values.

search(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.match.Match[source]

Returns a match object with the first location in each element where pattern produces a match. Elements match if any part of the string matches the regular expression pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match if any part of the string matches the regular expression pattern

Return type:

Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+')
<ak.Match object: matched=True, span=(1, 2); matched=True, span=(0, 4);
matched=False; matched=True, span=(0, 2); matched=False>
shape: Tuple[int]
size: arkouda.numpy.dtypes.int_scalars
split(delimiter: str, return_segments: bool = False, regex: bool = False) Strings | Tuple[source]

Unpack delimiter-joined substrings into a flat array.

Parameters:
  • delimiter (str) – Characters used to split strings into substrings

  • return_segments (bool, default=False) – If True, also return mapping of original strings to first substring in return array.

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

  • Strings – Flattened substrings with delimiters removed

  • pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array

See also

peel, rpeel

Examples

>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.split('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, mapping = orig.split('|', return_segments=True)
>>> mapping
array([0 2 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_split, under_map = under.split('_+', return_segments=True, regex=True)
>>> under_split
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0 2 5])
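The relationship between the flat output and the segments array can be illustrated without a server. The following is a pure-Python sketch (split_sketch is a hypothetical name, not part of Arkouda), and it uses Python's re module, whose syntax is not identical to re2.

```python
import re


def split_sketch(strings, delimiter, regex=False):
    """Return (flat_substrings, segments) for a list of Python strings."""
    flat, segments = [], []
    for s in strings:
        # Record where this string's substrings begin in the flat output.
        segments.append(len(flat))
        pieces = re.split(delimiter, s) if regex else s.split(delimiter)
        flat.extend(pieces)
    return flat, segments


flat, segments = split_sketch(["one|two", "three|four|five", "six"], "|")
# Substrings of the second original string: flat[segments[1]:segments[2]]
print(flat[segments[1]:segments[2]])
```

Slicing the flat list between consecutive segment values recovers the substrings that came from one original string.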
startswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element starts with the given substring.

Parameters:
  • substr (bytes or str_scalars) – The prefix to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

True for elements that start with substr, False otherwise

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True True True True True])
>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith(r'\d str', regex=True)
array([True True True True True])
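For readers more familiar with Python's re module, the prefix and suffix checks can be approximated locally as below. This is only an analogy under the assumption that anchoring works as in Python. The helper names are hypothetical, and the server itself uses re2.

```python
import re


def startswith_sketch(strings, substr, regex=False):
    if regex:
        pattern = re.compile(substr)
        # re.match anchors the pattern at the start of the string.
        return [pattern.match(s) is not None for s in strings]
    return [s.startswith(substr) for s in strings]


def endswith_sketch(strings, substr, regex=False):
    if regex:
        # \Z anchors the match at the very end of the string.
        pattern = re.compile(f"(?:{substr})\\Z")
        return [pattern.search(s) is not None for s in strings]
    return [s.endswith(substr) for s in strings]


print(startswith_sketch(["1 string", "string 2"], r"\d str", regex=True))
print(endswith_sketch(["string 1", "2 string"], r"ing \d", regex=True))
```

Raw string literals (r"...") avoid Python's own escape processing of sequences such as \d.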
stick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '', toLeft: bool = False) Strings[source]

Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • other (Strings) – The strings to join onto self’s strings

  • delimiter (bytes or str_scalars, default="") – String inserted between self and other

  • toLeft (bool, default=False) – If true, join other strings to the left of self. By default, other is joined to the right of self.

Returns:

The array of joined strings

Return type:

Strings

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance

  • ValueError – Raised if times is < 1

  • RuntimeError – Raised if there is a server-side error thrown

See also

lstick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])
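The effect of toLeft can be shown with ordinary Python lists. The sketch below is a local analogy, not the Arkouda implementation. stick_sketch and its to_left parameter are hypothetical names, and the equal-length check is an assumption.

```python
def stick_sketch(self_strings, other, delimiter="", to_left=False):
    """Join two equally long lists element-wise, as stick() is described."""
    if len(self_strings) != len(other):
        raise ValueError("both inputs must have the same length")
    if to_left:
        # other goes to the left of self
        return [o + delimiter + s for s, o in zip(self_strings, other)]
    # default: other goes to the right of self
    return [s + delimiter + o for s, o in zip(self_strings, other)]


s = ["a", "c", "e"]
t = ["b", "d", "f"]
print(stick_sketch(s, t, delimiter="."))
print(stick_sketch(s, t, delimiter=".", to_left=True))
```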
strip(chars: bytes | arkouda.numpy.dtypes.str_scalars | None = '') Strings[source]

Returns a new Strings object with all leading and trailing occurrences of characters contained in chars removed. The chars argument is a string specifying the set of characters to be removed. If omitted, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.

Parameters:

chars (bytes or str_scalars, optional) – the set of characters to be removed

Returns:

Strings object with the leading and trailing characters matching the set of characters in the chars argument removed

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array(['Strings ', '  StringS  ', 'StringS   '])
>>> s = strings.strip()
>>> s
array(['Strings', 'StringS', 'StringS'])
>>> strings = ak.array(['Strings 1', '1 StringS  ', '  1StringS  12 '])
>>> s = strings.strip(' 12')
>>> s
array(['Strings', 'StringS', 'StringS'])
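The "set of characters" semantics match Python's str.strip, with one difference worth noting: the signature above defaults chars to '', which is documented as stripping whitespace, whereas Python's str.strip('') strips nothing. A local sketch (strip_sketch is a hypothetical helper that does not call Arkouda) therefore maps the empty value to None:

```python
def strip_sketch(strings, chars=""):
    # An empty chars value means "strip whitespace" in the description above;
    # Python expresses that as strip(None), so translate '' to None.
    return [s.strip(chars or None) for s in strings]


print(strip_sketch(["Strings ", "  StringS  ", "StringS   "]))
print(strip_sketch(["Strings 1", "1 StringS  ", "  1StringS  12 "], " 12"))
```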
sub(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) Strings[source]

Return new Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl. If count is nonzero, at most count substitutions occur

Parameters:
  • pattern (bytes or str_scalars) – The regex to substitute

  • repl (bytes or str_scalars) – The substring to replace pattern matches with

  • count (int, default=0) – The max number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl

Returns:

Strings with pattern matches replaced

Return type:

Strings

Raises:
  • TypeError – Raised if pattern or repl are not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

Strings.subn

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.sub(pattern='_+', repl='-', count=2)
array(['1-2-', '-', '3', '-4-5____6___7', ''])
subn(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) Tuple[source]

Perform the same operation as sub(), but return a tuple (new_Strings, number_of_substitutions)

Parameters:
  • pattern (bytes or str_scalars) – The regex to substitute

  • repl (bytes or str_scalars) – The substring to replace pattern matches with

  • count (int, default=0) – The max number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl

Returns:

  • Strings – Strings with pattern matches replaced

  • pdarray, int64 – The number of substitutions made for each element of Strings

Raises:
  • TypeError – Raised if pattern or repl are not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

Strings.sub

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.subn(pattern='_+', repl='-', count=2)
(array(['1-2-', '-', '3', '-4-5____6___7', '']), array([2 1 0 2 0]))
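The count semantics follow the convention of Python's re.sub and re.subn, where count=0 means "replace all". A per-element sketch using the standard library (subn_sketch is a hypothetical name, and Python's regex syntax differs from re2 in some cases) reproduces the example above:

```python
import re


def subn_sketch(strings, pattern, repl, count=0):
    """Apply re.subn to each element and split the results into two lists."""
    results = [re.subn(pattern, repl, s, count=count) for s in strings]
    new_strings = [text for text, _ in results]
    num_subs = [n for _, n in results]
    return new_strings, num_subs


strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
print(subn_sketch(strings, "_+", "-", count=2))
```

Dropping the second list gives the equivalent of sub().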
title() Strings[source]

Returns a new Strings in which each string from the original is replaced with its titlecase equivalent.

Returns:

Strings from the original replaced with their titlecase equivalent.

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.title()
array(['Strings 0', 'Strings 1', 'Strings 2', 'Strings 3', 'Strings 4'])
to_csv(prefix_path: str, dataset: str = 'strings_array', col_delim: str = ',', overwrite: bool = False) str[source]

Write Strings to CSV file(s). File will contain a single column with the Strings data. All CSV Files written by Arkouda include a header denoting data types of the columns. Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing bytes as uint(8).

Parameters:
  • prefix_path (str) – The filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.

  • dataset (str, default="strings_array") – Column name to save the Strings under. Defaults to “strings_array”.

  • col_delim (str, default=",") – Defaults to “,”. Value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.

  • overwrite (bool, default=False) – Defaults to False. If True, any existing files matching your provided prefix_path will be overwritten. If False, an error will be returned if existing files are found.

Returns:

response message

Return type:

str

Raises:
  • ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist

  • RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is true this may be raised if no values are returned from the server.

  • TypeError – Raised if we receive an unknown arkouda_type returned from the server

Notes

  • CSV format is not currently supported by load/load_all operations

  • The column delimiter is expected to be the same for column names and data

  • Be sure that column delimiters are not found within your data.

  • All CSV files must delimit rows using newline (\n) at this time.

to_hdf(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, file_type: Literal['single', 'distribute'] = 'distribute') str[source]

Save the Strings object to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.

  • save_offsets (bool, default=True) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.

  • file_type ({"single", "distribute"}, default = "distribute") – With "distribute" (the default), the dataset is distributed over one file per locale. With "single", the dataset is saved to one file.

Return type:

String message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • Parquet files do not store the segments, only the values.

  • Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string

  • the hdf5 group is named via the dataset parameter.

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

See also

to_hdf

to_list() list[source]

Convert the SegString to a list, transferring data from the arkouda server to Python. If the SegString exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A list with the same strings as this SegString

Return type:

list

Notes

The number of bytes in the array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but proceed with caution.

See also

to_ndarray

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_list()
['hello', 'my', 'world']
>>> type(a.to_list())
<class 'list'>
to_ndarray() numpy.ndarray[source]

Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A numpy ndarray with the same strings as this array

Return type:

np.ndarray

Notes

The number of bytes in the array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but proceed with caution.

See also

array, to_list

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
to_parquet(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None) str[source]

Save the Strings object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – Name of the dataset to create in files (must not already exist)

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’.

  • ‘append’ write mode is supported, but is not efficient.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars) str | memoryview[source]

Sends a Strings object to a different Arkouda server

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the Strings object is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, so it uses ports in the range {port..(port+numLocales)} (e.g., when running an Arkouda server of 4 nodes with 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Return type:

A message indicating a complete transfer

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
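When planning firewall rules for a transfer, it can help to list the ports implied by the port description. The sketch below follows the worked example given for port (4 locales starting at 1234 use 1234 through 1237). The helper name is hypothetical and nothing here talks to a server.

```python
def transfer_ports(port, num_locales):
    """Ports used for a transfer, following the 4-node example above."""
    if num_locales < 1:
        raise ValueError("num_locales must be >= 1")
    # One consecutive port per locale, starting at `port`.
    return list(range(port, port + num_locales))


print(transfer_ports(1234, 4))
```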

unregister() None[source]

Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach()

Return type:

None

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

See also

register, attach

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

static unregister_strings_by_name(user_defined_name: str) None[source]

Unregister a Strings object in the arkouda server previously registered via register()

Parameters:

user_defined_name (str) – The registered name of the Strings object

update_hdf(prefix_path: str, dataset: str = 'strings_array', save_offsets: bool = True, repack: bool = True) str[source]

Overwrite the dataset with the name provided with this Strings object. If the dataset does not exist it is added

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – Name of the dataset to create in files

  • save_offsets (bool, default=True) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.

  • repack (bool, default=True) – Default: True. HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to False will yield better performance, but will cause file sizes to expand.

Return type:

str - success message if successful

Raises:

RuntimeError – Raised if a server-side error is thrown saving the Strings object

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added

upper() Strings[source]

Returns a new Strings with all lowercase characters from the original replaced with their uppercase equivalent

Returns:

Strings with all lowercase characters from the original replaced with their uppercase equivalent

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.lower

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.upper()
array(['STRINGS 0', 'STRINGS 1', 'STRINGS 2', 'STRINGS 3', 'STRINGS 4'])
class arkouda.numpy.Strings(strings_pdarray: arkouda.numpy.pdarrayclass.pdarray, bytes_size: arkouda.numpy.dtypes.int_scalars)[source]

Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions.

entry

Encapsulation of a Segmented Strings array contained on the arkouda server. This is a composite of

  • offsets array: starting indices for each string

  • bytes array: raw bytes of all strings joined by nulls

Type:

pdarray

size

The number of strings in the array

Type:

int_scalars

nbytes

The total number of bytes in all strings

Type:

int_scalars

ndim

The rank of the array (currently only rank 1 arrays supported)

Type:

int_scalars

shape

The sizes of each dimension of the array

Type:

tuple

dtype

The dtype is ak.str

Type:

dtype

logger

Used for all logging operations

Type:

ArkoudaLogger

Notes

Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string and (2) bytes, which contains the raw bytes of all strings, delimited by nulls.
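The offsets/bytes layout can be reproduced on the client with plain Python, which may make the two components easier to picture. This is an illustrative sketch only: to_segments and from_segments are hypothetical helpers, and UTF-8 encoding is assumed.

```python
def to_segments(strings):
    """Build (offsets, bytes) lists: null-terminated strings laid end to end."""
    offsets, data = [], bytearray()
    for s in strings:
        offsets.append(len(data))          # start index of this string
        data += s.encode("utf-8") + b"\x00"  # raw bytes plus null delimiter
    return offsets, list(data)


def from_segments(offsets, data):
    """Invert to_segments by slicing between consecutive offsets."""
    raw = bytes(data)
    ends = offsets[1:] + [len(raw)]
    # Drop the trailing null terminator of each segment before decoding.
    return [raw[start:end - 1].decode("utf-8") for start, end in zip(offsets, ends)]


offsets, data = to_segments(["one", "two", "three"])
print(offsets)
print(data)
print(from_segments(offsets, data))
```

The printed values agree with the get_offsets and get_bytes examples for the same three strings.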

BinOps
astype(dtype: numpy.dtype | str) arkouda.numpy.pdarrayclass.pdarray[source]

Cast values of Strings object to provided dtype

Parameters:

dtype (np.dtype or str) – Dtype to cast to

Returns:

An arkouda pdarray with values converted to the specified data type

Return type:

ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

static attach(user_defined_name: str) Strings[source]

class method to return a Strings object attached to the registered name in the arkouda server which was registered using register()

Parameters:

user_defined_name (str) – user defined name which the Strings object was registered under

Returns:

the Strings object registered with user_defined_name in the arkouda server

Return type:

Strings object

Raises:

TypeError – Raised if user_defined_name is not a str

See also

register, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

cached_regex_patterns() List[source]

Returns the regex patterns for which Match objects have been cached

capitalize() Strings[source]

Returns a new Strings in which each string from the original has its first letter capitalized and the remaining letters lowercased.

Returns:

Strings from the original replaced with the capitalized equivalent.

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper, Strings.title

Examples

>>> strings = ak.array([f'StrINgS aRe Here {i}' for i in range(5)])
>>> strings
array(['StrINgS aRe Here 0', 'StrINgS aRe Here 1', 'StrINgS aRe Here 2', 'StrINgS aRe Here 3', 'StrINgS aRe Here 4'])
>>> strings.capitalize()
array(['Strings are here 0', 'Strings are here 1', 'Strings are here 2', 'Strings are here 3', 'Strings are here 4'])
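capitalize and title are easy to confuse. Python's built-in str methods follow the same descriptions (first letter of the string versus first letter of every word), so a quick local comparison can serve as a mental model. This is an analogy only, and server-side behavior for non-ASCII text may differ.

```python
strings = [f"StrINgS aRe Here {i}" for i in range(2)]

# capitalize: only the first character is uppercased, the rest lowercased.
capitalized = [s.capitalize() for s in strings]
# title: the first character of every word is uppercased.
titled = [s.title() for s in strings]

print(capitalized)
print(titled)
```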
contains(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element contains the given substring.

Parameters:
  • substr (bytes or str_scalars) – The substring in the form of string or byte array to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

True for elements that contain substr, False otherwise

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True True True True True])
>>> strings.contains(r'string \d', regex=True)
array([True True True True True])
decode(fromEncoding: str, toEncoding: str = 'UTF-8') Strings[source]

Return a new Strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding

Parameters:
  • fromEncoding (str) – The current encoding of the strings object

  • toEncoding (str, default="UTF-8") – The encoding that the strings will be converted to, default to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

dtype
encode(toEncoding: str, fromEncoding: str = 'UTF-8') Strings[source]

Return a new strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding

Parameters:
  • toEncoding (str) – The encoding that the strings will be converted to

  • fromEncoding (str, default="UTF-8") – The current encoding of the strings object, default to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

endswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element ends with the given substring.

Parameters:
  • substr (bytes or str_scalars) – The suffix to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

True for elements that end with substr, False otherwise

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True True True True True])
>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith(r'ing \d', regex=True)
array([True True True True True])
entry: arkouda.numpy.pdarrayclass.pdarray
equals(other: Any) arkouda.numpy.dtypes.bool_scalars[source]

Whether Strings are the same size and all entries are equal.

Parameters:

other (Any) – object to compare.

Returns:

True if the Strings are the same, otherwise False.

Return type:

bool

Examples

>>> import arkouda as ak
>>> ak.connect()
>>> s = ak.array(["a", "b", "c"])
>>> s_cpy = ak.array(["a", "b", "c"])
>>> s.equals(s_cpy)
True
>>> s2 = ak.array(["a", "x", "c"])
>>> s.equals(s2)
False
find_locations(pattern: bytes | arkouda.numpy.dtypes.str_scalars) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches

Parameters:

pattern (bytes or str_scalars) – The regex pattern used to find matches

Returns:

  • pdarray, int64 – For each original string, the number of pattern matches

  • pdarray, int64 – The start positions of pattern matches

  • pdarray, int64 – The lengths of pattern matches

Raises:
  • TypeError – Raised if the pattern parameter is not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> num_matches, starts, lens = strings.find_locations(r'\d')
>>> num_matches
array([2 2 2 2 2])
>>> starts
array([0 9 0 9 0 9 0 9 0 9])
>>> lens
array([1 1 1 1 1 1 1 1 1 1])
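The three returned arrays are related: num_matches has one entry per original string, while starts and lens are flat and have one entry per match. A standard-library sketch (find_locations_sketch is a hypothetical name, and Python's re stands in for re2) makes that structure explicit. It reports character positions, which coincide with byte positions only for ASCII input.

```python
import re


def find_locations_sketch(strings, pattern):
    compiled = re.compile(pattern)
    num_matches, starts, lens = [], [], []
    for s in strings:
        matches = list(compiled.finditer(s))
        num_matches.append(len(matches))                    # one per string
        starts.extend(m.start() for m in matches)           # one per match
        lens.extend(m.end() - m.start() for m in matches)   # one per match
    return num_matches, starts, lens


strings = [f"{i} string {i}" for i in range(1, 6)]
print(find_locations_sketch(strings, r"\d"))
```

A running sum of num_matches gives the boundaries of each string's matches within starts and lens.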
findall(pattern: bytes | arkouda.numpy.dtypes.str_scalars, return_match_origins: bool = False) Strings | Tuple[source]

Return a new Strings containing all non-overlapping matches of pattern

Parameters:
  • pattern (bytes or str_scalars) – Regex used to find matches

  • return_match_origins (bool, default=False) – If True, return a pdarray containing the index of the original string each pattern match is from

Returns:

  • Strings – Strings object containing only pattern matches

  • pdarray, int64 (optional) – The index of the original string each pattern match is from

Raises:
  • TypeError – Raised if the pattern parameter is not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.findall('_+', return_match_origins=True)
(array(['_', '___', '____', '__', '___', '____', '___']), array([0 0 1 3 3 3 3]))
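The match origins array maps every match back to the index of the string it came from. The following local sketch (findall_sketch is a hypothetical helper built on Python's re module, not Arkouda's server-side re2) reproduces the example above:

```python
import re


def findall_sketch(strings, pattern):
    compiled = re.compile(pattern)
    matches, origins = [], []
    for index, s in enumerate(strings):
        for m in compiled.finditer(s):
            matches.append(m.group(0))
            origins.append(index)  # index of the string this match came from
    return matches, origins


strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
print(findall_sketch(strings, "_+"))
```

Strings with no match (indices 2 and 4 here) simply never appear in the origins list.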
flatten() Strings[source]

Return a copy of the array collapsed into one dimension.

Return type:

A copy of the input array, flattened to one dimension.

Note

As multidimensional Strings are not currently supported, flatten on a Strings object will always return itself.

static from_parts(offset_attrib: arkouda.numpy.pdarrayclass.pdarray | str, bytes_attrib: arkouda.numpy.pdarrayclass.pdarray | str) Strings[source]

Factory method for creating a Strings object from an Arkouda server response where the arrays are separate components.

Parameters:
  • offset_attrib (pdarray or str) – the array containing the offsets

  • bytes_attrib (pdarray or str) – the array containing the string values

Returns:

object representing a segmented strings array on the server

Return type:

Strings

Raises:

RuntimeError – Raised if there’s an error converting a server-returned str-descriptor

Notes

This factory method is used when we construct the parts of a Strings object on the client side and transfer the offsets & bytes separately to the server. This results in two entries in the symbol table and we need to instruct the server to assemble them into a composite entity.

static from_return_msg(rep_msg: str) Strings[source]

Factory method for creating a Strings object from an Arkouda server response message

Parameters:

rep_msg (str) – Server response message currently of form created name type size ndim shape itemsize+created bytes.size 1234

Returns:

object representing a segmented strings array on the server

Return type:

Strings

Raises:

RuntimeError – Raised if there’s an error converting a server-returned str-descriptor

Notes

We really don’t have an itemsize because these are variable length strings. In the future we could probably use this position to store the total bytes.

fullmatch(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.match.Match[source]

Returns a match object where elements match only if the whole string matches the regular expression pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match only if the whole string matches the regular expression pattern

Return type:

Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.fullmatch('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=False; matched=False>
get_bytes() arkouda.numpy.pdarrayclass.pdarray[source]

Getter for the bytes component (uint8 pdarray) of this Strings.

Returns:

Pdarray of bytes of the string accessed

Return type:

pdarray, uint8

Example

>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_bytes()
[111 110 101 0 116 119 111 0 116 104 114 101 101 0]
get_lengths() arkouda.numpy.pdarrayclass.pdarray[source]

Return the length of each string in the array.

Returns:

The length of each string

Return type:

pdarray, int

Raises:

RuntimeError – Raised if there is a server-side error thrown

get_offsets() arkouda.numpy.pdarrayclass.pdarray[source]

Getter for the offsets component (int64 pdarray) of this Strings.

Returns:

Pdarray of offsets of the string accessed

Return type:

pdarray, int64

Example

>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_offsets()
[0 4 8]
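
The two getters above expose a segmented layout: one flat byte buffer in which every string is followed by a null terminator, plus the starting offset of each string. The following pure-Python sketch reproduces that layout for the example values; the helper names to_segments and from_segments are hypothetical and this is not how the server builds the arrays.

```python
def to_segments(strings):
    """Hypothetical helper: pack strings into (offsets, null-terminated bytes)."""
    offsets, data = [], bytearray()
    for s in strings:
        offsets.append(len(data))           # start of this string in the buffer
        data += s.encode('utf-8') + b'\x00'  # each string ends with a null byte
    return offsets, bytes(data)

def from_segments(offsets, data):
    """Rebuild the strings; each segment ends one byte before the next offset."""
    ends = offsets[1:] + [len(data)]
    return [data[start:end - 1].decode('utf-8')
            for start, end in zip(offsets, ends)]

offsets, data = to_segments(['one', 'two', 'three'])
print(offsets)
print(list(data))
print(from_segments(offsets, data))
```

Under this layout the length of each string is the difference between consecutive offsets minus one for the terminator.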
get_prefixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Return the n-long prefix of each string, where possible

Parameters:
  • n (int_scalars) – Length of prefix

  • return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-prefix

  • proper (bool, default=True) – If True, only return proper prefixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a prefix.

Returns:

  • prefixes (Strings) – The array of n-character prefixes; the number of elements is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Boolean array that is True where the string was long enough to return an n-character prefix, False otherwise.
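
The interaction between n and proper can be sketched on a plain Python list. The function name prefixes_sketch is hypothetical, and lengths are counted in Python characters here, which is an assumption; the server may measure length differently for non-ASCII data.

```python
def prefixes_sketch(strings, n, proper=True):
    """Hypothetical pure-Python analogue of the prefix semantics described above."""
    # A proper prefix requires at least one character beyond the prefix itself.
    min_len = n + 1 if proper else n
    mask = [len(s) >= min_len for s in strings]
    prefixes = [s[:n] for s, keep in zip(strings, mask) if keep]
    return prefixes, mask

values = ['abc', 'ab', 'a']
proper_prefixes, proper_mask = prefixes_sketch(values, 2)
any_prefixes, any_mask = prefixes_sketch(values, 2, proper=False)
print(proper_prefixes, proper_mask)
print(any_prefixes, any_mask)
```

The returned list has one entry per True value in the mask, so it can be shorter than the input.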

get_suffixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Return the n-long suffix of each string, where possible

Parameters:
  • n (int_scalars) – Length of suffix

  • return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-suffix

  • proper (bool, default=True) – If True, only return proper suffixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a suffix.

Returns:

  • suffixes (Strings) – The array of n-character suffixes; the number of elements is the number of True values in the returned mask.

  • origin_indices (pdarray, bool) – Boolean array that is True where the string was long enough to return an n-character suffix, False otherwise.

group() arkouda.numpy.pdarrayclass.pdarray[source]

Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.

Returns:

The permutation that groups the array by value

Return type:

pdarray

See also

GroupBy, unique

Notes

If the arkouda server is compiled with “-sSegmentedString.useHash=true”, then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the “useHash” parameter is false, then a full sort is performed.

Raises:

RuntimeError – Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message
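
To illustrate the hash-based behaviour described in the notes, the sketch below orders indices by a 128-bit digest of each string. Equal strings share a digest and therefore end up adjacent, but the blocks follow digest order rather than lexical order. It uses BLAKE2b from hashlib as a stand-in for the server's hash, and group_permutation is a hypothetical name.

```python
import hashlib

def group_permutation(strings):
    """Hypothetical sketch: permutation that places equal strings together."""
    def digest(s):
        # 16-byte (128-bit) digest; BLAKE2b is a stand-in, not the server's hash.
        return hashlib.blake2b(s.encode('utf-8'), digest_size=16).digest()
    return sorted(range(len(strings)), key=lambda i: digest(strings[i]))

values = ['b', 'a', 'b', 'c', 'a']
perm = group_permutation(values)
grouped = [values[i] for i in perm]
print(perm)
print(grouped)
```

The order of the blocks in grouped depends on the digest values, so it should not be relied upon.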

hash() Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Compute a 128-bit hash of each string.

Returns:

A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.

Return type:

Tuple[pdarray,pdarray]

Notes

The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible.
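
The "negligible" claim can be made concrete with the birthday bound: for n values drawn uniformly from 2**128 possibilities, the probability of any collision is at most n(n-1)/2 divided by 2**128. The sketch below evaluates that bound under the idealized assumption that hash outputs are uniformly distributed; collision_bound is a hypothetical helper.

```python
from fractions import Fraction

def collision_bound(n, bits=128):
    """Upper bound on the probability that any two of n uniform hashes collide."""
    return Fraction(n * (n - 1), 2 * 2**bits)

p = float(collision_bound(10**15))
print(f"{p:.2e}")
```

For 10**15 strings the bound comes out at roughly 1.5e-9.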

property inferred_type: str

Return a string of the type inferred from the values.

info() str[source]

Returns a JSON formatted string containing information about all components of self

Parameters:

None

Returns:

JSON string containing information about all components of self

Return type:

str

is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry

Parameters:

None

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RuntimeError – Raised if there’s a server-side error thrown

isalnum() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is alphanumeric.

Returns:

True for elements that are alphanumeric, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_alnum = ak.array([f'%Strings {i}' for i in range(3)])
>>> alnum = ak.array([f'Strings{i}' for i in range(3)])
>>> strings = ak.concatenate([not_alnum, alnum])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'Strings0', 'Strings1', 'Strings2'])
>>> strings.isalnum()
array([False False False True True True])
isalpha() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is alphabetic. This means there is at least one character, and all the characters are alphabetic.

Returns:

True for elements that are alphabetic, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_alpha = ak.array([f'%Strings {i}' for i in range(3)])
>>> alpha = ak.array(['StringA','StringB','StringC'])
>>> strings = ak.concatenate([not_alpha, alpha])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'StringA', 'StringB', 'StringC'])
>>> strings.isalpha()
array([False False False True True True])
isdecimal() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings has all decimal characters.

Returns:

True for elements that are decimals, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isdigit

Examples

>>> not_decimal = ak.array([f'Strings {i}' for i in range(3)])
>>> decimal = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_decimal, decimal])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdecimal()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdecimal()
array([False True False False False])
isdigit() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings has all digit characters.

Returns:

True for elements that are digits, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_digit = ak.array([f'Strings {i}' for i in range(3)])
>>> digit = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_digit, digit])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdigit()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdigit()
array([False True True True False])
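
Python's built-in str.isdecimal and str.isdigit draw the same distinction for these special characters, which can help when reasoning about the two results above. This sketch runs on plain Python strings and does not contact an Arkouda server.

```python
samples = ["3.14", "0", "²", "2³₇", "2³x₇"]

# Superscript and subscript digits count as digits but not as decimal characters.
decimal_flags = [s.isdecimal() for s in samples]
digit_flags = [s.isdigit() for s in samples]

print(decimal_flags)
print(digit_flags)
```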
isempty() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is empty.

Returns:

True for elements that are the empty string, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_empty = ak.array([f'Strings {i}' for i in range(3)])
>>> empty = ak.array(['' for i in range(3)])
>>> strings = ak.concatenate([not_empty, empty])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '', '', ''])
>>> strings.isempty()
array([False False False True True True])
islower() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is entirely lowercase

Returns:

True for elements that are entirely lowercase, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isupper

Examples

>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.islower()
array([True True True False False False])
isspace() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i has all whitespace characters (‘ ’, ‘\t’, ‘\n’, ‘\v’, ‘\f’, ‘\r’).

Returns:

True for elements that are whitespace, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> not_space = ak.array([f'Strings {i}' for i in range(3)])
>>> space = ak.array([' ', '\t', '\n', '\v', '\f', '\r', ' \t\n\v\f\r'])
>>> strings = ak.concatenate([not_space, space])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', ' ', '\u0009', '\n', '\u000B', '\u000C', '\u000D', ' \u0009\n\u000B\u000C\u000D'])
>>> strings.isspace()
array([False False False True True True True True True True])
istitle() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is titlecase

Returns:

True for elements that are titlecase, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> mixed = ak.array([f'sTrINgs {i}' for i in range(3)])
>>> title = ak.array([f'Strings {i}' for i in range(3)])
>>> strings = ak.concatenate([mixed, title])
>>> strings
array(['sTrINgs 0', 'sTrINgs 1', 'sTrINgs 2', 'Strings 0', 'Strings 1', 'Strings 2'])
>>> strings.istitle()
array([False False False True True True])
isupper() arkouda.numpy.pdarrayclass.pdarray[source]

Returns a boolean pdarray where index i indicates whether string i of the Strings is entirely uppercase

Returns:

True for elements that are entirely uppercase, False otherwise

Return type:

pdarray, bool

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.islower

Examples

>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.isupper()
array([False False False True True True])
logger
lower() Strings[source]

Returns a new Strings with all uppercase characters from the original replaced with their lowercase equivalent

Returns:

Strings with all uppercase characters from the original replaced with their lowercase equivalent

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.upper

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.lower()
array(['strings 0', 'strings 1', 'strings 2', 'strings 3', 'strings 4'])
lstick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '') Strings[source]

Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • other (Strings) – The strings to join onto self’s strings

  • delimiter (bytes or str_scalars, default="") – String inserted between self and other

Returns:

The array of joined strings, as other + self

Return type:

Strings

Raises:
  • TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance

  • RuntimeError – Raised if there is a server-side error thrown

See also

stick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])
match(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.match.Match[source]

Returns a match object where elements match only if the beginning of the string matches the regular expression pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match only if the beginning of the string matches the regular expression pattern

Return type:

Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.match('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=True, span=(0, 2); matched=False>
objType = 'Strings'
peel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) Tuple[Strings, Strings][source]

Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • delimiter (bytes or str_scalars) – The separator where the split will occur

  • times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters

  • includeDelimiter (bool, default=False) – If true, append the delimiter to the end of the first return array. By default, it is prepended to the beginning of the second return array.

  • keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.

  • fromRight (bool, default=False) – If true, peel from the right instead of the left (see also rpeel)

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

left: Strings

The field(s) peeled from the end of each string (unless fromRight is true)

right: Strings

The remainder of each string after peeling (unless fromRight is true)

Return type:

Tuple[Strings, Strings]

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool

  • ValueError – Raised if times is < 1 or if delimiter is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

rpeel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
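
The interplay of times, includeDelimiter, and keepPartial shown in the examples can be reproduced for a single string in plain Python. The sketch below covers only left-to-right peeling with a literal delimiter; peel_sketch is a hypothetical helper written to match the example outputs above, not the server implementation.

```python
def peel_sketch(s, delimiter, times=1, include_delimiter=False, keep_partial=False):
    """Hypothetical analogue of peel for one string (left to right, no regex)."""
    pos, start = -1, 0
    for _ in range(times):
        pos = s.find(delimiter, start)
        if pos == -1:
            # Fewer than `times` delimiters: the whole string goes to one side.
            return (s, '') if keep_partial else ('', s)
        start = pos + len(delimiter)
    left_end = start if include_delimiter else pos
    return s[:left_end], s[start:]

strings = ['a.b', 'c.d', 'e.f.g']
default = [peel_sketch(s, '.') for s in strings]
with_delim = [peel_sketch(s, '.', include_delimiter=True) for s in strings]
twice = [peel_sketch(s, '.', times=2) for s in strings]
twice_partial = [peel_sketch(s, '.', times=2, keep_partial=True) for s in strings]
print(default, with_delim, twice, twice_partial, sep='\n')
```

Each result is a list of (left, right) pairs; the Arkouda method returns the same information as two parallel arrays.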
pretty_print_info() None[source]

Prints information about all components of self in a human readable format

Parameters:

None

Return type:

None

purge_cached_regex_patterns() None[source]

Purges cached regex patterns.

regex_split(pattern: bytes | arkouda.numpy.dtypes.str_scalars, maxsplit: int = 0, return_segments: bool = False) Strings | Tuple[source]

Returns a new Strings split by the occurrences of pattern. If maxsplit is nonzero, at most maxsplit splits occur

Parameters:
  • pattern (bytes or str_scalars) – Regex used to split strings into substrings

  • maxsplit (int, default=0) – The max number of pattern match occurrences in each element to split. The default maxsplit=0 splits on all occurrences

  • return_segments (bool, default=False) – If True, return mapping of original strings to first substring in return array.

Returns:

  • Strings – Substrings with pattern matches removed

  • pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.regex_split('_+', maxsplit=2, return_segments=True)
(array(['1', '2', '', '', '', '3', '', '4', '5____6___7', '']), array([0 3 5 6 9]))
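
The flattened result and the segments array can be reproduced with Python's re.split, which makes the meaning of the segment indices easier to see. This is a sketch under the assumption that Python's re and the server's re2 agree for this simple pattern; regex_split_sketch is a hypothetical name.

```python
import re

def regex_split_sketch(strings, pattern, maxsplit=0):
    """Hypothetical analogue: flat list of pieces plus the index of each
    original string's first piece."""
    flat, segments = [], []
    for s in strings:
        segments.append(len(flat))  # where this string's pieces begin
        flat.extend(re.split(pattern, s, maxsplit=maxsplit))
    return flat, segments

strings = ['1_2___', '____', '3', '__4___5____6___7', '']
flat, segments = regex_split_sketch(strings, '_+', maxsplit=2)
print(flat)
print(segments)
```

Empty pieces appear wherever a match sits at the start or end of a string, and the empty input string still contributes one empty piece.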
register(user_defined_name: str) Strings[source]

Register this Strings object with a user-defined name in the arkouda server so it can be attached to later using Strings.attach(). This is an in-place operation; registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time.

Parameters:

user_defined_name (str) – user defined name which the Strings object is to be registered under

Returns:

The same Strings object, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that two different objects cannot be registered under the same name.

Return type:

Strings

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – Raised if the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name.

See also

attach, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

registered_name: str | None = None
rpeel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False) Tuple[Strings, Strings][source]

Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • delimiter (bytes or str_scalars) – The separator where the split will occur

  • times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters

  • includeDelimiter (bool, default=False) – If true, prepend the delimiter to the start of the first return array. By default, it is appended to the end of the second return array.

  • keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

left: Strings

The remainder of the string after peeling

right: Strings

The field(s) that were peeled from the right of each string

Return type:

Tuple[Strings, Strings]

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64

  • ValueError – Raised if times is < 1 or if delimiter is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

peel, stick, lstick

Examples

>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))

Compared against peel

>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
save(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None, file_format: Literal['HDF5', 'Parquet'] = 'HDF5', file_type: Literal['single', 'distribute'] = 'distribute') str[source]

DEPRECATED. Save the Strings object to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 supports single files, in which case the file name will be exactly the one provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.

  • save_offsets (bool, default=True) – If True, instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read. This is not supported for Parquet files.

  • compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files

  • file_format ({"HDF5", "Parquet"}, default = "HDF5") – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.

  • file_type ({"single", "distribute"}, default = "distribute") – If "distribute", the dataset is distributed over one file per locale. If "single", the dataset is saved to one file.

Return type:

String message indicating result of save operation

Notes

Important implementation notes: (1) Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string, (2) the hdf5 group is named via the dataset parameter. (3) Parquet files do not store the segments, only the values.

search(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.match.Match[source]

Returns a match object with the first location in each element where pattern produces a match. Elements match if any part of the string matches the regular expression pattern

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match if any part of the string matches the regular expression pattern

Return type:

Match

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+')
<ak.Match object: matched=True, span=(1, 2); matched=True, span=(0, 4);
matched=False; matched=True, span=(0, 2); matched=False>
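
The difference between match, fullmatch, and search is easiest to see side by side. The sketch below applies the three corresponding methods of Python's re module to the same example strings; it is a client-side illustration only, and the spans helper is hypothetical.

```python
import re

strings = ['1_2___', '____', '3', '__4___5____6___7', '']
pattern = re.compile('_+')

def spans(method):
    """Apply one matching method to every string; None means no match."""
    results = []
    for s in strings:
        m = method(s)
        results.append(m.span() if m else None)
    return results

match_spans = spans(pattern.match)          # must match at the beginning
fullmatch_spans = spans(pattern.fullmatch)  # must match the whole string
search_spans = spans(pattern.search)        # first match anywhere
print(match_spans, fullmatch_spans, search_spans, sep='\n')
```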
shape: Tuple[int]
size: arkouda.numpy.dtypes.int_scalars
split(delimiter: str, return_segments: bool = False, regex: bool = False) Strings | Tuple[source]

Unpack delimiter-joined substrings into a flat array.

Parameters:
  • delimiter (str) – Characters used to split strings into substrings

  • return_segments (bool, default=False) – If True, also return mapping of original strings to first substring in return array.

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

  • Strings – Flattened substrings with delimiters removed

  • pdarray, int64 (optional) – For each original string, the index of first corresponding substring in the return array

See also

peel, rpeel

Examples

>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.split('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, mapping = orig.split('|', return_segments=True)
>>> mapping
array([0 2 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_split, under_map = under.split('_+', return_segments=True, regex=True)
>>> under_split
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0 2 5])
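
The segment mapping is what lets the flat result be regrouped by original string: each entry marks where that string's substrings begin, and the next entry (or the end of the flat array) marks where they stop. A minimal pure-Python sketch on the example values, with regroup as a hypothetical helper:

```python
def regroup(flat, segments):
    """Slice the flat substrings back into one list per original string."""
    bounds = list(segments) + [len(flat)]
    return [flat[bounds[i]:bounds[i + 1]] for i in range(len(segments))]

flat = ['one', 'two', 'three', 'four', 'five', 'six']
segments = [0, 2, 5]
groups = regroup(flat, segments)
print(groups)
```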
startswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element starts with the given substring.

Parameters:
  • substr (bytes or str_scalars) – The prefix to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

True for elements that start with substr, False otherwise

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True True True True True])
>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith(r'\d str', regex=True)
array([True True True True True])
stick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '', toLeft: bool = False) Strings[source]

Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • other (Strings) – The strings to join onto self’s strings

  • delimiter (bytes or str_scalars, default="") – String inserted between self and other

  • toLeft (bool, default=False) – If true, join other strings to the left of self. By default, other is joined to the right of self.

Returns:

The array of joined strings

Return type:

Strings

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance

  • ValueError – Raised if times is < 1

  • RuntimeError – Raised if there is a server-side error thrown

See also

lstick, peel, rpeel

Examples

>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])
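
Element-wise joining with an optional delimiter and direction can be sketched on plain lists. The function stick_sketch is hypothetical; the length check is an assumption of this sketch rather than documented server behaviour.

```python
def stick_sketch(own, other, delimiter='', to_left=False):
    """Hypothetical analogue of stick: join two equal-length lists pairwise."""
    if len(own) != len(other):
        raise ValueError("both inputs must have the same number of strings")
    if to_left:
        # Join `other` onto the left of `own`, as lstick does.
        return [o + delimiter + s for s, o in zip(own, other)]
    return [s + delimiter + o for s, o in zip(own, other)]

s = ['a', 'c', 'e']
t = ['b', 'd', 'f']
right_joined = stick_sketch(s, t, delimiter='.')
left_joined = stick_sketch(s, t, delimiter='.', to_left=True)
print(right_joined)
print(left_joined)
```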
strip(chars: bytes | arkouda.numpy.dtypes.str_scalars | None = '') Strings[source]

Returns a new Strings object with all leading and trailing occurrences of characters contained in chars removed. The chars argument is a string specifying the set of characters to be removed. If omitted, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.

Parameters:

chars (bytes or str_scalars, optional) – the set of characters to be removed

Returns:

Strings object with the leading and trailing characters matching the set of characters in the chars argument removed

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> strings = ak.array(['Strings ', '  StringS  ', 'StringS   '])
>>> s = strings.strip()
>>> s
array(['Strings', 'StringS', 'StringS'])
>>> strings = ak.array(['Strings 1', '1 StringS  ', '  1StringS  12 '])
>>> s = strings.strip(' 12')
>>> s
array(['Strings', 'StringS', 'StringS'])
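
Python's str.strip treats its argument the same way, as a set of characters rather than a literal prefix or suffix, so it can be used to check expectations locally. This sketch runs on plain Python strings only.

```python
strings = ['Strings 1', '1 StringS  ', '  1StringS  12 ']

# Every leading/trailing character found in ' 12' is removed, in any order.
stripped = [s.strip(' 12') for s in strings]

# 'xy' is not matched as a unit: x and y are stripped individually.
set_demo = 'xyxhelloyx'.strip('xy')

print(stripped)
print(set_demo)
```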
sub(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) Strings[source]

Return new Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl. If count is nonzero, at most count substitutions occur

Parameters:
  • pattern (bytes or str_scalars) – The regex to substitute

  • repl (bytes or str_scalars) – The substring to replace pattern matches with

  • count (int, default=0) – The max number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl

Returns:

Strings with pattern matches replaced

Return type:

Strings

Raises:
  • TypeError – Raised if pattern or repl are not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

Strings.subn

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.sub(pattern='_+', repl='-', count=2)
array(['1-2-', '-', '3', '-4-5____6___7', ''])
subn(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) Tuple[source]

Perform the same operation as sub(), but return a tuple (new_Strings, number_of_substitutions)

Parameters:
  • pattern (bytes or str_scalars) – The regex to substitute

  • repl (bytes or str_scalars) – The substring to replace pattern matches with

  • count (int, default=0) – The max number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl

Returns:

  • Strings – Strings with pattern matches replaced

  • pdarray, int64 – The number of substitutions made for each element of Strings

Raises:
  • TypeError – Raised if pattern or repl are not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

Strings.sub

Examples

>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.subn(pattern='_+', repl='-', count=2)
(array(['1-2-', '-', '3', '-4-5____6___7', '']), array([2 1 0 2 0]))
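
The per-element substitution counts correspond to what Python's re.subn reports for each string. The sketch below reproduces the example on a plain list, assuming Python's re and the server's re2 behave the same for this simple pattern.

```python
import re

strings = ['1_2___', '____', '3', '__4___5____6___7', '']

# re.subn returns (new_string, number_of_substitutions) for each input.
results = [re.subn('_+', '-', s, count=2) for s in strings]
replaced = [new for new, _ in results]
counts = [n for _, n in results]

print(replaced)
print(counts)
```

As with the Arkouda method, count=2 caps the replacements per element, so the last two runs of underscores in the fourth string are left untouched.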
title() Strings[source]

Returns a new Strings in which each string from the original is replaced with its titlecase equivalent.

Returns:

Strings in which each string from the original is replaced with its titlecase equivalent.

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.title()
array(['Strings 0', 'Strings 1', 'Strings 2', 'Strings 3', 'Strings 4'])
to_csv(prefix_path: str, dataset: str = 'strings_array', col_delim: str = ',', overwrite: bool = False) str[source]

Write Strings to CSV file(s). File will contain a single column with the Strings data. All CSV Files written by Arkouda include a header denoting data types of the columns. Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing bytes as uint(8).

Parameters:
  • prefix_path (str) – The filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.

  • dataset (str, default="strings_array") – Column name to save the Strings under. Defaults to “strings_array”.

  • col_delim (str, default=",") – Defaults to “,”. Value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.

  • overwrite (bool, default=False) – Defaults to False. If True, any existing files matching your provided prefix_path will be overwritten. If False, an error will be returned if existing files are found.

Returns:

response message

Return type:

str

Raises:
  • ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist

  • RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is true this may be raised if no values are returned from the server.

  • TypeError – Raised if we receive an unknown arkouda_type returned from the server

Notes

  • CSV format is not currently supported by load/load_all operations

  • The column delimiter is expected to be the same for column names and data

  • Be sure that column delimiters are not found within your data.

  • All CSV files must delimit rows using newline (\n) at this time.

to_hdf(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, file_type: Literal['single', 'distribute'] = 'distribute') str[source]

Save the Strings object to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.

  • save_offsets (bool, default=True) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.

  • file_type ({"single", "distribute"}, default = "distribute") – Defaults to ‘distribute’, which distributes the dataset over one file per locale. ‘single’ saves the dataset to one file.

Return type:

String message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • Parquet files do not store the segments, only the values.

  • Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string

  • the hdf5 group is named via the dataset parameter.

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

See also

to_hdf
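The file-naming rule in the Notes for to_hdf can be sketched in plain Python. This is an illustration only: the helper hdf_output_names is hypothetical, and two details are assumptions rather than statements from this page's to_hdf entry, namely that locale indices run from 0 to numLocales - 1 and that the index is zero-padded to four digits (following the _LOCALE#### pattern mentioned for to_csv and update_hdf).

```python
def hdf_output_names(prefix_path, num_locales, file_type="distribute"):
    """Hypothetical helper: list the output file names described in the Notes."""
    if file_type == "single":
        # A single file is named exactly prefix_path.
        return [prefix_path]
    if file_type != "distribute":
        raise ValueError(f"unknown file_type: {file_type!r}")
    # Assumption: one file per locale, indices 0 .. num_locales - 1,
    # zero-padded to four digits (the _LOCALE#### pattern).
    return [f"{prefix_path}_LOCALE{i:04d}" for i in range(num_locales)]

print(hdf_output_names("/tmp/strs", 2))
print(hdf_output_names("/tmp/strs", 2, file_type="single"))
```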

to_list() list[source]

Convert the SegString to a list, transferring data from the arkouda server to Python. If the SegString exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A list with the same strings as this SegString

Return type:

list

Notes

The number of bytes in the array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but proceed with caution.
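The transfer limit described above amounts to a simple size check before any data leaves the server. The following is a minimal client-side sketch, not Arkouda's implementation: check_transfer_size is a hypothetical helper, and the way the payload size is estimated (UTF-8 bytes plus one terminator byte per string) is an assumption made for illustration.

```python
def check_transfer_size(num_bytes, max_transfer_bytes):
    """Hypothetical guard: refuse transfers larger than the configured limit."""
    if num_bytes > max_transfer_bytes:
        raise RuntimeError(
            f"Array exceeds allowed transfer size: "
            f"{num_bytes} > {max_transfer_bytes} bytes"
        )
    return num_bytes

strings = ["hello", "my", "world"]
# Assumption for illustration: size is measured as UTF-8 bytes plus one
# terminator byte per string.
payload = sum(len(s.encode("utf-8")) + 1 for s in strings)
check_transfer_size(payload, max_transfer_bytes=1024)  # passes silently
```

Raising the limit trades safety for convenience, which is why the note above advises caution.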

See also

to_ndarray

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_list()
['hello', 'my', 'world']
>>> type(a.to_list())
<class 'list'>
to_ndarray() numpy.ndarray[source]

Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A numpy ndarray with the same strings as this array

Return type:

np.ndarray

Notes

The number of bytes in the array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but proceed with caution.

See also

array, to_list

Examples

>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
to_parquet(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None) str[source]

Save the Strings object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – Name of the dataset to create in files (must not already exist)

  • mode ({"truncate", "append"}, default="truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’.

  • ‘append’ write mode is supported, but is not efficient.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars) str | memoryview[source]

Sends a Strings object to a different Arkouda server

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the Strings object is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, using ports in the range {port..(port+numLocales)} (e.g., when running an Arkouda server of 4 nodes with port 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Return type:

A message indicating a complete transfer

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
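The port usage described for the port parameter of transfer can be worked through with a small sketch. It follows the 4-node example given there (one consecutive port per locale, starting at port); transfer_ports is a hypothetical helper for planning firewall rules and is not an Arkouda function.

```python
def transfer_ports(port, num_locales):
    """Hypothetical helper: one consecutive port per locale, starting at port."""
    if num_locales < 1:
        raise ValueError("num_locales must be at least 1")
    return list(range(port, port + num_locales))

print(transfer_ports(1234, 4))  # [1234, 1235, 1236, 1237]
```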

unregister() None[source]

Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach()

Return type:

None

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

See also

register, attach

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

static unregister_strings_by_name(user_defined_name: str) None[source]

Unregister a Strings object in the arkouda server previously registered via register()

Parameters:

user_defined_name (str) – The registered name of the Strings object

update_hdf(prefix_path: str, dataset: str = 'strings_array', save_offsets: bool = True, repack: bool = True) str[source]

Overwrite the dataset with the name provided with this Strings object. If the dataset does not exist it is added

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – Name of the dataset to create in files

  • save_offsets (bool, default=True) – Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read.

  • repack (bool, default=True) – Defaults to True. HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to False will yield better performance, but will cause file sizes to expand.

Return type:

str - success message if successful

Raises:

RuntimeError – Raised if a server-side error is thrown saving the Strings object

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added

upper() Strings[source]

Returns a new Strings with all lowercase characters from the original replaced with their uppercase equivalent

Returns:

Strings with all lowercase characters from the original replaced with their uppercase equivalent

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.lower

Examples

>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.upper()
array(['STRINGS 0', 'STRINGS 1', 'STRINGS 2', 'STRINGS 3', 'STRINGS 4'])
class arkouda.numpy.TimeDelta64DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Timedelta(pda, unit: str = _BASE_UNIT)[source]

Bases: _AbstractBaseTime

Represents a duration, the difference between two dates or times.

Timedelta is the Arkouda equivalent of pandas.TimedeltaIndex.

Parameters:
  • pda (int64 pdarray, pd.TimedeltaIndex, pd.Series, or np.timedelta64 array)

  • unit (str, default 'ns') –

    For int64 pdarray, denotes the unit of the input. Ignored for pandas and numpy arrays, which carry their own unit. Not case-sensitive; prefixes of full names (like ‘sec’) are accepted.

    Possible values:

    • ’weeks’ or ‘w’

    • ’days’ or ‘d’

    • ’hours’ or ‘h’

    • ’minutes’, ‘m’, or ‘t’

    • ’seconds’ or ‘s’

    • ’milliseconds’, ‘ms’, or ‘l’

    • ’microseconds’, ‘us’, or ‘u’

    • ’nanoseconds’, ‘ns’, or ‘n’

    Unlike in pandas, units cannot be combined or mixed with integers

Notes

The .values attribute is always in nanoseconds with int64 dtype.
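Because values are always held in nanoseconds, the unit argument of Timedelta effectively selects a multiplier. The sketch below shows one way such a lookup could work in plain Python. It is not Arkouda's code: the names NS_PER_UNIT, ALIASES, resolve_unit, and to_nanoseconds are hypothetical, and the rule that a prefix must match exactly one full unit name is an assumption.

```python
NS_PER_UNIT = {
    "weeks": 7 * 24 * 3600 * 10**9,
    "days": 24 * 3600 * 10**9,
    "hours": 3600 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}

# Short aliases taken from the list of possible values above.
ALIASES = {
    "w": "weeks", "d": "days", "h": "hours",
    "m": "minutes", "t": "minutes", "s": "seconds",
    "ms": "milliseconds", "l": "milliseconds",
    "us": "microseconds", "u": "microseconds",
    "ns": "nanoseconds", "n": "nanoseconds",
}

def resolve_unit(unit):
    """Map a case-insensitive alias or name prefix to a full unit name."""
    key = unit.lower()
    if key in ALIASES:
        return ALIASES[key]
    # Assumption: a prefix is accepted only if it identifies one unit.
    matches = [name for name in NS_PER_UNIT if name.startswith(key)]
    if len(matches) != 1:
        raise ValueError(f"unrecognized or ambiguous unit: {unit!r}")
    return matches[0]

def to_nanoseconds(value, unit="ns"):
    """Convert an integer count of `unit` into nanoseconds."""
    return value * NS_PER_UNIT[resolve_unit(unit)]

print(to_nanoseconds(2, "sec"))  # 2000000000
print(to_nanoseconds(3, "ms"))   # 3000000
```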

abs()[source]

Absolute value of time interval.

property components
property days
is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry or is a component of a registered object.

Returns:

Indicates if the object is contained in the registry

Return type:

numpy.bool

Raises:

RegistrationError – Raised if there’s a server-side error or a mis-match of registered components

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property microseconds
property nanoseconds
register(user_defined_name)[source]

Register this Timedelta object and underlying components with the Arkouda server

Parameters:

user_defined_name (str) – user defined name the timedelta is to be registered under, this will be the root name for underlying components

Returns:

The same Timedelta which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different Timedeltas with the same name.

Return type:

Timedelta

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – If the server was unable to register the timedelta with the user_defined_name

Notes

Objects registered with the server are immune to deletion until they are unregistered.

property seconds
special_objType = 'Timedelta'
std(ddof: arkouda.numpy.dtypes.int_scalars = 0)[source]

Returns the standard deviation as a pd.Timedelta object

sum()[source]

Return sum of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])

Notes

Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))

supported_opeq
supported_with_datetime
supported_with_pdarray
supported_with_r_datetime
supported_with_r_pdarray
supported_with_r_timedelta
supported_with_timedelta
to_pandas()[source]

Convert array to a pandas TimedeltaIndex. Note: if the array size exceeds client.maxTransferBytes, a RuntimeError is raised.

See also

to_ndarray

total_seconds()[source]
unregister()[source]

Unregister this timedelta object in the arkouda server which was previously registered using register() and/or attached to using attach()

Raises:

RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.

class arkouda.numpy.UByteDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.UInt16DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.UInt32DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.UInt64DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.UInt8DType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.UIntDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.ULongDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.ULongLongDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.UShortDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

class arkouda.numpy.Union

Bases: _Final

Union type; Union[X, Y] means either X or Y.

To define a union, use e.g. Union[int, str]. Details:

  • The arguments must be types and there must be at least one.

  • None as an argument is a special case and is replaced by type(None).

  • Unions of unions are flattened, e.g.:

    Union[Union[int, str], float] == Union[int, str, float]
    
  • Unions of a single argument vanish, e.g.:

    Union[int] == int  # The constructor actually returns int
    
  • Redundant arguments are skipped, e.g.:

    Union[int, str, int] == Union[int, str]
    
  • When comparing unions, the argument order is ignored, e.g.:

    Union[int, str] == Union[str, int]
    
  • You cannot subclass or instantiate a union.

  • You can use Optional[X] as a shorthand for Union[X, None].

arkouda.numpy.VAL_SUFFIX = '_values'
class arkouda.numpy.VoidDType(obj, align=False, copy=False)

Bases: numpy.dtype

DType class corresponding to the scalar type and dtype of the same name.

Please see numpy.dtype for the typical way to create dtype instances and arrays.dtypes for additional information.

arkouda.numpy.abs(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise absolute value of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing absolute values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.abs(ak.arange(-5,-1))
array([5 4 3 2])
>>> ak.abs(ak.linspace(-5,-1,5))
array([5.00000000000000000 4.00000000000000000 3.00000000000000000
2.00000000000000000 1.00000000000000000])
class arkouda.numpy.akbool(value)

Bases: numpy.generic

Boolean type (True or False), stored as a byte.

Warning

The bool_ type is not a subclass of the int_ type (the bool_ is not even a number type). This is different than Python’s default implementation of bool as a sub-class of int.

Character code:

'?'

class arkouda.numpy.akint64(value)

Bases: numpy.signedinteger

Signed integer type, compatible with Python int and C long.

Character code:

'l'

Canonical name:

numpy.int_

Alias on this platform (Linux x86_64):

numpy.int64: 64-bit signed integer (-9_223_372_036_854_775_808 to 9_223_372_036_854_775_807).

Alias on this platform (Linux x86_64):

numpy.intp: Signed integer large enough to fit pointer, compatible with C intptr_t.

bit_count(*args, **kwargs)

int64.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.int64(127).bit_count()
7
>>> np.int64(-127).bit_count()
7
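The rule that bit_count works on the absolute value can be checked with plain Python integers. This is a small equivalent sketch using only built-ins; the function name bit_count here is a local stand-in, not the NumPy method itself.

```python
def bit_count(n):
    """Count the 1-bits in the absolute value of an integer."""
    return bin(abs(n)).count("1")

print(bit_count(127))   # 7
print(bit_count(-127))  # 7
```

Taking abs() first is what makes the negative input give the same answer as the positive one, rather than counting the bits of a two's-complement representation.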
class arkouda.numpy.akuint64(value)

Bases: numpy.unsignedinteger

Unsigned integer type, compatible with C unsigned long.

Character code:

'L'

Canonical name:

numpy.uint

Alias on this platform (Linux x86_64):

numpy.uint64: 64-bit unsigned integer (0 to 18_446_744_073_709_551_615).

Alias on this platform (Linux x86_64):

numpy.uintp: Unsigned integer large enough to fit pointer, compatible with C uintptr_t.

bit_count(*args, **kwargs)

uint64.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.uint64(127).bit_count()
7
class arkouda.numpy.all_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kind of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated, this is used by e.g. typing.List and typing.Dict.

class arkouda.numpy.annotations
compiler_flag(*args, **kwargs)

int([x]) -> integer
int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating point numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal.

>>> int('0b100', base=0)
4

getMandatoryRelease()

Return release in which this feature will become mandatory.

This is a 5-tuple, of the same form as sys.version_info, or, if the feature was dropped, is None.

getOptionalRelease()

Return first release in which this feature was recognized.

This is a 5-tuple, of the same form as sys.version_info.

mandatory(*args, **kwargs)

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

optional(*args, **kwargs)

Built-in immutable sequence.

If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.

If the argument is a tuple, the return value is the same object.

arkouda.numpy.arange(*args, **kwargs) arkouda.numpy.pdarrayclass.pdarray[source]

arange([start,] stop[, stride,] dtype=int64)

Create a pdarray of consecutive integers within the interval [start, stop). If only one arg is given then arg is the stop parameter. If two args are given, then the first arg is start and second is stop. If three args are given, then the first arg is start, second is stop, third is stride.

The return value is cast to type dtype

Parameters:
  • start (int_scalars, optional) – Starting value (inclusive)

  • stop (int_scalars) – Stopping value (exclusive)

  • stride (int_scalars, optional) – The difference between consecutive elements. The default stride is 1. If stride is specified, then start must also be specified.

  • dtype (np.dtype, type, or str) – The target dtype to cast values to

  • max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays

Returns:

Integers from start (inclusive) to stop (exclusive) by stride

Return type:

pdarray, dtype

Raises:
  • TypeError – Raised if start, stop, or stride is not an int object

  • ZeroDivisionError – Raised if stride == 0

See also

linspace, zeros, ones, randint

Notes

Negative strides result in decreasing values. Currently, only int64 pdarrays can be created with this method. For float64 arrays, use the linspace method.

Examples

>>> ak.arange(0, 5, 1)
array([0 1 2 3 4])
>>> ak.arange(5, 0, -1)
array([5 4 3 2 1])
>>> ak.arange(0, 10, 2)
array([0 2 4 6 8])
>>> ak.arange(-5, -10, -1)
array([-5 -6 -7 -8 -9])
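For integer arguments, the values arange describes are the same ones Python's built-in range yields, which makes a handy client-side reference when checking expected results. The sketch below is a stand-in under that assumption; arange_like is a hypothetical name, and the error types simply mirror the Raises section above rather than reproduce Arkouda's checks.

```python
def arange_like(*args):
    """Pure-Python stand-in for the integer sequence arange describes."""
    if not all(isinstance(a, int) for a in args):
        raise TypeError("start, stop, and stride must be ints")
    if len(args) == 3 and args[2] == 0:
        # Mirrors the documented ZeroDivisionError; range() itself would
        # raise ValueError for a zero step.
        raise ZeroDivisionError("stride cannot be zero")
    return list(range(*args))

print(arange_like(5))            # [0, 1, 2, 3, 4]
print(arange_like(5, 0, -1))     # [5, 4, 3, 2, 1]
print(arange_like(0, 10, 2))     # [0, 2, 4, 6, 8]
print(arange_like(-5, -10, -1))  # [-5, -6, -7, -8, -9]
```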
arkouda.numpy.arccos(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse cosine of the array. The result is between 0 and pi.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse cosine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse cosine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray
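The where parameter of arccos, which the other inverse trigonometric and hyperbolic functions on this page describe in the same way, can be illustrated on plain Python lists. This is a sketch of the documented masking behaviour only: apply_where is a hypothetical helper, and it uses math.acos in place of the server-side computation.

```python
import math

def apply_where(func, values, where=True):
    """Apply func where the condition holds; keep the original value elsewhere."""
    if isinstance(where, bool):
        where = [where] * len(values)  # broadcast a scalar condition
    if len(where) != len(values):
        raise ValueError("where must match the length of values")
    return [func(v) if flag else v for v, flag in zip(values, where)]

# The middle element is masked out, so it keeps its original value 0.5.
result = apply_where(math.acos, [1.0, 0.5, -1.0], where=[True, False, True])
print(result)
```

With where=True (the default) every element is transformed; with where=False the input comes back unchanged.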

arkouda.numpy.arccosh(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse hyperbolic cosine of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse hyperbolic cosine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse hyperbolic cosine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.arcsin(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse sine of the array. The result is between -pi/2 and pi/2.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse sine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse sine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.arcsinh(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse hyperbolic sine of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse hyperbolic sine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse hyperbolic sine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.arctan(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse tangent of the array. The result is between -pi/2 and pi/2.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse tangent will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse tangent for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.arctan2(num: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.dtypes.numeric_scalars, denom: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.dtypes.numeric_scalars, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse tangent of the array pair. The result chosen is the signed angle in radians between the ray ending at the origin and passing through the point (1,0), and the ray ending at the origin and passing through the point (denom, num). The result is between -pi and pi.

Parameters:
  • num (pdarray or numeric_scalars) – Numerator of the arctan2 argument.

  • denom (pdarray or numeric_scalars) – Denominator of the arctan2 argument.

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse tangent will be applied to the corresponding values. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse tangent for each corresponding element pair of the original pdarray, using the signed values of the numerator and denominator to get proper placement on the unit circle.

Return type:

pdarray

Raises:

TypeError

Raised if any parameter fails the typechecking
Raised if any element of pdarrays num and denom is not a supported type
Raised if both num and denom are scalars
Raised if where is neither boolean nor a pdarray of boolean
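The quadrant behaviour described for arctan2 is the same as that of Python's math.atan2, which also takes the numerator first and the denominator second. The following sketch evaluates one point per quadrant; arctan2_each is a hypothetical list-based helper, not the Arkouda function.

```python
import math

def arctan2_each(num, denom):
    """Element-wise signed angle of the point (denom, num), in radians."""
    return [math.atan2(n, d) for n, d in zip(num, denom)]

# One point in each quadrant: (1, 1), (-1, 1), (-1, -1), (1, -1) as (denom, num).
angles = arctan2_each([1.0, 1.0, -1.0, -1.0], [1.0, -1.0, -1.0, 1.0])
for angle in angles:
    print(round(angle / math.pi, 2), "* pi")
```

A plain arctan(num / denom) would return the same value for the first and third points, which is why the signs of both arguments are kept separate.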

arkouda.numpy.arctanh(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise inverse hyperbolic tangent of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the inverse hyperbolic tangent will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing inverse hyperbolic tangent for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameters are not a pdarray or numeric scalar.

arkouda.numpy.argmaxk(pda: pdarray, k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Find the indices corresponding to the k maximum values of an array.

Returns the indices of the largest k values of an array, sorted

Parameters:
  • pda (pdarray) – Input array.

  • k (int_scalars) – The desired count of indices corresponding to maximum array values

Returns:

The indices of the maximum k values from the pda, sorted

Return type:

pdarray, int

Raises:
  • TypeError – Raised if pda is not a pdarray or k is not an integer

  • ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1

Notes

This call is equivalent in value to ak.argsort(a)[-k:] and generally outperforms this operation.

This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.argmaxk(A, 3)
array([4, 6, 0])
>>> ak.argmaxk(A, 4)
array([1, 4, 6, 0])
arkouda.numpy.argmink(pda: pdarray, k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k minimum values of an array.

Parameters:
  • pda (pdarray) – Input array.

  • k (int_scalars) – The desired count of indices corresponding to minimum array values

Returns:

The indices of the minimum k values from the pda, sorted

Return type:

pdarray, int

Raises:
  • TypeError – Raised if pda is not a pdarray or k is not an integer

  • ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1

Notes

This call is equivalent in value to ak.argsort(a)[:k] and generally outperforms this operation.

This reduction will see a significant drop in performance as k grows beyond a certain value. This value is system dependent, but generally about a k of 5 million is where performance degradation has been observed.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.argmink(A, 3)
array([7, 2, 5])
>>> ak.argmink(A, 4)
array([7, 2, 5, 3])
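
The relationship between argmink, argmaxk, and a full argsort can be sketched in plain Python. The helpers below are hypothetical stand-ins (lists instead of pdarrays, no server involved); they take the first k and last k entries of a stable argsort, which reproduces the outputs shown in the examples for both functions.

```python
def argsort_sketch(values):
    """Indices that sort values in ascending order (stable)."""
    return sorted(range(len(values)), key=values.__getitem__)

def argmink_sketch(values, k):
    """Indices of the k smallest values, ordered from smallest to largest value."""
    if not values or k < 1:
        raise ValueError("values must be non-empty and k must be >= 1")
    return argsort_sketch(values)[:k]

def argmaxk_sketch(values, k):
    """Indices of the k largest values, ordered from smallest to largest value."""
    if not values or k < 1:
        raise ValueError("values must be non-empty and k must be >= 1")
    return argsort_sketch(values)[-k:]

A = [10, 5, 1, 3, 7, 2, 9, 0]
smallest = argmink_sketch(A, 3)   # [7, 2, 5]
largest = argmaxk_sketch(A, 3)    # [4, 6, 0]
```

A full sort does more work than is needed when k is small, which is presumably why the dedicated reductions are described as generally faster.
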
arkouda.numpy.argsort(pda: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical, algorithm: SortingAlgorithm = SortingAlgorithm.RadixSortLSD, axis: arkouda.numpy.dtypes.int_scalars = 0) arkouda.numpy.pdarrayclass.pdarray[source]

Return the permutation that sorts the array.

Parameters:
  • pda (pdarray, Strings, or Categorical) – The array to sort (int64, uint64, or float64)

  • algorithm (SortingAlgorithm, default=SortingAlgorithm.RadixSortLSD) – The algorithm to be used for sorting the array.

  • axis (int_scalars, default=0) – The axis to sort over.

Returns:

The indices such that pda[indices] is sorted

Return type:

pdarray of int64

Raises:

TypeError – Raised if the parameter is other than a pdarray, Strings or Categorical

See also

coargsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.

Examples

>>> a = ak.randint(0, 10, 10)
>>> perm = ak.argsort(a)
>>> a[perm]
array([0 1 3 3 5 5 5 6 6 6])
>>> ak.argsort(a, ak.sorting.SortingAlgorithm["RadixSortLSD"])
array([0 2 9 6 8 1 3 5 7 4])
>>> ak.argsort(a, ak.sorting.SortingAlgorithm["TwoArrayRadixSort"])
array([0 2 9 6 8 1 3 5 7 4])
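
The Notes above mention a least-significant-digit radix sort. As a rough single-process illustration of that technique (not the distributed server implementation), the sketch below computes an argsort permutation for non-negative integers by bucketing on one digit at a time, starting from the lowest digit. The function name and its parameters are assumptions made for this example.

```python
def lsd_radix_argsort(values, bits_per_digit=8, total_bits=64):
    """Return a stable sorting permutation for non-negative ints below 2**total_bits.

    Illustrative sketch only: one pass per digit, least significant digit first.
    """
    if any(v < 0 or v >= 1 << total_bits for v in values):
        raise ValueError("values must be in [0, 2**total_bits)")
    perm = list(range(len(values)))
    mask = (1 << bits_per_digit) - 1
    for shift in range(0, total_bits, bits_per_digit):
        buckets = [[] for _ in range(mask + 1)]
        for i in perm:
            # Appending in the current order keeps each pass stable.
            buckets[(values[i] >> shift) & mask].append(i)
        perm = [i for bucket in buckets for i in bucket]
    return perm

a = [5, 3, 6, 3, 0, 300, 256]
perm = lsd_radix_argsort(a)
sorted_a = [a[i] for i in perm]   # [0, 3, 3, 5, 6, 256, 300]
```

Because every pass is stable, equal keys keep their original relative order (index 1 before index 3 for the two 3s above).
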
arkouda.numpy.array(a: arkouda.numpy.pdarrayclass.pdarray | numpy.ndarray | Iterable, dtype: numpy.dtype | type | str | None = None, max_bits: int = -1) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]

Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.

Parameters:
  • a (Union[pdarray, np.ndarray]) – array of a supported dtype

  • dtype (np.dtype, type, or str) – The target dtype to cast values to

  • max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays

Returns:

A pdarray instance stored on arkouda server or Strings instance, which is composed of two pdarrays stored on arkouda server

Return type:

pdarray or Strings

Raises:
  • TypeError – Raised if a is not a pdarray, np.ndarray, or Python Iterable such as a list, array, tuple, or deque

  • RuntimeError – Raised if nbytes > maxTransferBytes, a.dtype is not supported (not in DTypes), or if the product of a.size and a.itemsize > maxTransferBytes

  • ValueError – Raised if the rank of a is not in get_array_ranks(), or if the returned message is malformed or does not contain the fields required to generate the array.

Notes

The number of bytes in the input array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overwhelming the connection between the Python client and the arkouda server, under the assumption that it is a low-bandwidth connection. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but should proceed with caution.

If the pdarray or ndarray is of type U, this method is called twice recursively to create the Strings object and the two corresponding pdarrays for string bytes and offsets, respectively.

Examples

>>> ak.array(np.arange(1,10))
array([1 2 3 4 5 6 7 8 9])
>>> ak.array(range(1,10))
array([1 2 3 4 5 6 7 8 9])
>>> strings = ak.array([f'string {i}' for i in range(0,5)])
>>> type(strings)
<class 'arkouda.numpy.strings.Strings'>
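
The transfer-size limit described in the Notes boils down to comparing size times itemsize against a byte budget. The sketch below shows that arithmetic in isolation; check_transfer_size and its max_transfer_bytes argument are hypothetical names for this illustration and do not reproduce the client's actual check or its default limit.

```python
def check_transfer_size(size, itemsize, max_transfer_bytes):
    """Return the payload size in bytes, or raise if it exceeds the budget.

    Hypothetical helper: mirrors the documented rule that
    size * itemsize must not exceed the configured transfer limit.
    """
    nbytes = size * itemsize
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"array of {nbytes} bytes exceeds the transfer limit of "
            f"{max_transfer_bytes} bytes"
        )
    return nbytes

# Nine int64 values occupy 9 * 8 = 72 bytes.
payload = check_transfer_size(size=9, itemsize=8, max_transfer_bytes=1024)
```
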
arkouda.numpy.array_equal(pda_a: arkouda.numpy.pdarrayclass.pdarray, pda_b: arkouda.numpy.pdarrayclass.pdarray, equal_nan: bool = False) bool[source]

Compare two pdarrays for equality. If neither array has any nan elements, return True when all elements are pairwise equal. If equal_nan is False, any nan element in either array gives a False return. If equal_nan is True, pairwise-corresponding nans are considered equal.

Parameters:
  • pda_a (pdarray)

  • pda_b (pdarray)

  • equal_nan (bool, default=False) – Determines how to handle nans

Returns:

With string data:

False if one array is type ak.str_ & the other isn’t, True if both are ak.str_ & they match.

With numeric data:

True if neither array has any nan elements, and all elements pairwise equal.

True if equal_nan is True, all non-nans are pairwise equal, and nans in pda_a correspond to nans in pda_b.

False if equal_nan is False and either array has any nan element.

Return type:

boolean

Examples

>>> a = ak.randint(0,10,10,dtype=ak.float64)
>>> b = a
>>> ak.array_equal(a,b)
True
>>> b[9] = np.nan
>>> ak.array_equal(a,b)
False
>>> a[9] = np.nan
>>> ak.array_equal(a,b)
False
>>> ak.array_equal(a,b,True)
True
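
The nan handling rules above can be restated as a short pure-Python sketch for numeric data. array_equal_sketch is a hypothetical stand-in operating on lists of floats; returning False for inputs of different lengths is an assumption of this sketch rather than something stated in this entry.

```python
import math

def array_equal_sketch(a, b, equal_nan=False):
    """Pairwise equality for lists of floats with optional nan matching."""
    if len(a) != len(b):
        return False  # assumption: differently sized inputs are unequal
    for x, y in zip(a, b):
        x_nan, y_nan = math.isnan(x), math.isnan(y)
        if x_nan or y_nan:
            # A nan only counts as equal when both sides are nan
            # and the caller opted in with equal_nan=True.
            if not (equal_nan and x_nan and y_nan):
                return False
        elif x != y:
            return False
    return True

nan = float("nan")
plain = array_equal_sketch([1.0, 2.0], [1.0, 2.0])
strict_nan = array_equal_sketch([1.0, nan], [1.0, nan])
relaxed_nan = array_equal_sketch([1.0, nan], [1.0, nan], equal_nan=True)
```
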
arkouda.numpy.attach(name: str)[source]
arkouda.numpy.attach_all(names: list)[source]

Attach to all objects registered with the names provided

Parameters:

names (list) – List of names to attach to

Return type:

dict

arkouda.numpy.attach_pdarray(user_defined_name: str) pdarray[source]

Return a pdarray attached to the given name in the arkouda server, where the name was previously registered using register()

Parameters:

user_defined_name (str) – user defined name which array was registered under

Returns:

pdarray which is bound to the corresponding server side component which was registered with user_defined_name

Return type:

pdarray

Raises:

TypeError – Raised if user_defined_name is not a str

See also

attach, register, unregister, is_registered, unregister_pdarray_by_name, list_registry

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.attach_pdarray("my_zeros")
>>> # ...other work...
>>> b.unregister()
class arkouda.numpy.bigint[source]

Datatype for representing integers of variable size.

May be used for integers that exceed 64 bits.

itemsize(*args, **kwargs)

name(*args, **kwargs)

ndim(*args, **kwargs)

shape(*args, **kwargs)

type(x)[source]
arkouda.numpy.bigint_from_uint_arrays(arrays, max_bits=-1)[source]

Create a bigint pdarray from an iterable of uint pdarrays. The first item in arrays will be the highest 64 bits and the last item will be the lowest 64 bits.

Parameters:
  • arrays (Sequence[pdarray]) – An iterable of uint pdarrays used to construct the bigint pdarray. The first item in arrays will be the highest 64 bits and the last item will be the lowest 64 bits.

  • max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays

Returns:

bigint pdarray constructed from uint arrays

Return type:

pdarray

Raises:
  • TypeError – Raised if any pdarray in arrays has a dtype other than uint or if the pdarrays are not the same size.

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> a = ak.bigint_from_uint_arrays([ak.ones(5, dtype=ak.uint64), ak.arange(5, dtype=ak.uint64)])
>>> a
array([18446744073709551616 18446744073709551617 18446744073709551618
18446744073709551619 18446744073709551620])
>>> a.dtype
dtype(bigint)
>>> all(a[i] == 2**64 + i for i in range(5))
True
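
The limb ordering (first array holds the highest 64 bits, last array the lowest) can be checked with ordinary Python integers. The helpers below are hypothetical and work on lists rather than pdarrays; max_bits is not modelled.

```python
def combine_limbs(limbs):
    """Combine 64-bit unsigned limbs, highest first, into one Python int."""
    value = 0
    for limb in limbs:
        if not 0 <= limb < 2**64:
            raise ValueError("each limb must fit in an unsigned 64-bit integer")
        value = (value << 64) | limb
    return value

def bigint_from_uint_lists(arrays):
    """Elementwise combination of equally sized lists of uint64 values."""
    if len({len(arr) for arr in arrays}) != 1:
        raise ValueError("all input arrays must have the same size")
    return [combine_limbs(limbs) for limbs in zip(*arrays)]

high = [1] * 5          # plays the role of ak.ones(5, dtype=ak.uint64)
low = list(range(5))    # plays the role of ak.arange(5, dtype=ak.uint64)
result = bigint_from_uint_lists([high, low])
```
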
class arkouda.numpy.bitType(value)

Bases: numpy.unsignedinteger

Unsigned integer type, compatible with C unsigned long.

Character code:

'L'

Canonical name:

numpy.uint

Alias on this platform (Linux x86_64):

numpy.uint64: 64-bit unsigned integer (0 to 18_446_744_073_709_551_615).

Alias on this platform (Linux x86_64):

numpy.uintp: Unsigned integer large enough to fit pointer, compatible with C uintptr_t.

bit_count(*args, **kwargs)

uint64.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.uint64(127).bit_count()
7
class arkouda.numpy.bool_(value)

Bases: numpy.generic

Boolean type (True or False), stored as a byte.

Warning

The bool_ type is not a subclass of the int_ type (the bool_ is not even a number type). This is different than Python’s default implementation of bool as a sub-class of int.

Character code:

'?'

class arkouda.numpy.bool_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kind of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated, this is used by e.g. typing.List and typing.Dict.

arkouda.numpy.broadcast(segments: pdarray, values: pdarray | Strings, size: int | np.int64 | np.uint64 = -1, permutation: pdarray | None = None)[source]

Broadcast a dense column vector to the rows of a sparse matrix or grouped array.

Parameters:
  • segments (pdarray, int64) – Offsets of the start of each row in the sparse matrix or grouped array. Must be sorted in ascending order.

  • values (pdarray, Strings) – The values to broadcast, one per row (or group)

  • size (int) – The total number of nonzeros in the matrix. If permutation is given, this argument is ignored and the size is inferred from the permutation array.

  • permutation (pdarray, int64) – The permutation to go from the original ordering of nonzeros to the ordering grouped by row. To broadcast values back to the original ordering, this permutation will be inverted. If no permutation is supplied, it is assumed that the original nonzeros were already grouped by row. In this case, the size argument must be given.

Returns:

The broadcast values, one per nonzero

Return type:

pdarray, Strings

Raises:

ValueError

  • If segments and values are different sizes

  • If segments are empty

  • If number of nonzeros (either user-specified or inferred from permutation) is less than one

Examples

>>> # Define a sparse matrix with 3 rows and 7 nonzeros
>>> row_starts = ak.array([0, 2, 5])
>>> nnz = 7
>>> # Broadcast the row number to each nonzero element
>>> row_number = ak.arange(3)
>>> ak.broadcast(row_starts, row_number, nnz)
array([0 0 1 1 1 2 2])
>>> # If the original nonzeros were in reverse order...
>>> permutation = ak.arange(6, -1, -1)
>>> ak.broadcast(row_starts, row_number, permutation=permutation)
array([2 2 1 1 1 0 0])
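
A list-based sketch may help make the segment and permutation semantics concrete. broadcast_sketch is a hypothetical helper, not the server implementation. It assumes that "inverting" the permutation means writing the value at grouped position i back to original position permutation[i], which reproduces both outputs in the example above.

```python
def broadcast_sketch(segments, values, size=-1, permutation=None):
    """Repeat one value per segment, optionally restoring the original order."""
    if len(segments) != len(values):
        raise ValueError("segments and values must be the same size")
    if len(segments) == 0:
        raise ValueError("segments must not be empty")
    if permutation is not None:
        size = len(permutation)   # size argument is ignored in this case
    if size < 1:
        raise ValueError("number of nonzeros must be at least one")
    # Segment i runs from segments[i] up to the next start (or size for the last one).
    ends = list(segments[1:]) + [size]
    grouped = []
    for start, end, value in zip(segments, ends, values):
        grouped.extend([value] * (end - start))
    if permutation is None:
        return grouped
    original = [None] * size
    for grouped_pos, original_pos in enumerate(permutation):
        original[original_pos] = grouped[grouped_pos]
    return original

row_starts = [0, 2, 5]
row_number = [0, 1, 2]
by_row = broadcast_sketch(row_starts, row_number, 7)
reversed_order = broadcast_sketch(row_starts, row_number,
                                  permutation=list(range(6, -1, -1)))
```
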
arkouda.numpy.broadcast_dims(sa: Sequence[int], sb: Sequence[int]) Tuple[int, Ellipsis][source]

Algorithm to determine the shape of a broadcast pdarray given two array shapes

see: https://data-apis.org/array-api/latest/API_specification/broadcasting.html#algorithm
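The linked algorithm aligns the two shapes from the right, treats missing leading dimensions as 1, and requires each pair of sizes to be equal or to contain a 1. The following is a minimal sketch of that rule; broadcast_dims_sketch is a hypothetical name, and the choice of ValueError for incompatible shapes is an assumption of this example.

```python
def broadcast_dims_sketch(sa, sb):
    """Return the broadcast shape of two shapes, or raise if incompatible."""
    n = max(len(sa), len(sb))
    result = [0] * n
    for i in range(n - 1, -1, -1):
        ia = len(sa) - n + i   # index into sa, negative if sa has no such axis
        ib = len(sb) - n + i
        da = sa[ia] if ia >= 0 else 1
        db = sb[ib] if ib >= 0 else 1
        if da == 1:
            result[i] = db
        elif db == 1 or da == db:
            result[i] = da
        else:
            raise ValueError(f"shapes {tuple(sa)} and {tuple(sb)} cannot be broadcast")
    return tuple(result)

shape = broadcast_dims_sketch((1, 4, 1), (7, 4, 2))   # (7, 4, 2)
```
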

arkouda.numpy.broadcast_to_shape(pda: pdarray, shape: Tuple[int, Ellipsis]) pdarray[source]

Create a “broadcasted” array (of rank ‘nd’) by copying an array into an array of the given shape.

E.g., given the following broadcast:

pda (3d array): 1 x 4 x 1

shape (shape): 7 x 4 x 2

Result (3d array): 7 x 4 x 2

When copying from a singleton dimension, the value is repeated along that dimension (e.g., pda’s 1st and 3rd above). For non singleton dimensions, the size of the two arrays must match, and the values are copied into the result array.

When prepending a new dimension to increase an array’s rank, the values from the other dimensions are repeated along the new dimension.

Parameters:
  • pda (pdarray) – the input to be broadcast

  • shape (tuple of int) – the shape to which pda is to be broadcast

Returns:

the result of the broadcast operation

Return type:

pdarray

Examples

>>> a = ak.arange(2).reshape(1,2,1)
>>> ak.broadcast_to_shape(a,(2,2,2))
array([array([array([0 0]) array([1 1])]) array([array([0 0]) array([1 1])])])
>>> a = ak.array([5,19]).reshape(1,2)
>>> ak.broadcast_to_shape(a,(2,2,2))
array([array([array([5 19]) array([5 19])]) array([array([5 19]) array([5 19])])])
Raises:

RuntimeError – raised if the pda can’t be broadcast to the given shape

arkouda.numpy.can_cast(from_, to) bool[source]

Returns True if cast between data types can occur according to the casting rule.

Parameters:
  • from_ (dtype, dtype specifier, NumPy scalar, or pdarray) – Data type, NumPy scalar, or array to cast from.

  • to (dtype or dtype specifier) – Data type to cast to.

Returns:

True if cast can occur according to the casting rule.

Return type:

bool

arkouda.numpy.cast(typ, val)[source]

Cast a value to a type.

This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally don’t check anything (we want this to be as fast as possible).

arkouda.numpy.cast(pda: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical, dt: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint, errors: ErrorMode = ErrorMode.strict) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical | Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Cast an array to another dtype.

Parameters:
  • pda (pdarray, Strings, or Categorical) – The array of values to cast

  • dt (np.dtype, type, str, or bigint) – The target dtype to cast values to

  • errors ({strict, ignore, return_validity}, default=ErrorMode.strict) –

    Controls how errors are handled when casting strings to a numeric type (ignored for casts from numeric types).

    • strict: raise RuntimeError if any string cannot be converted

    • ignore: never raise an error. Uninterpretable strings get converted to NaN (float64), -2**63 (int64), zero (uint64 and uint8), or False (bool)

    • return_validity: in addition to returning the same output as “ignore”, also return a bool array indicating where the cast was successful.

    Default set to strict.

Returns:

  • pdarray or Strings – Array of values cast to desired dtype

  • validity (pdarray of bool, optional) – If errors="return_validity" and input is Strings, a second array is returned with True where the cast succeeded and False where it failed.

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.

Examples

>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64)
array([1 2 3 4 5])
>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype
dtype('float64')
>>> ak.cast(ak.arange(0,5), dt=ak.bool_)
array([False True True True True])
>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool_)
array([False True True True True])
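
The three errors modes for string-to-numeric casts can be mimicked in plain Python for the int64 case. cast_strings_to_int64 is a hypothetical helper: Python's int() stands in for the server's parser (so accepted spellings may differ), range checks are omitted, and only the documented -2**63 placeholder for int64 is modelled.

```python
INT64_PLACEHOLDER = -2**63   # documented stand-in for uninterpretable int64 strings

def cast_strings_to_int64(strings, errors="strict"):
    """Sketch of the strict / ignore / return_validity behaviours."""
    if errors not in ("strict", "ignore", "return_validity"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    values, validity = [], []
    for s in strings:
        try:
            values.append(int(s))
            validity.append(True)
        except ValueError:
            if errors == "strict":
                raise RuntimeError(f"cannot convert {s!r} to int64")
            values.append(INT64_PLACEHOLDER)
            validity.append(False)
    if errors == "return_validity":
        return values, validity
    return values

ignored = cast_strings_to_int64(["1", "x", "3"], errors="ignore")
checked, ok = cast_strings_to_int64(["1", "x", "3"], errors="return_validity")
```
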
arkouda.numpy.ceil(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise ceiling of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing ceiling values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.ceil(ak.linspace(1.1,5.5,5))
array([2.00000000000000000 3.00000000000000000 4.00000000000000000
5.00000000000000000 6.00000000000000000])
arkouda.numpy.clear() None[source]

Send a clear message to clear all unregistered data from the server symbol table

Return type:

None

Raises:

RuntimeError – Raised if there is a server-side error in executing clear request

arkouda.numpy.clip(pda: arkouda.numpy.pdarrayclass.pdarray, lo: arkouda.numpy.dtypes.numeric_scalars | arkouda.numpy.pdarrayclass.pdarray, hi: arkouda.numpy.dtypes.numeric_scalars | arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Clip (limit) the values in an array to a given range [lo,hi]

Given an array a, values outside the range are clipped to the range edges, such that all elements lie in the range.

There is no check to enforce that lo < hi. If lo > hi, the corresponding value of the array will be set to hi.

If lo or hi (or both) are pdarrays, the check is by pairwise elements. See examples.

Parameters:
  • pda (pdarray) – the array of values to clip

  • lo (numeric_scalars or pdarray) – the lower value of the clipping range

  • hi (numeric_scalars or pdarray) – the higher value of the clipping range. If lo or hi (or both) are pdarrays, the check is by pairwise elements. See examples.

Returns:

A pdarray matching pda, except that element x remains x if lo <= x <= hi, or becomes lo if x < lo, or becomes hi if x > hi.

Return type:

arkouda.numpy.pdarrayclass.pdarray

Examples

>>> a = ak.array([1,2,3,4,5,6,7,8,9,10])
>>> ak.clip(a,3,8)
array([3 3 3 4 5 6 7 8 8 8])
>>> ak.clip(a,3,8.0)
array([3.00000000000000000 3.00000000000000000 3.00000000000000000 4.00000000000000000
       5.00000000000000000 6.00000000000000000 7.00000000000000000 8.00000000000000000
       8.00000000000000000 8.00000000000000000])
>>> ak.clip(a,None,7)
array([1 2 3 4 5 6 7 7 7 7])
>>> ak.clip(a,5,None)
array([5 5 5 5 5 6 7 8 9 10])
>>> ak.clip(a,None,None)
ValueError: Either min or max must be supplied.
>>> ak.clip(a,ak.array([2,2,3,3,8,8,5,5,6,6]),8)
array([2 2 3 4 8 8 7 8 8 8])
>>> ak.clip(a,4,ak.array([10,9,8,7,6,5,5,5,5,5]))
array([4 4 4 4 5 5 5 5 5 5])

Notes

Either lo or hi may be None, but not both. If lo > hi, all x = hi. If all inputs are int64, output is int64, but if any input is float64, output is float64.

Raises:

ValueError – Raised if both lo and hi are None
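
The clipping rules, including per-element bounds and the lo > hi case, can be sketched with lists. clip_sketch is a hypothetical helper: lists stand in for pdarrays and the int64/float64 output promotion described in the Notes is not modelled.

```python
def clip_sketch(values, lo, hi):
    """Clip each element to [lo, hi]; bounds may be scalars, lists, or None."""
    if lo is None and hi is None:
        raise ValueError("Either min or max must be supplied.")

    def bound_at(bound, i):
        # List bounds are applied pairwise; scalar bounds apply to every element.
        if bound is None:
            return None
        return bound[i] if isinstance(bound, (list, tuple)) else bound

    out = []
    for i, x in enumerate(values):
        lo_i, hi_i = bound_at(lo, i), bound_at(hi, i)
        if lo_i is not None and x < lo_i:
            x = lo_i
        # The upper bound is applied last, so hi wins whenever lo > hi.
        if hi_i is not None and x > hi_i:
            x = hi_i
        out.append(x)
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scalar_bounds = clip_sketch(a, 3, 8)
array_lo = clip_sketch(a, [2, 2, 3, 3, 8, 8, 5, 5, 6, 6], 8)
inverted = clip_sketch([1, 5, 9], 8, 3)   # lo > hi, so every element becomes hi
```
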

arkouda.numpy.clz(pda: pdarray) pdarray[source]

Count leading zeros for each integer in an array.

Parameters:

pda (pdarray, int64, uint64, bigint) – Input array (must be integral).

Returns:

lz – The number of leading zeros of each element.

Return type:

pdarray

Raises:

TypeError – If input array is not int64, uint64, or bigint

Examples

>>> A = ak.arange(10)
>>> ak.clz(A)
array([64, 63, 62, 62, 61, 61, 61, 61, 60, 60])
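
For a 64-bit value, the number of leading zeros is 64 minus the position of the highest set bit. The sketch below reproduces the example output with Python integers; clz64 is a hypothetical helper restricted to non-negative values that fit in 64 bits, and it does not cover bigint inputs.

```python
def clz64(x):
    """Count leading zero bits of x viewed as a 64-bit unsigned integer."""
    if not 0 <= x < 2**64:
        raise ValueError("x must be in [0, 2**64)")
    # int.bit_length() is the index of the highest set bit plus one (0 for x == 0).
    return 64 - x.bit_length()

leading = [clz64(x) for x in range(10)]
```
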
arkouda.numpy.coargsort(arrays: Sequence[arkouda.numpy.strings.Strings | arkouda.numpy.pdarrayclass.pdarray | arkouda.categorical.Categorical], algorithm: SortingAlgorithm = SortingAlgorithm.RadixSortLSD) arkouda.numpy.pdarrayclass.pdarray[source]

Return the permutation that groups the rows (left-to-right), if the input arrays are treated as columns. The permutation sorts numeric columns, but not strings/Categoricals – strings/Categoricals are grouped, but not ordered.

Parameters:
  • arrays (Sequence of Strings, pdarray, or Categorical) – The columns (int64, uint64, float64, Strings, or Categorical) to sort by row

  • algorithm (SortingAlgorithm, default=SortingAlgorithm.RadixSortLSD) – The algorithm to be used for sorting the arrays.

Returns:

The indices that permute the rows to grouped order

Return type:

pdarray of int64

Raises:

ValueError – Raised if the pdarrays are not of the same size or if the parameter is not an Iterable containing pdarrays, Strings, or Categoricals

See also

argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive. Starts with the last array and moves forward. This sort operates directly on numeric types, but for Strings, it operates on a hash. Thus, while grouping of equivalent strings is guaranteed, lexicographic ordering of the groups is not. For Categoricals, coargsort sorts based on Categorical.codes which guarantees grouping of equivalent categories but not lexicographic ordering of those groups.

Examples

>>> a = ak.array([0, 1, 0, 1])
>>> b = ak.array([1, 1, 0, 0])
>>> perm = ak.coargsort([a, b])
>>> perm
array([2 0 3 1])
>>> a[perm]
array([0 0 1 1])
>>> b[perm]
array([0 1 0 1])
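
The strategy in the Notes (stable sort, starting with the last array and moving forward) can be sketched for numeric columns with Python's stable list sort. coargsort_sketch is a hypothetical helper on lists; the hashing used for Strings and the codes-based ordering for Categoricals are not modelled.

```python
def coargsort_sketch(columns):
    """Permutation that sorts rows lexicographically by the given columns."""
    sizes = {len(col) for col in columns}
    if len(sizes) != 1:
        raise ValueError("columns must be non-empty and all the same size")
    perm = list(range(sizes.pop()))
    # Sort by the least significant (last) column first. Because each pass
    # is stable, earlier passes act as tie-breakers for later ones.
    for col in reversed(columns):
        perm.sort(key=col.__getitem__)
    return perm

a = [0, 1, 0, 1]
b = [1, 1, 0, 0]
perm = coargsort_sketch([a, b])
```
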
class arkouda.numpy.complex128(value)

Bases: numpy.complexfloating

Complex number type composed of two double-precision floating-point numbers, compatible with Python complex.

Character code:

'D'

Canonical name:

numpy.cdouble

Alias:

numpy.cfloat

Alias:

numpy.complex_

Alias on this platform (Linux x86_64):

numpy.complex128: Complex number type composed of 2 64-bit-precision floating-point numbers.

class arkouda.numpy.complex64(value)

Bases: numpy.complexfloating

Complex number type composed of two single-precision floating-point numbers.

Character code:

'F'

Canonical name:

numpy.csingle

Alias:

numpy.singlecomplex

Alias on this platform (Linux x86_64):

numpy.complex64: Complex number type composed of 2 32-bit-precision floating-point numbers.

arkouda.numpy.concatenate(arrays: Sequence[arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical], ordered: bool = True) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical | Sequence[arkouda.categorical.Categorical][source]

Concatenate a list or tuple of pdarray or Strings objects into one pdarray or Strings object, respectively.

Parameters:
  • arrays (Sequence[Union[pdarray,Strings,Categorical]]) – The arrays to concatenate. Must all have same dtype.

  • ordered (bool) – If True (default), the arrays will be appended in the order given. If False, array data may be interleaved in blocks, which can greatly improve performance but results in non-deterministic ordering of elements.

Returns:

Single pdarray or Strings object containing all values, returned in the original order

Return type:

Union[pdarray,Strings,Categorical]

Raises:
  • ValueError – Raised if arrays is empty or if pdarrays have differing dtypes

  • TypeError – Raised if arrays is not a Python Sequence (such as a list or tuple) of pdarray or Strings objects

  • RuntimeError – Raised if any array elements are dtypes for which concatenate has not been implemented.

Examples

>>> ak.concatenate([ak.array([1, 2, 3]), ak.array([4, 5, 6])])
array([1 2 3 4 5 6])
>>> ak.concatenate([ak.array([True,False,True]),ak.array([False,True,True])])
array([True False True False True True])
>>> ak.concatenate([ak.array(['one','two']),ak.array(['three','four','five'])])
array(['one', 'two', 'three', 'four', 'five'])
arkouda.numpy.corr(x: pdarray, y: pdarray) numpy.float64[source]

Return the correlation between x and y

Parameters:
  • x (pdarray) – One of the pdarrays used to calculate correlation

  • y (pdarray) – One of the pdarrays used to calculate correlation

Returns:

The scalar correlation of the two pdarrays

Return type:

np.float64

Examples

>>> a = ak.arange(10)
>>> b = a + 1
>>> ak.corr(a,b)
0.9999999999999998
>>> a.corr(b)
0.9999999999999998
Raises:
  • TypeError – Raised if x or y is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

See also

std, cov

Notes

The correlation is calculated by cov(x, y) / (x.std(ddof=1) * y.std(ddof=1))
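The formula above can be checked by hand with plain Python lists. The helpers below are hypothetical; the covariance uses the sample form with an n - 1 denominator, as given in the cov entry of this reference, and the standard deviation uses ddof=1 to match.

```python
import math

def cov_sketch(x, y):
    """Sample covariance: sum of products of deviations over (n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def std_sketch(x, ddof=1):
    """Standard deviation with a configurable delta degrees of freedom."""
    n = len(x)
    mean_x = sum(x) / n
    return math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - ddof))

def corr_sketch(x, y):
    """Correlation as cov(x, y) / (std(x) * std(y)) with ddof=1."""
    return cov_sketch(x, y) / (std_sketch(x) * std_sketch(y))

a = list(range(10))
b = [v + 1 for v in a]
covariance = cov_sketch(a, b)     # 82.5 / 9, about 9.1667
correlation = corr_sketch(a, b)   # approximately 1.0, up to rounding
```
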

arkouda.numpy.cos(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise cosine of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the cosine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing cosine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.cosh(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise hyperbolic cosine of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the hyperbolic cosine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing hyperbolic cosine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.count_nonzero(pda: arkouda.numpy.pdarrayclass.pdarray) numpy.int64[source]

Compute the nonzero count of a given array. 1D case only, for now.

Parameters:

pda (pdarray) – The input data, in pdarray form, numeric, bool, or str

Returns:

The nonzero count of the entire pdarray

Return type:

np.int64

Raises:
  • TypeError – Raised if the parameter is not a pdarray with numeric, bool, or str datatype

  • ValueError – Raised if sum applied to the pdarray doesn’t come back with a scalar

Examples

>>> pda = ak.array([0,4,7,8,1,3,5,2,-1])
>>> ak.count_nonzero(pda)
8
>>> pda = ak.array([False,True,False,True,False])
>>> ak.count_nonzero(pda)
2
>>> pda = ak.array(["hello","","there"])
>>> ak.count_nonzero(pda)
2
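
The three examples above share one rule: an element counts when it is not zero, not False, and not the empty string. In plain Python that coincides with truthiness, as the hypothetical helper below shows for lists standing in for pdarrays.

```python
def count_nonzero_sketch(values):
    """Count elements that are nonzero, True, or non-empty strings."""
    return sum(1 for v in values if v)

numeric_count = count_nonzero_sketch([0, 4, 7, 8, 1, 3, 5, 2, -1])
bool_count = count_nonzero_sketch([False, True, False, True, False])
string_count = count_nonzero_sketch(["hello", "", "there"])
```
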
arkouda.numpy.cov(x: pdarray, y: pdarray) numpy.float64[source]

Return the covariance of x and y

Parameters:
  • x (pdarray) – One of the pdarrays used to calculate covariance

  • y (pdarray) – One of the pdarrays used to calculate covariance

Returns:

The scalar covariance of the two pdarrays

Return type:

np.float64

Examples

>>> a = ak.arange(10)
>>> b = a + 1
>>> ak.cov(a,b)
9.166666666666666
>>> a.cov(b)
9.166666666666666
Raises:
  • TypeError – Raised if x or y is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

See also

mean, var

Notes

The covariance is calculated by cov = ((x - x.mean()) * (y - y.mean())).sum() / (x.size - 1).
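The formula in the note above can be checked locally with plain Python lists. The sketch below assumes nothing about the server-side implementation; cov_local is a hypothetical helper that only restates the formula.

```python
def cov_local(x, y):
    """Sample covariance: sum((x - mean(x)) * (y - mean(y))) / (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


a = list(range(10))        # same values as ak.arange(10)
b = [v + 1 for v in a]     # same values as a + 1
result = cov_local(a, b)
print(result)              # approximately 9.1667, as in the example above
```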

arkouda.numpy.create_pdarray(repMsg: str, max_bits=None) pdarray[source]

Return a pdarray instance pointing to an array created by the arkouda server. The user should not call this function directly.

Parameters:

repMsg (str) – space-delimited string containing the pdarray name, datatype, size, dimension, shape, and itemsize

Returns:

A pdarray with the same attributes and data as the pdarray; on GPU

Return type:

pdarray

Raises:
  • ValueError – If there’s an error in parsing the repMsg parameter into the six values needed to create the pdarray instance

  • RuntimeError – Raised if a server-side error is thrown in the process of creating the pdarray instance

arkouda.numpy.ctz(pda: pdarray) pdarray[source]

Count trailing zeros for each integer in an array.

Parameters:

pda (pdarray, int64, uint64, bigint) – Input array (must be integral).

Returns:

lz – The number of trailing zeros of each element.

Return type:

pdarray

Notes

ctz(0) is defined to be zero.

Raises:

TypeError – If input array is not int64, uint64, or bigint

Examples

>>> A = ak.arange(10)
>>> ak.ctz(A)
array([0, 0, 1, 0, 2, 0, 1, 0, 3, 0])
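For readers who want to check the convention without a server, here is a minimal pure-Python sketch of a trailing-zero count that follows the note above (ctz(0) is zero). The name ctz_local is hypothetical, and the bit trick is one common way to compute it, not necessarily what the server does.

```python
def ctz_local(n):
    """Number of trailing zero bits of n, with ctz(0) defined as 0."""
    if n == 0:
        return 0
    # n & -n isolates the lowest set bit; its position is the trailing-zero count.
    return (n & -n).bit_length() - 1


result = [ctz_local(n) for n in range(10)]
print(result)  # [0, 0, 1, 0, 2, 0, 1, 0, 3, 0]
```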
arkouda.numpy.cumprod(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the cumulative product over the array.

The product is inclusive, such that the i th element of the result is the product of elements up to and including i.

Parameters:

pda (pdarray)

Returns:

A pdarray containing cumulative products for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.cumprod(ak.arange(1,5))
array([1 2 6 24])
>>> ak.cumprod(ak.uniform(5,1.0,5.0))
array([1.5728783400481925 7.0472855509390593 33.78523998586553
       134.05309592737584 450.21589865655358])
arkouda.numpy.cumsum(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the cumulative sum over the array.

The sum is inclusive, such that the i th element of the result is the sum of elements up to and including i.

Parameters:

pda (pdarray)

Returns:

A pdarray containing cumulative sums for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.cumsum(ak.arange(1,5))
array([1 3 6 10])
>>> ak.cumsum(ak.uniform(5,1.0,5.0))
array([3.1598310770203937 5.4110385860243131 9.1622479306453748
       12.710615785506533 13.945880905466208])
>>> ak.cumsum(ak.randint(0, 1, 5, dtype=ak.bool_))
array([0 1 1 2 3])
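The inclusive convention used by cumsum and cumprod (element i covers elements 0 through i) is the same one followed by itertools.accumulate in the Python standard library, which makes for a quick local check. This is an illustration of the semantics only.

```python
from itertools import accumulate
import operator

values = [1, 2, 3, 4]  # same values as ak.arange(1, 5)

# Inclusive running sum: element i is the sum of values[0..i].
running_sum = list(accumulate(values))
# Inclusive running product: element i is the product of values[0..i].
running_product = list(accumulate(values, operator.mul))

print(running_sum)      # [1, 3, 6, 10]
print(running_product)  # [1, 2, 6, 24]
```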
arkouda.numpy.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, inclusive='both', **kwargs)[source]

Creates a fixed frequency Datetime range. Alias for ak.Datetime(pd.date_range(args)). Subject to size limit imposed by client.maxTransferBytes.

Parameters:
  • start (str or datetime-like, optional) – Left bound for generating dates.

  • end (str or datetime-like, optional) – Right bound for generating dates.

  • periods (int, optional) – Number of periods to generate.

  • freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. ‘5H’. See timeseries.offset_aliases for a list of frequency aliases.

  • tz (str or tzinfo, optional) – Time zone name for returning localized DatetimeIndex, for example ‘Asia/Hong_Kong’. By default, the resulting DatetimeIndex is timezone-naive.

  • normalize (bool, default False) – Normalize start/end dates to midnight before generating date range.

  • name (str, default None) – Name of the resulting DatetimeIndex.

  • closed ({None, 'left', 'right'}, optional) – Make the interval closed with respect to the given frequency to the ‘left’, ‘right’, or both sides (None, the default). Deprecated

  • inclusive ({"both", "neither", "left", "right"}, default "both") – Include boundaries. Whether to set each bound as closed or open.

  • **kwargs – For compatibility. Has no effect on the result.

Returns:

rng

Return type:

DatetimeIndex

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting DatetimeIndex will have periods linearly spaced elements between start and end (closed on both sides).

To learn more about the frequency strings, see the pandas documentation on time series offset aliases.
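The case where freq is omitted (start, end, and periods given) amounts to linear spacing between the two bounds. The sketch below shows that spacing with the standard library only; linspace_dates is a hypothetical helper, and it assumes periods is at least 2.

```python
from datetime import datetime


def linspace_dates(start, end, periods):
    """Return `periods` datetimes evenly spaced from start to end, both included."""
    if periods < 2:
        raise ValueError("this sketch assumes periods >= 2")
    step = (end - start) / (periods - 1)  # a timedelta divided by an int
    return [start + i * step for i in range(periods)]


dates = linspace_dates(datetime(2024, 1, 1), datetime(2024, 1, 5), 5)
for d in dates:
    print(d.date())
```

With five periods over a four-day span, the step works out to exactly one day.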

arkouda.numpy.deg2rad(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Converts angles element-wise from degrees to radians.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the corresponding value will be converted from degrees to radians. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing an angle converted to radians, from degrees, for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.delete(arr: arkouda.numpy.pdarrayclass.pdarray, obj: arkouda.numpy.pdarrayclass.pdarray | slice | int, axis: int | None = None) arkouda.numpy.pdarrayclass.pdarray[source]

Return a copy of ‘arr’ with elements along the specified axis removed.

Parameters:
  • arr (pdarray) – The array to remove elements from

  • obj (Union[pdarray, slice, int]) – The indices to remove from ‘arr’. If obj is a pdarray, it must have an integer dtype.

  • axis (Optional[int], optional) – The axis along which to remove elements. If None, the array will be flattened before removing elements. Defaults to None.

Returns:

A copy of ‘arr’ with elements removed

Return type:

pdarray

arkouda.numpy.divmod(x: arkouda.numpy.dtypes.numeric_scalars | pdarray, y: arkouda.numpy.dtypes.numeric_scalars | pdarray, where: arkouda.numpy.dtypes.bool_scalars | pdarray = True) Tuple[pdarray, pdarray][source]
Parameters:
  • x (numeric_scalars(float_scalars, int_scalars) or pdarray) – The dividend array, the values that will be the numerator of the floor division and will be acted on by the bases for modular division.

  • y (numeric_scalars(float_scalars, int_scalars) or pdarray) – The divisor array, the values that will be the denominator of the division and will be the bases for the modular division.

  • where (Boolean or pdarray) – This condition is broadcast over the input. At locations where the condition is True, the corresponding value will be divided using floor and modular division. Elsewhere, it will retain its original value. Default set to True.

Returns:

Returns a tuple that contains quotient and remainder of the division

Return type:

(pdarray, pdarray)

Raises:
  • TypeError – At least one entry must be a pdarray

  • ValueError – If both inputs are pdarrays, their sizes must match

  • ZeroDivisionError – No entry in y is allowed to be 0, to prevent division by zero

Notes

The div is calculated by x // y. The mod is calculated by x % y.

Examples

>>> x = ak.arange(5, 10)
>>> y = ak.array([2, 1, 4, 5, 8])
>>> ak.divmod(x,y)
(array([2 6 1 1 1]), array([1 0 3 3 1]))
>>> ak.divmod(x,y, x % 2 == 0)
(array([5 6 7 1 9]), array([5 0 7 3 9]))
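The effect of the where argument is easiest to see element by element. The following pure-Python sketch reproduces the two results above under the assumption, taken from the second example, that positions where the condition is False carry the original x value in both outputs. divmod_where is a hypothetical name.

```python
def divmod_where(x, y, where=None):
    """Element-wise (x // y, x % y); where the mask is False, keep x unchanged."""
    if where is None:
        where = [True] * len(x)
    quotient = [a // b if w else a for a, b, w in zip(x, y, where)]
    remainder = [a % b if w else a for a, b, w in zip(x, y, where)]
    return quotient, remainder


x = [5, 6, 7, 8, 9]
y = [2, 1, 4, 5, 8]
print(divmod_where(x, y))                            # ([2, 6, 1, 1, 1], [1, 0, 3, 3, 1])
print(divmod_where(x, y, [v % 2 == 0 for v in x]))   # ([5, 6, 7, 1, 9], [5, 0, 7, 3, 9])
```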
arkouda.numpy.dot(pda1: numpy.int64 | numpy.float64 | numpy.uint64 | pdarray, pda2: numpy.int64 | numpy.float64 | numpy.uint64 | pdarray) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Returns the sum of the elementwise product of two arrays of the same size (the dot product) or the product of a singleton element and an array.

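Parameters:
  • pda1 (np.int64, np.float64, np.uint64, or pdarray) – The first operand, either a scalar or an array

  • pda2 (np.int64, np.float64, np.uint64, or pdarray) – The second operand, either a scalar or an array
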
Returns:

The sum of the elementwise product pda1 and pda2 or the product of a singleton element and an array.

Return type:

Union[numeric_scalars, pdarray]

Raises:

ValueError – Raised if the size of pda1 is not the same as pda2

Examples

>>> x = ak.array([2, 3])
>>> y = ak.array([4, 5])
>>> ak.dot(x,y)
23
>>> ak.dot(x,2)
array([4 6])
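The two behaviours described above (a dot product for two arrays, scaling when one operand is a scalar) can be sketched with plain lists. dot_local is a hypothetical helper written for illustration; it is not how Arkouda dispatches the operation.

```python
def dot_local(a, b):
    """Dot product of two equal-length sequences, or scalar times sequence."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq and b_is_seq:
        if len(a) != len(b):
            raise ValueError("operands must have the same size")
        return sum(p * q for p, q in zip(a, b))
    if a_is_seq:
        return [p * b for p in a]   # array times scalar
    if b_is_seq:
        return [a * q for q in b]   # scalar times array
    return a * b                    # two scalars


print(dot_local([2, 3], [4, 5]))  # 23
print(dot_local([2, 3], 2))       # [4, 6]
```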
arkouda.numpy.dtype(dtype)[source]

Create a data type object.

Parameters:

dtype (object) – Object to be converted to a data type object.

Return type:

type

arkouda.numpy.exp(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise exponential of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing exponential values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.exp(ak.arange(1,5))
array([2.7182818284590451 7.3890560989306504 20.085536923187668 54.598150033144236])
>>> ak.exp(ak.uniform(5,1.0,5.0))
array([11.84010843172504 46.454368507659211 5.5571769623557188
       33.494295836924771 13.478894913238722])
arkouda.numpy.expm1(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise exponential of the array minus one.

Parameters:

pda (pdarray)

Returns:

A pdarray containing e raised to each of the inputs, then subtracting one.

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.expm1(ak.arange(1,5))
array([1.7182818284590451 6.3890560989306504 19.085536923187668 53.598150033144236])
>>> ak.expm1(ak.uniform(5,1.0,5.0))
array([10.84010843172504 45.454368507659211 4.5571769623557188
       32.494295836924771 12.478894913238722])
arkouda.numpy.eye(rows: arkouda.numpy.dtypes.int_scalars, cols: arkouda.numpy.dtypes.int_scalars, diag: arkouda.numpy.dtypes.int_scalars = 0, dt: type = ak_int64) arkouda.numpy.pdarrayclass.pdarray[source]

Return a pdarray with zeros everywhere except along a diagonal, which is all ones. The matrix need not be square.

Parameters:
  • rows (int_scalars)

  • cols (int_scalars)

  • diag (int_scalars, default=0) –

    if diag = 0, ones start at element [0,0] and proceed along diagonal
    if diag > 0, ones start at element [0,diag] and proceed along diagonal
    if diag < 0, ones start at element [-diag,0] and proceed along diagonal
    etc. Default set to 0.

  • dt (type, default=ak_int64) – The data type of the elements in the matrix being returned. Default set to ak_int64

Returns:

an array of zeros with ones along the specified diagonal

Return type:

pdarray

Examples

>>> ak.eye(rows=4,cols=4,diag=0,dt=ak.int64)
array([array([1 0 0 0]) array([0 1 0 0]) array([0 0 1 0]) array([0 0 0 1])])
>>> ak.eye(rows=3,cols=3,diag=1,dt=ak.float64)
array([array([0.00000000000000000 1.00000000000000000 0.00000000000000000])
array([0.00000000000000000 0.00000000000000000 1.00000000000000000])
array([0.00000000000000000 0.00000000000000000 0.00000000000000000])])
>>> ak.eye(rows=4,cols=4,diag=-1,dt=ak.bool_)
array([array([False False False False]) array([True False False False])
array([False True False False]) array([False False True False])])

Notes

If rows = cols and diag = 0, the result is an identity matrix. The server returns an error if rank of pda < 2.
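The placement of the ones can be stated as a single rule: entry [i, j] is one exactly when j - i equals diag. The nested-list sketch below applies that rule and agrees with the examples above; eye_local is a hypothetical name and the rule is inferred from those examples.

```python
def eye_local(rows, cols, diag=0):
    """rows x cols nested list with ones where (column - row) == diag."""
    return [[1 if j - i == diag else 0 for j in range(cols)] for i in range(rows)]


for row in eye_local(3, 3, diag=1):
    print(row)
# [0, 1, 0]
# [0, 0, 1]
# [0, 0, 0]
```

A negative diag moves the ones below the main diagonal, so eye_local(4, 4, diag=-1) has its first one at row 1, column 0.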

arkouda.numpy.flip(x: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical, /, *, axis: int | Tuple[int, Ellipsis] | None = None) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical[source]

Reverse an array’s values along a particular axis or axes.

Parameters:
  • x (pdarray, Strings, or Categorical) –

    Reverse the order of elements in an array along the given axis.

    The shape of the array is preserved, but the elements are reordered.

  • axis (int or Tuple[int, ...], optional) – The axis or axes along which to flip the array. If None, flip the array along all axes.

Returns:

An array with the entries of axis reversed.

Return type:

pdarray, Strings, or Categorical

Note

This differs from numpy as it actually reverses the data, rather than presenting a view.

class arkouda.numpy.float16(value)

Bases: numpy.floating

Half-precision floating-point number type.

Character code:

'e'

Canonical name:

numpy.half

Alias on this platform (Linux x86_64):

numpy.float16: 16-bit-precision floating-point number type: sign bit, 5 bits exponent, 10 bits mantissa.

as_integer_ratio(*args, **kwargs)

half.as_integer_ratio() -> (int, int)

Return a pair of integers, whose ratio is exactly equal to the original floating point number, and with a positive denominator. Raise OverflowError on infinities and a ValueError on NaNs.

>>> np.half(10.0).as_integer_ratio()
(10, 1)
>>> np.half(0.0).as_integer_ratio()
(0, 1)
>>> np.half(-.25).as_integer_ratio()
(-1, 4)
is_integer(*args, **kwargs)

half.is_integer() -> bool

Return True if the floating point number is finite with integral value, and False otherwise.

Added in version 1.22.

>>> np.half(-2.0).is_integer()
True
>>> np.half(3.2).is_integer()
False
class arkouda.numpy.float32(value)

Bases: numpy.floating

Single-precision floating-point number type, compatible with C float.

Character code:

'f'

Canonical name:

numpy.single

Alias on this platform (Linux x86_64):

numpy.float32: 32-bit-precision floating-point number type: sign bit, 8 bits exponent, 23 bits mantissa.

as_integer_ratio(*args, **kwargs)

single.as_integer_ratio() -> (int, int)

Return a pair of integers, whose ratio is exactly equal to the original floating point number, and with a positive denominator. Raise OverflowError on infinities and a ValueError on NaNs.

>>> np.single(10.0).as_integer_ratio()
(10, 1)
>>> np.single(0.0).as_integer_ratio()
(0, 1)
>>> np.single(-.25).as_integer_ratio()
(-1, 4)
is_integer(*args, **kwargs)

single.is_integer() -> bool

Return True if the floating point number is finite with integral value, and False otherwise.

Added in version 1.22.

>>> np.single(-2.0).is_integer()
True
>>> np.single(3.2).is_integer()
False
class arkouda.numpy.float64(value)

Bases: numpy.floating

Double-precision floating-point number type, compatible with Python float and C double.

Character code:

'd'

Canonical name:

numpy.double

Alias:

numpy.float_

Alias on this platform (Linux x86_64):

numpy.float64: 64-bit precision floating-point number type: sign bit, 11 bits exponent, 52 bits mantissa.

as_integer_ratio(*args, **kwargs)

double.as_integer_ratio() -> (int, int)

Return a pair of integers, whose ratio is exactly equal to the original floating point number, and with a positive denominator. Raise OverflowError on infinities and a ValueError on NaNs.

>>> np.double(10.0).as_integer_ratio()
(10, 1)
>>> np.double(0.0).as_integer_ratio()
(0, 1)
>>> np.double(-.25).as_integer_ratio()
(-1, 4)
fromhex(string, /)

Create a floating-point number from a hexadecimal string.

>>> float.fromhex('0x1.ffffp10')
2047.984375
>>> float.fromhex('-0x1p-1074')
-5e-324
hex(/)

Return a hexadecimal representation of a floating-point number.

>>> (-0.1).hex()
'-0x1.999999999999ap-4'
>>> 3.14159.hex()
'0x1.921f9f01b866ep+1'
is_integer(*args, **kwargs)

double.is_integer() -> bool

Return True if the floating point number is finite with integral value, and False otherwise.

Added in version 1.22.

>>> np.double(-2.0).is_integer()
True
>>> np.double(3.2).is_integer()
False
class arkouda.numpy.float_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kind of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated, this is used by e.g. typing.List and typing.Dict.

arkouda.numpy.floor(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise floor of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing floor values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.floor(ak.linspace(1.1,5.5,5))
array([1.00000000000000000 2.00000000000000000 3.00000000000000000
4.00000000000000000 5.00000000000000000])
arkouda.numpy.fmod(dividend: pdarray | arkouda.numpy.dtypes.numeric_scalars, divisor: pdarray | arkouda.numpy.dtypes.numeric_scalars) pdarray[source]

Returns the element-wise remainder of division.

It is equivalent to np.fmod, the remainder has the same sign as the dividend.

Parameters:
  • dividend (numeric scalars or pdarray) – The array being acted on by the bases for the modular division.

  • divisor (numeric scalars or pdarray) – The array that will be the bases for the modular division.

Returns:

an array that contains the element-wise remainder of division.

Return type:

pdarray

Raises:

TypeError – Raised if neither dividend nor divisor is a pdarray (at least one must be) or if any scalar or pdarray element is not one of int, uint, float, bigint
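The sign convention mentioned above (remainder takes the sign of the dividend) differs from Python's % operator, whose result takes the sign of the divisor. The standard library's math.fmod follows the same convention as np.fmod, so the difference can be seen locally:

```python
import math

# fmod: the remainder has the sign of the dividend.
print(math.fmod(-7, 3))   # -1.0
print(math.fmod(7, -3))   # 1.0

# %: the remainder has the sign of the divisor.
print(-7 % 3)             # 2
print(7 % -3)             # -2
```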

arkouda.numpy.from_series(series: pandas.Series, dtype: type | str | None = None) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]

Converts a Pandas Series to an Arkouda pdarray or Strings object. If dtype is None, the dtype is inferred from the Pandas Series. Otherwise, the dtype parameter is set if the dtype of the Pandas Series is to be overridden or is unknown (for example, in situations where the Series dtype is object).

Parameters:
  • series (Pandas Series) – The Pandas Series with a dtype of bool, float64, int64, or string

  • dtype (Optional[type]) – The valid dtype types are np.bool, np.float64, np.int64, and np.str

Return type:

Union[pdarray,Strings]

Raises:
  • TypeError – Raised if series is not a Pandas Series object

  • ValueError – Raised if the Series dtype is not bool, float64, int64, string, datetime, or timedelta

Examples

>>> np.random.seed(1701)
>>> ak.from_series(pd.Series(np.random.randint(0,10,5)))
array([4 3 3 5 0])
>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']),dtype=np.int64)
array([1 2 3 4 5])
>>> np.random.seed(1701)
>>> ak.from_series(pd.Series(np.random.uniform(low=0.0,high=1.0,size=3)))
array([0.089433234324597599 0.1153776854774361 0.51874393620990389])
>>> ak.from_series(pd.Series(['0.57600036956445599', '0.41619265571741659',
                   '0.6615356693784662']), dtype=np.float64)
array([0.57600036956445599 0.41619265571741659 0.6615356693784662])
>>> np.random.seed(1864)
>>> ak.from_series(pd.Series(np.random.choice([True, False],size=5)))
array([True True True False False])
>>> ak.from_series(pd.Series(['True', 'False', 'False', 'True', 'True']), dtype=bool)
array([True True True True True])
>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string"))
array(['a', 'b', 'c', 'd', 'e'])
>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')])))
array([1514764800000000000 1514764800000000000])

Notes

The supported datatypes are bool, float64, int64, string, and datetime64[ns]. The data type is either inferred from the Series or is set via the dtype parameter.

Series of datetime or timedelta are converted to Arkouda arrays of dtype int64 (nanoseconds)

A Pandas Series containing strings has a dtype of object. Arkouda assumes the Series contains strings and sets the dtype to str
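The int64 values produced for datetime input are nanoseconds since the Unix epoch. As a sanity check on the last example above, the sketch below computes that number with the standard library; to_epoch_nanoseconds is a hypothetical helper and the timestamp is assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanoseconds(moment):
    """Nanoseconds elapsed between the Unix epoch and a timezone-aware datetime."""
    # Integer division of timedeltas avoids floating-point rounding.
    microseconds = (moment - EPOCH) // timedelta(microseconds=1)
    return microseconds * 1000


ns = to_epoch_nanoseconds(datetime(2018, 1, 1, tzinfo=timezone.utc))
print(ns)  # 1514764800000000000
```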

arkouda.numpy.full(size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] | str, fill_value: arkouda.numpy.dtypes.numeric_scalars | str, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint = float64, max_bits: int | None = None) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]

Create a pdarray filled with fill_value.

Parameters:
  • size (int_scalars or tuple of int_scalars) – Size or shape of the array

  • fill_value (int_scalars or str) – Value with which the array will be filled

  • dtype (all_scalars) – Resulting array type, default float64

  • max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays

Returns:

array of the requested size and dtype filled with fill_value

Return type:

pdarray or Strings

Raises:
  • TypeError – Raised if the supplied dtype is not supported

  • RuntimeError – Raised if the size parameter is neither an int nor a str that is parseable to an int.

  • ValueError – Raised if the rank of the given shape is not in get_array_ranks() or is empty. Also raised if max_bits is not None and ndim does not equal 1.

See also

zeros, ones

Examples

>>> ak.full(5, 7, dtype=ak.int64)
array([7 7 7 7 7])
>>> ak.full(5, 9, dtype=ak.float64)
array([9.00000000000000000 9.00000000000000000 9.00000000000000000
       9.00000000000000000 9.00000000000000000])
>>> ak.full(5, 5, dtype=ak.bool_)
array([True True True True True])
arkouda.numpy.full_like(pda: arkouda.numpy.pdarrayclass.pdarray, fill_value: arkouda.numpy.dtypes.numeric_scalars) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]

Create a pdarray filled with fill_value of the same size and dtype as an existing pdarray.

Parameters:
  • pda (pdarray) – Array to use for size and dtype

  • fill_value (int_scalars) – Value with which the array will be filled

Returns:

Equivalent to ak.full(pda.size, fill_value, pda.dtype)

Return type:

pdarray

Raises:

TypeError – Raised if the pda parameter is not a pdarray.

See also

ones_like, zeros_like

Notes

Logic for generating the pdarray is delegated to the ak.full method. Accordingly, the supported dtypes are those defined by the ak.full method.

Examples

>>> ak.full_like(ak.full(5,7,dtype=ak.int64),6)
array([6 6 6 6 6])
>>> ak.full_like(ak.full(7,9,dtype=ak.float64),10)
array([10.00000000000000000 10.00000000000000000 10.00000000000000000
       10.00000000000000000 10.00000000000000000 10.00000000000000000 10.00000000000000000])
>>> ak.full_like(ak.full(5,True,dtype=ak.bool_),False)
array([False False False False False])
arkouda.numpy.getArkoudaLogger(name: str, handlers: List[logging.Handler] | None = None, logFormat: str | None = ArkoudaLogger.DEFAULT_LOG_FORMAT, logLevel: LogLevel | None = None) ArkoudaLogger[source]

A convenience method for instantiating an ArkoudaLogger that retrieves the logging level from the ARKOUDA_LOG_LEVEL env variable

Parameters:
  • name (str) – The name of the ArkoudaLogger

  • handlers (List[Handler]) – A list of logging.Handler objects, if None, a list consisting of one StreamHandler named ‘console-handler’ is generated and configured

  • logFormat (str) – The format for log messages, defaults to the following format: ‘[%(name)s] Line %(lineno)d %(levelname)s: %(message)s’

Return type:

ArkoudaLogger

Raises:

TypeError – Raised if either name or logFormat is not a str object or if handlers is not a list of logging.Handler objects

Notes

Important note: if a list of 1..n logging.Handler objects is passed in, and dynamic changes to 1..n handlers is desired, set a name for each Handler object as follows: handler.name = <desired name>, which will enable retrieval and updates for the specified handler.

arkouda.numpy.get_byteorder(dt: np.dtype) str[source]

Get a concrete byteorder (turns ‘=’ into ‘<’ or ‘>’) on the client.

Parameters:

dt (np.dtype) – The numpy dtype to determine the byteorder of.

Returns:

Returns “<” for little endian and “>” for big endian.

Return type:

str

Raises:

ValueError – Raised if sys.byteorder is not “little” or “big”

Examples

>>> ak.get_byteorder(ak.dtype(ak.int64))
'<'
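Resolving the native marker '=' into a concrete symbol comes down to consulting sys.byteorder on the client. The sketch below illustrates that idea on the byte-order character alone rather than on a NumPy dtype; concrete_byteorder is a hypothetical helper, and it assumes only the codes '<', '>' and '=' are passed in.

```python
import sys


def concrete_byteorder(code):
    """Map a byte-order code to '<' or '>', resolving '=' via sys.byteorder."""
    if code in ("<", ">"):
        return code              # already concrete
    if code != "=":
        raise ValueError(f"unsupported byte-order code: {code!r}")
    if sys.byteorder == "little":
        return "<"
    if sys.byteorder == "big":
        return ">"
    raise ValueError(f"unexpected sys.byteorder: {sys.byteorder!r}")


print(concrete_byteorder("="))  # '<' on little-endian machines, '>' on big-endian ones
```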
arkouda.numpy.get_server_byteorder() str[source]

Get the server’s byteorder

Returns:

Returns “little” for little endian and “big” for big endian.

Return type:

str

Raises:

ValueError – Raised if Server byteorder is not ‘little’ or ‘big’

Examples

>>> ak.get_server_byteorder()
'little'
arkouda.numpy.hash(pda: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.numpy.segarray.SegArray | arkouda.categorical.Categorical | List[arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.numpy.segarray.SegArray | arkouda.categorical.Categorical], full: bool = True) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray] | arkouda.numpy.pdarrayclass.pdarray[source]

Return an element-wise hash of the array or list of arrays.

Parameters:
  • pda (pdarray, Strings, SegArray, or Categorical or List of pdarray, Strings, SegArray, or Categorical)

  • full (bool, default=True) – This is only used when a single pdarray is passed into hash. By default, a 128-bit hash is computed and returned as two int64 arrays. If full=False, then a 64-bit hash is computed and returned as a single int64 array.

Returns:

If full=True or a list of pdarrays is passed, a 2-tuple of pdarrays containing the high and low 64 bits of each hash, respectively. If full=False and a single pdarray is passed, a single pdarray containing a 64-bit hash

Return type:

hashes

Raises:

TypeError – Raised if the parameter is not a pdarray

Notes

In the case of a single pdarray being passed, this function uses the SIPhash algorithm, which can output either a 64-bit or 128-bit hash. However, the 64-bit hash runs a significant risk of collisions when applied to more than a few million unique values. Unless the number of unique values is known to be small, the 128-bit hash is strongly recommended.

Note that this hash should not be used for security, or for any cryptographic application. Not only is SIPhash not intended for such uses, but this implementation employs a fixed key for the hash, which makes it possible for an adversary with control over input to engineer collisions.

In the case of a list of pdarrays, Strings, Categoricals, or SegArrays being passed, a non-linear function must be applied to each array, since hashes of subsequent arrays cannot simply be XORed: equivalent values would cancel each other out. Hence we do a rotation by the ordinal of the array.
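The cancellation problem and the rotation fix can be demonstrated with a toy example. The sketch below is not the Arkouda implementation: toy_hash is a simple multiplicative stand-in for SipHash, 64-bit words are used instead of 128-bit ones, and all function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def toy_hash(value):
    """Toy 64-bit multiplicative hash (a stand-in for SipHash, illustration only)."""
    return (value * 0x9E3779B97F4A7C15) & MASK64


def rotl64(word, shift):
    """Rotate a 64-bit word left by `shift` bits."""
    shift %= 64
    return ((word << shift) | (word >> (64 - shift))) & MASK64


def combine_xor(columns):
    """Naive row hash: XOR the per-column hashes together."""
    out = [0] * len(columns[0])
    for col in columns:
        out = [acc ^ toy_hash(v) for acc, v in zip(out, col)]
    return out


def combine_rotated(columns):
    """Row hash with each column's hash rotated by the column's ordinal."""
    out = [0] * len(columns[0])
    for ordinal, col in enumerate(columns):
        out = [acc ^ rotl64(toy_hash(v), ordinal) for acc, v in zip(out, col)]
    return out


a = [1, 2, 3]
print(combine_xor([a, a]))       # [0, 0, 0]: identical columns cancel out
print(combine_rotated([a, a]))   # nonzero entries: the rotation breaks the symmetry
```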

arkouda.numpy.histogram(pda: arkouda.numpy.pdarrayclass.pdarray, bins: arkouda.numpy.dtypes.int_scalars = 10) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Compute a histogram of evenly spaced bins over the range of an array.

Parameters:
  • pda (pdarray) – The values to histogram

  • bins (int_scalars, default=10) – The number of equal-size bins to use (default: 10)

Returns:

The number of values present in each bin and the bin edges

Return type:

(pdarray, Union[pdarray, int64 or float64])

Raises:
  • TypeError – Raised if the parameter is not a pdarray or if bins is not an int.

  • ValueError – Raised if bins < 1

  • NotImplementedError – Raised if pdarray dtype is bool or uint8

Notes

The bins are evenly spaced in the interval [pda.min(), pda.max()].

Examples

>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h, b = ak.histogram(A, bins=nbins)
>>> h
array([3 3 4])
>>> b
array([0.00000000000000000 3.00000000000000000 6.00000000000000000 9.00000000000000000])
# To plot, export the left edges and the histogram to NumPy
>>> b_np = b.to_ndarray()
>>> import numpy as np
>>> b_widths = np.diff(b_np)
>>> plt.bar(b_np[:-1], h.to_ndarray(), width=b_widths, align='edge', edgecolor='black')
<BarContainer object of 3 artists>
>>> plt.show()
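The binning rule behind the counts above can be sketched without a server. The version below assumes, based on the documented result array([3 3 4]), that the maximum value falls into the last bin; histogram_local is a hypothetical helper, and it assumes the data are not all equal.

```python
def histogram_local(values, bins=10):
    """Counts and edges for `bins` equal-width bins over [min(values), max(values)]."""
    if bins < 1:
        raise ValueError("bins must be >= 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The clamp puts the maximum value into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = histogram_local(list(range(10)), bins=3)
print(counts)  # [3, 3, 4]
print(edges)   # [0.0, 3.0, 6.0, 9.0]
```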
arkouda.numpy.histogram2d(x: arkouda.numpy.pdarrayclass.pdarray, y: arkouda.numpy.pdarrayclass.pdarray, bins: arkouda.numpy.dtypes.int_scalars | Sequence[arkouda.numpy.dtypes.int_scalars] = 10) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Compute the bi-dimensional histogram of two data samples with evenly spaced bins

Parameters:
  • x (pdarray) – A pdarray containing the x coordinates of the points to be histogrammed.

  • y (pdarray) – A pdarray containing the y coordinates of the points to be histogrammed.

  • bins (int_scalars or [int, int], default=10) – The number of equal-size bins to use. If int, the number of bins for the two dimensions (nx=ny=bins). If [int, int], the number of bins in each dimension (nx, ny = bins). Defaults to 10

Returns:

  • hist (pdarray) – shape(nx, ny) The bi-dimensional histogram of samples x and y. Values in x are histogrammed along the first dimension and values in y are histogrammed along the second dimension.

  • x_edges (pdarray) – The bin edges along the first dimension.

  • y_edges (pdarray) – The bin edges along the second dimension.

Raises:
  • TypeError – Raised if x or y parameters are not pdarrays or if bins is not an int or (int, int).

  • ValueError – Raised if bins < 1

  • NotImplementedError – Raised if pdarray dtype is bool or uint8

See also

histogram

Notes

The x bins are evenly spaced in the interval [x.min(), x.max()] and y bins are evenly spaced in the interval [y.min(), y.max()].

Examples

>>> x = ak.arange(0, 10, 1)
>>> y = ak.arange(9, -1, -1)
>>> nbins = 3
>>> h, x_edges, y_edges = ak.histogram2d(x, y, bins=nbins)
>>> h
array([array([0.00000000000000000 0.00000000000000000 3.00000000000000000])
       array([0.00000000000000000 2.00000000000000000 1.00000000000000000])
       array([3.00000000000000000 1.00000000000000000 0.00000000000000000])])
>>> x_edges
array([0.00000000000000000 3.00000000000000000 6.00000000000000000 9.00000000000000000])
>>> y_edges
array([0.00000000000000000 3.00000000000000000 6.00000000000000000 9.00000000000000000])
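The binning rule from the Notes can be modelled in plain Python without an Arkouda server. This is only a sketch: the helper names `bin_index` and `histogram2d_model` are hypothetical, and treating the last bin as closed on the right (so that the maximum lands in the final bin) is an assumption inferred from the example output above.

```python
def bin_index(value, lo, hi, nbins):
    """Index of the evenly spaced bin in [lo, hi] that contains value."""
    if hi == lo:
        return 0  # degenerate range: everything falls in the first bin
    width = (hi - lo) / nbins
    # Assumption: the last bin is closed, so value == hi maps to nbins - 1.
    return min(int((value - lo) / width), nbins - 1)


def histogram2d_model(xs, ys, nx, ny):
    """Count points into an nx-by-ny grid; x runs along the first dimension."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    hist = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        hist[bin_index(x, x_lo, x_hi, nx)][bin_index(y, y_lo, y_hi, ny)] += 1
    return hist


xs = list(range(10))         # mirrors ak.arange(0, 10, 1)
ys = list(range(9, -1, -1))  # mirrors ak.arange(9, -1, -1)
hist = histogram2d_model(xs, ys, 3, 3)
print(hist)  # [[0, 0, 3], [0, 2, 1], [3, 1, 0]]
```

The counts match the `h` shown in the example, which is consistent with the assumed edge handling.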
arkouda.numpy.histogramdd(sample: Sequence[arkouda.numpy.pdarrayclass.pdarray], bins: arkouda.numpy.dtypes.int_scalars | Sequence[arkouda.numpy.dtypes.int_scalars] = 10) Tuple[arkouda.numpy.pdarrayclass.pdarray, Sequence[arkouda.numpy.pdarrayclass.pdarray]][source]

Compute the multidimensional histogram of data in sample with evenly spaced bins.

Parameters:
  • sample (Sequence of pdarray) – A sequence of pdarrays containing the coordinates of the points to be histogrammed.

  • bins (int_scalars or Sequence of int_scalars, default=10) – The number of equal-size bins to use. If int, the number of bins for all dimensions (nx=ny=…=bins). If [int, int, …], the number of bins in each dimension (nx, ny, … = bins). Defaults to 10

Returns:

  • hist (pdarray) – shape(nx, ny, …, nd) The multidimensional histogram of pdarrays in sample. Values in first pdarray are histogrammed along the first dimension. Values in second pdarray are histogrammed along the second dimension and so on.

  • edges (List[pdarray]) – A list of pdarrays containing the bin edges for each dimension.

Raises:
  • ValueError – Raised if bins < 1

  • NotImplementedError – Raised if pdarray dtype is bool or uint8

See also

histogram

Notes

The bins for each dimension, m, are evenly spaced in the interval [m.min(), m.max()]

Examples

>>> x = ak.arange(0, 10, 1)
>>> y = ak.arange(9, -1, -1)
>>> z = ak.where(x % 2 == 0, x, y)
>>> h, edges = ak.histogramdd((x, y, z), bins=(2, 2, 5))
>>> h
array([array([array([0 0 0 0 0])
       array([1 1 1 1 1])])
       array([array([1 1 1 1 1])
       array([0 0 0 0 0])])])
>>> edges
[array([0.00000000000000000 4.5 9.00000000000000000]),
array([0.00000000000000000 4.5 9.00000000000000000]),
array([0.00000000000000000 1.6000000000000001 3.2000000000000002
4.8000000000000007 6.4000000000000004 8.00000000000000000])]
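To see why the example yields a single count in each occupied cell, the same binning can be modelled for any number of dimensions in plain Python. This is a sketch, not Arkouda code: `histogramdd_model` is a hypothetical helper, it stores counts sparsely in a `Counter` keyed by bin-index tuples rather than in a dense array, and it assumes the maximum value of each dimension belongs to the last bin.

```python
from collections import Counter


def histogramdd_model(sample, bins):
    """Map each point to a tuple of per-dimension bin indices and count them."""
    ranges = [(min(dim), max(dim)) for dim in sample]
    counts = Counter()
    for point in zip(*sample):
        key = []
        for value, (lo, hi), nbins in zip(point, ranges, bins):
            width = (hi - lo) / nbins
            # Assumption: the maximum value belongs to the last bin.
            idx = min(int((value - lo) / width), nbins - 1) if width else 0
            key.append(idx)
        counts[tuple(key)] += 1
    return counts


x = list(range(10))
y = list(range(9, -1, -1))
z = [xi if xi % 2 == 0 else yi for xi, yi in zip(x, y)]  # mirrors ak.where
counts = histogramdd_model((x, y, z), (2, 2, 5))
print(sorted(counts.items()))
```

Each of the ten points lands in its own cell: five in the cells `(0, 1, k)` and five in `(1, 0, k)` for `k` in 0..4, matching the rows of ones in `h`.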
arkouda.numpy.in1d(A: arkouda.groupbyclass.groupable, B: arkouda.groupbyclass.groupable, assume_unique: bool = False, symmetric: bool = False, invert: bool = False) arkouda.groupbyclass.groupable[source]

Test whether each element of a 1-D array is also present in a second array.

Returns a boolean array the same length as A that is True where an element of A is in B and False otherwise.

Supports multi-level inputs, i.e. testing whether rows of A are in the set of rows of B. Note that multi-dimensional pdarrays are not supported.

Parameters:
  • A (list of pdarrays, pdarray, Strings, or Categorical) – Entries will be tested for membership in B

  • B (list of pdarrays, pdarray, Strings, or Categorical) – The set of elements in which to test membership

  • assume_unique (bool, optional, defaults to False) – If True, assume the rows of A and B are each unique and sorted. By default, they are sorted and deduplicated explicitly.

  • symmetric (bool, optional, defaults to False) – Return in1d(A, B), in1d(B, A) when A and B are single items.

  • invert (bool, optional, defaults to False) – If True, the values in the returned array are inverted (that is, False where an element of A is in B and True otherwise). Default is False. ak.in1d(a, b, invert=True) is equivalent to (but is faster than) ~ak.in1d(a, b).

Returns:

True for each row in A that is contained in B

Return type:

pdarray, bool

Raises:
  • TypeError – Raised if either A or B is not a pdarray, Strings, or Categorical object, or if both are pdarrays and either has rank > 1, or if invert is not a bool

  • RuntimeError – Raised if the dtype of either array is not supported

Examples

>>> ak.in1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([False True False])
>>> ak.in1d(ak.array(['one','two']),ak.array(['two', 'three','four','five']))
array([False True])

Notes

in1d can be considered an element-wise function version of the Python keyword in, for 1-D sequences. in1d(a, b) is logically equivalent to ak.array([item in b for item in a]), but is much faster and scales to arbitrarily large a.

ak.in1d is not supported for bool or float64 pdarrays
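The logical equivalence stated above, together with the invert flag and the multi-level case, can be written out in plain Python. This is a reference model only; `in1d_model` is a hypothetical name and nothing here talks to an Arkouda server.

```python
def in1d_model(a, b, invert=False):
    """Element-wise membership test of a in b, as a list of booleans."""
    lookup = set(b)  # set membership keeps each test O(1) on average
    return [(item in lookup) != invert for item in a]


print(in1d_model([-1, 0, 1], [-2, 0, 2]))               # [False, True, False]
print(in1d_model([-1, 0, 1], [-2, 0, 2], invert=True))  # [True, False, True]

# Multi-level case: each "row" is the tuple formed across the input lists.
rows_a = list(zip([1, 1, 2], ["x", "y", "x"]))
rows_b = list(zip([1, 2], ["y", "x"]))
print(in1d_model(rows_a, rows_b))                       # [False, True, True]
```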

arkouda.numpy.indexof1d(query: arkouda.groupbyclass.groupable, space: arkouda.groupbyclass.groupable) arkouda.numpy.pdarrayclass.pdarray[source]

Return indices of query items in a search list of items. Items not found will be excluded. When duplicate terms are present in search space return indices of all occurrences.

Parameters:
  • query ((sequence of) pdarray or Strings or Categorical) – The items to search for. If multiple arrays, each “row” is an item.

  • space ((sequence of) pdarray or Strings or Categorical) – The set of items in which to search. Must have same shape/dtype as query.

Returns:

indices – For each item in query that is found in space, its index in space.

Return type:

pdarray, int64

Notes

This is an alias of ak.find(query, space, all_occurrences=True, remove_missing=True).values

Examples

>>> select_from = ak.arange(10)
>>> query = select_from[ak.randint(0, select_from.size, 20, seed=10)]
>>> space = select_from[ak.randint(0, select_from.size, 20, seed=11)]

Remove some values to ensure that query has entries which don’t appear in space:

>>> space = space[space != 9]
>>> space = space[space != 3]
>>> ak.indexof1d(query, space)
array([0 4 1 3 10 2 6 12 13 5 7 8 9 14 5 7 11 15 5 7 0 4])
Raises:
  • TypeError – Raised if either query or space is not a pdarray, Strings, or Categorical object

  • RuntimeError – Raised if the dtype of either array is not supported
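The lookup described above (missing items dropped, every occurrence reported) can be modelled with a dictionary of positions. This is a plain-Python sketch with a hypothetical helper name, `indexof1d_model`; the output ordering used here (query order, then ascending index in space) is an assumption rather than something the text above guarantees.

```python
def indexof1d_model(query, space):
    """For each query item found in space, list every index where it occurs."""
    positions = {}
    for idx, item in enumerate(space):
        positions.setdefault(item, []).append(idx)
    result = []
    for item in query:
        # Items missing from space contribute nothing (they are excluded).
        result.extend(positions.get(item, []))
    return result


space = [7, 3, 7, 5]
print(indexof1d_model([7, 9, 5], space))  # [0, 2, 3]
```

Here 7 occurs twice in space and so contributes two indices, while 9 is absent and contributes none.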

class arkouda.numpy.int16(value)

Bases: numpy.signedinteger

Signed integer type, compatible with C short.

Character code:

'h'

Canonical name:

numpy.short

Alias on this platform (Linux x86_64):

numpy.int16: 16-bit signed integer (-32_768 to 32_767).

bit_count(*args, **kwargs)

int16.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.int16(127).bit_count()
7
>>> np.int16(-127).bit_count()
7
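The rule that bits are counted in the absolute value explains why 127 and -127 give the same result. A minimal plain-Python model, using a hypothetical helper name and no NumPy, could look like this:

```python
def bit_count_model(value):
    """Number of 1-bits in the absolute value of an integer."""
    return bin(abs(value)).count("1")


print(bit_count_model(127))   # 7
print(bit_count_model(-127))  # 7
print(bit_count_model(255))   # 8
print(bit_count_model(0))     # 0
```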
class arkouda.numpy.int32(value)

Bases: numpy.signedinteger

Signed integer type, compatible with C int.

Character code:

'i'

Canonical name:

numpy.intc

Alias on this platform (Linux x86_64):

numpy.int32: 32-bit signed integer (-2_147_483_648 to 2_147_483_647).

bit_count(*args, **kwargs)

int32.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.int32(127).bit_count()
7
>>> np.int32(-127).bit_count()
7
class arkouda.numpy.int64(value)

Bases: numpy.signedinteger

Signed integer type, compatible with Python int and C long.

Character code:

'l'

Canonical name:

numpy.int_

Alias on this platform (Linux x86_64):

numpy.int64: 64-bit signed integer (-9_223_372_036_854_775_808 to 9_223_372_036_854_775_807).

Alias on this platform (Linux x86_64):

numpy.intp: Signed integer large enough to fit pointer, compatible with C intptr_t.

bit_count(*args, **kwargs)

int64.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.int64(127).bit_count()
7
>>> np.int64(-127).bit_count()
7
class arkouda.numpy.int8(value)

Bases: numpy.signedinteger

Signed integer type, compatible with C char.

Character code:

'b'

Canonical name:

numpy.byte

Alias on this platform (Linux x86_64):

numpy.int8: 8-bit signed integer (-128 to 127).

bit_count(*args, **kwargs)

int8.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.int8(127).bit_count()
7
>>> np.int8(-127).bit_count()
7
class arkouda.numpy.intTypes

frozenset() -> empty frozenset object frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

copy(*args, **kwargs)

Return a shallow copy of a set.

difference(*args, **kwargs)

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

intersection(*args, **kwargs)

Return the intersection of two sets as a new set.

(i.e. all elements that are in both sets.)

isdisjoint(*args, **kwargs)

Return True if two sets have a null intersection.

issubset(*args, **kwargs)

Report whether another set contains this set.

issuperset(*args, **kwargs)

Report whether this set contains another set.

symmetric_difference(*args, **kwargs)

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

union(*args, **kwargs)

Return the union of sets as a new set.

(i.e. all elements that are in either set.)

class arkouda.numpy.int_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kinds of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated; this is used by e.g. typing.List and typing.Dict.

arkouda.numpy.intersect1d(A: arkouda.groupbyclass.groupable, B: arkouda.groupbyclass.groupable, assume_unique: bool = False) arkouda.numpy.pdarrayclass.pdarray | arkouda.groupbyclass.groupable[source]

Find the intersection of two arrays.

Return the sorted, unique values that are in both of the input arrays.

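Parameters:
  • A (groupable) – First input: a pdarray, Strings, Categorical, or list of pdarrays.

  • B (groupable) – Second input: a pdarray, Strings, Categorical, or list of pdarrays.

  • assume_unique (bool, default=False) – If True, the inputs are assumed to contain unique values.
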
Returns:

Sorted 1D array/List of sorted pdarrays of common and unique elements.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a groupable

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Examples

1D Example

>>> ak.intersect1d(ak.array([1, 3, 4, 3]), ak.array([3, 1, 2, 1]))
array([1 3])

Multi-Array Example

>>> a = ak.arange(5)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.intersect1d(multia, multib)
[array([1 3]), array([1 3]), array([1 3])]
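Both examples can be reproduced with Python sets, which may help when reasoning about the multi-array case. This is a sketch with hypothetical helper names; it assumes rows are compared as tuples and returned in lexicographic order, which is consistent with the output above but not stated by it.

```python
def intersect1d_model(a, b):
    """Sorted, unique values present in both inputs."""
    return sorted(set(a) & set(b))


def intersect_rows_model(multi_a, multi_b):
    """Multi-array case: intersect rows, then split back into columns."""
    common = sorted(set(zip(*multi_a)) & set(zip(*multi_b)))
    return [list(col) for col in zip(*common)]


print(intersect1d_model([1, 3, 4, 3], [3, 1, 2, 1]))  # [1, 3]

a = list(range(5))
multi_b = [[1, 5, 3, 4, 2], [1, 4, 3, 2, 5], [1, 2, 3, 5, 4]]
print(intersect_rows_model([a, a, a], multi_b))  # [[1, 3], [1, 3], [1, 3]]
```

Only the rows (1, 1, 1) and (3, 3, 3) appear in both inputs, so each returned column is [1, 3].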
arkouda.numpy.isSupportedBool(num)[source]

Whether a scalar is an arkouda supported boolean dtype.

Parameters:

scalar (object)

Returns:

True if scalar is an instance of an arkouda supported boolean dtype, else False.

Return type:

bool

Examples

>>> ak.isSupportedBool(ak.int64)
False
>>> ak.isSupportedBool(bool)
True
arkouda.numpy.isSupportedDType(scalar: object) bool[source]

Whether a scalar is an arkouda supported dtype.

Parameters:

scalar (object)

Returns:

True if scalar is an instance of an arkouda supported dtype, else False.

Return type:

bool

Examples

>>> ak.isSupportedDType(ak.int64)
True
>>> ak.isSupportedDType(np.complex128(1+2j))
False
arkouda.numpy.isSupportedFloat(num)[source]

Whether a scalar is an arkouda supported float dtype.

Parameters:

scalar (object)

Returns:

True if scalar is an instance of an arkouda supported float dtype, else False.

Return type:

bool

Examples

>>> ak.isSupportedFloat(ak.int64)
False
>>> ak.isSupportedFloat(ak.float64)
True
arkouda.numpy.isSupportedInt(num)[source]

Whether a scalar is an arkouda supported integer dtype.

Parameters:

scalar (object)

Returns:

True if scalar is an instance of an arkouda supported integer dtype, else False.

Return type:

bool

Examples

>>> ak.isSupportedInt(ak.int64)
True
>>> ak.isSupportedInt(ak.float64)
False
arkouda.numpy.isSupportedNumber(num)[source]

Whether a scalar is an arkouda supported numeric dtype.

Parameters:

scalar (object)

Returns:

True if scalar is an instance of an arkouda supported numeric dtype, else False.

Return type:

bool

Examples

>>> ak.isSupportedNumber(ak.int64)
True
>>> ak.isSupportedNumber(ak.str_)
False
arkouda.numpy.is_registered(name: str, as_component: bool = False) bool[source]

Determine if the name provided is associated with a registered Object

Parameters:
  • name (str) – The name to check for in the registry

  • as_component (bool, default=False) – When True, the name will be checked to determine if it is registered as a component of a registered object

Return type:

bool

arkouda.numpy.isfinite(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise isfinite check applied to the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing boolean values indicating whether the input array elements are finite

Return type:

pdarray

Raises:
  • TypeError – Raised if the parameter is not a pdarray

  • RuntimeError – Raised if the underlying pdarray is not float-based

Examples

>>> ak.isfinite(ak.array([1.0, 2.0, ak.inf]))
array([True True False])
arkouda.numpy.isinf(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise isinf check applied to the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing boolean values indicating whether the input array elements are infinite (positive or negative)

Return type:

pdarray

Raises:
  • TypeError – Raised if the parameter is not a pdarray

  • RuntimeError – Raised if the underlying pdarray is not float-based

Examples

>>> ak.isinf(ak.array([1.0, 2.0, ak.inf]))
array([False False True])
arkouda.numpy.isnan(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise isnan check applied to the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing boolean values indicating whether the input array elements are NaN

Return type:

pdarray

Raises:
  • TypeError – Raised if the parameter is not a pdarray

  • RuntimeError – Raised if the underlying pdarray is not float-based

Examples

>>> ak.isnan(ak.array([1.0, 2.0, np.log(-1)]))
array([False False True])
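The three checks isfinite, isinf and isnan partition the floating-point values: every float satisfies exactly one of them. The standard-library math module shows the same relationship on plain Python floats; this is an analogy and does not call Arkouda.

```python
import math

values = [1.0, 2.0, math.inf, -math.inf, math.nan]

is_finite = [math.isfinite(v) for v in values]
is_inf = [math.isinf(v) for v in values]
is_nan = [math.isnan(v) for v in values]

print(is_finite)  # [True, True, False, False, False]
print(is_inf)     # [False, False, True, True, False]
print(is_nan)     # [False, False, False, False, True]

# Exactly one of the three predicates holds for each value.
exactly_one = [f + i + n == 1 for f, i, n in zip(is_finite, is_inf, is_nan)]
```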
arkouda.numpy.linspace(start: arkouda.numpy.dtypes.numeric_scalars, stop: arkouda.numpy.dtypes.numeric_scalars, length: arkouda.numpy.dtypes.int_scalars) arkouda.numpy.pdarrayclass.pdarray[source]

Create a pdarray of linearly-spaced floats in a closed interval.

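Parameters:
  • start (numeric_scalars) – Start of the interval (inclusive).

  • stop (numeric_scalars) – End of the interval (inclusive).

  • length (int_scalars) – Number of points to generate.
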
Returns:

Array of evenly spaced float values along the interval

Return type:

pdarray, float64

Raises:

TypeError – Raised if start or stop is not a float or int or if length is not an int

See also

arange

Notes

If start is greater than stop, the pdarray values are generated in descending order.

Examples

>>> ak.linspace(0, 1, 5)
array([0.00000000000000000 0.25 0.5 0.75 1.00000000000000000])
>>> ak.linspace(start=1, stop=0, length=5)
array([1.00000000000000000 0.75 0.5 0.25 0.00000000000000000])
>>> ak.linspace(start=-5, stop=0, length=5)
array([-5.00000000000000000 -3.75 -2.5 -1.25 0.00000000000000000])
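The spacing follows from dividing the closed interval into length - 1 equal steps. A plain-Python sketch of that arithmetic is given below; `linspace_model` is a hypothetical name, and the handling of length == 1 is an assumption of the sketch rather than documented Arkouda behaviour.

```python
def linspace_model(start, stop, length):
    """Evenly spaced floats covering the closed interval [start, stop]."""
    if length == 1:
        return [float(start)]  # assumption: a single point sits at start
    step = (stop - start) / (length - 1)
    return [start + i * step for i in range(length)]


print(linspace_model(0, 1, 5))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(linspace_model(1, 0, 5))   # [1.0, 0.75, 0.5, 0.25, 0.0]
print(linspace_model(-5, 0, 5))  # [-5.0, -3.75, -2.5, -1.25, 0.0]
```

A negative step falls out naturally when start exceeds stop, which is why the second call produces descending values.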
arkouda.numpy.log(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise natural log of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing natural log values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Notes

Logarithms with other bases can be computed as follows:

Examples

>>> A = ak.array([1, 10, 100])
# Natural log
>>> ak.log(A)
array([0.00000000000000000 2.3025850929940459 4.6051701859880918])
# Log base 10
>>> ak.log(A) / np.log(10)
array([0.00000000000000000 1.00000000000000000 2.00000000000000000])
# Log base 2
>>> ak.log(A) / np.log(2)
array([0.00000000000000000 3.3219280948873626 6.6438561897747253])
arkouda.numpy.log10(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise base 10 log of the array.

Parameters:

pda (pdarray) – array to compute on

Returns:

pdarray containing base 10 log values of the input array elements

Return type:

pdarray

arkouda.numpy.log1p(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise natural log of one plus the array.

Parameters:

pda (pdarray) – array to compute on

Returns:

pdarray containing natural log values of the input array elements, adding one before taking the log

Return type:

pdarray

Examples

>>> ak.log1p(ak.arange(1,5))
array([0.69314718055994529 1.0986122886681098 1.3862943611198906 1.6094379124341003])
arkouda.numpy.log2(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise base 2 log of the array.

Parameters:

pda (pdarray) – array to compute on

Returns:

pdarray containing base 2 log values of the input array elements

Return type:

pdarray

arkouda.numpy.matmul(pdaLeft: arkouda.numpy.pdarrayclass.pdarray, pdaRight: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Compute the product of two matrices.

Parameters:
  • pdaLeft (pdarray) – The left-hand matrix.

  • pdaRight (pdarray) – The right-hand matrix.

Returns:

the matrix product pdaLeft x pdaRight

Return type:

pdarray

Examples

>>> a = ak.array([[1,2,3,4,5],[1,2,3,4,5]])
>>> b = ak.array([[1,1],[2,2],[3,3],[4,4],[5,5]])
>>> ak.matmul(a,b)
array([array([55 55]) array([55 55])])
>>> x = ak.array([[1,2,3],[1.1,2.1,3.1]])
>>> y = ak.array([[1,1,1],[0,2,2],[0,0,3]])
>>> ak.matmul(x,y)
array([array([1.00000000000000000 5.00000000000000000 14.00000000000000000])
array([1.1000000000000001 5.3000000000000007 14.600000000000001])])

Notes

Server returns an error if shapes of pdaLeft and pdaRight are incompatible with matrix multiplication.
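The shape requirement is the usual one for matrix products: the number of columns on the left must equal the number of rows on the right. A plain-Python sketch follows; `matmul_model` is a hypothetical helper, and the ValueError it raises is a choice of the sketch, not the error the server reports.

```python
def matmul_model(left, right):
    """Row-by-column product of two matrices stored as lists of rows."""
    if len(left[0]) != len(right):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


a = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]      # shape (2, 5)
b = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]  # shape (5, 2)
print(matmul_model(a, b))  # [[55, 55], [55, 55]]
```

Each entry is 1*1 + 2*2 + 3*3 + 4*4 + 5*5 = 55, matching the first example.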

arkouda.numpy.maxk(pda: pdarray, k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Find the k maximum values of an array.

Returns the largest k values of an array, sorted

Parameters:
  • pda (pdarray) – Input array.

  • k (int_scalars) – The desired count of maximum values to be returned by the output.

Returns:

The maximum k values from pda, sorted

Return type:

pdarray, int

Raises:
  • TypeError – Raised if pda is not a pdarray or k is not an integer

  • ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1

Notes

This call is equivalent in value to a[ak.argsort(a)[-k:]] and generally outperforms this operation.

This reduction slows significantly once k grows beyond a certain value. The threshold is system dependent, but performance degradation has generally been observed at a k of about 5 million.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.maxk(A, 3)
array([7, 9, 10])
>>> ak.maxk(A, 4)
array([5, 7, 9, 10])
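In terms of values, the k largest elements in ascending order are the last k entries of the sorted array. The plain-Python sketch below checks this against the examples; the helper names are hypothetical and the code does not use Arkouda.

```python
def maxk_model(values, k):
    """The k largest values, in ascending order."""
    return sorted(values)[-k:]


def argsort_model(values):
    """Indices that would sort values in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


A = [10, 5, 1, 3, 7, 2, 9, 0]
print(maxk_model(A, 3))  # [7, 9, 10]
print(maxk_model(A, 4))  # [5, 7, 9, 10]

# Same values via an argsort followed by taking the last k indices.
via_argsort = [A[i] for i in argsort_model(A)[-3:]]
```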
arkouda.numpy.mean(pda: pdarray) numpy.float64[source]

Return the mean of the array.

Parameters:

pda (pdarray) – Values for which to calculate the mean

Returns:

The mean calculated from the pda sum and size

Return type:

np.float64

Examples

>>> a = ak.arange(10)
>>> ak.mean(a)
4.5
>>> a.mean()
4.5
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

arkouda.numpy.median(pda: arkouda.numpy.pdarrayclass.pdarray) numpy.float64[source]

Compute the median of a given array. 1d case only, for now.

Parameters:

pda (pdarray) – The input data, in pdarray form, numeric type or boolean

Returns:

The median of the entire pdarray. The array is sorted; if the number of elements is odd, the return value is the middle element, and if it is even, the mean of the two middle elements.

Return type:

np.float64

Examples

>>> pda = ak.array([0,4,7,8,1,3,5,2,-1])
>>> ak.median(pda)
3.0
>>> pda = ak.array([0,1,3,3,1,2,3,4,2,3])
>>> ak.median(pda)
2.5
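The odd and even cases described under Returns can be written out directly. This plain-Python sketch uses a hypothetical helper name and reproduces both example results.

```python
def median_model(values):
    """Middle element of the sorted data, or the mean of the two middle ones."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_model([0, 4, 7, 8, 1, 3, 5, 2, -1]))     # 3.0
print(median_model([0, 1, 3, 3, 1, 2, 3, 4, 2, 3]))  # 2.5
```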
arkouda.numpy.mink(pda: pdarray, k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Find the k minimum values of an array.

Returns the smallest k values of an array, sorted

Parameters:
  • pda (pdarray) – Input array.

  • k (int_scalars) – The desired count of minimum values to be returned by the output.

Returns:

The minimum k values from pda, sorted

Return type:

pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray

  • ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1

Notes

This call is equivalent in value to a[ak.argsort(a)[:k]] and generally outperforms this operation.

This reduction slows significantly once k grows beyond a certain value. The threshold is system dependent, but performance degradation has generally been observed at a k of about 5 million.

Examples

>>> A = ak.array([10,5,1,3,7,2,9,0])
>>> ak.mink(A, 3)
array([0, 1, 2])
>>> ak.mink(A, 4)
array([0, 1, 2, 3])
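One reason a dedicated k-selection can beat a full argsort is that it does not need to order the whole array. Python's heapq.nsmallest is used below purely as an analogy for that idea; the source does not say how the server implements mink.

```python
import heapq

values = [10, 5, 1, 3, 7, 2, 9, 0]
k = 3

by_selection = heapq.nsmallest(k, values)  # heap-based selection of k items
by_full_sort = sorted(values)[:k]          # sort everything, then slice

print(by_selection)  # [0, 1, 2]
print(by_full_sort)  # [0, 1, 2]
```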
arkouda.numpy.mod(dividend, divisor) pdarray[source]

Returns the element-wise remainder of division.

Computes the remainder complementary to the floor_divide function. It is equivalent to np.mod; the remainder has the same sign as the divisor.

Parameters:
  • dividend (numeric scalar or pdarray) – The value(s) being divided in the modular division.

  • divisor (numeric scalar or pdarray) – The value(s) to divide by in the modular division.

Returns:

an array that contains the element-wise remainder of division.

Return type:

pdarray

Examples

>>> a = ak.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])
>>> b = ak.array([2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8])
>>> ak.mod(a,b)
array([1 0 1 1 2 0 3 0 1 0 1 2 1 2 3 2 3 4 3 4])
Raises:

ValueError – Raised if the shapes of dividend and divisor are incompatible
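The sign rule matters only when negative operands are involved. Python's built-in % on integers follows the same floor-division convention, so it can illustrate the behaviour without a server. The pairs below are illustrative values, not taken from the examples above.

```python
pairs = [(7, 3), (-7, 3), (7, -3), (-7, -3)]

remainders = []
for dividend, divisor in pairs:
    quotient = dividend // divisor   # floor division
    remainder = dividend % divisor   # takes the sign of the divisor
    # The two results are complementary: they reconstruct the dividend.
    assert dividend == divisor * quotient + remainder
    remainders.append(remainder)

print(remainders)  # [1, 2, -2, -1]
```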

class arkouda.numpy.numeric_and_bool_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kinds of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated; this is used by e.g. typing.List and typing.Dict.

class arkouda.numpy.numeric_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kinds of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated; this is used by e.g. typing.List and typing.Dict.

class arkouda.numpy.numpy_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kinds of these aliases: user defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must have ‘name’ always set. If ‘inst’ is False, then the alias can’t be instantiated; this is used by e.g. typing.List and typing.Dict.

arkouda.numpy.ones(size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] | str, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint = float64, max_bits: int | None = None) arkouda.numpy.pdarrayclass.pdarray[source]

Create a pdarray filled with ones.

Parameters:
  • size (int_scalars or tuple of int_scalars) – Size or shape of the array

  • dtype (Union[float64, int64, bool]) – Resulting array type, default ak.float64

  • max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays. Included for consistency, as ones are all zeros ending on a one, regardless of max_bits.

Returns:

Ones of the requested size or shape and dtype

Return type:

pdarray

Raises:
  • TypeError – Raised if the supplied dtype is not supported

  • RuntimeError – Raised if the size parameter is neither an int nor a str that is parseable to an int.

  • ValueError – Raised if the rank of the given shape is not in get_array_ranks() or is empty

See also

zeros, ones_like

Examples

>>> ak.ones(5, dtype=ak.int64)
array([1 1 1 1 1])
>>> ak.ones(5, dtype=ak.float64)
array([1.00000000000000000 1.00000000000000000 1.00000000000000000
       1.00000000000000000 1.00000000000000000])
>>> ak.ones(5, dtype=ak.bool_)
array([True True True True True])

Notes

Logic for generating the pdarray is delegated to the ak.full method.

arkouda.numpy.ones_like(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Create a one-filled pdarray of the same size and dtype as an existing pdarray.

Parameters:

pda (pdarray) – Array to use for size and dtype

Returns:

Equivalent to ak.ones(pda.size, pda.dtype)

Return type:

pdarray

Raises:

TypeError – Raised if the pda parameter is not a pdarray.

See also

ones, zeros_like

Notes

Logic for generating the pdarray is delegated to the ak.ones method. Accordingly, the supported dtypes are those defined by the ak.ones method.

Examples

>>> ak.ones_like(ak.zeros(5,dtype=ak.int64))
array([1 1 1 1 1])
>>> ak.ones_like(ak.zeros(5,dtype=ak.float64))
array([1.00000000000000000 1.00000000000000000 1.00000000000000000
       1.00000000000000000 1.00000000000000000])
>>> ak.ones_like(ak.zeros(5,dtype=ak.bool_))
array([True True True True True])
arkouda.numpy.parity(pda: pdarray) pdarray[source]

Find the bit parity (XOR of all bits) for each integer in an array.

Parameters:

pda (pdarray, int64, uint64, bigint) – Input array (must be integral).

Returns:

parity – The parity of each element: 0 if even number of bits set, 1 if odd.

Return type:

pdarray

Raises:

TypeError – If input array is not int64, uint64, or bigint

Examples

>>> A = ak.arange(10)
>>> ak.parity(A)
array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
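Bit parity can be computed by XOR-ing the bits one at a time. The plain-Python sketch below uses a hypothetical helper, `parity_model`, and is restricted to non-negative integers, which is a limitation of the sketch rather than of ak.parity.

```python
def parity_model(value):
    """XOR of all bits: 1 if an odd number of bits is set, else 0."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    result = 0
    while value:
        result ^= value & 1  # fold the lowest bit into the running XOR
        value >>= 1
    return result


print([parity_model(v) for v in range(10)])
# [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
```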
class arkouda.numpy.pdarray(name: str, mydtype: numpy.dtype | str, size: arkouda.numpy.dtypes.int_scalars, ndim: arkouda.numpy.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.numpy.dtypes.int_scalars, max_bits: int | None = None)[source]

The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name

The server-side identifier for the array

Type:

str

dtype

The element type of the array

Type:

dtype

size

The number of elements in the array

Type:

int_scalars

ndim

The rank of the array

Type:

int_scalars

shape

A list or tuple containing the sizes of each dimension of the array

Type:

Sequence[int]

itemsize

The size in bytes of each element

Type:

int_scalars

BinOps
OpEqOps
all(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff all elements of the array along the given axis evaluate to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.all(ak.array([True,False,False]))
False
>>> ak.all(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([False True False])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([False False False])])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).all()
False
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.all()) or a standalone function (e.g. ak.all(a))

any(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff any element of the array along the given axis evaluates to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.any(ak.array([True,False,False]))
True
>>> ak.any(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([True True True])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([True True True])])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).any()
True
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.any()) or a standalone function (e.g. ak.any(a))
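
The axis and keepdims behaviour shown in the all and any examples above can be restated with plain nested lists. This is a minimal sketch, assuming a rectangular 2-D input; `reduce_2d` is a hypothetical helper that is not part of Arkouda, and it ignores keepdims when axis is None to stay small.

```python
from typing import Callable, Iterable, List, Optional


def reduce_2d(
    rows: List[List[bool]],
    reducer: Callable[[Iterable[bool]], bool],
    axis: Optional[int] = None,
    keepdims: bool = False,
):
    """Reduce a rectangular 2-D nested list with the built-in `all` or `any`."""
    if axis is None:
        # Whole-array reduction yields a single boolean
        # (keepdims is ignored in this sketch).
        return reducer(value for row in rows for value in row)
    if axis == 0:
        # Collapse the rows: one result per column.
        reduced = [reducer(column) for column in zip(*rows)]
        return [reduced] if keepdims else reduced
    if axis == 1:
        # Collapse the columns: one result per row.
        reduced = [reducer(row) for row in rows]
        return [[value] for value in reduced] if keepdims else reduced
    raise ValueError("this sketch only supports axis None, 0, or 1")


data = [[True, True, True], [False, False, False]]
print(reduce_2d(data, all, axis=0, keepdims=True))  # one row holding three results
print(reduce_2d(data, any, axis=1, keepdims=True))  # two rows holding one result each
```

With keepdims=True the reduced axis is kept with length 1, which is why the documented outputs above are nested arrays rather than flat ones.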

argmax(axis: int | None = None, keepdims: bool = False) numpy.int64 | numpy.uint64 | pdarray[source]

Return index of the first occurrence of the maximum along the given axis.

Parameters:
  • axis (int, optional, default = None) – The axis along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmax(ak.array([1,2,3,4,5]))
4
>>> ak.argmax(ak.array([5.5,4.5,3.5,2.5,1.5]))
0
>>> ak.array([[1,2,3],[5,4,3]]).argmax(axis=1)
array([2 0])

Notes

Works as a method of a pdarray (e.g. a.argmax()) or a standalone function (e.g. ak.argmax(a))

argmaxk(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k maximum values of an array. See arkouda.argmaxk for details.

argmin(axis: int | None = None, keepdims: bool = False) numpy.int64 | numpy.uint64 | pdarray[source]

Return index of the first occurrence of the minimum along the given axis.

Parameters:
  • axis (int, optional, default = None) – The axis along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmin(ak.array([1,2,3,4,5]))
0
>>> ak.argmin(ak.array([5.5,4.5,3.5,2.5,1.5]))
4
>>> ak.array([[1,2,3],[5,4,3]]).argmin(axis=1)
array([0 2])

Notes

Works as a method of a pdarray (e.g. a.argmin()) or a standalone function (e.g. ak.argmin(a))
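
The "first occurrence" rule stated for argmax and argmin above matters only when the extreme value is repeated. The sketch below restates that rule in pure Python; `first_argmax` and `first_argmin` are hypothetical names used for illustration and are not Arkouda functions.

```python
from typing import Sequence


def first_argmax(values: Sequence[float]) -> int:
    """Index of the first occurrence of the maximum."""
    if len(values) == 0:
        raise ValueError("cannot take argmax of an empty sequence")
    best = 0
    for i in range(1, len(values)):
        # Strict comparison keeps the earliest index when values tie.
        if values[i] > values[best]:
            best = i
    return best


def first_argmin(values: Sequence[float]) -> int:
    """Index of the first occurrence of the minimum."""
    if len(values) == 0:
        raise ValueError("cannot take argmin of an empty sequence")
    best = 0
    for i in range(1, len(values)):
        if values[i] < values[best]:
            best = i
    return best


rows = [[1, 2, 3], [5, 4, 3]]
print([first_argmax(row) for row in rows])  # per-row result, like axis=1
print([first_argmin(row) for row in rows])
```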

argmink(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k minimum values of an array. See arkouda.argmink for details.

astype(dtype) pdarray[source]

Cast values of pdarray to provided dtype

Parameters:

dtype (np.dtype or str) – Dtype to cast to

Examples

>>> ak.array([1,2,3]).astype(ak.float64)
array([1.00000000000000000 2.00000000000000000 3.00000000000000000])
>>> ak.array([1.5,2.5]).astype(ak.int64)
array([1 2])
>>> ak.array([True,False]).astype(ak.int64)
array([1 0])
Returns:

An arkouda pdarray with values converted to the specified data type

Return type:

ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

static attach(user_defined_name: str) pdarray[source]

Return a pdarray attached to the registered name in the arkouda server, which was registered using register().

Parameters:

user_defined_name (str) – user defined name which array was registered under

Returns:

pdarray which is bound to the corresponding server side component which was registered with user_defined_name

Return type:

pdarray

Raises:

TypeError – Raised if user_defined_name is not a str

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
bigint_to_uint_arrays() List[pdarray][source]

Creates a list of uint pdarrays from a bigint pdarray. The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.

Returns:

A list of uint pdarrays, ordered from the highest 64 bits of the bigint pdarray (first item) to the lowest 64 bits (last item).

Return type:

List[pdarray]

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> a = ak.arange(2**64, 2**64 + 5)
>>> a
array([18446744073709551616 18446744073709551617 18446744073709551618
18446744073709551619 18446744073709551620])
>>> a.bigint_to_uint_arrays()
[array([1 1 1 1 1]), array([0 1 2 3 4])]
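
The high-to-low ordering of the returned arrays can be checked with ordinary Python integers. This is a sketch under stated assumptions: `to_uint64_limbs` and its `num_limbs` argument are hypothetical, and how Arkouda decides the number of returned arrays is not covered in this section.

```python
from typing import List

MASK64 = (1 << 64) - 1  # selects the lowest 64 bits of an integer


def to_uint64_limbs(values: List[int], num_limbs: int) -> List[List[int]]:
    """Split non-negative ints into 64-bit pieces, highest piece first."""
    limbs = []
    for k in reversed(range(num_limbs)):
        # Piece k holds bits 64*k .. 64*k + 63 of every value.
        limbs.append([(v >> (64 * k)) & MASK64 for v in values])
    return limbs


# Same values as the documented example: 2**64 .. 2**64 + 4.
values = [2**64 + i for i in range(5)]
high, low = to_uint64_limbs(values, num_limbs=2)
print(high)
print(low)
```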
clz() pdarray[source]

Count the number of leading zeros in each element. See ak.clz.

corr(y: pdarray) numpy.float64[source]

Compute the correlation between self and y using the Pearson correlation coefficient. See arkouda.corr for details.

cov(y: pdarray) numpy.float64[source]

Compute the covariance between self and y.

ctz() pdarray[source]

Count the number of trailing zeros in each element. See ak.ctz.
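
For intuition about clz and ctz, here is a pure-Python sketch that counts leading and trailing zeros. It assumes a fixed 64-bit element width, which is an assumption of this sketch rather than something stated here, and the names `clz64` and `ctz64` are hypothetical. The zero input to `ctz64` is rejected because this section does not say what ak.ctz returns for it.

```python
def clz64(x: int) -> int:
    """Leading zeros of x viewed as a 64-bit unsigned value."""
    if not (0 <= x < (1 << 64)):
        raise ValueError("x must fit in 64 unsigned bits")
    # bit_length() is the position of the highest set bit plus one.
    return 64 - x.bit_length()


def ctz64(x: int) -> int:
    """Trailing zeros of a non-zero x viewed as a 64-bit unsigned value."""
    if not (0 < x < (1 << 64)):
        raise ValueError("x must be non-zero and fit in 64 unsigned bits")
    # x & -x isolates the lowest set bit.
    return (x & -x).bit_length() - 1


print([clz64(v) for v in (1, 2, 255)])
print([ctz64(v) for v in (1, 8, 12)])
```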

dtype
equals(other) arkouda.numpy.dtypes.bool_scalars[source]

Whether pdarrays are the same size and all entries are equal.

Parameters:

other (object) – object to compare.

Returns:

True if the pdarrays are the same, otherwise False.

Return type:

bool

Examples

>>> a = ak.array([1, 2, 3])
>>> a_cpy = ak.array([1, 2, 3])
>>> a.equals(a_cpy)
True
>>> a2 = ak.array([1, 2, 5])
>>> a.equals(a2)
False
fill(value: arkouda.numpy.dtypes.numeric_scalars) None[source]

Fill the array (in place) with a constant value.

Parameters:

value (numeric_scalars)

Raises:

TypeError – Raised if value is not an int, int64, float, or float64

flatten()[source]

Return a copy of the array collapsed into one dimension.

Return type:

A copy of the input array, flattened to one dimension.

Examples

>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.flatten()
array([3 2 1 2 3 1])
format_other(other) str[source]

Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly.

Parameters:

other (object) – The scalar to be cast to the pdarray.dtype

Return type:

string representation of np.dtype corresponding to the other parameter

Raises:

TypeError – Raised if the other parameter cannot be converted to a NumPy dtype

property inferred_type: str | None

Return a string of the type inferred from the values.

info() str[source]

Returns a JSON formatted string containing information about all components of self

Parameters:

None

Returns:

JSON string containing information about all components of self

Return type:

str

is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry

Parameters:

None

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RuntimeError – Raised if there’s a server-side error thrown

Note

This will return True if the object is registered itself or as a component of another object

is_sorted(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff the array (or given axis of the array) is monotonically non-decreasing.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.is_sorted(ak.array([1,2,3,4,5]))
True
>>> ak.is_sorted(ak.array([5,4,3,2,1]))
False
>>> ak.array([[1,2,3],[5,4,3]]).is_sorted(axis=1)
array([True False])

Notes

Works as a method of a pdarray (e.g. a.is_sorted()) or a standalone function (e.g. ak.is_sorted(a))
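
"Monotonically non-decreasing" means every element is less than or equal to its successor, so repeated values still count as sorted. A minimal pure-Python restatement follows; `is_non_decreasing` is a hypothetical helper and not an Arkouda function.

```python
from typing import Sequence


def is_non_decreasing(values: Sequence[float]) -> bool:
    """True if each element is <= the one that follows it."""
    # zip pairs every element with its successor; empty and
    # single-element inputs have no pairs and are treated as sorted.
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_non_decreasing([1, 2, 2, 3]))  # equal neighbours are allowed
rows = [[1, 2, 3], [5, 4, 3]]
print([is_non_decreasing(row) for row in rows])  # per-row check, like axis=1
```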

itemsize
max(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return max of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.max(ak.array([1,2,3,4,5]))
5
>>> ak.max(ak.array([5.5,4.5,3.5,2.5,1.5]))
5.5
>>> ak.array([[1,2,3],[5,4,3]]).max(axis=1)
array([3 5])

Notes

Works as a method of a pdarray (e.g. a.max()) or a standalone function (e.g. ak.max(a))

property max_bits
maxk(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Compute the maximum “k” values. See arkouda.maxk for details.

mean() numpy.float64[source]

Compute the mean. See arkouda.mean for details.

min(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return min of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.min(ak.array([1,2,3,4,5]))
1
>>> ak.min(ak.array([5.5,4.5,3.5,2.5,1.5]))
1.5
>>> ak.array([[1,2,3],[5,4,3]]).min(axis=1)
array([1 3])

Notes

Works as a method of a pdarray (e.g. a.min()) or a standalone function (e.g. ak.min(a))

mink(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Compute the minimum “k” values. See arkouda.mink for details.

name
property nbytes

The size of the pdarray in bytes.

Returns:

The size of the pdarray in bytes.

Return type:

int

ndim
objType = 'pdarray'
opeq(other, op)[source]
parity() pdarray[source]

Find the parity (XOR of all bits) in each element. See ak.parity.

popcount() pdarray[source]

Find the population (number of bits set) in each element. See ak.popcount.
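
Population count and parity are closely related: the parity of an element is the lowest bit of its population count. The sketch below shows this for non-negative Python integers; `popcount` and `parity_from_popcount` are hypothetical names used only for illustration.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    if x < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return bin(x).count("1")


def parity_from_popcount(x: int) -> int:
    """Parity (XOR of all bits) derived from the population count."""
    return popcount(x) & 1  # odd count -> 1, even count -> 0


print([popcount(i) for i in range(10)])
print([parity_from_popcount(i) for i in range(10)])
```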

pretty_print_info() None[source]

Prints information about all components of self in a human readable format

Parameters:

None

Return type:

None

prod(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return prod of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.prod(ak.array([1,2,3,4,5]))
120
>>> ak.prod(ak.array([5.5,4.5,3.5,2.5,1.5]))
324.84375
>>> ak.array([[1,2,3],[5,4,3]]).prod(axis=1)
array([6 60])

Notes

Works as a method of a pdarray (e.g. a.prod()) or a standalone function (e.g. ak.prod(a))

register(user_defined_name: str) pdarray[source]

Register this pdarray with a user defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation; registering a pdarray more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one pdarray at a time.

Parameters:

user_defined_name (str) – user defined name array is to be registered under

Returns:

The same pdarray, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different pdarrays with the same name.

Return type:

pdarray

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – If the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
registered_name: str | None = None
reshape(*shape)[source]

Gives a new shape to an array without changing its data.

Parameters:

shape (int, tuple of ints, or pdarray) – The new shape should be compatible with the original shape.

Returns:

a pdarray with the same data, reshaped to the new shape

Return type:

pdarray

Examples

>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.reshape((3,2))
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape(3,2)
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape((6,1))
array([array([3]) array([2]) array([1]) array([2]) array([3]) array([1])])

Notes

Only available as a method, not as a standalone function; i.e., a.reshape(compatibleShape) is valid, but ak.reshape(a,compatibleShape) is not.

rotl(other) pdarray[source]

Rotate bits left by <other>.

rotr(other) pdarray[source]

Rotate bits right by <other>.
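
A bit rotation moves bits that fall off one end back in at the other end. The sketch below illustrates this for a 64-bit word; the 64-bit width is an assumption of the sketch, and `rotl64` and `rotr64` are hypothetical helpers, not the Arkouda implementation.

```python
MASK64 = (1 << 64) - 1  # keeps results within 64 bits


def rotl64(x: int, n: int) -> int:
    """Rotate the low 64 bits of x left by n positions."""
    n %= 64
    x &= MASK64
    # Bits shifted past bit 63 re-enter at the low end.
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotr64(x: int, n: int) -> int:
    """Rotate the low 64 bits of x right by n positions."""
    # A right rotation by n is a left rotation by 64 - n.
    return rotl64(x, 64 - (n % 64))


print(hex(rotl64(1 << 63, 1)))  # the top bit wraps around to bit 0
print(hex(rotr64(1, 1)))        # bit 0 wraps around to the top bit
```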

save(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None, file_format: str = 'HDF5', file_type: str = 'distribute') str[source]

DEPRECATED. Save the pdarray to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 supports single files, in which case the file name will be exactly the one provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

  • file_format (str {'HDF5', 'Parquet'}) – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, the dataset is written to a single file. When set to distribute, the dataset is written to one file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:
  • RuntimeError – Raised if a server-side error is thrown saving the pdarray

  • ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append

  • TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string

See also

save_all, load, read, to_parquet, to_hdf

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales - 1.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Previously all files saved in Parquet format were saved with a .parquet file extension. This will require you to use load as if you saved the file with the extension. Try this if an older file is not being found.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.save('path/prefix', dataset='array')
Saves the array to numLocales HDF5 files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (HDF5)
>>> a.save('path/prefix.h5', dataset='array')
Saves the array to numLocales HDF5 files with the name
``cwd/path/name_prefix_LOCALE####.h5`` where #### is replaced by each locale number
>>> # Saving with an extension (Parquet)
>>> a.save('path/prefix.parquet', dataset='array', file_format='Parquet')
Saves the array in numLocales Parquet files with the name
``cwd/path/name_prefix_LOCALE####.parquet`` where #### is replaced by each locale number
property shape

Return the shape of an array.

Returns:

The elements of the shape tuple give the lengths of the corresponding array dimensions.

Return type:

tuple of int

size
slice_bits(low, high) pdarray[source]

Returns a pdarray containing only bits from low to high of self.

This is zero indexed and inclusive on both ends, so slicing the bottom 64 bits is pda.slice_bits(0, 63)

Parameters:
  • low (int) – The lowest bit included in the slice (inclusive) zero indexed, so the first bit is 0

  • high (int) – The highest bit included in the slice (inclusive)

Returns:

A new pdarray containing the bits of self from low to high

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> p = ak.array([2**65 + (2**64 - 1)])
>>> bin(p[0])
'0b101111111111111111111111111111111111111111111111111111111111111111'
>>> bin(p.slice_bits(64, 65)[0])
'0b10'
>>> a = ak.array([143,15])
>>> a.slice_bits(1,3)
array([7 7])
>>> a.slice_bits(4,9)
array([8 0])
>>> a.slice_bits(1,9)
array([71 7])
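
The inclusive, zero-indexed slice described above corresponds to a shift followed by a mask. The pure-Python sketch below reproduces the documented results on ordinary integers; this `slice_bits` is a standalone illustrative function and not the pdarray method itself.

```python
def slice_bits(value: int, low: int, high: int) -> int:
    """Return bits low..high (inclusive, zero-indexed) of a non-negative int."""
    if low < 0 or high < low:
        raise ValueError("require 0 <= low <= high")
    width = high - low + 1                        # inclusive on both ends
    return (value >> low) & ((1 << width) - 1)    # drop low bits, mask the rest


a = [143, 15]  # 143 == 0b10001111, 15 == 0b00001111
print([slice_bits(v, 1, 3) for v in a])
print([slice_bits(v, 4, 9) for v in a])
print(bin(slice_bits(2**65 + (2**64 - 1), 64, 65)))
```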
std(ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Compute the standard deviation. See arkouda.std for details.

sum(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return sum of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])

Notes

Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))

to_csv(prefix_path: str, dataset: str = 'array', col_delim: str = ',', overwrite: bool = False)[source]

Write pdarray to CSV file(s). File will contain a single column with the pdarray data. All CSV files written by Arkouda include a header denoting data types of the columns.

Parameters:
  • prefix_path (str) – filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.

  • dataset (str, defaults to "array") – column name to save the pdarray under.

  • col_delim (str, defaults to ",") – value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.

  • overwrite (bool, defaults to False) – If True, existing files matching the provided path will be overwritten. If False and existing files are found, an error will be returned.

Returns:

response message

Return type:

str

Raises:
  • ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist

  • RuntimeError – Raised if one or more of the specified files cannot be opened. If ‘allow_errors’ is true, this may be raised if no values are returned from the server.

  • TypeError – Raised if the server returns an unknown arkouda_type

Notes

  • CSV format is not currently supported by load/load_all operations

  • The column delimiter is expected to be the same for all column names and data

  • Be sure that column delimiters are not found within your data.

  • All CSV files must delimit rows using newline (“\n”) at this time.

to_cuda()[source]

Convert the array to a Numba DeviceNDArray, transferring array data from the arkouda server to Python via ndarray. If the array exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A Numba DeviceNDArray on the GPU with the same attributes and data as the pdarray

Return type:

numba.DeviceNDArray

Raises:
  • ImportError – Raised if CUDA is not available

  • ModuleNotFoundError – Raised if Numba is either not installed or not enabled

  • RuntimeError – Raised if there is a server-side error thrown in the course of retrieving the pdarray.

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_cuda()
array([0, 1, 2, 3, 4])
>>> type(a.to_cuda())
numpy.devicendarray
to_hdf(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', file_type: str = 'distribute') str[source]

Save the pdarray to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, the dataset is written to a single file. When set to distribute, the dataset is written to one file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales - 1 for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.to_hdf('path/prefix', dataset='array')
Saves the array to numLocales HDF5 files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (HDF5)
>>> a.to_hdf('path/prefix.h5', dataset='array')
Saves the array to numLocales HDF5 files with the name
``cwd/path/name_prefix_LOCALE####.h5`` where #### is replaced by each locale number
>>> # Saving to a single file
>>> a.to_hdf('path/prefix.hdf5', dataset='array', file_type='single')
Saves the array into a single HDF5 file on the root node.
``cwd/path/name_prefix.hdf5``
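
To anticipate which files a distributed write produces, the `_LOCALE####` naming pattern from the examples above can be sketched as follows. This is only an approximation under stated assumptions: `expected_locale_files` is a hypothetical helper, the four-digit zero padding is inferred from the `####` placeholder, and the placement of the extension after the locale suffix follows the documented examples.

```python
import os
from typing import List


def expected_locale_files(prefix_path: str, num_locales: int, pad: int = 4) -> List[str]:
    """Guess the per-locale file names for a distributed write."""
    # Split off any extension so the locale suffix lands before it.
    root, ext = os.path.splitext(prefix_path)
    return [f"{root}_LOCALE{i:0{pad}d}{ext}" for i in range(num_locales)]


print(expected_locale_files("path/prefix", 2))
print(expected_locale_files("path/prefix.h5", 2))
```

A list like this can be compared against the directory contents after a write to confirm that one file per locale was produced.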
to_list() List[source]

Convert the array to a list, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A list with the same data as the pdarray

Return type:

list

Raises:

RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

to_ndarray

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_list()
[0, 1, 2, 3, 4]
>>> type(a.to_list())
<class 'list'>
to_ndarray() numpy.ndarray[source]

Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A numpy ndarray with the same attributes and data as the pdarray

Return type:

np.ndarray

Raises:

RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array, to_list

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
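
The transfer limit described in the Notes for to_list and to_ndarray amounts to comparing the array's byte size with client.maxTransferBytes before any data is moved. The sketch below restates that rule; `check_transfer_size` is a hypothetical helper, the byte size is assumed to be size times itemsize, and the limit of 1024 bytes is an arbitrary value chosen for the demonstration, not the client default.

```python
def check_transfer_size(size: int, itemsize: int, max_transfer_bytes: int) -> int:
    """Return the payload size in bytes, or raise if it exceeds the limit."""
    nbytes = size * itemsize  # assumed: element count times bytes per element
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"array is {nbytes} bytes, which exceeds the limit of "
            f"{max_transfer_bytes} bytes"
        )
    return nbytes


print(check_transfer_size(size=5, itemsize=8, max_transfer_bytes=1024))
try:
    check_transfer_size(size=1000, itemsize=8, max_transfer_bytes=1024)
except RuntimeError as err:
    print("refused:", err)
```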
to_parquet(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None) str[source]

Save the pdarray to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales - 1.

  • ‘append’ write mode is supported, but is not efficient.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.to_parquet('path/prefix', dataset='array')
Saves the array to numLocales Parquet files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (Parquet)
>>> a.to_parquet('path/prefix.parquet', dataset='array')
Saves the array to numLocales Parquet files with the name
``cwd/path/name_prefix_LOCALE####.parquet`` where #### is replaced by each locale number
transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars)[source]

Sends a pdarray to a different Arkouda server

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the pdarray is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, using the range {port..(port+numLocales-1)} (e.g., when running an Arkouda server on 4 nodes with 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Return type:

A message indicating a complete transfer

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
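The port usage described for the port parameter can be illustrated with a short standalone sketch. The helper transfer_ports is hypothetical (it is not part of Arkouda) and assumes one consecutive port per locale, as in the 4-node example above.

```python
from typing import List


def transfer_ports(port: int, num_locales: int) -> List[int]:
    """Hypothetical helper: list the consecutive ports a transfer would occupy."""
    if num_locales < 1:
        raise ValueError("num_locales must be at least 1")
    # Assumption: one port per locale, starting at the port passed to transfer().
    return list(range(port, port + num_locales))


print(transfer_ports(1234, 4))
```

A check like this may help confirm that the whole port range is free before starting a transfer.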

unregister() None[source]

Unregister a pdarray in the arkouda server that was previously registered using register() and/or attached to using attach()

Return type:

None

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
update_hdf(prefix_path: str, dataset: str = 'array', repack: bool = True)[source]

Overwrite the dataset with the name provided with this pdarray. If the dataset does not exist, it is added.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files

  • repack (bool) – Default: True. HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to False will yield better performance, but will cause file sizes to expand.

Return type:

str - success message if successful

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added

value_counts()[source]

Count the occurrences of the unique values of self.

Returns:

  • unique_values (pdarray) – The unique values, sorted in ascending order

  • counts (pdarray, int64) – The number of times the corresponding unique value occurs

Examples

>>> ak.array([2, 0, 2, 4, 0, 0]).value_counts()
(array([0, 2, 4]), array([3, 2, 1]))
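For readers who want to see the shape of the result without a running server, the following is a minimal client-side sketch of the same counting logic using only the Python standard library. It operates on a plain list rather than a pdarray, and the function here is an illustration, not Arkouda's implementation.

```python
from collections import Counter


def value_counts(values):
    """Return (unique_values, counts), with unique values in ascending order."""
    tally = Counter(values)
    unique_values = sorted(tally)
    counts = [tally[v] for v in unique_values]
    return unique_values, counts


print(value_counts([2, 0, 2, 4, 0, 0]))
```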
var(ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Compute the variance. See arkouda.var for details.
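As a hedged illustration of what ddof conventionally controls, the sketch below assumes the NumPy-style definition, in which the sum of squared deviations is divided by N - ddof. The function name variance is illustrative only; arkouda.var remains the authoritative description.

```python
def variance(values, ddof=0):
    """Variance with a 'delta degrees of freedom' divisor of N - ddof (assumed convention)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("ddof must be smaller than the number of elements")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [1.0, 2.0, 3.0, 4.0]
print(variance(data))          # divisor N
print(variance(data, ddof=1))  # divisor N - 1
```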

class arkouda.numpy.pdarray(name: str, mydtype: numpy.dtype | str, size: arkouda.numpy.dtypes.int_scalars, ndim: arkouda.numpy.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.numpy.dtypes.int_scalars, max_bits: int | None = None)[source]

The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name

The server-side identifier for the array

Type:

str

dtype

The element type of the array

Type:

dtype

size

The number of elements in the array

Type:

int_scalars

ndim

The rank of the array

Type:

int_scalars

shape

A list or tuple containing the sizes of each dimension of the array

Type:

Sequence[int]

itemsize

The size in bytes of each element

Type:

int_scalars

BinOps
OpEqOps
all(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff all elements of the array along the given axis evaluate to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.all(ak.array([True,False,False]))
False
>>> ak.all(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([False True False])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([False False False])])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).all()
False
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.all()) or a standalone function (e.g. ak.all(a))

any(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff any element of the array along the given axis evaluates to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.any(ak.array([True,False,False]))
True
>>> ak.any(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([True True True])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([True True True])])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).any()
True
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.any()) or a standalone function (e.g. ak.any(a))
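The effect of axis and keepdims on all and any can be sketched with nested Python lists. This is only a model of the reduction semantics shown in the examples above; it does not use pdarrays, and the variable names are illustrative.

```python
rows = [[True, True, False],
        [False, True, True]]

# axis=0 reduces down each column; zip(*rows) yields the columns.
all_axis0 = [all(col) for col in zip(*rows)]
any_axis0 = [any(col) for col in zip(*rows)]

# axis=1 reduces across each row; keepdims=True keeps a length-1 inner dimension.
all_axis1_keepdims = [[all(row)] for row in rows]

# axis=None reduces over every element and yields a single boolean.
all_flat = all(v for row in rows for v in row)
any_flat = any(v for row in rows for v in row)

print(all_axis0, any_axis0, all_axis1_keepdims, all_flat, any_flat)
```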

argmax(axis: int | None = None, keepdims: bool = False) numpy.int64 | numpy.uint64 | pdarray[source]

Return index of the first occurrence of the maximum along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmax(ak.array([1,2,3,4,5]))
4
>>> ak.argmax(ak.array([5.5,4.5,3.5,2.5,1.5]))
0
>>> ak.array([[1,2,3],[5,4,3]]).argmax(axis=1)
array([2 0])

Notes

Works as a method of a pdarray (e.g. a.argmax()) or a standalone function (e.g. ak.argmax(a))
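The "first occurrence" rule matters when the extreme value appears more than once. The sketch below models that tie-breaking on plain Python lists; argmax_first and argmin_first are hypothetical names used only for illustration.

```python
def argmax_first(values):
    """Index of the first occurrence of the maximum."""
    # max() returns the first maximal item it encounters, so ties go to the lowest index.
    return max(range(len(values)), key=values.__getitem__)


def argmin_first(values):
    """Index of the first occurrence of the minimum."""
    return min(range(len(values)), key=values.__getitem__)


print(argmax_first([1, 5, 3, 5]))
print(argmin_first([2, 1, 1, 3]))
```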

argmaxk(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k maximum values of an array. See arkouda.argmaxk for details.

argmin(axis: int | None = None, keepdims: bool = False) numpy.int64 | numpy.uint64 | pdarray[source]

Return index of the first occurrence of the minimum along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmin(ak.array([1,2,3,4,5]))
0
>>> ak.argmin(ak.array([5.5,4.5,3.5,2.5,1.5]))
4
>>> ak.array([[1,2,3],[5,4,3]]).argmin(axis=1)
array([0 2])

Notes

Works as a method of a pdarray (e.g. a.argmin()) or a standalone function (e.g. ak.argmin(a))

argmink(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k minimum values of an array. See arkouda.argmink for details.

astype(dtype) pdarray[source]

Cast values of pdarray to provided dtype

Parameters:

dtype (np.dtype or str) – Dtype to cast to

Examples

>>> ak.array([1,2,3]).astype(ak.float64)
array([1.00000000000000000 2.00000000000000000 3.00000000000000000])
>>> ak.array([1.5,2.5]).astype(ak.int64)
array([1 2])
>>> ak.array([True,False]).astype(ak.int64)
array([1 0])
Returns:

An arkouda pdarray with values converted to the specified data type

Return type:

ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

static attach(user_defined_name: str) pdarray[source]

Return a pdarray attached to the registered name in the arkouda server, which was registered using register()

Parameters:

user_defined_name (str) – user defined name which array was registered under

Returns:

pdarray which is bound to the corresponding server side component which was registered with user_defined_name

Return type:

pdarray

Raises:

TypeError – Raised if user_defined_name is not a str

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
bigint_to_uint_arrays() List[pdarray][source]

Creates a list of uint pdarrays from a bigint pdarray. The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.

Returns:

A list of uint pdarrays where: The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.

Return type:

List[pdarray]

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> a = ak.arange(2**64, 2**64 + 5)
>>> a
array([18446744073709551616 18446744073709551617 18446744073709551618
18446744073709551619 18446744073709551620])
>>> a.bigint_to_uint_arrays()
[array([1 1 1 1 1]), array([0 1 2 3 4])]
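The highest-limb-first layout can be reproduced with ordinary Python integers. The sketch below is a client-side model only; bigint_to_uint_limbs is a hypothetical helper, and the number of 64-bit limbs is passed explicitly as an assumption.

```python
def bigint_to_uint_limbs(value, num_limbs):
    """Split a non-negative int into 64-bit limbs, highest limb first."""
    mask = (1 << 64) - 1
    return [(value >> (64 * i)) & mask for i in reversed(range(num_limbs))]


values = [2**64 + i for i in range(5)]
limbs = [bigint_to_uint_limbs(v, 2) for v in values]

# Transpose per-element limbs into one list per limb, matching the list-of-arrays layout.
high, low = (list(column) for column in zip(*limbs))
print(high, low)
```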
clz() pdarray[source]

Count the number of leading zeros in each element. See ak.clz.

corr(y: pdarray) numpy.float64[source]

Compute the correlation between self and y using pearson correlation coefficient. See arkouda.corr for details.

cov(y: pdarray) numpy.float64[source]

Compute the covariance between self and y.

ctz() pdarray[source]

Count the number of trailing zeros in each element. See ak.ctz.

dtype
equals(other) arkouda.numpy.dtypes.bool_scalars[source]

Whether pdarrays are the same size and all entries are equal.

Parameters:

other (object) – object to compare.

Returns:

True if the pdarrays are the same, otherwise False.

Return type:

bool

Examples

>>> a = ak.array([1, 2, 3])
>>> a_cpy = ak.array([1, 2, 3])
>>> a.equals(a_cpy)
True
>>> a2 = ak.array([1, 2, 5])
>>> a.equals(a2)
False
fill(value: arkouda.numpy.dtypes.numeric_scalars) None[source]

Fill the array (in place) with a constant value.

Parameters:

value (numeric_scalars)

Raises:

TypeError – Raised if value is not an int, int64, float, or float64

flatten()[source]

Return a copy of the array collapsed into one dimension.

Return type:

A copy of the input array, flattened to one dimension.

Examples

>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.flatten()
array([3 2 1 2 3 1])
format_other(other) str[source]

Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly.

Parameters:

other (object) – The scalar to be cast to the pdarray.dtype

Return type:

string representation of np.dtype corresponding to the other parameter

Raises:

TypeError – Raised if the other parameter cannot be converted to Numpy dtype

property inferred_type: str | None

Return a string of the type inferred from the values.

info() str[source]

Returns a JSON formatted string containing information about all components of self

Parameters:

None

Returns:

JSON string containing information about all components of self

Return type:

str

is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry

Parameters:

None

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RuntimeError – Raised if there’s a server-side error thrown

Note

This will return True if the object is registered itself or as a component of another object

is_sorted(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff the array (or given axis of the array) is monotonically non-decreasing.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.is_sorted(ak.array([1,2,3,4,5]))
True
>>> ak.is_sorted(ak.array([5,4,3,2,1]))
False
>>> ak.array([[1,2,3],[5,4,3]]).is_sorted(axis=1)
array([True False])

Notes

Works as a method of a pdarray (e.g. a.is_sorted()) or a standalone function (e.g. ak.is_sorted(a))
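"Monotonically non-decreasing" means that equal neighbours are allowed. A minimal sketch on plain Python lists (not pdarrays) makes the comparison explicit; the row-wise call mirrors the axis=1 example above.

```python
def is_sorted(values):
    """True if every element is <= its successor."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(is_sorted([1, 2, 2, 3]))   # repeated values still count as sorted
print(is_sorted([5, 4, 3, 2, 1]))

# Row-wise check, analogous to axis=1 on a 2-D array.
print([is_sorted(row) for row in [[1, 2, 3], [5, 4, 3]]])
```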

itemsize
max(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return max of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.max(ak.array([1,2,3,4,5]))
5
>>> ak.max(ak.array([5.5,4.5,3.5,2.5,1.5]))
5.5
>>> ak.array([[1,2,3],[5,4,3]]).max(axis=1)
array([3 5])

Notes

Works as a method of a pdarray (e.g. a.max()) or a standalone function (e.g. ak.max(a))

property max_bits
maxk(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Compute the maximum “k” values. See arkouda.maxk for details.

mean() numpy.float64[source]

Compute the mean. See arkouda.mean for details.

min(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return min of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.min(ak.array([1,2,3,4,5]))
1
>>> ak.min(ak.array([5.5,4.5,3.5,2.5,1.5]))
1.5
>>> ak.array([[1,2,3],[5,4,3]]).min(axis=1)
array([1 3])

Notes

Works as a method of a pdarray (e.g. a.min()) or a standalone function (e.g. ak.min(a))

mink(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Compute the minimum “k” values. See arkouda.mink for details.

name
property nbytes

The size of the pdarray in bytes.

Returns:

The size of the pdarray in bytes.

Return type:

int

ndim
objType = 'pdarray'
opeq(other, op)[source]
parity() pdarray[source]

Find the parity (XOR of all bits) in each element. See ak.parity.

popcount() pdarray[source]

Find the population (number of bits set) in each element. See ak.popcount.
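The four bit-counting methods (clz, ctz, parity, popcount) can be modelled on single Python integers. The sketch below assumes 64-bit elements, which is an assumption of this example rather than something stated here, and it deliberately leaves ctz of zero undefined; see ak.clz, ak.ctz, ak.parity, and ak.popcount for the actual behaviour.

```python
WIDTH = 64               # assumed element width in bits
MASK = (1 << WIDTH) - 1  # keeps values inside that width


def popcount(x: int) -> int:
    """Number of set bits."""
    return bin(x & MASK).count("1")


def parity(x: int) -> int:
    """XOR of all bits: 1 if the number of set bits is odd, else 0."""
    return popcount(x) & 1


def clz(x: int) -> int:
    """Leading zeros within WIDTH bits."""
    return WIDTH - (x & MASK).bit_length()


def ctz(x: int) -> int:
    """Trailing zeros of a nonzero value."""
    x &= MASK
    if x == 0:
        raise ValueError("zero is left out of this sketch; see ak.ctz for its convention")
    # x & -x isolates the lowest set bit.
    return (x & -x).bit_length() - 1


for value in (1, 8, 143):
    print(value, popcount(value), parity(value), clz(value), ctz(value))
```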

pretty_print_info() None[source]

Prints information about all components of self in a human readable format

Parameters:

None

Return type:

None

prod(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return prod of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.prod(ak.array([1,2,3,4,5]))
120
>>> ak.prod(ak.array([5.5,4.5,3.5,2.5,1.5]))
324.84375
>>> ak.array([[1,2,3],[5,4,3]]).prod(axis=1)
array([6 60])

Notes

Works as a method of a pdarray (e.g. a.prod()) or a standalone function (e.g. ak.prod(a))

register(user_defined_name: str) pdarray[source]

Register this pdarray with a user defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation; registering a pdarray more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one pdarray at a time.

Parameters:

user_defined_name (str) – user defined name array is to be registered under

Returns:

The same pdarray, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different pdarrays with the same name.

Return type:

pdarray

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – Raised if the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.
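The registration rules described above (one name per pdarray at a time, re-registering replaces the earlier name, and a name must be freed before another array can take it) can be pictured with a toy in-memory model. Everything in this sketch is hypothetical: ToyRegistry and ToyRegistrationError are invented for illustration and say nothing about how the Arkouda server actually stores its registry.

```python
class ToyRegistrationError(Exception):
    """Stand-in for a registration failure in this toy model."""


class ToyRegistry:
    """Hypothetical model of the documented register/attach/unregister rules."""

    def __init__(self):
        self._by_name = {}  # registered name -> object id
        self._by_obj = {}   # object id -> registered name

    def register(self, obj_id, name):
        owner = self._by_name.get(name)
        if owner is not None and owner != obj_id:
            # A name can belong to only one object at a time.
            raise ToyRegistrationError(f"{name!r} is already registered")
        # Registering again replaces the object's previous name.
        old = self._by_obj.pop(obj_id, None)
        if old is not None:
            del self._by_name[old]
        self._by_name[name] = obj_id
        self._by_obj[obj_id] = name

    def attach(self, name):
        return self._by_name[name]

    def unregister(self, obj_id):
        name = self._by_obj.pop(obj_id)
        del self._by_name[name]

    def names(self):
        return sorted(self._by_name)


reg = ToyRegistry()
reg.register("id_1", "my_zeros")
reg.register("id_1", "renamed")  # the earlier name is dropped
print(reg.names())
```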

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
registered_name: str | None = None
reshape(*shape)[source]

Gives a new shape to an array without changing its data.

Parameters:

shape (int, tuple of ints, or pdarray) – The new shape should be compatible with the original shape.

Returns:

a pdarray with the same data, reshaped to the new shape

Return type:

pdarray

Examples

>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.reshape((3,2))
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape(3,2)
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape((6,1))
array([array([3]) array([2]) array([1]) array([2]) array([3]) array([1])])

Notes

Only available as a method, not as a standalone function, i.e., a.reshape(compatibleShape) is valid, but ak.reshape(a,compatibleShape) is not.
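A "compatible" shape is one whose dimensions multiply to the number of elements. The sketch below models a two-dimensional reshape on a flat Python list, assuming row-major element order (which is consistent with the examples above); reshape_2d is a hypothetical helper, not an Arkouda function.

```python
def reshape_2d(flat, rows, cols):
    """Regroup a flat list into rows x cols, filling row by row."""
    if rows * cols != len(flat):
        raise ValueError("new shape is not compatible with the number of elements")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


flat = [3, 2, 1, 2, 3, 1]  # the elements of [[3,2,1],[2,3,1]] in row-major order
print(reshape_2d(flat, 3, 2))
print(reshape_2d(flat, 6, 1))
```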

rotl(other) pdarray[source]

Rotate bits left by <other>.

rotr(other) pdarray[source]

Rotate bits right by <other>.
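Bit rotation differs from shifting in that bits pushed off one end re-enter at the other. The sketch below models rotl and rotr on single Python integers, assuming a 64-bit element width (an assumption of this example); the pdarray methods apply the operation element-wise on the server.

```python
WIDTH = 64               # assumed element width in bits
MASK = (1 << WIDTH) - 1


def rotl(x: int, n: int) -> int:
    """Rotate the low WIDTH bits of x left by n positions."""
    x &= MASK
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def rotr(x: int, n: int) -> int:
    """Rotate the low WIDTH bits of x right by n positions."""
    x &= MASK
    n %= WIDTH
    return ((x >> n) | (x << (WIDTH - n))) & MASK


print(rotl(1, 1))              # lowest bit moves up one position
print(rotr(1, 1) == 1 << 63)   # lowest bit wraps around to the top
```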

save(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None, file_format: str = 'HDF5', file_type: str = 'distribute') str[source]

DEPRECATED. Save the pdarray to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 supports single files, in which case the file name will only be that provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

  • file_format (str {'HDF5', 'Parquet'}) – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, the dataset is written to a single file. When distribute, the dataset is written to one file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:
  • RuntimeError – Raised if a server-side error is thrown saving the pdarray

  • ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append

  • TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string

See also

save_all, load, read, to_parquet, to_hdf

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Previously all files saved in Parquet format were saved with a .parquet file extension. This will require you to use load as if you saved the file with the extension. Try this if an older file is not being found.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.save('path/prefix', dataset='array')
Saves the array to numLocales HDF5 files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (HDF5)
>>> a.save('path/prefix.h5', dataset='array')
Saves the array to numLocales HDF5 files with the name
``cwd/path/name_prefix_LOCALE####.h5`` where #### is replaced by each locale number
>>> # Saving with an extension (Parquet)
>>> a.save('path/prefix.parquet', dataset='array', file_format='Parquet')
Saves the array in numLocales Parquet files with the name
``cwd/path/name_prefix_LOCALE####.parquet`` where #### is replaced by each locale number
property shape

Return the shape of an array.

Returns:

The elements of the shape tuple give the lengths of the corresponding array dimensions.

Return type:

tuple of int

size
slice_bits(low, high) pdarray[source]

Returns a pdarray containing only bits from low to high of self.

This is zero indexed and inclusive on both ends, so slicing the bottom 64 bits is pda.slice_bits(0, 63)

Parameters:
  • low (int) – The lowest bit included in the slice (inclusive) zero indexed, so the first bit is 0

  • high (int) – The highest bit included in the slice (inclusive)

Returns:

A new pdarray containing the bits of self from low to high

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> p = ak.array([2**65 + (2**64 - 1)])
>>> bin(p[0])
'0b101111111111111111111111111111111111111111111111111111111111111111'
>>> bin(p.slice_bits(64, 65)[0])
'0b10'
>>> a = ak.array([143,15])
>>> a.slice_bits(1,3)
array([7 7])
>>> a.slice_bits(4,9)
array([8 0])
>>> a.slice_bits(1,9)
array([71 7])
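The inclusive, zero-indexed slice can be expressed as a shift followed by a mask. The sketch below applies that arithmetic to single Python integers and reproduces the values from the examples above; it is a client-side model, not the server implementation.

```python
def slice_bits(value: int, low: int, high: int) -> int:
    """Keep bits low..high (inclusive, zero-indexed) of a non-negative int."""
    if low < 0 or high < low:
        raise ValueError("expected 0 <= low <= high")
    width = high - low + 1
    # Shift the slice down to bit 0, then mask off everything above it.
    return (value >> low) & ((1 << width) - 1)


print([slice_bits(v, 1, 3) for v in (143, 15)])
print([slice_bits(v, 4, 9) for v in (143, 15)])
print(bin(slice_bits(2**65 + (2**64 - 1), 64, 65)))
```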
std(ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Compute the standard deviation. See arkouda.std for details.

sum(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return sum of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])

Notes

Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))

to_csv(prefix_path: str, dataset: str = 'array', col_delim: str = ',', overwrite: bool = False)[source]

Write pdarray to CSV file(s). The file will contain a single column with the pdarray data. All CSV files written by Arkouda include a header denoting data types of the columns.

Parameters:
  • prefix_path (str) – filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.

  • dataset (str, defaults to "array") – column name to save the pdarray under.

  • col_delim (str, defaults to ",") – value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.

  • overwrite (bool, defaults to False) – If True, existing files matching the provided path will be overwritten. If False and existing files are found, an error will be returned.

Returns:

response message

Return type:

str

Raises:
  • ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist

  • RuntimeError – Raised if one or more of the specified files cannot be opened. If ‘allow_errors’ is true, this may be raised if no values are returned from the server.

  • TypeError – Raised if the server returns an unknown arkouda_type

Notes

  • CSV format is not currently supported by load/load_all operations

  • The column delimiter is expected to be the same for all column names and data

  • Be sure that column delimiters are not found within your data.

  • All CSV files must delimit rows using newline ("\n") at this time.
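Because the column delimiter must not occur in the data, it can be worth checking values on the client before writing. The sketch below is one possible pre-check on a plain Python list; delimiter_is_safe is a hypothetical helper and not part of Arkouda.

```python
def delimiter_is_safe(values, col_delim=","):
    """True if col_delim never appears in the string form of any value."""
    return all(col_delim not in str(v) for v in values)


print(delimiter_is_safe([1, 2, 3]))
print(delimiter_is_safe(["a,b", "c"], col_delim=","))
print(delimiter_is_safe(["a,b", "c"], col_delim="|"))
```

If the check fails, choosing a different col_delim is usually simpler than altering the data.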

to_cuda()[source]

Convert the array to a Numba DeviceND array, transferring array data from the arkouda server to Python via ndarray. If the array exceeds a builtin size limit, a RuntimeError is raised.

Returns:

A Numba ndarray with the same attributes and data as the pdarray; on GPU

Return type:

numba.DeviceNDArray

Raises:
  • ImportError – Raised if CUDA is not available

  • ModuleNotFoundError – Raised if Numba is either not installed or not enabled

  • RuntimeError – Raised if there is a server-side error thrown in the course of retrieving the pdarray.

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_cuda()
array([0, 1, 2, 3, 4])
>>> type(a.to_cuda())
numpy.devicendarray
to_hdf(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', file_type: str = 'distribute') str[source]

Save the pdarray to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, the dataset is written to a single file. When distribute, the dataset is written to one file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.to_hdf('path/prefix', dataset='array')
Saves the array to numLocales HDF5 files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (HDF5)
>>> a.to_hdf('path/prefix.h5', dataset='array')
Saves the array to numLocales HDF5 files with the name
``cwd/path/name_prefix_LOCALE####.h5`` where #### is replaced by each locale number
>>> # Saving to a single file
>>> a.to_hdf('path/prefix.hdf5', dataset='array', file_type='single')
Saves the array into a single HDF5 file on the root node.
``cwd/path/name_prefix.hdf5``
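The naming pattern in these examples can be sketched as a small helper that predicts which files a call would produce. This rests on several assumptions that are not confirmed here: that the #### placeholder is a four-digit, zero-padded locale index, that indices run from 0 to numLocales - 1, and that the suffix is inserted before the extension. The function expected_files is hypothetical.

```python
import os


def expected_files(prefix_path, num_locales, file_type="distribute"):
    """Hypothetical helper: predict output file names for a save call."""
    if file_type == "single":
        # A single-file write uses the prefix path unchanged.
        return [prefix_path]
    root, ext = os.path.splitext(prefix_path)
    # Assumed format: _LOCALE plus a zero-padded four-digit index, before the extension.
    return [f"{root}_LOCALE{i:04d}{ext}" for i in range(num_locales)]


print(expected_files("path/prefix.h5", 2))
print(expected_files("path/prefix.hdf5", 4, file_type="single"))
```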
to_list() List[source]

Convert the array to a list, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A list with the same data as the pdarray

Return type:

list

Raises:

RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.
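The size guard described in this note can be pictured as a simple comparison made before any data is moved. The sketch below is a standalone model: check_transfer_size and its max_transfer_bytes argument are hypothetical names, and it assumes the array's byte count is size multiplied by itemsize.

```python
def check_transfer_size(size, itemsize, max_transfer_bytes):
    """Return the byte count, or raise if it exceeds the configured limit."""
    nbytes = size * itemsize  # assumed definition of the array's size in bytes
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"array is {nbytes} bytes, which exceeds the limit of {max_transfer_bytes}"
        )
    return nbytes


print(check_transfer_size(size=5, itemsize=8, max_transfer_bytes=1024))
```

Running a check like this against a.size and a.itemsize before calling to_list() or to_ndarray() gives an early indication of whether the transfer would be rejected.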

See also

to_ndarray

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_list()
[0, 1, 2, 3, 4]
>>> type(a.to_list())
<class 'list'>
to_ndarray() numpy.ndarray[source]

Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A numpy ndarray with the same attributes and data as the pdarray

Return type:

np.ndarray

Raises:

RuntimeError – Raised if a server-side error is thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array, to_list

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
to_parquet(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None) str[source]

Save the pdarray to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales-1.

  • 'append' write mode is supported, but is not efficient.

  • If any of the output files already exist and the mode is 'truncate', they will be overwritten. If the mode is 'append' and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.to_parquet('path/prefix', dataset='array')
Saves the array to numLocales Parquet files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (Parquet)
>>> a.to_parquet('path/prefix.parquet', dataset='array')
Saves the array to numLocales Parquet files with the name
``cwd/path/name_prefix_LOCALE####.parquet`` where #### is replaced by each locale number
transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars)[source]

Sends a pdarray to a different Arkouda server

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the pdarray is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, so it uses the ports in the range {port..(port+numLocales-1)} (e.g., when running an Arkouda server of 4 nodes with 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Return type:

A message indicating a complete transfer

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
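
The port arithmetic described for the port parameter above can be checked with a short sketch. `transfer_ports` is a hypothetical helper, not an Arkouda function; it only enumerates the ports that the description says will be used.

```python
def transfer_ports(port, num_locales):
    """Hypothetical helper: ports used when sending from num_locales locales."""
    # One port per locale, starting at the port passed by the caller.
    return list(range(port, port + num_locales))
```

With a 4-locale server and port 1234, this reproduces the ports named in the description (1234 through 1237), all of which need to be open.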

unregister() None[source]

Unregister a pdarray in the arkouda server which was previously registered using register() and/or attached to using attach()

Return type:

None

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
update_hdf(prefix_path: str, dataset: str = 'array', repack: bool = True)[source]

Overwrite the dataset with the name provided with this pdarray. If the dataset does not exist, it is added.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files

  • repack (bool) – Default: True. HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to False will yield better performance, but will cause file sizes to expand.

Return type:

str - success message if successful

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added
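
The file-name fallback in the first note can be illustrated with a regular expression. This is a sketch under assumptions: `looks_distributed` is a hypothetical name, and the pattern assumes the locale suffix is exactly four digits, optionally followed by a file extension.

```python
import re

# Assumed pattern: "_LOCALE" plus four digits, then an optional extension.
_LOCALE_SUFFIX = re.compile(r"_LOCALE\d{4}(\.[^./]+)?$")


def looks_distributed(filename):
    """Hypothetical check: does the name carry a per-locale suffix?"""
    return _LOCALE_SUFFIX.search(filename) is not None
```

Names such as `path/prefix_LOCALE0000.h5` match this pattern, while `path/prefix.hdf5` does not.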

value_counts()[source]

Count the occurrences of the unique values of self.

Returns:

  • unique_values (pdarray) – The unique values, sorted in ascending order

  • counts (pdarray, int64) – The number of times the corresponding unique value occurs

Examples

>>> ak.array([2, 0, 2, 4, 0, 0]).value_counts()
(array([0, 2, 4]), array([3, 2, 1]))
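
The semantics of the two returned arrays can be mirrored on a plain Python list. This is a client-side model for illustration only; the real method runs on the server and returns pdarrays.

```python
from collections import Counter


def value_counts(values):
    """Plain-Python model: sorted unique values and their occurrence counts."""
    counts = Counter(values)
    unique_values = sorted(counts)  # ascending order, as documented
    return unique_values, [counts[v] for v in unique_values]
```

Applied to `[2, 0, 2, 4, 0, 0]`, the model yields the same values and counts as the example above.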
var(ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Compute the variance. See arkouda.var for details.

class arkouda.numpy.pdarray(name: str, mydtype: numpy.dtype | str, size: arkouda.numpy.dtypes.int_scalars, ndim: arkouda.numpy.dtypes.int_scalars, shape: Sequence[int], itemsize: arkouda.numpy.dtypes.int_scalars, max_bits: int | None = None)[source]

The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name

The server-side identifier for the array

Type:

str

dtype

The element type of the array

Type:

dtype

size

The number of elements in the array

Type:

int_scalars

ndim

The rank of the array

Type:

int_scalars

shape

A list or tuple containing the sizes of each dimension of the array

Type:

Sequence[int]

itemsize

The size in bytes of each element

Type:

int_scalars

BinOps
OpEqOps
all(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff all elements of the array along the given axis evaluate to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.all(ak.array([True,False,False]))
False
>>> ak.all(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([False True False])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([False False False])])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).all()
False
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.all()) or a standalone function (e.g. ak.all(a))
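
The effect of axis and keepdims in the examples above can be reproduced on nested Python lists. The function `all_2d` is a hypothetical model restricted to 2-D input; it does not model keepdims when axis is None.

```python
def all_2d(rows, axis=None, keepdims=False):
    """Plain-Python model of `all` on a 2-D nested list."""
    if axis is None:
        # Reduce over every element; keepdims is not modeled for this case.
        return all(all(row) for row in rows)
    if axis == 0:
        reduced = [all(col) for col in zip(*rows)]  # collapse the rows
        return [reduced] if keepdims else reduced
    if axis == 1:
        reduced = [all(row) for row in rows]  # collapse the columns
        return [[v] for v in reduced] if keepdims else reduced
    raise ValueError("axis must be None, 0, or 1 for a 2-D input")
```

With keepdims=True the reduced axis survives as a dimension of length one, which is why the results in the examples are nested one level deeper.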

any(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff any element of the array along the given axis evaluates to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.any(ak.array([True,False,False]))
True
>>> ak.any(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([True True True])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([True True True])])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).any()
True
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.any()) or a standalone function (e.g. ak.any(a))

argmax(axis: int | None = None, keepdims: bool = False) numpy.int64 | numpy.uint64 | pdarray[source]

Return index of the first occurrence of the maximum along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmax(ak.array([1,2,3,4,5]))
4
>>> ak.argmax(ak.array([5.5,4.5,3.5,2.5,1.5]))
0
>>> ak.array([[1,2,3],[5,4,3]]).argmax(axis=1)
array([2 0])

Notes

Works as a method of a pdarray (e.g. a.argmax()) or a standalone function (e.g. ak.argmax(a))
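
The "first occurrence" rule matters when the maximum is repeated. A plain-Python model of the 1-D case is sketched below; `argmax_first` is a hypothetical name used only for illustration.

```python
def argmax_first(values):
    """Index of the first occurrence of the maximum in a non-empty list."""
    if not values:
        raise ValueError("argmax of an empty sequence is undefined")
    best = 0
    for i, v in enumerate(values):
        # Strict comparison keeps the earliest index when values tie.
        if v > values[best]:
            best = i
    return best
```

For `[1, 3, 3, 2]` the maximum 3 appears at indices 1 and 2, and the model returns 1.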

argmaxk(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k maximum values of an array. See arkouda.argmaxk for details.

argmin(axis: int | None = None, keepdims: bool = False) numpy.int64 | numpy.uint64 | pdarray[source]

Return index of the first occurrence of the minimum along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmin(ak.array([1,2,3,4,5]))
0
>>> ak.argmin(ak.array([5.5,4.5,3.5,2.5,1.5]))
4
>>> ak.array([[1,2,3],[5,4,3]]).argmin(axis=1)
array([0 2])

Notes

Works as a method of a pdarray (e.g. a.argmin()) or a standalone function (e.g. ak.argmin(a))

argmink(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Finds the indices corresponding to the k minimum values of an array. See arkouda.argmink for details.

astype(dtype) pdarray[source]

Cast values of pdarray to provided dtype

Parameters:

dtype (np.dtype or str) – Dtype to cast to

Examples

>>> ak.array([1,2,3]).astype(ak.float64)
array([1.00000000000000000 2.00000000000000000 3.00000000000000000])
>>> ak.array([1.5,2.5]).astype(ak.int64)
array([1 2])
>>> ak.array([True,False]).astype(ak.int64)
array([1 0])
Returns:

An arkouda pdarray with values converted to the specified data type

Return type:

ak.pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

static attach(user_defined_name: str) pdarray[source]

Return a pdarray attached to the registered name in the arkouda server, which was registered using register()

Parameters:

user_defined_name (str) – user defined name which array was registered under

Returns:

pdarray which is bound to the corresponding server side component which was registered with user_defined_name

Return type:

pdarray

Raises:

TypeError – Raised if user_defined_name is not a str

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
bigint_to_uint_arrays() List[pdarray][source]

Creates a list of uint pdarrays from a bigint pdarray. The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.

Returns:

A list of uint pdarrays where: The first item in return will be the highest 64 bits of the bigint pdarray and the last item will be the lowest 64 bits.

Return type:

List[pdarray]

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> a = ak.arange(2**64, 2**64 + 5)
>>> a
array([18446744073709551616 18446744073709551617 18446744073709551618
18446744073709551619 18446744073709551620])
>>> a.bigint_to_uint_arrays()
[array([1 1 1 1 1]), array([0 1 2 3 4])]
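
The split into 64-bit pieces, highest bits first, can be modeled on a single Python integer. `split_into_uint64_limbs` is a hypothetical helper; the number of limbs is passed explicitly here, whereas the real method determines it from the array.

```python
def split_into_uint64_limbs(value, num_limbs):
    """Split a non-negative int into 64-bit limbs, most significant first."""
    mask = (1 << 64) - 1
    return [(value >> (64 * i)) & mask for i in reversed(range(num_limbs))]


# Element-wise application, then regrouping by limb position,
# mirrors the list-of-arrays layout shown in the example above.
values = [2**64 + i for i in range(5)]
high, low = zip(*(split_into_uint64_limbs(v, 2) for v in values))
```

Here `high` holds the upper 64 bits of every element and `low` holds the lower 64 bits, matching the order of the returned list.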
clz() pdarray[source]

Count the number of leading zeros in each element. See ak.clz.

corr(y: pdarray) numpy.float64[source]

Compute the correlation between self and y using the Pearson correlation coefficient. See arkouda.corr for details.

cov(y: pdarray) numpy.float64[source]

Compute the covariance between self and y.

ctz() pdarray[source]

Count the number of trailing zeros in each element. See ak.ctz.

dtype
equals(other) arkouda.numpy.dtypes.bool_scalars[source]

Whether pdarrays are the same size and all entries are equal.

Parameters:

other (object) – object to compare.

Returns:

True if the pdarrays are the same, otherwise False.

Return type:

bool

Examples

>>> a = ak.array([1, 2, 3])
>>> a_cpy = ak.array([1, 2, 3])
>>> a.equals(a_cpy)
True
>>> a2 = ak.array([1, 2, 5])
>>> a.equals(a2)
False
fill(value: arkouda.numpy.dtypes.numeric_scalars) None[source]

Fill the array (in place) with a constant value.

Parameters:

value (numeric_scalars)

Raises:

TypeError – Raised if value is not an int, int64, float, or float64

flatten()[source]

Return a copy of the array collapsed into one dimension.

Return type:

A copy of the input array, flattened to one dimension.

Examples

>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.flatten()
array([3 2 1 2 3 1])
format_other(other) str[source]

Attempt to cast scalar other to the element dtype of this pdarray, and print the resulting value to a string (e.g. for sending to a server command). The user should not call this function directly.

Parameters:

other (object) – The scalar to be cast to the pdarray.dtype

Return type:

string representation of np.dtype corresponding to the other parameter

Raises:

TypeError – Raised if the other parameter cannot be converted to Numpy dtype

property inferred_type: str | None

Return a string of the type inferred from the values.

info() str[source]

Returns a JSON formatted string containing information about all components of self

Parameters:

None

Returns:

JSON string containing information about all components of self

Return type:

str

is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry

Parameters:

None

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RuntimeError – Raised if there’s a server-side error thrown

Note

This will return True if the object is registered itself or as a component of another object

is_sorted(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.bool_scalars | pdarray[source]

Return True iff the array (or given axis of the array) is monotonically non-decreasing.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.is_sorted(ak.array([1,2,3,4,5]))
True
>>> ak.is_sorted(ak.array([5,4,3,2,1]))
False
>>> ak.array([[1,2,3],[5,4,3]]).is_sorted(axis=1)
array([True False])

Notes

Works as a method of a pdarray (e.g. a.is_sorted()) or a standalone function (e.g. ak.is_sorted(a))
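
"Monotonically non-decreasing" means every element is less than or equal to its successor, so repeated values still count as sorted. A plain-Python model of the 1-D case follows; the name `is_sorted` here refers to this local sketch, not to the Arkouda function.

```python
def is_sorted(values):
    """True if each element is <= the next one (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Row-wise application corresponds to axis=1 on a 2-D array.
rows = [[1, 2, 3], [5, 4, 3]]
per_row = [is_sorted(row) for row in rows]
```

The row-wise result agrees with the axis=1 example above.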

itemsize
max(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return max of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.max(ak.array([1,2,3,4,5]))
5
>>> ak.max(ak.array([5.5,4.5,3.5,2.5,1.5]))
5.5
>>> ak.array([[1,2,3],[5,4,3]]).max(axis=1)
array([3 5])

Notes

Works as a method of a pdarray (e.g. a.max()) or a standalone function (e.g. ak.max(a))

property max_bits
maxk(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Compute the maximum “k” values. See arkouda.maxk for details.

mean() numpy.float64[source]

Compute the mean. See arkouda.mean for details.

min(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return min of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.min(ak.array([1,2,3,4,5]))
1
>>> ak.min(ak.array([5.5,4.5,3.5,2.5,1.5]))
1.5
>>> ak.array([[1,2,3],[5,4,3]]).min(axis=1)
array([1 3])

Notes

Works as a method of a pdarray (e.g. a.min()) or a standalone function (e.g. ak.min(a))

mink(k: arkouda.numpy.dtypes.int_scalars) pdarray[source]

Compute the minimum “k” values. See arkouda.mink for details.

name
property nbytes

The size of the pdarray in bytes.

Returns:

The size of the pdarray in bytes.

Return type:

int

ndim
objType = 'pdarray'
opeq(other, op)[source]
parity() pdarray[source]

Find the parity (XOR of all bits) in each element. See ak.parity.

popcount() pdarray[source]

Find the population (number of bits set) in each element. See ak.popcount.
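
Both bit statistics are easy to state on a single integer. The sketch below assumes 64-bit elements and works on plain Python ints; the function names mirror the methods but are local illustrations only.

```python
MASK64 = (1 << 64) - 1  # assumption: elements are 64 bits wide


def popcount(x):
    """Number of bits set in the 64-bit representation of x."""
    return bin(x & MASK64).count("1")


def parity(x):
    """XOR of all bits: 1 if the number of set bits is odd, else 0."""
    return popcount(x) & 1
```

For example, `0b1011` has three bits set, so its population count is 3 and its parity is 1.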

pretty_print_info() None[source]

Prints information about all components of self in a human readable format

Parameters:

None

Return type:

None

prod(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return the product of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.prod(ak.array([1,2,3,4,5]))
120
>>> ak.prod(ak.array([5.5,4.5,3.5,2.5,1.5]))
324.84375
>>> ak.array([[1,2,3],[5,4,3]]).prod(axis=1)
array([6 60])

Notes

Works as a method of a pdarray (e.g. a.prod()) or a standalone function (e.g. ak.prod(a))

register(user_defined_name: str) pdarray[source]

Register this pdarray with a user defined name in the arkouda server so it can be attached to later using pdarray.attach(). This is an in-place operation; registering a pdarray more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one pdarray at a time.

Parameters:

user_defined_name (str) – user defined name array is to be registered under

Returns:

The same pdarray, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different pdarrays with the same name.

Return type:

pdarray

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – Raised if the server was unable to register the pdarray with the user_defined_name. If the user is attempting to register more than one pdarray with the same name, the former should be unregistered first to free up the registration name.

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
registered_name: str | None = None
reshape(*shape)[source]

Gives a new shape to an array without changing its data.

Parameters:

shape (int, tuple of ints, or pdarray) – The new shape should be compatible with the original shape.

Returns:

a pdarray with the same data, reshaped to the new shape

Return type:

pdarray

Examples

>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.reshape((3,2))
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape(3,2)
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape((6,1))
array([array([3]) array([2]) array([1]) array([2]) array([3]) array([1])])

Notes

Only available as a method, not as a standalone function, i.e., a.reshape(compatibleShape) is valid, but ak.reshape(a,compatibleShape) is not.
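
A shape is compatible when the product of its dimensions equals the number of elements. The sketch below models the 2-D case on a flat Python list, assuming row-major order (which is consistent with the examples above); `reshape_2d` is a hypothetical helper.

```python
def reshape_2d(flat, rows, cols):
    """Regroup a flat list into `rows` lists of `cols` items (row-major)."""
    if rows * cols != len(flat):
        raise ValueError(
            f"cannot reshape {len(flat)} elements into shape ({rows}, {cols})"
        )
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

The six elements of the example array fit shapes such as (3, 2) or (6, 1), but not (4, 2).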

rotl(other) pdarray[source]

Rotate bits left by <other>.

rotr(other) pdarray[source]

Rotate bits right by <other>.
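
A bit rotation differs from a shift in that bits leaving one end re-enter at the other. The sketch below assumes 64-bit unsigned elements, which is an assumption of this illustration rather than something the descriptions above state; `rotl64` and `rotr64` are hypothetical names.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit unsigned elements


def rotl64(x, n):
    """Rotate the low 64 bits of x left by n positions."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotr64(x, n):
    """Rotate the low 64 bits of x right by n positions."""
    n %= 64
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64
```

Rotating left and then right by the same amount returns the original value, which is a quick sanity check for either direction.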

save(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None, file_format: str = 'HDF5', file_type: str = 'distribute') str[source]

DEPRECATED. Save the pdarray to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. HDF5 supports single files, in which case the file name will only be that provided. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

  • file_format (str {'HDF5', 'Parquet'}) – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, the dataset is written to a single file. When set to distribute, the dataset is written to one file per locale. This is only supported by HDF5 files and has no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:
  • RuntimeError – Raised if a server-side error is thrown saving the pdarray

  • ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append

  • TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string

See also

save_all, load, read, to_parquet, to_hdf

Notes

The prefix_path must be visible to the arkouda server and the user must have write permission. Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales-1. If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result. Previously all files saved in Parquet format were saved with a .parquet file extension. This will require you to use load as if you saved the file with the extension. Try this if an older file is not being found. Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.save('path/prefix', dataset='array')
Saves the array to numLocales HDF5 files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (HDF5)
>>> a.save('path/prefix.h5', dataset='array')
Saves the array to numLocales HDF5 files with the name
``cwd/path/name_prefix_LOCALE####.h5`` where #### is replaced by each locale number
>>> # Saving with an extension (Parquet)
>>> a.save('path/prefix.parquet', dataset='array', file_format='Parquet')
Saves the array in numLocales Parquet files with the name
``cwd/path/name_prefix_LOCALE####.parquet`` where #### is replaced by each locale number
property shape

Return the shape of an array.

Returns:

The elements of the shape tuple give the lengths of the corresponding array dimensions.

Return type:

tuple of int

size
slice_bits(low, high) pdarray[source]

Returns a pdarray containing only bits from low to high of self.

This is zero indexed and inclusive on both ends, so slicing the bottom 64 bits is pda.slice_bits(0, 63)

Parameters:
  • low (int) – The lowest bit included in the slice (inclusive) zero indexed, so the first bit is 0

  • high (int) – The highest bit included in the slice (inclusive)

Returns:

A new pdarray containing the bits of self from low to high

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> p = ak.array([2**65 + (2**64 - 1)])
>>> bin(p[0])
'0b101111111111111111111111111111111111111111111111111111111111111111'
>>> bin(p.slice_bits(64, 65)[0])
'0b10'
>>> a = ak.array([143,15])
>>> a.slice_bits(1,3)
array([7 7])
>>> a.slice_bits(4,9)
array([8 0])
>>> a.slice_bits(1,9)
array([71 7])
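
The inclusive, zero-indexed slice can be expressed as a shift followed by a mask. The sketch below operates on a single Python int; the local function `slice_bits` is a model of the per-element operation, not the Arkouda method itself.

```python
def slice_bits(x, low, high):
    """Return bits low..high (inclusive, zero-indexed) of x as an integer."""
    width = high - low + 1           # both ends are included
    return (x >> low) & ((1 << width) - 1)
```

Because both ends are inclusive, `slice_bits(x, 0, 63)` keeps exactly the bottom 64 bits, and the model reproduces the values shown in the examples above.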
std(ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Compute the standard deviation. See arkouda.std for details.
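
The role of ddof can be shown with a plain-Python model. This sketch assumes ddof has the same meaning as in NumPy, where the sum of squared deviations is divided by N - ddof; see arkouda.std for the authoritative definition.

```python
import math


def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def std(values, ddof=0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(values, ddof))
```

With ddof=0 this is the population statistic; ddof=1 gives the usual sample estimate, which is slightly larger for the same data.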

sum(axis: int | Tuple[int, Ellipsis] | None = None, keepdims: bool = False) arkouda.numpy.dtypes.numpy_scalars | pdarray[source]

Return sum of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])

Notes

Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))

to_csv(prefix_path: str, dataset: str = 'array', col_delim: str = ',', overwrite: bool = False)[source]

Write pdarray to CSV file(s). File will contain a single column with the pdarray data. All CSV files written by Arkouda include a header denoting data types of the columns.

Parameters:
  • prefix_path (str) – filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.

  • dataset (str, defaults to "array") – column name to save the pdarray under.

  • col_delim (str, defaults to ",") – value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.

  • overwrite (bool, defaults to False) – If True, existing files matching the provided path will be overwritten. If False and existing files are found, an error will be returned.

Returns:

response message

Return type:

str

Raises:
  • ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist

  • RuntimeError – Raised if one or more of the specified files cannot be opened. if ‘allow_errors’ is true, this may be raised if no values are returned from the server.

  • TypeError – Raise if the server returns an unknown arkouda_type

Notes

  • CSV format is not currently supported by load/load_all operations

  • The column delimiter is expected to be the same for all column names and data

  • Be sure that column delimiters are not found within your data.

  • All CSV files must delimit rows using newline ("\n") at this time.

to_cuda()[source]

Convert the array to a Numba DeviceND array, transferring array data from the arkouda server to Python via ndarray. If the array exceeds a builtin size limit, a RuntimeError is raised.

Returns:

A Numba ndarray with the same attributes and data as the pdarray; on GPU

Return type:

numba.DeviceNDArray

Raises:
  • ImportError – Raised if CUDA is not available

  • ModuleNotFoundError – Raised if Numba is either not installed or not enabled

  • RuntimeError – Raised if there is a server-side error thrown in the course of retrieving the pdarray.

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_cuda()
array([0, 1, 2, 3, 4])
>>> type(a.to_cuda())
numpy.devicendarray
to_hdf(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', file_type: str = 'distribute') str[source]

Save the pdarray to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • file_type (str ("single" | "distribute")) – Default: “distribute”. When set to “single”, the dataset is written to a single file. When set to “distribute”, the dataset is written to one file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.
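The naming rule in these notes can be sketched in plain Python. The helper `expected_hdf_filenames` is hypothetical (it is not part of Arkouda). It assumes that the locale index runs from 0 to numLocales - 1, that it is zero-padded to four digits (as the `####` placeholder in the examples suggests), and that it is placed before any file extension.

```python
import os

def expected_hdf_filenames(prefix_path, num_locales, file_type="distribute"):
    """Hypothetical helper: guess the file names a save would produce."""
    if file_type == "single":
        # A single-file save uses prefix_path unchanged.
        return [prefix_path]
    # Split off the extension so the locale suffix lands before it.
    root, ext = os.path.splitext(prefix_path)
    # Assumption: four-digit, zero-padded locale index.
    return [f"{root}_LOCALE{i:04d}{ext}" for i in range(num_locales)]

names = expected_hdf_filenames("path/prefix.h5", 2)
print(names)  # ['path/prefix_LOCALE0000.h5', 'path/prefix_LOCALE0001.h5']
```

Listing the expected names up front can help when checking that every locale's file is visible before reading the data back.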

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.to_hdf('path/prefix', dataset='array')
Saves the array to numLocales HDF5 files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (HDF5)
>>> a.to_hdf('path/prefix.h5', dataset='array')
Saves the array to numLocales HDF5 files with the name
``cwd/path/name_prefix_LOCALE####.h5`` where #### is replaced by each locale number
>>> # Saving to a single file
>>> a.to_hdf('path/prefix.hdf5', dataset='array', file_type='single')
Saves the array into a single HDF5 file on the root node.
``cwd/path/name_prefix.hdf5``
to_list() List[source]

Convert the array to a list, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A list with the same data as the pdarray

Return type:

list

Raises:

RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the bytes received does not match expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.
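A minimal sketch of the kind of size guard this note describes. The function name `check_transfer_size` and the 1 GiB limit are hypothetical and chosen only for illustration; the only part taken from the note is the comparison of the array's byte count against a maximum.

```python
def check_transfer_size(num_elements, itemsize, max_transfer_bytes):
    """Raise RuntimeError if the array would exceed the transfer limit."""
    nbytes = num_elements * itemsize
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"Array of {nbytes} bytes exceeds the limit of {max_transfer_bytes} bytes"
        )
    return nbytes

# Five int64 values occupy 40 bytes, well under an illustrative 1 GiB limit.
small = check_transfer_size(5, 8, 2**30)

# A billion int64 values (8 GB) would be rejected under the same limit.
try:
    check_transfer_size(10**9, 8, 2**30)
    rejected = False
except RuntimeError:
    rejected = True
```

Estimating `size * itemsize` in this way before a transfer shows whether the limit would need to be raised.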

See also

to_ndarray

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_list()
[0, 1, 2, 3, 4]
>>> type(a.to_list())
<class 'list'>
to_ndarray() numpy.ndarray[source]

Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A numpy ndarray with the same attributes and data as the pdarray

Return type:

np.ndarray

Raises:

RuntimeError – Raised if there is a server-side error thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the bytes received does not match expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array, to_list

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
to_parquet(prefix_path: str, dataset: str = 'array', mode: str = 'truncate', compression: str | None = None) str[source]

Save the pdarray to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files (must not already exist)

  • mode (str {'truncate' | 'append'}) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files

Return type:

string message indicating result of save operation

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’.

  • ‘append’ write mode is supported, but is not efficient.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

Examples

>>> a = ak.arange(25)
>>> # Saving without an extension
>>> a.to_parquet('path/prefix', dataset='array')
Saves the array to numLocales Parquet files with the name ``cwd/path/name_prefix_LOCALE####``
>>> # Saving with an extension (Parquet)
>>> a.to_parquet('path/prefix.parquet', dataset='array')
Saves the array to numLocales Parquet files with the name
``cwd/path/name_prefix_LOCALE####.parquet`` where #### is replaced by each locale number
transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars)[source]

Sends a pdarray to a different Arkouda server

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the pdarray is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, using ports in the range {port..(port+numLocales)} (e.g., when running an Arkouda server of 4 nodes with 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().
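The port arithmetic from the example in the port description can be sketched as follows. The helper `transfer_ports` is hypothetical. It follows the worked example (four locales use four consecutive ports starting at port) and does not treat the written range as inclusive of port+numLocales.

```python
def transfer_ports(port, num_locales):
    """Ports used when sending from a server with num_locales locales."""
    return list(range(port, port + num_locales))

ports = transfer_ports(1234, 4)
print(ports)  # [1234, 1235, 1236, 1237]
```

All of these ports need to be free and reachable on the sending side, so a firewall rule would have to cover the whole block.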

Return type:

A message indicating a complete transfer

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype

unregister() None[source]

Unregister a pdarray in the arkouda server which was previously registered using register() and/or attached to using attach().

Return type:

None

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.pdarray.attach("my_zeros")
>>> # ...other work...
>>> b.unregister()
update_hdf(prefix_path: str, dataset: str = 'array', repack: bool = True)[source]

Overwrite the dataset with the name provided with this pdarray. If the dataset does not exist it is added

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str) – Name of the dataset to create in files

  • repack (bool) – Default: True HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to false will yield better performance, but will cause file sizes to expand.

Return type:

str - success message if successful

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added

value_counts()[source]

Count the occurrences of the unique values of self.

Returns:

  • unique_values (pdarray) – The unique values, sorted in ascending order

  • counts (pdarray, int64) – The number of times the corresponding unique value occurs

Examples

>>> ak.array([2, 0, 2, 4, 0, 0]).value_counts()
(array([0, 2, 4]), array([3, 2, 1]))
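The same result can be reproduced client-side in plain Python, which may help when checking small cases by hand. This is only a sketch of the semantics (sorted unique values with their counts), not how the server computes them.

```python
from collections import Counter

def value_counts(values):
    """Return (sorted unique values, count of each unique value)."""
    counts = Counter(values)
    unique_values = sorted(counts)
    return unique_values, [counts[v] for v in unique_values]

result = value_counts([2, 0, 2, 4, 0, 0])
print(result)  # ([0, 2, 4], [3, 2, 1])
```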
var(ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Compute the variance. See arkouda.var for details.

arkouda.numpy.popcount(pda: pdarray) pdarray[source]

Find the population (number of bits set) for each integer in an array.

Parameters:

pda (pdarray, int64, uint64, bigint) – Input array (must be integral).

Returns:

population – The number of bits set (1) in each element

Return type:

pdarray

Raises:

TypeError – If input array is not int64, uint64, or bigint

Examples

>>> A = ak.arange(10)
>>> ak.popcount(A)
array([0, 1, 1, 2, 1, 2, 2, 3, 1, 2])
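For reference, the population count of a single non-negative Python integer can be computed from its binary representation. This plain-Python sketch reproduces the values in the example above and is not the server-side implementation.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

result = [popcount(i) for i in range(10)]
print(result)  # [0, 1, 1, 2, 1, 2, 2, 3, 1, 2]
```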
arkouda.numpy.power(pda: pdarray, pwr: int | float | pdarray, where: arkouda.numpy.dtypes.bool_scalars | pdarray = True) pdarray[source]

Raises an array to a power. If where is given, the operation will only take place in the positions where the where condition is True.

Note: Our implementation of the where argument deviates from numpy. The difference in behavior occurs at positions where the where argument contains a False. In numpy, these positions will have uninitialized memory (which can contain anything and will vary between runs). We have chosen to instead return the value of the original array in these positions.
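The where behavior described in this note can be sketched element by element in plain Python. The helper `power_where` is hypothetical and only illustrates the rule "apply the power where the condition is True, otherwise keep the original value".

```python
def power_where(values, pwr, where):
    """Raise each value to pwr only where the mask is True."""
    return [v ** pwr if keep else v for v, keep in zip(values, where)]

a = list(range(5))
mask = [v % 2 == 0 for v in a]   # True at even values
result = power_where(a, 3, mask)
print(result)  # [0, 1, 8, 3, 64]
```

The odd entries 1 and 3 pass through unchanged, which is the deviation from numpy that the note points out.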

Parameters:
  • pda (pdarray) – A pdarray of values that will be raised to a power (pwr)

  • pwr (integer, float, or pdarray) – The power(s) that pda is raised to

  • where (Boolean or pdarray) – This condition is broadcast over the input. At locations where the condition is True, the corresponding value will be raised to the respective power. Elsewhere, it will retain its original value. Default set to True.

Returns:

a pdarray of values raised to a power, under the boolean where condition.

Return type:

pdarray

Examples

>>> a = ak.arange(5)
>>> ak.power(a, 3)
array([0, 1, 8, 27, 64])
>>> ak.power(a, 3, a % 2 == 0)
array([0, 1, 8, 3, 64])
Raises:
  • TypeError – raised if pda is not a pdarray, or if pwr is not an int, float, or pdarray

  • ValueError – raised if pda and pwr are of incompatible dimensions

arkouda.numpy.promote_to_common_dtype(arrays: List[arkouda.numpy.pdarrayclass.pdarray]) Tuple[Any, List[arkouda.numpy.pdarrayclass.pdarray]][source]

Promote a list of pdarrays to a common dtype.

Parameters:

arrays (List[pdarray]) – List of pdarrays to promote

Returns:

The common dtype of the pdarrays and the list of pdarrays promoted to that dtype

Return type:

dtype, List[pdarray]

Raises:

TypeError – Raised if any pdarray is a non-numeric type

See also

pdarray.promote_dtype

Examples

>>> a = ak.arange(5)
>>> b = ak.ones(5, dtype=ak.float64)
>>> dtype, promoted = ak.promote_to_common_dtype([a, b])
>>> dtype
dtype('float64')
>>> all(isinstance(p, ak.pdarray) and p.dtype == dtype for p in promoted)
True
arkouda.numpy.putmask(A: arkouda.numpy.pdarrayclass.pdarray, mask: arkouda.numpy.pdarrayclass.pdarray, Values: arkouda.numpy.pdarrayclass.pdarray) None[source]

Overwrites elements of A with elements from Values based upon a mask array. Similar to numpy.putmask, where mask = False, A retains its original value, but where mask = True, A is overwritten with the corresponding entry from Values.

This is similar to ak.where, except that (1) no new pdarray is created, and (2) Values does not have to be the same size as A and mask.

Parameters:
  • A (pdarray) – Value(s) used when mask is False (see Notes for allowed dtypes)

  • mask (pdarray) – Used to choose values from A or Values, must be same size as A, and of type ak.bool_

  • Values (pdarray) – Value(s) used when mask is True (see Notes for allowed dtypes)

Examples

>>> a = ak.array(np.arange(10))
>>> ak.putmask (a,a>2,a**2)
>>> a
array([0 1 2 9 16 25 36 49 64 81])
>>> a = ak.array(np.arange(10))
>>> values = ak.array([3,2])
>>> ak.putmask (a,a>2,values)
>>> a
array([0 1 2 2 3 2 3 2 3 2])
Raises:

RuntimeError – Raised if mask is not same size as A, or if A.dtype and Values.dtype are not an allowed pair (see Notes for details).

Notes

A and mask must be the same size. Values can be any size.
Allowed dtypes for A and Values conform to types accepted by numpy putmask.
If A is ak.float64, Values can be ak.float64, ak.int64, ak.uint64, ak.bool_.
If A is ak.int64, Values can be ak.int64 or ak.bool_.
If A is ak.uint64, Values can be ak.uint64, or ak.bool_.
If A is ak.bool_, Values must be ak.bool_.
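A plain-Python sketch of the size rule above: the mask must match A, while a shorter Values is cycled. The cycling rule (index n takes `values[n % len(values)]`) is the documented behavior of numpy.putmask and is consistent with the second example; treating it as Arkouda's exact rule is an assumption.

```python
def putmask(a, mask, values):
    """In place: a[n] = values[n % len(values)] wherever mask[n] is True."""
    if len(a) != len(mask):
        raise RuntimeError("mask must be the same size as a")
    for n, flag in enumerate(mask):
        if flag:
            a[n] = values[n % len(values)]

a = list(range(10))
putmask(a, [v > 2 for v in a], [3, 2])
print(a)  # [0, 1, 2, 2, 3, 2, 3, 2, 3, 2]
```

The function returns nothing and modifies `a` directly, mirroring the None return type in the signature.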

Only one conditional clause is supported e.g., n < 5, n > 1.

Multi-dimensional pdarrays are now supported.

arkouda.numpy.rad2deg(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Converts angles element-wise from radians to degrees.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the corresponding value will be converted from radians to degrees. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing an angle converted to degrees, from radians, for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.randint(low: arkouda.numpy.dtypes.numeric_scalars, high: arkouda.numpy.dtypes.numeric_scalars, size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] = 1, dtype=akint64, seed: arkouda.numpy.dtypes.int_scalars | None = None) arkouda.numpy.pdarrayclass.pdarray[source]

Generate a pdarray of randomized int, float, or bool values in a specified range bounded by the low and high parameters.

Parameters:
Returns:

Values drawn uniformly from the specified range having the desired dtype

Return type:

pdarray

Raises:
  • TypeError – Raised if dtype.name not in DTypes, size is not an int, low or high is not an int or float, or seed is not an int

  • ValueError – Raised if size < 0 or if high < low

Notes

Calling randint with dtype=float64 will result in uniform non-integral floating point values.

Ranges of size >= 2**64 result in undefined behavior because they exceed the maximum value that can be stored on the server (uint64).

Examples

>>> ak.randint(0, 10, 5, seed=1701)
array([6 5 1 6 3])
>>> ak.randint(0, 1, 3, seed=1701, dtype=ak.float64)
array([0.011410423448327005 0.73618171558685619 0.12367222192448891])
>>> ak.randint(0, 1, 5, seed=1701, dtype=ak.bool_)
array([False True False True False])
arkouda.numpy.random_strings_lognormal(logmean: arkouda.numpy.dtypes.numeric_scalars, logstd: arkouda.numpy.dtypes.numeric_scalars, size: arkouda.numpy.dtypes.int_scalars, characters: str = 'uppercase', seed: arkouda.numpy.dtypes.int_scalars | None = None) arkouda.numpy.strings.Strings[source]

Generate random strings with log-normally distributed lengths and with characters drawn from a specified set.

Parameters:
  • logmean (numeric_scalars) – The log-mean of the length distribution

  • logstd (numeric_scalars) – The log-standard-deviation of the length distribution

  • size (int_scalars) – The number of strings to generate

  • characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from

  • seed (int_scalars, optional) – Value used to initialize the random number generator

Returns:

The Strings object encapsulating a pdarray of random strings

Return type:

Strings

Raises:
  • TypeError – Raised if logmean is neither a float nor an int, logstd is not a float, seed is not an int, size is not an int, or if characters is not a str

  • ValueError – Raised if logstd <= 0 or size < 0

Notes

The lengths of the generated strings are distributed \(\mathrm{Lognormal}(\mu, \sigma^2)\), with \(\mu = logmean\) and \(\sigma = logstd\). Thus, the strings will have an average length of \(\exp(\mu + 0.5\sigma^2)\), a minimum length of zero, and a heavy tail towards longer strings.
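The average-length formula can be evaluated directly to choose logmean and logstd for a target string size. The helper name `expected_length` is hypothetical; the formula is the one stated in the note.

```python
import math

def expected_length(logmean, logstd):
    """Mean of a log-normal distribution: exp(mu + 0.5 * sigma**2)."""
    return math.exp(logmean + 0.5 * logstd ** 2)

mean_len = expected_length(2, 0.25)
print(round(mean_len, 2))
```

For logmean=2 and logstd=0.25 this comes out at roughly 7.6 characters, in line with the 6 to 9 character strings shown in the examples.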

Examples

>>> ak.random_strings_lognormal(2, 0.25, 5, seed=1)
array(['VWHJEX', 'BEBBXJHGM', 'RWOVKBUR', 'LNJCSDXD', 'NKEDQC'])
>>> ak.random_strings_lognormal(2, 0.25, 5, seed=1, characters='printable')
array(['eL96<O', ')o-GOe lR', ')PV yHf(', '._b3Yc&K', ',7Wjef'])
arkouda.numpy.random_strings_uniform(minlen: arkouda.numpy.dtypes.int_scalars, maxlen: arkouda.numpy.dtypes.int_scalars, size: arkouda.numpy.dtypes.int_scalars, characters: str = 'uppercase', seed: None | arkouda.numpy.dtypes.int_scalars = None) arkouda.numpy.strings.Strings[source]

Generate random strings with lengths uniformly distributed between minlen and maxlen, and with characters drawn from a specified set.

Parameters:
  • minlen (int_scalars) – The minimum allowed length of string

  • maxlen (int_scalars) – The maximum allowed length of string

  • size (int_scalars) – The number of strings to generate

  • characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from

  • seed (Union[None, int_scalars], optional) – Value used to initialize the random number generator

Returns:

The array of random strings

Return type:

Strings

Raises:

ValueError – Raised if minlen < 0, maxlen < minlen, or size < 0

Examples

>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=8675309, size=5)
array(['ECWO', 'WSS', 'TZG', 'RW', 'C'])
>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=8675309, size=5,
... characters='printable')
array(['2 .z', 'aom', '2d|', 'o(', 'M'])
arkouda.numpy.register_all(data: dict)[source]

Register all objects in the provided dictionary

Parameters:

data (dict) – Maps each name to the object that should be registered under it. For example, {“MyArray”: ak.array([0, 1, 2])}

Return type:

None

arkouda.numpy.repeat(a: int | Sequence[int] | arkouda.numpy.pdarrayclass.pdarray, repeats: int | Sequence[int] | arkouda.numpy.pdarrayclass.pdarray, axis: None | int = None) arkouda.numpy.pdarrayclass.pdarray[source]

Repeat each element of an array after itself.

Parameters:
  • a (int, Sequence of int, or pdarray) – Input array.

  • repeats (int, Sequence of int, or pdarray) – The number of repetitions for each element. repeats is broadcasted to fit the shape of the given axis.

  • axis (int, optional) – The axis along which to repeat values. By default, use the flattened input array, and return a flat output array.

Returns:

Output array which has the same shape as a, except along the given axis.

Return type:

pdarray

Examples

>>> ak.repeat(3, 4)
array([3 3 3 3])
>>> x = ak.array([[1,2],[3,4]])
>>> ak.repeat(x, 2)
array([1 1 2 2 3 3 4 4])
>>> ak.repeat(x, 3, axis=1)
array([array([1 1 1 2 2 2]) array([3 3 3 4 4 4])])
>>> ak.repeat(x, [1, 2], axis=0)
array([array([1 2]) array([3 4]) array([3 4])])
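The flattened and axis=0 cases above can be sketched with a small plain-Python helper. `repeat_flat` is hypothetical and works on lists only; it is meant to show how a scalar repeats value is broadcast and how per-element counts apply.

```python
def repeat_flat(a, repeats):
    """Repeat items of a sequence; repeats is an int or a per-item list."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(a)   # broadcast a scalar count
    out = []
    for value, count in zip(a, repeats):
        out.extend([value] * count)
    return out

x = [[1, 2], [3, 4]]
flat = [v for row in x for v in row]   # axis=None flattens first
print(repeat_flat(flat, 2))            # [1, 1, 2, 2, 3, 3, 4, 4]
print(repeat_flat(x, [1, 2]))          # [[1, 2], [3, 4], [3, 4]]
```

Passing the list of rows treats each row as one item, which corresponds to repeating along axis 0.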
arkouda.numpy.resolve_scalar_dtype(val: object) str[source]

Try to infer what dtype arkouda_server should treat val as.

Parameters:

val (object) – The object to determine the dtype of.

Returns:

The dtype name, if it can be resolved, otherwise the type (as str).

Return type:

str

Examples

>>> ak.resolve_scalar_dtype(1)
'int64'
>>> ak.resolve_scalar_dtype(2.0)
'float64'
arkouda.numpy.rotl(x, rot) pdarray[source]

Rotate bits of <x> to the left by <rot>.

Parameters:
  • x (pdarray(int64/uint64) or integer) – Value(s) to rotate left.

  • rot (pdarray(int64/uint64) or integer) – Amount(s) to rotate by.

Returns:

rotated – The rotated elements of x.

Return type:

pdarray(int64/uint64)

Raises:

TypeError – If input array is not int64 or uint64

Examples

>>> A = ak.arange(10)
>>> ak.rotl(A, A)
array([0, 2, 8, 24, 64, 160, 384, 896, 2048, 4608])
arkouda.numpy.rotr(x, rot) pdarray[source]

Rotate bits of <x> to the right by <rot>.

Parameters:
  • x (pdarray(int64/uint64) or integer) – Value(s) to rotate right.

  • rot (pdarray(int64/uint64) or integer) – Amount(s) to rotate by.

Returns:

rotated – The rotated elements of x.

Return type:

pdarray(int64/uint64)

Raises:

TypeError – If input array is not int64 or uint64

Examples

>>> A = ak.arange(10)
>>> ak.rotr(1024 * A, A)
array([0, 512, 512, 384, 256, 160, 96, 56, 32, 18])
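A plain-Python sketch of 64-bit rotation, covering both rotl and rotr. The helpers `rotl64` and `rotr64` are hypothetical and operate on non-negative Python integers; the 64-bit width is an assumption based on the int64/uint64 dtypes listed above. Unlike a plain shift, bits that fall off one end re-enter at the other.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits

def rotl64(x, rot):
    """Rotate the low 64 bits of x left by rot positions."""
    rot %= 64
    x &= MASK64
    return ((x << rot) | (x >> (64 - rot))) & MASK64

def rotr64(x, rot):
    """Rotate the low 64 bits of x right by rot positions."""
    rot %= 64
    x &= MASK64
    return ((x >> rot) | (x << (64 - rot))) & MASK64

left = [rotl64(i, i) for i in range(10)]
right = [rotr64(1024 * i, i) for i in range(10)]
print(left)   # [0, 2, 8, 24, 64, 160, 384, 896, 2048, 4608]
print(right)  # [0, 512, 512, 384, 256, 160, 96, 56, 32, 18]
```

For these small inputs no bits wrap around, so the results match ordinary shifts; `rotr64(1, 1)` shows the wrap by returning `2**63`.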
arkouda.numpy.round(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise rounding of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing input array elements rounded to the nearest integer

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.round(ak.array([1.1, 2.5, 3.14159]))
array([1.00000000000000000 3.00000000000000000 3.00000000000000000])
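In the example above, 2.5 rounds to 3, which is consistent with rounding ties away from zero. The tie rule is not stated in this section, so the sketch below is an inference from that one example. Python's built-in round (like numpy.round) rounds ties to the nearest even value, so the two can differ at exact halves.

```python
import math

def round_half_away(x):
    """Round to the nearest integer, with ties going away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)

result = [round_half_away(v) for v in [1.1, 2.5, 3.14159]]
print(result)      # [1.0, 3.0, 3.0]
print(round(2.5))  # 2, because the built-in rounds ties to even
```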
arkouda.numpy.scalar_array(value: arkouda.numpy.dtypes.numeric_scalars, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint | None = None) arkouda.numpy.pdarrayclass.pdarray[source]

Create a pdarray from a single scalar value.

Parameters:

value (numeric_scalars) – Value to create pdarray from

Returns:

pdarray with a single element

Return type:

pdarray

Examples

>>> ak.scalar_array(5)
array([5])
>>> ak.scalar_array(7.0)
array([7.00000000000000000])
Raises:

RuntimeError – Raised if value cannot be cast as dtype

arkouda.numpy.segarray(segments: arkouda.numpy.pdarrayclass.pdarray, values: arkouda.numpy.pdarrayclass.pdarray, lengths=None, grouping=None)[source]

DEPRECATED. Alias for the from_parts function. Prevents the user from needing to call the ak.SegArray constructor.

arkouda.numpy.setdiff1d(A: arkouda.groupbyclass.groupable, B: arkouda.groupbyclass.groupable, assume_unique: bool = False) arkouda.numpy.pdarrayclass.pdarray | arkouda.groupbyclass.groupable[source]

Find the set difference of two arrays.

Return the sorted, unique values in A that are not in B.

Parameters:
Returns:

Sorted 1D array/List of sorted pdarrays of values in A that are not in B.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a pdarray

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Notes

ak.setdiff1d is not supported for bool pdarrays

Examples

>>> a = ak.array([1, 2, 3, 2, 4, 1])
>>> b = ak.array([3, 4, 5, 6])
>>> ak.setdiff1d(a, b)
array([1 2])

Multi-Array Example

>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setdiff1d(multia, multib)
[array([2 4 5]), array([2 4 5]), array([2 4 5])]
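The multi-array result can be understood by treating the i-th elements of all arrays as one row and taking the set difference over rows. The plain-Python helper `setdiff_rows` is hypothetical and illustrates that reading; it reproduces the output above.

```python
def setdiff_rows(multia, multib):
    """Rows (tuples across arrays) present in multia but not in multib."""
    rows_a = set(zip(*multia))
    rows_b = set(zip(*multib))
    kept = sorted(rows_a - rows_b)
    # Transpose the surviving rows back into one list per input array.
    return [list(col) for col in zip(*kept)]

a = [1, 2, 3, 4, 5]
multia = [a, a, a]
multib = [[1, 5, 3, 4, 2], [1, 4, 3, 2, 5], [1, 2, 3, 5, 4]]
result = setdiff_rows(multia, multib)
print(result)  # [[2, 4, 5], [2, 4, 5], [2, 4, 5]]
```

Only the rows (1, 1, 1) and (3, 3, 3) appear on both sides, so the rows for 2, 4, and 5 remain.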
arkouda.numpy.setxor1d(A: arkouda.groupbyclass.groupable, B: arkouda.groupbyclass.groupable, assume_unique: bool = False) arkouda.numpy.pdarrayclass.pdarray | arkouda.groupbyclass.groupable[source]

Find the set exclusive-or (symmetric difference) of two arrays.

Return the sorted, unique values that are in only one (not both) of the input arrays.

Parameters:
Returns:

Sorted 1D array/List of sorted pdarrays of unique values that are in only one of the input arrays.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a groupable

  • RuntimeError – Raised if the dtype of either pdarray is not supported

Examples

>>> a = ak.array([1, 2, 3, 2, 4])
>>> b = ak.array([2, 3, 5, 7, 5])
>>> ak.setxor1d(a,b)
array([1 4 5 7])

Multi-Array Example

>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.setxor1d(multia, multib)
[array([2 2 4 4 5 5]), array([2 5 2 4 4 5]), array([2 4 5 4 2 5])]
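As a cross-check of the multi-array output, the symmetric difference can be taken over rows, where a row is the tuple of i-th elements across the arrays. The helper `setxor_rows` is a hypothetical plain-Python sketch; sorting the rows lexicographically reproduces the result above.

```python
def setxor_rows(multia, multib):
    """Rows (tuples across arrays) found in exactly one of the inputs."""
    rows_a = set(zip(*multia))
    rows_b = set(zip(*multib))
    kept = sorted(rows_a ^ rows_b)   # symmetric difference, sorted by row
    return [list(col) for col in zip(*kept)]

a = [1, 2, 3, 4, 5]
multia = [a, a, a]
multib = [[1, 5, 3, 4, 2], [1, 4, 3, 2, 5], [1, 2, 3, 5, 4]]
result = setxor_rows(multia, multib)
print(result)
# [[2, 2, 4, 4, 5, 5], [2, 5, 2, 4, 4, 5], [2, 4, 5, 4, 2, 5]]
```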
arkouda.numpy.shape(a: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.numpy.dtypes.all_scalars) Tuple[source]

Return the shape of an array.

Parameters:

a (pdarray) – Input array.

Returns:

shape – The elements of the shape tuple give the lengths of the corresponding array dimensions.

Return type:

tuple of ints

Examples

>>> import arkouda as ak
>>> ak.shape(ak.eye(3,2))
(3, 2)
>>> ak.shape([[1, 3]])
(1, 2)
>>> ak.shape([0])
(1,)
>>> ak.shape(0)
()
arkouda.numpy.sign(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise sign of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing sign values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.sign(ak.array([-10, -5, 0, 5, 10]))
array([-1 -1 0 1 1])
arkouda.numpy.sin(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise sine of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the sine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing the sine of each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.sinh(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise hyperbolic sine of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the hyperbolic sine will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing hyperbolic sine for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.sort(pda: arkouda.numpy.pdarrayclass.pdarray, algorithm: SortingAlgorithm = SortingAlgorithm.RadixSortLSD, axis: arkouda.numpy.dtypes.int_scalars = -1) arkouda.numpy.pdarrayclass.pdarray[source]

Return a sorted copy of the array. Only sorts numeric arrays; for Strings, use argsort.

Parameters:
  • pda (pdarray) – The array to sort (int64, uint64, or float64)

  • algorithm (SortingAlgorithm, default=SortingAlgorithm.RadixSortLSD) – The algorithm to be used for sorting the arrays.

  • axis (int_scalars, default=-1) – The axis to sort over. Setting to -1 means that it will sort over axis = ndim - 1.

Returns:

The sorted copy of pda

Return type:

pdarray of int64, uint64, or float64

Raises:
  • TypeError – Raised if the parameter is not a pdarray

  • ValueError – Raised if sort attempted on a pdarray with an unsupported dtype such as bool

See also

argsort

Notes

Uses a least-significant-digit radix sort, which is stable and resilient to non-uniformity in data but communication intensive.

Examples

>>> a = ak.randint(0, 10, 10)
>>> sorted = ak.sort(a)
>>> sorted
array([0 1 1 3 4 5 7 8 8 9])
arkouda.numpy.sqrt(pda: pdarray, where: arkouda.numpy.dtypes.bool_scalars | pdarray = True) pdarray[source]

Takes the square root of array. If where is given, the operation will only take place in the positions where the where condition is True.

Parameters:
  • pda (pdarray) – A pdarray of values the square roots of which will be computed

  • where (Boolean or pdarray) – This condition is broadcast over the input. At locations where the condition is True, the corresponding value will be square rooted. Elsewhere, it will retain its original value. Default set to True.

Returns:

a pdarray of square roots of the original values, or the original values themselves, subject to the boolean where condition.

Return type:

pdarray

Examples

>>> a = ak.arange(5)
>>> ak.sqrt(a)
array([0.00000000000000000 1.00000000000000000 1.4142135623730951
         1.7320508075688772 2.00000000000000000])
>>> ak.sqrt(a, ak.array([True, True, False, False, True]))
array([0.00000000000000000 1.00000000000000000 2.00000000000000000
         3.00000000000000000 2.00000000000000000])
Raises:

TypeError – raised if pda is not a pdarray of ak.int64 or ak.float64

Notes

Square roots of negative numbers are returned as nan.

arkouda.numpy.square(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise square of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing square values of the input array elements

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.square(ak.arange(1,5))
array([1 4 9 16])
arkouda.numpy.squeeze(x: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.dtypes.numeric_scalars | arkouda.numpy.dtypes.bool_scalars, /, axis: None | int | Tuple[int, Ellipsis] = None) arkouda.numpy.pdarrayclass.pdarray[source]

Remove degenerate (size one) dimensions from an array.

Parameters:
  • x (pdarray) – The array to squeeze

  • axis (int or Tuple[int, ...]) – The axis or axes to squeeze (must have a size of one). If axis = None, all dimensions of size 1 will be squeezed.

Returns:

A copy of x with the dimensions specified in the axis argument removed.

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> ak.connect()
>>> x = ak.arange(10).reshape((1, 10, 1))
>>> x
array([array([array([0]) array([1]) array([2]) array([3])....
 array([4]) array([5]) array([6]) array([7]) array([8]) array([9])])])
>>> x.shape
(1, 10, 1)
>>> ak.squeeze(x,axis=None)
array([0 1 2 3 4 5 6 7 8 9])
>>> ak.squeeze(x,axis=None).shape
(10,)
>>> ak.squeeze(x,axis=2)
array([array([0 1 2 3 4 5 6 7 8 9])])
>>> ak.squeeze(x,axis=2).shape
(1, 10)
>>> ak.squeeze(x,axis=(0,2))
array([0 1 2 3 4 5 6 7 8 9])
>>> ak.squeeze(x,axis=(0,2)).shape
(10,)
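The effect of squeeze on the shape alone can be sketched without any array data. The helper `squeezed_shape` is hypothetical, handles only non-negative axis values, and raising ValueError for a non-unit axis is an assumption about error handling.

```python
def squeezed_shape(shape, axis=None):
    """Shape after removing size-one dimensions (all of them if axis is None)."""
    if axis is None:
        return tuple(n for n in shape if n != 1)
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    for ax in axes:
        if shape[ax] != 1:
            raise ValueError(f"cannot squeeze axis {ax} with size {shape[ax]}")
    return tuple(n for i, n in enumerate(shape) if i not in axes)

print(squeezed_shape((1, 10, 1)))               # (10,)
print(squeezed_shape((1, 10, 1), axis=2))       # (1, 10)
print(squeezed_shape((1, 10, 1), axis=(0, 2)))  # (10,)
```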
arkouda.numpy.standard_normal(size: arkouda.numpy.dtypes.int_scalars, seed: None | arkouda.numpy.dtypes.int_scalars = None) arkouda.numpy.pdarrayclass.pdarray[source]

Draw real numbers from the standard normal distribution.

Parameters:
  • size (int_scalars) – The number of samples to draw (size of the returned array)

  • seed (int_scalars) – Value used to initialize the random number generator

Returns:

The array of random numbers

Return type:

pdarray, float64

Raises:
  • TypeError – Raised if size is not an int

  • ValueError – Raised if size < 0

See also

randint

Notes

For random samples from \(N(\mu, \sigma^2)\), use:

(sigma * standard_normal(size)) + mu
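The scaling rule can be checked with Python's standard library, using `random.Random.gauss(0, 1)` as a stand-in for standard_normal. The seed, sample count, and parameter values are arbitrary choices for this sketch.

```python
import math
import random

rng = random.Random(1)            # arbitrary seed for reproducibility
mu, sigma, n = 10.0, 2.0, 50_000

# Shift and scale standard normal draws to get samples from N(mu, sigma^2).
samples = [sigma * rng.gauss(0.0, 1.0) + mu for _ in range(n)]

sample_mean = sum(samples) / n
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / n)
print(sample_mean, sample_std)    # close to 10.0 and 2.0
```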

Examples

>>> ak.standard_normal(3,1)
array([-0.68586185091150265 1.1723810583573377 0.567584107142031])
arkouda.numpy.std(pda: pdarray, ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Return the standard deviation of values in the array. The standard deviation is implemented as the square root of the variance.

Parameters:
  • pda (pdarray) – values for which to calculate the standard deviation

  • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std

Returns:

The scalar standard deviation of the array

Return type:

np.float64

Examples

>>> a = ak.arange(10)
>>> ak.std(a)
2.8722813232690143
>>> a.std()
2.8722813232690143
Raises:
  • TypeError – Raised if pda is not a pdarray instance or ddof is not an integer

  • ValueError – Raised if ddof is an integer < 0

  • RuntimeError – Raised if there’s a server-side error thrown

See also

mean, var

Notes

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)).

The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
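
The effect of ddof on the divisor can be checked with a small pure-Python sketch. The helpers variance and std below are illustrative stand-ins, not the Arkouda functions, and the ValueError for an oversized ddof is an assumption of this sketch.

```python
import math


def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by N - ddof."""
    n = len(values)
    if ddof >= n:
        raise ValueError("ddof must be smaller than the number of values")
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return squared_dev / (n - ddof)


def std(values, ddof=0):
    """Square root of the (possibly ddof-adjusted) variance."""
    return math.sqrt(variance(values, ddof))


data = list(range(10))
print(std(data))          # about 2.8723, as in the example above
print(std(data, ddof=1))  # about 3.0277, divisor is N - 1
```
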

class arkouda.numpy.str_

A unicode string.

This type strips trailing null codepoints.

>>> s = np.str_("abc\x00")
>>> s
'abc'

Unlike the builtin str, this supports the Python buffer protocol, exposing its contents as UCS4:

>>> m = memoryview(np.str_("abc"))
>>> m.format
'3w'
>>> m.tobytes()
b'a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00'
Character code:

'U'

Alias:

numpy.unicode_

T(*args, **kwargs)

Scalar attribute identical to the corresponding array attribute.

Please see ndarray.T.

all(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.all.

any(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.any.

argmax(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.argmax.

argmin(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.argmin.

argsort(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.argsort.

astype(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.astype.

base(*args, **kwargs)

Scalar attribute identical to the corresponding array attribute.

Please see ndarray.base.

byteswap(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.byteswap.

choose(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.choose.

clip(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.clip.

compress(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.compress.

conj(*args, **kwargs)
conjugate(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.conjugate.

copy(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.copy.

cumprod(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.cumprod.

cumsum(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.cumsum.

data(*args, **kwargs)

Pointer to start of data.

diagonal(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.diagonal.

dtype(*args, **kwargs)

Get array data-descriptor.

dump(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.dump.

dumps(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.dumps.

fill(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.fill.

flags(*args, **kwargs)

The integer value of flags.

flat(*args, **kwargs)

A 1-D view of the scalar.

flatten(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.flatten.

getfield(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.getfield.

imag(*args, **kwargs)

The imaginary part of the scalar.

item(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.item.

itemset(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.itemset.

itemsize(*args, **kwargs)

The length of one element in bytes.

max(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.max.

mean(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.mean.

min(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.min.

nbytes(*args, **kwargs)

The length of the scalar in bytes.

ndim(*args, **kwargs)

The number of array dimensions.

newbyteorder(*args, **kwargs)

newbyteorder(new_order=’S’, /)

Return a new dtype with a different byte order.

Changes are also made in all fields and sub-arrays of the data type.

The new_order code can be any from the following:

  • ‘S’ - swap dtype from current to opposite endian

  • {‘<’, ‘little’} - little endian

  • {‘>’, ‘big’} - big endian

  • {‘=’, ‘native’} - native order

  • {‘|’, ‘I’} - ignore (no change to byte order)

Parameters:
  • new_order (str, optional) – Byte order to force; a value from the byte order specifications above. The default value (‘S’) results in swapping the current byte order.

Returns:

  • new_dtype (dtype) – New dtype object with the given change to the byte order.

nonzero(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.nonzero.

prod(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.prod.

ptp(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.ptp.

put(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.put.

ravel(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.ravel.

real(*args, **kwargs)

The real part of the scalar.

repeat(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.repeat.

reshape(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.reshape.

resize(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.resize.

round(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.round.

searchsorted(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.searchsorted.

setfield(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.setfield.

setflags(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.setflags.

shape(*args, **kwargs)

Tuple of array dimensions.

size(*args, **kwargs)

The number of elements in the gentype.

sort(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.sort.

squeeze(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.squeeze.

std(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.std.

strides(*args, **kwargs)

Tuple of bytes steps in each dimension.

sum(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.sum.

swapaxes(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.swapaxes.

take(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.take.

tobytes(*args, **kwargs)
tofile(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.tofile.

tolist(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.tolist.

tostring(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.tostring.

trace(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.trace.

transpose(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.transpose.

var(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.var.

view(*args, **kwargs)

Scalar method identical to the corresponding array attribute.

Please see ndarray.view.

class arkouda.numpy.str_scalars(origin, params, *, inst=True, name=None)

Bases: _GenericAlias

The central part of internal API.

This represents a generic version of type ‘origin’ with type arguments ‘params’. There are two kinds of these aliases: user-defined and special. The special ones are wrappers around builtin collections and ABCs in collections.abc. These must always have ‘name’ set. If ‘inst’ is False, the alias can’t be instantiated; this is used by e.g. typing.List and typing.Dict.

arkouda.numpy.tan(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise tangent of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the tangent will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing tangent for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray
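
The behaviour of the where argument can be sketched on plain Python lists. This is an illustration of the description above, not Arkouda code; the helper tan_where is hypothetical.

```python
import math


def tan_where(values, where=True):
    """Apply tan where the condition holds; keep the original value elsewhere."""
    # A scalar bool is broadcast to every position.
    if isinstance(where, bool):
        where = [where] * len(values)
    return [math.tan(v) if w else v for v, w in zip(values, where)]


x = [0.0, math.pi / 4, 1.0]
print(tan_where(x))                             # tangent of every element
print(tan_where(x, where=[True, False, True]))  # middle element is left unchanged
```
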

arkouda.numpy.tanh(pda: arkouda.numpy.pdarrayclass.pdarray, where: bool | arkouda.numpy.pdarrayclass.pdarray = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise hyperbolic tangent of the array.

Parameters:
  • pda (pdarray)

  • where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the hyperbolic tangent will be applied to the corresponding value. Elsewhere, it will retain its original value. Default set to True.

Returns:

A pdarray containing hyperbolic tangent for each element of the original pdarray

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

arkouda.numpy.tile(A: arkouda.numpy.pdarrayclass.pdarray, /, reps: int | Tuple[int, Ellipsis]) arkouda.numpy.pdarrayclass.pdarray[source]

Construct an array by repeating A the number of times given by reps.

If reps has length d, the result will have dimension of max(d, A.ndim).

If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.

If A.ndim > d, reps is promoted to A.ndim by prepending 1’s to it. Thus for an A of shape (2, 3, 4, 5), a reps of (2, 2) is treated as (1, 1, 2, 2).

Parameters:
  • A (pdarray) – The input pdarray to be tiled

  • reps (int or Tuple of int) – The number of repetitions of A along each axis.

Returns:

A new pdarray with the tiled data.

Return type:

pdarray

Examples

>>> a = ak.array([0, 1, 2])
>>> ak.tile(a, 2)
array([0 1 2 0 1 2])
>>> ak.tile(a, (2, 2))
array([array([0 1 2 0 1 2]) array([0 1 2 0 1 2])])
>>> ak.tile(a, (2, 1, 2))
array([array([array([0 1 2 0 1 2])]) array([array([0 1 2 0 1 2])])])
>>> b = ak.array([[1, 2], [3, 4]])
>>> ak.tile(b, 2)
array([array([1 2 1 2]) array([3 4 3 4])])
>>> ak.tile(b, (2, 1))
array([array([1 2]) array([3 4]) array([1 2]) array([3 4])])
>>> c = ak.array([1, 2, 3, 4])
>>> ak.tile(c, (4, 1))
array([array([1 2 3 4]) array([1 2 3 4]) array([1 2 3 4]) array([1 2 3 4])])
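
The promotion rules for A and reps determine the result shape, which the following sketch computes. The helper tiled_shape is hypothetical and works on shape tuples only, without touching any array data.

```python
def tiled_shape(a_shape, reps):
    """Shape of tile(A, reps) for an array A with shape a_shape."""
    reps = (reps,) if isinstance(reps, int) else tuple(reps)
    d = max(len(reps), len(a_shape))
    # Prepend 1s so the shape and reps both have length d.
    a_shape = (1,) * (d - len(a_shape)) + tuple(a_shape)
    reps = (1,) * (d - len(reps)) + reps
    return tuple(n * r for n, r in zip(a_shape, reps))


print(tiled_shape((3,), 2))                 # (6,)
print(tiled_shape((3,), (2, 1, 2)))         # (2, 1, 6)
print(tiled_shape((2, 3, 4, 5), (2, 2)))    # (2, 3, 8, 10)
```
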
arkouda.numpy.timedelta_range(start=None, end=None, periods=None, freq=None, name=None, closed=None, **kwargs)[source]

Return a fixed frequency TimedeltaIndex, with day as the default frequency. Alias for ak.Timedelta(pd.timedelta_range(args)). Subject to size limit imposed by client.maxTransferBytes.

Parameters:
  • start (str or timedelta-like, default None) – Left bound for generating timedeltas.

  • end (str or timedelta-like, default None) – Right bound for generating timedeltas.

  • periods (int, default None) – Number of periods to generate.

  • freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. ‘5H’.

  • name (str, default None) – Name of the resulting TimedeltaIndex.

  • closed (str, default None) – Make the interval closed with respect to the given frequency to the ‘left’, ‘right’, or both sides (None).

Returns:

rng

Return type:

TimedeltaIndex

Notes

Of the four parameters start, end, periods, and freq, exactly three must be specified. If freq is omitted, the resulting TimedeltaIndex will have periods linearly spaced elements between start and end (closed on both sides).

To learn more about the frequency strings, see the offset aliases section of the pandas time series documentation.
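
For the case where freq is omitted, the linear spacing described in the note can be sketched with the standard library alone. The helper linspace_timedeltas is hypothetical; it only mirrors the "periods linearly spaced elements, closed on both sides" rule and is not how pandas or Arkouda builds the index.

```python
from datetime import timedelta


def linspace_timedeltas(start, end, periods):
    """Return `periods` evenly spaced timedeltas from start to end, inclusive."""
    if periods == 1:
        return [start]
    step = (end - start) / (periods - 1)
    return [start + i * step for i in range(periods)]


spaced = linspace_timedeltas(timedelta(days=0), timedelta(days=4), periods=5)
print(spaced)  # one entry per day, from 0 days through 4 days
```
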

arkouda.numpy.transpose(pda: arkouda.numpy.pdarrayclass.pdarray, axes: Tuple[int, Ellipsis] | None = None) arkouda.numpy.pdarrayclass.pdarray[source]

Compute the transpose of a matrix.

Parameters:
  • pda (pdarray)

  • axes (Tuple[int,...] Optional, defaults to None) – If specified, must be a tuple which contains a permutation of the axes of pda.

Returns:

The transpose of the input matrix. For a 1-D array, this is the original array. For a 2-D array, this is the standard matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted. If axes is None, the axes are reversed.

Return type:

pdarray

Examples

>>> a = ak.array([[1,2,3,4,5],[1,2,3,4,5]])
>>> ak.transpose(a)
array([array([1 1]) array([2 2]) array([3 3]) array([4 4]) array([5 5])])
>>> z = ak.array(np.arange(27).reshape(3,3,3))
>>> ak.transpose(z,axes=(1,0,2))
array([array([array([0 1 2]) array([9 10 11]) array([18 19 20])]) array([array([3 4 5])
  array([12 13 14]) array([21 22 23])]) array([array([6 7 8]) array([15 16 17]) array([24 25 26])])])
Raises:
  • ValueError – Raised if axes is not a legitimate permutation of the axes of pda

  • TypeError – Raised if pda is not a pdarray, or if axes is neither a tuple nor None
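
How axes permutes both the shape and the element positions can be sketched with a dictionary keyed by index tuples. The helper transpose_indexed is hypothetical and assumes the usual NumPy convention that output axis k takes its extent from input axis axes[k], which agrees with the 3-D example above.

```python
from itertools import product


def transpose_indexed(cells, shape, axes=None):
    """cells maps index tuples to values; returns (new_cells, new_shape)."""
    ndim = len(shape)
    if axes is None:
        # Default: reverse the order of the axes.
        axes = tuple(reversed(range(ndim)))
    if sorted(axes) != list(range(ndim)):
        raise ValueError("axes must be a permutation of range(ndim)")
    new_shape = tuple(shape[a] for a in axes)
    # An element at index idx moves to the index rearranged by axes.
    new_cells = {tuple(idx[a] for a in axes): v for idx, v in cells.items()}
    return new_cells, new_shape


shape = (3, 3, 3)
z = {idx: 9 * idx[0] + 3 * idx[1] + idx[2] for idx in product(*map(range, shape))}
t, t_shape = transpose_indexed(z, shape, axes=(1, 0, 2))
print([t[(0, 1, k)] for k in range(3)])  # [9, 10, 11]
```
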

arkouda.numpy.tril(pda: arkouda.numpy.pdarrayclass.pdarray, diag: arkouda.numpy.dtypes.int_scalars = 0) arkouda.numpy.pdarrayclass.pdarray[source]

Return a copy of the pda with the upper triangle zeroed out

Parameters:
  • pda (pdarray)

  • diag (int_scalars, optional) –

    if diag = 0, zeros start just above the main diagonal
    if diag = 1, zeros start one diagonal further up (the first superdiagonal is kept)
    if diag = 2, zeros start two diagonals further up
    etc. Default set to 0.

Returns:

a copy of pda with zeros in the upper triangle

Return type:

pdarray

Examples

>>> a = ak.array([[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7],[4,5,6,7,8],[5,6,7,8,9]])
>>> ak.tril(a,diag=4)
array([array([1 2 3 4 5]) array([2 3 4 5 6]) array([3 4 5 6 7])
array([4 5 6 7 8]) array([5 6 7 8 9])])
>>> ak.tril(a,diag=3)
array([array([1 2 3 4 0]) array([2 3 4 5 6]) array([3 4 5 6 7])
array([4 5 6 7 8]) array([5 6 7 8 9])])
>>> ak.tril(a,diag=2)
array([array([1 2 3 0 0]) array([2 3 4 5 0]) array([3 4 5 6 7])
array([4 5 6 7 8]) array([5 6 7 8 9])])
>>> ak.tril(a,diag=1)
array([array([1 2 0 0 0]) array([2 3 4 0 0]) array([3 4 5 6 0])
array([4 5 6 7 8]) array([5 6 7 8 9])])
>>> ak.tril(a,diag=0)
array([array([1 0 0 0 0]) array([2 3 0 0 0]) array([3 4 5 0 0])
array([4 5 6 7 0]) array([5 6 7 8 9])])

Notes

Server returns an error if rank of pda < 2

arkouda.numpy.triu(pda: arkouda.numpy.pdarrayclass.pdarray, diag: arkouda.numpy.dtypes.int_scalars = 0) arkouda.numpy.pdarrayclass.pdarray[source]

Return a copy of the pda with the lower triangle zeroed out

Parameters:
  • pda (pdarray)

  • diag (int_scalars, default=0) –

    if diag = 0, zeros start just below the main diagonal
    if diag = 1, zeros start at the main diagonal
    if diag = 2, zeros start just above the main diagonal
    etc. Default set to 0.

Returns:

a copy of pda with zeros in the lower triangle

Return type:

pdarray

Examples

>>> a = ak.array([[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7],[4,5,6,7,8],[5,6,7,8,9]])
>>> ak.triu(a,diag=0)
array([array([1 2 3 4 5]) array([0 3 4 5 6]) array([0 0 5 6 7])
array([0 0 0 7 8]) array([0 0 0 0 9])])
>>> ak.triu(a,diag=1)
array([array([0 2 3 4 5]) array([0 0 4 5 6]) array([0 0 0 6 7])
array([0 0 0 0 8]) array([0 0 0 0 0])])
>>> ak.triu(a,diag=2)
array([array([0 0 3 4 5]) array([0 0 0 5 6]) array([0 0 0 0 7])
array([0 0 0 0 0]) array([0 0 0 0 0])])
>>> ak.triu(a,diag=3)
array([array([0 0 0 4 5]) array([0 0 0 0 6]) array([0 0 0 0 0])
array([0 0 0 0 0]) array([0 0 0 0 0])])
>>> ak.triu(a,diag=4)
array([array([0 0 0 0 5]) array([0 0 0 0 0]) array([0 0 0 0 0])
array([0 0 0 0 0]) array([0 0 0 0 0])])

Notes

Server returns an error if rank of pda < 2
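
The tril and triu examples are consistent with a simple rule on the offset column - row. The sketch below states that rule on nested lists; it is inferred from the examples shown for each function, not taken from the Arkouda source, and the local functions tril and triu are stand-ins for illustration.

```python
def tril(matrix, diag=0):
    """Keep entries with column - row <= diag; zero the rest."""
    return [[v if j - i <= diag else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def triu(matrix, diag=0):
    """Keep entries with column - row >= diag; zero the rest."""
    return [[v if j - i >= diag else 0 for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


# Same 5x5 matrix as in the examples: a[i][j] = i + j + 1.
a = [[i + j + 1 for j in range(5)] for i in range(5)]
print(tril(a, diag=1)[0])  # [1, 2, 0, 0, 0]
print(triu(a, diag=2)[1])  # [0, 0, 0, 5, 6]
```
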

arkouda.numpy.trunc(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Return the element-wise truncation of the array.

Parameters:

pda (pdarray)

Returns:

A pdarray containing input array elements truncated to the nearest integer

Return type:

pdarray

Raises:

TypeError – Raised if the parameter is not a pdarray

Examples

>>> ak.trunc(ak.array([1.1, 2.5, 3.14159]))
array([1.00000000000000000 2.00000000000000000 3.00000000000000000])
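
Truncation drops the fractional part, so it rounds toward zero and differs from floor for negative inputs. The sketch below shows the distinction with the standard math module, assuming ak.trunc follows the usual definition; the negative input is an addition for illustration and does not appear in the example above.

```python
import math

values = [1.1, 2.5, 3.14159, -2.5]
# trunc discards the fractional part; floor rounds toward negative infinity.
truncated = [float(math.trunc(v)) for v in values]
floored = [float(math.floor(v)) for v in values]

print(truncated)  # [1.0, 2.0, 3.0, -2.0]
print(floored)    # [1.0, 2.0, 3.0, -3.0]
```
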
class arkouda.numpy.uint16(value)

Bases: numpy.unsignedinteger

Unsigned integer type, compatible with C unsigned short.

Character code:

'H'

Canonical name:

numpy.ushort

Alias on this platform (Linux x86_64):

numpy.uint16: 16-bit unsigned integer (0 to 65_535).

bit_count(*args, **kwargs)

uint16.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.uint16(127).bit_count()
7
class arkouda.numpy.uint32(value)

Bases: numpy.unsignedinteger

Unsigned integer type, compatible with C unsigned int.

Character code:

'I'

Canonical name:

numpy.uintc

Alias on this platform (Linux x86_64):

numpy.uint32: 32-bit unsigned integer (0 to 4_294_967_295).

bit_count(*args, **kwargs)

uint32.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.uint32(127).bit_count()
7
class arkouda.numpy.uint64(value)

Bases: numpy.unsignedinteger

Unsigned integer type, compatible with C unsigned long.

Character code:

'L'

Canonical name:

numpy.uint

Alias on this platform (Linux x86_64):

numpy.uint64: 64-bit unsigned integer (0 to 18_446_744_073_709_551_615).

Alias on this platform (Linux x86_64):

numpy.uintp: Unsigned integer large enough to fit pointer, compatible with C uintptr_t.

bit_count(*args, **kwargs)

uint64.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.uint64(127).bit_count()
7
class arkouda.numpy.uint8(value)

Bases: numpy.unsignedinteger

Unsigned integer type, compatible with C unsigned char.

Character code:

'B'

Canonical name:

numpy.ubyte

Alias on this platform (Linux x86_64):

numpy.uint8: 8-bit unsigned integer (0 to 255).

bit_count(*args, **kwargs)

uint8.bit_count() -> int

Computes the number of 1-bits in the absolute value of the input. Analogous to the builtin int.bit_count or popcount in C++.

>>> np.uint8(127).bit_count()
7
arkouda.numpy.uniform(size: arkouda.numpy.dtypes.int_scalars, low: arkouda.numpy.dtypes.numeric_scalars = float(0.0), high: arkouda.numpy.dtypes.numeric_scalars = 1.0, seed: None | arkouda.numpy.dtypes.int_scalars = None) arkouda.numpy.pdarrayclass.pdarray[source]

Generate a pdarray with uniformly distributed random float values in a specified range.

Parameters:
  • low (float_scalars) – The low value (inclusive) of the range, defaults to 0.0

  • high (float_scalars) – The high value (inclusive) of the range, defaults to 1.0

  • size (int_scalars) – The length of the returned array

  • seed (int_scalars, optional) – Value used to initialize the random number generator

Returns:

Values drawn uniformly from the specified range

Return type:

pdarray, float64

Raises:
  • TypeError – Raised if dtype.name not in DTypes, size is not an int, or if either low or high is not an int or float

  • ValueError – Raised if size < 0 or if high < low

Notes

The logic for uniform is delegated to the ak.randint method which is invoked with a dtype of float64

Examples

>>> ak.uniform(3,seed=1701)
array([0.011410423448327005 0.73618171558685619 0.12367222192448891])
>>> ak.uniform(size=3,low=0,high=5,seed=0)
array([0.30013431967121934 0.47383036230759112 1.0441791878997098])
arkouda.numpy.union1d(A: arkouda.groupbyclass.groupable, B: arkouda.groupbyclass.groupable) arkouda.groupbyclass.groupable[source]

Find the union of two arrays/List of Arrays.

Return the unique, sorted array of values that are in either of the two input arrays.

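Parameters:
  • A (groupable) – First input array or list of arrays

  • B (groupable) – Second input array or list of arrays

Returns: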

Unique, sorted union of the input arrays.

Return type:

pdarray/groupable

Raises:
  • TypeError – Raised if either A or B is not a groupable

  • RuntimeError – Raised if the dtype of either input is not supported

Examples

1D Example

>>> ak.union1d(ak.array([-1, 0, 1]), ak.array([-2, 0, 2]))
array([-2 -1 0 1 2])

Multi-Array Example

>>> a = ak.arange(1, 6)
>>> b = ak.array([1, 5, 3, 4, 2])
>>> c = ak.array([1, 4, 3, 2, 5])
>>> d = ak.array([1, 2, 3, 5, 4])
>>> multia = [a, a, a]
>>> multib = [b, c, d]
>>> ak.union1d(multia, multib)
[array([1 2 2 3 4 4 5 5]), array([1 2 5 3 2 4 4 5]), array([1 2 4 3 5 4 2 5])]
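
The multi-array result is easier to read once each position across the arrays is treated as one row. The following plain-Python sketch reproduces both examples under that reading; the helpers union1d and union_multi are illustrative stand-ins, not the Arkouda implementation.

```python
def union1d(a, b):
    """Sorted unique values present in either input."""
    return sorted(set(a) | set(b))


def union_multi(arrays_a, arrays_b):
    """Union of rows, where row i is the tuple of the i-th element of each array."""
    rows = set(zip(*arrays_a)) | set(zip(*arrays_b))
    # Sort the rows, then unzip them back into one list per array position.
    return [list(col) for col in zip(*sorted(rows))]


print(union1d([-1, 0, 1], [-2, 0, 2]))  # [-2, -1, 0, 1, 2]

a = [1, 2, 3, 4, 5]
multia = [a, a, a]
multib = [[1, 5, 3, 4, 2], [1, 4, 3, 2, 5], [1, 2, 3, 5, 4]]
print(union_multi(multia, multib))
```
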
arkouda.numpy.unregister(name: str) str[source]
arkouda.numpy.unregister_all(names: list)[source]

Unregister all names provided

Parameters:

names (list) – List of names used to register objects to be unregistered

Return type:

None

arkouda.numpy.unregister_pdarray_by_name(user_defined_name: str) None[source]

Unregister a named pdarray in the arkouda server which was previously registered using register() and/or attached to using attach_pdarray()

Parameters:

user_defined_name (str) – user defined name which array was registered under

Return type:

None

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

See also

register, unregister, is_registered, list_registry, attach

Notes

Registered names/pdarrays in the server are immune to deletion until they are unregistered.

Examples

>>> a = ak.zeros(100)
>>> a.register("my_zeros")
>>> # potentially disconnect from server and reconnect to server
>>> b = ak.attach_pdarray("my_zeros")
>>> # ...other work...
>>> ak.unregister_pdarray_by_name("my_zeros")
arkouda.numpy.value_counts(pda: arkouda.numpy.pdarrayclass.pdarray) tuple[arkouda.groupbyclass.groupable, arkouda.numpy.pdarrayclass.pdarray][source]

Count the occurrences of the unique values of an array.

Parameters:

pda (pdarray) – The array of values to count

Returns:

  • unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order

  • counts (pdarray, int64) – The number of times the corresponding unique value occurs

Raises:

TypeError – Raised if the parameter is not a pdarray

See also

unique, histogram

Notes

This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.

Examples

>>> A = ak.array([2, 0, 2, 4, 0, 0])
>>> ak.value_counts(A)
(array([0 2 4]), array([3 2 1]))
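
The same counting can be sketched with collections.Counter, which also only reports values that are present. The local value_counts function is a plain-Python stand-in for illustration.

```python
from collections import Counter


def value_counts(values):
    """Return (sorted unique values, matching occurrence counts)."""
    counts = Counter(values)
    unique = sorted(counts)
    return unique, [counts[u] for u in unique]


print(value_counts([2, 0, 2, 4, 0, 0]))  # ([0, 2, 4], [3, 2, 1])
```
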
arkouda.numpy.var(pda: pdarray, ddof: arkouda.numpy.dtypes.int_scalars = 0) numpy.float64[source]

Return the variance of values in the array.

Parameters:
  • pda (pdarray) – Values for which to calculate the variance

  • ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var

Returns:

The scalar variance of the array

Return type:

np.float64

Examples

>>> a = ak.arange(10)
>>> ak.var(a)
8.25
>>> a.var()
8.25
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • ValueError – Raised if the ddof >= pdarray size

  • RuntimeError – Raised if there’s a server-side error thrown

See also

mean, std

Notes

The variance is the average of the squared deviations from the mean, i.e., var = mean((x - x.mean())**2).

The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
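
The effect of ddof on the divisor can be checked with a small pure-Python sketch. This is an illustration of the formula in the Notes on a plain list, not Arkouda's implementation; var_sketch is a hypothetical helper.

```python
def var_sketch(values, ddof=0):
    """Variance of a list of numbers using the divisor N - ddof."""
    n = len(values)
    if ddof >= n:
        # mirrors the documented ValueError when ddof >= array size
        raise ValueError("ddof must be smaller than the number of values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


data = list(range(10))           # same values as ak.arange(10)
print(var_sketch(data))          # 8.25, matching the example above
print(var_sketch(data, ddof=1))  # sample variance: same sum divided by 9
```

With ddof=0 the sum of squared deviations (82.5) is divided by N = 10; with ddof=1 the same sum is divided by N - 1 = 9, which gives a slightly larger value.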

arkouda.numpy.vecdot(x1: arkouda.numpy.pdarrayclass.pdarray, x2: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Compute the generalized dot product of two vectors along the given axis. Assumes that both tensors have already been broadcast to the same shape.

Parameters:
  • x1 (pdarray) – The first input array

  • x2 (pdarray) – The second input array

Returns:

x1 vecdot x2

Return type:

pdarray

Examples

>>> a = ak.array([[1,2,3,4,5],[1,2,3,4,5]])
>>> b = ak.array([[2,2,2,2,2],[2,2,2,2,2]])
>>> ak.vecdot(a,b)
array([4 8 12 16 20])
>>> ak.vecdot(b,a)
array([4 8 12 16 20])
Raises:

ValueError – Raised if x1 and x2 are not of matching shape or if rank of x1 < 2
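
The example output above is consistent with multiplying x1 and x2 elementwise and summing the products along the first axis. The following pure-Python sketch reproduces only that arithmetic on nested lists; it is not Arkouda's implementation, and vecdot_sketch is a hypothetical helper.

```python
def vecdot_sketch(x1, x2):
    """Elementwise products of two equally shaped 2-D lists, summed over rows."""
    same_shape = len(x1) == len(x2) and all(
        len(r1) == len(r2) for r1, r2 in zip(x1, x2)
    )
    if not same_shape:
        raise ValueError("x1 and x2 must have matching shapes")
    products = [[p * q for p, q in zip(r1, r2)] for r1, r2 in zip(x1, x2)]
    # zip(*products) iterates over columns, so each column is summed across rows
    return [sum(column) for column in zip(*products)]


a = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
b = [[2, 2, 2, 2, 2], [2, 2, 2, 2, 2]]
print(vecdot_sketch(a, b))  # [4, 8, 12, 16, 20]
```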

arkouda.numpy.vstack(tup: Tuple[arkouda.numpy.pdarrayclass.pdarray] | List[arkouda.numpy.pdarrayclass.pdarray], *, dtype: type | str | None = None, casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = 'same_kind') arkouda.numpy.pdarrayclass.pdarray[source]

Stack a sequence of arrays vertically (row-wise).

This is equivalent to concatenation along the first axis after 1-D arrays of shape (N,) have been reshaped to (1,N).

Parameters:
  • tup (Tuple[pdarray] or List[pdarray]) – The arrays to be stacked

  • dtype (Optional[Union[type, str]], optional) – The data type of the output array. If not provided, the output dtype is determined using np.common_type on the input arrays. Defaults to None.

  • casting ({"no", "equiv", "safe", "same_kind", "unsafe"}, optional) – Controls what kind of data casting may occur. Currently unused.

Returns:

  • pdarray – The stacked array
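
The reshaping rule described above, where a 1-D array of shape (N,) is treated as a single row of shape (1,N) before concatenation along the first axis, can be sketched in pure Python on nested lists. This is an illustration only, not Arkouda's implementation; vstack_sketch is a hypothetical helper.

```python
def vstack_sketch(arrays):
    """Stack 1-D lists (as single rows) and 2-D lists (row by row) vertically."""
    rows = []
    for arr in arrays:
        if arr and isinstance(arr[0], list):
            rows.extend(list(r) for r in arr)  # already 2-D: append each row
        else:
            rows.append(list(arr))             # 1-D (N,): becomes one (1, N) row
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows must have the same length")
    return rows


print(vstack_sketch([[1, 2, 3], [4, 5, 6]]))
# [[1, 2, 3], [4, 5, 6]]
print(vstack_sketch([[[1, 2], [3, 4]], [5, 6]]))
# [[1, 2], [3, 4], [5, 6]]
```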

arkouda.numpy.where(condition: arkouda.numpy.pdarrayclass.pdarray, A: str | arkouda.numpy.dtypes.numeric_scalars | arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical, B: str | arkouda.numpy.dtypes.numeric_scalars | arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.categorical.Categorical[source]

Returns an array with elements chosen from A and B based upon a conditioning array. As is the case with numpy.where, the return array consists of values from the first array (A) where the conditioning array elements are True and from the second array (B) where the conditioning array elements are False.

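Parameters:
  • condition (pdarray) – The conditioning array; values are taken from A where it is True and from B where it is False

  • A (str, numeric_scalars, pdarray, Strings, or Categorical) – Value(s) used where condition is True

  • B (str, numeric_scalars, pdarray, Strings, or Categorical) – Value(s) used where condition is False

Returns: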

Values chosen from A where the condition is True and B where the condition is False

Return type:

pdarray, Strings, or Categorical

Raises:
  • TypeError – Raised if the condition object is not a pdarray; if A or B is not an int, np.int64, float, np.float64, bool, pdarray, str, Strings, or Categorical; if pdarray dtypes are not supported or do not match; or if multiple condition clauses (see Notes section) are applied

  • ValueError – Raised if the shapes of the condition, A, and B pdarrays are unequal

Examples

>>> a1 = ak.arange(1,10)
>>> a2 = ak.ones(9, dtype=np.int64)
>>> cond = a1 < 5
>>> ak.where(cond,a1,a2)
array([1 2 3 4 1 1 1 1 1])
>>> a1 = ak.arange(1,10)
>>> a2 = ak.ones(9, dtype=np.int64)
>>> cond = a1 == 5
>>> ak.where(cond,a1,a2)
array([1 1 1 1 5 1 1 1 1])
>>> a1 = ak.arange(1,10)
>>> a2 = 10
>>> cond = a1 < 5
>>> ak.where(cond,a1,a2)
array([1 2 3 4 10 10 10 10 10])
>>> s1 = ak.array([f'str {i}' for i in range(10)])
>>> s2 = 'str 21'
>>> cond = (ak.arange(10) % 2 == 0)
>>> ak.where(cond,s1,s2)
array(['str 0', 'str 21', 'str 2', 'str 21', 'str 4',
'str 21', 'str 6', 'str 21', 'str 8', 'str 21'])
>>> c1 = ak.Categorical(ak.array([f'str {i}' for i in range(10)]))
>>> c2 = ak.Categorical(ak.array([f'str {i}' for i in range(9, -1, -1)]))
>>> cond = (ak.arange(10) % 2 == 0)
>>> ak.where(cond,c1,c2)
array(['str 0', 'str 8', 'str 2', 'str 6', 'str 4',
'str 4', 'str 6', 'str 2', 'str 8', 'str 0'])

Notes

A and B must have the same dtype. Only one conditional clause is supported; a multi-clause condition such as n < 5, n > 1, which is supported in numpy, is not currently supported in Arkouda.
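
The selection rule, including the case where A or B is a scalar, can be sketched in pure Python on plain lists. This is an illustration of the documented semantics only, not Arkouda's implementation; where_sketch is a hypothetical helper.

```python
def where_sketch(condition, a, b):
    """Pick a[i] where condition[i] is True, otherwise b[i]; scalars are repeated."""
    n = len(condition)

    def as_sequence(x):
        if isinstance(x, (list, tuple)):
            if len(x) != n:
                # mirrors the documented ValueError for unequal shapes
                raise ValueError("condition, A, and B must have the same length")
            return x
        return [x] * n  # a scalar (number or str) applies at every position

    a_seq, b_seq = as_sequence(a), as_sequence(b)
    return [x if c else y for c, x, y in zip(condition, a_seq, b_seq)]


a1 = list(range(1, 10))
cond = [v < 5 for v in a1]  # a single conditional clause
print(where_sketch(cond, a1, 10))  # [1, 2, 3, 4, 10, 10, 10, 10, 10]
```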

arkouda.numpy.zeros(size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] | str, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint = float64, max_bits: int | None = None) arkouda.numpy.pdarrayclass.pdarray[source]

Create a pdarray filled with zeros.

Parameters:
  • size (int_scalars or tuple of int_scalars) – Size or shape of the array

  • dtype (all_scalars) – Type of resulting array, default ak.float64

  • max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays. Included for consistency, as zeros are represented as all zeros, regardless of the value of max_bits

Returns:

Zeros of the requested size or shape and dtype

Return type:

pdarray

Raises:
  • TypeError – Raised if the supplied dtype is not supported

  • RuntimeError – Raised if the size parameter is neither an int nor a str that is parseable to an int.

  • ValueError – Raised if the rank of the given shape is not in get_array_ranks() or is empty. Also raised if max_bits is not None and ndim does not equal 1

See also

ones, zeros_like

Examples

>>> ak.zeros(5, dtype=ak.int64)
array([0 0 0 0 0])
>>> ak.zeros(5, dtype=ak.float64)
array([0.00000000000000000 0.00000000000000000 0.00000000000000000
       0.00000000000000000 0.00000000000000000])
>>> ak.zeros(5, dtype=ak.bool_)
array([False False False False False])
arkouda.numpy.zeros_like(pda: arkouda.numpy.pdarrayclass.pdarray) arkouda.numpy.pdarrayclass.pdarray[source]

Create a zero-filled pdarray of the same size and dtype as an existing pdarray.

Parameters:

pda (pdarray) – Array to use for size and dtype

Returns:

Equivalent to ak.zeros(pda.size, pda.dtype)

Return type:

pdarray

Raises:

TypeError – Raised if the pda parameter is not a pdarray.

See also

zeros, ones_like

Examples

>>> ak.zeros_like(ak.ones(5,dtype=ak.int64))
array([0 0 0 0 0])
>>> ak.zeros_like(ak.ones(5,dtype=ak.float64))
array([0.00000000000000000 0.00000000000000000 0.00000000000000000
       0.00000000000000000 0.00000000000000000])
>>> ak.zeros_like(ak.ones(5,dtype=ak.bool_))
array([False False False False False])