The `pdarray` class¶

Just as the backbone of NumPy is the ndarray, the backbone of arkouda is an array class called pdarray. And just as the ndarray object is a Python wrapper for C-style data with C and Fortran methods, the pdarray object is a Python wrapper for distributed data with parallel methods written in Chapel. The API of pdarray is similar, but not identical, to that of ndarray.

class arkouda.pdarray(name, mydtype, size, ndim, shape, itemsize, max_bits=None)[source]¶

The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name¶

The server-side identifier for the array

Type: str

dtype¶

The element type of the array

Type: dtype

size¶

The number of elements in the array

Type: int_scalars

ndim¶

The rank of the array (currently only rank-1 arrays are supported)

Type: int_scalars

shape¶

A list or tuple containing the size of each dimension of the array

Type: Sequence[int]

itemsize¶

The size in bytes of each element

Type: int_scalars

Data Type¶

Currently, pdarray supports three user-facing data types (strings are exposed via a separate class, see Strings in Arkouda):

int64: 64-bit signed integer
float64: IEEE 64-bit floating point number
bool: 8-bit boolean value

Arkouda inherits all of its data types from NumPy. For example, ak.int64 is derived from np.int64.

Rank¶

Currently, a pdarray can only have rank 1. We plan to support sparse, multi-dimensional arrays via data structures incorporating rank-1 pdarray objects.

Name¶

The name attribute of an array is a string used by the arkouda server to identify the pdarray object in its symbol table. This name is chosen by the server, and the user should not overwrite it.

Operators¶

The pdarray class supports most Python special methods, including arithmetic, bitwise, and comparison operators.
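The following is a toy, in-process model of the idea described above: a client-side handle that holds only a server-chosen name and metadata, with Python special methods forwarding each operation to the side that owns the data. ToyServer and ToyHandle are hypothetical names invented for this sketch; they are not part of arkouda and do not reflect how arkouda implements its messaging.

```python
import itertools
import operator


class ToyServer:
    """Stand-in for the arkouda server: it owns the data and the symbol table."""

    _OPS = {"+": operator.add, "*": operator.mul, "<": operator.lt}

    def __init__(self):
        self.symbols = {}               # name -> list of values
        self._ids = itertools.count(1)

    def store(self, values):
        # The server, not the user, chooses the name.
        name = f"id_{next(self._ids)}"
        self.symbols[name] = list(values)
        return name

    def binop(self, op, left_name, right_name):
        left, right = self.symbols[left_name], self.symbols[right_name]
        if len(left) != len(right):
            raise ValueError("size mismatch")
        func = self._OPS[op]
        return self.store(func(x, y) for x, y in zip(left, right))


class ToyHandle:
    """Client-side object: a name and some metadata, but no element data."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        # In this toy model the size is read straight from the symbol table.
        self.size = len(server.symbols[name])

    def _binop(self, op, other):
        new_name = self.server.binop(op, self.name, other.name)
        return ToyHandle(self.server, new_name)

    def __add__(self, other):
        return self._binop("+", other)

    def __mul__(self, other):
        return self._binop("*", other)

    def __lt__(self, other):
        return self._binop("<", other)


server = ToyServer()
a = ToyHandle(server, server.store([1, 2, 3]))
b = ToyHandle(server, server.store([10, 20, 30]))
c = a + b          # computed by the toy server; c holds only the new name
mask = a * a < b   # operators compose; each step creates a new named array
```

In this model, overwriting c.name would leave the handle pointing at nothing, which is one way to see why the name attribute should be left alone.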

Iteration¶

Iterating directly over a pdarray with for x in array is not supported. This discourages transferring all array data from the arkouda server to the Python client, since there is almost always a more array-oriented way to express an iterator-based computation. To force the transfer, use the to_ndarray method, which returns the pdarray as a numpy.ndarray. The transfer raises an error if it exceeds the byte limit defined in ak.client.maxTransferBytes.
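As a sketch of what "array-oriented" means here, the function below computes a sum of squares with whole-array operations instead of a Python loop. It assumes arkouda is installed and that a server is already running at the given host and port (both defaults are assumptions for illustration); the function name sum_of_squares is hypothetical.

```python
def sum_of_squares(n, host="localhost", port=5555):
    """Compute 0**2 + 1**2 + ... + (n-1)**2 on the arkouda server."""
    # Imported inside the function so this file can be loaded
    # on a machine where arkouda is not installed.
    import arkouda as ak

    ak.connect(server=host, port=port)  # assumes a server is already running
    try:
        a = ak.arange(n)                # int64 pdarray; data stays on the server
        # Array-oriented form of: sum(x * x for x in a)
        # Only the scalar result is sent back to the client.
        return int((a * a).sum())
    finally:
        ak.disconnect()
```

Only a single scalar crosses from server to client, so the byte limit on transfers never comes into play.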

arkouda.pdarray.to_ndarray(self)¶

Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns: A numpy ndarray with the same attributes and data as the pdarray
Return type: np.ndarray
Raises: RuntimeError – Raised if a server-side error is thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes; otherwise, a RuntimeError is raised. This protects the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but should proceed with caution.
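The size check described in these notes can be modeled in a few lines of plain Python. This is a sketch of the rule, not arkouda's code: check_transfer is a hypothetical helper, and the limit value used here is an assumption chosen for illustration.

```python
# Hypothetical limit for this sketch; the real limit lives in
# client.maxTransferBytes and may differ.
MAX_TRANSFER_BYTES = 2**30


def check_transfer(size, itemsize, max_bytes=MAX_TRANSFER_BYTES):
    """Return the transfer size in bytes, or raise if it exceeds the limit."""
    nbytes = size * itemsize  # number of elements times bytes per element
    if nbytes > max_bytes:
        raise RuntimeError(
            f"array is {nbytes} bytes, which exceeds the limit of {max_bytes} bytes"
        )
    return nbytes


small = check_transfer(1_000_000, 8)  # one million 8-byte elements: allowed

try:
    check_transfer(10**9, 8)          # one billion 8-byte elements: too large
    too_large_rejected = False
except RuntimeError:
    too_large_rejected = True
```

The size and itemsize attributes of a pdarray supply the two factors, so the cost of a transfer can be estimated before calling to_ndarray.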

Type Casting¶

Conversion between dtypes is sometimes implicit, as in the following example:

>>> a = ak.arange(10)
>>> b = 1.0 * a
>>> b.dtype
dtype('float64')

Explicit conversion is supported via the cast function.

arkouda.cast(pda, dt, errors=ErrorMode.strict)[source]¶

Cast an array to another dtype.

Parameters:

pda (pdarray or Strings) – The array of values to cast
dt (np.dtype, type, or str) – The target dtype to cast values to
errors ({strict, ignore, return_validity}) – Controls how errors are handled when casting strings to a numeric type (ignored for casts from numeric types).
- strict: raise RuntimeError if any string cannot be converted
- ignore: never raise an error. Uninterpretable strings are converted to NaN (float64), -2**63 (int64), zero (uint64 and uint8), or False (bool)
- return_validity: in addition to returning the same output as “ignore”, also return a bool array indicating where the cast was successful.

Return type:

Union[pdarray, Strings, TypeVar(Categorical), Tuple[pdarray, pdarray]]

Returns:

pdarray or Strings – Array of values cast to the desired dtype
validity (pdarray of bool) – If errors="return_validity" and the input is Strings, a second array is returned with True where the cast succeeded and False where it failed.

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.
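One concrete precision limit can be shown without arkouda at all: a 64-bit float has a 53-bit significand, so not every int64 value survives a round trip through float64. The sketch below uses Python's built-in int and float (an IEEE 64-bit float) purely to illustrate that property of the number format; it does not exercise Chapel's casting rules.

```python
# Integers up to 2**53 are exactly representable as 64-bit floats.
exact = 2**53
assert int(float(exact)) == exact

# 2**53 + 1 fits easily in an int64 but has no exact float64 representation,
# so converting to float and back silently changes the value.
big = 2**53 + 1
round_trip = int(float(big))
lost = big - round_trip
```

Here round_trip equals 2**53, one less than the starting value, and no error is raised along the way.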

Examples

>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64)
array([1, 2, 3, 4, 5])

>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype
dtype('float64')

>>> ak.cast(ak.arange(0,5), dt=ak.bool_)
array([False, True, True, True, True])

>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool_)
array([False, True, True, True, True])
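The three errors modes can be pictured with a plain-Python model of a string-to-int64 cast. This is a sketch under stated assumptions: cast_strings_to_int64 is a hypothetical helper operating on Python lists, and Python's int() parsing stands in for the server's parser, which may accept or reject different inputs.

```python
INT64_SENTINEL = -2**63  # value documented for uninterpretable strings (int64)


def cast_strings_to_int64(strings, errors="strict"):
    """Model the strict / ignore / return_validity modes on a list of str."""
    if errors not in ("strict", "ignore", "return_validity"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    values, validity = [], []
    for s in strings:
        try:
            values.append(int(s))
            validity.append(True)
        except ValueError:
            if errors == "strict":
                raise RuntimeError(f"cannot convert {s!r} to int64")
            values.append(INT64_SENTINEL)  # placeholder for a failed cast
            validity.append(False)
    if errors == "return_validity":
        return values, validity
    return values


data = ["1", "oops", "3"]
ignored = cast_strings_to_int64(data, errors="ignore")
values, valid = cast_strings_to_int64(data, errors="return_validity")
```

With return_validity, the second result makes it possible to tell a failed cast apart from an input that genuinely contained -2**63.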

Reshape¶

The .reshape method returns a multidimensional view of a pdarray as an ArrayView.
