The pdarray class

Just as the backbone of NumPy is the ndarray, the backbone of arkouda is an array class called pdarray. And just as the ndarray object is a Python wrapper for C-style data with C and Fortran methods, the pdarray object is a Python wrapper for distributed data with parallel methods written in Chapel. The API of pdarray is similar, but not identical, to that of ndarray.

class arkouda.pdarray(name, mydtype, size, ndim, shape, itemsize, max_bits=None)

The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda creates a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.

name

The server-side identifier for the array

Type:

str

dtype

The element type of the array

Type:

dtype

size

The number of elements in the array

Type:

int_scalars

ndim

The rank of the array (currently only rank-1 arrays are supported)

Type:

int_scalars

shape

A list or tuple containing the sizes of each dimension of the array

Type:

Sequence[int]

itemsize

The size in bytes of each element

Type:

int_scalars

Data Type

Currently, pdarray supports three user-facing data types (strings are exposed via a separate class; see Strings in Arkouda):

  • int64: 64-bit signed integer

  • float64: IEEE 64-bit floating point number

  • bool: 8-bit boolean value

Arkouda inherits all of its data types from numpy. For example, ak.int64 is derived from np.int64.

Rank

Currently, a pdarray can only have rank 1. We plan to support sparse, multi-dimensional arrays via data structures incorporating rank-1 pdarray objects.

Name

The name attribute of an array is a string used by the arkouda server to identify the pdarray object in its symbol table. This name is chosen by the server, and the user should not overwrite it.

Operators

The pdarray class supports most Python special methods, including arithmetic, bitwise, and comparison operators.

Iteration

Iterating directly over a pdarray with for x in array is not supported. This discourages transferring all array data from the arkouda server to the Python client, since there is almost always a more array-oriented way to express an iterator-based computation. To force the transfer, call the to_ndarray function, which returns the pdarray as a numpy.ndarray. The transfer raises an error if it exceeds the byte limit defined in ak.client.maxTransferBytes.

arkouda.pdarray.to_ndarray(self)

Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.

Returns:

A numpy ndarray with the same attributes and data as the pdarray

Return type:

np.ndarray

Raises:

RuntimeError – Raised if the server throws an error, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes

Notes

The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.

See also

array, to_list

Examples

>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])
>>> type(a.to_ndarray())
numpy.ndarray
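
The transfer limit described in the Notes can also be checked on the client before calling to_ndarray. The following is a minimal sketch, not arkouda's own implementation: the helper name to_ndarray_if_small is hypothetical, and the stand-in object only mimics the documented size, itemsize, and to_ndarray members so the snippet runs without a server.

```python
from types import SimpleNamespace


def to_ndarray_if_small(pda, limit_bytes):
    """Call pda.to_ndarray() only if the array fits within limit_bytes.

    `pda` is any object exposing the documented pdarray attributes
    `size` and `itemsize` and the `to_ndarray()` method.
    (Hypothetical helper; not part of the arkouda API.)
    """
    needed = pda.size * pda.itemsize  # total bytes the transfer would move
    if needed > limit_bytes:
        raise ValueError(
            f"transfer needs {needed} bytes, limit is {limit_bytes} bytes"
        )
    return pda.to_ndarray()


# Stand-in for a 5-element int64 pdarray (8 bytes per element), used here
# only so the sketch can run without an arkouda server.
fake = SimpleNamespace(size=5, itemsize=8, to_ndarray=lambda: [0, 1, 2, 3, 4])

print(to_ndarray_if_small(fake, limit_bytes=1024))
```

With a live server, the same helper could be passed a real pdarray and ak.client.maxTransferBytes as the limit, which lets the caller handle an oversized array before any transfer is attempted.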

Type Casting

Conversion between dtypes is sometimes implicit, as in the following example:

>>> a = ak.arange(10)
>>> b = 1.0 * a
>>> b.dtype
dtype('float64')

Explicit conversion is supported via the cast function.

arkouda.cast(pda, dt, errors=ErrorMode.strict)

Cast an array to another dtype.

Parameters:
  • pda (pdarray or Strings) – The array of values to cast

  • dt (np.dtype, type, or str) – The target dtype to cast values to

  • errors ({strict, ignore, return_validity}) –

    Controls how errors are handled when casting strings to a numeric type (ignored for casts from numeric types).

    • strict: raise RuntimeError if any string cannot be converted

    • ignore: never raise an error. Uninterpretable strings get converted to NaN (float64), -2**63 (int64), zero (uint64 and uint8), or False (bool)

    • return_validity: in addition to returning the same output as “ignore”, also return a bool array indicating where the cast was successful.

Return type:

Union[pdarray, Strings, TypeVar(Categorical), Tuple[pdarray, pdarray]]

Returns:

  • pdarray or Strings – Array of values cast to desired dtype

  • validity (pdarray of bool, optional) – If errors="return_validity" and the input is Strings, a second array is returned with True where the cast succeeded and False where it failed.
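
The three errors modes can be pictured with a small client-side model. This is a sketch under stated assumptions, not arkouda's implementation: the function cast_strings_to_int64 is hypothetical, it takes the mode as a plain string rather than an ErrorMode value, it works on a Python list instead of a Strings object, and it parses with Python's int(), whose rules may differ from the server's. Only the int64 sentinel (-2**63) and the RuntimeError for strict mode are taken from the description above.

```python
INT64_SENTINEL = -2**63  # documented result for uninterpretable strings cast to int64


def cast_strings_to_int64(values, errors="strict"):
    """Model the strict / ignore / return_validity modes on a list of str.

    Hypothetical helper for illustration; not part of the arkouda API.
    """
    if errors not in ("strict", "ignore", "return_validity"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    out, valid = [], []
    for s in values:
        try:
            out.append(int(s))
            valid.append(True)
        except ValueError:
            if errors == "strict":
                # strict: any unconvertible string aborts the whole cast
                raise RuntimeError(f"cannot convert {s!r} to int64") from None
            # ignore / return_validity: substitute the sentinel and continue
            out.append(INT64_SENTINEL)
            valid.append(False)
    if errors == "return_validity":
        return out, valid  # values plus a mask of successful casts
    return out


print(cast_strings_to_int64(["1", "x", "3"], errors="return_validity"))
```

The validity mask matters because the sentinel alone is ambiguous: a string that legitimately parses to -2**63 is indistinguishable from a failed cast unless the mask is consulted.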

Notes

The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.

Examples

>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64)
array([1, 2, 3, 4, 5])
>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype
dtype('float64')
>>> ak.cast(ak.arange(0,5), dt=ak.bool_)
array([False, True, True, True, True])
>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool_)
array([False, True, True, True, True])
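
Because cast is not safe from overflows or underflows, it can help to compare the source value range against the target integer type before casting. The helpers below are a hypothetical sketch in plain Python (int_range and fits are not arkouda functions); they assume the caller already knows the smallest and largest values in the array.

```python
def int_range(bits, signed=True):
    """Return (min, max) representable by an integer type of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(lo, hi, bits, signed=True):
    """True if every value in [lo, hi] is representable in the target type."""
    tmin, tmax = int_range(bits, signed)
    return tmin <= lo and hi <= tmax


# Values spanning 0..255 fit in an unsigned 8-bit type but not a signed one.
print(fits(0, 255, bits=8, signed=False))
print(fits(0, 255, bits=8, signed=True))
```

With a live server, lo and hi would typically come from reductions over the array, assuming the array exposes min() and max(); only two scalars cross to the client, so the check stays cheap even for large arrays.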

Reshape

The .reshape method returns a multi-dimensional view of a pdarray as an ArrayView.