The pdarray class¶
Just as the backbone of NumPy is the ndarray, the backbone of arkouda is an array class called pdarray. And just as the ndarray object is a Python wrapper for C-style data with C and Fortran methods, the pdarray object is a Python wrapper for distributed data with parallel methods written in Chapel. The API of pdarray is similar, but not identical, to that of ndarray.
- class arkouda.pdarray(name, mydtype, size, ndim, shape, itemsize, max_bits=None)[source]¶
The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.
- name¶
The server-side identifier for the array
- Type:
str
- dtype¶
The element type of the array
- Type:
dtype
- size¶
The number of elements in the array
- Type:
- ndim¶
The rank of the array (currently only rank-1 arrays are supported)
- Type:
- shape¶
A list or tuple containing the sizes of each dimension of the array
- Type:
Sequence[int]
- itemsize¶
The size in bytes of each element
- Type:
Data Type¶
Currently, pdarray supports three user-facing data types (strings are exposed via a separate class, see Strings in Arkouda):
- int64: 64-bit signed integer
- float64: IEEE 64-bit floating point number
- bool: 8-bit boolean value
Arkouda inherits all of its data types from numpy. For example, ak.int64 is derived from np.int64.
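Because the arkouda dtypes are derived from NumPy's, the element sizes listed above can be checked on the client with NumPy alone. The sketch below does not contact an arkouda server; it assumes the arkouda dtypes have the same sizes as their NumPy counterparts, which the text above implies but this example does not verify.

```python
import numpy as np

# NumPy counterparts of the three user-facing pdarray dtypes.
dtypes = {
    "int64": np.dtype(np.int64),
    "float64": np.dtype(np.float64),
    "bool": np.dtype(np.bool_),
}

# Size in bytes of a single element of each dtype.
itemsizes = {name: dt.itemsize for name, dt in dtypes.items()}

for name, nbytes in itemsizes.items():
    print(f"{name}: {nbytes} bytes ({nbytes * 8} bits)")
```

Note that bool occupies a full byte per element rather than a single bit, which matters when estimating how much data a transfer to the client would involve.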
Rank¶
Currently, a pdarray can only have rank 1. We plan to support sparse, multi-dimensional arrays via data structures incorporating rank-1 pdarray objects.
Name¶
The name attribute of an array is a string used by the arkouda server to identify the pdarray object in its symbol table. This name is chosen by the server, and the user should not overwrite it.
Operators¶
The pdarray class supports most Python special methods, including arithmetic, bitwise, and comparison operators.
Iteration¶
Iterating directly over a pdarray with for x in array is not supported. This is deliberate: it discourages transferring all array data from the arkouda server to the Python client, since there is almost always a more array-oriented way to express an iterator-based computation. To force this transfer, use the to_ndarray function to return the pdarray as a numpy.ndarray. This transfer will raise an error if it exceeds the byte limit defined in ak.client.maxTransferBytes.
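As an illustration of the array-oriented style, the following sketch replaces an element-by-element loop with whole-array operations. It uses NumPy so that it runs without a server; the assumption is that the same operator expressions carry over to pdarray, whose API is described above as similar, but not identical, to that of ndarray.

```python
import numpy as np

a = np.arange(10)

# Iterator-based version: touches every element individually on the client.
total_loop = 0
for x in a:
    if x % 2 == 0:
        total_loop += x * x

# Array-oriented version: the comparison builds a boolean mask, and the
# arithmetic and the reduction operate on whole arrays at once.
mask = (a % 2) == 0
total_vec = int((a * a * mask).sum())

print(total_loop, total_vec)
```

With a server-side array, only the final scalar would need to travel to the client in the second form, instead of every element.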
- arkouda.pdarray.to_ndarray(self)¶
Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.
- Returns:
A numpy ndarray with the same attributes and data as the pdarray
- Return type:
np.ndarray
- Raises:
RuntimeError – Raised if a server-side error is thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes
Notes
The number of bytes in the array cannot exceed client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.
See also
array, to_list
Examples
>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])
>>> type(a.to_ndarray())
numpy.ndarray
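The size check described in the notes above can be modeled in a few lines of plain Python. This is a minimal sketch, not arkouda's implementation: MAX_TRANSFER_BYTES and check_transfer are hypothetical names standing in for client.maxTransferBytes and the client's internal check, and the limit value is chosen only for illustration.

```python
# Hypothetical stand-in for client.maxTransferBytes (illustrative value only).
MAX_TRANSFER_BYTES = 1024


def check_transfer(size: int, itemsize: int, limit: int = MAX_TRANSFER_BYTES) -> int:
    """Return the payload size in bytes, or raise if it exceeds the limit."""
    nbytes = size * itemsize
    if nbytes > limit:
        raise RuntimeError(
            f"array of {nbytes} bytes exceeds the transfer limit of {limit} bytes"
        )
    return nbytes


# 100 int64 elements: 800 bytes, within the limit.
small = check_transfer(size=100, itemsize=8)

# 1000 int64 elements: 8000 bytes, over the limit.
try:
    check_transfer(size=1000, itemsize=8)
    too_large_rejected = False
except RuntimeError:
    too_large_rejected = True
```

The check depends only on the size and itemsize attributes, both of which are held by the client-side pdarray object, so in this model an oversized request can be refused before any array data is moved.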
Type Casting¶
Conversion between dtypes is sometimes implicit, as in the following example:
>>> a = ak.arange(10)
>>> b = 1.0 * a
>>> b.dtype
dtype('float64')
Explicit conversion is supported via the cast function.
- arkouda.cast(pda, dt, errors=ErrorMode.strict)[source]¶
Cast an array to another dtype.
- Parameters:
pda (pdarray or Strings) – The array of values to cast
dt (np.dtype, type, or str) – The target dtype to cast values to
errors ({strict, ignore, return_validity}) – Controls how errors are handled when casting strings to a numeric type (ignored for casts from numeric types).
- strict: raise RuntimeError if any string cannot be converted
- ignore: never raise an error. Uninterpretable strings get converted to NaN (float64), -2**63 (int64), zero (uint64 and uint8), or False (bool)
- return_validity: in addition to returning the same output as “ignore”, also return a bool array indicating where the cast was successful.
- Return type:
Union[pdarray, Strings, TypeVar(Categorical), Tuple[pdarray, pdarray]]
- Returns:
pdarray or Strings – Array of values cast to desired dtype
[validity (pdarray(bool))] – If errors="return_validity" and input is Strings, a second array is returned with True where the cast succeeded and False where it failed.
Notes
The cast is performed according to Chapel’s casting rules and is NOT safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.
Examples
>>> ak.cast(ak.linspace(1.0,5.0,5), dt=ak.int64)
array([1, 2, 3, 4, 5])
>>> ak.cast(ak.arange(0,5), dt=ak.float64).dtype
dtype('float64')
>>> ak.cast(ak.arange(0,5), dt=ak.bool_)
array([False, True, True, True, True])
>>> ak.cast(ak.linspace(0,4,5), dt=ak.bool_)
array([False, True, True, True, True])
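The three errors modes are easiest to compare side by side. The following is a pure-Python model of the documented behavior for string-to-int64 casts, offered as a sketch only: cast_strings_to_int64 is a hypothetical helper, it uses Python's int() parsing rather than Chapel's casting rules, and it does not check that parsed values fit in 64 bits.

```python
from typing import List

# Value documented above for uninterpretable strings when the target is int64.
INT64_SENTINEL = -2**63


def cast_strings_to_int64(values: List[str], errors: str = "strict"):
    """Hypothetical client-side model of casting strings to int64."""
    out, valid = [], []
    for s in values:
        try:
            out.append(int(s))
            valid.append(True)
        except ValueError:
            if errors == "strict":
                # strict: the first bad string aborts the whole cast.
                raise RuntimeError(f"cannot convert {s!r} to int64")
            # ignore / return_validity: substitute the sentinel and continue.
            out.append(INT64_SENTINEL)
            valid.append(False)
    if errors == "return_validity":
        return out, valid
    return out


data = ["1", "x", "3"]

ignored = cast_strings_to_int64(data, errors="ignore")
values, validity = cast_strings_to_int64(data, errors="return_validity")

try:
    cast_strings_to_int64(data, errors="strict")
    strict_raised = False
except RuntimeError:
    strict_raised = True
```

Because the sentinel is itself a representable int64 value, the "ignore" output alone cannot distinguish a failed conversion from a string that really contained -2**63; the validity array returned by "return_validity" removes that ambiguity.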
Reshape¶
Using the .reshape method, a multi-dimensional view of a pdarray will be returned as an ArrayView