The pdarray class¶
Just as the backbone of NumPy is the ndarray, the backbone of arkouda is an array class called pdarray. And just as the ndarray object is a Python wrapper for C-style data with C and Fortran methods, the pdarray object is a Python wrapper for distributed data with parallel methods written in Chapel. The API of pdarray is similar, but not identical, to that of ndarray.
- class arkouda.pdarray(name, mydtype, size, ndim, shape, itemsize, max_bits=None)[source]
The basic arkouda array class. This class contains only the attributes of the array; the data resides on the arkouda server. When a server operation results in a new array, arkouda will create a pdarray instance that points to the array data on the server. As such, the user should not initialize pdarray instances directly.
- name
The server-side identifier for the array
- Type:
- dtype
The element dtype of the array
- Type:
- size
The number of elements in the array
- Type:
- ndim
The rank of the array
- Type:
- shape
A tuple containing the sizes of each dimension of the array
- Type:
Tuple[int, …]
- itemsize
The size in bytes of each element
- Type:
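The split described above, with metadata on the client and data on the server, can be pictured with a small stand-in class. This is only a sketch: `ArrayHandle`, its `nbytes` helper, and the name `"id_1"` are hypothetical and are not part of arkouda.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class ArrayHandle:
    """Hypothetical stand-in for a pdarray: metadata only, no element data."""

    name: str               # server-side identifier (made-up value below)
    dtype: str              # element dtype, e.g. "int64"
    size: int               # number of elements
    ndim: int               # rank of the array
    shape: Tuple[int, ...]  # size of each dimension
    itemsize: int           # bytes per element

    def nbytes(self) -> int:
        # Total bytes the data would occupy if it were transferred.
        return self.size * self.itemsize


handle = ArrayHandle(
    name="id_1", dtype="int64", size=1_000_000,
    ndim=1, shape=(1_000_000,), itemsize=8,
)

# The attributes are mutually consistent, as they are for any array.
assert handle.ndim == len(handle.shape)
assert handle.size == prod(handle.shape)
print(handle.nbytes())  # 8000000
```

The handle itself is a few dozen bytes no matter how large the array it describes is, which is why creating one per server result is cheap.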
Data Type¶
Currently, pdarray supports three user-facing data types (strings are exposed via a separate class, see Strings in Arkouda):
- int64: 64-bit signed integer
- float64: IEEE 64-bit floating point number
- bool: 8-bit boolean value
Arkouda inherits all of its data types from numpy. For example, ak.int64 is derived from np.int64.
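As a rough illustration of what these three dtypes mean for the itemsize attribute, the sketch below uses Python's standard `struct` module as a stand-in for the element layouts. It does not call arkouda or numpy; the `ITEMSIZE` table and the struct codes chosen are assumptions made for the example.

```python
import struct

# Standard-size struct codes used as stand-ins for the three dtypes:
# "q" = 64-bit signed integer, "d" = IEEE 64-bit float, "?" = 1-byte bool.
ITEMSIZE = {
    "int64": struct.calcsize("<q"),
    "float64": struct.calcsize("<d"),
    "bool": struct.calcsize("<?"),
}

# Representable range of a 64-bit signed integer.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

# Both extremes fit in 8 bytes; one past the maximum does not.
struct.pack("<q", INT64_MIN)
struct.pack("<q", INT64_MAX)
try:
    struct.pack("<q", INT64_MAX + 1)
    fits = True
except struct.error:
    fits = False

print(ITEMSIZE)  # {'int64': 8, 'float64': 8, 'bool': 1}
```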
Rank¶
Currently, a pdarray can only have rank 1. We plan to support sparse, multi-dimensional arrays via data structures incorporating rank-1 pdarray objects.
Name¶
The name attribute of an array is a string used by the arkouda server to identify the pdarray object in its symbol table. This name is chosen by the server, and the user should not overwrite it.
Operators¶
The pdarray class supports most Python special methods, including arithmetic, bitwise, and comparison operators.
Iteration¶
Iterating directly over a pdarray with for x in array is not supported. This discourages transferring all array data from the arkouda server to the Python client, since there is almost always a more array-oriented way to express an iterator-based computation. To force the transfer, use the to_ndarray method, which returns the pdarray as a numpy.ndarray. The transfer raises an error if it exceeds the byte limit defined in ak.client.maxTransferBytes.
- arkouda.pdarray.to_ndarray(self)
Convert the array to a np.ndarray, transferring array data from the Arkouda server to client-side Python. Note: if the pdarray size exceeds client.maxTransferBytes, a RuntimeError is raised.
- Returns:
A numpy ndarray with the same attributes and data as the pdarray
- Return type:
ndarray
- Raises:
RuntimeError – Raised if a server-side error is thrown, if the pdarray size exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes received does not match the expected number of bytes
Notes
The number of bytes in the array cannot exceed client.maxTransferBytes; otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting client.maxTransferBytes to a larger value, but proceed with caution.
See also
array, tolist
Examples
>>> import arkouda as ak
>>> a = ak.arange(0, 5, 1)
>>> a.to_ndarray()
array([0, 1, 2, 3, 4])
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
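The size check that protects the client can be sketched in plain Python. This is an illustration of the rule described in this section, not arkouda's actual code: `check_transfer` and the `MAX_TRANSFER_BYTES` value are hypothetical, with the constant standing in for client.maxTransferBytes.

```python
# Hypothetical limit standing in for client.maxTransferBytes (value assumed).
MAX_TRANSFER_BYTES = 2**30


def check_transfer(size: int, itemsize: int, limit: int = MAX_TRANSFER_BYTES) -> int:
    """Return the byte count of a transfer, or raise if it exceeds the limit."""
    nbytes = size * itemsize
    if nbytes > limit:
        raise RuntimeError(
            f"array is {nbytes} bytes, which exceeds the transfer limit of {limit} bytes"
        )
    return nbytes


# A 5-element int64 array is tiny and passes the check.
print(check_transfer(size=5, itemsize=8))  # 40

# 2**30 int64 elements would be 8 GiB, well over the assumed 1 GiB limit.
try:
    check_transfer(size=2**30, itemsize=8)
except RuntimeError as err:
    print("refused:", err)
```

Raising the limit trades this safety for convenience, so it is worth estimating size times itemsize before doing so.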
Type Casting¶
Conversion between dtypes is sometimes implicit, as in the following example:
>>> a = ak.arange(10)
>>> b = 1.0 * a
>>> b.dtype
dtype('float64')
Explicit conversion is supported via the cast function.
- arkouda.cast(pda, dt, errors=ErrorMode.strict)[source]
- Overloads:
pda (pdarray), dt (StringDTypeTypes), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → Strings
pda (pdarray), dt (NumericDTypeTypes), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → pdarray
pda (Strings), dt (_Union[ArkoudaNumericTypes, BuiltinNumericTypes, np.dtype[Any], bigint]), errors (Literal[ErrorMode.return_validity]) → Tuple[pdarray, pdarray]
pda (Strings), dt (_Union[ArkoudaNumericTypes, BuiltinNumericTypes, np.dtype[Any], bigint]), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → pdarray
pda (Strings), dt (StringDTypeTypes), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → Strings
pda (Strings), dt (type['Categorical']), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → Categorical
pda (Categorical), dt (StringDTypeTypes), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → Strings
pda (_Union[pdarray, numeric_scalars]), dt (_Union[ArkoudaNumericTypes, BuiltinNumericTypes, np.dtype[Any], bigint, None]), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → pdarray
pda (_Union[pdarray, Strings, 'Categorical', numeric_scalars]), dt (str), errors (Literal[ErrorMode.strict, ErrorMode.ignore]) → _Union[pdarray, Strings, 'Categorical']
Cast an array to another dtype.
- Parameters:
pda (Union[pdarray, Strings, Categorical]) – The array of values to cast.
dt (Union[dtype, type, str, bigint]) – The target dtype to cast values to.
errors (ErrorMode) – Controls how errors are handled when casting strings to a numeric type (ignored for casts from numeric types).
- strict: Raise RuntimeError if any string cannot be converted.
- ignore: Never raise an error. Uninterpretable strings are converted to NaN (float64), -2**63 (int64), zero (uint64 and uint8), or False (bool).
- return_validity: In addition to returning the same output as "ignore", also return a boolean array indicating where the cast was successful.
- Returns:
result (pdarray, Strings, or Categorical) – Array of values cast to the desired dtype.
validity (pdarray(bool), optional) – If errors="return_validity" and the input is Strings, a second array is returned with True where the cast succeeded and False where it failed.
Notes
The cast is performed according to Chapel’s casting rules and is not safe from overflows or underflows. The user must ensure that the target dtype has the precision and capacity to hold the desired result.
Examples
>>> import arkouda as ak
>>> ak.cast(ak.linspace(1.0, 5.0, 5), dt=ak.int64)
array([1 2 3 4 5])
>>> ak.cast(ak.arange(0, 5), dt=ak.float64).dtype
dtype('float64')
>>> ak.cast(ak.arange(0, 5), dt=ak.bool_)
array([False True True True True])
>>> ak.cast(ak.linspace(0, 4, 5), dt=ak.bool_)
array([False True True True True])
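The three error modes for string-to-numeric casts can be mimicked on plain Python lists. The sketch below is a client-side illustration only: `cast_strings_to_int64` is a hypothetical helper, it parses with Python's `int()` rather than Chapel's casting rules, and it covers only the int64 case with the -2**63 fill value listed for that dtype.

```python
from typing import List, Tuple, Union

INT64_FILL = -2**63  # fill value documented for uninterpretable int64 strings


def cast_strings_to_int64(
    values: List[str], errors: str = "strict"
) -> Union[List[int], Tuple[List[int], List[bool]]]:
    """Mimic the strict / ignore / return_validity modes on a Python list."""
    if errors not in ("strict", "ignore", "return_validity"):
        raise ValueError(f"unknown error mode: {errors!r}")

    out: List[int] = []
    valid: List[bool] = []
    for s in values:
        try:
            out.append(int(s))
            valid.append(True)
        except ValueError:
            if errors == "strict":
                # strict: the first bad string aborts the whole cast.
                raise RuntimeError(f"cannot convert {s!r} to int64")
            # ignore / return_validity: substitute the fill value and go on.
            out.append(INT64_FILL)
            valid.append(False)

    if errors == "return_validity":
        return out, valid
    return out


data = ["1", "x", "3"]
print(cast_strings_to_int64(data, errors="ignore"))
print(cast_strings_to_int64(data, errors="return_validity"))
```

With return_validity, the boolean list makes it possible to tell a genuine -2**63 in the input apart from a failed conversion, which the ignore mode alone cannot do.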
Reshape¶
The .reshape method returns a pdarray holding the same data in a new, compatible shape.
- arkouda.pdarray.reshape(self, *shape)
- Overloads:
self, shape (Sequence[int_scalars]) → pdarray
self, shape (int_scalars) → pdarray
self, shape (pdarray) → pdarray
self, shape (np.ndarray) → pdarray
Gives a new shape to an array without changing its data.
- Parameters:
shape (Union[int, int8, int16, int32, int64, uint8, uint16, uint32, uint64, Sequence[Union[int, int8, int16, int32, int64, uint8, uint16, uint32, uint64]], ndarray, pdarray]) – The new shape should be compatible with the original shape.
- Returns:
a pdarray with the same data, reshaped to the new shape
- Return type:
Examples
>>> import arkouda as ak
>>> a = ak.array([[3,2,1],[2,3,1]])
>>> a.reshape((3,2))
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape(3,2)
array([array([3 2]) array([1 2]) array([3 1])])
>>> a.reshape((6,1))
array([array([3]) array([2]) array([1]) array([2]) array([3]) array([1])])
Notes
reshape is only available as a method, not as a standalone function; that is, a.reshape(compatibleShape) is valid, but ak.reshape(a, compatibleShape) is not.
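What "compatible" means for a new shape can be sketched without a server. The helpers below are hypothetical and assume the usual convention that two shapes are compatible when they hold the same number of elements, and that elements are laid out in row-major order; both assumptions are consistent with the example output shown in this section.

```python
from math import prod
from typing import List, Sequence


def is_compatible(old_shape: Sequence[int], new_shape: Sequence[int]) -> bool:
    """Assumed rule: shapes are compatible when their element counts match."""
    return prod(old_shape) == prod(new_shape)


def reshape_rows(flat: Sequence[int], rows: int, cols: int) -> List[List[int]]:
    """Split a flat, row-major sequence into `rows` rows of `cols` items."""
    if rows * cols != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} elements into ({rows}, {cols})")
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# The (2, 3) array [[3,2,1],[2,3,1]] from the example, flattened row by row.
flat = [3, 2, 1, 2, 3, 1]

print(is_compatible((2, 3), (3, 2)))  # True
print(is_compatible((2, 3), (4, 2)))  # False
print(reshape_rows(flat, 3, 2))       # [[3, 2], [1, 2], [3, 1]]
```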