arkouda.pandas.extension
Experimental pandas extension types backed by Arkouda arrays.
This subpackage provides experimental implementations of pandas.api.extensions.ExtensionArray and corresponding extension dtypes that wrap Arkouda distributed arrays.
These classes make it possible to use Arkouda arrays inside pandas
objects such as Series and DataFrame. They aim to provide
familiar pandas semantics while leveraging Arkouda’s distributed,
high-performance backend.
Warning
This module is experimental. The API is not stable and may change without notice between releases. Use with caution in production environments.
Classes
Arkouda-backed numeric/bool pandas ExtensionArray.
Arkouda-backed arbitrary-precision integer dtype.
Arkouda-backed boolean dtype.
Arkouda-backed categorical pandas ExtensionArray.
Arkouda-backed categorical dtype.
Arkouda DataFrame accessor.
Abstract base class for custom 1-D array types.
Arkouda-backed 64-bit floating-point dtype.
Arkouda-backed index accessor for pandas
Extension dtype for Arkouda-backed 64-bit integers.
Arkouda-backed Series accessor.
Arkouda-backed string pandas ExtensionArray.
Arkouda-backed string dtype.
Arkouda-backed unsigned 64-bit integer dtype.
Arkouda-backed unsigned 8-bit integer dtype.
Package Contents
- class arkouda.pandas.extension.ArkoudaArray(data: arkouda.numpy.pdarrayclass.pdarray | numpy.ndarray | Sequence[Any] | ArkoudaArray, dtype: Any = None, copy: bool = False)
Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray
Arkouda-backed numeric/bool pandas ExtensionArray.
Wraps or converts supported inputs into an Arkouda pdarray to serve as the backing store. Ensures the underlying array is 1-D and lives on the Arkouda server.
- Parameters:
data (pdarray | ndarray | Sequence[Any] | ArkoudaArray) – Input to wrap or convert.
  - If an Arkouda pdarray, it is used directly unless dtype is given or copy=True, in which case a new array is created via ak.array.
  - If a NumPy array, it is transferred to Arkouda via ak.array.
  - If a Python sequence, it is converted to NumPy and then to Arkouda.
  - If another ArkoudaArray, its underlying pdarray is reused.
dtype (Any, optional) – Desired dtype to cast to (NumPy dtype or Arkouda dtype string). If omitted, the dtype is inferred from data.
copy (bool) – If True, attempt to copy the underlying data when converting/wrapping. Default is False.
- Raises:
TypeError – If data cannot be interpreted as an Arkouda array-like object.
ValueError – If the resulting array is not one-dimensional.
- default_fill_value¶
Sentinel used when filling missing values (default: -1).
- Type:
int
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> ArkoudaArray(ak.arange(5))
ArkoudaArray([0 1 2 3 4])
>>> ArkoudaArray([10, 20, 30])
ArkoudaArray([10 20 30])
- all(axis=0, skipna=True, **kwargs)[source]¶
Return whether all elements are True.
This is mainly to support pandas’ BaseExtensionArray.equals, which calls .all() on the result of a boolean expression.
- any(axis=0, skipna=True, **kwargs)[source]¶
Return whether any element is True.
Added for symmetry with .all() and to support potential pandas boolean-reduction calls.
- astype(dtype: numpy.dtype[Any], copy: bool = True) -> numpy.typing.NDArray[Any]
- astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) -> pandas.api.extensions.ExtensionArray
- astype(dtype: Any, copy: bool = True) -> pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]
Cast the array to a specified dtype.
Casting rules:
- If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the array values.
- Otherwise, the target dtype is normalized using Arkouda’s dtype resolution rules.
- If the normalized dtype matches the current dtype and copy=False, returns self.
- In all other cases, casts the underlying Arkouda array to the target dtype and returns an Arkouda-backed ArkoudaExtensionArray.
- Parameters:
dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, Arkouda dtype, or any dtype-like object accepted by Arkouda.
copy (bool) – Whether to force a copy when the target dtype matches the current dtype. Default is True.
- Returns:
The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.
- Return type:
Union[ExtensionArray, NDArray[Any]]
Examples
Basic numeric casting returns an Arkouda-backed array:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> a = ArkoudaArray(ak.array([1, 2, 3], dtype="int64"))
>>> a.astype("float64").to_ndarray()
array([1., 2., 3.])
Casting to the same dtype with copy=False returns the original object:
>>> b = a.astype("int64", copy=False)
>>> b is a
True
Forcing a copy when the dtype is unchanged returns a new array:
>>> c = a.astype("int64", copy=True)
>>> c is a
False
>>> c.to_ndarray()
array([1, 2, 3])
Casting to object materializes the data to a NumPy array:
>>> a.astype(object)
array([1, 2, 3], dtype=object)
NumPy and pandas dtype objects are also accepted:
>>> import numpy as np
>>> a.astype(np.dtype("bool")).to_ndarray()
array([ True,  True,  True])
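The casting rules for ArkoudaArray.astype can be read as an ordered decision. The following is a minimal plain-Python sketch of that order only. The function plan_astype and its string labels are hypothetical, it performs no cast, and it assumes both dtypes have already been normalized to simple names (the real method does this through Arkouda's own dtype resolution).

```python
def plan_astype(current: str, target: str, copy: bool = True) -> str:
    """Return a label for the branch the documented rules would take.

    Hypothetical helper for illustration; not part of Arkouda.
    """
    # Rule 1: a request for object materializes to a client-side NumPy array.
    if target == "object":
        return "numpy object array"
    # Rule 2 is assumed done: both names are already normalized.
    # Rule 3: same dtype with copy=False hands back the same instance.
    if target == current and not copy:
        return "self"
    # Rule 4: everything else yields a new Arkouda-backed array.
    return "new Arkouda-backed array"


print(plan_astype("int64", "float64"))            # new Arkouda-backed array
print(plan_astype("int64", "int64", copy=False))  # self
print(plan_astype("int64", "int64", copy=True))   # new Arkouda-backed array
print(plan_astype("int64", "object"))             # numpy object array
```

Note that the object check comes first, so copy has no influence on that branch in this model.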
- default_fill_value: int = -1¶
- property dtype
An instance of ExtensionDtype.
See also
api.extensions.ExtensionDtype – Base class for extension dtypes.
api.extensions.ExtensionArray – Base class for extension array types.
api.extensions.ExtensionArray.dtype – The dtype of an ExtensionArray.
Series.dtype – The dtype of a Series.
DataFrame.dtype – The dtype of a DataFrame.
Examples
>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
- equals(other)[source]¶
Return if another array is equivalent to this array.
Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).
- Parameters:
other (ExtensionArray) – Array to compare to this Array.
- Returns:
Whether the arrays are equivalent.
- Return type:
boolean
See also
numpy.array_equal – Equivalent method for numpy array.
Series.equals – Equivalent method for Series.
DataFrame.equals – Equivalent method for DataFrame.
Examples
>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
>>> arr1 = pd.array([1, 3, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
False
- isna() -> numpy.ndarray
Return a boolean mask indicating missing values.
This method implements the pandas ExtensionArray.isna contract and always returns a NumPy ndarray of dtype bool with the same length as the array.
- Returns:
A boolean mask where True marks elements considered missing.
- Return type:
np.ndarray
- Raises:
TypeError – If the underlying data buffer does not support missing-value detection or cannot produce a boolean mask.
- property nbytes
The number of bytes needed to store this object in memory.
See also
ExtensionArray.shape – Return a tuple of the array dimensions.
ExtensionArray.size – The number of elements in the array.
Examples
>>> pd.array([1, 2, 3]).nbytes
27
- value_counts(dropna: bool = True) -> pandas.Series
Return counts of unique values as a pandas Series.
This method computes the frequency of each distinct value in the underlying Arkouda array and returns the result as a pandas Series, with the unique values as the index and their counts as the data.
- Parameters:
dropna (bool) – Whether to exclude missing values. Currently, missing-value handling is supported only for floating-point data, where NaN values are treated as missing. Default is True.
- Returns:
A Series containing the counts of unique values. The index is an ArkoudaArray of unique values, and the values are an ArkoudaArray of counts.
- Return type:
pd.Series
Notes
- Only dropna=True is supported.
- The following pandas options are not yet implemented: normalize, sort, and bins.
- Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>>
>>> a = ArkoudaArray(ak.array([1, 2, 1, 3, 2, 1]))
>>> a.value_counts()
1    3
2    2
3    1
dtype: int64
Floating-point data with NaN values:
>>> b = ArkoudaArray(ak.array([1.0, 2.0, float("nan"), 1.0]))
>>> b.value_counts()
1.0    2
2.0    1
dtype: int64
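To make the dropna=True behaviour concrete, here is a small client-side model in plain Python. It is only a sketch of the counting semantics described for value_counts, not how Arkouda computes the result (the real counting happens on the server), and the helper name value_counts_dropna is hypothetical.

```python
import math
from collections import Counter


def value_counts_dropna(values):
    """Count distinct values, treating float NaN as missing and dropping it."""
    kept = [
        v for v in values
        if not (isinstance(v, float) and math.isnan(v))
    ]
    # Counter keeps first-seen order, which is enough for this illustration.
    return dict(Counter(kept))


print(value_counts_dropna([1, 2, 1, 3, 2, 1]))               # {1: 3, 2: 2, 3: 1}
print(value_counts_dropna([1.0, 2.0, float("nan"), 1.0]))    # {1.0: 2, 2.0: 1}
```

The NaN filter is applied before counting because NaN never compares equal to itself, so leaving it in would produce one bucket per NaN object rather than a single "missing" bucket.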
- class arkouda.pandas.extension.ArkoudaBigintDtype
Bases: _ArkoudaBaseDtype
Arkouda-backed arbitrary-precision integer dtype.
This dtype integrates Arkouda’s server-backed pdarray<bigint> with the pandas ExtensionArray interface via ArkoudaArray. It enables pandas objects (Series, DataFrame) to hold and operate on very large integers that exceed 64-bit precision, while keeping the data distributed on the Arkouda server.
- construct_array_type()
Returns the ArkoudaArray class used for storage.
- classmethod construct_array_type()
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'O'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'bigint'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaBoolDtype
Bases: _ArkoudaBaseDtype
Arkouda-backed boolean dtype.
This dtype integrates Arkouda’s server-backed pdarray<bool> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate distributed boolean arrays without materializing them on the client.
- construct_array_type()
Returns the ArkoudaArray class used for storage.
- classmethod construct_array_type()
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'b'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = False¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'bool_'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaCategorical(data: arkouda.pandas.categorical.Categorical | ArkoudaCategorical | numpy.ndarray | Sequence[Any])
Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray
Arkouda-backed categorical pandas ExtensionArray.
Ensures the underlying data is an Arkouda Categorical. Accepts an existing Categorical or converts from Python/NumPy sequences of labels.
- Parameters:
data (Categorical | ArkoudaCategorical | ndarray | Sequence[Any]) – Input to wrap or convert.
  - If Categorical, used directly.
  - If another ArkoudaCategorical, its backing object is reused.
  - If list/tuple/ndarray, converted via ak.Categorical(ak.array(data)).
- Raises:
TypeError – If data cannot be converted to Arkouda Categorical.
- astype(dtype: numpy.dtype[Any], copy: bool = True) -> numpy.typing.NDArray[Any]
- astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) -> pandas.api.extensions.ExtensionArray
- astype(dtype: Any, copy: bool = True) -> pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]
Cast to a specified dtype.
- If dtype is categorical (pandas category / CategoricalDtype / ArkoudaCategoricalDtype), returns an Arkouda-backed ArkoudaCategorical (optionally copied).
- If dtype requests object, returns a NumPy ndarray of dtype object containing the category labels (materialized to the client).
- If dtype requests a string dtype, returns an Arkouda-backed ArkoudaStringArray containing the labels as strings.
- Otherwise, casts the labels (as strings) to the requested dtype and returns an Arkouda-backed ExtensionArray.
- Parameters:
dtype (Any) – Target dtype.
copy (bool) – Whether to force a copy when possible. If categorical-to-categorical and copy=True, attempts to copy the underlying Arkouda Categorical (if supported). Default is True.
- Returns:
The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.
- Return type:
Union[ExtensionArray, NDArray[Any]]
Examples
Casting to category returns an Arkouda-backed categorical array:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>> c = ArkoudaCategorical(ak.Categorical(ak.array(["x", "y", "x"])))
>>> out = c.astype("category")
>>> out is c
False
Forcing a copy when casting to the same categorical dtype returns a new array:
>>> out2 = c.astype("category", copy=True)
>>> out2 is c
False
>>> out2.to_ndarray()
array(['x', 'y', 'x'], dtype='<U...')
Casting to object materializes the category labels to a NumPy object array:
>>> c.astype(object)
array(['x', 'y', 'x'], dtype=object)
Casting to a string dtype returns an Arkouda-backed string array of labels:
>>> s = c.astype("string")
>>> s.to_ndarray()
array(['x', 'y', 'x'], dtype='<U1')
Casting to another dtype casts the labels-as-strings and returns an Arkouda-backed array:
>>> c_num = ArkoudaCategorical(ak.Categorical(ak.array(["1", "2", "3"])))
>>> a = c_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])
- property dtype
An instance of ExtensionDtype.
See also
api.extensions.ExtensionDtype – Base class for extension dtypes.
api.extensions.ExtensionArray – Base class for extension array types.
api.extensions.ExtensionArray.dtype – The dtype of an ExtensionArray.
Series.dtype – The dtype of a Series.
DataFrame.dtype – The dtype of a DataFrame.
Examples
>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
- isna() -> numpy.ndarray
Return a boolean mask indicating missing values.
This implements the pandas ExtensionArray.isna contract and returns a NumPy ndarray[bool] of the same length as this categorical array.
- Returns:
Boolean mask where True indicates a missing value.
- Return type:
np.ndarray
- Raises:
TypeError – If the underlying categorical cannot expose its codes or if missing detection is unsupported.
- value_counts(dropna: bool = True) -> pandas.Series
Return counts of categories as a pandas Series.
This method computes category frequencies from the underlying Arkouda Categorical and returns them as a pandas Series, where the index contains the category labels and the values contain the corresponding counts.
- Parameters:
dropna (bool) – Whether to drop missing values from the result. When True, the result is filtered using the categorical’s na_value. When False, all categories returned by the underlying computation are included. Default is True.
- Returns:
A Series containing category counts. The index is an ArkoudaStringArray of category labels and the values are an ArkoudaArray of counts.
- Return type:
pd.Series
Notes
- The result is computed server-side in Arkouda; only the (typically small) output of categories and counts is materialized for the pandas Series.
- This method does not yet support pandas options such as normalize, sort, or bins.
- The handling of missing values depends on the Arkouda Categorical definition of na_value.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>>
>>> a = ArkoudaCategorical(["a", "b", "a", "c", "b", "a"])
>>> a.value_counts()
a    3
b    2
c    1
dtype: int64
- class arkouda.pandas.extension.ArkoudaCategoricalDtype
Bases: _ArkoudaBaseDtype
Arkouda-backed categorical dtype.
This dtype integrates Arkouda’s distributed Categorical type with the pandas ExtensionArray interface via ArkoudaCategorical. It enables pandas objects (Series, DataFrame) to hold categorical data stored and processed on the Arkouda server, while exposing familiar pandas APIs.
- construct_array_type()
Returns the ArkoudaCategorical used as the storage class.
- classmethod construct_array_type()
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaCategorical class associated with this dtype.
- Return type:
- kind = 'O'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'category'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaDataFrameAccessor(pandas_obj)
Arkouda DataFrame accessor.
Allows df.ak access to Arkouda-backed operations.
- collect() -> pandas.DataFrame
Materialize an Arkouda-backed pandas DataFrame into a NumPy-backed one.
This operation retrieves each Arkouda-backed column from the server using to_ndarray() and constructs a standard pandas DataFrame whose columns are plain NumPy ndarray objects. The returned DataFrame has no dependency on Arkouda.
- Returns:
A pandas DataFrame with NumPy-backed columns.
- Return type:
pd_DataFrame
Examples
Converting an Arkouda-backed DataFrame into a NumPy-backed one:
>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaDataFrameAccessor
Create a pandas DataFrame and convert it to Arkouda-backed form:
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()
akdf is still a pandas DataFrame, but its columns live on Arkouda:
>>> type(akdf["x"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
Now fully materialize it to local NumPy arrays:
>>> collected = akdf.ak.collect()
>>> collected
   x  y
0  1  a
1  2  b
2  3  c
The columns are now NumPy arrays:
>>> type(collected["x"].values)
<class 'numpy.ndarray'>
- static from_ak_legacy(akdf: arkouda.pandas.dataframe.DataFrame) -> pandas.DataFrame
Convert a legacy Arkouda DataFrame into a pandas DataFrame backed by Arkouda ExtensionArrays.
This is the zero-copy-ish counterpart to to_ak_legacy(). Instead of materializing columns into NumPy arrays, this function wraps each underlying Arkouda server-side array in the appropriate ArkoudaExtensionArray subclass (ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical). The resulting pandas DataFrame therefore keeps all data on the Arkouda server, enabling scalable operations without transferring data to the Python client.
- Parameters:
akdf (ak_DataFrame) – A legacy Arkouda DataFrame (arkouda.pandas.dataframe.DataFrame) whose columns are Arkouda objects (pdarray, Strings, or Categorical).
- Returns:
A pandas DataFrame in which each column is an Arkouda-backed ExtensionArray, typically one of ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical. No materialization to NumPy occurs. All column data remain server-resident.
- Return type:
pd_DataFrame
Notes
- This function performs a zero-copy conversion for the underlying Arkouda arrays (server-side). Only lightweight Python wrappers are created.
- The resulting pandas DataFrame can interoperate with most pandas APIs that support extension arrays.
- Round-tripping through to_ak_legacy() and from_ak_legacy() preserves Arkouda semantics.
Examples
Basic conversion
>>> import pandas as pd
>>> import arkouda as ak
>>> akdf = ak.DataFrame({"a": ak.arange(5), "b": ak.array([10,11,12,13,14])})
>>> pdf = pd.DataFrame.ak.from_ak_legacy(akdf)
>>> pdf
   a   b
0  0  10
1  1  11
2  2  12
3  3  13
4  4  14
Columns stay Arkouda-backed
>>> type(pdf["a"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> pdf["a"].array._data
array([0 1 2 3 4])
No NumPy materialization occurs
>>> pdf["a"].values  # pandas always materializes .values
ArkoudaArray([0 1 2 3 4])
But the underlying column is still Arkouda:
>>> pdf["a"].array._data
array([0 1 2 3 4])
Categorical and Strings columns work as well
>>> akdf2 = ak.DataFrame({
...     "s": ak.array(["a","b","a"]),
...     "c": ak.Categorical(ak.array(["e","f","g"]))
... })
>>> pdf2 = pd.DataFrame.ak.from_ak_legacy(akdf2)
>>> type(pdf2["s"].array)
<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>
>>> type(pdf2["c"].array)
<class 'arkouda.pandas.extension._arkouda_categorical_array.ArkoudaCategorical'>
- merge(right: pandas.DataFrame, on: str | List[str] | None = None, left_on: str | List[str] | None = None, right_on: str | List[str] | None = None, how: str = 'inner', left_suffix: str = '_x', right_suffix: str = '_y', convert_ints: bool = True, sort: bool = True) -> pandas.DataFrame
Merge two Arkouda-backed pandas DataFrames using Arkouda’s join.
- Parameters:
right (pd.DataFrame) – Right-hand DataFrame to merge with self._obj. All columns must be Arkouda-backed ExtensionArrays.
on (Optional[Union[str, List[str]]]) – Column name(s) to join on. Must be present in both left and right DataFrames. If not provided and neither left_on nor right_on is set, the intersection of column names in left and right is used. Default is None.
left_on (Optional[Union[str, List[str]]]) – Column name(s) from the left DataFrame to use as join keys. Must be used together with right_on. If provided, on is ignored for the left side. Default is None.
right_on (Optional[Union[str, List[str]]]) – Column name(s) from the right DataFrame to use as join keys. Must be used together with left_on. If provided, on is ignored for the right side. Default is None.
how (str) – Type of merge to be performed. One of 'left', 'right', 'inner', or 'outer'. Default is ‘inner’.
left_suffix (str) – Suffix to apply to overlapping column names from the left frame that are not part of the join keys. Default is ‘_x’.
right_suffix (str) – Suffix to apply to overlapping column names from the right frame that are not part of the join keys. Default is ‘_y’.
convert_ints (bool) – Whether to allow Arkouda to upcast integer columns as needed (for example, to accommodate missing values) during the merge. Default is True.
sort (bool) – Whether to sort the join keys in the output. Default is True.
- Returns:
A pandas DataFrame whose columns are ArkoudaArray ExtensionArrays. All column data remain on the Arkouda server.
- Return type:
pd.DataFrame
- Raises:
TypeError – If right is not a pandas.DataFrame or if any column in the left or right DataFrame is not Arkouda-backed.
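The interaction between on, left_on, and right_on in merge is easier to see as code. Below is a minimal plain-Python sketch of the key-selection rules as the parameter descriptions state them. The function resolve_join_keys is hypothetical and not part of Arkouda, and raising ValueError when only one of left_on/right_on is given is an assumption (the documentation only says they must be used together).

```python
def resolve_join_keys(left_cols, right_cols, on=None, left_on=None, right_on=None):
    """Return (left_keys, right_keys) following the documented precedence."""

    def as_list(keys):
        # A single column name is accepted as shorthand for a one-item list.
        return [keys] if isinstance(keys, str) else list(keys)

    # left_on/right_on take precedence over `on`, and must come as a pair.
    if left_on is not None or right_on is not None:
        if left_on is None or right_on is None:
            # Assumed behaviour; the real method may fail differently.
            raise ValueError("left_on and right_on must be used together")
        return as_list(left_on), as_list(right_on)

    if on is not None:
        keys = as_list(on)
    else:
        # Default: intersection of column names, in left-frame order.
        keys = [c for c in left_cols if c in right_cols]
    return keys, keys


print(resolve_join_keys(["id", "x"], ["id", "y"]))
print(resolve_join_keys(["a", "x"], ["b", "x"], left_on="a", right_on="b"))
```

In the second call the column x overlaps but is not a join key, so it is the kind of column that would receive left_suffix and right_suffix in the merged output.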
- to_ak() -> pandas.DataFrame
Convert this pandas DataFrame to an Arkouda-backed pandas DataFrame.
Each column of the original pandas DataFrame is materialized to the Arkouda server via ak.array() and wrapped in an ArkoudaArray ExtensionArray. The result is still a pandas DataFrame, but all column data reside on the Arkouda server and behave according to the Arkouda ExtensionArray API.
This method does not return a legacy ak_DataFrame. For that (server-side DataFrame structure), use to_ak_legacy().
- Returns:
A pandas DataFrame whose columns are Arkouda-backed ArkoudaArray objects.
- Return type:
pd_DataFrame
Examples
Convert a plain pandas DataFrame to an Arkouda-backed one:
>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()
>>> type(akdf)
<class 'pandas...DataFrame'>
The columns are now Arkouda ExtensionArrays:
>>> isinstance(akdf["x"].array, ArkoudaArray)
True
>>> akdf["x"].tolist()
[np.int64(1), np.int64(2), np.int64(3)]
Arkouda operations work directly on the columns:
>>> akdf["x"].array._data + 10
array([11 12 13])
Converting back to a NumPy-backed DataFrame:
>>> akdf_numpy = akdf.ak.collect()
>>> akdf_numpy
   x  y
0  1  a
1  2  b
2  3  c
- to_ak_legacy() -> arkouda.pandas.dataframe.DataFrame
Convert this pandas DataFrame into the legacy arkouda.DataFrame.
This method performs a materializing conversion of a pandas DataFrame into the legacy Arkouda DataFrame structure. Every column is converted to Arkouda server-side data:
- Python / NumPy numeric and boolean arrays become pdarray.
- String columns become Arkouda string arrays (Strings).
- Pandas categoricals become Arkouda Categorical objects.
- The result is a legacy ak_DataFrame whose columns all reside on the Arkouda server.
This differs from to_ak(), which creates Arkouda-backed ExtensionArrays but retains a pandas.DataFrame structure.
- Returns:
The legacy Arkouda DataFrame with all columns materialized onto the Arkouda server.
- Return type:
ak_DataFrame
Examples
Convert a plain pandas DataFrame to a legacy Arkouda DataFrame:
>>> import pandas as pd
>>> import arkouda as ak
>>> df = pd.DataFrame({
...     "i": [1, 2, 3],
...     "s": ["a", "b", "c"],
...     "c": pd.Series(["low", "low", "high"], dtype="category"),
... })
>>> akdf = df.ak.to_ak_legacy()
>>> type(akdf)
<class 'arkouda.pandas.dataframe.DataFrame'>
Columns have the appropriate Arkouda types:
>>> from arkouda.numpy.pdarrayclass import pdarray
>>> from arkouda.numpy.strings import Strings
>>> from arkouda.pandas.categorical import Categorical
>>> isinstance(akdf["i"], pdarray)
True
>>> isinstance(akdf["s"], Strings)
True
>>> isinstance(akdf["c"], Categorical)
True
Values round-trip through the conversion:
>>> akdf["i"].tolist()
[1, 2, 3]
- class arkouda.pandas.extension.ArkoudaExtensionArray(data)
Bases: pandas.api.extensions.ExtensionArray
Abstract base class for custom 1-D array types.
pandas will recognize instances of this class as proper arrays with a custom type and will not attempt to coerce them to objects. They may be stored directly inside a DataFrame or Series.
- dtype
- nbytes
- ndim
- shape
- astype()
- dropna()
- fillna()
- equals()
- insert()
- isin()
- isna()
- item()
- ravel()
- repeat()
- searchsorted()
- shift()
- tolist()
- unique()
- _explode()
- _formatter()
- _hash_pandas_object()
- _reduce()
- _values_for_argsort()
See also
api.extensions.ExtensionDtype – A custom data type, to be paired with an ExtensionArray.
api.extensions.ExtensionArray.dtype – An instance of ExtensionDtype.
Notes
The interface includes the following abstract methods that must be implemented by subclasses:
_from_sequence
_from_factorized
__getitem__
__len__
__eq__
dtype
nbytes
isna
take
copy
_concat_same_type
interpolate
A default repr displaying the type, (truncated) data, length, and dtype is provided. It can be customized or replaced by overriding:
__repr__ : A default repr for the ExtensionArray.
_formatter : Print scalars inside a Series or DataFrame.
Some methods require casting the ExtensionArray to an ndarray of Python objects with self.astype(object), which may be expensive. When performance is a concern, we highly recommend overriding the following methods:
fillna
_pad_or_backfill
dropna
unique
factorize / _values_for_factorize
argsort, argmax, argmin / _values_for_argsort
searchsorted
map
The remaining methods implemented on this class should be performant, as they only compose abstract methods. Still, a more efficient implementation may be available, and these methods can be overridden.
One can implement methods to handle array accumulations or reductions.
_accumulate
_reduce
One can implement methods to handle parsing from strings that will be used in methods such as pandas.io.parsers.read_csv.
_from_sequence_of_strings
This class does not inherit from ‘abc.ABCMeta’ for performance reasons. Methods and properties required by the interface raise pandas.errors.AbstractMethodError and no register method is provided for registering virtual subclasses.
ExtensionArrays are limited to 1 dimension.
They may be backed by none, one, or many NumPy arrays. For example, pandas.Categorical is an extension array backed by two arrays, one for codes and one for categories. An array of IPv6 addresses may be backed by a NumPy structured array with two fields, one for the lower 64 bits and one for the upper 64 bits. Or they may be backed by some other storage type, like Python lists. Pandas makes no assumptions on how the data are stored, just that it can be converted to a NumPy array. The ExtensionArray interface does not impose any rules on how this data is stored. However, currently, the backing data cannot be stored in attributes called .values or ._values to ensure full compatibility with pandas internals. But other names such as .data, ._data, ._items, … can be freely used.
If implementing NumPy’s __array_ufunc__ interface, pandas expects that
You defer by returning NotImplemented when any Series are present in inputs. Pandas will extract the arrays and call the ufunc again.
You define a _HANDLED_TYPES tuple as an attribute on the class. Pandas inspects this to determine whether the ufunc is valid for the types present.
See extending.extension.ufunc for more.
By default, ExtensionArrays are not hashable. Immutable subclasses may override this behavior.
Examples
Please see the following:
https://github.com/pandas-dev/pandas/blob/main/pandas/tests/extension/list/array.py
- abstractmethod argmax(axis=None, out=None)[source]¶
Return the index of maximum value.
In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.
- Parameters:
skipna (bool, default True)
- Return type:
int
See also
ExtensionArray.argmin – Return the index of the minimum value.
Examples
>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
np.int64(3)
- abstractmethod argmin(axis=None, out=None)[source]¶
Return the index of minimum value.
In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.
- Parameters:
skipna (bool, default True)
- Return type:
int
See also
ExtensionArray.argmax – Return the index of the maximum value.
Examples
>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
np.int64(1)
- argsort(*, ascending: bool = True, kind: str = 'quicksort', **kwargs: object) numpy.typing.NDArray[numpy.intp][source]¶
Return the indices that would sort the array.
This method computes the permutation indices that would sort the underlying Arkouda data and returns them as a NumPy array, in accordance with the pandas
ExtensionArraycontract. The indices can be used to reorder the array viatakeoriloc.For floating-point data,
NaNvalues are handled according to thena_positionkeyword argument.- Parameters:
ascending (bool, default True) – If True, sort values in ascending order. If False, sort in descending order.
kind (str, default "quicksort") – Sorting algorithm. Present for API compatibility with NumPy and pandas but currently ignored.
**kwargs –
Additional keyword arguments for compatibility. Supported keyword:
na_position: {“first”, “last”}, default “last” Where to placeNaNvalues in the sorted result. This option is currently only applied for floating-pointpdarraydata; forStringsandCategoricaldata it has no effect.
- Returns:
A 1D NumPy array of dtype np.intp containing the indices that would sort the array.
- Return type:
numpy.ndarray
- Raises:
ValueError – If na_position is not “first” or “last”.
TypeError – If the underlying data type does not support sorting.
Notes
Supports Arkouda pdarray, Strings, and Categorical data.
For floating-point arrays, NaN values are repositioned according to na_position.
The sorting computation occurs on the Arkouda server, but the resulting permutation indices are materialized on the client as a NumPy array, as required by pandas internals.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> a = ArkoudaArray(ak.array([3.0, float("nan"), 1.0]))
>>> a.argsort()  # NA last by default
array([2, 0, 1])
>>> a.argsort(na_position="first")
array([1, 2, 0])
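The na_position behaviour described above can be sketched in plain Python, without an Arkouda server. This is only an illustration of the documented semantics, not Arkouda's implementation: the helper name argsort_with_na is hypothetical, and the handling of ascending=False (reversing only the non-NaN order) is an assumption.

```python
import math


def argsort_with_na(values, ascending=True, na_position="last"):
    """Return indices that sort `values`, placing NaN entries first or last.

    Hypothetical helper for illustration; not part of Arkouda.
    """
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    ok_idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Sort only the non-NaN positions by value (assumption: NaN placement
    # is independent of the sort direction).
    ok_idx.sort(key=lambda i: values[i], reverse=not ascending)
    return nan_idx + ok_idx if na_position == "first" else ok_idx + nan_idx


vals = [3.0, float("nan"), 1.0]
print(argsort_with_na(vals))                       # [2, 0, 1]
print(argsort_with_na(vals, na_position="first"))  # [1, 2, 0]
```

The two printed results match the index order shown in the doctest above for the same input.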
- copy(deep: bool = True)[source]¶
Return a copy of the array.
- Parameters:
deep (bool, default True) –
Whether to make a deep copy of the underlying Arkouda data.
- If True, the underlying server-side array is duplicated.
- If False, a new ExtensionArray wrapper is created but the underlying data is shared (no server-side copy).
- Returns:
A new instance of the same concrete subclass containing either a deep copy or a shared reference to the underlying data.
- Return type:
Notes
- Pandas semantics:
deep=False creates a new wrapper but may share memory.
deep=True must create an independent copy of the data.
- Arkouda semantics:
Arkouda arrays do not presently support views. Therefore:
deep=False returns a new wrapper around the same server-side array.
deep=True forces a full server-side copy.
Examples
Shallow copy (shared data):
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> arr = ArkoudaArray(ak.arange(5))
>>> c1 = arr.copy(deep=False)
>>> c1
ArkoudaArray([0 1 2 3 4])
Underlying data is the same object:
>>> arr._data is c1._data
True
Deep copy (independent server-side data):
>>> c2 = arr.copy(deep=True)
>>> c2
ArkoudaArray([0 1 2 3 4])
Underlying data is a distinct pdarray on the server:
>>> arr._data is c2._data
False
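The wrapper-versus-data distinction can be illustrated without a server. The sketch below uses a toy class (Wrapper is a hypothetical stand-in, with a Python list in place of a server-side pdarray) to show the same pattern: a shallow copy yields a new wrapper over the same backing object, while a deep copy duplicates the backing object.

```python
from copy import deepcopy


class Wrapper:
    """Toy stand-in for an ExtensionArray wrapper around a backing store."""

    def __init__(self, data):
        self._data = data

    def copy(self, deep=True):
        # deep=True duplicates the backing store; deep=False reuses it.
        data = deepcopy(self._data) if deep else self._data
        return type(self)(data)


arr = Wrapper([0, 1, 2, 3, 4])
shallow = arr.copy(deep=False)
deep = arr.copy(deep=True)

print(shallow is arr)              # False: always a new wrapper
print(shallow._data is arr._data)  # True: backing data shared
print(deep._data is arr._data)     # False: backing data duplicated
```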
- default_fill_value: arkouda.numpy.dtypes.all_scalars | str | None = -1¶
- abstractmethod duplicated(arrays, /, *, axis=0)[source]¶
Return boolean ndarray denoting duplicate values.
- Parameters:
keep ({'first', 'last', False}, default 'first') –
first: Mark duplicates as True except for the first occurrence.
last: Mark duplicates as True except for the last occurrence.
False: Mark all duplicates as True.
- Returns:
With true in indices where elements are duplicated and false otherwise.
- Return type:
ndarray[bool]
See also
DataFrame.duplicated: Return boolean Series denoting duplicate rows.
Series.duplicated: Indicate duplicate Series values.
api.extensions.ExtensionArray.unique: Compute the ExtensionArray of unique values.
Examples
>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False, True, False, False, True])
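The three keep options can be sketched in plain Python. This is an illustrative model of the semantics listed above, not the pandas or Arkouda implementation; the function below is a hypothetical helper operating on a Python list of hashable values.

```python
def duplicated(values, keep="first"):
    """Return a list of booleans marking duplicate entries in `values`."""
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last', or False")
    if keep is False:
        # Mark every value that occurs more than once.
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        return [counts[v] > 1 for v in values]
    # For 'last', scan from the end so the last occurrence is seen first.
    order = values if keep == "first" else list(reversed(values))
    seen = set()
    flags = []
    for v in order:
        flags.append(v in seen)
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


data = [1, 1, 2, 3, 3]
print(duplicated(data))                # [False, True, False, False, True]
print(duplicated(data, keep="last"))   # [True, False, False, True, False]
print(duplicated(data, keep=False))    # [True, True, False, True, True]
```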
- factorize(use_na_sentinel=True) Tuple[numpy.typing.NDArray[numpy.intp], ArkoudaExtensionArray][source]¶
Encode the values of this array as integer codes and unique values.
This is similar to pandas.factorize(), but the grouping/factorization work is performed in Arkouda. The returned codes are a NumPy array for pandas compatibility, while uniques are returned as an ExtensionArray of the same type as self.
Each distinct non-missing value is assigned a unique integer code. For floating dtypes, NaN is treated as missing; for all other dtypes, no values are considered missing.
- Parameters:
use_na_sentinel (bool, default True) – If True, missing values are encoded as -1 in the returned codes. If False, missing values are assigned the code len(uniques). (Missingness is only detected for floating dtypes via NaN.)
- Returns:
A pair (codes, uniques) where:
codes is a 1D NumPy array of dtype np.intp with the same length as this array, containing the factor codes for each element.
uniques is an ExtensionArray containing the unique (non-missing) values, with the same extension type as self.
If use_na_sentinel=True, missing values in codes are -1. Otherwise they receive the code len(uniques).
- Return type:
(numpy.ndarray, ExtensionArray)
Notes
Only floating-point dtypes treat NaN as missing; for other dtypes, all values are treated as non-missing.
uniques are constructed from Arkouda’s unique keys and returned as type(self)(uniques_ak) so that pandas internals (e.g. groupby) can treat them as an ExtensionArray.
String/None/null missing-value behavior is not yet unified with pandas.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> arr = ArkoudaArray(ak.array([1, 2, 1, 3]))
>>> codes, uniques = arr.factorize()
>>> codes
array([0, 1, 0, 2])
>>> uniques
ArkoudaArray([1 2 3])
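The effect of use_na_sentinel is easiest to see with a NaN in the input. The plain-Python sketch below models the documented coding rules only. It assigns codes in order of first appearance, which is an assumption: the ordering of Arkouda's unique keys may differ.

```python
import math


def factorize(values, use_na_sentinel=True):
    """Return (codes, uniques); NaN is the only value treated as missing."""
    uniques = []
    positions = {}  # value -> integer code
    codes = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            codes.append(None)  # placeholder, resolved once uniques is known
            continue
        if v not in positions:
            positions[v] = len(uniques)
            uniques.append(v)
        codes.append(positions[v])
    # Missing values get -1, or len(uniques) when the sentinel is disabled.
    na_code = -1 if use_na_sentinel else len(uniques)
    codes = [na_code if c is None else c for c in codes]
    return codes, uniques


nan = float("nan")
print(factorize([1, 2, 1, 3]))                               # ([0, 1, 0, 2], [1, 2, 3])
print(factorize([1.5, nan, 1.5, 2.0]))                       # ([0, -1, 0, 1], [1.5, 2.0])
print(factorize([1.5, nan, 1.5, 2.0], use_na_sentinel=False))  # ([0, 2, 0, 1], [1.5, 2.0])
```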
- abstractmethod interpolate(method='linear', *, limit=None, **kwargs)[source]¶
Fill NaN values using an interpolation method.
- Parameters:
method (str, default 'linear') – Interpolation technique to use. One of:
* ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
* ‘time’: Works on daily and higher resolution data to interpolate given length of interval.
* ‘index’, ‘values’: use the actual numerical values of the index.
* ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d, whereas ‘spline’ is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. arr.interpolate(method='polynomial', order=5).
* ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
* ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives.
axis (int) – Axis to interpolate along. For 1-dimensional data, use 0.
index (Index) – Index to use for interpolation.
limit (int or None) – Maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction ({'forward', 'backward', 'both'}) – Consecutive NaNs will be filled in this direction.
limit_area ({'inside', 'outside'} or None) – If limit is specified, consecutive NaNs will be filled with this restriction.
* None: No fill restriction.
* ‘inside’: Only fill NaNs surrounded by valid values (interpolate).
* ‘outside’: Only fill NaNs outside valid values (extrapolate).
copy (bool) – If True, a copy of the object is returned with interpolated values.
**kwargs (optional) – Keyword arguments to pass on to the interpolating function.
- Returns:
An ExtensionArray with interpolated values.
- Return type:
ExtensionArray
See also
Series.interpolate: Interpolate values in a Series.
DataFrame.interpolate: Interpolate values in a DataFrame.
Notes
All parameters must be specified as keyword arguments.
The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index.
Examples
Interpolating values in a NumPy array:
>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(
...     method="linear",
...     limit=3,
...     limit_direction="forward",
...     index=pd.Index(range(len(arr))),
...     fill_value=1,
...     copy=False,
...     axis=0,
...     limit_area="inside",
... )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64
Interpolating values in a FloatingArray:
>>> arr = pd.array([1.0, pd.NA, 3.0, 4.0, pd.NA, 6.0], dtype="Float64")
>>> arr.interpolate(
...     method="linear",
...     axis=0,
...     index=pd.Index(range(len(arr))),
...     limit=None,
...     limit_direction="both",
...     limit_area=None,
...     copy=True,
... )
<FloatingArray>
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Length: 6, dtype: Float64
- take(indexer, fill_value=None, allow_fill=False)[source]¶
Take elements by (0-based) position, returning a new array.
- This implementation:
normalizes the indexer to Arkouda int64,
explicitly emulates NumPy-style negative wrapping when allow_fill=False,
If allow_fill=True, then only -1 is allowed as a sentinel for missing; those positions are filled with fill_value. Any other negative index raises ValueError.
validates bounds (raising IndexError) when allow_fill=True,
gathers once, then fills masked positions in a single pass.
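The indexing rules listed above can be modelled in plain Python. The sketch below is illustrative only and does not reproduce the Arkouda implementation. The bounds check in the allow_fill=False branch is an assumption borrowed from NumPy behaviour, since the text only states that bounds are validated when allow_fill=True.

```python
def take(values, indexer, fill_value=None, allow_fill=False):
    """Gather `values` at `indexer` positions following the documented rules."""
    n = len(values)
    if allow_fill:
        # Only -1 may appear as the "missing" sentinel.
        if any(i < -1 for i in indexer):
            raise ValueError("only -1 is allowed as a missing sentinel")
        if any(i >= n for i in indexer):
            raise IndexError("indexer out of bounds")
        return [fill_value if i == -1 else values[i] for i in indexer]
    # allow_fill=False: negative indices wrap from the end, as in NumPy.
    wrapped = [i + n if i < 0 else i for i in indexer]
    if any(i < 0 or i >= n for i in wrapped):
        raise IndexError("indexer out of bounds")  # assumption, see lead-in
    return [values[i] for i in wrapped]


data = [10, 20, 30]
print(take(data, [0, -1]))                                # [10, 30]
print(take(data, [0, -1], fill_value=0, allow_fill=True))  # [10, 0]
```

The same indexer gives different results in the two modes: -1 means "last element" without filling and "missing" with filling.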
- to_ndarray() numpy.ndarray[source]¶
Convert to a NumPy ndarray, without any dtype conversion or copy options.
- Returns:
A new NumPy array materialized from the underlying Arkouda data.
- Return type:
numpy.ndarray
Notes
This is a lightweight convenience wrapper around the backend’s .to_ndarray() method. Unlike to_numpy(), this method does not accept dtype or copy arguments and always performs a materialization step.
- to_numpy(dtype=None, copy=False, na_value=None)[source]¶
Convert the array to a NumPy ndarray.
- Parameters:
- Returns:
NumPy array representation of the data.
- Return type:
numpy.ndarray
- view(dtype=None)[source]¶
Return a shallow view of the ExtensionArray.
This method is used by pandas internals (e.g. BlockManager.copy(deep=False)) to create a new ExtensionArray wrapper that shares the same underlying Arkouda data without materializing or copying server-side arrays.
- Parameters:
dtype (optional) – If provided and different from the current dtype, a dtype conversion is requested. In this case, the operation is delegated to astype(dtype, copy=False) and a new array with the requested dtype is returned.
- Returns:
A new ExtensionArray instance of the same concrete class that references the same underlying Arkouda data.
- Return type:
Notes
This method performs a shallow copy only: the underlying Arkouda server-side array is shared between the original and the returned object.
No data is materialized, copied, or cast unless dtype is explicitly requested.
Optional internal attributes (e.g. masks, categorical metadata, caches) are copied by reference when present, to preserve logical consistency.
This method exists to satisfy pandas’ expectations around .view() and copy(deep=False) semantics for ExtensionArray implementations.
Examples
Create a shallow view that shares the same underlying data:
>>> import arkouda as ak
>>> from arkouda.pandas.extension._arkouda_array import ArkoudaArray
>>> ak_arr = ak.arange(5)
>>> ea = ArkoudaArray(ak_arr)
>>> v = ea.view()
>>> v is ea
False
>>> v._data is ea._data
True
Requesting a dtype conversion delegates to astype without copying the underlying data unless required:
>>> v2 = ea.view(dtype="float64")
>>> v2.dtype == ea.astype("float64").dtype
True
This method is commonly invoked indirectly by pandas during operations that require shallow copies:
>>> import pandas as pd
>>> s = pd.Series(ea)
>>> df = pd.DataFrame({"col": s})  # does not raise
- class arkouda.pandas.extension.ArkoudaFloat64Dtype[source]¶
Bases: _ArkoudaBaseDtype
Arkouda-backed 64-bit floating-point dtype.
This dtype integrates Arkouda’s server-backed pdarray<float64> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate large distributed float64 arrays without materializing them on the client.
- construct_array_type()[source]¶
Returns the ArkoudaArray class used for storage.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'f'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'float64'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaIndexAccessor(pandas_obj: pandas.Index | pandas.MultiIndex)[source]¶
Arkouda-backed index accessor for pandas Index and MultiIndex.
This accessor provides methods for converting between:
NumPy-backed pandas indexes
pandas indexes backed by ArkoudaExtensionArray (zero-copy EA mode)
legacy Arkouda ak.Index and ak.MultiIndex objects
The .ak namespace mirrors the DataFrame accessor, providing a consistent interface for distributed index operations. All conversions avoid unnecessary NumPy materialization unless explicitly requested via collect().
- Parameters:
pandas_obj (Union[pd.Index, pd.MultiIndex]) – The pandas Index or MultiIndex instance that this accessor wraps.
Notes
to_ak → pandas object, Arkouda-backed (ExtensionArrays).
to_ak_legacy → legacy Arkouda index objects.
collect → NumPy-backed pandas object.
is_arkouda → reports whether the index is Arkouda-backed.
Examples
Basic single-level Index conversion:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="vals")
Convert to Arkouda-backed:
>>> ak_idx = idx.ak.to_ak()
>>> ak_idx.ak.is_arkouda
True
Materialize back:
>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True
Convert to legacy Arkouda:
>>> ak_legacy = idx.ak.to_ak_legacy()
>>> type(ak_legacy)
<class 'arkouda.pandas.index.Index'>
MultiIndex conversion:
>>> arrays = [[1, 1, 2], ["red", "blue", "red"]]
>>> midx = pd.MultiIndex.from_arrays(arrays, names=["num", "color"])
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
- collect() pandas.Index | pandas.MultiIndex[source]¶
Materialize this Index or MultiIndex back to a plain NumPy-backed pandas index.
- Returns:
An Index whose underlying data are plain NumPy arrays.
- Return type:
Union[pd.Index, pd.MultiIndex]
- Raises:
TypeError – If the index is Arkouda-backed but does not expose the expected _data attribute, or if the index type is unsupported.
Examples
Single-level Index round-trip:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([1, 2, 3], name="x")
>>> ak_idx = idx.ak.to_ak()
>>> np_idx = ak_idx.ak.collect()
>>> np_idx
Index([1, 2, 3], dtype='int64', name='x')
>>> np_idx.equals(idx)
True
Behavior when already NumPy-backed (no-op except shallow copy):
>>> plain = pd.Index([10, 20, 30])
>>> plain2 = plain.ak.collect()
>>> plain2.equals(plain)
True
Verifying that Arkouda-backed values materialize to NumPy:
>>> ak_idx = pd.Index([5, 6, 7]).ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> out = ak_idx.ak.collect()
>>> type(out.array)
<class 'pandas...NumpyExtensionArray'>
- concat(other: pandas.Index | pandas.MultiIndex) pandas.Index | pandas.MultiIndex[source]¶
Concatenate this index with another Arkouda-backed index.
Both self._obj and other must be convertible to legacy Arkouda ak_Index / ak_MultiIndex. The concatenation is performed in Arkouda and the result is wrapped back into an Arkouda-backed pandas Index or MultiIndex.
- Parameters:
other (Union[pd.Index, pd.MultiIndex]) – The other index to concatenate with self._obj. It must be a pandas.Index or pandas.MultiIndex.
- Returns:
A pandas Index or MultiIndex backed by Arkouda, containing the concatenated values from self._obj and other.
- Return type:
Union[pd.Index, pd.MultiIndex]
- Raises:
TypeError – If other is not a pandas.Index or pandas.MultiIndex.
- static from_ak_legacy(akidx: arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex) pandas.Index | pandas.MultiIndex[source]¶
Convert a legacy Arkouda ak.Index or ak.MultiIndex into a pandas Index/MultiIndex backed by Arkouda ExtensionArrays.
This is the index analogue of df.ak.from_ak_legacy_ea(): it performs a zero-copy-style wrapping of Arkouda server-side arrays into ArkoudaExtensionArray objects, producing a pandas Index or MultiIndex whose levels remain distributed on the Arkouda server.
No materialization to NumPy occurs.
- Parameters:
akidx (Union[ak_Index, ak_MultiIndex]) – The legacy Arkouda Index or MultiIndex to wrap.
- Returns:
A pandas index object whose underlying data are ArkoudaExtensionArray instances referencing the Arkouda server-side arrays.
- Return type:
Union[pd.Index, pd.MultiIndex]
Notes
ak.Index → pd.Index with Arkouda-backed values.
ak.MultiIndex → pd.MultiIndex where each level is backed by an ArkoudaExtensionArray.
This function does not validate whether the input is already wrapped; callers should ensure the argument is a legacy Arkouda index object.
Examples
>>> import arkouda as ak
>>> import pandas as pd
Wrap a legacy ak.Index into a pandas Index without copying:
>>> ak_idx = ak.Index(ak.arange(5))
>>> pd_idx = pd.Index.ak.from_ak_legacy(ak_idx)
>>> pd_idx
Index([0, 1, 2, 3, 4], dtype='int64')
The resulting index stores its values on the Arkouda server:
>>> type(pd_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
MultiIndex example:
>>> ak_lvl1 = ak.array(['a', 'a', 'b', 'b'])
>>> ak_lvl2 = ak.array([1, 2, 1, 2])
>>> ak_mi = ak.MultiIndex([ak_lvl1, ak_lvl2], names=['letter', 'number'])
>>> pd_mi = pd.Index.ak.from_ak_legacy(ak_mi)
>>> pd_mi
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           names=['letter', 'number'])
Each level is backed by an Arkouda ExtensionArray and remains distributed:
>>> [type(level._data) for level in pd_mi.levels]
[<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>, <class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>]
No NumPy materialization occurs; the underlying data stay on the Arkouda server.
- property is_arkouda: bool¶
Return whether the underlying Index is Arkouda-backed.
An Index or MultiIndex is considered Arkouda-backed if its underlying storage uses ArkoudaExtensionArray. This applies to both single-level and multi-level indices.
- Returns:
True if the Index/MultiIndex is backed by Arkouda server-side arrays, False otherwise.
- Return type:
Examples
NumPy-backed Index:
>>> import pandas as pd
>>> idx = pd.Index([1, 2, 3])
>>> idx.ak.is_arkouda
False
Arkouda-backed single-level Index:
>>> import arkouda as ak
>>> ak_idx = pd.Index([10, 20, 30]).ak.to_ak()
>>> ak_idx.ak.is_arkouda
True
Arkouda-backed MultiIndex:
>>> arrays = [[1, 1, 2], ["a", "b", "a"]]
>>> midx = pd.MultiIndex.from_arrays(arrays)
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
- lookup(key: object) arkouda.numpy.pdarrayclass.pdarray[source]¶
Perform a server-side lookup on the underlying Arkouda index.
This is a thin convenience wrapper around the legacy arkouda.pandas.index.Index.lookup() / arkouda.pandas.index.MultiIndex.lookup() methods. It converts the pandas index to a legacy Arkouda index, performs the lookup on the server, and returns the resulting boolean mask.
- Parameters:
key (object) – Lookup key or keys, interpreted in the same way as the legacy Arkouda Index / MultiIndex lookup method. For a single-level index this may be a scalar or an Arkouda pdarray; for MultiIndex it may be a tuple or sequence of values/arrays.
- Returns:
A boolean Arkouda array indicating which positions in the index match the given key.
- Return type:
- to_ak() pandas.Index | pandas.MultiIndex[source]¶
Convert this pandas Index or MultiIndex to an Arkouda-backed index.
Unlike to_ak_legacy(), which returns a legacy Arkouda Index object, this method returns a pandas Index or MultiIndex whose data reside on the Arkouda server and are wrapped in ArkoudaExtensionArray ExtensionArrays.
The conversion is zero-copy with respect to NumPy: no materialization to local NumPy arrays occurs.
- Returns:
An Index whose underlying data live on the Arkouda server.
- Return type:
Union[pd.Index, pd.MultiIndex]
Examples
Convert a simple Index to Arkouda-backed form:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="values")
>>> ak_idx = idx.ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
Round-trip back to NumPy-backed pandas objects:
>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True
- to_ak_legacy() arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex[source]¶
Convert this pandas Index or MultiIndex into a legacy Arkouda ak.Index or ak.MultiIndex object.
This is the index analogue of df.ak.to_ak_legacy(), returning the actual Arkouda index objects on the server, rather than a pandas wrapper backed by ArkoudaExtensionArray.
The conversion is zero-copy with respect to NumPy: values are transferred directly into Arkouda arrays without materializing to local NumPy.
- Returns:
A legacy Arkouda Index/MultiIndex whose data live on the Arkouda server.
- Return type:
Union[ak_Index, ak_MultiIndex]
Examples
Convert a simple pandas Index into a legacy Arkouda Index:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="numbers")
>>> ak_idx = idx.ak.to_ak_legacy()
>>> type(ak_idx)
<class 'arkouda.pandas.index.Index'>
>>> ak_idx.name
'numbers'
- to_csv(prefix_path: str, dataset: str = 'index') str[source]¶
Save this index to CSV via the legacy to_csv implementation and return the server response message.
- to_dict(labels=None)[source]¶
Convert this index to a dictionary representation if supported.
For MultiIndex, this delegates to MultiIndex.to_dict and returns a mapping of label -> Index. For single-level Indexes, this will raise a TypeError, since the legacy API only defines to_dict on MultiIndex.
- to_hdf(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate', file_type: Literal['single', 'distribute'] = 'distribute') str[source]¶
Save this index to HDF5 via the legacy to_hdf implementation and return the server response message.
- class arkouda.pandas.extension.ArkoudaInt64Dtype[source]¶
Bases: _ArkoudaBaseDtype
Extension dtype for Arkouda-backed 64-bit integers.
This dtype allows seamless use of Arkouda’s distributed int64 arrays inside pandas objects (Series, Index, DataFrame). It is backed by arkouda.pdarray with dtype='int64' and integrates with pandas via the ArkoudaArray extension array.
- construct_array_type()[source]¶
Return the associated extension array class (ArkoudaArray).
- classmethod construct_array_type()[source]¶
Return the associated pandas ExtensionArray type.
This is part of the pandas ExtensionDtype interface and is used internally by pandas when constructing arrays of this dtype. It ensures that operations like Series(..., dtype=ArkoudaInt64Dtype()) produce the correct Arkouda-backed extension array.
- Returns:
The ArkoudaArray class that implements the storage and behavior for this dtype.
- Return type:
Notes
This hook tells pandas which ExtensionArray to instantiate whenever this dtype is requested.
All Arkouda dtypes defined in this module will return ArkoudaArray (or a subclass thereof).
Examples
>>> from arkouda.pandas.extension import ArkoudaInt64Dtype
>>> ArkoudaInt64Dtype.construct_array_type()
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
- kind = 'i'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'int64'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaSeriesAccessor(pandas_obj: pandas.Series)[source]¶
Arkouda-backed Series accessor.
Provides a symmetric API to the Index accessor for Series-level conversion and materialization.
- Parameters:
pandas_obj (pd.Series) – The Series this accessor wraps.
Examples
>>> import pandas as pd
>>> import arkouda as ak
>>> s = pd.Series([1, 2, 3], name="nums")
Convert to Arkouda-backed:
>>> ak_s = s.ak.to_ak()
>>> ak_s.ak.is_arkouda
True
Materialize back:
>>> restored = ak_s.ak.collect()
>>> restored.equals(s)
True
Convert to legacy Arkouda:
>>> ak_arr = s.ak.to_ak_legacy()
>>> type(ak_arr)
<class 'arkouda.pandas.series.Series'>
- apply(func: Callable[[Any], Any] | str, result_dtype: numpy.dtype | str | None = None) pandas.Series[source]¶
Apply a Python function element-wise to this Arkouda-backed Series.
This delegates to arkouda.apply.apply(), executing the function on the Arkouda server without materializing to NumPy.
- Parameters:
func (Union[Callable[[Any], Any], str]) – A Python callable or a specially formatted lambda string (e.g. "lambda x,: x+1").
result_dtype (Optional[Union[np.dtype, str]]) – The dtype of the resulting array. Required if the function changes dtype. Must be compatible with arkouda.apply.apply(). Default is None.
- Returns:
A new Arkouda-backed Series containing the transformed values.
- Return type:
pd.Series
- Raises:
TypeError – If the Series is not Arkouda-backed or if its values are not a numeric pdarray.
- argsort(*, ascending: bool = True, **kwargs: object) pandas.Series[source]¶
Return the integer indices that would sort the Series values.
This mirrors pandas.Series.argsort but returns an Arkouda-backed pandas Series (distributed), not a NumPy-backed result.
- Parameters:
ascending (bool) – Sort values in ascending order if True, descending order if False. Default is True.
**kwargs (object) –
Additional keyword arguments.
- na_position: {“first”, “last”}, default “last”
Where to place NaN values in the sorted result. Currently only applied for floating-point pdarray data; for Strings and Categorical it has no effect.
- Returns:
An Arkouda-backed Series of integer permutation indices. The returned Series has the same index as the original.
- Return type:
pd.Series
- Raises:
TypeError – If the Series is not Arkouda-backed, or the underlying dtype does not support sorting.
ValueError – If na_position is not “first” or “last”.
- collect() pandas.Series[source]¶
Materialize this Series back to a NumPy-backed pandas Series.
- Returns:
A NumPy-backed Series.
- Return type:
pd.Series
Examples
>>> s = pd.Series([1, 2, 3]).ak.to_ak()
>>> out = s.ak.collect()
>>> type(out.array)
<class 'pandas...NumpyExtensionArray'>
- static from_ak_legacy(akarr: Any, name: str | None = None) pandas.Series[source]¶
Construct an Arkouda-backed pandas Series directly from a legacy Arkouda array.
This performs zero-copy wrapping using ArkoudaExtensionArray and does not materialize data.
- Parameters:
akarr (Any) – A legacy Arkouda array (pdarray, Strings, or Categorical).
name (str | None) – Optional. Name of the resulting Series.
- Returns:
A pandas Series backed by ArkoudaExtensionArray.
- Return type:
pd.Series
Examples
>>> import arkouda as ak
>>> import pandas as pd
Basic example with a legacy pdarray:
>>> ak_arr = ak.arange(5)
>>> s = pd.Series.ak.from_ak_legacy(ak_arr, name="values")
>>> s
0    0
1    1
2    2
3    3
4    4
Name: values, dtype: int64
The underlying data remain on the Arkouda server:
>>> type(s._values)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
Using a legacy Strings object:
>>> ak_str = ak.array(["a", "b", "c"])
>>> s_str = pd.Series.ak.from_ak_legacy(ak_str, name="letters")
>>> s_str
0    a
1    b
2    c
Name: letters, dtype: string
Using a legacy Categorical:
>>> ak_cat = ak.Categorical(ak.array(["red", "blue", "red"]))
>>> s_cat = pd.Series.ak.from_ak_legacy(ak_cat, name="color")
>>> s_cat
0     red
1    blue
2     red
Name: color, dtype: category
No NumPy copies are made; the Series is a zero-copy wrapper over Arkouda server-side arrays.
- groupby() arkouda.pandas.groupbyclass.GroupBy[source]¶
Return an Arkouda GroupBy object for this Series, without materializing.
- Return type:
- Raises:
TypeError – If the Series is not Arkouda-backed.
Examples
>>> import arkouda as ak
>>> import pandas as pd
>>> s = pd.Series([80, 443, 80]).ak.to_ak()
>>> g = s.ak.groupby()
>>> keys, counts = g.size()
- property is_arkouda: bool¶
Return True if this Series is fully Arkouda-backed.
A Series is considered Arkouda-backed when both:
Its values are stored in an ArkoudaExtensionArray.
Its index (including each level of a MultiIndex) is backed by ArkoudaExtensionArray.
- Returns:
True if both data and index are Arkouda-backed, otherwise False.
- Return type:
Examples
>>> s = pd.Series([1, 2, 3])
>>> s.ak.is_arkouda
False
>>> ak_s = s.ak.to_ak()
>>> ak_s.ak.is_arkouda
True
- locate(key: object) pandas.Series[source]¶
Lookup values by index label on the Arkouda server.
This is a thin wrapper around the legacy arkouda.pandas.series.Series.locate() method. It converts the pandas Series to a legacy Arkouda ak.Series, performs the locate operation on the server, and wraps the result back into an Arkouda-backed pandas Series (ExtensionArray-backed) without NumPy materialization.
- Parameters:
key (object) – Lookup key or keys. Interpreted in the same way as the legacy Arkouda Series.locate method. This may be:
- a scalar
- a list/tuple of scalars
- an Arkouda pdarray
- an Arkouda Index / MultiIndex
- an Arkouda Series (special case: preserves key index)
- Returns:
A pandas Series backed by Arkouda ExtensionArrays containing the located values. The returned Series remains distributed (no NumPy materialization) and is sorted by index.
- Return type:
pd.Series
Notes
This method is Arkouda-specific; pandas does not define Series.locate.
If key is a pandas Index/MultiIndex, consider converting it via key.ak.to_ak_legacy() before calling locate for the most direct path.
Examples
>>> import arkouda as ak
>>> import pandas as pd
>>> s = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3])).ak.to_ak()
>>> out = s.ak.locate([3, 1])
>>> out.tolist()
[np.int64(10), np.int64(30)]
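The result in the example above comes back ordered by index label rather than by the order of the requested keys. A minimal plain-Python model of that behaviour, assuming a list of scalar keys and using a hypothetical helper name, might look like this. It is not the Arkouda implementation and does not cover pdarray, Index or Series keys.

```python
def locate(index, values, keys):
    """Return (labels, values) for the requested keys, ordered by label."""
    wanted = set(keys)
    # Keep only the (label, value) pairs whose label was requested.
    hits = [(label, val) for label, val in zip(index, values) if label in wanted]
    # The documented result is sorted by index, not by key order.
    hits.sort(key=lambda pair: pair[0])
    return [label for label, _ in hits], [val for _, val in hits]


labels, vals = locate([1, 2, 3], [10, 20, 30], [3, 1])
print(labels)  # [1, 3]
print(vals)    # [10, 30]
```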
- to_ak() pandas.Series[source]¶
Convert this pandas Series into an Arkouda-backed Series.
This method produces a pandas Series whose underlying storage uses ArkoudaExtensionArray, meaning the data reside on the Arkouda server rather than in local NumPy buffers. The conversion is zero-copy with respect to NumPy: data are only materialized if the original Series is NumPy-backed.
The returned Series preserves the original index (including index names) and the original Series name.
- Returns:
A Series backed by an ArkoudaExtensionArray, referencing Arkouda server-side arrays. The resulting Series retains the original index and name.
- Return type:
pd.Series
Notes
If the Series is already Arkouda-backed, this method returns a new Series that is semantically equivalent and still Arkouda-backed.
If the Series is NumPy-backed, values are transferred to Arkouda server-side arrays via ak.array.
No NumPy-side materialization occurs when converting an already Arkouda-backed Series.
Examples
Basic numeric conversion:
>>> import pandas as pd
>>> import arkouda as ak
>>> s = pd.Series([1, 2, 3], name="nums")
>>> s_ak = s.ak.to_ak()
>>> type(s_ak.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> s_ak.tolist()
[np.int64(1), np.int64(2), np.int64(3)]
Preserving the index and name:
>>> idx = pd.Index([10, 20, 30], name="id")
>>> s = pd.Series([100, 200, 300], index=idx, name="values")
>>> s_ak = s.ak.to_ak()
>>> s_ak.name
'values'
>>> s_ak.index.name
'id'
String data:
>>> s = pd.Series(["red", "blue", "green"], name="colors")
>>> s_ak = s.ak.to_ak()
>>> s_ak.tolist()
[np.str_('red'), np.str_('blue'), np.str_('green')]
Idempotence (calling to_ak repeatedly stays Arkouda-backed):
>>> s_ak2 = s_ak.ak.to_ak()
>>> s_ak2.ak.is_arkouda
True
>>> s_ak2.tolist() == s_ak.tolist()
True
- to_ak_legacy() arkouda.pandas.series.Series[source]¶
Convert this Series into a legacy Arkouda Series.
- Returns:
The legacy Arkouda Series.
- Return type:
ak_Series
Examples
>>> import pandas as pd
>>> s = pd.Series([10, 20, 30])
>>> ak_arr = s.ak.to_ak_legacy()
>>> type(ak_arr)
<class 'arkouda.pandas.series.Series'>
- class arkouda.pandas.extension.ArkoudaStringArray(data: arkouda.numpy.strings.Strings | numpy.ndarray | Sequence[Any] | ArkoudaStringArray)[source]¶
Bases:
arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray
Arkouda-backed string pandas ExtensionArray.
Ensures the underlying data is an Arkouda Strings object. Accepts existing Strings or converts from NumPy arrays and Python sequences of strings.
- Parameters:
data (Strings | ndarray | Sequence[Any] | ArkoudaStringArray) – Input to wrap or convert.
  - If Strings, used directly.
  - If NumPy/sequence, converted via ak.array.
  - If another ArkoudaStringArray, its backing Strings is reused.
- Raises:
TypeError – If data cannot be converted to Arkouda Strings.
- astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]¶
- astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
- astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]
Cast to a specified dtype.
Casting rules:
- If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the string values.
- If dtype is a string dtype (e.g. pandas StringDtype, NumPy unicode, or Arkouda string dtype), returns an ArkoudaStringArray. If copy=True, attempts to copy the underlying Arkouda Strings data.
- For all other dtypes, casts the underlying Arkouda Strings using Strings.astype and returns an Arkouda-backed ArkoudaExtensionArray constructed from the result.
- Parameters:
dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, or Arkouda dtype.
copy (bool) – Whether to force a copy when the result is an ArkoudaStringArray. Default is True.
- Returns:
The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.
- Return type:
Union[ExtensionArray, NDArray[Any]]
Examples
Casting to a string dtype returns an Arkouda-backed string array:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>> s = ArkoudaStringArray(ak.array(["a", "b", "c"]))
>>> out = s.astype("string")
>>> out is s
False
Forcing a copy when casting to a string dtype returns a new array:
>>> out2 = s.astype("string", copy=True)
>>> out2 is s
False
>>> out2.to_ndarray()
array(['a', 'b', 'c'], dtype='<U1')
Casting to object materializes the data to a NumPy array:
>>> s.astype(object)
array(['a', 'b', 'c'], dtype=object)
Casting to a non-string dtype uses Arkouda to cast the underlying strings and returns an Arkouda-backed ExtensionArray:
>>> s_num = ArkoudaStringArray(ak.array(["1", "2", "3"]))
>>> a = s_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])
NumPy and pandas dtype objects are also accepted:
>>> import numpy as np
>>> a = s_num.astype(np.dtype("float64"))
>>> a.to_ndarray()
array([1., 2., 3.])
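The three casting rules listed for astype can be summarized as a small dispatch. The sketch below is a plain-Python illustration only: classify_astype_target is a hypothetical helper, not part of Arkouda, and it recognizes just the target forms that appear in the examples above (object, "string", and everything else). The real method accepts NumPy, pandas, and Arkouda dtype objects as well.

```python
# Hypothetical helper: models the documented casting rules, not Arkouda's code.
def classify_astype_target(dtype):
    """Return a label for the documented branch a target dtype falls into."""
    # Rule 1: object targets materialize the strings into a NumPy object array.
    if dtype is object or dtype == "object":
        return "numpy object ndarray"
    # Rule 2: string targets stay an ArkoudaStringArray (copied if copy=True).
    if dtype is str or dtype == "string":
        return "ArkoudaStringArray"
    # Rule 3: anything else is cast server-side via Strings.astype.
    return "Arkouda-backed ExtensionArray"


for target in (object, "string", "int64", "float64"):
    print(target, "->", classify_astype_target(target))
```

Only the first branch leaves the Arkouda server; the other two keep the data server-side, which is why casting to object is the case to avoid for large arrays.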
- property dtype¶
An instance of ExtensionDtype.
See also
api.extensions.ExtensionDtype: Base class for extension dtypes.
api.extensions.ExtensionArray: Base class for extension array types.
api.extensions.ExtensionArray.dtype: The dtype of an ExtensionArray.
Series.dtype: The dtype of a Series.
DataFrame.dtype: The dtype of a DataFrame.
Examples
>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
- isna()[source]¶
A 1-D array indicating if each value is missing.
- Returns:
In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.
- Return type:
numpy.ndarray or pandas.api.extensions.ExtensionArray
See also
ExtensionArray.dropna: Return ExtensionArray without NA values.
ExtensionArray.fillna: Fill NA/NaN values using the specified method.
Notes
If returning an ExtensionArray, then
- na_values._is_boolean should be True
- na_values should implement ExtensionArray._reduce()
- na_values should implement ExtensionArray._accumulate()
- na_values.any and na_values.all should be implemented
Examples
>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False, True, True])
- item(*args, **kwargs)[source]¶
Return the array element at the specified position as a Python scalar.
- Parameters:
index (int, optional) – Position of the element. If not provided, the array must contain exactly one element.
- Returns:
The element at the specified position.
- Return type:
scalar
- Raises:
ValueError – If no index is provided and the array does not have exactly one element.
IndexError – If the specified position is out of bounds.
See also
numpy.ndarray.item: Return the item of an array as a scalar.
Examples
>>> arr = pd.array([1], dtype="Int64")
>>> arr.item()
np.int64(1)
>>> arr = pd.array([1, 2, 3], dtype="Int64")
>>> arr.item(0)
np.int64(1)
>>> arr.item(2)
np.int64(3)
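The parameter and error rules described for item can be sketched on a plain Python list. This is an illustration of the documented contract only: item_sketch is a hypothetical name, the error message is invented, and the out-of-bounds check here is simply Python's own list indexing, so the handling of negative positions is an assumption rather than documented Arkouda behavior.

```python
# Hypothetical stand-in for the documented item() contract, using a list.
def item_sketch(values, index=None):
    """Return one element of `values` following the documented rules."""
    if index is None:
        # Without an index, the array must hold exactly one element.
        if len(values) != 1:
            raise ValueError("item() without an index needs exactly one element")
        return values[0]
    # With an index, list indexing raises IndexError when out of bounds.
    return values[index]


print(item_sketch([1]))           # single element, no index needed
print(item_sketch([1, 2, 3], 2))  # explicit position
```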
- value_counts(dropna: bool = True) pandas.Series[source]¶
Return counts of unique strings as a pandas Series.
This method computes the frequency of each distinct string value in the underlying Arkouda Strings object and returns the result as a pandas Series, with the unique string values as the index and their counts as the data.
- Parameters:
dropna (bool) – Whether to exclude missing values. Missing-value handling for Arkouda string arrays is not yet implemented, so this parameter is accepted for pandas compatibility but currently has no effect. Default is True.
- Returns:
A Series containing the counts of unique string values. The index is an ArkoudaStringArray of unique values, and the values are an ArkoudaArray of counts.
- Return type:
pd.Series
Notes
The following pandas options are not yet implemented: normalize, sort, and bins.
Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.
Examples
Basic usage:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>>
>>> s = ArkoudaStringArray(["red", "blue", "red", "green", "blue", "red"])
>>> s.value_counts()
red      3
blue     2
green    1
dtype: int64
Empty input:
>>> empty = ArkoudaStringArray([])
>>> empty.value_counts()
Series([], dtype: int64)
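As a client-side analogy for what value_counts returns, the same unique-value and count pairing can be sketched with the standard library. This is not how Arkouda performs the computation (the counting there happens on the server); value_counts_sketch is a hypothetical name, and the first-seen ordering below comes from Python's Counter and says nothing about the order Arkouda returns.

```python
from collections import Counter


# Hypothetical local analogy: pairs each distinct string with its frequency.
def value_counts_sketch(values):
    """Map each distinct string in `values` to how often it occurs."""
    # Counter keeps keys in first-seen order; dict() preserves that order.
    return dict(Counter(values))


colors = ["red", "blue", "red", "green", "blue", "red"]
print(value_counts_sketch(colors))
print(value_counts_sketch([]))  # empty input yields an empty mapping
```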
- class arkouda.pandas.extension.ArkoudaStringDtype[source]¶
Bases:
_ArkoudaBaseDtype
Arkouda-backed string dtype.
This dtype integrates Arkouda’s distributed Strings type with the pandas ExtensionArray interface via ArkoudaStringArray. It enables pandas objects (Series, DataFrame) to hold large, server-backed string columns without converting to NumPy or Python objects.
- construct_array_type()[source]¶
Returns the ArkoudaStringArray used as the storage class.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaStringArray class associated with this dtype.
- Return type:
- kind = 'O'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = ''¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
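To make the role of na_value in take concrete, here is a sketch of the general pandas take contract on a plain list: with allow_fill=True, an index of -1 marks a missing position and is replaced by the fill value, which defaults to the dtype's na_value (the empty string for this dtype). take_sketch is a hypothetical helper, and whether the Arkouda-backed arrays follow every detail of this contract is an assumption.

```python
# Hypothetical sketch of pandas-style take semantics on a Python list.
def take_sketch(values, indices, allow_fill=False, fill_value=None, na_value=""):
    """Select `indices` from `values`, filling -1 positions when allow_fill is set."""
    if not allow_fill:
        # Plain positional take: negative indices count from the end.
        return [values[i] for i in indices]
    if fill_value is None:
        # Fall back to the dtype-level default NA value.
        fill_value = na_value
    result = []
    for i in indices:
        if i == -1:
            result.append(fill_value)  # -1 marks a missing slot
        elif i < 0:
            raise ValueError("only -1 is allowed as a negative index with allow_fill")
        else:
            result.append(values[i])
    return result


print(take_sketch(["a", "b", "c"], [0, -1, 2], allow_fill=True))
print(take_sketch(["a", "b", "c"], [0, -1]))
```

The first call fills the missing slot with the default NA value, while the second treats -1 as an ordinary index from the end.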
- name = 'string'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaUint64Dtype[source]¶
Bases:
_ArkoudaBaseDtype
Arkouda-backed unsigned 64-bit integer dtype.
This dtype integrates Arkouda’s uint64 arrays with pandas, allowing users to create pandas.Series or pandas.DataFrame objects that store their data on the Arkouda server while still conforming to the pandas ExtensionArray API.
- construct_array_type()[source]¶
Return the ArkoudaArray class used as the storage container for this dtype.
Examples
>>> import arkouda as ak
>>> import pandas as pd
>>> from arkouda.pandas.extension import ArkoudaUint64Dtype, ArkoudaArray
>>> arr = ArkoudaArray(ak.array([1, 2, 3], dtype="uint64"))
>>> s = pd.Series(arr, dtype=ArkoudaUint64Dtype())
>>> s
0    1
1    2
2    3
dtype: uint64
- classmethod construct_array_type()[source]¶
Return the ExtensionArray class associated with this dtype.
This is required by the pandas ExtensionDtype API. It tells pandas which ExtensionArray subclass should be used to hold data of this dtype inside a pandas.Series or pandas.DataFrame.
- Returns:
The ArkoudaArray class, which implements the storage and operations for Arkouda-backed arrays.
- Return type:
- kind = 'u'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'uint64'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.extension.ArkoudaUint8Dtype[source]¶
Bases:
_ArkoudaBaseDtype
Arkouda-backed unsigned 8-bit integer dtype.
This dtype integrates Arkouda’s uint8 arrays with the pandas ExtensionArray API, allowing pandas Series and DataFrame objects to store and operate on Arkouda-backed unsigned 8-bit integers. The underlying storage is an Arkouda pdarray<uint8>, exposed through the ArkoudaArray extension array.
- construct_array_type()[source]¶
Returns the ArkoudaArray type that provides the storage and behavior for this dtype.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
This method is required by the pandas ExtensionDtype interface. It tells pandas which ExtensionArray class to use when creating arrays of this dtype (for example, when calling Series(..., dtype="arkouda.uint8")).
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'u'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'uint8'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.