arkouda.pandas.extension

Experimental pandas extension types backed by Arkouda arrays.

This subpackage provides experimental implementations of pandas.api.extensions.ExtensionArray and corresponding extension dtypes that wrap Arkouda distributed arrays.

These classes make it possible to use Arkouda arrays inside pandas objects such as Series and DataFrame. They aim to provide familiar pandas semantics while leveraging Arkouda’s distributed, high-performance backend.

Warning

This module is experimental. The API is not stable and may change without notice between releases. Use with caution in production environments.

Classes

ArkoudaArray

Arkouda-backed numeric/bool pandas ExtensionArray.

ArkoudaBigintDtype

Arkouda-backed arbitrary-precision integer dtype.

ArkoudaBoolDtype

Arkouda-backed boolean dtype.

ArkoudaCategorical

Arkouda-backed categorical pandas ExtensionArray.

ArkoudaCategoricalDtype

Arkouda-backed categorical dtype.

ArkoudaDataFrameAccessor

Arkouda DataFrame accessor.

ArkoudaExtensionArray

Abstract base class for custom 1-D array types.

ArkoudaFloat64Dtype

Arkouda-backed 64-bit floating-point dtype.

ArkoudaIndexAccessor

Arkouda-backed index accessor for pandas Index and MultiIndex.

ArkoudaInt64Dtype

Extension dtype for Arkouda-backed 64-bit integers.

ArkoudaSeriesAccessor

Arkouda-backed Series accessor.

ArkoudaStringArray

Arkouda-backed string pandas ExtensionArray.

ArkoudaStringDtype

Arkouda-backed string dtype.

ArkoudaUint64Dtype

Arkouda-backed unsigned 64-bit integer dtype.

ArkoudaUint8Dtype

Arkouda-backed unsigned 8-bit integer dtype.

Package Contents

class arkouda.pandas.extension.ArkoudaArray(data: arkouda.numpy.pdarrayclass.pdarray | numpy.ndarray | Sequence[Any] | ArkoudaArray, dtype: Any = None, copy: bool = False)[source]

Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray

Arkouda-backed numeric/bool pandas ExtensionArray.

Wraps or converts supported inputs into an Arkouda pdarray to serve as the backing store. Ensures the underlying array is 1-D and lives on the Arkouda server.

Parameters:
  • data (pdarray | ndarray | Sequence[Any] | ArkoudaArray) –

    Input to wrap or convert.

    • If an Arkouda pdarray, it is used directly unless dtype is given or copy=True, in which case a new array is created via ak.array.

    • If a NumPy array, it is transferred to Arkouda via ak.array.

    • If a Python sequence, it is converted to NumPy then to Arkouda.

    • If another ArkoudaArray, its underlying pdarray is reused.

  • dtype (Any, optional) – Desired dtype to cast to (NumPy dtype or Arkouda dtype string). If omitted, dtype is inferred from data.

  • copy (bool) – If True, attempt to copy the underlying data when converting/wrapping. Default is False.

Raises:
  • TypeError – If data cannot be interpreted as an Arkouda array-like object.

  • ValueError – If the resulting array is not one-dimensional.

default_fill_value

Sentinel used when filling missing values (default: -1).

Type:

int

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> ArkoudaArray(ak.arange(5))
ArkoudaArray([0 1 2 3 4])
>>> ArkoudaArray([10, 20, 30])
ArkoudaArray([10 20 30])
all(axis=0, skipna=True, **kwargs)[source]

Return whether all elements are True.

This is mainly to support pandas’ BaseExtensionArray.equals, which calls .all() on the result of a boolean expression.

any(axis=0, skipna=True, **kwargs)[source]

Return whether any element is True.

Added for symmetry with .all() and to support potential pandas boolean-reduction calls.

astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]
astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]

Cast the array to a specified dtype.

Casting rules:

  • If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the array values.

  • Otherwise, the target dtype is normalized using Arkouda’s dtype resolution rules.

  • If the normalized dtype matches the current dtype and copy=False, returns self.

  • In all other cases, casts the underlying Arkouda array to the target dtype and returns an Arkouda-backed ArkoudaExtensionArray.
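The casting rules above can be read as a small decision procedure. The following is a plain-Python sketch of that branching only, not Arkouda's implementation: `plan_astype` and its return labels are hypothetical names, and dtype normalization (the second rule) is skipped by assuming both dtypes are already given as normalized strings.

```python
def plan_astype(current: str, target: str, copy: bool = True) -> str:
    """Return which documented branch a cast would take (illustrative only).

    Assumes `current` and `target` are already-normalized dtype names;
    the real method resolves dtype-like objects first.
    """
    # An object request materializes the values into a client-side NumPy array.
    if target == "object":
        return "numpy-object-array"
    # Same dtype with copy=False hands back the original array unchanged.
    if target == current and not copy:
        return "self"
    # Everything else is cast on the server and wrapped in an extension array.
    return "arkouda-backed-cast"


print(plan_astype("int64", "object"))             # numpy-object-array
print(plan_astype("int64", "int64", copy=False))  # self
print(plan_astype("int64", "int64", copy=True))   # arkouda-backed-cast
print(plan_astype("int64", "float64"))            # arkouda-backed-cast
```

Note that with the default copy=True, a same-dtype cast still produces a new array, which matches the `c is a` example in this entry.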

Parameters:
  • dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, Arkouda dtype, or any dtype-like object accepted by Arkouda.

  • copy (bool) – Whether to force a copy when the target dtype matches the current dtype. Default is True.

Returns:

The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.

Return type:

Union[ExtensionArray, NDArray[Any]]

Examples

Basic numeric casting returns an Arkouda-backed array:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> a = ArkoudaArray(ak.array([1, 2, 3], dtype="int64"))
>>> a.astype("float64").to_ndarray()
array([1., 2., 3.])

Casting to the same dtype with copy=False returns the original object:

>>> b = a.astype("int64", copy=False)
>>> b is a
True

Forcing a copy when the dtype is unchanged returns a new array:

>>> c = a.astype("int64", copy=True)
>>> c is a
False
>>> c.to_ndarray()
array([1, 2, 3])

Casting to object materializes the data to a NumPy array:

>>> a.astype(object)
array([1, 2, 3], dtype=object)

NumPy and pandas dtype objects are also accepted:

>>> import numpy as np
>>> a.astype(np.dtype("bool")).to_ndarray()
array([ True,  True,  True])
default_fill_value: int = -1
property dtype

An instance of ExtensionDtype.

See also

api.extensions.ExtensionDtype

Base class for extension dtypes.

api.extensions.ExtensionArray

Base class for extension array types.

api.extensions.ExtensionArray.dtype

The dtype of an ExtensionArray.

Series.dtype

The dtype of a Series.

DataFrame.dtype

The dtype of a DataFrame.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
equals(other)[source]

Return if another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).
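The difference from element-wise equality is easiest to see with NaN, which never compares equal to itself. The sketch below illustrates the rule on plain Python lists; `arrays_equivalent` is a hypothetical helper, it treats only float NaN as missing, and it omits the dtype check that the real method also performs.

```python
import math


def _is_missing(value) -> bool:
    # Assumption for this sketch: only float NaN counts as missing.
    return isinstance(value, float) and math.isnan(value)


def arrays_equivalent(left, right) -> bool:
    """Same length, and every position is equal or missing on both sides."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if _is_missing(a) and _is_missing(b):
            continue  # missing values in the same location count as equal
        if a != b:
            return False
    return True


nan = float("nan")
print(nan == nan)                                           # False
print(arrays_equivalent([1.0, 2.0, nan], [1.0, 2.0, nan]))  # True
print(arrays_equivalent([1.0, 3.0, nan], [1.0, 2.0, nan]))  # False
```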

Parameters:

other (ExtensionArray) – Array to compare to this Array.

Returns:

Whether the arrays are equivalent.

Return type:

boolean

See also

numpy.array_equal

Equivalent method for numpy array.

Series.equals

Equivalent method for Series.

DataFrame.equals

Equivalent method for DataFrame.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
>>> arr1 = pd.array([1, 3, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
False
isna() numpy.ndarray[source]

Return a boolean mask indicating missing values.

This method implements the pandas ExtensionArray.isna contract and always returns a NumPy ndarray of dtype bool with the same length as the array.

Returns:

A boolean mask where True marks elements considered missing.

Return type:

np.ndarray

Raises:

TypeError – If the underlying data buffer does not support missing-value detection or cannot produce a boolean mask.

isnull()[source]

Alias for isna().

property nbytes

The number of bytes needed to store this object in memory.

See also

ExtensionArray.shape

Return a tuple of the array dimensions.

ExtensionArray.size

The number of elements in the array.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
value_counts(dropna: bool = True) pandas.Series[source]

Return counts of unique values as a pandas Series.

This method computes the frequency of each distinct value in the underlying Arkouda array and returns the result as a pandas Series, with the unique values as the index and their counts as the data.

Parameters:

dropna (bool) – Whether to exclude missing values. Currently, missing-value handling is supported only for floating-point data, where NaN values are treated as missing. Default is True.

Returns:

A Series containing the counts of unique values. The index is an ArkoudaArray of unique values, and the values are an ArkoudaArray of counts.

Return type:

pd.Series

Notes

  • Only dropna=True is supported.

  • The following pandas options are not yet implemented: normalize, sort, and bins.

  • Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.
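As a client-side illustration of the dropna=True behaviour for floating-point data, the sketch below counts values in a plain Python list after discarding NaN. It is not how Arkouda computes the result (that happens on the server); `count_non_missing` is a hypothetical name, and ordering the output by descending count is an assumption of this sketch.

```python
from collections import Counter
import math


def count_non_missing(values):
    """Count distinct values, treating float NaN as missing and dropping it."""
    kept = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    # most_common() orders by descending count; ties keep first-seen order.
    return dict(Counter(kept).most_common())


print(count_non_missing([1.0, 2.0, float("nan"), 1.0]))  # {1.0: 2, 2.0: 1}
print(count_non_missing([1, 2, 1, 3, 2, 1]))             # {1: 3, 2: 2, 3: 1}
```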

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>>
>>> a = ArkoudaArray(ak.array([1, 2, 1, 3, 2, 1]))
>>> a.value_counts()
1    3
2    2
3    1
dtype: int64

Floating-point data with NaN values:

>>> b = ArkoudaArray(ak.array([1.0, 2.0, float("nan"), 1.0]))
>>> b.value_counts()
1.0    2
2.0    1
dtype: int64
class arkouda.pandas.extension.ArkoudaBigintDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed arbitrary-precision integer dtype.

This dtype integrates Arkouda’s server-backed pdarray<bigint> with the pandas ExtensionArray interface via ArkoudaArray. It enables pandas objects (Series, DataFrame) to hold and operate on very large integers that exceed 64-bit precision, while keeping the data distributed on the Arkouda server.

construct_array_type()[source]

Returns the ArkoudaArray class used for storage.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'O'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'bigint'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaBoolDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed boolean dtype.

This dtype integrates Arkouda’s server-backed pdarray<bool> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate distributed boolean arrays without materializing them on the client.

construct_array_type()[source]

Returns the ArkoudaArray class used for storage.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'b'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = False

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'bool_'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaCategorical(data: arkouda.pandas.categorical.Categorical | ArkoudaCategorical | numpy.ndarray | Sequence[Any])[source]

Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray

Arkouda-backed categorical pandas ExtensionArray.

Ensures the underlying data is an Arkouda Categorical. Accepts an existing Categorical or converts from Python/NumPy sequences of labels.

Parameters:

data (Categorical | ArkoudaCategorical | ndarray | Sequence[Any]) –

    Input to wrap or convert.

    • If Categorical, used directly.

    • If another ArkoudaCategorical, its backing object is reused.

    • If list/tuple/ndarray, converted via ak.Categorical(ak.array(data)).

Raises:

TypeError – If data cannot be converted to Arkouda Categorical.

default_fill_value

Sentinel used when filling missing values (default: “”).

Type:

str

add_categories(*args, **kwargs)[source]
as_ordered(*args, **kwargs)[source]
as_unordered(*args, **kwargs)[source]
astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]
astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]

Cast to a specified dtype.

  • If dtype is categorical (pandas category / CategoricalDtype / ArkoudaCategoricalDtype), returns an Arkouda-backed ArkoudaCategorical (optionally copied).

  • If dtype requests object, returns a NumPy ndarray of dtype object containing the category labels (materialized to the client).

  • If dtype requests a string dtype, returns an Arkouda-backed ArkoudaStringArray containing the labels as strings.

  • Otherwise, casts the labels (as strings) to the requested dtype and returns an Arkouda-backed ExtensionArray.
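The last two rules both start from the category labels rather than the integer codes. The sketch below shows that idea with plain Python lists, assuming a categorical is stored as a list of distinct categories plus one integer code per element; `materialize_labels` and `cast_labels` are hypothetical helpers, not part of Arkouda.

```python
def materialize_labels(categories, codes):
    """Expand per-element codes back into their category labels."""
    return [categories[code] for code in codes]


def cast_labels(categories, codes, target):
    """Cast the labels (as strings) to `target`, e.g. int or float."""
    return [target(label) for label in materialize_labels(categories, codes)]


categories = ["1", "2", "3"]  # distinct labels, stored once
codes = [0, 1, 0, 2]          # one code per element

print(materialize_labels(categories, codes))  # ['1', '2', '1', '3']
print(cast_labels(categories, codes, int))    # [1, 2, 1, 3]
```

Casting the labels and not the codes is what makes `astype("int64")` on labels "1", "2", "3" yield 1, 2, 3 instead of the codes 0, 1, 2.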

Parameters:
  • dtype (Any) – Target dtype.

  • copy (bool) – Whether to force a copy when possible. If categorical-to-categorical and copy=True, attempts to copy the underlying Arkouda Categorical (if supported). Default is True.

Returns:

The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.

Return type:

Union[ExtensionArray, NDArray[Any]]

Examples

Casting to category returns an Arkouda-backed categorical array:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>> c = ArkoudaCategorical(ak.Categorical(ak.array(["x", "y", "x"])))
>>> out = c.astype("category")
>>> out is c
False

Forcing a copy when casting to the same categorical dtype returns a new array:

>>> out2 = c.astype("category", copy=True)
>>> out2 is c
False
>>> out2.to_ndarray()
array(['x', 'y', 'x'], dtype='<U...')

Casting to object materializes the category labels to a NumPy object array:

>>> c.astype(object)
array(['x', 'y', 'x'], dtype=object)

Casting to a string dtype returns an Arkouda-backed string array of labels:

>>> s = c.astype("string")
>>> s.to_ndarray()
array(['x', 'y', 'x'], dtype='<U1')

Casting to another dtype casts the labels-as-strings and returns an Arkouda-backed array:

>>> c_num = ArkoudaCategorical(ak.Categorical(ak.array(["1", "2", "3"])))
>>> a = c_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])
check_for_ordered(*args, **kwargs)[source]
default_fill_value: str = ''
describe(*args, **kwargs)[source]
property dtype

An instance of ExtensionDtype.

See also

api.extensions.ExtensionDtype

Base class for extension dtypes.

api.extensions.ExtensionArray

Base class for extension array types.

api.extensions.ExtensionArray.dtype

The dtype of an ExtensionArray.

Series.dtype

The dtype of a Series.

DataFrame.dtype

The dtype of a DataFrame.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
classmethod from_codes(*args, **kwargs)[source]
Abstractmethod:

isna() numpy.ndarray[source]

Return a boolean mask indicating missing values.

This implements the pandas ExtensionArray.isna contract and returns a NumPy ndarray[bool] of the same length as this categorical array.

Returns:

Boolean mask where True indicates a missing value.

Return type:

np.ndarray

Raises:

TypeError – If the underlying categorical cannot expose its codes or if missing detection is unsupported.

isnull()[source]

Alias for isna().

max(*args, **kwargs)[source]
memory_usage(*args, **kwargs)[source]
min(*args, **kwargs)[source]
notna(*args, **kwargs)[source]
notnull(*args, **kwargs)[source]
remove_categories(*args, **kwargs)[source]
remove_unused_categories(*args, **kwargs)[source]
rename_categories(*args, **kwargs)[source]
reorder_categories(*args, **kwargs)[source]
set_categories(*args, **kwargs)[source]
set_ordered(*args, **kwargs)[source]
sort_values(*args, **kwargs)[source]
to_list(*args, **kwargs)[source]
value_counts(dropna: bool = True) pandas.Series[source]

Return counts of categories as a pandas Series.

This method computes category frequencies from the underlying Arkouda Categorical and returns them as a pandas Series, where the index contains the category labels and the values contain the corresponding counts.

Parameters:

dropna (bool) – Whether to drop missing values from the result. When True, the result is filtered using the categorical’s na_value. When False, all categories returned by the underlying computation are included. Default is True.

Returns:

A Series containing category counts. The index is an ArkoudaStringArray of category labels and the values are an ArkoudaArray of counts.

Return type:

pd.Series

Notes

  • The result is computed server-side in Arkouda; only the (typically small) output of categories and counts is materialized for the pandas Series.

  • This method does not yet support pandas options such as normalize, sort, or bins.

  • The handling of missing values depends on the Arkouda Categorical definition of na_value.

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>>
>>> a = ArkoudaCategorical(["a", "b", "a", "c", "b", "a"])
>>> a.value_counts()
a    3
b    2
c    1
dtype: int64
class arkouda.pandas.extension.ArkoudaCategoricalDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed categorical dtype.

This dtype integrates Arkouda’s distributed Categorical type with the pandas ExtensionArray interface via ArkoudaCategorical. It enables pandas objects (Series, DataFrame) to hold categorical data stored and processed on the Arkouda server, while exposing familiar pandas APIs.

construct_array_type()[source]

Returns the ArkoudaCategorical used as the storage class.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaCategorical class associated with this dtype.

Return type:

type

kind = 'O'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'category'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaDataFrameAccessor(pandas_obj)[source]

Arkouda DataFrame accessor.

Allows df.ak access to Arkouda-backed operations.

collect() pandas.DataFrame[source]

Materialize an Arkouda-backed pandas DataFrame into a NumPy-backed one.

This operation retrieves each Arkouda-backed column from the server using to_ndarray() and constructs a standard pandas DataFrame whose columns are plain NumPy ndarray objects. The returned DataFrame has no dependency on Arkouda.

Returns:

A pandas DataFrame with NumPy-backed columns.

Return type:

pd_DataFrame

Examples

Converting an Arkouda-backed DataFrame into a NumPy-backed one:

>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaDataFrameAccessor

Create a pandas DataFrame and convert it to Arkouda-backed form:

>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()

akdf is still a pandas DataFrame, but its columns live on Arkouda:

>>> type(akdf["x"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

Now fully materialize it to local NumPy arrays:

>>> collected = akdf.ak.collect()
>>> collected
   x  y
0  1  a
1  2  b
2  3  c

The columns are now NumPy arrays:

>>> type(collected["x"].values)
<class 'numpy.ndarray'>
static from_ak_legacy(akdf: arkouda.pandas.dataframe.DataFrame) pandas.DataFrame[source]

Convert a legacy Arkouda DataFrame into a pandas DataFrame backed by Arkouda ExtensionArrays.

This is the zero-copy-ish counterpart to to_ak_legacy(). Instead of materializing columns into NumPy arrays, this function wraps each underlying Arkouda server-side array in the appropriate ArkoudaExtensionArray subclass (ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical). The resulting pandas DataFrame therefore keeps all data on the Arkouda server, enabling scalable operations without transferring data to the Python client.

Parameters:

akdf (ak_DataFrame) – A legacy Arkouda DataFrame (arkouda.pandas.dataframe.DataFrame) whose columns are Arkouda objects (pdarray, Strings, or Categorical).

Returns:

A pandas DataFrame in which each column is an Arkouda-backed ExtensionArray, typically one of ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical.

No materialization to NumPy occurs. All column data remain server-resident.

Return type:

pd_DataFrame

Notes

  • This function performs a zero-copy conversion for the underlying Arkouda arrays (server-side). Only lightweight Python wrappers are created.

  • The resulting pandas DataFrame can interoperate with most pandas APIs that support extension arrays.

  • Round-tripping through to_ak_legacy() and from_ak_legacy() preserves Arkouda semantics.

Examples

Basic conversion

>>> import pandas as pd
>>> import arkouda as ak
>>> akdf = ak.DataFrame({"a": ak.arange(5), "b": ak.array([10,11,12,13,14])})
>>> pdf = pd.DataFrame.ak.from_ak_legacy(akdf)
>>> pdf
   a   b
0  0  10
1  1  11
2  2  12
3  3  13
4  4  14

Columns stay Arkouda-backed

>>> type(pdf["a"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> pdf["a"].array._data
array([0 1 2 3 4])

No NumPy materialization occurs

>>> pdf["a"].values    # pandas always materializes .values
ArkoudaArray([0 1 2 3 4])

But the underlying column is still Arkouda:

>>> pdf["a"].array._data
array([0 1 2 3 4])

Categorical and Strings columns work as well

>>> akdf2 = ak.DataFrame({
...     "s": ak.array(["a","b","a"]),
...     "c": ak.Categorical(ak.array(["e","f","g"]))
... })
>>> pdf2 = pd.DataFrame.ak.from_ak_legacy(akdf2)
>>> type(pdf2["s"].array)
<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>
>>> type(pdf2["c"].array)
<class 'arkouda.pandas.extension._arkouda_categorical_array.ArkoudaCategorical'>
merge(right: pandas.DataFrame, on: str | List[str] | None = None, left_on: str | List[str] | None = None, right_on: str | List[str] | None = None, how: str = 'inner', left_suffix: str = '_x', right_suffix: str = '_y', convert_ints: bool = True, sort: bool = True) pandas.DataFrame[source]

Merge two Arkouda-backed pandas DataFrames using Arkouda’s join.

Parameters:
  • right (pd.DataFrame) – Right-hand DataFrame to merge with self._obj. All columns must be Arkouda-backed ExtensionArrays.

  • on (Optional[Union[str, List[str]]]) – Column name(s) to join on. Must be present in both left and right DataFrames. If not provided and neither left_on nor right_on is set, the intersection of column names in left and right is used. Default is None.

  • left_on (Optional[Union[str, List[str]]]) – Column name(s) from the left DataFrame to use as join keys. Must be used together with right_on. If provided, on is ignored for the left side. Default is None.

  • right_on (Optional[Union[str, List[str]]]) – Column name(s) from the right DataFrame to use as join keys. Must be used together with left_on. If provided, on is ignored for the right side. Default is None.

  • how (str) – Type of merge to be performed. One of 'left', 'right', 'inner', or 'outer'. Default is ‘inner’.

  • left_suffix (str) – Suffix to apply to overlapping column names from the left frame that are not part of the join keys. Default is ‘_x’.

  • right_suffix (str) – Suffix to apply to overlapping column names from the right frame that are not part of the join keys. Default is ‘_y’.

  • convert_ints (bool) – Whether to allow Arkouda to upcast integer columns as needed (for example, to accommodate missing values) during the merge. Default is True.

  • sort (bool) – Whether to sort the join keys in the output. Default is True.

Returns:

A pandas DataFrame whose columns are ArkoudaArray ExtensionArrays. All column data remain on the Arkouda server.

Return type:

pd.DataFrame

Raises:

TypeError – If right is not a pandas.DataFrame or if any column in the left or right DataFrame is not Arkouda-backed.
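The key-selection and suffix rules described in the parameters can be sketched on plain lists of column names. This is an illustration of the documented rules only, not Arkouda's join code: `resolve_join_keys` and `suffix_overlaps` are hypothetical helpers, and raising ValueError when only one of left_on/right_on is given is an assumption of this sketch (the documentation does not state which error is raised).

```python
def _as_list(names):
    # Accept either a single column name or a list of names.
    return [names] if isinstance(names, str) else list(names)


def resolve_join_keys(left_cols, right_cols, on=None, left_on=None, right_on=None):
    """Return (left_keys, right_keys) following the documented precedence."""
    if left_on is not None or right_on is not None:
        if left_on is None or right_on is None:
            # Assumption: the error type is chosen for this sketch only.
            raise ValueError("left_on and right_on must be used together")
        return _as_list(left_on), _as_list(right_on)
    if on is not None:
        keys = _as_list(on)
        return keys, keys
    # Neither given: fall back to the columns shared by both frames.
    shared = [c for c in left_cols if c in right_cols]
    return shared, shared


def suffix_overlaps(left_cols, right_cols, keys, left_suffix="_x", right_suffix="_y"):
    """Rename non-key columns that appear in both frames."""
    overlap = (set(left_cols) & set(right_cols)) - set(keys)
    left = [c + left_suffix if c in overlap else c for c in left_cols]
    right = [c + right_suffix if c in overlap else c for c in right_cols]
    return left, right


left_cols, right_cols = ["k", "a", "b"], ["k", "b", "c"]
print(resolve_join_keys(left_cols, right_cols))          # (['k', 'b'], ['k', 'b'])
print(resolve_join_keys(left_cols, right_cols, on="k"))  # (['k'], ['k'])
print(suffix_overlaps(left_cols, right_cols, ["k"]))     # (['k', 'a', 'b_x'], ['k', 'b_y', 'c'])
```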

to_ak() pandas.DataFrame[source]

Convert this pandas DataFrame to an Arkouda-backed pandas DataFrame.

Each column of the original pandas DataFrame is materialized to the Arkouda server via ak.array() and wrapped in an ArkoudaArray ExtensionArray. The result is still a pandas DataFrame, but all column data reside on the Arkouda server and behave according to the Arkouda ExtensionArray API.

This method does not return a legacy ak_DataFrame. For that (server-side DataFrame structure), use to_ak_legacy().

Returns:

A pandas DataFrame whose columns are Arkouda-backed ArkoudaArray objects.

Return type:

pd_DataFrame

Examples

Convert a plain pandas DataFrame to an Arkouda-backed one:

>>> import pandas as pd
>>> import arkouda as ak
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()
>>> type(akdf)
<class 'pandas...DataFrame'>

The columns are now Arkouda ExtensionArrays:

>>> from arkouda.pandas.extension import ArkoudaArray
>>> isinstance(akdf["x"].array, ArkoudaArray)
True
>>> akdf["x"].tolist()
[np.int64(1), np.int64(2), np.int64(3)]

Arkouda operations work directly on the columns:

>>> akdf["x"].array._data + 10
array([11 12 13])

Converting back to a NumPy-backed DataFrame:

>>> akdf_numpy = akdf.ak.collect()
>>> akdf_numpy
   x  y
0  1  a
1  2  b
2  3  c
to_ak_legacy() arkouda.pandas.dataframe.DataFrame[source]

Convert this pandas DataFrame into the legacy arkouda.DataFrame.

This method performs a materializing conversion of a pandas DataFrame into the legacy Arkouda DataFrame structure. Every column is converted to Arkouda server-side data:

  • Python / NumPy numeric and boolean arrays become pdarray.

  • String columns become Arkouda string arrays (Strings).

  • Pandas categoricals become Arkouda Categorical objects.

  • The result is a legacy ak_DataFrame whose columns all reside on the Arkouda server.

This differs from to_ak(), which creates Arkouda-backed ExtensionArrays but retains a pandas.DataFrame structure.

Returns:

The legacy Arkouda DataFrame with all columns materialized onto the Arkouda server.

Return type:

ak_DataFrame

Examples

Convert a plain pandas DataFrame to a legacy Arkouda DataFrame:

>>> import pandas as pd
>>> import arkouda as ak
>>> df = pd.DataFrame({
...     "i": [1, 2, 3],
...     "s": ["a", "b", "c"],
...     "c": pd.Series(["low", "low", "high"], dtype="category"),
... })
>>> akdf = df.ak.to_ak_legacy()
>>> type(akdf)
<class 'arkouda.pandas.dataframe.DataFrame'>

Columns have the appropriate Arkouda types:

>>> from arkouda.numpy.pdarrayclass import pdarray
>>> from arkouda.numpy.strings import Strings
>>> from arkouda.pandas.categorical import Categorical
>>> isinstance(akdf["i"], pdarray)
True
>>> isinstance(akdf["s"], Strings)
True
>>> isinstance(akdf["c"], Categorical)
True

Values round-trip through the conversion:

>>> akdf["i"].tolist()
[1, 2, 3]
class arkouda.pandas.extension.ArkoudaExtensionArray(data)[source]

Bases: pandas.api.extensions.ExtensionArray

Abstract base class for custom 1-D array types.

pandas will recognize instances of this class as proper arrays with a custom type and will not attempt to coerce them to objects. They may be stored directly inside a DataFrame or Series.

dtype
nbytes
ndim
shape
argsort()[source]
astype()
copy()[source]
dropna()
duplicated()[source]
factorize()[source]
fillna()
equals()
insert()
interpolate()[source]
isin()
isna()
item()
ravel()
repeat()
searchsorted()
shift()
take()[source]
tolist()
unique()
view()[source]
_accumulate()[source]
_concat_same_type()[source]
_explode()
_formatter()
_from_factorized()[source]
_from_sequence()[source]
_from_sequence_of_strings()[source]
_hash_pandas_object()
_pad_or_backfill()[source]
_reduce()
_values_for_argsort()
_values_for_factorize()[source]

See also

api.extensions.ExtensionDtype

A custom data type, to be paired with an ExtensionArray.

api.extensions.ExtensionArray.dtype

An instance of ExtensionDtype.

Notes

The interface includes the following abstract methods that must be implemented by subclasses:

  • _from_sequence

  • _from_factorized

  • __getitem__

  • __len__

  • __eq__

  • dtype

  • nbytes

  • isna

  • take

  • copy

  • _concat_same_type

  • interpolate

A default repr displaying the type, (truncated) data, length, and dtype is provided. It can be customized or replaced by overriding:

  • __repr__ : A default repr for the ExtensionArray.

  • _formatter : Print scalars inside a Series or DataFrame.

Some methods require casting the ExtensionArray to an ndarray of Python objects with self.astype(object), which may be expensive. When performance is a concern, we highly recommend overriding the following methods:

  • fillna

  • _pad_or_backfill

  • dropna

  • unique

  • factorize / _values_for_factorize

  • argsort, argmax, argmin / _values_for_argsort

  • searchsorted

  • map

The remaining methods implemented on this class should be performant, as they only compose abstract methods. Still, a more efficient implementation may be available, and these methods can be overridden.

One can implement methods to handle array accumulations or reductions.

  • _accumulate

  • _reduce

One can implement methods to handle parsing from strings that will be used in methods such as pandas.io.parsers.read_csv.

  • _from_sequence_of_strings

This class does not inherit from ‘abc.ABCMeta’ for performance reasons. Methods and properties required by the interface raise pandas.errors.AbstractMethodError and no register method is provided for registering virtual subclasses.

ExtensionArrays are limited to 1 dimension.

They may be backed by none, one, or many NumPy arrays. For example, pandas.Categorical is an extension array backed by two arrays, one for codes and one for categories. An array of IPv6 addresses may be backed by a NumPy structured array with two fields, one for the lower 64 bits and one for the upper 64 bits. Or they may be backed by some other storage type, like Python lists. Pandas makes no assumptions on how the data are stored, just that it can be converted to a NumPy array. The ExtensionArray interface does not impose any rules on how this data is stored. However, currently, the backing data cannot be stored in attributes called .values or ._values to ensure full compatibility with pandas internals. But other names such as .data, ._data, ._items, … can be freely used.

If implementing NumPy’s __array_ufunc__ interface, pandas expects that

  1. You defer by returning NotImplemented when any Series are present in inputs. Pandas will extract the arrays and call the ufunc again.

  2. You define a _HANDLED_TYPES tuple as an attribute on the class. Pandas inspects this to determine whether the ufunc is valid for the types present.

See extending.extension.ufunc for more.
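The two expectations above can be sketched with a small stand-alone container. `TinyArray` is a hypothetical class used only for illustration; it is not a pandas ExtensionArray and does not reflect how Arkouda implements ufunc dispatch. Because pandas is not imported, a Series operand is covered by the generic branch that returns NotImplemented for any type outside `_HANDLED_TYPES`.

```python
import numbers

import numpy as np


class TinyArray:
    """Hypothetical container; not a pandas ExtensionArray."""

    # Types this class is willing to combine with in a ufunc call.
    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __init__(self, values):
        self._items = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer when any operand is of a type this class does not handle.
        for x in inputs:
            if not isinstance(x, self._HANDLED_TYPES + (TinyArray,)):
                return NotImplemented
        # Unwrap to raw ndarrays, apply the ufunc, and re-wrap the result.
        raw = tuple(x._items if isinstance(x, TinyArray) else x for x in inputs)
        result = getattr(ufunc, method)(*raw, **kwargs)
        return TinyArray(result)


doubled = np.multiply(TinyArray([1, 2, 3]), 2)
```

Returning NotImplemented rather than raising lets NumPy try the other operands' `__array_ufunc__` implementations before giving up.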

By default, ExtensionArrays are not hashable. Immutable subclasses may override this behavior.

Examples

Please see the following:

https://github.com/pandas-dev/pandas/blob/main/pandas/tests/extension/list/array.py

abstractmethod argmax(axis=None, out=None)[source]

Return the index of maximum value.

In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned.

Parameters:

skipna (bool, default True)

Return type:

int

See also

ExtensionArray.argmin

Return the index of the minimum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmax()
np.int64(3)
abstractmethod argmin(axis=None, out=None)[source]

Return the index of minimum value.

In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned.

Parameters:

skipna (bool, default True)

Return type:

int

See also

ExtensionArray.argmax

Return the index of the maximum value.

Examples

>>> arr = pd.array([3, 1, 2, 5, 4])
>>> arr.argmin()
np.int64(1)
argsort(*, ascending: bool = True, kind: str = 'quicksort', **kwargs: object) numpy.typing.NDArray[numpy.intp][source]

Return the indices that would sort the array.

This method computes the permutation indices that would sort the underlying Arkouda data and returns them as a NumPy array, in accordance with the pandas ExtensionArray contract. The indices can be used to reorder the array via take or iloc.

For floating-point data, NaN values are handled according to the na_position keyword argument.

Parameters:
  • ascending (bool, default True) – If True, sort values in ascending order. If False, sort in descending order.

  • kind (str, default "quicksort") – Sorting algorithm. Present for API compatibility with NumPy and pandas but currently ignored.

  • **kwargs

    Additional keyword arguments for compatibility. Supported keyword:

    • na_position : {“first”, “last”}, default “last” Where to place NaN values in the sorted result. This option is currently only applied for floating-point pdarray data; for Strings and Categorical data it has no effect.

Returns:

A 1D NumPy array of dtype np.intp containing the indices that would sort the array.

Return type:

numpy.ndarray

Raises:
  • ValueError – If na_position is not “first” or “last”.

  • TypeError – If the underlying data type does not support sorting.

Notes

  • Supports Arkouda pdarray, Strings, and Categorical data.

  • For floating-point arrays, NaN values are repositioned according to na_position.

  • The sorting computation occurs on the Arkouda server, but the resulting permutation indices are materialized on the client as a NumPy array, as required by pandas internals.

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> a = ArkoudaArray(ak.array([3.0, float("nan"), 1.0]))
>>> a.argsort() # NA last by default
array([2, 0, 1])
>>> a.argsort(na_position="first")
array([1, 2, 0])
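The na_position behavior described above can be sketched in plain Python. This is an illustration of the semantics only, under the assumption that NaN positions are separated out and then prepended or appended; the real computation runs on the Arkouda server, and `argsort_with_na` is a hypothetical name.

```python
import math


def argsort_with_na(values, ascending=True, na_position="last"):
    """Return sort indices for floats, placing NaN positions first or last."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    ok_idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Sort only the non-missing positions by their values.
    ok_idx.sort(key=lambda i: values[i], reverse=not ascending)
    return nan_idx + ok_idx if na_position == "first" else ok_idx + nan_idx


data = [3.0, float("nan"), 1.0]
last = argsort_with_na(data)
first = argsort_with_na(data, na_position="first")
```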
abstractmethod broadcast_arrays(*arrays)[source]
abstractmethod broadcast_to(x, shape, /)[source]
abstractmethod concat(arrays, /, *, axis=0)[source]
copy(deep: bool = True)[source]

Return a copy of the array.

Parameters:

deep (bool, default True) –

Whether to make a deep copy of the underlying Arkouda data.

  • If True, the underlying server-side array is duplicated.

  • If False, a new ExtensionArray wrapper is created but the underlying data is shared (no server-side copy).

Returns:

A new instance of the same concrete subclass containing either a deep copy or a shared reference to the underlying data.

Return type:

ArkoudaExtensionArray

Notes

Pandas semantics:

deep=False creates a new wrapper but may share memory. deep=True must create an independent copy of the data.

Arkouda semantics:

Arkouda arrays do not presently support views. Therefore:

  • deep=False returns a new wrapper around the same server-side array.

  • deep=True forces a full server-side copy.

Examples

Shallow copy (shared data):

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> arr = ArkoudaArray(ak.arange(5))
>>> c1 = arr.copy(deep=False)
>>> c1
ArkoudaArray([0 1 2 3 4])

Underlying data is the same object:

>>> arr._data is c1._data
True

Deep copy (independent server-side data):

>>> c2 = arr.copy(deep=True)
>>> c2
ArkoudaArray([0 1 2 3 4])

Underlying data is a distinct pdarray on the server:

>>> arr._data is c2._data
False
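The wrapper-versus-data distinction can be illustrated without an Arkouda server. The sketch below uses a hypothetical `Wrapper` class whose backing store is a Python list standing in for the server-side array; it mirrors the semantics described above and is not the Arkouda implementation.

```python
class Wrapper:
    """Hypothetical stand-in for an extension array wrapping backing data."""

    def __init__(self, data):
        self._data = data

    def copy(self, deep=True):
        # deep=True duplicates the backing store; deep=False shares it.
        data = list(self._data) if deep else self._data
        return type(self)(data)


original = Wrapper([0, 1, 2, 3, 4])
shallow = original.copy(deep=False)
deep = original.copy(deep=True)
```

In both cases the returned object is a new wrapper; only the deep copy has a backing store that can be changed independently of the original.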
default_fill_value: arkouda.numpy.dtypes.all_scalars | str | None = -1
abstractmethod duplicated(arrays, /, *, axis=0)[source]

Return boolean ndarray denoting duplicate values.

Parameters:

keep ({'first', 'last', False}, default 'first') –

  • first : Mark duplicates as True except for the first occurrence.

  • last : Mark duplicates as True except for the last occurrence.

  • False : Mark all duplicates as True.

Returns:

Boolean array that is True at positions where elements are duplicated and False otherwise.

Return type:

ndarray[bool]

See also

DataFrame.duplicated

Return boolean Series denoting duplicate rows.

Series.duplicated

Indicate duplicate Series values.

api.extensions.ExtensionArray.unique

Compute the ExtensionArray of unique values.

Examples

>>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated()
array([False,  True, False, False,  True])
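The three keep options can be sketched in plain Python. The function below is a hypothetical illustration of the documented semantics, not the pandas or Arkouda implementation.

```python
from collections import Counter


def duplicated(values, keep="first"):
    """Return a list of booleans marking duplicate entries."""
    if keep is False:
        # Every value that occurs more than once is marked.
        counts = Counter(values)
        return [counts[v] > 1 for v in values]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last', or False")
    n = len(values)
    # Scan forward to spare the first occurrence, backward to spare the last.
    order = range(n) if keep == "first" else range(n - 1, -1, -1)
    seen = set()
    flags = [False] * n
    for i in order:
        flags[i] = values[i] in seen
        seen.add(values[i])
    return flags


data = [1, 1, 2, 3, 3]
keep_first = duplicated(data)
keep_last = duplicated(data, keep="last")
keep_none = duplicated(data, keep=False)
```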
abstractmethod expand_dims(x, /, *, axis)[source]
factorize(use_na_sentinel=True) Tuple[numpy.typing.NDArray[numpy.intp], ArkoudaExtensionArray][source]

Encode the values of this array as integer codes and unique values.

This is similar to pandas.factorize(), but the grouping/factorization work is performed in Arkouda. The returned codes are a NumPy array for pandas compatibility, while uniques are returned as an ExtensionArray of the same type as self.

Each distinct non-missing value is assigned a unique integer code. For floating dtypes, NaN is treated as missing; for all other dtypes, no values are considered missing.

Parameters:

use_na_sentinel (bool, default True) – If True, missing values are encoded as -1 in the returned codes. If False, missing values are assigned the code len(uniques). (Missingness is only detected for floating dtypes via NaN.)

Returns:

A pair (codes, uniques) where:

  • codes is a 1D NumPy array of dtype np.intp with the same length as this array, containing the factor codes for each element.

  • uniques is an ExtensionArray containing the unique (non-missing) values, with the same extension type as self.

If use_na_sentinel=True, missing values in codes are -1. Otherwise they receive the code len(uniques).

Return type:

(numpy.ndarray, ExtensionArray)

Notes

  • Only floating-point dtypes treat NaN as missing; for other dtypes, all values are treated as non-missing.

  • uniques are constructed from Arkouda’s unique keys and returned as type(self)(uniques_ak) so that pandas internals (e.g. groupby) can treat them as an ExtensionArray.

  • String/None/null missing-value behavior is not yet unified with pandas.

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> arr = ArkoudaArray(ak.array([1, 2, 1, 3]))
>>> codes, uniques = arr.factorize()
>>> codes
array([0, 1, 0, 2])
>>> uniques
ArkoudaArray([1 2 3])
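The effect of use_na_sentinel on floating-point data can be sketched in plain Python. This is an illustration only: the hypothetical `factorize` function below assigns codes in order of first appearance, which is an assumption made for the sketch and may differ from the ordering of Arkouda's unique keys.

```python
import math


def factorize(values, use_na_sentinel=True):
    """Return (codes, uniques); NaN floats are treated as missing."""
    uniques = []
    positions = {}
    codes = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            codes.append(-1)  # provisional marker for missing values
            continue
        if v not in positions:
            positions[v] = len(uniques)
            uniques.append(v)
        codes.append(positions[v])
    if not use_na_sentinel:
        # Missing values receive the code len(uniques) instead of -1.
        codes = [len(uniques) if c == -1 else c for c in codes]
    return codes, uniques


floats = [1.0, float("nan"), 1.0, 2.0]
with_sentinel = factorize(floats)
without_sentinel = factorize(floats, use_na_sentinel=False)
```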
abstractmethod interpolate(method='linear', *, limit=None, **kwargs)[source]

Fill NaN values using an interpolation method.

Parameters:
  • method (str, default 'linear') – Interpolation technique to use. One of:

    • ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.

    • ‘time’: Works on daily and higher resolution data to interpolate given length of interval.

    • ‘index’, ‘values’: Use the actual numerical values of the index.

    • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d, whereas ‘spline’ is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. arr.interpolate(method='polynomial', order=5).

    • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.

    • ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives.

  • axis (int) – Axis to interpolate along. For 1-dimensional data, use 0.

  • index (Index) – Index to use for interpolation.

  • limit (int or None) – Maximum number of consecutive NaNs to fill. Must be greater than 0.

  • limit_direction ({'forward', 'backward', 'both'}) – Consecutive NaNs will be filled in this direction.

  • limit_area ({'inside', 'outside'} or None) – If limit is specified, consecutive NaNs will be filled with this restriction.

    • None: No fill restriction.

    • ‘inside’: Only fill NaNs surrounded by valid values (interpolate).

    • ‘outside’: Only fill NaNs outside valid values (extrapolate).

  • copy (bool) – If True, a copy of the object is returned with interpolated values.

  • **kwargs (optional) – Keyword arguments to pass on to the interpolating function.

Returns:

An ExtensionArray with interpolated values.

Return type:

ExtensionArray

See also

Series.interpolate

Interpolate values in a Series.

DataFrame.interpolate

Interpolate values in a DataFrame.

Notes

  • All parameters must be specified as keyword arguments.

  • The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index.

Examples

Interpolating values in a NumPy array:

>>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3]))
>>> arr.interpolate(
...     method="linear",
...     limit=3,
...     limit_direction="forward",
...     index=pd.Index(range(len(arr))),
...     fill_value=1,
...     copy=False,
...     axis=0,
...     limit_area="inside",
... )
<NumpyExtensionArray>
[0.0, 1.0, 2.0, 3.0]
Length: 4, dtype: float64

Interpolating values in a FloatingArray:

>>> arr = pd.array([1.0, pd.NA, 3.0, 4.0, pd.NA, 6.0], dtype="Float64")
>>> arr.interpolate(
...     method="linear",
...     axis=0,
...     index=pd.Index(range(len(arr))),
...     limit=None,
...     limit_direction="both",
...     limit_area=None,
...     copy=True,
... )
<FloatingArray>
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Length: 6, dtype: Float64
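For intuition, the ‘linear’ method with limit_area='inside' can be sketched in plain Python. The hypothetical function below treats values as equally spaced and fills only NaNs that lie between two valid values; it ignores limit and limit_direction and is not the pandas or Arkouda implementation.

```python
import math


def interpolate_linear_inside(values):
    """Fill NaNs that lie between valid values, assuming equal spacing."""
    out = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Walk over consecutive pairs of valid positions and fill the gap.
    for left, right in zip(valid, valid[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


nan = float("nan")
filled = interpolate_linear_inside([1.0, nan, 3.0, 4.0, nan, 6.0])
```

Leading and trailing NaNs are left untouched, which corresponds to the ‘inside’ restriction.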
kurt(*args, **kwargs)[source]
median(*args, **kwargs)[source]
abstractmethod permute_dims(x, /, axes)[source]
abstractmethod reshape(x, /, shape)[source]
sem(*args, **kwargs)[source]
skew(*args, **kwargs)[source]
abstractmethod split(x, indices_or_sections, /, *, axis=0)[source]
abstractmethod squeeze(x, /, *, axis=None)[source]
abstractmethod stack(arrays, /, *, axis=0)[source]
swapaxes(*args, **kwargs)[source]
take(indexer, fill_value=None, allow_fill=False)[source]

Take elements by (0-based) position, returning a new array.

This implementation:
  • normalizes the indexer to Arkouda int64,

  • explicitly emulates NumPy-style negative wrapping when allow_fill=False,

  • when allow_fill=True, accepts only -1 as a sentinel for missing values and fills those positions with fill_value; any other negative index raises ValueError,

  • validates bounds (raising IndexError) when allow_fill=True,

  • gathers once, then fills masked positions in a single pass.
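The indexing rules listed above can be sketched in plain Python. The hypothetical `take` function below works on a list and processes one index at a time, whereas the Arkouda implementation gathers once on the server. Raising IndexError for out-of-range indices when allow_fill=False is an assumption of this sketch.

```python
def take(values, indexer, fill_value=None, allow_fill=False):
    """Gather values by position, following the rules described above."""
    n = len(values)
    out = []
    for i in indexer:
        if allow_fill:
            if i == -1:
                out.append(fill_value)  # -1 marks a missing position
                continue
            if i < -1:
                raise ValueError("only -1 is allowed as a negative index")
            if i >= n:
                raise IndexError("index out of bounds")
            out.append(values[i])
        else:
            # NumPy-style wrapping: -1 refers to the last element.
            j = i + n if i < 0 else i
            if not 0 <= j < n:
                raise IndexError("index out of bounds")
            out.append(values[j])
    return out


wrapped = take([10, 20, 30], [0, -1])
filled = take([10, 20, 30], [0, -1], fill_value=0, allow_fill=True)
```

The same indexer therefore yields different results depending on allow_fill: -1 selects the last element in one case and a filled position in the other.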

to_ndarray() numpy.ndarray[source]

Convert to a NumPy ndarray, without any dtype conversion or copy options.

Returns:

A new NumPy array materialized from the underlying Arkouda data.

Return type:

numpy.ndarray

Notes

This is a lightweight convenience wrapper around the backend’s .to_ndarray() method. Unlike to_numpy(), this method does not accept dtype or copy arguments and always performs a materialization step.

to_numpy(dtype=None, copy=False, na_value=None)[source]

Convert the array to a NumPy ndarray.

Parameters:
  • dtype (str, numpy.dtype, optional) – Desired dtype for the result. If None, the underlying dtype is preserved.

  • copy (bool, default False) – Whether to ensure a copy is made:

    • If False, a view of the underlying buffer may be returned when possible.

    • If True, always return a new NumPy array.

Returns:

NumPy array representation of the data.

Return type:

numpy.ndarray

view(dtype=None)[source]

Return a shallow view of the ExtensionArray.

This method is used by pandas internals (e.g. BlockManager.copy(deep=False)) to create a new ExtensionArray wrapper that shares the same underlying Arkouda data without materializing or copying server-side arrays.

Parameters:

dtype (optional) – If provided and different from the current dtype, a dtype conversion is requested. In this case, the operation is delegated to astype(dtype, copy=False) and a new array with the requested dtype is returned.

Returns:

A new ExtensionArray instance of the same concrete class that references the same underlying Arkouda data.

Return type:

ArkoudaExtensionArray

Notes

  • This method performs a shallow copy only: the underlying Arkouda server-side array is shared between the original and the returned object.

  • No data is materialized, copied, or cast unless dtype is explicitly requested.

  • Optional internal attributes (e.g. masks, categorical metadata, caches) are copied by reference when present, to preserve logical consistency.

  • This method exists to satisfy pandas’ expectations around .view() and copy(deep=False) semantics for ExtensionArray implementations.

Examples

Create a shallow view that shares the same underlying data:

>>> import arkouda as ak
>>> from arkouda.pandas.extension._arkouda_array import ArkoudaArray
>>> ak_arr = ak.arange(5)
>>> ea = ArkoudaArray(ak_arr)
>>> v = ea.view()
>>> v is ea
False
>>> v._data is ea._data
True

Requesting a dtype conversion delegates to astype without copying the underlying data unless required:

>>> v2 = ea.view(dtype="float64")
>>> v2.dtype == ea.astype("float64").dtype
True

This method is commonly invoked indirectly by pandas during operations that require shallow copies:

>>> import pandas as pd
>>> s = pd.Series(ea)
>>> df = pd.DataFrame({"col": s})  # does not raise

See also

copy

Create a shallow or deep copy of the array.

astype

Cast the array to a new dtype.

class arkouda.pandas.extension.ArkoudaFloat64Dtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed 64-bit floating-point dtype.

This dtype integrates Arkouda’s server-backed pdarray<float64> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate large distributed float64 arrays without materializing them on the client.

construct_array_type()[source]

Returns the ArkoudaArray class used for storage.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'f'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'float64'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaIndexAccessor(pandas_obj: pandas.Index | pandas.MultiIndex)[source]

Arkouda-backed index accessor for pandas Index and MultiIndex.

This accessor provides methods for converting between:

  • NumPy-backed pandas indexes

  • pandas indexes backed by ArkoudaExtensionArray (zero-copy EA mode)

  • legacy Arkouda ak.Index and ak.MultiIndex objects

The .ak namespace mirrors the DataFrame accessor, providing a consistent interface for distributed index operations. All conversions avoid unnecessary NumPy materialization unless explicitly requested via collect().

Parameters:

pandas_obj (Union[pd.Index, pd.MultiIndex]) – The pandas Index or MultiIndex instance that this accessor wraps.

Notes

  • to_ak → pandas object, Arkouda-backed (ExtensionArrays).

  • to_ak_legacy → legacy Arkouda index objects.

  • collect → NumPy-backed pandas object.

  • is_arkouda → reports whether the index is Arkouda-backed.

Examples

Basic single-level Index conversion:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="vals")

Convert to Arkouda-backed:

>>> ak_idx = idx.ak.to_ak()
>>> ak_idx.ak.is_arkouda
True

Materialize back:

>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True

Convert to legacy Arkouda:

>>> ak_legacy = idx.ak.to_ak_legacy()
>>> type(ak_legacy)
<class 'arkouda.pandas.index.Index'>

MultiIndex conversion:

>>> arrays = [[1, 1, 2], ["red", "blue", "red"]]
>>> midx = pd.MultiIndex.from_arrays(arrays, names=["num", "color"])
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
collect() pandas.Index | pandas.MultiIndex[source]

Materialize this Index or MultiIndex back to a plain NumPy-backed pandas index.

Returns:

An Index whose underlying data are plain NumPy arrays.

Return type:

Union[pd.Index, pd.MultiIndex]

Raises:

TypeError – If the index is Arkouda-backed but does not expose the expected _data attribute, or if the index type is unsupported.

Examples

Single-level Index round-trip:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([1, 2, 3], name="x")
>>> ak_idx = idx.ak.to_ak()
>>> np_idx = ak_idx.ak.collect()
>>> np_idx
Index([1, 2, 3], dtype='int64', name='x')
>>> np_idx.equals(idx)
True

Behavior when already NumPy-backed (no-op except shallow copy):

>>> plain = pd.Index([10, 20, 30])
>>> plain2 = plain.ak.collect()
>>> plain2.equals(plain)
True

Verifying that Arkouda-backed values materialize to NumPy:

>>> ak_idx = pd.Index([5, 6, 7]).ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> out = ak_idx.ak.collect()
>>> type(out.array)
<class 'pandas...NumpyExtensionArray'>
concat(other: pandas.Index | pandas.MultiIndex) pandas.Index | pandas.MultiIndex[source]

Concatenate this index with another Arkouda-backed index.

Both self._obj and other must be convertible to legacy Arkouda ak_Index / ak_MultiIndex. The concatenation is performed in Arkouda and the result is wrapped back into an Arkouda-backed pandas Index or MultiIndex.

Parameters:

other (Union[pd.Index, pd.MultiIndex]) – The other index to concatenate with self._obj. It must be a pandas.Index or pandas.MultiIndex.

Returns:

A pandas Index or MultiIndex backed by Arkouda, containing the concatenated values from self._obj and other.

Return type:

Union[pd.Index, pd.MultiIndex]

Raises:

TypeError – If other is not a pandas.Index or pandas.MultiIndex.

static from_ak_legacy(akidx: arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex) pandas.Index | pandas.MultiIndex[source]

Convert a legacy Arkouda ak.Index or ak.MultiIndex into a pandas Index/MultiIndex backed by Arkouda ExtensionArrays.

This is the index analogue of df.ak.from_ak_legacy_ea(): it performs a zero-copy-style wrapping of Arkouda server-side arrays into ArkoudaExtensionArray objects, producing a pandas Index or MultiIndex whose levels remain distributed on the Arkouda server.

No materialization to NumPy occurs.

Parameters:

akidx (Union[ak_Index, ak_MultiIndex]) – The legacy Arkouda Index or MultiIndex to wrap.

Returns:

A pandas index object whose underlying data are ArkoudaExtensionArray instances referencing the Arkouda server-side arrays.

Return type:

Union[pd.Index, pd.MultiIndex]

Notes

  • ak.Index → pd.Index with Arkouda-backed values.

  • ak.MultiIndex → pd.MultiIndex where each level is backed by an ArkoudaExtensionArray.

  • This function does not validate whether the input is already wrapped; callers should ensure the argument is a legacy Arkouda index object.

Examples

>>> import arkouda as ak
>>> import pandas as pd

Wrap a legacy ak.Index into a pandas Index without copying:

>>> ak_idx = ak.Index(ak.arange(5))
>>> pd_idx = pd.Index.ak.from_ak_legacy(ak_idx)
>>> pd_idx
Index([0, 1, 2, 3, 4], dtype='int64')

The resulting index stores its values on the Arkouda server:

>>> type(pd_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

MultiIndex example:

>>> ak_lvl1 = ak.array(['a', 'a', 'b', 'b'])
>>> ak_lvl2 = ak.array([1, 2, 1, 2])
>>> ak_mi = ak.MultiIndex([ak_lvl1, ak_lvl2], names=['letter', 'number'])
>>> pd_mi = pd.Index.ak.from_ak_legacy(ak_mi)
>>> pd_mi
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           names=['letter', 'number'])

Each level is backed by an Arkouda ExtensionArray and remains distributed:

>>> [type(level._data) for level in pd_mi.levels]
[<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>,
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>]

No NumPy materialization occurs; the underlying data stay on the Arkouda server.

property is_arkouda: bool

Return whether the underlying Index is Arkouda-backed.

An Index or MultiIndex is considered Arkouda-backed if its underlying storage uses ArkoudaExtensionArray. This applies to both single-level and multi-level indices.

Returns:

True if the Index/MultiIndex is backed by Arkouda server-side arrays, False otherwise.

Return type:

bool

Examples

NumPy-backed Index:

>>> import pandas as pd
>>> idx = pd.Index([1, 2, 3])
>>> idx.ak.is_arkouda
False

Arkouda-backed single-level Index:

>>> import arkouda as ak
>>> ak_idx = pd.Index([10, 20, 30]).ak.to_ak()
>>> ak_idx.ak.is_arkouda
True

Arkouda-backed MultiIndex:

>>> arrays = [[1, 1, 2], ["a", "b", "a"]]
>>> midx = pd.MultiIndex.from_arrays(arrays)
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
lookup(key: object) arkouda.numpy.pdarrayclass.pdarray[source]

Perform a server-side lookup on the underlying Arkouda index.

This is a thin convenience wrapper around the legacy arkouda.pandas.index.Index.lookup() / arkouda.pandas.index.MultiIndex.lookup() methods. It converts the pandas index to a legacy Arkouda index, performs the lookup on the server, and returns the resulting boolean mask.

Parameters:

key (object) – Lookup key or keys, interpreted in the same way as the legacy Arkouda Index / MultiIndex lookup method. For a single-level index this may be a scalar or an Arkouda pdarray; for MultiIndex it may be a tuple or sequence of values/arrays.

Returns:

A boolean Arkouda array indicating which positions in the index match the given key.

Return type:

pdarray

to_ak() pandas.Index | pandas.MultiIndex[source]

Convert this pandas Index or MultiIndex to an Arkouda-backed index.

Unlike to_ak_legacy(), which returns a legacy Arkouda Index object, this method returns a pandas Index or MultiIndex whose data reside on the Arkouda server and are wrapped in ArkoudaExtensionArray instances.

The conversion is zero-copy with respect to NumPy: no materialization to local NumPy arrays occurs.

Returns:

An Index whose underlying data live on the Arkouda server.

Return type:

Union[pd.Index, pd.MultiIndex]

Examples

Convert a simple Index to Arkouda-backed form:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="values")
>>> ak_idx = idx.ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

Round-trip back to NumPy-backed pandas objects:

>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True
to_ak_legacy() arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex[source]

Convert this pandas Index or MultiIndex into a legacy Arkouda ak.Index or ak.MultiIndex object.

This is the index analogue of df.ak.to_ak_legacy(), returning the actual Arkouda index objects on the server, rather than a pandas wrapper backed by ArkoudaExtensionArray.

The conversion is zero-copy with respect to NumPy: values are transferred directly into Arkouda arrays without materializing to local NumPy.

Returns:

A legacy Arkouda Index/MultiIndex whose data live on the Arkouda server.

Return type:

Union[ak_Index, ak_MultiIndex]

Examples

Convert a simple pandas Index into a legacy Arkouda Index:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="numbers")
>>> ak_idx = idx.ak.to_ak_legacy()
>>> type(ak_idx)
<class 'arkouda.pandas.index.Index'>
>>> ak_idx.name
'numbers'
to_csv(prefix_path: str, dataset: str = 'index') str[source]

Save this index to CSV via the legacy to_csv implementation and return the server response message.

to_dict(labels=None)[source]

Convert this index to a dictionary representation if supported.

For MultiIndex, this delegates to MultiIndex.to_dict and returns a mapping of label -> Index. For single-level Indexes, this will raise a TypeError, since the legacy API only defines to_dict on MultiIndex.

to_hdf(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate', file_type: Literal['single', 'distribute'] = 'distribute') str[source]

Save this index to HDF5 via the legacy to_hdf implementation and return the server response message.

to_parquet(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate') str[source]

Save this index to Parquet via the legacy to_parquet implementation and return the server response message.

update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True)[source]

Overwrite or append this index into an existing HDF5 dataset via the legacy update_hdf implementation.

class arkouda.pandas.extension.ArkoudaInt64Dtype[source]

Bases: _ArkoudaBaseDtype

Extension dtype for Arkouda-backed 64-bit integers.

This dtype allows seamless use of Arkouda’s distributed int64 arrays inside pandas objects (Series, Index, DataFrame). It is backed by arkouda.pdarray with dtype='int64' and integrates with pandas via the ArkoudaArray extension array.

construct_array_type()[source]

Return the associated extension array class (ArkoudaArray).

classmethod construct_array_type()[source]

Return the associated pandas ExtensionArray type.

This is part of the pandas ExtensionDtype interface and is used internally by pandas when constructing arrays of this dtype. It ensures that operations like Series(..., dtype=ArkoudaInt64Dtype()) produce the correct Arkouda-backed extension array.

Returns:

The ArkoudaArray class that implements the storage and behavior for this dtype.

Return type:

type

Notes

  • This hook tells pandas which ExtensionArray to instantiate whenever this dtype is requested.

  • All Arkouda dtypes defined in this module will return ArkoudaArray (or a subclass thereof).

Examples

>>> from arkouda.pandas.extension import ArkoudaInt64Dtype
>>> ArkoudaInt64Dtype.construct_array_type()
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
kind = 'i'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'int64'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaSeriesAccessor(pandas_obj: pandas.Series)[source]

Arkouda-backed Series accessor.

Provides a symmetric API to the Index accessor for Series-level conversion and materialization.

Parameters:

pandas_obj (pd.Series) – The Series this accessor wraps.

Examples

>>> import pandas as pd
>>> import arkouda as ak
>>> s = pd.Series([1, 2, 3], name="nums")

Convert to Arkouda-backed:

>>> ak_s = s.ak.to_ak()
>>> ak_s.ak.is_arkouda
True

Materialize back:

>>> restored = ak_s.ak.collect()
>>> restored.equals(s)
True

Convert to legacy Arkouda:

>>> ak_arr = s.ak.to_ak_legacy()
>>> type(ak_arr)
<class 'arkouda.pandas.series.Series'>
apply(func: Callable[[Any], Any] | str, result_dtype: numpy.dtype | str | None = None) pandas.Series[source]

Apply a Python function element-wise to this Arkouda-backed Series.

This delegates to arkouda.apply.apply(), executing the function on the Arkouda server without materializing to NumPy.

Parameters:
  • func (Union[Callable[[Any], Any], str]) – A Python callable or a specially formatted lambda string (e.g. "lambda x,: x+1").

  • result_dtype (Optional[Union[np.dtype, str]]) – The dtype of the resulting array. Required if the function changes dtype. Must be compatible with arkouda.apply.apply(). Default is None.

Returns:

A new Arkouda-backed Series containing the transformed values.

Return type:

pd.Series

Raises:

TypeError – If the Series is not Arkouda-backed or if its values are not a numeric pdarray.

argsort(*, ascending: bool = True, **kwargs: object) pandas.Series[source]

Return the integer indices that would sort the Series values.

This mirrors pandas.Series.argsort but returns an Arkouda-backed pandas Series (distributed), not a NumPy-backed result.

Parameters:
  • ascending (bool) – Sort values in ascending order if True, descending order if False. Default is True.

  • **kwargs (object) –

    Additional keyword arguments.

    na_position : {“first”, “last”}, default “last”

    Where to place NaN values in the sorted result. Currently only applied for floating-point pdarray data; for Strings and Categorical it has no effect.

Returns:

An Arkouda-backed Series of integer permutation indices. The returned Series has the same index as the original.

Return type:

pd.Series

Raises:
  • TypeError – If the Series is not Arkouda-backed, or the underlying dtype does not support sorting.

  • ValueError – If na_position is not “first” or “last”.

collect() pandas.Series[source]

Materialize this Series back to a NumPy-backed pandas Series.

Returns:

A NumPy-backed Series.

Return type:

pd.Series

Examples

>>> s = pd.Series([1,2,3]).ak.to_ak()
>>> out = s.ak.collect()
>>> type(out.array)
<class 'pandas...NumpyExtensionArray'>
static from_ak_legacy(akarr: Any, name: str | None = None) pandas.Series[source]

Construct an Arkouda-backed pandas Series directly from a legacy Arkouda array.

This performs zero-copy wrapping using ArkoudaExtensionArray and does not materialize data.

Parameters:
  • akarr (Any) – A legacy Arkouda array (pdarray, Strings, or Categorical).

  • name (str | None) – Optional. Name of the resulting Series.

Returns:

A pandas Series backed by ArkoudaExtensionArray.

Return type:

pd.Series

Examples

>>> import arkouda as ak
>>> import pandas as pd

Basic example with a legacy pdarray:

>>> ak_arr = ak.arange(5)
>>> s = pd.Series.ak.from_ak_legacy(ak_arr, name="values")
>>> s
0    0
1    1
2    2
3    3
4    4
Name: values, dtype: int64

The underlying data remain on the Arkouda server:

>>> type(s._values)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

Using a legacy Strings object:

>>> ak_str = ak.array(["a", "b", "c"])
>>> s_str = pd.Series.ak.from_ak_legacy(ak_str, name="letters")
>>> s_str
0    a
1    b
2    c
Name: letters, dtype: string

Using a legacy Categorical:

>>> ak_cat = ak.Categorical(ak.array(["red", "blue", "red"]))
>>> s_cat = pd.Series.ak.from_ak_legacy(ak_cat, name="color")
>>> s_cat
0     red
1    blue
2     red
Name: color, dtype: category

No NumPy copies are made—the Series is a zero-copy wrapper over Arkouda server-side arrays.

groupby() arkouda.pandas.groupbyclass.GroupBy[source]

Return an Arkouda GroupBy object for this Series, without materializing.

Return type:

GroupBy

Raises:

TypeError – If the Series is not Arkouda-backed.

Examples

>>> import arkouda as ak
>>> import pandas as pd
>>> s = pd.Series([80, 443, 80]).ak.to_ak()
>>> g = s.ak.groupby()
>>> keys, counts = g.size()
property is_arkouda: bool

Return True if this Series is fully Arkouda-backed.

A Series is considered Arkouda-backed when both:

  1. Its values are stored in an ArkoudaExtensionArray.

  2. Its index (including each level of a MultiIndex) is backed by ArkoudaExtensionArray.

Returns:

True if both data and index are Arkouda-backed, otherwise False.

Return type:

bool

Examples

>>> s = pd.Series([1, 2, 3])
>>> s.ak.is_arkouda
False
>>> ak_s = s.ak.to_ak()
>>> ak_s.ak.is_arkouda
True
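
The two conditions listed above amount to one check over the values and every index level. A minimal sketch of that rule follows, using hypothetical stand-in classes (BackedArray, LocalArray) and a hypothetical helper is_fully_backed rather than the real Arkouda types.

```python
class BackedArray:
    """Hypothetical stand-in for an ArkoudaExtensionArray."""


class LocalArray:
    """Hypothetical stand-in for a NumPy-backed array."""


def is_fully_backed(values, index_levels):
    """True only if the values and every index level are BackedArray instances."""
    return isinstance(values, BackedArray) and all(
        isinstance(level, BackedArray) for level in index_levels
    )


# A flat index contributes one level; a MultiIndex contributes one entry per level.
all_backed = is_fully_backed(BackedArray(), [BackedArray(), BackedArray()])
mixed = is_fully_backed(BackedArray(), [BackedArray(), LocalArray()])
```

Here all_backed is True and mixed is False: a single non-backed index level is enough to make the whole Series count as not fully backed.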
locate(key: object) pandas.Series[source]

Look up values by index label on the Arkouda server.

This is a thin wrapper around the legacy arkouda.pandas.series.Series.locate() method. It converts the pandas Series to a legacy Arkouda ak.Series, performs the locate operation on the server, and wraps the result back into an Arkouda-backed pandas Series (ExtensionArray-backed) without NumPy materialization.

Parameters:

key (object) – Lookup key or keys. Interpreted in the same way as the legacy Arkouda Series.locate method. This may be:

  • a scalar

  • a list/tuple of scalars

  • an Arkouda pdarray

  • an Arkouda Index / MultiIndex

  • an Arkouda Series (special case: preserves key index)

Returns:

A pandas Series backed by Arkouda ExtensionArrays containing the located values. The returned Series remains distributed (no NumPy materialization) and is sorted by index.

Return type:

pd.Series

Notes

  • This method is Arkouda-specific; pandas does not define Series.locate.

  • If key is a pandas Index/MultiIndex, consider converting it via key.ak.to_ak_legacy() before calling locate for the most direct path.

Examples

>>> import arkouda as ak
>>> import pandas as pd
>>> s = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3])).ak.to_ak()
>>> out = s.ak.locate([3, 1])
>>> out.tolist()
[np.int64(10), np.int64(30)]
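
The example returns 10 before 30 even though the keys were given as [3, 1], because the result is sorted by index. The plain-Python model below sketches that behavior for a list of scalar keys. The helper locate_model is hypothetical, and silently ignoring keys that are absent from the index is an assumption, not documented behavior.

```python
def locate_model(index, values, keys):
    """Pure-Python model: select rows by label, then order them by index label."""
    wanted = set(keys)
    # Assumption: labels in `keys` that are absent from `index` are simply skipped.
    hits = [(label, value) for label, value in zip(index, values) if label in wanted]
    hits.sort(key=lambda pair: pair[0])
    return hits


result = locate_model([1, 2, 3], [10, 20, 30], keys=[3, 1])
located_values = [value for _, value in result]  # [10, 30]
```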
to_ak() pandas.Series[source]

Convert this pandas Series into an Arkouda-backed Series.

This method produces a pandas Series whose underlying storage uses ArkoudaExtensionArray, meaning the data reside on the Arkouda server rather than in local NumPy buffers. An already Arkouda-backed Series is converted without materializing its data in NumPy; values are transferred to the server only if the original Series is NumPy-backed.

The returned Series preserves the original index (including index names) and the original Series name.

Returns:

A Series backed by an ArkoudaExtensionArray, referencing Arkouda server-side arrays. The resulting Series retains the original index and name.

Return type:

pd.Series

Notes

  • If the Series is already Arkouda-backed, this method returns a new Series that is semantically equivalent and still Arkouda-backed.

  • If the Series is NumPy-backed, values are transferred to Arkouda server-side arrays via ak.array.

  • No NumPy-side materialization occurs when converting an already Arkouda-backed Series.

Examples

Basic numeric conversion:

>>> import pandas as pd
>>> import arkouda as ak
>>> s = pd.Series([1, 2, 3], name="nums")
>>> s_ak = s.ak.to_ak()
>>> type(s_ak.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> s_ak.tolist()
[np.int64(1), np.int64(2), np.int64(3)]

Preserving the index and name:

>>> idx = pd.Index([10, 20, 30], name="id")
>>> s = pd.Series([100, 200, 300], index=idx, name="values")
>>> s_ak = s.ak.to_ak()
>>> s_ak.name
'values'
>>> s_ak.index.name
'id'

String data:

>>> s = pd.Series(["red", "blue", "green"], name="colors")
>>> s_ak = s.ak.to_ak()
>>> s_ak.tolist()
[np.str_('red'), np.str_('blue'), np.str_('green')]

Idempotence (calling to_ak repeatedly stays Arkouda-backed):

>>> s_ak2 = s_ak.ak.to_ak()
>>> s_ak2.ak.is_arkouda
True
>>> s_ak2.tolist() == s_ak.tolist()
True
to_ak_legacy() arkouda.pandas.series.Series[source]

Convert this Series into a legacy Arkouda Series.

Returns:

The legacy Arkouda Series.

Return type:

ak_Series

Examples

>>> import pandas as pd
>>> s = pd.Series([10,20,30])
>>> ak_series = s.ak.to_ak_legacy()
>>> type(ak_series)
<class 'arkouda.pandas.series.Series'>
class arkouda.pandas.extension.ArkoudaStringArray(data: arkouda.numpy.strings.Strings | numpy.ndarray | Sequence[Any] | ArkoudaStringArray)[source]

Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray

Arkouda-backed string pandas ExtensionArray.

Ensures the underlying data is an Arkouda Strings object. Accepts existing Strings or converts from NumPy arrays and Python sequences of strings.

Parameters:

data (Strings | ndarray | Sequence[Any] | ArkoudaStringArray) – Input to wrap or convert.

  • If Strings, used directly.

  • If NumPy/sequence, converted via ak.array.

  • If another ArkoudaStringArray, its backing Strings is reused.

Raises:

TypeError – If data cannot be converted to Arkouda Strings.

default_fill_value

Sentinel used when filling missing values (default: “”).

Type:

str

all(*args, **kwargs)[source]
any(*args, **kwargs)[source]
argpartition(*args, **kwargs)[source]
astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]
astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]

Cast to a specified dtype.

Casting rules:

  • If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the string values.

  • If dtype is a string dtype (e.g. pandas StringDtype, NumPy unicode, or Arkouda string dtype), returns an ArkoudaStringArray. If copy=True, attempts to copy the underlying Arkouda Strings data.

  • For all other dtypes, casts the underlying Arkouda Strings using Strings.astype and returns an Arkouda-backed ArkoudaExtensionArray constructed from the result.

Parameters:
  • dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, or Arkouda dtype.

  • copy (bool) – Whether to force a copy when the result is an ArkoudaStringArray. Default is True.

Returns:

The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.

Return type:

Union[ExtensionArray, NDArray[Any]]

Examples

Casting to a string dtype returns an Arkouda-backed string array:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>> s = ArkoudaStringArray(ak.array(["a", "b", "c"]))
>>> out = s.astype("string")
>>> out is s
False

Forcing a copy when casting to a string dtype returns a new array:

>>> out2 = s.astype("string", copy=True)
>>> out2 is s
False
>>> out2.to_ndarray()
array(['a', 'b', 'c'], dtype='<U1')

Casting to object materializes the data to a NumPy array:

>>> s.astype(object)
array(['a', 'b', 'c'], dtype=object)

Casting to a non-string dtype uses Arkouda to cast the underlying strings and returns an Arkouda-backed ExtensionArray:

>>> s_num = ArkoudaStringArray(ak.array(["1", "2", "3"]))
>>> a = s_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])

NumPy and pandas dtype objects are also accepted:

>>> import numpy as np
>>> a = s_num.astype(np.dtype("float64"))
>>> a.to_ndarray()
array([1., 2., 3.])
byteswap(*args, **kwargs)[source]
choose(*args, **kwargs)[source]
clip(*args, **kwargs)[source]
compress(*args, **kwargs)[source]
conj(*args, **kwargs)[source]
conjugate(*args, **kwargs)[source]
cumprod(*args, **kwargs)[source]
cumsum(*args, **kwargs)[source]
default_fill_value: str = ''
diagonal(*args, **kwargs)[source]
dot(*args, **kwargs)[source]
property dtype

An instance of ExtensionDtype.

See also

api.extensions.ExtensionDtype

Base class for extension dtypes.

api.extensions.ExtensionArray

Base class for extension array types.

api.extensions.ExtensionArray.dtype

The dtype of an ExtensionArray.

Series.dtype

The dtype of a Series.

DataFrame.dtype

The dtype of a DataFrame.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
dump(*args, **kwargs)[source]
dumps(*args, **kwargs)[source]
fill(*args, **kwargs)[source]
flatten(*args, **kwargs)[source]
getfield(*args, **kwargs)[source]
isna()[source]

A 1-D array indicating if each value is missing.

Returns:

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Return type:

numpy.ndarray or pandas.api.extensions.ExtensionArray

See also

ExtensionArray.dropna

Return ExtensionArray without NA values.

ExtensionArray.fillna

Fill NA/NaN values using the specified method.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values should implement ExtensionArray._accumulate()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
item(*args, **kwargs)[source]

Return the array element at the specified position as a Python scalar.

Parameters:

index (int, optional) – Position of the element. If not provided, the array must contain exactly one element.

Returns:

The element at the specified position.

Return type:

scalar

Raises:
  • ValueError – If no index is provided and the array does not have exactly one element.

  • IndexError – If the specified position is out of bounds.

See also

numpy.ndarray.item

Return the item of an array as a scalar.

Examples

>>> arr = pd.array([1], dtype="Int64")
>>> arr.item()
np.int64(1)
>>> arr = pd.array([1, 2, 3], dtype="Int64")
>>> arr.item(0)
np.int64(1)
>>> arr.item(2)
np.int64(3)
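
The error conditions described above can be summarized with a small plain-Python model. It is a sketch of the documented contract over a list, not the pandas or Arkouda implementation; the helper item_model is hypothetical, and its handling of negative positions simply follows Python list indexing, which is an assumption.

```python
def item_model(data, index=None):
    """Pure-Python model of the documented item() contract."""
    if index is None:
        # Without a position, the array must hold exactly one element.
        if len(data) != 1:
            raise ValueError("can only convert an array of size 1 to a scalar")
        return data[0]
    # List indexing raises IndexError when the position is out of bounds.
    return data[index]


single = item_model([7])           # 7
third = item_model([1, 2, 3], 2)   # 3
```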
max(*args, **kwargs)[source]
mean(*args, **kwargs)[source]
min(*args, **kwargs)[source]
nonzero(*args, **kwargs)[source]
partition(*args, **kwargs)[source]
prod(*args, **kwargs)[source]
put(*args, **kwargs)[source]
resize(*args, **kwargs)[source]
round(*args, **kwargs)[source]
setfield(*args, **kwargs)[source]
setflags(*args, **kwargs)[source]
sort(*args, **kwargs)[source]
std(*args, **kwargs)[source]
sum(*args, **kwargs)[source]
swapaxes(*args, **kwargs)[source]
to_device(*args, **kwargs)[source]
tobytes(*args, **kwargs)[source]
tofile(*args, **kwargs)[source]
trace(*args, **kwargs)[source]
value_counts(dropna: bool = True) pandas.Series[source]

Return counts of unique strings as a pandas Series.

This method computes the frequency of each distinct string value in the underlying Arkouda Strings object and returns the result as a pandas Series, with the unique string values as the index and their counts as the data.

Parameters:

dropna (bool) – Whether to exclude missing values. Missing-value handling for Arkouda string arrays is not yet implemented, so this parameter is accepted for pandas compatibility but currently has no effect. Default is True.

Returns:

A Series containing the counts of unique string values. The index is an ArkoudaStringArray of unique values, and the values are an ArkoudaArray of counts.

Return type:

pd.Series

Notes

  • The following pandas options are not yet implemented: normalize, sort, and bins.

  • Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.

Examples

Basic usage:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>>
>>> s = ArkoudaStringArray(["red", "blue", "red", "green", "blue", "red"])
>>> s.value_counts()
red      3
blue     2
green    1
dtype: int64

Empty input:

>>> empty = ArkoudaStringArray([])
>>> empty.value_counts()
Series([], dtype: int64)
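
For comparison, the same counts can be reproduced on the client with the standard library. This sketch only models the result of the operation. The helper value_counts_model is hypothetical, it makes no claim about the order in which the real method returns its rows, and the real counting happens server-side in Arkouda.

```python
from collections import Counter


def value_counts_model(strings):
    """Pure-Python model: map each distinct string to its number of occurrences."""
    return dict(Counter(strings))


counts = value_counts_model(["red", "blue", "red", "green", "blue", "red"])
empty_counts = value_counts_model([])
```

Here counts maps "red" to 3, "blue" to 2, and "green" to 1, and empty_counts is an empty dict, matching the two examples above.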
var(*args, **kwargs)[source]
class arkouda.pandas.extension.ArkoudaStringDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed string dtype.

This dtype integrates Arkouda’s distributed Strings type with the pandas ExtensionArray interface via ArkoudaStringArray. It enables pandas objects (Series, DataFrame) to hold large, server-backed string columns without converting to NumPy or Python objects.

construct_array_type()[source]

Returns the ArkoudaStringArray used as the storage class.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaStringArray class associated with this dtype.

Return type:

type

kind = 'O'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = ''

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'string'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaUint64Dtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed unsigned 64-bit integer dtype.

This dtype integrates Arkouda’s uint64 arrays with pandas, allowing users to create pandas.Series or pandas.DataFrame objects that store their data on the Arkouda server while still conforming to the pandas ExtensionArray API.

construct_array_type()[source]

Return the ArkoudaArray class used as the storage container for this dtype.

Examples

>>> import arkouda as ak
>>> import pandas as pd
>>> from arkouda.pandas.extension import ArkoudaUint64Dtype, ArkoudaArray
>>> arr = ArkoudaArray(ak.array([1, 2, 3], dtype="uint64"))
>>> s = pd.Series(arr, dtype=ArkoudaUint64Dtype())
>>> s
0    1
1    2
2    3
dtype: uint64
classmethod construct_array_type()[source]

Return the ExtensionArray class associated with this dtype.

This is required by the pandas ExtensionDtype API. It tells pandas which ExtensionArray subclass should be used to hold data of this dtype inside a pandas.Series or pandas.DataFrame.

Returns:

The ArkoudaArray class, which implements the storage and operations for Arkouda-backed arrays.

Return type:

type
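
The role of construct_array_type can be illustrated without pandas or Arkouda. In the sketch below, ToyArray, ToyUint64Dtype, and build_storage are hypothetical stand-ins; the point is only that the dtype hands back a class, and the caller (pandas, in the real case) instantiates it.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray subclass such as ArkoudaArray."""

    def __init__(self, data):
        self.data = list(data)


class ToyUint64Dtype:
    """Hypothetical stand-in mirroring the name and kind listed for this dtype."""

    name = "uint64"
    kind = "u"

    @classmethod
    def construct_array_type(cls):
        # Return the class itself, not an instance; the caller instantiates it.
        return ToyArray


def build_storage(dtype, data):
    """Mimic a container asking the dtype which array class should hold the data."""
    array_cls = dtype.construct_array_type()
    return array_cls(data)


storage = build_storage(ToyUint64Dtype(), [1, 2, 3])
```

Because the method is a classmethod, it can be called on the dtype class or on an instance with the same result.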

kind = 'u'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'uint64'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.extension.ArkoudaUint8Dtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed unsigned 8-bit integer dtype.

This dtype integrates Arkouda’s uint8 arrays with the pandas ExtensionArray API, allowing pandas Series and DataFrame objects to store and operate on Arkouda-backed unsigned 8-bit integers. The underlying storage is an Arkouda pdarray<uint8>, exposed through the ArkoudaArray extension array.

construct_array_type()[source]

Returns the ArkoudaArray type that provides the storage and behavior for this dtype.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

This method is required by the pandas ExtensionDtype interface. It tells pandas which ExtensionArray class to use when creating arrays of this dtype (for example, when calling Series(..., dtype="arkouda.uint8")).

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'u'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'uint8'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.