arkouda.pandas

Submodules

Attributes

Classes

ArkoudaArray

Arkouda-backed numeric/bool pandas ExtensionArray.

ArkoudaBigintDtype

Arkouda-backed arbitrary-precision integer dtype.

ArkoudaBoolDtype

Arkouda-backed boolean dtype.

ArkoudaCategorical

Arkouda-backed categorical pandas ExtensionArray.

ArkoudaCategoricalDtype

Arkouda-backed categorical dtype.

ArkoudaDataFrameAccessor

Arkouda DataFrame accessor.

ArkoudaFloat64Dtype

Arkouda-backed 64-bit floating-point dtype.

ArkoudaIndexAccessor

Arkouda-backed index accessor for pandas Index and MultiIndex.

ArkoudaInt64Dtype

Extension dtype for Arkouda-backed 64-bit integers.

ArkoudaStringArray

Arkouda-backed string pandas ExtensionArray.

ArkoudaStringDtype

Arkouda-backed string dtype.

ArkoudaUint64Dtype

Arkouda-backed unsigned 64-bit integer dtype.

ArkoudaUint8Dtype

Arkouda-backed unsigned 8-bit integer dtype.

CachedAccessor

Descriptor for caching namespace-based accessors.

DatetimeAccessor

Accessor for datetime-like operations on Arkouda Series.

Properties

Base class for accessor implementations in Arkouda.

Row

Dictionary-like representation of a single row in an Arkouda DataFrame.

Series

One-dimensional Arkouda array with axis labels.

StringAccessor

Accessor for string operations on Arkouda Series.

Functions

compute_join_size(→ Tuple[int, int])

Compute the internal size of a hypothetical join between a and b. Returns

date_operators(cls)

Add common datetime operation methods to a DatetimeAccessor class.

from_series(...)

Convert a pandas Series to an Arkouda pdarray or Strings.

gen_ranges(starts, ends[, stride, return_lengths])

Generate a segmented array of variable-length, contiguous ranges between pairs of

join_on_eq_with_dt(...)

Inner-join on equality between two integer arrays where the time-window predicate is also true.

string_operators(cls)

Add common string operation methods to a StringAccessor class.

Package Contents

class arkouda.pandas.ArkoudaArray(data: arkouda.numpy.pdarrayclass.pdarray | numpy.ndarray | Sequence[Any] | ArkoudaArray, dtype: Any = None, copy: bool = False)[source]

Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray

Arkouda-backed numeric/bool pandas ExtensionArray.

Wraps or converts supported inputs into an Arkouda pdarray to serve as the backing store. Ensures the underlying array is 1-D and lives on the Arkouda server.

Parameters:
  • data (pdarray | ndarray | Sequence[Any] | ArkoudaArray) –

    Input to wrap or convert.

    • If an Arkouda pdarray, it is used directly unless dtype is given or copy=True, in which case a new array is created via ak.array.

    • If a NumPy array, it is transferred to Arkouda via ak.array.

    • If a Python sequence, it is converted to NumPy then to Arkouda.

    • If another ArkoudaArray, its underlying pdarray is reused.

  • dtype (Any, optional) – Desired dtype to cast to (NumPy dtype or Arkouda dtype string). If omitted, dtype is inferred from data.

  • copy (bool) – If True, attempt to copy the underlying data when converting/wrapping. Default is False.

Raises:
  • TypeError – If data cannot be interpreted as an Arkouda array-like object.

  • ValueError – If the resulting array is not one-dimensional.

default_fill_value

Sentinel used when filling missing values (default: -1).

Type:

int

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> ArkoudaArray(ak.arange(5))
ArkoudaArray([0 1 2 3 4])
>>> ArkoudaArray([10, 20, 30])
ArkoudaArray([10 20 30])
all(axis=0, skipna=True, **kwargs)[source]

Return whether all elements are True.

This is mainly to support pandas’ BaseExtensionArray.equals, which calls .all() on the result of a boolean expression.

any(axis=0, skipna=True, **kwargs)[source]

Return whether any element is True.

Added for symmetry with .all() and to support potential pandas boolean-reduction calls.

astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]
astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]

Cast the array to a specified dtype.

Casting rules:

  • If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the array values.

  • Otherwise, the target dtype is normalized using Arkouda’s dtype resolution rules.

  • If the normalized dtype matches the current dtype and copy=False, returns self.

  • In all other cases, casts the underlying Arkouda array to the target dtype and returns an Arkouda-backed ArkoudaExtensionArray.
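The casting rules above can be read as a small decision procedure. The sketch below models them in plain Python, with no Arkouda server involved. The function name `plan_astype` and the strings it returns are hypothetical and exist only for illustration, and dtype normalization is reduced to a lowercase string comparison as a simplifying assumption.

```python
def plan_astype(current: str, target: str, copy: bool = True) -> str:
    """Return which documented branch astype would take (illustrative only)."""
    # Rule 1: an object request materializes a client-side NumPy object array.
    if target == "object":
        return "numpy object array"
    # Rule 2 is approximated: the real method uses Arkouda's dtype resolution,
    # which this sketch replaces with a lowercase string comparison.
    normalized = target.lower()
    # Rule 3: same dtype with copy=False hands back the original array.
    if normalized == current and not copy:
        return "self"
    # Rule 4: every other case yields a new Arkouda-backed array.
    return "new Arkouda-backed array"


print(plan_astype("int64", "int64", copy=False))  # self
print(plan_astype("int64", "int64", copy=True))   # new Arkouda-backed array
print(plan_astype("int64", "float64"))            # new Arkouda-backed array
print(plan_astype("int64", "object"))             # numpy object array
```

Because copy defaults to True, a same-dtype cast returns a new array unless copy=False is passed explicitly.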

Parameters:
  • dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, Arkouda dtype, or any dtype-like object accepted by Arkouda.

  • copy (bool) – Whether to force a copy when the target dtype matches the current dtype. Default is True.

Returns:

The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.

Return type:

Union[ExtensionArray, NDArray[Any]]

Examples

Basic numeric casting returns an Arkouda-backed array:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> a = ArkoudaArray(ak.array([1, 2, 3], dtype="int64"))
>>> a.astype("float64").to_ndarray()
array([1., 2., 3.])

Casting to the same dtype with copy=False returns the original object:

>>> b = a.astype("int64", copy=False)
>>> b is a
True

Forcing a copy when the dtype is unchanged returns a new array:

>>> c = a.astype("int64", copy=True)
>>> c is a
False
>>> c.to_ndarray()
array([1, 2, 3])

Casting to object materializes the data to a NumPy array:

>>> a.astype(object)
array([1, 2, 3], dtype=object)

NumPy and pandas dtype objects are also accepted:

>>> import numpy as np
>>> a.astype(np.dtype("bool")).to_ndarray()
array([ True,  True,  True])
default_fill_value: int = -1
property dtype

An instance of ExtensionDtype.

See also

api.extensions.ExtensionDtype

Base class for extension dtypes.

api.extensions.ExtensionArray

Base class for extension array types.

api.extensions.ExtensionArray.dtype

The dtype of an ExtensionArray.

Series.dtype

The dtype of a Series.

DataFrame.dtype

The dtype of a DataFrame.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
equals(other)[source]

Return if another array is equivalent to this array.

Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).
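To make the difference from ordinary equality concrete, here is a minimal plain-Python sketch of a comparison in which missing values at the same position count as equal. The helper `equivalent` is hypothetical, works on lists of floats rather than extension arrays, and omits the dtype check because plain lists carry no dtype.

```python
import math
from typing import Sequence


def equivalent(left: Sequence[float], right: Sequence[float]) -> bool:
    """NaN-aware comparison: missing values at the same position count as equal."""
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        both_missing = math.isnan(a) and math.isnan(b)
        if not both_missing and a != b:
            return False
    return True


nan = float("nan")
print(equivalent([1.0, 2.0, nan], [1.0, 2.0, nan]))  # True
print(equivalent([1.0, 3.0, nan], [1.0, 2.0, nan]))  # False
# Ordinary equality treats two distinct NaN values as different.
print([1.0, nan] == [1.0, float("nan")])             # False
```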

Parameters:

other (ExtensionArray) – Array to compare to this Array.

Returns:

Whether the arrays are equivalent.

Return type:

boolean

See also

numpy.array_equal

Equivalent method for numpy array.

Series.equals

Equivalent method for Series.

DataFrame.equals

Equivalent method for DataFrame.

Examples

>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
>>> arr1 = pd.array([1, 3, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
False
isna() numpy.ndarray[source]

Return a boolean mask indicating missing values.

This method implements the pandas ExtensionArray.isna contract and always returns a NumPy ndarray of dtype bool with the same length as the array.

Returns:

A boolean mask where True marks elements considered missing.

Return type:

np.ndarray

Raises:

TypeError – If the underlying data buffer does not support missing-value detection or cannot produce a boolean mask.

isnull()[source]

Alias for isna().

property nbytes

The number of bytes needed to store this object in memory.

See also

ExtensionArray.shape

Return a tuple of the array dimensions.

ExtensionArray.size

The number of elements in the array.

Examples

>>> pd.array([1, 2, 3]).nbytes
27
value_counts(dropna: bool = True) pandas.Series[source]

Return counts of unique values as a pandas Series.

This method computes the frequency of each distinct value in the underlying Arkouda array and returns the result as a pandas Series, with the unique values as the index and their counts as the data.

Parameters:

dropna (bool) – Whether to exclude missing values. Currently, missing-value handling is supported only for floating-point data, where NaN values are treated as missing. Default is True.

Returns:

A Series containing the counts of unique values. The index is an ArkoudaArray of unique values, and the values are an ArkoudaArray of counts.

Return type:

pd.Series

Notes

  • Only dropna=True is supported.

  • The following pandas options are not yet implemented: normalize, sort, and bins.

  • Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.
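The dropna behavior described above can be illustrated client-side with the standard library. This is only a sketch of the semantics: `count_values` is a hypothetical helper, the counting here happens locally rather than on the Arkouda server, and raising NotImplementedError for dropna=False is a choice made for this sketch, since how the real method reacts is not stated here.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def count_values(values: Iterable[float], dropna: bool = True) -> Dict[float, int]:
    """Count distinct values, skipping NaN when dropna is True."""
    if not dropna:
        # Sketch-only choice reflecting the note that only dropna=True is supported.
        raise NotImplementedError("only dropna=True is modeled in this sketch")
    # NaN is treated as missing only for floating-point values.
    kept = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    return dict(Counter(kept))


print(count_values([1, 2, 1, 3, 2, 1]))             # {1: 3, 2: 2, 3: 1}
print(count_values([1.0, 2.0, float("nan"), 1.0]))  # {1.0: 2, 2.0: 1}
```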

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>>
>>> a = ArkoudaArray(ak.array([1, 2, 1, 3, 2, 1]))
>>> a.value_counts()
1    3
2    2
3    1
dtype: int64

Floating-point data with NaN values:

>>> b = ArkoudaArray(ak.array([1.0, 2.0, float("nan"), 1.0]))
>>> b.value_counts()
1.0    2
2.0    1
dtype: int64
arkouda.pandas.ArkoudaArrayLike
class arkouda.pandas.ArkoudaBigintDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed arbitrary-precision integer dtype.

This dtype integrates Arkouda’s server-backed pdarray<bigint> with the pandas ExtensionArray interface via ArkoudaArray. It enables pandas objects (Series, DataFrame) to hold and operate on very large integers that exceed 64-bit precision, while keeping the data distributed on the Arkouda server.

construct_array_type()[source]

Returns the ArkoudaArray class used for storage.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'O'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'bigint'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaBoolDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed boolean dtype.

This dtype integrates Arkouda’s server-backed pdarray<bool> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate distributed boolean arrays without materializing them on the client.

construct_array_type()[source]

Returns the ArkoudaArray class used for storage.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'b'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = False

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'bool_'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaCategorical(data: arkouda.pandas.categorical.Categorical | ArkoudaCategorical | numpy.ndarray | Sequence[Any])[source]

Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray

Arkouda-backed categorical pandas ExtensionArray.

Ensures the underlying data is an Arkouda Categorical. Accepts an existing Categorical or converts from Python/NumPy sequences of labels.

Parameters:

data (Categorical | ArkoudaCategorical | ndarray | Sequence[Any]) –

Input to wrap or convert.

  • If Categorical, used directly.

  • If another ArkoudaCategorical, its backing object is reused.

  • If list/tuple/ndarray, converted via ak.Categorical(ak.array(data)).

Raises:

TypeError – If data cannot be converted to Arkouda Categorical.

default_fill_value

Sentinel used when filling missing values (default: “”).

Type:

str

add_categories(*args, **kwargs)[source]
as_ordered(*args, **kwargs)[source]
as_unordered(*args, **kwargs)[source]
astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]
astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]

Cast to a specified dtype.

  • If dtype is categorical (pandas category / CategoricalDtype / ArkoudaCategoricalDtype), returns an Arkouda-backed ArkoudaCategorical (optionally copied).

  • If dtype requests object, returns a NumPy ndarray of dtype object containing the category labels (materialized to the client).

  • If dtype requests a string dtype, returns an Arkouda-backed ArkoudaStringArray containing the labels as strings.

  • Otherwise, casts the labels (as strings) to the requested dtype and returns an Arkouda-backed ExtensionArray.
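The four branches above can be sketched as a plain-Python dispatch. Everything here is illustrative: `astype_labels` is a hypothetical helper, Python lists stand in for server-side arrays, the container names are taken from the bullets, and only int64 and float64 are modeled for the final branch.

```python
from typing import Any, List, Tuple


def astype_labels(labels: List[str], target: str) -> Tuple[str, List[Any]]:
    """Return (result container name, values) for a categorical cast."""
    if target == "category":
        return "ArkoudaCategorical", list(labels)
    if target == "object":
        # The only branch that materializes labels on the client.
        return "numpy object array", list(labels)
    if target == "string":
        return "ArkoudaStringArray", list(labels)
    # Remaining dtypes: the labels are cast from their string form.
    casters = {"int64": int, "float64": float}
    if target not in casters:
        raise TypeError(f"target not modeled in this sketch: {target}")
    return "Arkouda-backed ExtensionArray", [casters[target](x) for x in labels]


print(astype_labels(["x", "y", "x"], "string"))
# ('ArkoudaStringArray', ['x', 'y', 'x'])
print(astype_labels(["1", "2", "3"], "int64"))
# ('Arkouda-backed ExtensionArray', [1, 2, 3])
```

In this sketch, a cast to a numeric dtype fails with ValueError when a label is not a valid numeric string, because the conversion goes through Python's int or float. How the real method reports such labels is not stated here.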

Parameters:
  • dtype (Any) – Target dtype.

  • copy (bool) – Whether to force a copy when possible. If categorical-to-categorical and copy=True, attempts to copy the underlying Arkouda Categorical (if supported). Default is True.

Returns:

The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.

Return type:

Union[ExtensionArray, NDArray[Any]]

Examples

Casting to category returns an Arkouda-backed categorical array:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>> c = ArkoudaCategorical(ak.Categorical(ak.array(["x", "y", "x"])))
>>> out = c.astype("category")
>>> out is c
False

Forcing a copy when casting to the same categorical dtype returns a new array:

>>> out2 = c.astype("category", copy=True)
>>> out2 is c
False
>>> out2.to_ndarray()
array(['x', 'y', 'x'], dtype='<U...')

Casting to object materializes the category labels to a NumPy object array:

>>> c.astype(object)
array(['x', 'y', 'x'], dtype=object)

Casting to a string dtype returns an Arkouda-backed string array of labels:

>>> s = c.astype("string")
>>> s.to_ndarray()
array(['x', 'y', 'x'], dtype='<U1')

Casting to another dtype casts the labels-as-strings and returns an Arkouda-backed array:

>>> c_num = ArkoudaCategorical(ak.Categorical(ak.array(["1", "2", "3"])))
>>> a = c_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])
check_for_ordered(*args, **kwargs)[source]
default_fill_value: str = ''
describe(*args, **kwargs)[source]
property dtype

An instance of ExtensionDtype.

See also

api.extensions.ExtensionDtype

Base class for extension dtypes.

api.extensions.ExtensionArray

Base class for extension array types.

api.extensions.ExtensionArray.dtype

The dtype of an ExtensionArray.

Series.dtype

The dtype of a Series.

DataFrame.dtype

The dtype of a DataFrame.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
classmethod from_codes(*args, **kwargs)[source]
Abstractmethod:

isna() numpy.ndarray[source]

Return a boolean mask indicating missing values.

This implements the pandas ExtensionArray.isna contract and returns a NumPy ndarray[bool] of the same length as this categorical array.

Returns:

Boolean mask where True indicates a missing value.

Return type:

np.ndarray

Raises:

TypeError – If the underlying categorical cannot expose its codes or if missing detection is unsupported.

isnull()[source]

Alias for isna().

max(*args, **kwargs)[source]
memory_usage(*args, **kwargs)[source]
min(*args, **kwargs)[source]
notna(*args, **kwargs)[source]
notnull(*args, **kwargs)[source]
remove_categories(*args, **kwargs)[source]
remove_unused_categories(*args, **kwargs)[source]
rename_categories(*args, **kwargs)[source]
reorder_categories(*args, **kwargs)[source]
set_categories(*args, **kwargs)[source]
set_ordered(*args, **kwargs)[source]
sort_values(*args, **kwargs)[source]
to_list(*args, **kwargs)[source]
value_counts(dropna: bool = True) pandas.Series[source]

Return counts of categories as a pandas Series.

This method computes category frequencies from the underlying Arkouda Categorical and returns them as a pandas Series, where the index contains the category labels and the values contain the corresponding counts.

Parameters:

dropna (bool) – Whether to drop missing values from the result. When True, the result is filtered using the categorical’s na_value. When False, all categories returned by the underlying computation are included. Default is True.

Returns:

A Series containing category counts. The index is an ArkoudaStringArray of category labels and the values are an ArkoudaArray of counts.

Return type:

pd.Series

Notes

  • The result is computed server-side in Arkouda; only the (typically small) output of categories and counts is materialized for the pandas Series.

  • This method does not yet support pandas options such as normalize, sort, or bins.

  • The handling of missing values depends on the Arkouda Categorical definition of na_value.

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>>
>>> a = ArkoudaCategorical(["a", "b", "a", "c", "b", "a"])
>>> a.value_counts()
a    3
b    2
c    1
dtype: int64
class arkouda.pandas.ArkoudaCategoricalDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed categorical dtype.

This dtype integrates Arkouda’s distributed Categorical type with the pandas ExtensionArray interface via ArkoudaCategorical. It enables pandas objects (Series, DataFrame) to hold categorical data stored and processed on the Arkouda server, while exposing familiar pandas APIs.

construct_array_type()[source]

Returns the ArkoudaCategorical used as the storage class.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaCategorical class associated with this dtype.

Return type:

type

kind = 'O'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'category'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaDataFrameAccessor(pandas_obj)[source]

Arkouda DataFrame accessor.

Allows df.ak access to Arkouda-backed operations.
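For readers unfamiliar with namespace accessors, the sketch below shows one common way an attribute such as `.ak` can be wired up with a caching descriptor, in the spirit of the CachedAccessor class listed above. It is a generic pattern in plain Python, not Arkouda's actual implementation, and `CachedNamespace`, `DemoAccessor`, and `Frame` are hypothetical names.

```python
class CachedNamespace:
    """Minimal caching descriptor for a namespace-style accessor."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class: expose the accessor class itself.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Store on the instance so later lookups bypass this descriptor.
        object.__setattr__(obj, self._name, accessor)
        return accessor


class DemoAccessor:
    """Stand-in accessor that remembers the object it wraps."""

    def __init__(self, wrapped):
        self._obj = wrapped

    def describe(self):
        return f"accessor for {type(self._obj).__name__}"


class Frame:
    """Stand-in for a DataFrame-like class exposing an `.ak` namespace."""

    ak = CachedNamespace("ak", DemoAccessor)


frame = Frame()
print(frame.ak.describe())       # accessor for Frame
print(frame.ak is frame.ak)      # True
print(Frame.ak is DemoAccessor)  # True
```

Returning the accessor class on class-level access is consistent with calling a static method through the class, as in pd.DataFrame.ak.from_ak_legacy(...).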

collect() pandas.DataFrame[source]

Materialize an Arkouda-backed pandas DataFrame into a NumPy-backed one.

This operation retrieves each Arkouda-backed column from the server using to_ndarray() and constructs a standard pandas DataFrame whose columns are plain NumPy ndarray objects. The returned DataFrame has no dependency on Arkouda.

Returns:

A pandas DataFrame with NumPy-backed columns.

Return type:

pd_DataFrame

Examples

Converting an Arkouda-backed DataFrame into a NumPy-backed one:

>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaDataFrameAccessor

Create a pandas DataFrame and convert it to Arkouda-backed form:

>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()

akdf is still a pandas DataFrame, but its columns live on Arkouda:

>>> type(akdf["x"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

Now fully materialize it to local NumPy arrays:

>>> collected = akdf.ak.collect()
>>> collected
   x  y
0  1  a
1  2  b
2  3  c

The columns are now NumPy arrays:

>>> type(collected["x"].values)
<class 'numpy.ndarray'>
static from_ak_legacy(akdf: arkouda.pandas.dataframe.DataFrame) pandas.DataFrame[source]

Convert a legacy Arkouda DataFrame into a pandas DataFrame backed by Arkouda ExtensionArrays.

This is the zero-copy-ish counterpart to to_ak_legacy(). Instead of materializing columns into NumPy arrays, this function wraps each underlying Arkouda server-side array in the appropriate ArkoudaExtensionArray subclass (ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical). The resulting pandas DataFrame therefore keeps all data on the Arkouda server, enabling scalable operations without transferring data to the Python client.

Parameters:

akdf (ak_DataFrame) – A legacy Arkouda DataFrame (arkouda.pandas.dataframe.DataFrame) whose columns are Arkouda objects (pdarray, Strings, or Categorical).

Returns:

A pandas DataFrame in which each column is an Arkouda-backed ExtensionArray, typically one of:

  • ArkoudaArray

  • ArkoudaStringArray

  • ArkoudaCategorical

No materialization to NumPy occurs. All column data remain server-resident.

Return type:

pd_DataFrame

Notes

  • This function performs a zero-copy conversion for the underlying Arkouda arrays (server-side). Only lightweight Python wrappers are created.

  • The resulting pandas DataFrame can interoperate with most pandas APIs that support extension arrays.

  • Round-tripping through to_ak_legacy() and from_ak_legacy() preserves Arkouda semantics.

Examples

Basic conversion

>>> import pandas as pd
>>> import arkouda as ak
>>> akdf = ak.DataFrame({"a": ak.arange(5), "b": ak.array([10,11,12,13,14])})
>>> pdf = pd.DataFrame.ak.from_ak_legacy(akdf)
>>> pdf
   a   b
0  0  10
1  1  11
2  2  12
3  3  13
4  4  14

Columns stay Arkouda-backed

>>> type(pdf["a"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> pdf["a"].array._data
array([0 1 2 3 4])

No NumPy materialization occurs

>>> pdf["a"].values    # pandas always materializes .values
ArkoudaArray([0 1 2 3 4])

But the underlying column is still Arkouda:

>>> pdf["a"].array._data
array([0 1 2 3 4])

Categorical and Strings columns work as well

>>> akdf2 = ak.DataFrame({
...     "s": ak.array(["a","b","a"]),
...     "c": ak.Categorical(ak.array(["e","f","g"]))
... })
>>> pdf2 = pd.DataFrame.ak.from_ak_legacy(akdf2)
>>> type(pdf2["s"].array)
<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>
>>> type(pdf2["c"].array)
<class 'arkouda.pandas.extension._arkouda_categorical_array.ArkoudaCategorical'>
merge(right: pandas.DataFrame, on: str | List[str] | None = None, left_on: str | List[str] | None = None, right_on: str | List[str] | None = None, how: str = 'inner', left_suffix: str = '_x', right_suffix: str = '_y', convert_ints: bool = True, sort: bool = True) pandas.DataFrame[source]

Merge two Arkouda-backed pandas DataFrames using Arkouda’s join.

Parameters:
  • right (pd.DataFrame) – Right-hand DataFrame to merge with self._obj. All columns must be Arkouda-backed ExtensionArrays.

  • on (Optional[Union[str, List[str]]]) – Column name(s) to join on. Must be present in both left and right DataFrames. If not provided and neither left_on nor right_on is set, the intersection of column names in left and right is used. Default is None.

  • left_on (Optional[Union[str, List[str]]]) – Column name(s) from the left DataFrame to use as join keys. Must be used together with right_on. If provided, on is ignored for the left side. Default is None.

  • right_on (Optional[Union[str, List[str]]]) – Column name(s) from the right DataFrame to use as join keys. Must be used together with left_on. If provided, on is ignored for the right side. Default is None.

  • how (str) – Type of merge to be performed. One of 'left', 'right', 'inner', or 'outer'. Default is ‘inner’.

  • left_suffix (str) – Suffix to apply to overlapping column names from the left frame that are not part of the join keys. Default is ‘_x’.

  • right_suffix (str) – Suffix to apply to overlapping column names from the right frame that are not part of the join keys. Default is ‘_y’.

  • convert_ints (bool) – Whether to allow Arkouda to upcast integer columns as needed (for example, to accommodate missing values) during the merge. Default is True.

  • sort (bool) – Whether to sort the join keys in the output. Default is True.
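The interaction between on, left_on, right_on, and the suffixes can be sketched in plain Python. The helpers below are hypothetical and only model the rules described in the parameter list. Raising ValueError when left_on and right_on are not given together is an assumption, as is the output column order, and `output_columns` covers only the case where both sides share the same key names.

```python
from typing import List, Optional, Sequence, Tuple, Union

Keys = Optional[Union[str, List[str]]]


def _as_list(keys: Keys) -> List[str]:
    """Normalize a single name or a list of names to a list."""
    if keys is None:
        return []
    return [keys] if isinstance(keys, str) else list(keys)


def resolve_keys(left_cols: Sequence[str], right_cols: Sequence[str],
                 on: Keys = None, left_on: Keys = None,
                 right_on: Keys = None) -> Tuple[List[str], List[str]]:
    """Pick the join keys for each side following the documented rules."""
    if (left_on is None) != (right_on is None):
        # Assumption: the real error type for this case is not documented here.
        raise ValueError("left_on and right_on must be given together")
    if left_on is not None:
        return _as_list(left_on), _as_list(right_on)
    if on is not None:
        keys = _as_list(on)
    else:
        # Default: intersection of column names, in left-frame order.
        keys = [c for c in left_cols if c in right_cols]
    return keys, list(keys)


def output_columns(left_cols: Sequence[str], right_cols: Sequence[str],
                   keys: Sequence[str], left_suffix: str = "_x",
                   right_suffix: str = "_y") -> List[str]:
    """Apply suffixes to overlapping columns that are not join keys."""
    overlap = (set(left_cols) & set(right_cols)) - set(keys)
    left_out = [c + left_suffix if c in overlap else c for c in left_cols]
    right_out = [c + right_suffix if c in overlap else c
                 for c in right_cols if c not in keys]
    return left_out + right_out


left = ["id", "value", "tag"]
right = ["id", "value", "score"]
print(resolve_keys(left, right))           # (['id', 'value'], ['id', 'value'])
print(resolve_keys(left, right, on="id"))  # (['id'], ['id'])
print(output_columns(left, right, ["id"]))
# ['id', 'value_x', 'tag', 'value_y', 'score']
```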

Returns:

A pandas DataFrame whose columns are ArkoudaArray ExtensionArrays. All column data remain on the Arkouda server.

Return type:

pd.DataFrame

Raises:

TypeError – If right is not a pandas.DataFrame or if any column in the left or right DataFrame is not Arkouda-backed.

to_ak() pandas.DataFrame[source]

Convert this pandas DataFrame to an Arkouda-backed pandas DataFrame.

Each column of the original pandas DataFrame is materialized to the Arkouda server via ak.array() and wrapped in an ArkoudaArray ExtensionArray. The result is still a pandas DataFrame, but all column data reside on the Arkouda server and behave according to the Arkouda ExtensionArray API.

This method does not return a legacy ak_DataFrame. For that (server-side DataFrame structure), use to_ak_legacy().

Returns:

A pandas DataFrame whose columns are Arkouda-backed ArkoudaArray objects.

Return type:

pd_DataFrame

Examples

Convert a plain pandas DataFrame to an Arkouda-backed one:

>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()
>>> type(akdf)
<class 'pandas...DataFrame'>

The columns are now Arkouda ExtensionArrays:

>>> isinstance(akdf["x"].array, ArkoudaArray)
True
>>> akdf["x"].tolist()
[np.int64(1), np.int64(2), np.int64(3)]

Arkouda operations work directly on the columns:

>>> akdf["x"].array._data + 10
array([11 12 13])

Converting back to a NumPy-backed DataFrame:

>>> akdf_numpy = akdf.ak.collect()
>>> akdf_numpy
   x  y
0  1  a
1  2  b
2  3  c
to_ak_legacy() arkouda.pandas.dataframe.DataFrame[source]

Convert this pandas DataFrame into the legacy arkouda.DataFrame.

This method performs a materializing conversion of a pandas DataFrame into the legacy Arkouda DataFrame structure. Every column is converted to Arkouda server-side data:

  • Python / NumPy numeric and boolean arrays become pdarray.

  • String columns become Arkouda string arrays (Strings).

  • Pandas categoricals become Arkouda Categorical objects.

  • The result is a legacy ak_DataFrame whose columns all reside on the Arkouda server.
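The column mapping listed above can be summarized as a lookup on the pandas dtype name. This is a rough sketch and not the method's implementation: `arkouda_target` is a hypothetical helper, it returns type names as strings, and it assumes that object-dtype columns hold strings.

```python
def arkouda_target(pandas_dtype_name: str) -> str:
    """Map a pandas dtype name to the Arkouda type named in the list above."""
    if pandas_dtype_name == "category":
        return "Categorical"
    # Assumption: object-dtype columns contain strings.
    if pandas_dtype_name in ("object", "string", "str"):
        return "Strings"
    # Numeric and boolean columns become pdarray.
    if pandas_dtype_name.startswith(("int", "uint", "float", "bool")):
        return "pdarray"
    raise TypeError(f"dtype not modeled in this sketch: {pandas_dtype_name}")


columns = {"i": "int64", "s": "object", "c": "category"}
for name, dtype_name in columns.items():
    print(name, "->", arkouda_target(dtype_name))
# i -> pdarray
# s -> Strings
# c -> Categorical
```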

This differs from to_ak(), which creates Arkouda-backed ExtensionArrays but retains a pandas.DataFrame structure.

Returns:

The legacy Arkouda DataFrame with all columns materialized onto the Arkouda server.

Return type:

ak_DataFrame

Examples

Convert a plain pandas DataFrame to a legacy Arkouda DataFrame:

>>> import pandas as pd
>>> import arkouda as ak
>>> df = pd.DataFrame({
...     "i": [1, 2, 3],
...     "s": ["a", "b", "c"],
...     "c": pd.Series(["low", "low", "high"], dtype="category"),
... })
>>> akdf = df.ak.to_ak_legacy()
>>> type(akdf)
<class 'arkouda.pandas.dataframe.DataFrame'>

Columns have the appropriate Arkouda types:

>>> from arkouda.numpy.pdarrayclass import pdarray
>>> from arkouda.numpy.strings import Strings
>>> from arkouda.pandas.categorical import Categorical
>>> isinstance(akdf["i"], pdarray)
True
>>> isinstance(akdf["s"], Strings)
True
>>> isinstance(akdf["c"], Categorical)
True

Values round-trip through the conversion:

>>> akdf["i"].tolist()
[1, 2, 3]
class arkouda.pandas.ArkoudaFloat64Dtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed 64-bit floating-point dtype.

This dtype integrates Arkouda’s server-backed pdarray<float64> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate large distributed float64 arrays without materializing them on the client.

construct_array_type()[source]

Returns the ArkoudaArray class used for storage.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'f'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'float64'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaIndexAccessor(pandas_obj: pandas.Index | pandas.MultiIndex)[source]

Arkouda-backed index accessor for pandas Index and MultiIndex.

This accessor provides methods for converting between:

  • NumPy-backed pandas indexes

  • pandas indexes backed by ArkoudaExtensionArray (zero-copy EA mode)

  • legacy Arkouda ak.Index and ak.MultiIndex objects

The .ak namespace mirrors the DataFrame accessor, providing a consistent interface for distributed index operations. All conversions avoid unnecessary NumPy materialization unless explicitly requested via collect().

Parameters:

pandas_obj (Union[pd.Index, pd.MultiIndex]) – The pandas Index or MultiIndex instance that this accessor wraps.

Notes

  • to_ak → pandas object, Arkouda-backed (ExtensionArrays).

  • to_ak_legacy → legacy Arkouda index objects.

  • collect → NumPy-backed pandas object.

  • is_arkouda → reports whether the index is Arkouda-backed.

Examples

Basic single-level Index conversion:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="vals")

Convert to Arkouda-backed:

>>> ak_idx = idx.ak.to_ak()
>>> ak_idx.ak.is_arkouda
True

Materialize back:

>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True

Convert to legacy Arkouda:

>>> ak_legacy = idx.ak.to_ak_legacy()
>>> type(ak_legacy)
<class 'arkouda.pandas.index.Index'>

MultiIndex conversion:

>>> arrays = [[1, 1, 2], ["red", "blue", "red"]]
>>> midx = pd.MultiIndex.from_arrays(arrays, names=["num", "color"])
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
collect() pandas.Index | pandas.MultiIndex[source]

Materialize this Index or MultiIndex back to a plain NumPy-backed pandas index.

Returns:

An Index whose underlying data are plain NumPy arrays.

Return type:

Union[pd.Index, pd.MultiIndex]

Raises:

TypeError – If the index is Arkouda-backed but does not expose the expected _data attribute, or if the index type is unsupported.

Examples

Single-level Index round-trip:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([1, 2, 3], name="x")
>>> ak_idx = idx.ak.to_ak()
>>> np_idx = ak_idx.ak.collect()
>>> np_idx
Index([1, 2, 3], dtype='int64', name='x')
>>> np_idx.equals(idx)
True

Behavior when already NumPy-backed (no-op except shallow copy):

>>> plain = pd.Index([10, 20, 30])
>>> plain2 = plain.ak.collect()
>>> plain2.equals(plain)
True

Verifying that Arkouda-backed values materialize to NumPy:

>>> ak_idx = pd.Index([5, 6, 7]).ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> out = ak_idx.ak.collect()
>>> type(out.array)
<class 'pandas...NumpyExtensionArray'>
concat(other: pandas.Index | pandas.MultiIndex) pandas.Index | pandas.MultiIndex[source]

Concatenate this index with another Arkouda-backed index.

Both self._obj and other must be convertible to legacy Arkouda ak_Index / ak_MultiIndex. The concatenation is performed in Arkouda and the result is wrapped back into an Arkouda-backed pandas Index or MultiIndex.

Parameters:

other (Union[pd.Index, pd.MultiIndex]) – The other index to concatenate with self._obj. It must be a pandas.Index or pandas.MultiIndex.

Returns:

A pandas Index or MultiIndex backed by Arkouda, containing the concatenated values from self._obj and other.

Return type:

Union[pd.Index, pd.MultiIndex]

Raises:

TypeError – If other is not a pandas.Index or pandas.MultiIndex.

static from_ak_legacy(akidx: arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex) pandas.Index | pandas.MultiIndex[source]

Convert a legacy Arkouda ak.Index or ak.MultiIndex into a pandas Index/MultiIndex backed by Arkouda ExtensionArrays.

This is the index analogue of df.ak.from_ak_legacy_ea(): it performs a zero-copy-style wrapping of Arkouda server-side arrays into ArkoudaExtensionArray objects, producing a pandas Index or MultiIndex whose levels remain distributed on the Arkouda server.

No materialization to NumPy occurs.

Parameters:

akidx (Union[ak_Index, ak_MultiIndex]) – The legacy Arkouda Index or MultiIndex to wrap.

Returns:

A pandas index object whose underlying data are ArkoudaExtensionArray instances referencing the Arkouda server-side arrays.

Return type:

Union[pd.Index, pd.MultiIndex]

Notes

  • ak.Index → pd.Index with Arkouda-backed values.

  • ak.MultiIndex → pd.MultiIndex where each level is backed by an ArkoudaExtensionArray.

  • This function does not validate whether the input is already wrapped; callers should ensure the argument is a legacy Arkouda index object.

Examples

>>> import arkouda as ak
>>> import pandas as pd

Wrap a legacy ak.Index into a pandas Index without copying:

>>> ak_idx = ak.Index(ak.arange(5))
>>> pd_idx = pd.Index.ak.from_ak_legacy(ak_idx)
>>> pd_idx
Index([0, 1, 2, 3, 4], dtype='int64')

The resulting index stores its values on the Arkouda server:

>>> type(pd_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

MultiIndex example:

>>> ak_lvl1 = ak.array(['a', 'a', 'b', 'b'])
>>> ak_lvl2 = ak.array([1, 2, 1, 2])
>>> ak_mi = ak.MultiIndex([ak_lvl1, ak_lvl2], names=['letter', 'number'])
>>> pd_mi = pd.Index.ak.from_ak_legacy(ak_mi)
>>> pd_mi
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           names=['letter', 'number'])

Each level is backed by an Arkouda ExtensionArray and remains distributed:

>>> [type(level._data) for level in pd_mi.levels]
[<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>,
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>]

No NumPy materialization occurs; the underlying data stay on the Arkouda server.

property is_arkouda: bool

Return whether the underlying Index is Arkouda-backed.

An Index or MultiIndex is considered Arkouda-backed if its underlying storage uses ArkoudaExtensionArray. This applies to both single-level and multi-level indices.

Returns:

True if the Index/MultiIndex is backed by Arkouda server-side arrays, False otherwise.

Return type:

bool

Examples

NumPy-backed Index:

>>> import pandas as pd
>>> idx = pd.Index([1, 2, 3])
>>> idx.ak.is_arkouda
False

Arkouda-backed single-level Index:

>>> import arkouda as ak
>>> ak_idx = pd.Index([10, 20, 30]).ak.to_ak()
>>> ak_idx.ak.is_arkouda
True

Arkouda-backed MultiIndex:

>>> arrays = [[1, 1, 2], ["a", "b", "a"]]
>>> midx = pd.MultiIndex.from_arrays(arrays)
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
lookup(key: object) arkouda.numpy.pdarrayclass.pdarray[source]

Perform a server-side lookup on the underlying Arkouda index.

This is a thin convenience wrapper around the legacy arkouda.pandas.index.Index.lookup() / arkouda.pandas.index.MultiIndex.lookup() methods. It converts the pandas index to a legacy Arkouda index, performs the lookup on the server, and returns the resulting boolean mask.

Parameters:

key (object) – Lookup key or keys, interpreted in the same way as the legacy Arkouda Index / MultiIndex lookup method. For a single-level index this may be a scalar or an Arkouda pdarray; for MultiIndex it may be a tuple or sequence of values/arrays.

Returns:

A boolean Arkouda array indicating which positions in the index match the given key.

Return type:

pdarray
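
As a rough local illustration of what the returned boolean mask looks like, the sketch below uses plain Python lists instead of Arkouda arrays. The helper name lookup_mask is hypothetical and is not part of Arkouda; the real method runs on the server and returns a pdarray.

```python
def lookup_mask(index_values, keys):
    """Return one bool per index position: True where the label is among keys."""
    key_set = set(keys)
    return [label in key_set for label in index_values]


index_values = [10, 20, 30, 20]
mask = lookup_mask(index_values, [20])
print(mask)  # [False, True, False, True]
```

The mask has the same length as the index, so it can be used to select the matching positions.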

to_ak() pandas.Index | pandas.MultiIndex[source]

Convert this pandas Index or MultiIndex to an Arkouda-backed index.

Unlike to_ak_legacy(), which returns a legacy Arkouda Index object, this method returns a pandas Index or MultiIndex whose data reside on the Arkouda server and are wrapped in ArkoudaExtensionArray objects.

The conversion is zero-copy with respect to NumPy: no materialization to local NumPy arrays occurs.

Returns:

An Index whose underlying data live on the Arkouda server.

Return type:

Union[pd.Index, pd.MultiIndex]

Examples

Convert a simple Index to Arkouda-backed form:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="values")
>>> ak_idx = idx.ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>

Round-trip back to NumPy-backed pandas objects:

>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True
to_ak_legacy() arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex[source]

Convert this pandas Index or MultiIndex into a legacy Arkouda ak.Index or ak.MultiIndex object.

This is the index analogue of df.ak.to_ak_legacy(), returning the actual Arkouda index objects on the server, rather than a pandas wrapper backed by ArkoudaExtensionArray.

The conversion is zero-copy with respect to NumPy: values are transferred directly into Arkouda arrays without materializing to local NumPy.

Returns:

A legacy Arkouda Index/MultiIndex whose data live on the Arkouda server.

Return type:

Union[ak_Index, ak_MultiIndex]

Examples

Convert a simple pandas Index into a legacy Arkouda Index:

>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="numbers")
>>> ak_idx = idx.ak.to_ak_legacy()
>>> type(ak_idx)
<class 'arkouda.pandas.index.Index'>
>>> ak_idx.name
'numbers'
to_csv(prefix_path: str, dataset: str = 'index') str[source]

Save this index to CSV via the legacy to_csv implementation and return the server response message.

to_dict(labels=None)[source]

Convert this index to a dictionary representation if supported.

For MultiIndex, this delegates to MultiIndex.to_dict and returns a mapping of label -> Index. For single-level Indexes, this will raise a TypeError, since the legacy API only defines to_dict on MultiIndex.

to_hdf(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate', file_type: Literal['single', 'distribute'] = 'distribute') str[source]

Save this index to HDF5 via the legacy to_hdf implementation and return the server response message.

to_parquet(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate') str[source]

Save this index to Parquet via the legacy to_parquet implementation and return the server response message.

update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True)[source]

Overwrite or append this index into an existing HDF5 dataset via the legacy update_hdf implementation.

class arkouda.pandas.ArkoudaInt64Dtype[source]

Bases: _ArkoudaBaseDtype

Extension dtype for Arkouda-backed 64-bit integers.

This dtype allows seamless use of Arkouda’s distributed int64 arrays inside pandas objects (Series, Index, DataFrame). It is backed by arkouda.pdarray with dtype='int64' and integrates with pandas via the ArkoudaArray extension array.

construct_array_type()[source]

Return the associated extension array class (ArkoudaArray).

classmethod construct_array_type()[source]

Return the associated pandas ExtensionArray type.

This is part of the pandas ExtensionDtype interface and is used internally by pandas when constructing arrays of this dtype. It ensures that operations like Series(..., dtype=ArkoudaInt64Dtype()) produce the correct Arkouda-backed extension array.

Returns:

The ArkoudaArray class that implements the storage and behavior for this dtype.

Return type:

type

Notes

  • This hook tells pandas which ExtensionArray to instantiate whenever this dtype is requested.

  • All Arkouda dtypes defined in this module will return ArkoudaArray (or a subclass thereof).

Examples

>>> from arkouda.pandas.extension import ArkoudaInt64Dtype
>>> ArkoudaInt64Dtype.construct_array_type()
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
kind = 'i'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
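
To make the role of na_value more concrete, here is a minimal pure-Python sketch of a fill-aware take. pandas documents that ExtensionArray.take, when called with allow_fill=True, treats -1 in the indices as a missing position and fills it with the fill value. The function take_with_fill is a hypothetical stand-in, not Arkouda's implementation.

```python
def take_with_fill(values, indices, fill_value):
    """Gather values by position; an index of -1 marks a missing slot."""
    result = []
    for i in indices:
        if i == -1:
            # Missing position: substitute the dtype's NA sentinel.
            result.append(fill_value)
        elif i < -1:
            raise ValueError("only -1 is allowed as a missing-position marker")
        else:
            result.append(values[i])
    return result


NA_VALUE = -1  # the na_value documented above for this dtype
taken = take_with_fill([10, 20, 30], [0, -1, 2], NA_VALUE)
print(taken)  # [10, -1, 30]
```

Because the sentinel is an ordinary integer, a filled slot cannot be told apart from a stored -1 in this sketch.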

name = 'int64'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaStringArray(data: arkouda.numpy.strings.Strings | numpy.ndarray | Sequence[Any] | ArkoudaStringArray)[source]

Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray

Arkouda-backed string pandas ExtensionArray.

Ensures the underlying data is an Arkouda Strings object. Accepts existing Strings or converts from NumPy arrays and Python sequences of strings.

Parameters:

data (Strings | ndarray | Sequence[Any] | ArkoudaStringArray) – Input to wrap or convert.

  • If Strings, used directly.

  • If NumPy/sequence, converted via ak.array.

  • If another ArkoudaStringArray, its backing Strings is reused.

Raises:

TypeError – If data cannot be converted to Arkouda Strings.

default_fill_value

Sentinel used when filling missing values (default: “”).

Type:

str

all(*args, **kwargs)[source]
any(*args, **kwargs)[source]
argpartition(*args, **kwargs)[source]
astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]
astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]

Cast to a specified dtype.

Casting rules:

  • If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the string values.

  • If dtype is a string dtype (e.g. pandas StringDtype, NumPy unicode, or Arkouda string dtype), returns an ArkoudaStringArray. If copy=True, attempts to copy the underlying Arkouda Strings data.

  • For all other dtypes, casts the underlying Arkouda Strings using Strings.astype and returns an Arkouda-backed ArkoudaExtensionArray constructed from the result.

Parameters:
  • dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, or Arkouda dtype.

  • copy (bool) – Whether to force a copy when the result is an ArkoudaStringArray. Default is True.

Returns:

The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.

Return type:

Union[ExtensionArray, NDArray[Any]]

Examples

Casting to a string dtype returns an Arkouda-backed string array:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>> s = ArkoudaStringArray(ak.array(["a", "b", "c"]))
>>> out = s.astype("string")
>>> out is s
False

Forcing a copy when casting to a string dtype returns a new array:

>>> out2 = s.astype("string", copy=True)
>>> out2 is s
False
>>> out2.to_ndarray()
array(['a', 'b', 'c'], dtype='<U1')

Casting to object materializes the data to a NumPy array:

>>> s.astype(object)
array(['a', 'b', 'c'], dtype=object)

Casting to a non-string dtype uses Arkouda to cast the underlying strings and returns an Arkouda-backed ExtensionArray:

>>> s_num = ArkoudaStringArray(ak.array(["1", "2", "3"]))
>>> a = s_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])

NumPy and pandas dtype objects are also accepted:

>>> import numpy as np
>>> a = s_num.astype(np.dtype("float64"))
>>> a.to_ndarray()
array([1., 2., 3.])
byteswap(*args, **kwargs)[source]
choose(*args, **kwargs)[source]
clip(*args, **kwargs)[source]
compress(*args, **kwargs)[source]
conj(*args, **kwargs)[source]
conjugate(*args, **kwargs)[source]
cumprod(*args, **kwargs)[source]
cumsum(*args, **kwargs)[source]
default_fill_value: str = ''
diagonal(*args, **kwargs)[source]
dot(*args, **kwargs)[source]
property dtype

An instance of ExtensionDtype.

See also

api.extensions.ExtensionDtype

Base class for extension dtypes.

api.extensions.ExtensionArray

Base class for extension array types.

api.extensions.ExtensionArray.dtype

The dtype of an ExtensionArray.

Series.dtype

The dtype of a Series.

DataFrame.dtype

The dtype of a DataFrame.

Examples

>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
dump(*args, **kwargs)[source]
dumps(*args, **kwargs)[source]
fill(*args, **kwargs)[source]
flatten(*args, **kwargs)[source]
getfield(*args, **kwargs)[source]
isna()[source]

A 1-D array indicating if each value is missing.

Returns:

In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.

Return type:

numpy.ndarray or pandas.api.extensions.ExtensionArray

See also

ExtensionArray.dropna

Return ExtensionArray without NA values.

ExtensionArray.fillna

Fill NA/NaN values using the specified method.

Notes

If returning an ExtensionArray, then

  • na_values._is_boolean should be True

  • na_values should implement ExtensionArray._reduce()

  • na_values should implement ExtensionArray._accumulate()

  • na_values.any and na_values.all should be implemented

Examples

>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False,  True,  True])
item(*args, **kwargs)[source]

Return the array element at the specified position as a Python scalar.

Parameters:

index (int, optional) – Position of the element. If not provided, the array must contain exactly one element.

Returns:

The element at the specified position.

Return type:

scalar

Raises:
  • ValueError – If no index is provided and the array does not have exactly one element.

  • IndexError – If the specified position is out of bounds.

See also

numpy.ndarray.item

Return the item of an array as a scalar.

Examples

>>> arr = pd.array([1], dtype="Int64")
>>> arr.item()
np.int64(1)
>>> arr = pd.array([1, 2, 3], dtype="Int64")
>>> arr.item(0)
np.int64(1)
>>> arr.item(2)
np.int64(3)
max(*args, **kwargs)[source]
mean(*args, **kwargs)[source]
min(*args, **kwargs)[source]
nonzero(*args, **kwargs)[source]
partition(*args, **kwargs)[source]
prod(*args, **kwargs)[source]
put(*args, **kwargs)[source]
resize(*args, **kwargs)[source]
round(*args, **kwargs)[source]
setfield(*args, **kwargs)[source]
setflags(*args, **kwargs)[source]
sort(*args, **kwargs)[source]
std(*args, **kwargs)[source]
sum(*args, **kwargs)[source]
swapaxes(*args, **kwargs)[source]
to_device(*args, **kwargs)[source]
tobytes(*args, **kwargs)[source]
tofile(*args, **kwargs)[source]
trace(*args, **kwargs)[source]
value_counts(dropna: bool = True) pandas.Series[source]

Return counts of unique strings as a pandas Series.

This method computes the frequency of each distinct string value in the underlying Arkouda Strings object and returns the result as a pandas Series, with the unique string values as the index and their counts as the data.

Parameters:

dropna (bool) – Whether to exclude missing values. Missing-value handling for Arkouda string arrays is not yet implemented, so this parameter is accepted for pandas compatibility but currently has no effect. Default is True.

Returns:

A Series containing the counts of unique string values. The index is an ArkoudaStringArray of unique values, and the values are an ArkoudaArray of counts.

Return type:

pd.Series

Notes

  • The following pandas options are not yet implemented: normalize, sort, and bins.

  • Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.

Examples

Basic usage:

>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>>
>>> s = ArkoudaStringArray(["red", "blue", "red", "green", "blue", "red"])
>>> s.value_counts()
red      3
blue     2
green    1
dtype: int64

Empty input:

>>> empty = ArkoudaStringArray([])
>>> empty.value_counts()
Series([], dtype: int64)
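
For intuition only, the quantity being computed can be sketched locally with collections.Counter. The function value_counts_sketch is hypothetical; the real method does the counting on the Arkouda server and returns a pandas Series rather than a dict.

```python
from collections import Counter


def value_counts_sketch(strings):
    """Map each distinct string to the number of times it occurs."""
    return dict(Counter(strings))


counts = value_counts_sketch(["red", "blue", "red", "green", "blue", "red"])
print(counts)  # {'red': 3, 'blue': 2, 'green': 1}
```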
var(*args, **kwargs)[source]
class arkouda.pandas.ArkoudaStringDtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed string dtype.

This dtype integrates Arkouda’s distributed Strings type with the pandas ExtensionArray interface via ArkoudaStringArray. It enables pandas objects (Series, DataFrame) to hold large, server-backed string columns without converting to NumPy or Python objects.

construct_array_type()[source]

Returns the ArkoudaStringArray used as the storage class.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

Returns:

The ArkoudaStringArray class associated with this dtype.

Return type:

type

kind = 'O'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = ''

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'string'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaUint64Dtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed unsigned 64-bit integer dtype.

This dtype integrates Arkouda’s uint64 arrays with pandas, allowing users to create pandas.Series or pandas.DataFrame objects that store their data on the Arkouda server while still conforming to the pandas ExtensionArray API.

construct_array_type()[source]

Return the ArkoudaArray class used as the storage container for this dtype.

Examples

>>> import arkouda as ak
>>> import pandas as pd
>>> from arkouda.pandas.extension import ArkoudaUint64Dtype, ArkoudaArray
>>> arr = ArkoudaArray(ak.array([1, 2, 3], dtype="uint64"))
>>> s = pd.Series(arr, dtype=ArkoudaUint64Dtype())
>>> s
0    1
1    2
2    3
dtype: uint64
classmethod construct_array_type()[source]

Return the ExtensionArray class associated with this dtype.

This is required by the pandas ExtensionDtype API. It tells pandas which ExtensionArray subclass should be used to hold data of this dtype inside a pandas.Series or pandas.DataFrame.

Returns:

The ArkoudaArray class, which implements the storage and operations for Arkouda-backed arrays.

Return type:

type

kind = 'u'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'uint64'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.ArkoudaUint8Dtype[source]

Bases: _ArkoudaBaseDtype

Arkouda-backed unsigned 8-bit integer dtype.

This dtype integrates Arkouda’s uint8 arrays with the pandas ExtensionArray API, allowing pandas Series and DataFrame objects to store and operate on Arkouda-backed unsigned 8-bit integers. The underlying storage is an Arkouda pdarray<uint8>, exposed through the ArkoudaArray extension array.

construct_array_type()[source]

Returns the ArkoudaArray type that provides the storage and behavior for this dtype.

classmethod construct_array_type()[source]

Return the ExtensionArray subclass that handles storage for this dtype.

This method is required by the pandas ExtensionDtype interface. It tells pandas which ExtensionArray class to use when creating arrays of this dtype (for example, when calling Series(..., dtype="arkouda.uint8")).

Returns:

The ArkoudaArray class associated with this dtype.

Return type:

type

kind = 'u'

A character code (one of ‘biufcmMOSUV’), default ‘O’

This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.

See also

numpy.dtype.kind

na_value = -1

Default NA value to use for this type.

This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.

name = 'uint8'

A string identifying the data type.

Will be used for display in, e.g. Series.dtype

type

The scalar type for the array, e.g. int

It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.

class arkouda.pandas.CachedAccessor(name: str, accessor)[source]

Descriptor for caching namespace-based accessors.

This custom property-like object enables lazy initialization of accessors (e.g., .str, .dt) on Series-like objects, similar to pandas-style extension accessors.

Parameters:
  • name (str) – The name of the namespace to be accessed (e.g., df.foo).

  • accessor (type) – A class implementing the accessor logic.

Notes

The accessor class’s __init__ method must accept a single positional argument, which should be one of Series, DataFrame, or Index.
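
The lazy-initialization idea can be sketched with a small non-data descriptor. This is a minimal illustration modeled on the pandas-style accessor pattern, not Arkouda's actual code; the names CachedAccessorSketch, UpperAccessor, and ToySeries are hypothetical.

```python
class CachedAccessorSketch:
    """Non-data descriptor that builds an accessor once per instance."""

    def __init__(self, name, accessor):
        self._name = name
        self._accessor = accessor

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself: hand back the accessor class.
            return self._accessor
        accessor_obj = self._accessor(obj)
        # Cache on the instance so later lookups bypass this descriptor.
        object.__setattr__(obj, self._name, accessor_obj)
        return accessor_obj


class UpperAccessor:
    """Toy accessor; __init__ takes the wrapped object as its only argument."""

    def __init__(self, owner):
        self._owner = owner

    def shout(self):
        return [v.upper() for v in self._owner.values]


class ToySeries:
    text = CachedAccessorSketch("text", UpperAccessor)

    def __init__(self, values):
        self.values = values


s = ToySeries(["a", "b"])
first = s.text
second = s.text
print(first is second)  # True
print(s.text.shout())   # ['A', 'B']
```

Because the descriptor defines no __set__, the cached attribute in the instance dictionary takes precedence on every later access, so the accessor is constructed only once per object.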

class arkouda.pandas.DatetimeAccessor(series)[source]

Bases: Properties

Accessor for datetime-like operations on Arkouda Series.

Provides datetime methods such as .floor(), .ceil(), and .round(), mirroring the .dt accessor in pandas.

This accessor is automatically attached to Series objects that wrap arkouda.Datetime values. It should not be instantiated directly.

Parameters:

series (arkouda.pandas.Series) – The Series object containing Datetime values.

Raises:

AttributeError – If the underlying Series values are not of type arkouda.Datetime.

Examples

>>> import arkouda as ak
>>> from arkouda import Datetime, Series
>>> s = Series(Datetime(ak.array([1_000_000_000_000])))
>>> s.dt.floor("D")
0   1970-01-01
dtype: datetime64[ns]
series
class arkouda.pandas.Properties[source]

Base class for accessor implementations in Arkouda.

Provides the _make_op class method to dynamically generate accessor methods that wrap underlying Strings or Datetime operations and return new Series.

Notes

This class is subclassed by StringAccessor and DatetimeAccessor, and is not intended to be used directly.

Examples

Subclasses should define _make_op("operation_name"), which will generate a method that applies series.values.operation_name(...) and returns a new Series.
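
The method-generation pattern described here can be sketched in plain Python. Everything below is a hypothetical stand-in (ToyValues, ToySeries, PropertiesSketch, ToyStringAccessor) and only assumes what the text states: the generated method forwards to the operation of the same name on series.values and wraps the result in a new Series.

```python
class ToyValues:
    """Stand-in for the Strings or Datetime object held by a Series."""

    def __init__(self, items):
        self.items = list(items)

    def upper(self):
        return ToyValues(s.upper() for s in self.items)


class ToySeries:
    """Stand-in for a Series; only the values attribute matters here."""

    def __init__(self, values):
        self.values = values


class PropertiesSketch:
    def __init__(self, series):
        self.series = series

    @classmethod
    def _make_op(cls, name):
        def op(self, *args, **kwargs):
            # Forward to the same-named operation on the underlying values.
            result = getattr(self.series.values, name)(*args, **kwargs)
            return ToySeries(result)

        op.__name__ = name
        return op


class ToyStringAccessor(PropertiesSketch):
    pass


# Attach a generated method, as a class decorator might do for many names.
ToyStringAccessor.upper = ToyStringAccessor._make_op("upper")

out = ToyStringAccessor(ToySeries(ToyValues(["a", "b"]))).upper()
print(out.values.items)  # ['A', 'B']
```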

class arkouda.pandas.Row(dict=None, /, **kwargs)[source]

Bases: collections.UserDict

Dictionary-like representation of a single row in an Arkouda DataFrame.

Wraps the column-to-value mapping for one row and provides convenient ASCII and HTML formatting for display.

Parameters:

data (dict) – Mapping of column names to their corresponding values for this row.

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.row import Row
>>> df = ak.DataFrame({"x": ak.array([10, 20]), "y": ak.array(["a", "b"])})

Suppose df[0] returns {"x": 10, "y": "a"}:

>>> row = Row({"x": 10, "y": "a"})
>>> print(row)
keys    values
------  --------
x       10
y       a
class arkouda.pandas.Series(data: Tuple | List | arkouda.pandas.groupbyclass.groupable_element_type | Series | arkouda.numpy.segarray.SegArray | pandas.Series | pandas.Categorical, name=None, index: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | Tuple | List | arkouda.pandas.index.Index | None = None)[source]

One-dimensional Arkouda array with axis labels.

Parameters:
  • index (pdarray or Strings, optional) – An array of indices associated with the data array. If not provided (or empty), it defaults to a range of ints whose size matches the size of the data.

  • data (tuple, list, groupable_element_type, Series, or SegArray) – A 1D array-like. Must not be None.

Raises:
  • TypeError – Raised if index is not a pdarray or Strings object. Raised if data is not a supported type.

  • ValueError – Raised if the index size does not match the data size.

Notes

The Series class accepts either positional arguments or keyword arguments.

Positional arguments
  • Series(data): data is provided and an index is generated automatically.

  • Series(data, index): both data and index are provided.

Keyword arguments
  • Series(data=..., index=...): index is optional but must match the size of data when provided.

add(b: Series) Series[source]
argmax()[source]
argmin()[source]
property at: _LocIndexer

Accesses entries of a Series by label.

Returns:

An indexer for label-based access to Series entries.

Return type:

_LocIndexer

static concat(arrays: List, axis: int = 0, index_labels: List[str] | None = None, value_labels: List[str] | None = None, ordered: bool = False) arkouda.pandas.dataframe.DataFrame | Series[source]

Concatenate a list of Arkouda Series or grouped arrays horizontally or vertically.

If a list of grouped Arkouda arrays is passed, they are converted to Series. Each grouping is a 2-tuple where the first item is the key(s) and the second is the value. If concatenating horizontally (axis=1), all series/groupings must have the same length and the same index. The index is converted to a column in the resulting DataFrame; if it’s a MultiIndex, each level is converted to a separate column.

Parameters:
  • arrays (List) – A list of Series or groupings (tuples of index and values) to concatenate.

  • axis (int) – The axis to concatenate along: 0 = vertical (stack series into one); 1 = horizontal (align by index and produce a DataFrame). Defaults to 0.

  • index_labels (List[str] or None, optional) – Column name(s) to label the index when axis=1.

  • value_labels (List[str] or None, optional) – Column names to label the values of each Series.

  • ordered (bool) – Unused parameter. Reserved for future support of deterministic vs. performance-optimized concatenation. Defaults to False.

Returns:

  • If axis=0: a new Series

  • If axis=1: a new DataFrame

Return type:

Series or DataFrame
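
The difference between the two axes can be sketched with plain Python lists. This is an illustration under simplifying assumptions, not the Arkouda implementation: each series is an (index, values) pair, concat_sketch is a hypothetical helper, it takes a single index_label rather than a list, and the fallback column names val_0, val_1, ... are placeholders chosen for the sketch.

```python
def concat_sketch(series_list, axis=0, index_label="index", value_labels=None):
    """Concatenate (index, values) pairs vertically or horizontally."""
    if axis == 0:
        # Vertical: stack everything into one longer series.
        index, values = [], []
        for idx, vals in series_list:
            index.extend(idx)
            values.extend(vals)
        return index, values

    # Horizontal: every series must share the same index.
    first_index = series_list[0][0]
    if any(idx != first_index for idx, _ in series_list):
        raise ValueError("horizontal concatenation needs identical indexes")
    labels = value_labels or [f"val_{i}" for i in range(len(series_list))]
    # The shared index becomes a column, followed by one column per series.
    table = {index_label: list(first_index)}
    for label, (_, vals) in zip(labels, series_list):
        table[label] = list(vals)
    return table


a = (["x", "y"], [1, 2])
b = (["x", "y"], [3, 4])
stacked = concat_sketch([a, b], axis=0)
wide = concat_sketch([a, b], axis=1, value_labels=["left", "right"])
print(stacked)  # (['x', 'y', 'x', 'y'], [1, 2, 3, 4])
print(wide)     # {'index': ['x', 'y'], 'left': [1, 2], 'right': [3, 4]}
```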

diff() Series[source]

Diffs consecutive values of the series.

Returns a new series with the same index and length. First value is set to NaN.
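
A minimal local sketch of this behavior, using a plain list in place of a Series. The name diff_sketch is hypothetical; the results are floats here because NaN is a float.

```python
import math


def diff_sketch(values):
    """Difference of each value from its predecessor; first slot is NaN."""
    if not values:
        return []
    return [math.nan] + [float(b - a) for a, b in zip(values, values[1:])]


d = diff_sketch([1, 4, 9, 16])
print(d)  # [nan, 3.0, 5.0, 7.0]
```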

dt
property dtype: numpy.dtype
fillna(value: supported_scalars | Series | arkouda.numpy.pdarrayclass.pdarray) Series[source]

Fill NA/NaN values with the specified value.

Parameters:

value (supported_scalars, Series, or pdarray) – Value to use to fill holes (e.g. 0), alternately a Series of values specifying which value to use for each index. Values not in the Series will not be filled. This value cannot be a list.

Returns:

Object with missing values filled.

Return type:

Series

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> data = ak.Series([1, np.nan, 3, np.nan, 5])
>>> data
0    1.0
1    NaN
2    3.0
3    NaN
4    5.0
dtype: float64
>>> fill_values1 = ak.ones(5)
>>> data.fillna(fill_values1)
0    1.0
1    1.0
2    3.0
3    1.0
4    5.0
dtype: float64
>>> fill_values2 = Series(ak.ones(5))
>>> data.fillna(fill_values2)
0    1.0
1    1.0
2    3.0
3    1.0
4    5.0
dtype: float64
>>> fill_values3 = 100.0
>>> data.fillna(fill_values3)
0      1.0
1    100.0
2      3.0
3    100.0
4      5.0
dtype: float64
classmethod from_return_msg(rep_msg: str) Series[source]

Return a Series instance pointing to components created by the arkouda server.

The user should not call this function directly.

Parameters:

rep_msg (builtin_str) – Delimited string containing the values and indexes.

Returns:

A Series representing a set of pdarray components on the server.

Return type:

Series

Raises:

RuntimeError – Raised if a server-side error is thrown in the process of creating the Series instance.

has_repeat_labels() bool[source]

Return whether the Series has any labels that appear more than once.
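
In plain Python terms, the check amounts to asking whether the labels contain a duplicate. The helper below is a hypothetical local sketch, not the server-side implementation.

```python
def has_repeat_labels_sketch(labels):
    """True if any label occurs more than once."""
    return len(set(labels)) < len(labels)


print(has_repeat_labels_sketch(["a", "b", "a"]))  # True
print(has_repeat_labels_sketch(["a", "b", "c"]))  # False
```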

hasnans() arkouda.numpy.dtypes.bool_scalars[source]

Return True if there are any NaNs.

Return type:

bool

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = ak.Series(ak.array([1, 2, 3, np.nan]))
>>> s
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64
>>> s.hasnans()
np.True_
head(n: int = 10) Series[source]

Return the first n values of the series.

property iat: _iLocIndexer

Accesses entries of a Series by position.

Returns:

An indexer for position-based access to a single element.

Return type:

_iLocIndexer

property iloc: _iLocIndexer

Accesses entries of a Series by position.

Returns:

An indexer for position-based access to Series entries.

Return type:

_iLocIndexer

is_registered() bool[source]
Return True iff the object is contained in the registry or is a component of a registered object.

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RegistrationError – Raised if there’s a server-side error or a mismatch of registered components.

See also

register, attach, unregister

Notes

Objects registered with the server are immune to deletion until they are unregistered.

isin(lst: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | List) Series[source]

Find Series elements whose values are in the specified list.

Parameters:

lst (pdarray, Strings, or List) – Either a Python list or an Arkouda array to check membership against.

Returns:

A Series of booleans that is True for elements found in the list, and False otherwise.

Return type:

Series
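The membership test can be pictured with a small local sketch. It uses plain Python lists rather than Arkouda arrays, and `isin_local` is a hypothetical name; it only illustrates the shape of the boolean result described above.

```python
def isin_local(values, lst):
    """Return a boolean mask: True where the element of `values` appears in `lst`."""
    lookup = set(lst)  # set membership keeps each check cheap
    return [value in lookup for value in values]


print(isin_local([1, 2, 3, 4], [2, 4, 6]))  # [False, True, False, True]
```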

isna() Series[source]

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings ‘’ are not considered NA values.

Returns:

Mask of bool values for each element in Series that indicates whether an element is an NA value.

Return type:

Series

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.isna()
1    False
2    False
4     True
dtype: bool
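The NA rule described above (NaN counts as missing, an empty string does not) can be sketched locally. This is a plain-Python model under that stated rule, not Arkouda's implementation; `isna_local` and `notna_local` are hypothetical names.

```python
import math


def isna_local(values):
    """True only for float NaN; empty strings and all other values are not NA."""
    return [isinstance(value, float) and math.isnan(value) for value in values]


def notna_local(values):
    """Element-wise negation of isna_local."""
    return [not flag for flag in isna_local(values)]


print(isna_local([1.0, 2.0, float("nan")]))   # [False, False, True]
print(isna_local(["", "a"]))                  # [False, False]
print(notna_local([1.0, 2.0, float("nan")]))  # [True, True, False]
```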
isnull() Series[source]

Series.isnull is an alias for Series.isna.

Detect missing values.

Return a boolean same-sized object indicating if the values are NA. NA values, such as numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings ‘’ are not considered NA values.

Returns:

Mask of bool values for each element in Series that indicates whether an element is an NA value.

Return type:

Series

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.isnull()
1    False
2    False
4     True
dtype: bool
property loc: _LocIndexer

Accesses entries of a Series by label.

Returns:

An indexer for label-based access to Series entries.

Return type:

_LocIndexer

locate(key: int | arkouda.numpy.pdarrayclass.pdarray | arkouda.pandas.index.Index | Series | List | Tuple) Series[source]

Lookup values by index label.

Parameters:

key (int, pdarray, Index, Series, List, or Tuple) –

The key or keys to look up. This can be:

  • A scalar

  • A list of scalars

  • A list of lists (for MultiIndex)

  • A Series (in which case labels are preserved, and its values are used as keys)

Keys will be converted to Arkouda arrays as needed.

Returns:

A Series containing the values corresponding to the key.

Return type:

Series

map(arg: dict | arkouda.Series) arkouda.Series[source]

Map values of Series according to an input mapping.

Parameters:

arg (dict or Series) – The mapping correspondence.

Returns:

A new series with the same index as the caller. When the input Series has Categorical values, the return Series will have Strings values. Otherwise, the return type will match the input type.

Return type:

Series

Raises:

TypeError – Raised if arg is not of type dict or arkouda.Series, or if the Series values are not of type pdarray, Categorical, or Strings.

Examples

>>> import arkouda as ak
>>> s = ak.Series(ak.array([2, 3, 2, 3, 4]))
>>> s
0    2
1    3
2    2
3    3
4    4
dtype: int64
>>> s.map({4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
0    30.0
1     5.0
2    30.0
3     5.0
4    25.0
dtype: float64
>>> s2 = ak.Series(ak.array(["a","b","c","d"]), index = ak.array([4,2,1,3]))
>>> s.map(s2)
0    b
1    d
2    b
3    d
4    a
dtype: ...
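The two mapping forms in the example can be modelled locally with plain Python. The sketch assumes that a Series passed as `arg` acts as a lookup from its index labels to its values, which is what the example output above suggests. `map_local` is a hypothetical name, and the behaviour for values missing from the mapping is not modelled here.

```python
def map_local(values, mapping):
    """Replace each value with its entry in `mapping` (all keys assumed present)."""
    return [mapping[value] for value in values]


values = [2, 3, 2, 3, 4]

# Mapping given as a dict.
print(map_local(values, {4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0}))
# [30.0, 5.0, 30.0, 5.0, 25.0]

# Mapping given as a Series: its index labels become the keys.
s2_index = [4, 2, 1, 3]
s2_values = ["a", "b", "c", "d"]
print(map_local(values, dict(zip(s2_index, s2_values))))
# ['b', 'd', 'b', 'd', 'a']
```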
max()[source]
mean()[source]
memory_usage(index: bool = True, unit: Literal['B', 'KB', 'MB', 'GB'] = 'B') int[source]

Return the memory usage of the Series.

The memory usage can optionally include the contribution of the index.

Parameters:
  • index (bool) – Specifies whether to include the memory usage of the Series index. Defaults to True.

  • unit ({"B", "KB", "MB", "GB"}) – Unit to return. One of {‘B’, ‘KB’, ‘MB’, ‘GB’}. Defaults to “B”.

Returns:

Memory consumed, expressed in the requested unit (bytes by default).

Return type:

int

See also

arkouda.numpy.pdarrayclass.nbytes, arkouda.Index.memory_usage, arkouda.pandas.series.Series.memory_usage, arkouda.pandas.dataframe.DataFrame.memory_usage

Examples

>>> import arkouda as ak
>>> from arkouda.pandas.series import Series
>>> s = ak.Series(ak.arange(3))
>>> s.memory_usage()
48

Not including the index gives the size of the rest of the data, which is necessarily smaller:

>>> s.memory_usage(index=False)
24

Select the units:

>>> s = ak.Series(ak.arange(3000))
>>> s.memory_usage(unit="KB")
46.875
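The numbers in these examples can be reproduced with simple arithmetic. The sketch below assumes 8 bytes per int64 element, an int64 index of the same length as the values, and 1 KB = 1024 bytes; these assumptions are inferred from the outputs above (48, 24, 46.875), and the MB and GB factors extend them by analogy. `int64_series_memory` is a hypothetical helper, not part of Arkouda.

```python
BYTES_PER_INT64 = 8
# Assumed binary unit factors, inferred from the KB example above.
UNIT_FACTORS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def int64_series_memory(n, index=True, unit="B"):
    """Estimate memory for n int64 values, optionally plus an int64 index."""
    nbytes = n * BYTES_PER_INT64
    if index:
        nbytes += n * BYTES_PER_INT64
    if unit == "B":
        return nbytes
    return nbytes / UNIT_FACTORS[unit]


print(int64_series_memory(3))                # 48
print(int64_series_memory(3, index=False))   # 24
print(int64_series_memory(3000, unit="KB"))  # 46.875
```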
min()[source]
property ndim: int
notna() Series[source]

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings ‘’ are not considered NA values. NA values, such as numpy.NaN, get mapped to False values.

Returns:

Mask of bool values for each element in Series that indicates whether an element is not an NA value.

Return type:

Series

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.notna()
1     True
2     True
4    False
dtype: bool
notnull() Series[source]

Series.notnull is an alias for Series.notna.

Detect existing (non-missing) values.

Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings ‘’ are not considered NA values. NA values, such as numpy.NaN, get mapped to False values.

Returns:

Mask of bool values for each element in Series that indicates whether an element is not an NA value.

Return type:

Series

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.notnull()
1     True
2     True
4    False
dtype: bool
objType = 'Series'
static pdconcat(arrays: List, axis: int = 0, labels: arkouda.numpy.strings.Strings | None = None) pandas.Series | pandas.DataFrame[source]

Concatenate a list of Arkouda Series or grouped arrays, returning a local pandas object.

If a list of grouped Arkouda arrays is passed, they are converted to Series. Each grouping is a 2-tuple with the first item being the key(s) and the second the value.

If axis=1 (horizontal), each Series or grouping must have the same length and the same index. The index is converted to a column in the resulting DataFrame. If it is a MultiIndex, each level is converted to a separate column.

Parameters:
  • arrays (List) – A list of Series or groupings (tuples of index and values) to concatenate.

  • axis (int) – The axis along which to concatenate: 0 = vertical (stack into a Series), 1 = horizontal (align by index into a DataFrame). Defaults to 0.

  • labels (Strings or None, optional) – Names to assign to the resulting columns in the DataFrame.

Returns:

  • If axis=0: a local pandas Series

  • If axis=1: a local pandas DataFrame

Return type:

Series or DataFrame

prod()[source]
register(user_defined_name: str)[source]

Register this Series object and underlying components with the Arkouda server.

Parameters:

user_defined_name (builtin_str) – User-defined name the Series is to be registered under. This will be the root name for the underlying components.

Returns:

The same Series, now registered with the Arkouda server and carrying an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Note that two different Series cannot be registered under the same name.

Return type:

Series

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – If the server was unable to register the Series with the user_defined_name

See also

unregister, attach, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

registered_name: str | None = None
property shape: Tuple[int]
size
sort_index(ascending: bool = True) Series[source]

Sort the Series by its index.

Parameters:

ascending (bool) – Whether to sort the index in ascending (default) or descending order. Defaults to True.

Returns:

A new Series sorted by index.

Return type:

Series

sort_values(ascending: bool = True) Series[source]

Sort the Series by its values.

Parameters:

ascending (bool) – Whether to sort values in ascending (default) or descending order. Defaults to True.

Returns:

A new Series sorted by its values.

Return type:

Series
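To contrast sort_index with sort_values, here is a small local model that treats a Series as a list of (label, value) pairs. It uses plain Python sorting and says nothing about how Arkouda sorts on the server or how ties are ordered; the variable names are illustrative only.

```python
index = [3, 1, 2]
values = [20, 30, 10]
pairs = list(zip(index, values))

# sort_index(): order the pairs by label.
by_index = sorted(pairs, key=lambda pair: pair[0])
print(by_index)  # [(1, 30), (2, 10), (3, 20)]

# sort_values(): order the pairs by value.
by_values = sorted(pairs, key=lambda pair: pair[1])
print(by_values)  # [(2, 10), (3, 20), (1, 30)]

# sort_values(ascending=False): largest value first.
by_values_desc = sorted(pairs, key=lambda pair: pair[1], reverse=True)
print(by_values_desc)  # [(1, 30), (3, 20), (2, 10)]
```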

std()[source]
str
sum()[source]
tail(n: int = 10) Series[source]

Return the last n values of the series.

to_dataframe(index_labels: List[str] | None = None, value_label: str | None = None) arkouda.pandas.dataframe.DataFrame[source]

Convert the Series to an Arkouda DataFrame.

Parameters:
  • index_labels (list of str or None, optional) – Column name(s) to label the index.

  • value_label (str or None, optional) – Column name to label the values.

Returns:

An Arkouda DataFrame representing the Series.

Return type:

DataFrame

to_markdown(mode='wt', index=True, tablefmt='grid', storage_options=None, **kwargs)[source]

Print Series in Markdown-friendly format.

Parameters:
  • mode (str, optional) – Mode in which file is opened, “wt” by default.

  • index (bool, optional, default True) – Add index (row) labels.

  • tablefmt (str = "grid") – Table format to call from tabulate: https://pypi.org/project/tabulate/

  • storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.

  • **kwargs – These parameters will be passed to tabulate.

Note

This function should only be called on small Series as it calls pandas.Series.to_markdown: https://pandas.pydata.org/docs/reference/api/pandas.Series.to_markdown.html

Examples

>>> import arkouda as ak
>>> s = ak.Series(["elk", "pig", "dog", "quetzal"], name="animal")
>>> print(s.to_markdown())
+----+----------+
|    | animal   |
+====+==========+
|  0 | elk      |
+----+----------+
|  1 | pig      |
+----+----------+
|  2 | dog      |
+----+----------+
|  3 | quetzal  |
+----+----------+

Output markdown with a tabulate option.

>>> print(s.to_markdown(tablefmt="grid"))
+----+----------+
|    | animal   |
+====+==========+
|  0 | elk      |
+----+----------+
|  1 | pig      |
+----+----------+
|  2 | dog      |
+----+----------+
|  3 | quetzal  |
+----+----------+
to_ndarray() numpy.ndarray[source]
to_pandas() pandas.Series[source]

Convert the Series to a local pandas Series.

tolist() list[source]
topn(n: int = 10) Series[source]

Return the top values of the Series.

Parameters:

n (int) – Number of values to return. Defaults to 10.

Returns:

A new Series containing the top n values.

Return type:

Series

unregister()[source]

Unregister this Series object from the Arkouda server, where it was previously registered using register() and/or attached to using attach().

Raises:

RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister

See also

register, attach, is_registered

Notes

Objects registered with the server are immune to deletion until they are unregistered.

validate_key(key: Series | arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.pandas.categorical.Categorical | List | supported_scalars | arkouda.numpy.segarray.SegArray) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.pandas.categorical.Categorical | supported_scalars | arkouda.numpy.segarray.SegArray[source]

Validate type requirements for keys when reading or writing the Series.

Also converts list and tuple arguments into pdarrays.

Parameters:

key (Series, pdarray, Strings, Categorical, List, supported_scalars, or SegArray) – The key or container of keys that might be used to index into the Series.

Return type:

The validated key(s), with lists and tuples converted to pdarrays

Raises:
  • TypeError – Raised if keys are not boolean values or the type of the labels, or if key is not one of the supported types

  • KeyError – Raised if container of keys has keys not present in the Series

  • IndexError – Raised if the length of a boolean key array is different from the Series

validate_val(val: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | supported_scalars | List) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | supported_scalars[source]

Validate type requirements for values being written into the Series.

Also converts list and tuple arguments into pdarrays.

Parameters:

val (pdarray, Strings, supported_scalars, or List) – The value or container of values that might be assigned into the Series.

Return type:

The validated value, with lists converted to pdarrays

Raises:

TypeError – Raised if val is not the same type as the Series or a container with elements of the same type as the Series, if val is a string or Strings type, or if val is not one of the supported types

value_counts(sort: bool = True) Series[source]

Return a Series containing counts of unique values.

Parameters:

sort (bool) – Whether to sort the result by count in descending order. If False, the order of the results is not guaranteed. Defaults to True.

Returns:

A Series where the index contains the unique values and the values are their counts in the original Series.

Return type:

Series
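A local sketch of the counting behaviour, using `collections.Counter` on a plain list. It is only a model of the description above: the order among equal counts is not specified by this page, and `value_counts_local` is a hypothetical name.

```python
from collections import Counter


def value_counts_local(values, sort=True):
    """Return (value, count) pairs, optionally sorted by count in descending order."""
    items = list(Counter(values).items())
    if sort:
        items.sort(key=lambda item: item[1], reverse=True)
    return items


print(value_counts_local(["a", "b", "a", "c", "a", "b"]))
# [('a', 3), ('b', 2), ('c', 1)]
```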

var()[source]
class arkouda.pandas.StringAccessor(series)[source]

Bases: Properties

Accessor for string operations on Arkouda Series.

Provides string-like methods such as .contains(), .startswith(), and .endswith() via the .str accessor, similar to pandas.

This accessor is automatically attached to Series objects that wrap arkouda.Strings or arkouda.Categorical values. It should not be instantiated directly.

Parameters:

series (arkouda.pandas.Series) – The Series object containing Strings or Categorical values.

Raises:

AttributeError – If the underlying Series values are not Strings or Categorical.

Examples

>>> import arkouda as ak
>>> from arkouda import Series
>>> s = Series(["apple", "banana", "apricot"])
>>> s.str.startswith("a")
0     True
1    False
2     True
dtype: bool
series
arkouda.pandas.compute_join_size(a: arkouda.numpy.pdarrayclass.pdarray, b: arkouda.numpy.pdarrayclass.pdarray) Tuple[int, int][source]

Compute the internal size of a hypothetical join between a and b. Returns both the number of elements and number of bytes required for the join.

arkouda.pandas.date_operators(cls)[source]

Add common datetime operation methods to a DatetimeAccessor class.

This class decorator dynamically attaches datetime operations (floor, ceil, round) to the given class using the _make_op helper.

Parameters:

cls (type) – The accessor class to decorate.

Returns:

The accessor class with datetime methods added.

Return type:

type

Notes

Used internally to implement the .dt accessor API.

arkouda.pandas.from_series(series: pandas.Series, dtype: type | str | None = None) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]

Convert a pandas Series to an Arkouda pdarray or Strings.

If dtype is not provided, the dtype is inferred from the pandas Series (using pandas dtype metadata). If dtype is provided, it is used as an override and normalized via Arkouda’s dtype resolution rules.

In addition to the core numeric and boolean types, this function supports datetime and timedelta Series of any resolution (ns, us, ms, etc.) by converting them to an int64 pdarray of nanoseconds.

Parameters:
  • series (pd.Series) – The pandas Series to convert.

  • dtype (Optional[Union[type, str]], optional) –

    Optional dtype override. This may be a Python type (e.g. bool), a NumPy scalar type (e.g. np.int64), or a dtype string.

    String-like spellings are normalized to Arkouda string dtype, including "object", "str", "string", "string[python]", and "string[pyarrow]".

Returns:

An Arkouda pdarray for numeric, boolean, datetime, or timedelta inputs, or an Arkouda Strings for string inputs.

Return type:

Union[pdarray, Strings]

Raises:

ValueError – Raised if the dtype cannot be interpreted or is unsupported for conversion.

Examples

>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd

Integers:

>>> np.random.seed(1701)
>>> ak.from_series(pd.Series(np.random.randint(0, 10, 5)))
array([4 3 3 5 0])
>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']), dtype=np.int64)
array([1 2 3 4 5])

Floats:

>>> np.random.seed(1701)
>>> ak.from_series(pd.Series(np.random.uniform(low=0.0, high=1.0, size=3)))
array([0.089433234324597599 0.1153776854774361 0.51874393620990389])

Booleans:

>>> np.random.seed(1864)
>>> ak.from_series(pd.Series(np.random.choice([True, False], size=5)))
array([True True True False False])

Strings (pandas dtype spellings normalized to Arkouda Strings):

>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string"))
array(['a', 'b', 'c', 'd', 'e'])
>>> ak.from_series(pd.Series(['a', 'b', 'c'], dtype="string[pyarrow]"))
array(['a', 'b', 'c'])

Datetime (any resolution is accepted and returned as int64 nanoseconds):

>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')])))
array([1514764800000000000 1514764800000000000])

Notes

Datetime and timedelta Series are converted to int64 nanoseconds.

String-like pandas dtypes (including object) are treated as string and converted to Arkouda Strings.
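The int64-nanosecond representation can be checked by hand with the standard library. The sketch assumes the timestamp is interpreted as UTC, which is consistent with the datetime output shown above; `to_epoch_nanoseconds` is a hypothetical helper, not an Arkouda function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanoseconds(moment):
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    # Integer division by one microsecond avoids floating-point rounding.
    microseconds = (moment - EPOCH) // timedelta(microseconds=1)
    return microseconds * 1000


print(to_epoch_nanoseconds(datetime(2018, 1, 1, tzinfo=timezone.utc)))
# 1514764800000000000
```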

arkouda.pandas.gen_ranges(starts, ends, stride=1, return_lengths=False)[source]

Generate a segmented array of variable-length, contiguous ranges between pairs of start- and end-points.

Parameters:
  • starts (pdarray, int64) – The start value of each range

  • ends (pdarray, int64) – The end value (exclusive) of each range

  • stride (int) – Difference between successive elements of each range

  • return_lengths (bool, optional) – Whether or not to return the lengths of each segment. Default False.

Returns:

  • segments (pdarray, int64) – The starting index of each range in the resulting array

  • ranges (pdarray, int64) – The actual ranges, flattened into a single array

  • lengths (pdarray, int64) – The lengths of each segment. Only returned if return_lengths=True.

Return type:

pdarray|int64, pdarray|int64, pdarray|int64
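The relationship between segments, ranges, and lengths can be illustrated with a local model built on Python's `range`. It assumes that stride behaves as it does in `range(start, end, stride)`; this is an assumption, not something this page confirms, and `gen_ranges_local` is a hypothetical name operating on plain lists.

```python
def gen_ranges_local(starts, ends, stride=1, return_lengths=False):
    """Build flattened ranges plus the offset at which each range begins."""
    segments, ranges, lengths = [], [], []
    for start, end in zip(starts, ends):
        segments.append(len(ranges))               # offset of this range in `ranges`
        values = list(range(start, end, stride))   # `end` is exclusive
        ranges.extend(values)
        lengths.append(len(values))
    if return_lengths:
        return segments, ranges, lengths
    return segments, ranges


segments, ranges, lengths = gen_ranges_local([0, 10, 20], [3, 12, 24], return_lengths=True)
print(segments)  # [0, 3, 5]
print(ranges)    # [0, 1, 2, 10, 11, 20, 21, 22, 23]
print(lengths)   # [3, 2, 4]
```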

arkouda.pandas.join_on_eq_with_dt(a1: arkouda.numpy.pdarrayclass.pdarray, a2: arkouda.numpy.pdarrayclass.pdarray, t1: arkouda.numpy.pdarrayclass.pdarray, t2: arkouda.numpy.pdarrayclass.pdarray, dt: int | numpy.int64, pred: str, result_limit: int | numpy.int64 = 1000) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Inner-join on equality between two integer arrays where the time-window predicate is also true.

Parameters:
  • a1 (pdarray) – Values to join (must be int64 dtype).

  • a2 (pdarray) – Values to join (must be int64 dtype).

  • t1 (pdarray) – Timestamps in milliseconds corresponding to the a1 pdarray.

  • t2 (pdarray) – Timestamps in milliseconds corresponding to the a2 pdarray.

  • dt (Union[int,np.int64]) – Time delta.

  • pred (str) – Time window predicate; one of ‘true_dt’, ‘abs_dt’, or ‘pos_dt’.

  • result_limit (Union[int,np.int64]) – Size limit for the returned result.

Returns:

  • result_array_one (pdarray, int64) – a1 indices where a1 == a2

  • result_array_two (pdarray, int64) – a2 indices where a2 == a1

Return type:

Tuple[pdarray, pdarray]

Raises:
  • TypeError – Raised if a1, a2, t1, or t2 is not a pdarray, or if dt or result_limit is not an int

  • ValueError – Raised if the dtype of a1, a2, t1, or t2 is not int64, if pred is not ‘true_dt’, ‘abs_dt’, or ‘pos_dt’, or if result_limit is < 0
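The join can be pictured with a brute-force local model. This page does not define the predicates, so the meanings used below are assumptions: ‘true_dt’ ignores time, ‘abs_dt’ requires |t1 - t2| <= dt, and ‘pos_dt’ requires 0 <= t2 - t1 <= dt. The result order and the way result_limit truncates are also assumptions. `join_on_eq_with_dt_local` is a hypothetical name working on plain lists, not the server-side algorithm.

```python
def join_on_eq_with_dt_local(a1, a2, t1, t2, dt, pred, result_limit=1000):
    """Brute-force model: pair indices where values match and the time predicate holds."""
    # Assumed predicate meanings; the reference page does not spell them out.
    predicates = {
        "true_dt": lambda x, y: True,
        "abs_dt": lambda x, y: abs(x - y) <= dt,
        "pos_dt": lambda x, y: 0 <= y - x <= dt,
    }
    if pred not in predicates:
        raise ValueError(f"unsupported predicate: {pred!r}")
    if result_limit < 0:
        raise ValueError("result_limit must be >= 0")
    in_window = predicates[pred]
    left, right = [], []
    for i, (v1, time1) in enumerate(zip(a1, t1)):
        for j, (v2, time2) in enumerate(zip(a2, t2)):
            if len(left) >= result_limit:
                return left, right  # assumed truncation behaviour
            if v1 == v2 and in_window(time1, time2):
                left.append(i)
                right.append(j)
    return left, right


a1, t1 = [1, 2, 3], [100, 200, 300]
a2, t2 = [3, 1, 1], [310, 90, 150]
print(join_on_eq_with_dt_local(a1, a2, t1, t2, 20, "abs_dt"))   # ([0, 2], [1, 0])
print(join_on_eq_with_dt_local(a1, a2, t1, t2, 20, "pos_dt"))   # ([2], [0])
print(join_on_eq_with_dt_local(a1, a2, t1, t2, 20, "true_dt"))  # ([0, 0, 2], [1, 2, 0])
```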

arkouda.pandas.string_operators(cls)[source]

Add common string operation methods to a StringAccessor class.

This class decorator dynamically attaches string operations (contains, startswith, endswith) to the given class using the _make_op helper.

Parameters:

cls (type) – The accessor class to decorate.

Returns:

The accessor class with string methods added.

Return type:

type

Notes

Used internally to implement the .str accessor API.
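The class-decorator pattern described here can be sketched in plain Python. The `_make_op` below is a hypothetical stand-in whose signature is assumed for illustration; it is not Arkouda's helper, and the accessor works on a local list of Python strings. Only startswith and endswith are attached, because Python's `str` has no `contains` method.

```python
def _make_op(name):
    """Hypothetical stand-in: build a method that applies the str method `name` element-wise."""
    def op(self, *args, **kwargs):
        return [getattr(value, name)(*args, **kwargs) for value in self.values]
    op.__name__ = name
    return op


def string_operators_local(cls):
    """Class decorator that attaches the generated methods to `cls` and returns it."""
    for name in ("startswith", "endswith"):
        setattr(cls, name, _make_op(name))
    return cls


@string_operators_local
class LocalStringAccessor:
    def __init__(self, values):
        self.values = list(values)


accessor = LocalStringAccessor(["apple", "banana", "apricot"])
print(accessor.startswith("a"))  # [True, False, True]
print(accessor.endswith("a"))    # [False, True, False]
```

Generating the methods in a loop keeps the accessor class itself small; adding an operation only means extending the tuple of names.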