arkouda.pandas¶
Submodules¶
- arkouda.pandas.accessor
- arkouda.pandas.categorical
- arkouda.pandas.conversion
- arkouda.pandas.dataframe
- arkouda.pandas.extension
- arkouda.pandas.groupbyclass
- arkouda.pandas.index
- arkouda.pandas.io
- arkouda.pandas.io_util
- arkouda.pandas.join
- arkouda.pandas.match
- arkouda.pandas.matcher
- arkouda.pandas.row
- arkouda.pandas.series
- arkouda.pandas.typing
Attributes¶
Classes¶
- Arkouda-backed numeric/bool pandas ExtensionArray.
- Arkouda-backed arbitrary-precision integer dtype.
- Arkouda-backed boolean dtype.
- Arkouda-backed categorical pandas ExtensionArray.
- Arkouda-backed categorical dtype.
- Arkouda DataFrame accessor.
- Arkouda-backed 64-bit floating-point dtype.
- Arkouda-backed index accessor for pandas
- Extension dtype for Arkouda-backed 64-bit integers.
- Arkouda-backed string pandas ExtensionArray.
- Arkouda-backed string dtype.
- Arkouda-backed unsigned 64-bit integer dtype.
- Arkouda-backed unsigned 8-bit integer dtype.
- Descriptor for caching namespace-based accessors.
- Accessor for datetime-like operations on Arkouda Series.
- Base class for accessor implementations in Arkouda.
- Dictionary-like representation of a single row in an Arkouda
- One-dimensional Arkouda array with axis labels.
- Accessor for string operations on Arkouda Series.
Functions¶
- Compute the internal size of a hypothetical join between a and b. Returns
- Add common datetime operation methods to a DatetimeAccessor class.
- Convert a pandas
- Generate a segmented array of variable-length, contiguous ranges between pairs of
- Inner-join on equality between two integer arrays where the time-window predicate is also true.
- Add common string operation methods to a StringAccessor class.
Package Contents¶
- class arkouda.pandas.ArkoudaArray(data: arkouda.numpy.pdarrayclass.pdarray | numpy.ndarray | Sequence[Any] | ArkoudaArray, dtype: Any = None, copy: bool = False)[source]¶
Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray
Arkouda-backed numeric/bool pandas ExtensionArray.
Wraps or converts supported inputs into an Arkouda pdarray to serve as the backing store. Ensures the underlying array is 1-D and lives on the Arkouda server.
- Parameters:
data (pdarray | ndarray | Sequence[Any] | ArkoudaArray) – Input to wrap or convert.
- If an Arkouda pdarray, it is used directly unless dtype is given or copy=True, in which case a new array is created via ak.array.
- If a NumPy array, it is transferred to Arkouda via ak.array.
- If a Python sequence, it is converted to NumPy then to Arkouda.
- If another ArkoudaArray, its underlying pdarray is reused.
dtype (Any, optional) – Desired dtype to cast to (NumPy dtype or Arkouda dtype string). If omitted, dtype is inferred from data.
copy (bool) – If True, attempt to copy the underlying data when converting/wrapping. Default is False.
- Raises:
TypeError – If data cannot be interpreted as an Arkouda array-like object.
ValueError – If the resulting array is not one-dimensional.
- default_fill_value¶
Sentinel used when filling missing values (default: -1).
- Type:
int
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> ArkoudaArray(ak.arange(5))
ArkoudaArray([0 1 2 3 4])
>>> ArkoudaArray([10, 20, 30])
ArkoudaArray([10 20 30])
- all(axis=0, skipna=True, **kwargs)[source]¶
Return whether all elements are True.
This is mainly to support pandas’ BaseExtensionArray.equals, which calls .all() on the result of a boolean expression.
- any(axis=0, skipna=True, **kwargs)[source]¶
Return whether any element is True.
Added for symmetry with .all() and to support potential pandas boolean-reduction calls.
- astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]¶
- astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
- astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]
Cast the array to a specified dtype.
Casting rules:
- If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the array values.
- Otherwise, the target dtype is normalized using Arkouda’s dtype resolution rules.
- If the normalized dtype matches the current dtype and copy=False, returns self.
- In all other cases, casts the underlying Arkouda array to the target dtype and returns an Arkouda-backed ArkoudaExtensionArray.
- Parameters:
dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, Arkouda dtype, or any dtype-like object accepted by Arkouda.
copy (bool) – Whether to force a copy when the target dtype matches the current dtype. Default is True.
- Returns:
The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.
- Return type:
Union[ExtensionArray, NDArray[Any]]
Examples
Basic numeric casting returns an Arkouda-backed array:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> a = ArkoudaArray(ak.array([1, 2, 3], dtype="int64"))
>>> a.astype("float64").to_ndarray()
array([1., 2., 3.])
Casting to the same dtype with copy=False returns the original object:
>>> b = a.astype("int64", copy=False)
>>> b is a
True
Forcing a copy when the dtype is unchanged returns a new array:
>>> c = a.astype("int64", copy=True)
>>> c is a
False
>>> c.to_ndarray()
array([1, 2, 3])
Casting to object materializes the data to a NumPy array:
>>> a.astype(object)
array([1, 2, 3], dtype=object)
NumPy and pandas dtype objects are also accepted:
>>> import numpy as np
>>> a.astype(np.dtype("bool")).to_ndarray()
array([ True,  True,  True])
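The order in which the casting rules above are checked can be sketched in plain Python. This is only an illustration of the documented decision order, not Arkouda's implementation: ToyArray, _normalize, _ALIASES, and _CASTERS are hypothetical names, and a Python list stands in for the server-side array.

```python
from typing import Any, List, Union

# Hypothetical alias and caster tables; Arkouda's real dtype resolution is richer.
_ALIASES = {int: "int64", float: "float64", bool: "bool"}
_CASTERS = {"int64": int, "float64": float, "bool": bool}


def _normalize(dtype: Any) -> str:
    """Map a dtype-like object to a canonical name (illustrative only)."""
    return _ALIASES.get(dtype, dtype)


class ToyArray:
    """Hypothetical stand-in for an Arkouda-backed array; data lives in a list."""

    def __init__(self, values: List[Any], dtype: str) -> None:
        self.values = list(values)
        self.dtype = dtype

    def astype(self, dtype: Any, copy: bool = True) -> Union["ToyArray", List[Any]]:
        # Rule 1: an object request materializes to a plain client-side container.
        if dtype is object or dtype == "object":
            return list(self.values)
        # Rule 2: normalize the requested dtype to a canonical name.
        target = _normalize(dtype)
        # Rule 3: same dtype and copy=False returns self unchanged.
        if target == self.dtype and not copy:
            return self
        # Rule 4: otherwise cast every element and wrap the result in a new array.
        caster = _CASTERS[target]
        return ToyArray([caster(v) for v in self.values], target)


a = ToyArray([1, 2, 3], "int64")
print(a.astype("int64", copy=False) is a)  # True
print(a.astype("int64", copy=True) is a)   # False
print(a.astype("float64").values)          # [1.0, 2.0, 3.0]
```

Checking the object case first matters: it is the only branch that leaves the array type, so it has to be handled before any dtype normalization.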
- default_fill_value: int = -1¶
- property dtype¶
An instance of ExtensionDtype.
See also
api.extensions.ExtensionDtype: Base class for extension dtypes.
api.extensions.ExtensionArray: Base class for extension array types.
api.extensions.ExtensionArray.dtype: The dtype of an ExtensionArray.
Series.dtype: The dtype of a Series.
DataFrame.dtype: The dtype of a DataFrame.
Examples
>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
- equals(other)[source]¶
Return if another array is equivalent to this array.
Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality).
- Parameters:
other (ExtensionArray) – Array to compare to this Array.
- Returns:
Whether the arrays are equivalent.
- Return type:
boolean
See also
numpy.array_equal: Equivalent method for numpy array.
Series.equals: Equivalent method for Series.
DataFrame.equals: Equivalent method for DataFrame.
Examples
>>> arr1 = pd.array([1, 2, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
True
>>> arr1 = pd.array([1, 3, np.nan])
>>> arr2 = pd.array([1, 2, np.nan])
>>> arr1.equals(arr2)
False
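The rule that missing values in the same position compare as equal can be sketched in plain Python. This is a simplified illustration, not the pandas or Arkouda implementation: arrays_equivalent and _is_missing are hypothetical helpers, the dtype check is omitted, and treating None and NaN as missing is an assumption of the sketch.

```python
import math
from typing import Any, Sequence


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def arrays_equivalent(left: Sequence[Any], right: Sequence[Any]) -> bool:
    """Compare element-wise, counting missing values in the same slot as equal."""
    if len(left) != len(right):
        return False
    for x, y in zip(left, right):
        if _is_missing(x) and _is_missing(y):
            continue  # both missing at this position: considered equal
        if x != y:
            return False
    return True


nan = float("nan")
print(arrays_equivalent([1, 2, nan], [1, 2, nan]))  # True
print(arrays_equivalent([1, 3, nan], [1, 2, nan]))  # False
print(nan == nan)  # False: ordinary equality treats NaN as unequal to itself
```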
- isna() numpy.ndarray[source]¶
Return a boolean mask indicating missing values.
This method implements the pandas ExtensionArray.isna contract and always returns a NumPy ndarray of dtype bool with the same length as the array.
- Returns:
A boolean mask where True marks elements considered missing.
- Return type:
np.ndarray
- Raises:
TypeError – If the underlying data buffer does not support missing-value detection or cannot produce a boolean mask.
- property nbytes¶
The number of bytes needed to store this object in memory.
See also
ExtensionArray.shape: Return a tuple of the array dimensions.
ExtensionArray.size: The number of elements in the array.
Examples
>>> pd.array([1, 2, 3]).nbytes
27
- value_counts(dropna: bool = True) pandas.Series[source]¶
Return counts of unique values as a pandas Series.
This method computes the frequency of each distinct value in the underlying Arkouda array and returns the result as a pandas Series, with the unique values as the index and their counts as the data.
- Parameters:
dropna (bool) – Whether to exclude missing values. Currently, missing-value handling is supported only for floating-point data, where NaN values are treated as missing. Default is True.
- Returns:
A Series containing the counts of unique values. The index is an ArkoudaArray of unique values, and the values are an ArkoudaArray of counts.
- Return type:
pd.Series
Notes
- Only dropna=True is supported.
- The following pandas options are not yet implemented: normalize, sort, and bins.
- Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>>
>>> a = ArkoudaArray(ak.array([1, 2, 1, 3, 2, 1]))
>>> a.value_counts()
1    3
2    2
3    1
dtype: int64
Floating-point data with NaN values:
>>> b = ArkoudaArray(ak.array([1.0, 2.0, float("nan"), 1.0]))
>>> b.value_counts()
1.0    2
2.0    1
dtype: int64
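The counting semantics shown above (NaN treated as missing and dropped) can be reproduced client-side in plain Python. This is a sketch for intuition only: count_non_missing is a hypothetical helper, and Arkouda performs the real computation on the server.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable


def count_non_missing(values: Iterable[Any]) -> Dict[Any, int]:
    """Count each distinct value, skipping float NaN (the dropna=True behavior)."""
    kept = (v for v in values if not (isinstance(v, float) and math.isnan(v)))
    return dict(Counter(kept))


print(count_non_missing([1, 2, 1, 3, 2, 1]))             # {1: 3, 2: 2, 3: 1}
print(count_non_missing([1.0, 2.0, float("nan"), 1.0]))  # {1.0: 2, 2.0: 1}
```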
- arkouda.pandas.ArkoudaArrayLike¶
- class arkouda.pandas.ArkoudaBigintDtype[source]¶
Bases: _ArkoudaBaseDtype
Arkouda-backed arbitrary-precision integer dtype.
This dtype integrates Arkouda’s server-backed pdarray<bigint> with the pandas ExtensionArray interface via ArkoudaArray. It enables pandas objects (Series, DataFrame) to hold and operate on very large integers that exceed 64-bit precision, while keeping the data distributed on the Arkouda server.
- construct_array_type()[source]¶
Returns the ArkoudaArray class used for storage.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'O'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'bigint'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaBoolDtype[source]¶
Bases: _ArkoudaBaseDtype
Arkouda-backed boolean dtype.
This dtype integrates Arkouda’s server-backed pdarray<bool> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate distributed boolean arrays without materializing them on the client.
- construct_array_type()[source]¶
Returns the ArkoudaArray class used for storage.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'b'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = False¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'bool_'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaCategorical(data: arkouda.pandas.categorical.Categorical | ArkoudaCategorical | numpy.ndarray | Sequence[Any])[source]¶
Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray
Arkouda-backed categorical pandas ExtensionArray.
Ensures the underlying data is an Arkouda Categorical. Accepts an existing Categorical or converts from Python/NumPy sequences of labels.
- Parameters:
data (Categorical | ArkoudaCategorical | ndarray | Sequence[Any]) – Input to wrap or convert.
- If Categorical, used directly.
- If another ArkoudaCategorical, its backing object is reused.
- If list/tuple/ndarray, converted via ak.Categorical(ak.array(data)).
- Raises:
TypeError – If data cannot be converted to Arkouda Categorical.
- astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]¶
- astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
- astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]
Cast to a specified dtype.
- If dtype is categorical (pandas category / CategoricalDtype / ArkoudaCategoricalDtype), returns an Arkouda-backed ArkoudaCategorical (optionally copied).
- If dtype requests object, returns a NumPy ndarray of dtype object containing the category labels (materialized to the client).
- If dtype requests a string dtype, returns an Arkouda-backed ArkoudaStringArray containing the labels as strings.
- Otherwise, casts the labels (as strings) to the requested dtype and returns an Arkouda-backed ExtensionArray.
- Parameters:
dtype (Any) – Target dtype.
copy (bool) – Whether to force a copy when possible. If categorical-to-categorical and copy=True, attempts to copy the underlying Arkouda Categorical (if supported). Default is True.
- Returns:
The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.
- Return type:
Union[ExtensionArray, NDArray[Any]]
Examples
Casting to category returns an Arkouda-backed categorical array:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>> c = ArkoudaCategorical(ak.Categorical(ak.array(["x", "y", "x"])))
>>> out = c.astype("category")
>>> out is c
False
Forcing a copy when casting to the same categorical dtype returns a new array:
>>> out2 = c.astype("category", copy=True)
>>> out2 is c
False
>>> out2.to_ndarray()
array(['x', 'y', 'x'], dtype='<U...')
Casting to object materializes the category labels to a NumPy object array:
>>> c.astype(object)
array(['x', 'y', 'x'], dtype=object)
Casting to a string dtype returns an Arkouda-backed string array of labels:
>>> s = c.astype("string")
>>> s.to_ndarray()
array(['x', 'y', 'x'], dtype='<U1')
Casting to another dtype casts the labels-as-strings and returns an Arkouda-backed array:
>>> c_num = ArkoudaCategorical(ak.Categorical(ak.array(["1", "2", "3"])))
>>> a = c_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])
- property dtype¶
An instance of ExtensionDtype.
See also
api.extensions.ExtensionDtype: Base class for extension dtypes.
api.extensions.ExtensionArray: Base class for extension array types.
api.extensions.ExtensionArray.dtype: The dtype of an ExtensionArray.
Series.dtype: The dtype of a Series.
DataFrame.dtype: The dtype of a DataFrame.
Examples
>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
- isna() numpy.ndarray[source]¶
Return a boolean mask indicating missing values.
This implements the pandas ExtensionArray.isna contract and returns a NumPy ndarray[bool] of the same length as this categorical array.
- Returns:
Boolean mask where True indicates a missing value.
- Return type:
np.ndarray
- Raises:
TypeError – If the underlying categorical cannot expose its codes or if missing detection is unsupported.
- value_counts(dropna: bool = True) pandas.Series[source]¶
Return counts of categories as a pandas Series.
This method computes category frequencies from the underlying Arkouda Categorical and returns them as a pandas Series, where the index contains the category labels and the values contain the corresponding counts.
- Parameters:
dropna (bool) – Whether to drop missing values from the result. When True, the result is filtered using the categorical’s na_value. When False, all categories returned by the underlying computation are included. Default is True.
- Returns:
A Series containing category counts. The index is an ArkoudaStringArray of category labels and the values are an ArkoudaArray of counts.
- Return type:
pd.Series
Notes
- The result is computed server-side in Arkouda; only the (typically small) output of categories and counts is materialized for the pandas Series.
- This method does not yet support pandas options such as normalize, sort, or bins.
- The handling of missing values depends on the Arkouda Categorical definition of na_value.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaCategorical
>>>
>>> a = ArkoudaCategorical(["a", "b", "a", "c", "b", "a"])
>>> a.value_counts()
a    3
b    2
c    1
dtype: int64
- class arkouda.pandas.ArkoudaCategoricalDtype[source]¶
Bases: _ArkoudaBaseDtype
Arkouda-backed categorical dtype.
This dtype integrates Arkouda’s distributed Categorical type with the pandas ExtensionArray interface via ArkoudaCategorical. It enables pandas objects (Series, DataFrame) to hold categorical data stored and processed on the Arkouda server, while exposing familiar pandas APIs.
- construct_array_type()[source]¶
Returns the ArkoudaCategorical used as the storage class.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaCategorical class associated with this dtype.
- Return type:
- kind = 'O'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'category'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaDataFrameAccessor(pandas_obj)[source]¶
Arkouda DataFrame accessor.
Allows df.ak access to Arkouda-backed operations.
- collect() pandas.DataFrame[source]¶
Materialize an Arkouda-backed pandas DataFrame into a NumPy-backed one.
This operation retrieves each Arkouda-backed column from the server using to_ndarray() and constructs a standard pandas DataFrame whose columns are plain NumPy ndarray objects. The returned DataFrame has no dependency on Arkouda.
- Returns:
A pandas DataFrame with NumPy-backed columns.
- Return type:
pd_DataFrame
Examples
Converting an Arkouda-backed DataFrame into a NumPy-backed one:
>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaDataFrameAccessor
Create a pandas DataFrame and convert it to Arkouda-backed form:
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()
akdf is still a pandas DataFrame, but its columns live on Arkouda:
>>> type(akdf["x"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
Now fully materialize it to local NumPy arrays:
>>> collected = akdf.ak.collect()
>>> collected
   x  y
0  1  a
1  2  b
2  3  c
The columns are now NumPy arrays:
>>> type(collected["x"].values)
<class 'numpy.ndarray'>
- static from_ak_legacy(akdf: arkouda.pandas.dataframe.DataFrame) pandas.DataFrame[source]¶
Convert a legacy Arkouda DataFrame into a pandas DataFrame backed by Arkouda ExtensionArrays.
This is the zero-copy-ish counterpart to to_ak_legacy(). Instead of materializing columns into NumPy arrays, this function wraps each underlying Arkouda server-side array in the appropriate ArkoudaExtensionArray subclass (ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical). The resulting pandas DataFrame therefore keeps all data on the Arkouda server, enabling scalable operations without transferring data to the Python client.
- Parameters:
akdf (ak_DataFrame) – A legacy Arkouda DataFrame (arkouda.pandas.dataframe.DataFrame) whose columns are Arkouda objects (pdarray, Strings, or Categorical).
- Returns:
A pandas DataFrame in which each column is an Arkouda-backed ExtensionArray, typically one of ArkoudaArray, ArkoudaStringArray, or ArkoudaCategorical. No materialization to NumPy occurs. All column data remain server-resident.
- Return type:
pd_DataFrame
Notes
- This function performs a zero-copy conversion for the underlying Arkouda arrays (server-side). Only lightweight Python wrappers are created.
- The resulting pandas DataFrame can interoperate with most pandas APIs that support extension arrays.
- Round-tripping through to_ak_legacy() and from_ak_legacy() preserves Arkouda semantics.
Examples
Basic conversion¶
>>> import pandas as pd
>>> import arkouda as ak
>>> akdf = ak.DataFrame({"a": ak.arange(5), "b": ak.array([10,11,12,13,14])})
>>> pdf = pd.DataFrame.ak.from_ak_legacy(akdf)
>>> pdf
   a   b
0  0  10
1  1  11
2  2  12
3  3  13
4  4  14
Columns stay Arkouda-backed¶
>>> type(pdf["a"].array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> pdf["a"].array._data
array([0 1 2 3 4])
No NumPy materialization occurs¶
>>> pdf["a"].values  # pandas always materializes .values
ArkoudaArray([0 1 2 3 4])
But the underlying column is still Arkouda:
>>> pdf["a"].array._data
array([0 1 2 3 4])
Categorical and Strings columns work as well¶
>>> akdf2 = ak.DataFrame({
...     "s": ak.array(["a","b","a"]),
...     "c": ak.Categorical(ak.array(["e","f","g"]))
... })
>>> pdf2 = pd.DataFrame.ak.from_ak_legacy(akdf2)
>>> type(pdf2["s"].array)
<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>
>>> type(pdf2["c"].array)
<class 'arkouda.pandas.extension._arkouda_categorical_array.ArkoudaCategorical'>
- merge(right: pandas.DataFrame, on: str | List[str] | None = None, left_on: str | List[str] | None = None, right_on: str | List[str] | None = None, how: str = 'inner', left_suffix: str = '_x', right_suffix: str = '_y', convert_ints: bool = True, sort: bool = True) pandas.DataFrame[source]¶
Merge two Arkouda-backed pandas DataFrames using Arkouda’s join.
- Parameters:
right (pd.DataFrame) – Right-hand DataFrame to merge with self._obj. All columns must be Arkouda-backed ExtensionArrays.
on (Optional[Union[str, List[str]]]) – Column name(s) to join on. Must be present in both left and right DataFrames. If not provided and neither left_on nor right_on is set, the intersection of column names in left and right is used. Default is None.
left_on (Optional[Union[str, List[str]]]) – Column name(s) from the left DataFrame to use as join keys. Must be used together with right_on. If provided, on is ignored for the left side. Default is None.
right_on (Optional[Union[str, List[str]]]) – Column name(s) from the right DataFrame to use as join keys. Must be used together with left_on. If provided, on is ignored for the right side. Default is None.
how (str) – Type of merge to be performed. One of 'left', 'right', 'inner', or 'outer'. Default is 'inner'.
left_suffix (str) – Suffix to apply to overlapping column names from the left frame that are not part of the join keys. Default is '_x'.
right_suffix (str) – Suffix to apply to overlapping column names from the right frame that are not part of the join keys. Default is '_y'.
convert_ints (bool) – Whether to allow Arkouda to upcast integer columns as needed (for example, to accommodate missing values) during the merge. Default is True.
sort (bool) – Whether to sort the join keys in the output. Default is True.
- Returns:
A pandas DataFrame whose columns are ArkoudaArray ExtensionArrays. All column data remain on the Arkouda server.
- Return type:
pd.DataFrame
- Raises:
TypeError – If right is not a pandas.DataFrame or if any column in the left or right DataFrame is not Arkouda-backed.
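How on, left_on, right_on, and the suffixes interact can be sketched on plain column-name lists. This is an illustration of the documented rules, not the accessor's code: resolve_join_keys and suffix_overlaps are hypothetical helpers, and both raising ValueError for a lone left_on/right_on and the output column order are assumptions of the sketch.

```python
from typing import List, Optional, Sequence, Tuple, Union

KeySpec = Optional[Union[str, Sequence[str]]]


def _as_list(keys: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single column name or a sequence of names."""
    return [keys] if isinstance(keys, str) else list(keys)


def resolve_join_keys(
    left_cols: Sequence[str],
    right_cols: Sequence[str],
    on: KeySpec = None,
    left_on: KeySpec = None,
    right_on: KeySpec = None,
) -> Tuple[List[str], List[str]]:
    """Return (left_keys, right_keys) following the documented precedence."""
    if left_on is not None or right_on is not None:
        if left_on is None or right_on is None:
            # Assumption: a lone left_on/right_on is rejected in this sketch.
            raise ValueError("left_on and right_on must be used together")
        return _as_list(left_on), _as_list(right_on)  # `on` is ignored here
    if on is not None:
        keys = _as_list(on)
    else:
        # Default: intersection of column names, in left-frame order.
        keys = [c for c in left_cols if c in right_cols]
    return keys, list(keys)


def suffix_overlaps(
    left_cols: Sequence[str],
    right_cols: Sequence[str],
    keys: Sequence[str],
    left_suffix: str = "_x",
    right_suffix: str = "_y",
) -> List[str]:
    """Name the output columns, suffixing non-key columns present in both frames."""
    overlap = {c for c in left_cols if c in right_cols and c not in keys}
    left_out = [c + left_suffix if c in overlap else c for c in left_cols]
    right_out = [
        c + right_suffix if c in overlap else c
        for c in right_cols
        if c not in keys
    ]
    return left_out + right_out


left, right = ["id", "v", "a"], ["id", "v", "b"]
left_keys, right_keys = resolve_join_keys(left, right, on="id")
print(left_keys, right_keys)                    # ['id'] ['id']
print(suffix_overlaps(left, right, left_keys))  # ['id', 'v_x', 'a', 'v_y', 'b']
print(resolve_join_keys(left, right))           # (['id', 'v'], ['id', 'v'])
```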
- to_ak() pandas.DataFrame[source]¶
Convert this pandas DataFrame to an Arkouda-backed pandas DataFrame.
Each column of the original pandas DataFrame is materialized to the Arkouda server via ak.array() and wrapped in an ArkoudaArray ExtensionArray. The result is still a pandas DataFrame, but all column data reside on the Arkouda server and behave according to the Arkouda ExtensionArray API.
This method does not return a legacy ak_DataFrame. For that (server-side DataFrame structure), use to_ak_legacy().
- Returns:
A pandas DataFrame whose columns are Arkouda-backed ArkoudaArray objects.
- Return type:
pd_DataFrame
Examples
Convert a plain pandas DataFrame to an Arkouda-backed one:
>>> import pandas as pd
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaArray
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
>>> akdf = df.ak.to_ak()
>>> type(akdf)
<class 'pandas...DataFrame'>
The columns are now Arkouda ExtensionArrays:
>>> isinstance(akdf["x"].array, ArkoudaArray)
True
>>> akdf["x"].tolist()
[np.int64(1), np.int64(2), np.int64(3)]
Arkouda operations work directly on the columns:
>>> akdf["x"].array._data + 10
array([11 12 13])
Converting back to a NumPy-backed DataFrame:
>>> akdf_numpy = akdf.ak.collect()
>>> akdf_numpy
   x  y
0  1  a
1  2  b
2  3  c
- to_ak_legacy() arkouda.pandas.dataframe.DataFrame[source]¶
Convert this pandas DataFrame into the legacy arkouda.DataFrame.
This method performs a materializing conversion of a pandas DataFrame into the legacy Arkouda DataFrame structure. Every column is converted to Arkouda server-side data:
- Python / NumPy numeric and boolean arrays become pdarray.
- String columns become Arkouda string arrays (Strings).
- Pandas categoricals become Arkouda Categorical objects.
- The result is a legacy ak_DataFrame whose columns all reside on the Arkouda server.
This differs from to_ak(), which creates Arkouda-backed ExtensionArrays but retains a pandas.DataFrame structure.
- Returns:
The legacy Arkouda DataFrame with all columns materialized onto the Arkouda server.
- Return type:
ak_DataFrame
Examples
Convert a plain pandas DataFrame to a legacy Arkouda DataFrame:
>>> import pandas as pd
>>> import arkouda as ak
>>> df = pd.DataFrame({
...     "i": [1, 2, 3],
...     "s": ["a", "b", "c"],
...     "c": pd.Series(["low", "low", "high"], dtype="category"),
... })
>>> akdf = df.ak.to_ak_legacy()
>>> type(akdf)
<class 'arkouda.pandas.dataframe.DataFrame'>
Columns have the appropriate Arkouda types:
>>> from arkouda.numpy.pdarrayclass import pdarray
>>> from arkouda.numpy.strings import Strings
>>> from arkouda.pandas.categorical import Categorical
>>> isinstance(akdf["i"], pdarray)
True
>>> isinstance(akdf["s"], Strings)
True
>>> isinstance(akdf["c"], Categorical)
True
Values round-trip through the conversion:
>>> akdf["i"].tolist()
[1, 2, 3]
- class arkouda.pandas.ArkoudaFloat64Dtype[source]¶
Bases: _ArkoudaBaseDtype
Arkouda-backed 64-bit floating-point dtype.
This dtype integrates Arkouda’s server-backed pdarray<float64> with the pandas ExtensionArray interface via ArkoudaArray. It allows pandas objects (Series, DataFrame) to store and manipulate large distributed float64 arrays without materializing them on the client.
- construct_array_type()[source]¶
Returns the ArkoudaArray class used for storage.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'f'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'float64'¶
A string identifying the data type.
Will be used for display in, e.g. Series.dtype
- type¶
The scalar type for the array, e.g. int
It’s expected ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaIndexAccessor(pandas_obj: pandas.Index | pandas.MultiIndex)[source]¶
Arkouda-backed index accessor for pandas Index and MultiIndex.
This accessor provides methods for converting between:
- NumPy-backed pandas indexes
- pandas indexes backed by ArkoudaExtensionArray (zero-copy EA mode)
- legacy Arkouda ak.Index and ak.MultiIndex objects
The .ak namespace mirrors the DataFrame accessor, providing a consistent interface for distributed index operations. All conversions avoid unnecessary NumPy materialization unless explicitly requested via collect().
- Parameters:
pandas_obj (Union[pd.Index, pd.MultiIndex]) – The pandas Index or MultiIndex instance that this accessor wraps.
Notes
- to_ak → pandas object, Arkouda-backed (ExtensionArrays).
- to_ak_legacy → legacy Arkouda index objects.
- collect → NumPy-backed pandas object.
- is_arkouda → reports whether the index is Arkouda-backed.
Examples
Basic single-level Index conversion:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="vals")
Convert to Arkouda-backed:
>>> ak_idx = idx.ak.to_ak()
>>> ak_idx.ak.is_arkouda
True
Materialize back:
>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True
Convert to legacy Arkouda:
>>> ak_legacy = idx.ak.to_ak_legacy()
>>> type(ak_legacy)
<class 'arkouda.pandas.index.Index'>
MultiIndex conversion:
>>> arrays = [[1, 1, 2], ["red", "blue", "red"]]
>>> midx = pd.MultiIndex.from_arrays(arrays, names=["num", "color"])
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
- collect() pandas.Index | pandas.MultiIndex[source]¶
Materialize this Index or MultiIndex back to a plain NumPy-backed pandas index.
- Returns:
An Index whose underlying data are plain NumPy arrays.
- Return type:
Union[pd.Index, pd.MultiIndex]
- Raises:
TypeError – If the index is Arkouda-backed but does not expose the expected _data attribute, or if the index type is unsupported.
Examples
Single-level Index round-trip:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([1, 2, 3], name="x")
>>> ak_idx = idx.ak.to_ak()
>>> np_idx = ak_idx.ak.collect()
>>> np_idx
Index([1, 2, 3], dtype='int64', name='x')
>>> np_idx.equals(idx)
True
Behavior when already NumPy-backed (no-op except shallow copy):
>>> plain = pd.Index([10, 20, 30])
>>> plain2 = plain.ak.collect()
>>> plain2.equals(plain)
True
Verifying that Arkouda-backed values materialize to NumPy:
>>> ak_idx = pd.Index([5, 6, 7]).ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
>>> out = ak_idx.ak.collect()
>>> type(out.array)
<class 'pandas...NumpyExtensionArray'>
- concat(other: pandas.Index | pandas.MultiIndex) pandas.Index | pandas.MultiIndex[source]¶
Concatenate this index with another Arkouda-backed index.
Both self._obj and other must be convertible to legacy Arkouda ak_Index / ak_MultiIndex. The concatenation is performed in Arkouda and the result is wrapped back into an Arkouda-backed pandas Index or MultiIndex.
- Parameters:
other (Union[pd.Index, pd.MultiIndex]) – The other index to concatenate with self._obj. It must be a pandas.Index or pandas.MultiIndex.
- Returns:
A pandas Index or MultiIndex backed by Arkouda, containing the concatenated values from self._obj and other.
- Return type:
Union[pd.Index, pd.MultiIndex]
- Raises:
TypeError – If other is not a pandas.Index or pandas.MultiIndex.
- static from_ak_legacy(akidx: arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex) pandas.Index | pandas.MultiIndex[source]¶
Convert a legacy Arkouda ak.Index or ak.MultiIndex into a pandas Index/MultiIndex backed by Arkouda ExtensionArrays.
This is the index analogue of df.ak.from_ak_legacy_ea(): it performs a zero-copy-style wrapping of Arkouda server-side arrays into ArkoudaExtensionArray objects, producing a pandas Index or MultiIndex whose levels remain distributed on the Arkouda server.
No materialization to NumPy occurs.
- Parameters:
akidx (Union[ak_Index, ak_MultiIndex]) – The legacy Arkouda Index or MultiIndex to wrap.
- Returns:
A pandas index object whose underlying data are ArkoudaExtensionArray instances referencing the Arkouda server-side arrays.
- Return type:
Union[pd.Index, pd.MultiIndex]
Notes
- ak.Index → pd.Index with Arkouda-backed values.
- ak.MultiIndex → pd.MultiIndex where each level is backed by an ArkoudaExtensionArray.
This function does not validate whether the input is already wrapped; callers should ensure the argument is a legacy Arkouda index object.
Examples
>>> import arkouda as ak
>>> import pandas as pd
Wrap a legacy ak.Index into a pandas Index without copying:
>>> ak_idx = ak.Index(ak.arange(5))
>>> pd_idx = pd.Index.ak.from_ak_legacy(ak_idx)
>>> pd_idx
Index([0, 1, 2, 3, 4], dtype='int64')
The resulting index stores its values on the Arkouda server:
>>> type(pd_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
MultiIndex example:
>>> ak_lvl1 = ak.array(['a', 'a', 'b', 'b'])
>>> ak_lvl2 = ak.array([1, 2, 1, 2])
>>> ak_mi = ak.MultiIndex([ak_lvl1, ak_lvl2], names=['letter', 'number'])
>>> pd_mi = pd.Index.ak.from_ak_legacy(ak_mi)
>>> pd_mi
MultiIndex([('a', 1), ('a', 2), ('b', 1), ('b', 2)], names=['letter', 'number'])
Each level is backed by an Arkouda ExtensionArray and remains distributed:
>>> [type(level._data) for level in pd_mi.levels]
[<class 'arkouda.pandas.extension._arkouda_string_array.ArkoudaStringArray'>, <class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>]
No NumPy materialization occurs; the underlying data stay on the Arkouda server.
- property is_arkouda: bool¶
Return whether the underlying Index is Arkouda-backed.
An Index or MultiIndex is considered Arkouda-backed if its underlying storage uses ArkoudaExtensionArray. This applies to both single-level and multi-level indices.
- Returns:
True if the Index/MultiIndex is backed by Arkouda server-side arrays, False otherwise.
- Return type:
bool
Examples
NumPy-backed Index:
>>> import pandas as pd
>>> idx = pd.Index([1, 2, 3])
>>> idx.ak.is_arkouda
False
Arkouda-backed single-level Index:
>>> import arkouda as ak
>>> ak_idx = pd.Index([10, 20, 30]).ak.to_ak()
>>> ak_idx.ak.is_arkouda
True
Arkouda-backed MultiIndex:
>>> arrays = [[1, 1, 2], ["a", "b", "a"]]
>>> midx = pd.MultiIndex.from_arrays(arrays)
>>> ak_midx = midx.ak.to_ak()
>>> ak_midx.ak.is_arkouda
True
- lookup(key: object) arkouda.numpy.pdarrayclass.pdarray[source]¶
Perform a server-side lookup on the underlying Arkouda index.
This is a thin convenience wrapper around the legacy arkouda.pandas.index.Index.lookup() / arkouda.pandas.index.MultiIndex.lookup() methods. It converts the pandas index to a legacy Arkouda index, performs the lookup on the server, and returns the resulting boolean mask.
- Parameters:
key (object) – Lookup key or keys, interpreted in the same way as the legacy Arkouda Index/MultiIndex lookup method. For a single-level index this may be a scalar or an Arkouda pdarray; for MultiIndex it may be a tuple or sequence of values/arrays.
- Returns:
A boolean Arkouda array indicating which positions in the index match the given key.
- Return type:
pdarray
- to_ak() pandas.Index | pandas.MultiIndex[source]¶
Convert this pandas Index or MultiIndex to an Arkouda-backed index.
Unlike to_ak_legacy(), which returns a legacy Arkouda Index object, this method returns a pandas Index or MultiIndex whose data reside on the Arkouda server and are wrapped in ArkoudaExtensionArray objects.
The conversion is zero-copy with respect to NumPy: no materialization to local NumPy arrays occurs.
- Returns:
An Index whose underlying data live on the Arkouda server.
- Return type:
Union[pd.Index, pd.MultiIndex]
Examples
Convert a simple Index to Arkouda-backed form:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="values")
>>> ak_idx = idx.ak.to_ak()
>>> type(ak_idx.array)
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
Round-trip back to NumPy-backed pandas objects:
>>> restored = ak_idx.ak.collect()
>>> restored.equals(idx)
True
- to_ak_legacy() arkouda.pandas.index.Index | arkouda.pandas.index.MultiIndex[source]¶
Convert this pandas Index or MultiIndex into a legacy Arkouda ak.Index or ak.MultiIndex object.
This is the index analogue of df.ak.to_ak_legacy(), returning the actual Arkouda index objects on the server, rather than a pandas wrapper backed by ArkoudaExtensionArray.
The conversion is zero-copy with respect to NumPy: values are transferred directly into Arkouda arrays without materializing to local NumPy.
- Returns:
A legacy Arkouda Index/MultiIndex whose data live on the Arkouda server.
- Return type:
Union[ak_Index, ak_MultiIndex]
Examples
Convert a simple pandas Index into a legacy Arkouda Index:
>>> import pandas as pd
>>> import arkouda as ak
>>> idx = pd.Index([10, 20, 30], name="numbers")
>>> ak_idx = idx.ak.to_ak_legacy()
>>> type(ak_idx)
<class 'arkouda.pandas.index.Index'>
>>> ak_idx.name
'numbers'
- to_csv(prefix_path: str, dataset: str = 'index') str[source]¶
Save this index to CSV via the legacy to_csv implementation and return the server response message.
- to_dict(labels=None)[source]¶
Convert this index to a dictionary representation if supported.
For MultiIndex, this delegates to MultiIndex.to_dict and returns a mapping of label -> Index. For single-level Indexes, this will raise a TypeError, since the legacy API only defines to_dict on MultiIndex.
- to_hdf(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate', file_type: Literal['single', 'distribute'] = 'distribute') str[source]¶
Save this index to HDF5 via the legacy to_hdf implementation and return the server response message.
- class arkouda.pandas.ArkoudaInt64Dtype[source]¶
Bases: _ArkoudaBaseDtype
Extension dtype for Arkouda-backed 64-bit integers.
This dtype allows seamless use of Arkouda’s distributed int64 arrays inside pandas objects (Series, Index, DataFrame). It is backed by arkouda.pdarray with dtype='int64' and integrates with pandas via the ArkoudaArray extension array.
- construct_array_type()[source]¶
Return the associated extension array class (ArkoudaArray).
- classmethod construct_array_type()[source]¶
Return the associated pandas ExtensionArray type.
This is part of the pandas ExtensionDtype interface and is used internally by pandas when constructing arrays of this dtype. It ensures that operations like Series(..., dtype=ArkoudaInt64Dtype()) produce the correct Arkouda-backed extension array.
- Returns:
The ArkoudaArray class that implements the storage and behavior for this dtype.
- Return type:
Notes
This hook tells pandas which ExtensionArray to instantiate whenever this dtype is requested.
All Arkouda dtypes defined in this module will return ArkoudaArray (or a subclass thereof).
Examples
>>> from arkouda.pandas.extension import ArkoudaInt64Dtype
>>> ArkoudaInt64Dtype.construct_array_type()
<class 'arkouda.pandas.extension._arkouda_array.ArkoudaArray'>
- kind = 'i'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'int64'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaStringArray(data: arkouda.numpy.strings.Strings | numpy.ndarray | Sequence[Any] | ArkoudaStringArray)[source]¶
Bases: arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray, pandas.api.extensions.ExtensionArray
Arkouda-backed string pandas ExtensionArray.
Ensures the underlying data is an Arkouda Strings object. Accepts existing Strings or converts from NumPy arrays and Python sequences of strings.
- Parameters:
data (Strings | ndarray | Sequence[Any] | ArkoudaStringArray) – Input to wrap or convert.
- If Strings, used directly.
- If NumPy/sequence, converted via ak.array.
- If another ArkoudaStringArray, its backing Strings is reused.
- Raises:
TypeError – If data cannot be converted to Arkouda Strings.
- astype(dtype: numpy.dtype[Any], copy: bool = True) numpy.typing.NDArray[Any][source]¶
- astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) pandas.api.extensions.ExtensionArray
- astype(dtype: Any, copy: bool = True) pandas.api.extensions.ExtensionArray | numpy.typing.NDArray[Any]
Cast to a specified dtype.
Casting rules:
- If dtype requests object, returns a NumPy NDArray[Any] of dtype object containing the string values.
- If dtype is a string dtype (e.g. pandas StringDtype, NumPy unicode, or Arkouda string dtype), returns an ArkoudaStringArray. If copy=True, attempts to copy the underlying Arkouda Strings data.
- For all other dtypes, casts the underlying Arkouda Strings using Strings.astype and returns an Arkouda-backed ArkoudaExtensionArray constructed from the result.
- Parameters:
dtype (Any) – Target dtype. May be a NumPy dtype, pandas dtype, or Arkouda dtype.
copy (bool) – Whether to force a copy when the result is an ArkoudaStringArray. Default is True.
- Returns:
The cast result. Returns a NumPy array only when casting to object; otherwise returns an Arkouda-backed ExtensionArray.
- Return type:
Union[ExtensionArray, NDArray[Any]]
Examples
Casting to a string dtype returns an Arkouda-backed string array:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>> s = ArkoudaStringArray(ak.array(["a", "b", "c"]))
>>> out = s.astype("string")
>>> out is s
False
Forcing a copy when casting to a string dtype returns a new array:
>>> out2 = s.astype("string", copy=True)
>>> out2 is s
False
>>> out2.to_ndarray()
array(['a', 'b', 'c'], dtype='<U1')
Casting to object materializes the data to a NumPy array:
>>> s.astype(object)
array(['a', 'b', 'c'], dtype=object)
Casting to a non-string dtype uses Arkouda to cast the underlying strings and returns an Arkouda-backed ExtensionArray:
>>> s_num = ArkoudaStringArray(ak.array(["1", "2", "3"]))
>>> a = s_num.astype("int64")
>>> a.to_ndarray()
array([1, 2, 3])
NumPy and pandas dtype objects are also accepted:
>>> import numpy as np
>>> a = s_num.astype(np.dtype("float64"))
>>> a.to_ndarray()
array([1., 2., 3.])
- property dtype¶
An instance of ExtensionDtype.
See also
api.extensions.ExtensionDtype: Base class for extension dtypes.
api.extensions.ExtensionArray: Base class for extension array types.
api.extensions.ExtensionArray.dtype: The dtype of an ExtensionArray.
Series.dtype: The dtype of a Series.
DataFrame.dtype: The dtype of a DataFrame.
Examples
>>> pd.array([1, 2, 3]).dtype
Int64Dtype()
- isna()[source]¶
A 1-D array indicating if each value is missing.
- Returns:
In most cases, this should return a NumPy ndarray. For exceptional cases like SparseArray, where returning an ndarray would be expensive, an ExtensionArray may be returned.
- Return type:
numpy.ndarray or pandas.api.extensions.ExtensionArray
See also
ExtensionArray.dropna: Return ExtensionArray without NA values.
ExtensionArray.fillna: Fill NA/NaN values using the specified method.
Notes
If returning an ExtensionArray, then
- na_values._is_boolean should be True
- na_values should implement ExtensionArray._reduce()
- na_values should implement ExtensionArray._accumulate()
- na_values.any and na_values.all should be implemented
Examples
>>> arr = pd.array([1, 2, np.nan, np.nan])
>>> arr.isna()
array([False, False, True, True])
- item(*args, **kwargs)[source]¶
Return the array element at the specified position as a Python scalar.
- Parameters:
index (int, optional) – Position of the element. If not provided, the array must contain exactly one element.
- Returns:
The element at the specified position.
- Return type:
scalar
- Raises:
ValueError – If no index is provided and the array does not have exactly one element.
IndexError – If the specified position is out of bounds.
See also
numpy.ndarray.item: Return the item of an array as a scalar.
Examples
>>> arr = pd.array([1], dtype="Int64")
>>> arr.item()
np.int64(1)
>>> arr = pd.array([1, 2, 3], dtype="Int64")
>>> arr.item(0)
np.int64(1)
>>> arr.item(2)
np.int64(3)
- value_counts(dropna: bool = True) pandas.Series[source]¶
Return counts of unique strings as a pandas Series.
This method computes the frequency of each distinct string value in the underlying Arkouda Strings object and returns the result as a pandas Series, with the unique string values as the index and their counts as the data.
- Parameters:
dropna (bool) – Whether to exclude missing values. Missing-value handling for Arkouda string arrays is not yet implemented, so this parameter is accepted for pandas compatibility but currently has no effect. Default is True.
- Returns:
A Series containing the counts of unique string values. The index is an ArkoudaStringArray of unique values, and the values are an ArkoudaArray of counts.
- Return type:
pd.Series
Notes
- The following pandas options are not yet implemented: normalize, sort, and bins.
- Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client.
Examples
Basic usage:
>>> import arkouda as ak
>>> from arkouda.pandas.extension import ArkoudaStringArray
>>> s = ArkoudaStringArray(["red", "blue", "red", "green", "blue", "red"])
>>> s.value_counts()
red      3
blue     2
green    1
dtype: int64
Empty input:
>>> empty = ArkoudaStringArray([])
>>> empty.value_counts()
Series([], dtype: int64)
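The result shape described for value_counts (unique value mapped to its count) can be mimicked locally with the standard library. This is only an illustrative sketch: local_value_counts is a hypothetical helper, not part of Arkouda, and it assumes the most-frequent-first ordering seen in the example output. In Arkouda itself the counting is done server-side.

```python
from collections import Counter


def local_value_counts(values):
    """Return (value, count) pairs, most frequent first.

    Hypothetical local stand-in for illustration; not an Arkouda API.
    """
    return Counter(values).most_common()


counts = local_value_counts(["red", "blue", "red", "green", "blue", "red"])
print(counts)  # [('red', 3), ('blue', 2), ('green', 1)]
print(local_value_counts([]))  # []
```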
- class arkouda.pandas.ArkoudaStringDtype[source]¶
Bases:
Bases: _ArkoudaBaseDtype
Arkouda-backed string dtype.
This dtype integrates Arkouda’s distributed Strings type with the pandas ExtensionArray interface via ArkoudaStringArray. It enables pandas objects (Series, DataFrame) to hold large, server-backed string columns without converting to NumPy or Python objects.
- construct_array_type()[source]¶
Returns the ArkoudaStringArray used as the storage class.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
- Returns:
The ArkoudaStringArray class associated with this dtype.
- Return type:
- kind = 'O'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = ''¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'string'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaUint64Dtype[source]¶
Bases:
Bases: _ArkoudaBaseDtype
Arkouda-backed unsigned 64-bit integer dtype.
This dtype integrates Arkouda’s uint64 arrays with pandas, allowing users to create pandas.Series or pandas.DataFrame objects that store their data on the Arkouda server while still conforming to the pandas ExtensionArray API.
- construct_array_type()[source]¶
Return the ArkoudaArray class used as the storage container for this dtype.
Examples
>>> import arkouda as ak
>>> import pandas as pd
>>> from arkouda.pandas.extension import ArkoudaUint64Dtype, ArkoudaArray
>>> arr = ArkoudaArray(ak.array([1, 2, 3], dtype="uint64"))
>>> s = pd.Series(arr, dtype=ArkoudaUint64Dtype())
>>> s
0    1
1    2
2    3
dtype: uint64
- classmethod construct_array_type()[source]¶
Return the ExtensionArray class associated with this dtype.
This is required by the pandas ExtensionDtype API. It tells pandas which ExtensionArray subclass should be used to hold data of this dtype inside a pandas.Series or pandas.DataFrame.
- Returns:
The ArkoudaArray class, which implements the storage and operations for Arkouda-backed arrays.
- Return type:
- kind = 'u'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'uint64'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.ArkoudaUint8Dtype[source]¶
Bases:
Bases: _ArkoudaBaseDtype
Arkouda-backed unsigned 8-bit integer dtype.
This dtype integrates Arkouda’s uint8 arrays with the pandas ExtensionArray API, allowing pandas Series and DataFrame objects to store and operate on Arkouda-backed unsigned 8-bit integers. The underlying storage is an Arkouda pdarray<uint8>, exposed through the ArkoudaArray extension array.
- construct_array_type()[source]¶
Returns the ArkoudaArray type that provides the storage and behavior for this dtype.
- classmethod construct_array_type()[source]¶
Return the ExtensionArray subclass that handles storage for this dtype.
This method is required by the pandas ExtensionDtype interface. It tells pandas which ExtensionArray class to use when creating arrays of this dtype (for example, when calling Series(..., dtype="arkouda.uint8")).
- Returns:
The ArkoudaArray class associated with this dtype.
- Return type:
- kind = 'u'¶
A character code (one of ‘biufcmMOSUV’), default ‘O’
This should match the NumPy dtype used when the array is converted to an ndarray, which is probably ‘O’ for object if the extension type cannot be represented as a built-in NumPy type.
See also
numpy.dtype.kind
- na_value = -1¶
Default NA value to use for this type.
This is used in e.g. ExtensionArray.take. This should be the user-facing “boxed” version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary.
- name = 'uint8'¶
A string identifying the data type.
Will be used for display in, e.g.
Series.dtype
- type¶
The scalar type for the array, e.g. int.
It’s expected that ExtensionArray[item] returns an instance of ExtensionDtype.type for scalar item, assuming that value is valid (not NA). NA values do not need to be instances of type.
- class arkouda.pandas.CachedAccessor(name: str, accessor)[source]¶
Descriptor for caching namespace-based accessors.
This custom property-like object enables lazy initialization of accessors (e.g., .str, .dt) on Series-like objects, similar to pandas-style extension accessors.
- Parameters:
Notes
The accessor class’s __init__ method must accept a single positional argument, which should be one of Series, DataFrame, or Index.
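The lazy, cached behaviour described for CachedAccessor can be sketched with a plain non-data descriptor. Everything below (CachedAccessorSketch, UpperAccessor, ToySeries) is hypothetical and written only to illustrate the pattern; it is not Arkouda's implementation and may differ from it in detail.

```python
class CachedAccessorSketch:
    """Non-data descriptor that builds an accessor once per instance.

    Hypothetical re-implementation for illustration only.
    """

    def __init__(self, name, accessor):
        self._name = name
        self._accessor = accessor

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself: hand back the accessor class.
            return self._accessor
        accessor_obj = self._accessor(obj)
        # Cache on the instance so later lookups bypass the descriptor.
        object.__setattr__(obj, self._name, accessor_obj)
        return accessor_obj


class UpperAccessor:
    """Toy accessor whose __init__ takes the parent object only."""

    def __init__(self, parent):
        self._parent = parent

    def upper(self):
        return [v.upper() for v in self._parent.values]


class ToySeries:
    str = CachedAccessorSketch("str", UpperAccessor)

    def __init__(self, values):
        self.values = values


s = ToySeries(["a", "b"])
print(s.str.upper())   # ['A', 'B']
print(s.str is s.str)  # True: the accessor is built once, then reused
```

Because the descriptor defines no __set__, the value stored in the instance dictionary takes precedence on every later attribute lookup, which is what makes the cache effective.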
- class arkouda.pandas.DatetimeAccessor(series)[source]¶
Bases: Properties
Accessor for datetime-like operations on Arkouda Series.
Provides datetime methods such as .floor(), .ceil(), and .round(), mirroring the .dt accessor in pandas.
This accessor is automatically attached to Series objects that wrap arkouda.Datetime values. It should not be instantiated directly.
- Parameters:
series (arkouda.pandas.Series) – The Series object containing Datetime values.
- Raises:
AttributeError – If the underlying Series values are not of type arkouda.Datetime.
Examples
>>> import arkouda as ak
>>> from arkouda import Datetime, Series
>>> s = Series(Datetime(ak.array([1_000_000_000_000])))
>>> s.dt.floor("D")
0   1970-01-01
dtype: datetime64[ns]
- series¶
- class arkouda.pandas.Properties[source]¶
Base class for accessor implementations in Arkouda.
Provides the _make_op class method to dynamically generate accessor methods that wrap underlying Strings or Datetime operations and return new Series.
Notes
This class is subclassed by StringAccessor and DatetimeAccessor, and is not intended to be used directly.
Examples
Subclasses should define _make_op(“operation_name”), which will generate a method that applies series.values.operation_name(…) and returns a new Series.
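A minimal sketch of the method-generation pattern described above, under the assumption that a generated method simply forwards to the same-named operation on series.values and wraps the result. All class names here (ToySeries, ToyStrings, PropertiesSketch, ToyStringAccessor) are hypothetical stand-ins rather than Arkouda classes.

```python
class ToySeries:
    """Hypothetical minimal stand-in for a Series; not Arkouda's class."""

    def __init__(self, values):
        self.values = values


class ToyStrings(list):
    """Hypothetical values container exposing one string operation."""

    def upper(self):
        return ToyStrings(v.upper() for v in self)


class PropertiesSketch:
    @classmethod
    def _make_op(cls, name):
        def accessop(self, *args, **kwargs):
            # Look up the operation on the wrapped values and call it.
            result = getattr(self.series.values, name)(*args, **kwargs)
            return ToySeries(result)

        accessop.__name__ = name
        return accessop


class ToyStringAccessor(PropertiesSketch):
    def __init__(self, series):
        self.series = series


# Attach one generated method to the accessor class.
ToyStringAccessor.upper = ToyStringAccessor._make_op("upper")

out = ToyStringAccessor(ToySeries(ToyStrings(["a", "b"]))).upper()
print(list(out.values))  # ['A', 'B']
```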
- class arkouda.pandas.Row(dict=None, /, **kwargs)[source]¶
Bases: collections.UserDict
Dictionary-like representation of a single row in an Arkouda DataFrame.
Wraps the column-to-value mapping for one row and provides convenient ASCII and HTML formatting for display.
- Parameters:
data (dict) – Mapping of column names to their corresponding values for this row.
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.row import Row
>>> df = ak.DataFrame({"x": ak.array([10, 20]), "y": ak.array(["a", "b"])})
Suppose df[0] returns {"x": 10, "y": "a"}:
>>> row = Row({"x": 10, "y": "a"})
>>> print(row)
keys    values
------  --------
x       10
y       a
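The idea of a UserDict subclass that behaves like a dict but prints as a small table can be sketched as follows. RowSketch is a hypothetical illustration; its column widths and layout are assumptions and will not match Arkouda's Row output exactly.

```python
from collections import UserDict


class RowSketch(UserDict):
    """Hypothetical dict-like row with a simple two-column text rendering."""

    def __str__(self):
        width = max([len("keys")] + [len(str(k)) for k in self.data])
        lines = [f"{'keys':<{width}}  values", f"{'-' * width}  ------"]
        lines += [f"{str(k):<{width}}  {v}" for k, v in self.data.items()]
        return "\n".join(lines)


row = RowSketch({"x": 10, "y": "a"})
print(row["x"])  # 10 (ordinary dict-style access still works)
print(row)
```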
- class arkouda.pandas.Series(data: Tuple | List | arkouda.pandas.groupbyclass.groupable_element_type | Series | arkouda.numpy.segarray.SegArray | pandas.Series | pandas.Categorical, name=None, index: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | Tuple | List | arkouda.pandas.index.Index | None = None)[source]¶
One-dimensional Arkouda array with axis labels.
- Parameters:
- Raises:
TypeError – Raised if index is not a pdarray or Strings object. Raised if data is not a supported type.
ValueError – Raised if the index size does not match the data size.
Notes
The Series class accepts either positional arguments or keyword arguments.
- Positional arguments
Series(data): data is provided and an index is generated automatically.
Series(data, index): both data and index are provided.
- Keyword arguments
Series(data=..., index=...): index is optional but must match the size of data when provided.
- property at: _LocIndexer¶
Accesses entries of a Series by label.
- Returns:
An indexer for label-based access to Series entries.
- Return type:
_LocIndexer
- static concat(arrays: List, axis: int = 0, index_labels: List[str] | None = None, value_labels: List[str] | None = None, ordered: bool = False) arkouda.pandas.dataframe.DataFrame | Series[source]¶
Concatenate a list of Arkouda Series or grouped arrays horizontally or vertically.
If a list of grouped Arkouda arrays is passed, they are converted to Series. Each grouping is a 2-tuple where the first item is the key(s) and the second is the value. If concatenating horizontally (axis=1), all series/groupings must have the same length and the same index. The index is converted to a column in the resulting DataFrame; if it’s a MultiIndex, each level is converted to a separate column.
- Parameters:
arrays (List) – A list of Series or groupings (tuples of index and values) to concatenate.
axis (int) – The axis to concatenate along:
- 0 = vertical (stack series into one)
- 1 = horizontal (align by index and produce a DataFrame)
Defaults to 0.
index_labels (List[str] or None, optional) – Column name(s) to label the index when axis=1.
value_labels (List[str] or None, optional) – Column names to label the values of each Series.
ordered (bool) – Unused parameter. Reserved for future support of deterministic vs. performance-optimized concatenation. Defaults to False.
- Returns:
If axis=0: a new Series
If axis=1: a new DataFrame
- Return type:
DataFrame | Series
- diff() Series[source]¶
Compute the difference between consecutive values of the Series.
Returns a new Series with the same index and length; the first value is set to NaN.
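As a local illustration of this behaviour (same length, NaN in the first slot), here is a pure-Python sketch. diff_sketch is a hypothetical helper operating on a plain list; it does not call Arkouda and ignores the index.

```python
import math


def diff_sketch(values):
    """Consecutive differences; first slot is NaN, length is preserved."""
    if not values:
        return []
    out = [math.nan]
    out.extend(float(b) - float(a) for a, b in zip(values, values[1:]))
    return out


d = diff_sketch([1, 4, 9, 16])
print(d)  # [nan, 3.0, 5.0, 7.0]
```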
- dt¶
- property dtype: numpy.dtype¶
- fillna(value: supported_scalars | Series | arkouda.numpy.pdarrayclass.pdarray) Series[source]¶
Fill NA/NaN values using the specified method.
- Parameters:
value (supported_scalars, Series, or pdarray) – Value to use to fill holes (e.g. 0), or alternatively a Series of values specifying which value to use for each index. Values not in the Series will not be filled. This value cannot be a list.
- Returns:
Object with missing values filled.
- Return type:
Series
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> data = ak.Series([1, np.nan, 3, np.nan, 5])
>>> data
0    1.0
1    NaN
2    3.0
3    NaN
4    5.0
dtype: float64
>>> fill_values1 = ak.ones(5)
>>> data.fillna(fill_values1)
0    1.0
1    1.0
2    3.0
3    1.0
4    5.0
dtype: float64
>>> fill_values2 = Series(ak.ones(5))
>>> data.fillna(fill_values2)
0    1.0
1    1.0
2    3.0
3    1.0
4    5.0
dtype: float64
>>> fill_values3 = 100.0
>>> data.fillna(fill_values3)
0      1.0
1    100.0
2      3.0
3    100.0
4      5.0
dtype: float64
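The fill rule in these examples can be reproduced locally on plain lists. fillna_sketch is a hypothetical helper that aligns a non-scalar fill by position, which is a simplification of the index-based alignment a Series would use.

```python
import math


def fillna_sketch(data, value):
    """Replace NaN entries with a scalar or a positionally aligned fill."""
    is_scalar = isinstance(value, (int, float))
    return [
        (value if is_scalar else value[i]) if math.isnan(x) else x
        for i, x in enumerate(data)
    ]


data = [1.0, math.nan, 3.0, math.nan, 5.0]
print(fillna_sketch(data, 100.0))      # [1.0, 100.0, 3.0, 100.0, 5.0]
print(fillna_sketch(data, [1.0] * 5))  # [1.0, 1.0, 3.0, 1.0, 5.0]
```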
- classmethod from_return_msg(rep_msg: str) Series[source]¶
Return a Series instance pointing to components created by the arkouda server.
The user should not call this function directly.
- Parameters:
rep_msg (builtin_str) – Delimited string containing the values and indexes.
- Returns:
A Series representing a set of pdarray components on the server.
- Return type:
Series
- Raises:
RuntimeError – Raised if a server-side error is thrown in the process of creating the Series instance.
- has_repeat_labels() bool[source]¶
Return whether the Series has any labels that appear more than once.
- hasnans() arkouda.numpy.dtypes.bool_scalars[source]¶
Return True if there are any NaNs.
- Return type:
bool_scalars
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = ak.Series(ak.array([1, 2, 3, np.nan]))
>>> s
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64
>>> s.hasnans()
np.True_
- property iat: _iLocIndexer¶
Accesses entries of a Series by position.
- Returns:
An indexer for position-based access to a single element.
- Return type:
_iLocIndexer
- property iloc: _iLocIndexer¶
Accesses entries of a Series by position.
- Returns:
An indexer for position-based access to Series entries.
- Return type:
_iLocIndexer
- is_registered() bool[source]¶
Return True iff the object is contained in the registry or is a component of a registered object.
- Returns:
Indicates if the object is contained in the registry
- Return type:
bool
- Raises:
RegistrationError – Raised if there’s a server-side error or a mismatch of registered components
See also
register, attach, unregister
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- isin(lst: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | List) Series[source]¶
Find Series elements whose values are in the specified list.
- isna() Series[source]¶
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings ‘’ are not considered NA values.
- Returns:
Mask of bool values for each element in Series that indicates whether an element is an NA value.
- Return type:
Series
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.isna()
1    False
2    False
4     True
dtype: bool
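The NA rule stated here (NaN counts as missing, an empty string does not) can be checked locally with a small sketch. isna_sketch is a hypothetical helper on plain Python lists, and its complement corresponds to what notna describes.

```python
import math


def isna_sketch(values):
    """True only for float NaN; other values, including '', are not NA."""
    return [isinstance(v, float) and math.isnan(v) for v in values]


mask = isna_sketch([1.0, 2.0, math.nan])
print(mask)                    # [False, False, True]
print([not m for m in mask])   # complement: [True, True, False]
print(isna_sketch(["", "a"]))  # [False, False]
```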
- isnull() Series[source]¶
Series.isnull is an alias for Series.isna.
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings ‘’ are not considered NA values.
- Returns:
Mask of bool values for each element in Series that indicates whether an element is an NA value.
- Return type:
Series
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.isnull()
1    False
2    False
4     True
dtype: bool
- property loc: _LocIndexer¶
Accesses entries of a Series by label.
- Returns:
An indexer for label-based access to Series entries.
- Return type:
_LocIndexer
- locate(key: int | arkouda.numpy.pdarrayclass.pdarray | arkouda.pandas.index.Index | Series | List | Tuple) Series[source]¶
Lookup values by index label.
- Parameters:
key (int, pdarray, Index, Series, List, or Tuple) – The key or keys to look up. This can be:
- A scalar
- A list of scalars
- A list of lists (for MultiIndex)
- A Series (in which case labels are preserved, and its values are used as keys)
Keys will be converted to Arkouda arrays as needed.
- Returns:
A Series containing the values corresponding to the key.
- Return type:
Series
- map(arg: dict | arkouda.Series) arkouda.Series[source]¶
Map values of Series according to an input mapping.
- Parameters:
arg (dict or Series) – The mapping correspondence.
- Returns:
A new series with the same index as the caller. When the input Series has Categorical values, the return Series will have Strings values. Otherwise, the return type will match the input type.
- Return type:
Series
- Raises:
TypeError – Raised if arg is not of type dict or arkouda.Series. Raised if the series values are not of type pdarray, Categorical, or Strings.
Examples
>>> import arkouda as ak
>>> s = ak.Series(ak.array([2, 3, 2, 3, 4]))
>>> s
0    2
1    3
2    2
3    3
4    4
dtype: int64
>>> s.map({4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
0    30.0
1     5.0
2    30.0
3     5.0
4    25.0
dtype: float64
>>> s2 = ak.Series(ak.array(["a","b","c","d"]), index = ak.array([4,2,1,3]))
>>> s.map(s2)
0    b
1    d
2    b
3    d
4    a
dtype: ...
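The lookup rule behind these examples can be sketched on plain Python data: each value is replaced by its entry in the mapping while the caller's index is kept. map_sketch is a hypothetical helper; it assumes every value has an entry in the mapping, since the behaviour for missing keys is not described in this section. A Series argument corresponds to the dict built from its index and values.

```python
def map_sketch(index, values, mapping):
    """Look up each value in ``mapping`` while keeping the caller's index."""
    return list(zip(index, (mapping[v] for v in values)))


values = [2, 3, 2, 3, 4]
mapped = map_sketch(range(5), values, {4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
print(mapped)  # [(0, 30.0), (1, 5.0), (2, 30.0), (3, 5.0), (4, 25.0)]

# A Series used as the mapping acts like {index label: value}.
series_as_mapping = dict(zip([4, 2, 1, 3], ["a", "b", "c", "d"]))
print([v for _, v in map_sketch(range(5), values, series_as_mapping)])
# ['b', 'd', 'b', 'd', 'a']
```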
- memory_usage(index: bool = True, unit: Literal['B', 'KB', 'MB', 'GB'] = 'B') int[source]¶
Return the memory usage of the Series.
The memory usage can optionally include the contribution of the index.
- Parameters:
index (bool) – Specifies whether to include the memory usage of the Series index. Defaults to True.
unit ({"B", "KB", "MB", "GB"}) – Unit to return. One of {‘B’, ‘KB’, ‘MB’, ‘GB’}. Defaults to “B”.
- Returns:
Memory consumed, expressed in the requested unit (bytes by default).
- Return type:
int
See also
arkouda.numpy.pdarrayclass.nbytes, arkouda.Index.memory_usage, arkouda.pandas.series.Series.memory_usage, arkouda.pandas.dataframe.DataFrame.memory_usage
Examples
>>> import arkouda as ak
>>> from arkouda.pandas.series import Series
>>> s = ak.Series(ak.arange(3))
>>> s.memory_usage()
48
Not including the index gives the size of the rest of the data, which is necessarily smaller:
>>> s.memory_usage(index=False)
24
Select the units:
>>> s = ak.Series(ak.arange(3000))
>>> s.memory_usage(unit="KB")
46.875
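The numbers in these examples are consistent with 8 bytes per int64 element for both values and index, and with 1 KB = 1024 bytes. The sketch below reproduces that arithmetic; memory_usage_sketch is a hypothetical helper, and the MB and GB divisors are extrapolated assumptions rather than something the examples confirm.

```python
INT64_BYTES = 8
# Divisors inferred from the KB example; MB and GB are assumed to follow suit.
UNIT_DIVISORS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def memory_usage_sketch(n, index=True, unit="B"):
    """Estimate size of n int64 values plus an optional n-entry int64 index."""
    total = n * INT64_BYTES * (2 if index else 1)
    return total / UNIT_DIVISORS[unit] if unit != "B" else total


print(memory_usage_sketch(3))                # 48
print(memory_usage_sketch(3, index=False))   # 24
print(memory_usage_sketch(3000, unit="KB"))  # 46.875
```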
- property ndim: int¶
- notna() Series[source]¶
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings ‘’ are not considered NA values. NA values, such as numpy.NaN, get mapped to False values.
- Returns:
Mask of bool values for each element in Series that indicates whether an element is not an NA value.
- Return type:
Series
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.notna()
1     True
2     True
4    False
dtype: bool
- notnull() Series[source]¶
Series.notnull is an alias for Series.notna.
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA. Non-missing values get mapped to True. Characters such as empty strings ‘’ are not considered NA values. NA values, such as numpy.NaN, get mapped to False values.
- Returns:
Mask of bool values for each element in Series that indicates whether an element is not an NA value.
- Return type:
Series
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> import numpy as np
>>> s = Series(ak.array([1, 2, np.nan]), index = ak.array([1, 2, 4]))
>>> s.notnull()
1     True
2     True
4    False
dtype: bool
- objType = 'Series'¶
- static pdconcat(arrays: List, axis: int = 0, labels: arkouda.numpy.strings.Strings | None = None) pandas.Series | pandas.DataFrame[source]¶
Concatenate a list of Arkouda Series or grouped arrays, returning a local pandas object.
If a list of grouped Arkouda arrays is passed, they are converted to Series. Each grouping is a 2-tuple with the first item being the key(s) and the second the value.
If axis=1 (horizontal), each Series or grouping must have the same length and the same index. The index is converted to a column in the resulting DataFrame. If it is a MultiIndex, each level is converted to a separate column.
- Parameters:
arrays (List) – A list of Series or groupings (tuples of index and values) to concatenate.
axis (int) – The axis along which to concatenate: 0 = vertical (stack into a Series); 1 = horizontal (align by index into a DataFrame). Defaults to 0.
labels (Strings or None, optional) – Names to assign to the resulting columns in the DataFrame.
- Returns:
If axis=0: a local pandas Series
If axis=1: a local pandas DataFrame
- Return type:
pandas.Series or pandas.DataFrame
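The two axis modes described above resemble what `pandas.concat` does with local objects. The sketch below is a pandas-only analog under that assumption: it does not call `pdconcat`, and the column names (`left`, `right`, and the default `index` column produced by `reset_index`) are illustrative rather than the names Arkouda assigns.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[10, 20, 30])
b = pd.Series([4, 5, 6], index=[10, 20, 30])

# axis=0: stack vertically into one longer Series.
stacked = pd.concat([a, b], axis=0)

# axis=1: align by index into a DataFrame. Both inputs share the same
# index and length, as the horizontal mode requires.
side_by_side = pd.concat([a, b], axis=1)
side_by_side.columns = ["left", "right"]    # plays the role of `labels`

# Turn the index into an ordinary column of the result.
side_by_side = side_by_side.reset_index()

print(len(stacked))                  # 6
print(list(side_by_side.columns))    # ['index', 'left', 'right']
```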
- register(user_defined_name: str)[source]¶
Register this Series object and underlying components with the Arkouda server.
- Parameters:
user_defined_name (str) – User-defined name the Series is to be registered under. This will be the root name for the underlying components.
- Returns:
The same Series, which is now registered with the Arkouda server and has an updated name. This is an in-place modification; the original object is returned to support a fluent programming style. Note that two different Series cannot be registered under the same name.
- Return type:
Series
- Raises:
TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the Series with the user_defined_name
See also
unregister, attach, is_registered
Notes
Objects registered with the server are immune to deletion until they are unregistered.
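The registration rules described above (in-place update, the same object returned, one name per object, an error on double unregistration) can be illustrated with a toy in-memory model. Everything in this sketch is hypothetical: `ToyRegistry` and `ToySeries` are stand-ins that do not exist in Arkouda, and `RuntimeError` takes the place of `RegistrationError`.

```python
class ToyRegistry:
    """In-memory stand-in for the server-side name table (hypothetical)."""

    def __init__(self):
        self._names = {}

    def register(self, obj, name):
        if not isinstance(name, str):
            raise TypeError("user_defined_name must be a str")
        if name in self._names and self._names[name] is not obj:
            raise RuntimeError(f"name {name!r} is already registered")
        self._names[name] = obj
        obj.registered_name = name    # in-place update of the object
        return obj                    # same object, so calls can be chained

    def unregister(self, obj):
        name = getattr(obj, "registered_name", None)
        if name is None or name not in self._names:
            raise RuntimeError("object is not registered")
        del self._names[name]
        obj.registered_name = None


class ToySeries:
    """Minimal object that only tracks its registered name (hypothetical)."""

    registered_name = None


registry = ToyRegistry()
s = ToySeries()
same = registry.register(s, "my_series")
print(same is s, s.registered_name)   # True my_series
```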
- property shape: Tuple[int]¶
- size¶
- str¶
- to_dataframe(index_labels: List[str] | None = None, value_label: str | None = None) arkouda.pandas.dataframe.DataFrame[source]¶
Convert the Series to an Arkouda DataFrame.
- to_markdown(mode='wt', index=True, tablefmt='grid', storage_options=None, **kwargs)[source]¶
Print Series in Markdown-friendly format.
- Parameters:
mode (str, optional) – Mode in which file is opened, “wt” by default.
index (bool, optional, default True) – Add index (row) labels.
tablefmt (str = "grid") – Table format to call from tabulate: https://pypi.org/project/tabulate/
storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc., if using a URL that will be parsed by fsspec, e.g., starting “s3://”, “gcs://”. An error will be raised if providing this argument with a non-fsspec URL. See the fsspec and backend storage implementation docs for the set of allowed keys and values.
**kwargs – These parameters will be passed to tabulate.
Note
This function should only be called on small Series as it calls pandas.Series.to_markdown: https://pandas.pydata.org/docs/reference/api/pandas.Series.to_markdown.html
Examples
>>> import arkouda as ak
>>> s = ak.Series(["elk", "pig", "dog", "quetzal"], name="animal")
>>> print(s.to_markdown())
+----+----------+
|    | animal   |
+====+==========+
|  0 | elk      |
+----+----------+
|  1 | pig      |
+----+----------+
|  2 | dog      |
+----+----------+
|  3 | quetzal  |
+----+----------+
Output markdown with a tabulate option.
>>> print(s.to_markdown(tablefmt="grid"))
+----+----------+
|    | animal   |
+====+==========+
|  0 | elk      |
+----+----------+
|  1 | pig      |
+----+----------+
|  2 | dog      |
+----+----------+
|  3 | quetzal  |
+----+----------+
- to_pandas() pandas.Series[source]¶
Convert the Series to a local pandas Series.
- topn(n: int = 10) Series[source]¶
Return the top values of the Series.
- Parameters:
n (int) – Number of values to return. Defaults to 10.
- Returns:
A new Series containing the top n values.
- Return type:
Series
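As a local point of comparison, the sketch below selects the top n values of a pandas Series. It assumes that "top" means the n largest values, returned largest first; the text above does not spell out the ordering, so treat that as an assumption. `topn_local` is a hypothetical helper, not an Arkouda function.

```python
import pandas as pd


def topn_local(series, n=10):
    """Return the n largest values, largest first (assumed meaning of 'top')."""
    return series.sort_values(ascending=False).head(n)


s = pd.Series([5, 1, 9, 3, 7])
top3 = topn_local(s, n=3)
print(top3.tolist())   # [9, 7, 5]
```

When n exceeds the length of the Series, this sketch simply returns every value in descending order.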
- unregister()[source]¶
Unregister this Series object from the Arkouda server, where it was previously registered using register() and/or attached to using attach().
- Raises:
RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister
See also
register, attach, is_registered
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- validate_key(key: Series | arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.pandas.categorical.Categorical | List | supported_scalars | arkouda.numpy.segarray.SegArray) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.pandas.categorical.Categorical | supported_scalars | arkouda.numpy.segarray.SegArray[source]¶
Validate type requirements for keys when reading or writing the Series.
Also converts list and tuple arguments into pdarrays.
- Parameters:
key (Series, pdarray, Strings, Categorical, List, supported_scalars, or SegArray) – The key or container of keys that might be used to index into the Series.
- Return type:
The validated key(s), with lists and tuples converted to pdarrays
- Raises:
TypeError – Raised if the keys are neither boolean values nor of the same type as the labels, or if key is not one of the supported types
KeyError – Raised if container of keys has keys not present in the Series
IndexError – Raised if the length of a boolean key array is different from the Series
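The checks listed above can be pictured with a small NumPy-based analog. This is a sketch under stated assumptions, not Arkouda's implementation: `validate_key_local` is a hypothetical function, labels are held in a NumPy array, and "same type" is approximated by comparing NumPy dtype kinds.

```python
import numpy as np


def validate_key_local(key, labels):
    """Hypothetical local analog of the key checks described above.

    `labels` is a NumPy array holding the index labels.
    """
    # Lists and tuples become arrays before any other check.
    if isinstance(key, (list, tuple)):
        key = np.asarray(key)

    if isinstance(key, np.ndarray):
        if key.dtype == bool:
            # A boolean mask must cover every row.
            if key.size != labels.size:
                raise IndexError("boolean key length differs from the Series length")
            return key
        if key.dtype.kind != labels.dtype.kind:
            raise TypeError("keys must be boolean or match the label type")
        missing = ~np.isin(key, labels)
        if missing.any():
            raise KeyError(f"keys not present: {key[missing].tolist()}")
        return key

    # Scalar key: it must be the same kind of value as the labels.
    if np.asarray(key).dtype.kind != labels.dtype.kind:
        raise TypeError("key type does not match the label type")
    return key


labels = np.array([1, 2, 4])
print(validate_key_local([1, 4], labels).tolist())   # [1, 4]
```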
- validate_val(val: arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | supported_scalars | List) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | supported_scalars[source]¶
Validate type requirements for values being written into the Series.
Also converts list and tuple arguments into pdarrays.
- Parameters:
val (pdarray, Strings, supported_scalars, or List) – The value or container of values that might be assigned into the Series.
- Return type:
The validated value, with lists converted to pdarrays
- Raises:
TypeError – Raised if val is not the same type as the Series or a container with elements of the same type as the Series. Raised if val is a string or Strings type. Raised if val is not one of the supported types.
- class arkouda.pandas.StringAccessor(series)[source]¶
Bases: Properties
Accessor for string operations on Arkouda Series.
Provides string-like methods such as .contains(), .startswith(), and .endswith() via the .str accessor, similar to pandas.
This accessor is automatically attached to Series objects that wrap arkouda.Strings or arkouda.Categorical values. It should not be instantiated directly.
- Parameters:
series (arkouda.pandas.Series) – The Series object containing Strings or Categorical values.
- Raises:
AttributeError – If the underlying Series values are not Strings or Categorical.
Examples
>>> import arkouda as ak
>>> from arkouda import Series
>>> s = Series(["apple", "banana", "apricot"])
>>> s.str.startswith("a")
0     True
1    False
2     True
dtype: bool
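Because the accessor is described as similar to pandas, the pandas `.str` accessor makes a convenient local reference for the three methods named above. The sketch below uses pandas only and makes no claim that Arkouda's results match in every edge case.

```python
import pandas as pd

s = pd.Series(["apple", "banana", "apricot"])

starts = s.str.startswith("a")
ends = s.str.endswith("a")
has_an = s.str.contains("an", regex=False)   # plain substring match

print(starts.tolist())   # [True, False, True]
print(ends.tolist())     # [False, True, False]
print(has_an.tolist())   # [False, True, False]

# pandas also refuses the accessor on non-string data with AttributeError.
try:
    pd.Series([1, 2, 3]).str
    accessor_rejected = False
except AttributeError:
    accessor_rejected = True
```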
- series¶
- arkouda.pandas.compute_join_size(a: arkouda.numpy.pdarrayclass.pdarray, b: arkouda.numpy.pdarrayclass.pdarray) Tuple[int, int][source]¶
Compute the internal size of a hypothetical join between a and b. Returns both the number of elements and number of bytes required for the join.
- arkouda.pandas.date_operators(cls)[source]¶
Add common datetime operation methods to a DatetimeAccessor class.
This class decorator dynamically attaches datetime operations (floor, ceil, round) to the given class using the _make_op helper.
- Parameters:
cls (type) – The accessor class to decorate.
- Returns:
The accessor class with datetime methods added.
- Return type:
type
Notes
Used internally to implement the .dt accessor API.
- arkouda.pandas.from_series(series: pandas.Series, dtype: type | str | None = None) arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings[source]¶
Convert a pandas Series to an Arkouda pdarray or Strings.
If dtype is not provided, the dtype is inferred from the pandas Series (using pandas dtype metadata). If dtype is provided, it is used as an override and normalized via Arkouda’s dtype resolution rules.
In addition to the core numeric and boolean types, this function supports datetime and timedelta Series of any resolution (ns, us, ms, etc.) by converting them to an int64 pdarray of nanoseconds.
- Parameters:
series (pd.Series) – The pandas Series to convert.
dtype (Optional[Union[type, str]], optional) – Optional dtype override. This may be a Python type (e.g. bool), a NumPy scalar type (e.g. np.int64), or a dtype string. String-like spellings are normalized to Arkouda string dtype, including "object", "str", "string", "string[python]", and "string[pyarrow]".
- Returns:
An Arkouda pdarray for numeric, boolean, datetime, or timedelta inputs, or an Arkouda Strings for string inputs.
- Return type:
pdarray or Strings
- Raises:
ValueError – Raised if the dtype cannot be interpreted or is unsupported for conversion.
Examples
>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
Integers:
>>> np.random.seed(1701)
>>> ak.from_series(pd.Series(np.random.randint(0, 10, 5)))
array([4 3 3 5 0])
>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']), dtype=np.int64)
array([1 2 3 4 5])
Floats:
>>> np.random.seed(1701)
>>> ak.from_series(pd.Series(np.random.uniform(low=0.0, high=1.0, size=3)))
array([0.089433234324597599 0.1153776854774361 0.51874393620990389])
Booleans:
>>> np.random.seed(1864)
>>> ak.from_series(pd.Series(np.random.choice([True, False], size=5)))
array([True True True False False])
Strings (pandas dtype spellings normalized to Arkouda Strings):
>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string"))
array(['a', 'b', 'c', 'd', 'e'])
>>> ak.from_series(pd.Series(['a', 'b', 'c'], dtype="string[pyarrow]"))
array(['a', 'b', 'c'])
Datetime (any resolution is accepted and returned as int64 nanoseconds):
>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')])))
array([1514764800000000000 1514764800000000000])
Notes
Datetime and timedelta Series are converted to int64 nanoseconds.
String-like pandas dtypes (including object) are treated as string and converted to Arkouda Strings.
- arkouda.pandas.gen_ranges(starts, ends, stride=1, return_lengths=False)[source]¶
Generate a segmented array of variable-length, contiguous ranges between pairs of start- and end-points.
- Parameters:
- Returns:
segments (pdarray, int64) – The starting index of each range in the resulting array
ranges (pdarray, int64) – The actual ranges, flattened into a single array
lengths (pdarray, int64) – The lengths of each segment. Only returned if return_lengths=True.
- Return type:
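The relationship between the three returned arrays can be sketched with NumPy. This local analog covers only stride=1 and assumes that each end point is exclusive and that every end is at least its start; neither assumption is stated in the text above. `gen_ranges_local` is a hypothetical name.

```python
import numpy as np


def gen_ranges_local(starts, ends, return_lengths=False):
    """NumPy sketch for stride=1, assuming exclusive ends and ends >= starts."""
    starts = np.asarray(starts, dtype=np.int64)
    ends = np.asarray(ends, dtype=np.int64)

    lengths = ends - starts                     # size of each range
    segments = np.cumsum(lengths) - lengths     # offset where each range begins
    total = int(lengths.sum())

    # Position p inside range k holds starts[k] + (p - segments[k]).
    ranges = np.repeat(starts - segments, lengths) + np.arange(total, dtype=np.int64)

    if return_lengths:
        return segments, ranges, lengths
    return segments, ranges


segments, ranges, lengths = gen_ranges_local([0, 10], [3, 12], return_lengths=True)
print(segments.tolist())   # [0, 3]
print(ranges.tolist())     # [0, 1, 2, 10, 11]
print(lengths.tolist())    # [3, 2]
```

The flattened layout avoids a Python-level loop over ranges: a single `repeat` and one `arange` build the whole result.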
- arkouda.pandas.join_on_eq_with_dt(a1: arkouda.numpy.pdarrayclass.pdarray, a2: arkouda.numpy.pdarrayclass.pdarray, t1: arkouda.numpy.pdarrayclass.pdarray, t2: arkouda.numpy.pdarrayclass.pdarray, dt: int | numpy.int64, pred: str, result_limit: int | numpy.int64 = 1000) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]¶
Inner-join on equality between two integer arrays where the time-window predicate is also true.
- Parameters:
a1 (pdarray) – Values to join (must be int64 dtype).
a2 (pdarray) – Values to join (must be int64 dtype).
t1 (pdarray) – Timestamps in milliseconds corresponding to the a1 pdarray
t2 (pdarray) – Timestamps in milliseconds corresponding to the a2 pdarray
dt (Union[int,np.int64]) – Time delta
pred (str) – Time-window predicate; one of ‘true_dt’, ‘abs_dt’, or ‘pos_dt’
result_limit (Union[int,np.int64]) – size limit for returned result
- Returns:
result_array_one (pdarray, int64) – a1 indices where a1 == a2
result_array_two (pdarray, int64) – a2 indices where a2 == a1
- Return type:
Tuple[pdarray, pdarray]
- Raises:
TypeError – Raised if a1, a2, t1, or t2 is not a pdarray, or if dt or result_limit is not an int
ValueError – Raised if a1, a2, t1, or t2 dtype is not int64, if pred is not ‘true_dt’, ‘abs_dt’, or ‘pos_dt’, or if result_limit is < 0
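The shape of this join can be illustrated with a brute-force NumPy sketch. The meaning of each predicate is an assumption here, since the text above only names them: ‘abs_dt’ is taken as |t2 - t1| <= dt, ‘pos_dt’ as 0 <= t2 - t1 <= dt, and ‘true_dt’ as no time constraint at all. `join_on_eq_with_dt_local` is a hypothetical function; it compares every pair, so it suits small inputs only.

```python
import numpy as np


def join_on_eq_with_dt_local(a1, a2, t1, t2, dt, pred, result_limit=1000):
    """Brute-force sketch; the predicate semantics below are assumptions."""
    if pred not in ("true_dt", "abs_dt", "pos_dt"):
        raise ValueError(f"unknown predicate: {pred}")
    if result_limit < 0:
        raise ValueError("result_limit must be >= 0")

    a1, a2, t1, t2 = (np.asarray(x, dtype=np.int64) for x in (a1, a2, t1, t2))

    equal = a1[:, None] == a2[None, :]      # every (i, j) pair with a1[i] == a2[j]
    diff = t2[None, :] - t1[:, None]        # t2[j] - t1[i] for every pair

    if pred == "abs_dt":                    # assumed: |t2 - t1| <= dt
        in_window = np.abs(diff) <= dt
    elif pred == "pos_dt":                  # assumed: 0 <= t2 - t1 <= dt
        in_window = (diff >= 0) & (diff <= dt)
    else:                                   # assumed: "true_dt" accepts every pair
        in_window = np.ones_like(equal)

    i, j = np.nonzero(equal & in_window)
    return i[:result_limit], j[:result_limit]


i, j = join_on_eq_with_dt_local([1, 2, 3], [3, 1, 1], [0, 0, 0], [5, 2, 50],
                                dt=10, pred="abs_dt")
print(i.tolist(), j.tolist())   # [0, 2] [1, 0]
```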
- arkouda.pandas.string_operators(cls)[source]¶
Add common string operation methods to a StringAccessor class.
This class decorator dynamically attaches string operations (contains, startswith, endswith) to the given class using the _make_op helper.
- Parameters:
cls (type) – The accessor class to decorate.
- Returns:
The accessor class with string methods added.
- Return type:
type
Notes
Used internally to implement the .str accessor API.
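The class-decorator pattern described above can be sketched in plain Python. The names mirror the text, but the bodies are illustrative assumptions: the real `_make_op` helper is not shown in this page, and `ToyStringAccessor` is a hypothetical stand-in that works on a local list of strings instead of an Arkouda Series.

```python
def _make_op(name):
    """Build a method that applies the named string test to every value (hypothetical)."""

    def op(self, pattern):
        if name == "contains":
            # Python strings have no .contains(), so use the `in` operator.
            return [pattern in value for value in self.values]
        return [getattr(value, name)(pattern) for value in self.values]

    op.__name__ = name
    return op


def string_operators(cls):
    """Class decorator that attaches contains/startswith/endswith to `cls`."""
    for name in ("contains", "startswith", "endswith"):
        setattr(cls, name, _make_op(name))
    return cls


@string_operators
class ToyStringAccessor:
    """Hypothetical accessor over a local list of strings."""

    def __init__(self, values):
        self.values = list(values)


acc = ToyStringAccessor(["apple", "banana", "apricot"])
print(acc.startswith("a"))   # [True, False, True]
print(acc.contains("an"))    # [False, True, False]
```

Generating the methods in a loop keeps the accessor class itself short and lets one helper define how every operation is forwarded.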