arkouda.pandas.extension ======================== .. py:module:: arkouda.pandas.extension .. autoapi-nested-parse:: Experimental pandas extension types backed by Arkouda arrays. This subpackage provides experimental implementations of :pandas:`pandas.api.extensions.ExtensionArray` and corresponding extension dtypes that wrap Arkouda distributed arrays. These classes make it possible to use Arkouda arrays inside pandas objects such as ``Series`` and ``DataFrame``. They aim to provide familiar pandas semantics while leveraging Arkouda's distributed, high-performance backend. .. warning:: This module is **experimental**. The API is not stable and may change without notice between releases. Use with caution in production environments. Classes ------- .. autoapisummary:: arkouda.pandas.extension.ArkoudaArray arkouda.pandas.extension.ArkoudaBigintDtype arkouda.pandas.extension.ArkoudaBoolDtype arkouda.pandas.extension.ArkoudaCategorical arkouda.pandas.extension.ArkoudaCategoricalDtype arkouda.pandas.extension.ArkoudaDataFrameAccessor arkouda.pandas.extension.ArkoudaExtensionArray arkouda.pandas.extension.ArkoudaFloat64Dtype arkouda.pandas.extension.ArkoudaIndexAccessor arkouda.pandas.extension.ArkoudaInt64Dtype arkouda.pandas.extension.ArkoudaSeriesAccessor arkouda.pandas.extension.ArkoudaStringArray arkouda.pandas.extension.ArkoudaStringDtype arkouda.pandas.extension.ArkoudaUint64Dtype arkouda.pandas.extension.ArkoudaUint8Dtype Package Contents ---------------- .. py:class:: ArkoudaArray(data: arkouda.numpy.pdarrayclass.pdarray | numpy.ndarray | Sequence[Any] | ArkoudaArray, dtype: Any = None, copy: bool = False) Bases: :py:obj:`arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray`, :py:obj:`pandas.api.extensions.ExtensionArray` Arkouda-backed numeric/bool pandas ExtensionArray. Wraps or converts supported inputs into an Arkouda ``pdarray`` to serve as the backing store. Ensures the underlying array is 1-D and lives on the Arkouda server. 
:param data: Input to wrap or convert. - If an Arkouda ``pdarray``, it is used directly unless ``dtype`` is given or ``copy=True``, in which case a new array is created via ``ak.array``. - If a NumPy array, it is transferred to Arkouda via ``ak.array``. - If a Python sequence, it is converted to NumPy then to Arkouda. - If another ``ArkoudaArray``, its underlying ``pdarray`` is reused. :type data: pdarray | ndarray | Sequence[Any] | ArkoudaArray :param dtype: Desired dtype to cast to (NumPy dtype or Arkouda dtype string). If omitted, dtype is inferred from ``data``. :type dtype: Any, optional :param copy: If True, attempt to copy the underlying data when converting/wrapping. Default is False. :type copy: bool :raises TypeError: If ``data`` cannot be interpreted as an Arkouda array-like object. :raises ValueError: If the resulting array is not one-dimensional. .. attribute:: default_fill_value Sentinel used when filling missing values (default: -1). :type: int .. rubric:: Examples >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaArray >>> ArkoudaArray(ak.arange(5)) ArkoudaArray([0 1 2 3 4]) >>> ArkoudaArray([10, 20, 30]) ArkoudaArray([10 20 30]) .. py:method:: all(axis=0, skipna=True, **kwargs) Return whether all elements are True. This is mainly to support pandas' BaseExtensionArray.equals, which calls `.all()` on the result of a boolean expression. .. py:method:: any(axis=0, skipna=True, **kwargs) Return whether any element is True. Added for symmetry with `.all()` and to support potential pandas boolean-reduction calls. .. py:method:: astype(dtype: numpy.dtype[Any], copy: bool = True) -> numpy.typing.NDArray[Any] astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) -> pandas.api.extensions.ExtensionArray astype(dtype: Any, copy: bool = True) -> Union[pandas.api.extensions.ExtensionArray, numpy.typing.NDArray[Any]] Cast the array to a specified dtype. 
Casting rules: * If ``dtype`` requests ``object``, returns a NumPy ``NDArray[Any]`` of dtype ``object`` containing the array values. * Otherwise, the target dtype is normalized using Arkouda's dtype resolution rules. * If the normalized dtype matches the current dtype and ``copy=False``, returns ``self``. * In all other cases, casts the underlying Arkouda array to the target dtype and returns an Arkouda-backed ``ArkoudaExtensionArray``. :param dtype: Target dtype. May be a NumPy dtype, pandas dtype, Arkouda dtype, or any dtype-like object accepted by Arkouda. :type dtype: Any :param copy: Whether to force a copy when the target dtype matches the current dtype. Default is True. :type copy: bool :returns: The cast result. Returns a NumPy array only when casting to ``object``; otherwise returns an Arkouda-backed ExtensionArray. :rtype: Union[ExtensionArray, NDArray[Any]] .. rubric:: Examples Basic numeric casting returns an Arkouda-backed array: >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaArray >>> a = ArkoudaArray(ak.array([1, 2, 3], dtype="int64")) >>> a.astype("float64").to_ndarray() array([1., 2., 3.]) Casting to the same dtype with ``copy=False`` returns the original object: >>> b = a.astype("int64", copy=False) >>> b is a True Forcing a copy when the dtype is unchanged returns a new array: >>> c = a.astype("int64", copy=True) >>> c is a False >>> c.to_ndarray() array([1, 2, 3]) Casting to ``object`` materializes the data to a NumPy array: >>> a.astype(object) array([1, 2, 3], dtype=object) NumPy and pandas dtype objects are also accepted: >>> import numpy as np >>> a.astype(np.dtype("bool")).to_ndarray() array([ True, True, True]) .. py:attribute:: default_fill_value :type: int :value: -1 .. py:property:: dtype An instance of ExtensionDtype. .. seealso:: :py:obj:`api.extensions.ExtensionDtype` Base class for extension dtypes. :py:obj:`api.extensions.ExtensionArray` Base class for extension array types. 
:py:obj:`api.extensions.ExtensionArray.dtype` The dtype of an ExtensionArray. :py:obj:`Series.dtype` The dtype of a Series. :py:obj:`DataFrame.dtype` The dtype of a DataFrame. .. rubric:: Examples >>> pd.array([1, 2, 3]).dtype Int64Dtype() .. py:method:: equals(other) Return if another array is equivalent to this array. Equivalent means that both arrays have the same shape and dtype, and all values compare equal. Missing values in the same location are considered equal (in contrast with normal equality). :param other: Array to compare to this Array. :type other: ExtensionArray :returns: Whether the arrays are equivalent. :rtype: boolean .. seealso:: :py:obj:`numpy.array_equal` Equivalent method for numpy array. :py:obj:`Series.equals` Equivalent method for Series. :py:obj:`DataFrame.equals` Equivalent method for DataFrame. .. rubric:: Examples >>> arr1 = pd.array([1, 2, np.nan]) >>> arr2 = pd.array([1, 2, np.nan]) >>> arr1.equals(arr2) True >>> arr1 = pd.array([1, 3, np.nan]) >>> arr2 = pd.array([1, 2, np.nan]) >>> arr1.equals(arr2) False .. py:method:: isna() -> numpy.ndarray Return a boolean mask indicating missing values. This method implements the pandas ExtensionArray.isna contract and always returns a NumPy ndarray of dtype ``bool`` with the same length as the array. :returns: A boolean mask where ``True`` marks elements considered missing. :rtype: np.ndarray :raises TypeError: If the underlying data buffer does not support missing-value detection or cannot produce a boolean mask. .. py:method:: isnull() Alias for isna(). .. py:property:: nbytes The number of bytes needed to store this object in memory. .. seealso:: :py:obj:`ExtensionArray.shape` Return a tuple of the array dimensions. :py:obj:`ExtensionArray.size` The number of elements in the array. .. rubric:: Examples >>> pd.array([1, 2, 3]).nbytes 27 .. py:method:: value_counts(dropna: bool = True) -> pandas.Series Return counts of unique values as a pandas Series. 
This method computes the frequency of each distinct value in the underlying Arkouda array and returns the result as a pandas ``Series``, with the unique values as the index and their counts as the data. :param dropna: Whether to exclude missing values. Currently, missing-value handling is supported only for floating-point data, where ``NaN`` values are treated as missing. Default is True. :type dropna: bool :returns: A Series containing the counts of unique values. The index is an ``ArkoudaArray`` of unique values, and the values are an ``ArkoudaArray`` of counts. :rtype: pd.Series .. rubric:: Notes - Only ``dropna=True`` is supported. - The following pandas options are not yet implemented: ``normalize``, ``sort``, and ``bins``. - Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client. .. rubric:: Examples >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaArray >>> >>> a = ArkoudaArray(ak.array([1, 2, 1, 3, 2, 1])) >>> a.value_counts() 1 3 2 2 3 1 dtype: int64 Floating-point data with NaN values: >>> b = ArkoudaArray(ak.array([1.0, 2.0, float("nan"), 1.0])) >>> b.value_counts() 1.0 2 2.0 1 dtype: int64 .. py:class:: ArkoudaBigintDtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed arbitrary-precision integer dtype. This dtype integrates Arkouda's server-backed ``pdarray`` with the pandas ExtensionArray interface via :class:`ArkoudaArray`. It enables pandas objects (Series, DataFrame) to hold and operate on very large integers that exceed 64-bit precision, while keeping the data distributed on the Arkouda server. .. method:: construct_array_type() Returns the :class:`ArkoudaArray` class used for storage. .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray subclass that handles storage for this dtype. :returns: The :class:`ArkoudaArray` class associated with this dtype. :rtype: type .. 
py:attribute:: kind :value: 'O' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value :value: -1 Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'bigint' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaBoolDtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed boolean dtype. This dtype integrates Arkouda's server-backed `pdarray` with the pandas ExtensionArray interface via :class:`ArkoudaArray`. It allows pandas objects (Series, DataFrame) to store and manipulate distributed boolean arrays without materializing them on the client. .. method:: construct_array_type() Returns the :class:`ArkoudaArray` class used for storage. .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray subclass that handles storage for this dtype. :returns: The :class:`ArkoudaArray` class associated with this dtype. :rtype: type .. py:attribute:: kind :value: 'b' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. 
py:attribute:: na_value :value: False Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'bool_' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaCategorical(data: arkouda.pandas.categorical.Categorical | ArkoudaCategorical | numpy.ndarray | Sequence[Any]) Bases: :py:obj:`arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray`, :py:obj:`pandas.api.extensions.ExtensionArray` Arkouda-backed categorical pandas ExtensionArray. Ensures the underlying data is an Arkouda ``Categorical``. Accepts an existing ``Categorical`` or converts from Python/NumPy sequences of labels. :param data: Input to wrap or convert. - If ``Categorical``, used directly. - If another ``ArkoudaCategorical``, its backing object is reused. - If list/tuple/ndarray, converted via ``ak.Categorical(ak.array(data))``. :type data: Categorical | ArkoudaCategorical | ndarray | Sequence[Any] :raises TypeError: If ``data`` cannot be converted to Arkouda ``Categorical``. .. attribute:: default_fill_value Sentinel used when filling missing values (default: ""). :type: str .. py:method:: add_categories(*args, **kwargs) .. py:method:: as_ordered(*args, **kwargs) .. py:method:: as_unordered(*args, **kwargs) .. 
py:method:: astype(dtype: numpy.dtype[Any], copy: bool = True) -> numpy.typing.NDArray[Any] astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) -> pandas.api.extensions.ExtensionArray astype(dtype: Any, copy: bool = True) -> Union[pandas.api.extensions.ExtensionArray, numpy.typing.NDArray[Any]] Cast to a specified dtype. * If ``dtype`` is categorical (pandas ``category`` / ``CategoricalDtype`` / ``ArkoudaCategoricalDtype``), returns an Arkouda-backed ``ArkoudaCategorical`` (optionally copied). * If ``dtype`` requests ``object``, returns a NumPy ``ndarray`` of dtype object containing the category labels (materialized to the client). * If ``dtype`` requests a string dtype, returns an Arkouda-backed ``ArkoudaStringArray`` containing the labels as strings. * Otherwise, casts the labels (as strings) to the requested dtype and returns an Arkouda-backed ExtensionArray. :param dtype: Target dtype. :type dtype: Any :param copy: Whether to force a copy when possible. If categorical-to-categorical and ``copy=True``, attempts to copy the underlying Arkouda ``Categorical`` (if supported). Default is True. :type copy: bool :returns: The cast result. Returns a NumPy array only when casting to ``object``; otherwise returns an Arkouda-backed ExtensionArray. :rtype: Union[ExtensionArray, NDArray[Any]] .. 
rubric:: Examples

      Casting to ``category`` returns an Arkouda-backed categorical array:

      >>> import arkouda as ak
      >>> from arkouda.pandas.extension import ArkoudaCategorical
      >>> c = ArkoudaCategorical(ak.Categorical(ak.array(["x", "y", "x"])))
      >>> out = c.astype("category")
      >>> out is c
      False

      Forcing a copy when casting to the same categorical dtype returns a new array:

      >>> out2 = c.astype("category", copy=True)
      >>> out2 is c
      False
      >>> out2.to_ndarray()
      array(['x', 'y', 'x'], dtype='<U1')

      Casting to ``object`` materializes the category labels to a NumPy object array:

      >>> c.astype(object)
      array(['x', 'y', 'x'], dtype=object)

      Casting to a string dtype returns an Arkouda-backed string array of labels:

      >>> s = c.astype("string")
      >>> s.to_ndarray()
      array(['x', 'y', 'x'], dtype='<U1')

      Otherwise, the labels are cast (as strings) to the requested dtype:

      >>> c_num = ArkoudaCategorical(ak.Categorical(ak.array(["1", "2", "3"])))
      >>> a = c_num.astype("int64")
      >>> a.to_ndarray()
      array([1, 2, 3])

   .. py:method:: check_for_ordered(*args, **kwargs)

   .. py:attribute:: default_fill_value
      :type: str
      :value: ''

   .. py:method:: describe(*args, **kwargs)

   .. py:property:: dtype

      An instance of ExtensionDtype.

      .. seealso::

         :py:obj:`api.extensions.ExtensionDtype`
             Base class for extension dtypes.

         :py:obj:`api.extensions.ExtensionArray`
             Base class for extension array types.

         :py:obj:`api.extensions.ExtensionArray.dtype`
             The dtype of an ExtensionArray.

         :py:obj:`Series.dtype`
             The dtype of a Series.

         :py:obj:`DataFrame.dtype`
             The dtype of a DataFrame.

      .. rubric:: Examples

      >>> pd.array([1, 2, 3]).dtype
      Int64Dtype()

   .. py:method:: from_codes(*args, **kwargs)
      :classmethod:
      :abstractmethod:

   .. py:method:: isna() -> numpy.ndarray

      Return a boolean mask indicating missing values.

      This implements the pandas ``ExtensionArray.isna`` contract and returns a
      NumPy ``ndarray[bool]`` of the same length as this categorical array.

      :returns: Boolean mask where ``True`` indicates a missing value.
      :rtype: np.ndarray

      :raises TypeError: If the underlying categorical cannot expose its codes or if
                         missing-value detection is unsupported.

   .. 
py:method:: isnull() Alias for isna(). .. py:method:: max(*args, **kwargs) .. py:method:: memory_usage(*args, **kwargs) .. py:method:: min(*args, **kwargs) .. py:method:: notna(*args, **kwargs) .. py:method:: notnull(*args, **kwargs) .. py:method:: remove_categories(*args, **kwargs) .. py:method:: remove_unused_categories(*args, **kwargs) .. py:method:: rename_categories(*args, **kwargs) .. py:method:: reorder_categories(*args, **kwargs) .. py:method:: set_categories(*args, **kwargs) .. py:method:: set_ordered(*args, **kwargs) .. py:method:: sort_values(*args, **kwargs) .. py:method:: to_list(*args, **kwargs) .. py:method:: value_counts(dropna: bool = True) -> pandas.Series Return counts of categories as a pandas Series. This method computes category frequencies from the underlying Arkouda ``Categorical`` and returns them as a pandas ``Series``, where the index contains the category labels and the values contain the corresponding counts. :param dropna: Whether to drop missing values from the result. When ``True``, the result is filtered using the categorical's ``na_value``. When ``False``, all categories returned by the underlying computation are included. Default is True. :type dropna: bool :returns: A Series containing category counts. The index is an ``ArkoudaStringArray`` of category labels and the values are an ``ArkoudaArray`` of counts. :rtype: pd.Series .. rubric:: Notes - The result is computed server-side in Arkouda; only the (typically small) output of categories and counts is materialized for the pandas ``Series``. - This method does not yet support pandas options such as ``normalize``, ``sort``, or ``bins``. - The handling of missing values depends on the Arkouda ``Categorical`` definition of ``na_value``. .. rubric:: Examples >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaCategorical >>> >>> a = ArkoudaCategorical(["a", "b", "a", "c", "b", "a"]) >>> a.value_counts() a 3 b 2 c 1 dtype: int64 .. 
py:class:: ArkoudaCategoricalDtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed categorical dtype. This dtype integrates Arkouda's distributed ``Categorical`` type with the pandas ExtensionArray interface via :class:`ArkoudaCategorical`. It enables pandas objects (Series, DataFrame) to hold categorical data stored and processed on the Arkouda server, while exposing familiar pandas APIs. .. method:: construct_array_type() Returns the :class:`ArkoudaCategorical` used as the storage class. .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray subclass that handles storage for this dtype. :returns: The :class:`ArkoudaCategorical` class associated with this dtype. :rtype: type .. py:attribute:: kind :value: 'O' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value :value: -1 Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'category' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaDataFrameAccessor(pandas_obj) Arkouda DataFrame accessor. Allows ``df.ak`` access to Arkouda-backed operations. .. py:method:: collect() -> pandas.DataFrame Materialize an Arkouda-backed pandas DataFrame into a NumPy-backed one. 
This operation retrieves each Arkouda-backed column from the server using ``to_ndarray()`` and constructs a standard pandas DataFrame whose columns are plain NumPy ``ndarray`` objects. The returned DataFrame has no dependency on Arkouda. :returns: A pandas DataFrame with NumPy-backed columns. :rtype: pd_DataFrame .. rubric:: Examples Converting an Arkouda-backed DataFrame into a NumPy-backed one: >>> import pandas as pd >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaDataFrameAccessor Create a pandas DataFrame and convert it to Arkouda-backed form: >>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]}) >>> akdf = df.ak.to_ak() ``akdf`` is still a pandas DataFrame, but its columns live on Arkouda: >>> type(akdf["x"].array) Now fully materialize it to local NumPy arrays: >>> collected = akdf.ak.collect() >>> collected x y 0 1 a 1 2 b 2 3 c The columns are now NumPy arrays: >>> type(collected["x"].values) .. py:method:: from_ak_legacy(akdf: arkouda.pandas.dataframe.DataFrame) -> pandas.DataFrame :staticmethod: Convert a legacy Arkouda ``DataFrame`` into a pandas ``DataFrame`` backed by Arkouda ExtensionArrays. This is the zero-copy-ish counterpart to :meth:`to_ak_legacy`. Instead of materializing columns into NumPy arrays, this function wraps each underlying Arkouda server-side array in the appropriate ``ArkoudaExtensionArray`` subclass (``ArkoudaArray``, ``ArkoudaStringArray``, or ``ArkoudaCategorical``). The resulting pandas ``DataFrame`` therefore keeps all data on the Arkouda server, enabling scalable operations without transferring data to the Python client. :param akdf: A legacy Arkouda ``DataFrame`` (``arkouda.pandas.dataframe.DataFrame``) whose columns are Arkouda objects (``pdarray``, ``Strings``, or ``Categorical``). 
:type akdf: ak_DataFrame :returns: A pandas ``DataFrame`` in which each column is an Arkouda-backed ExtensionArray—typically one of: * :class:`ArkoudaArray` * :class:`ArkoudaStringArray` * :class:`ArkoudaCategorical` No materialization to NumPy occurs. All column data remain server-resident. :rtype: pd_DataFrame .. rubric:: Notes * This function performs a **zero-copy** conversion for the underlying Arkouda arrays (server-side). Only lightweight Python wrappers are created. * The resulting pandas ``DataFrame`` can interoperate with most pandas APIs that support extension arrays. * Round-tripping through ``to_ak_legacy()`` and ``from_ak_legacy()`` preserves Arkouda semantics. .. rubric:: Examples Basic conversion ~~~~~~~~~~~~~~~~ >>> import arkouda as ak >>> akdf = ak.DataFrame({"a": ak.arange(5), "b": ak.array([10,11,12,13,14])}) >>> pdf = pd.DataFrame.ak.from_ak_legacy(akdf) >>> pdf a b 0 0 10 1 1 11 2 2 12 3 3 13 4 4 14 Columns stay Arkouda-backed ~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> type(pdf["a"].array) >>> pdf["a"].array._data array([0 1 2 3 4]) No NumPy materialization occurs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> pdf["a"].values # pandas always materializes .values ArkoudaArray([0 1 2 3 4]) But the underlying column is still Arkouda: >>> pdf["a"].array._data array([0 1 2 3 4]) Categorical and Strings columns work as well ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> akdf2 = ak.DataFrame({ ... "s": ak.array(["a","b","a"]), ... "c": ak.Categorical(ak.array(["e","f","g"])) ... }) >>> pdf2 = pd.DataFrame.ak.from_ak_legacy(akdf2) >>> type(pdf2["s"].array) >>> type(pdf2["c"].array) .. py:method:: merge(right: pandas.DataFrame, on: Optional[Union[str, List[str]]] = None, left_on: Optional[Union[str, List[str]]] = None, right_on: Optional[Union[str, List[str]]] = None, how: str = 'inner', left_suffix: str = '_x', right_suffix: str = '_y', convert_ints: bool = True, sort: bool = True) -> pandas.DataFrame Merge two Arkouda-backed pandas DataFrames using Arkouda's join. 
      :param right: Right-hand DataFrame to merge with ``self._obj``. All columns must be
                    Arkouda-backed ExtensionArrays.
      :type right: pd.DataFrame
      :param on: Column name(s) to join on. Must be present in both left and right
                 DataFrames. If not provided and neither ``left_on`` nor ``right_on`` is set,
                 the intersection of column names in left and right is used.
                 Default is None.
      :type on: Optional[Union[str, List[str]]]
      :param left_on: Column name(s) from the left DataFrame to use as join keys. Must be
                      used together with ``right_on``. If provided, ``on`` is ignored for the
                      left side. Default is None.
      :type left_on: Optional[Union[str, List[str]]]
      :param right_on: Column name(s) from the right DataFrame to use as join keys. Must be
                       used together with ``left_on``. If provided, ``on`` is ignored for the
                       right side. Default is None.
      :type right_on: Optional[Union[str, List[str]]]
      :param how: Type of merge to be performed. One of ``'left'``, ``'right'``,
                  ``'inner'``, or ``'outer'``. Default is 'inner'.
      :type how: str
      :param left_suffix: Suffix to apply to overlapping column names from the left frame
                          that are not part of the join keys. Default is '_x'.
      :type left_suffix: str
      :param right_suffix: Suffix to apply to overlapping column names from the right frame
                           that are not part of the join keys. Default is '_y'.
      :type right_suffix: str
      :param convert_ints: Whether to allow Arkouda to upcast integer columns as needed
                           (for example, to accommodate missing values) during the merge.
                           Default is True.
      :type convert_ints: bool
      :param sort: Whether to sort the join keys in the output. Default is True.
      :type sort: bool

      :returns: A pandas DataFrame whose columns are :class:`ArkoudaArray`
                ExtensionArrays. All column data remain on the Arkouda server.
      :rtype: pd.DataFrame

      :raises TypeError: If ``right`` is not a :class:`pandas.DataFrame` or if any column in
                         the left or right DataFrame is not Arkouda-backed.

   .. 
py:method:: to_ak() -> pandas.DataFrame Convert this pandas DataFrame to an Arkouda-backed pandas DataFrame. Each column of the original pandas DataFrame is materialized to the Arkouda server via :func:`ak.array` and wrapped in an :class:`ArkoudaArray` ExtensionArray. The result is still a *pandas* DataFrame, but all column data reside on the Arkouda server and behave according to the Arkouda ExtensionArray API. This method does **not** return a legacy :class:`ak_DataFrame`. For that (server-side DataFrame structure), use :meth:`to_ak_legacy`. :returns: A pandas DataFrame whose columns are Arkouda-backed :class:`ArkoudaArray` objects. :rtype: pd_DataFrame .. rubric:: Examples Convert a plain pandas DataFrame to an Arkouda-backed one: >>> import pandas as pd >>> import arkouda as ak >>> df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]}) >>> akdf = df.ak.to_ak() >>> type(akdf) The columns are now Arkouda ExtensionArrays: >>> isinstance(akdf["x"].array, ArkoudaArray) True >>> akdf["x"].tolist() [np.int64(1), np.int64(2), np.int64(3)] Arkouda operations work directly on the columns: >>> akdf["x"].array._data + 10 array([11 12 13]) Converting back to a NumPy-backed DataFrame: >>> akdf_numpy = akdf.ak.collect() >>> akdf_numpy x y 0 1 a 1 2 b 2 3 c .. py:method:: to_ak_legacy() -> arkouda.pandas.dataframe.DataFrame Convert this pandas DataFrame into the legacy :class:`arkouda.DataFrame`. This method performs a *materializing* conversion of a pandas DataFrame into the legacy Arkouda DataFrame structure. Every column is converted to Arkouda server-side data: * Python / NumPy numeric and boolean arrays become :class:`pdarray`. * String columns become Arkouda string arrays (``Strings``). * Pandas categoricals become Arkouda ``Categorical`` objects. * The result is a legacy :class:`ak_DataFrame` whose columns all reside on the Arkouda server. This differs from :meth:`to_ak`, which creates Arkouda-backed ExtensionArrays but retains a pandas.DataFrame structure. 
:returns: The legacy Arkouda DataFrame with all columns materialized onto the Arkouda server. :rtype: ak_DataFrame .. rubric:: Examples Convert a plain pandas DataFrame to a legacy Arkouda DataFrame: >>> import pandas as pd >>> import arkouda as ak >>> df = pd.DataFrame({ ... "i": [1, 2, 3], ... "s": ["a", "b", "c"], ... "c": pd.Series(["low", "low", "high"], dtype="category"), ... }) >>> akdf = df.ak.to_ak_legacy() >>> type(akdf) Columns have the appropriate Arkouda types: >>> from arkouda.numpy.pdarrayclass import pdarray >>> from arkouda.numpy.strings import Strings >>> from arkouda.pandas.categorical import Categorical >>> isinstance(akdf["i"], pdarray) True >>> isinstance(akdf["s"], Strings) True >>> isinstance(akdf["c"], Categorical) True Values round-trip through the conversion: >>> akdf["i"].tolist() [1, 2, 3] .. py:class:: ArkoudaExtensionArray(data) Bases: :py:obj:`pandas.api.extensions.ExtensionArray` Abstract base class for custom 1-D array types. pandas will recognize instances of this class as proper arrays with a custom type and will not attempt to coerce them to objects. They may be stored directly inside a :class:`DataFrame` or :class:`Series`. .. attribute:: dtype .. attribute:: nbytes .. attribute:: ndim .. attribute:: shape .. method:: argsort .. method:: astype .. method:: copy .. method:: dropna .. method:: duplicated .. method:: factorize .. method:: fillna .. method:: equals .. method:: insert .. method:: interpolate .. method:: isin .. method:: isna .. method:: item .. method:: ravel .. method:: repeat .. method:: searchsorted .. method:: shift .. method:: take .. method:: tolist .. method:: unique .. method:: view .. method:: _accumulate .. method:: _concat_same_type .. method:: _explode .. method:: _formatter .. method:: _from_factorized .. method:: _from_sequence .. method:: _from_sequence_of_strings .. method:: _hash_pandas_object .. method:: _pad_or_backfill .. method:: _reduce .. method:: _values_for_argsort .. 
method:: _values_for_factorize

   .. seealso::

      :py:obj:`api.extensions.ExtensionDtype`
          A custom data type, to be paired with an ExtensionArray.

      :py:obj:`api.extensions.ExtensionArray.dtype`
          An instance of ExtensionDtype.

   .. rubric:: Notes

   The interface includes the following abstract methods that must be
   implemented by subclasses:

   * _from_sequence
   * _from_factorized
   * __getitem__
   * __len__
   * __eq__
   * dtype
   * nbytes
   * isna
   * take
   * copy
   * _concat_same_type
   * interpolate

   A default repr displaying the type, (truncated) data, length,
   and dtype is provided. It can be customized or replaced by overriding:

   * __repr__ : A default repr for the ExtensionArray.
   * _formatter : Print scalars inside a Series or DataFrame.

   Some methods require casting the ExtensionArray to an ndarray of Python
   objects with ``self.astype(object)``, which may be expensive. When
   performance is a concern, we highly recommend overriding the following
   methods:

   * fillna
   * _pad_or_backfill
   * dropna
   * unique
   * factorize / _values_for_factorize
   * argsort, argmax, argmin / _values_for_argsort
   * searchsorted
   * map

   The remaining methods implemented on this class should be performant,
   as they only compose abstract methods. Still, a more efficient
   implementation may be available, and these methods can be overridden.

   One can implement methods to handle array accumulations or reductions.

   * _accumulate
   * _reduce

   One can implement methods to handle parsing from strings that will be used
   in methods such as ``pandas.io.parsers.read_csv``.

   * _from_sequence_of_strings

   This class does not inherit from 'abc.ABCMeta' for performance reasons.
   Methods and properties required by the interface raise
   ``pandas.errors.AbstractMethodError`` and no ``register`` method is
   provided for registering virtual subclasses.

   ExtensionArrays are limited to 1 dimension.

   They may be backed by none, one, or many NumPy arrays. For example,
   ``pandas.Categorical`` is an extension array backed by two arrays,
   one for codes and one for categories.
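As a small illustration of the two-array layout just mentioned, the following sketch uses plain pandas (not Arkouda, and no Arkouda server) to inspect the backing arrays of a ``pandas.Categorical``. The variable names are illustrative only.

```python
import pandas as pd

# A categorical stores each distinct label once and refers to it by integer code.
cat = pd.Categorical(["x", "y", "x"])

codes = cat.codes            # integer positions into the categories
categories = cat.categories  # Index holding the distinct labels

print(codes.tolist())        # [0, 1, 0]
print(list(categories))      # ['x', 'y']

# Recombining the two backing arrays reproduces the original values.
rebuilt = categories.take(codes)
print(list(rebuilt))         # ['x', 'y', 'x']
```

The same split between integer codes and a small set of labels is what lets a categorical array represent repeated values compactly.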
An array of IPv6 addresses may be backed by a NumPy structured array with two fields, one for the lower 64 bits and one for the upper 64 bits. Or they may be backed by some other storage type, like Python lists. Pandas makes no assumptions on how the data are stored, just that it can be converted to a NumPy array. The ExtensionArray interface does not impose any rules on how this data is stored. However, currently, the backing data cannot be stored in attributes called ``.values`` or ``._values`` to ensure full compatibility with pandas internals. But other names such as ``.data``, ``._data``, ``._items``, ... can be freely used. If implementing NumPy's ``__array_ufunc__`` interface, pandas expects that 1. You defer by returning ``NotImplemented`` when any Series are present in `inputs`. Pandas will extract the arrays and call the ufunc again. 2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class. Pandas inspects this to determine whether the ufunc is valid for the types present. See :ref:`extending.extension.ufunc` for more. By default, ExtensionArrays are not hashable. Immutable subclasses may override this behavior. .. rubric:: Examples Please see the following: https://github.com/pandas-dev/pandas/blob/main/pandas/tests/extension/list/array.py .. py:method:: argmax(axis=None, out=None) :abstractmethod: Return the index of maximum value. In case of multiple occurrences of the maximum value, the index corresponding to the first occurrence is returned. :param skipna: :type skipna: bool, default True :rtype: int .. seealso:: :py:obj:`ExtensionArray.argmin` Return the index of the minimum value. .. rubric:: Examples >>> arr = pd.array([3, 1, 2, 5, 4]) >>> arr.argmax() np.int64(3) .. py:method:: argmin(axis=None, out=None) :abstractmethod: Return the index of minimum value. In case of multiple occurrences of the minimum value, the index corresponding to the first occurrence is returned. :param skipna: :type skipna: bool, default True :rtype: int ..
seealso:: :py:obj:`ExtensionArray.argmax` Return the index of the maximum value. .. rubric:: Examples >>> arr = pd.array([3, 1, 2, 5, 4]) >>> arr.argmin() np.int64(1) .. py:method:: argsort(*, ascending: bool = True, kind: str = 'quicksort', **kwargs: object) -> numpy.typing.NDArray[numpy.intp] Return the indices that would sort the array. This method computes the permutation indices that would sort the underlying Arkouda data and returns them as a NumPy array, in accordance with the pandas ``ExtensionArray`` contract. The indices can be used to reorder the array via ``take`` or ``iloc``. For floating-point data, ``NaN`` values are handled according to the ``na_position`` keyword argument. :param ascending: If True, sort values in ascending order. If False, sort in descending order. :type ascending: bool, default True :param kind: Sorting algorithm. Present for API compatibility with NumPy and pandas but currently ignored. :type kind: str, default "quicksort" :param \*\*kwargs: Additional keyword arguments for compatibility. Supported keyword: * ``na_position`` : {"first", "last"}, default "last" Where to place ``NaN`` values in the sorted result. This option is currently only applied for floating-point ``pdarray`` data; for ``Strings`` and ``Categorical`` data it has no effect. :returns: A 1D NumPy array of dtype ``np.intp`` containing the indices that would sort the array. :rtype: numpy.ndarray :raises ValueError: If ``na_position`` is not "first" or "last". :raises TypeError: If the underlying data type does not support sorting. .. rubric:: Notes * Supports Arkouda ``pdarray``, ``Strings``, and ``Categorical`` data. * For floating-point arrays, ``NaN`` values are repositioned according to ``na_position``. * The sorting computation occurs on the Arkouda server, but the resulting permutation indices are materialized on the client as a NumPy array, as required by pandas internals. .. 
rubric:: Examples >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaArray >>> a = ArkoudaArray(ak.array([3.0, float("nan"), 1.0])) >>> a.argsort() # NA last by default array([2, 0, 1]) >>> a.argsort(na_position="first") array([1, 2, 0]) .. py:method:: broadcast_arrays(*arrays) :abstractmethod: .. py:method:: broadcast_to(x, shape, /) :abstractmethod: .. py:method:: concat(arrays, /, *, axis=0) :abstractmethod: .. py:method:: copy(deep: bool = True) Return a copy of the array. :param deep: Whether to make a deep copy of the underlying Arkouda data. - If ``True``, the underlying server-side array is duplicated. - If ``False``, a new ExtensionArray wrapper is created but the underlying data is shared (no server-side copy). :type deep: bool, default True :returns: A new instance of the same concrete subclass containing either a deep copy or a shared reference to the underlying data. :rtype: ArkoudaExtensionArray .. rubric:: Notes Pandas semantics: ``deep=False`` creates a new wrapper but may share memory. ``deep=True`` must create an independent copy of the data. Arkouda semantics: Arkouda arrays do not presently support views. Therefore: - ``deep=False`` returns a new wrapper around the *same* server-side array. - ``deep=True`` forces a full server-side copy. .. rubric:: Examples Shallow copy (shared data): >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaArray >>> arr = ArkoudaArray(ak.arange(5)) >>> c1 = arr.copy(deep=False) >>> c1 ArkoudaArray([0 1 2 3 4]) Underlying data is the same object: >>> arr._data is c1._data True Deep copy (independent server-side data): >>> c2 = arr.copy(deep=True) >>> c2 ArkoudaArray([0 1 2 3 4]) Underlying data is a distinct pdarray on the server: >>> arr._data is c2._data False .. py:attribute:: default_fill_value :type: Optional[Union[arkouda.numpy.dtypes.all_scalars, str]] :value: -1 .. 
py:method:: duplicated(arrays, /, *, axis=0) :abstractmethod: Return boolean ndarray denoting duplicate values. :param keep: - ``first`` : Mark duplicates as ``True`` except for the first occurrence. - ``last`` : Mark duplicates as ``True`` except for the last occurrence. - False : Mark all duplicates as ``True``. :type keep: {'first', 'last', False}, default 'first' :returns: With true in indices where elements are duplicated and false otherwise. :rtype: ndarray[bool] .. seealso:: :py:obj:`DataFrame.duplicated` Return boolean Series denoting duplicate rows. :py:obj:`Series.duplicated` Indicate duplicate Series values. :py:obj:`api.extensions.ExtensionArray.unique` Compute the ExtensionArray of unique values. .. rubric:: Examples >>> pd.array([1, 1, 2, 3, 3], dtype="Int64").duplicated() array([False, True, False, False, True]) .. py:method:: expand_dims(x, /, *, axis) :abstractmethod: .. py:method:: factorize(use_na_sentinel=True) -> Tuple[numpy.typing.NDArray[numpy.intp], ArkoudaExtensionArray] Encode the values of this array as integer codes and unique values. This is similar to :func:`pandas.factorize`, but the grouping/factorization work is performed in Arkouda. The returned ``codes`` are a NumPy array for pandas compatibility, while ``uniques`` are returned as an ExtensionArray of the same type as ``self``. Each distinct non-missing value is assigned a unique integer code. For floating dtypes, ``NaN`` is treated as missing; for all other dtypes, no values are considered missing. :param use_na_sentinel: If True, missing values are encoded as ``-1`` in the returned codes. If False, missing values are assigned the code ``len(uniques)``. (Missingness is only detected for floating dtypes via ``NaN``.) :type use_na_sentinel: bool, default True :returns: A pair ``(codes, uniques)`` where: * ``codes`` is a 1D NumPy array of dtype ``np.intp`` with the same length as this array, containing the factor codes for each element. 
* ``uniques`` is an ExtensionArray containing the unique (non-missing) values, with the same extension type as ``self``. If ``use_na_sentinel=True``, missing values in ``codes`` are ``-1``. Otherwise they receive the code ``len(uniques)``. :rtype: (numpy.ndarray, ExtensionArray) .. rubric:: Notes * Only floating-point dtypes treat ``NaN`` as missing; for other dtypes, all values are treated as non-missing. * ``uniques`` are constructed from Arkouda's unique keys and returned as ``type(self)(uniques_ak)`` so that pandas internals (e.g. ``groupby``) can treat them as an ExtensionArray. * String/None/null missing-value behavior is not yet unified with pandas. .. rubric:: Examples >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaArray >>> arr = ArkoudaArray(ak.array([1, 2, 1, 3])) >>> codes, uniques = arr.factorize() >>> codes array([0, 1, 0, 2]) >>> uniques ArkoudaArray([1 2 3]) .. py:method:: interpolate(method='linear', *, limit=None, **kwargs) :abstractmethod: Fill NaN values using an interpolation method. :param method: Interpolation technique to use. One of: * 'linear': Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes. * 'time': Works on daily and higher resolution data to interpolate given length of interval. * 'index', 'values': use the actual numerical values of the index. * 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'polynomial': Passed to scipy.interpolate.interp1d, whereas 'spline' is passed to scipy.interpolate.UnivariateSpline. These methods use the numerical values of the index. Both 'polynomial' and 'spline' require that you also specify an order (int), e.g. arr.interpolate(method='polynomial', order=5). * 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima', 'cubicspline': Wrappers around the SciPy interpolation methods of similar names. See Notes. * 'from_derivatives': Refers to scipy.interpolate.BPoly.from_derivatives. 
:type method: str, default 'linear' :param axis: Axis to interpolate along. For 1-dimensional data, use 0. :type axis: int :param index: Index to use for interpolation. :type index: Index :param limit: Maximum number of consecutive NaNs to fill. Must be greater than 0. :type limit: int or None :param limit_direction: Consecutive NaNs will be filled in this direction. :type limit_direction: {'forward', 'backward', 'both'} :param limit_area: If limit is specified, consecutive NaNs will be filled with this restriction. * None: No fill restriction. * 'inside': Only fill NaNs surrounded by valid values (interpolate). * 'outside': Only fill NaNs outside valid values (extrapolate). :type limit_area: {'inside', 'outside'} or None :param copy: If True, a copy of the object is returned with interpolated values. :type copy: bool :param \*\*kwargs: Keyword arguments to pass on to the interpolating function. :type \*\*kwargs: optional :returns: An ExtensionArray with interpolated values. :rtype: ExtensionArray .. seealso:: :py:obj:`Series.interpolate` Interpolate values in a Series. :py:obj:`DataFrame.interpolate` Interpolate values in a DataFrame. .. rubric:: Notes - All parameters must be specified as keyword arguments. - The 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima' methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index. .. rubric:: Examples Interpolating values in a NumPy array: >>> arr = pd.arrays.NumpyExtensionArray(np.array([0, 1, np.nan, 3])) >>> arr.interpolate( ... method="linear", ... limit=3, ... limit_direction="forward", ... index=pd.Index(range(len(arr))), ... fill_value=1, ... copy=False, ... axis=0, ... limit_area="inside", ... ) [0.0, 1.0, 2.0, 3.0] Length: 4, dtype: float64 Interpolating values in a FloatingArray: >>> arr = pd.array([1.0, pd.NA, 3.0, 4.0, pd.NA, 6.0], dtype="Float64") >>> arr.interpolate( ... method="linear", ... axis=0, ... 
index=pd.Index(range(len(arr))), ... limit=None, ... limit_direction="both", ... limit_area=None, ... copy=True, ... ) [1.0, 2.0, 3.0, 4.0, 5.0, 6.0] Length: 6, dtype: Float64 .. py:method:: kurt(*args, **kwargs) .. py:method:: median(*args, **kwargs) .. py:method:: permute_dims(x, /, axes) :abstractmethod: .. py:method:: reshape(x, /, shape) :abstractmethod: .. py:method:: sem(*args, **kwargs) .. py:method:: skew(*args, **kwargs) .. py:method:: split(x, indices_or_sections, /, *, axis=0) :abstractmethod: .. py:method:: squeeze(x, /, *, axis=None) :abstractmethod: .. py:method:: stack(arrays, /, *, axis=0) :abstractmethod: .. py:method:: swapaxes(*args, **kwargs) .. py:method:: take(indexer, fill_value=None, allow_fill=False) Take elements by (0-based) position, returning a new array. This implementation: * normalizes the indexer to Arkouda int64, * explicitly emulates NumPy-style negative wrapping when ``allow_fill=False``, * when ``allow_fill=True``, accepts **only** ``-1`` as the sentinel for missing values (those positions are filled with ``fill_value``) and raises ``ValueError`` for any other negative index, * validates bounds (raising ``IndexError``) when ``allow_fill=True``, * gathers once, then fills masked positions in a single pass. .. py:method:: to_ndarray() -> numpy.ndarray Convert to a NumPy ndarray, without any dtype conversion or copy options. :returns: A new NumPy array materialized from the underlying Arkouda data. :rtype: numpy.ndarray .. rubric:: Notes This is a lightweight convenience wrapper around the backend's ``.to_ndarray()`` method. Unlike :meth:`to_numpy`, this method does not accept ``dtype`` or ``copy`` arguments and always performs a materialization step. .. py:method:: to_numpy(dtype=None, copy=False, na_value=None) Convert the array to a NumPy ndarray. :param dtype: Desired dtype for the result. If None, the underlying dtype is preserved.
:type dtype: str, numpy.dtype, optional :param copy: Whether to ensure a copy is made: - If False, a view of the underlying buffer may be returned when possible. - If True, always return a new NumPy array. :type copy: bool, default False :returns: NumPy array representation of the data. :rtype: numpy.ndarray .. py:method:: view(dtype=None) Return a shallow view of the ExtensionArray. This method is used by pandas internals (e.g. ``BlockManager.copy(deep=False)``) to create a new ``ExtensionArray`` wrapper that shares the same underlying Arkouda data without materializing or copying server-side arrays. :param dtype: If provided and different from the current dtype, a dtype conversion is requested. In this case, the operation is delegated to ``astype(dtype, copy=False)`` and a new array with the requested dtype is returned. :type dtype: optional :returns: A new ExtensionArray instance of the same concrete class that references the same underlying Arkouda data. :rtype: ArkoudaExtensionArray .. rubric:: Notes * This method performs a **shallow** copy only: the underlying Arkouda server-side array is shared between the original and the returned object. * No data is materialized, copied, or cast unless ``dtype`` is explicitly requested. * Optional internal attributes (e.g. masks, categorical metadata, caches) are copied by reference when present, to preserve logical consistency. * This method exists to satisfy pandas' expectations around ``.view()`` and ``copy(deep=False)`` semantics for ``ExtensionArray`` implementations. .. 
rubric:: Examples Create a shallow view that shares the same underlying data: >>> import arkouda as ak >>> from arkouda.pandas.extension._arkouda_array import ArkoudaArray >>> ak_arr = ak.arange(5) >>> ea = ArkoudaArray(ak_arr) >>> v = ea.view() >>> v is ea False >>> v._data is ea._data True Requesting a dtype conversion delegates to ``astype`` without copying the underlying data unless required: >>> v2 = ea.view(dtype="float64") >>> v2.dtype == ea.astype("float64").dtype True This method is commonly invoked indirectly by pandas during operations that require shallow copies: >>> import pandas as pd >>> s = pd.Series(ea) >>> df = pd.DataFrame({"col": s}) # does not raise .. seealso:: :py:obj:`copy` Create a shallow or deep copy of the array. :py:obj:`astype` Cast the array to a new dtype. .. py:class:: ArkoudaFloat64Dtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed 64-bit floating-point dtype. This dtype integrates Arkouda's server-backed `pdarray` with the pandas ExtensionArray interface via :class:`ArkoudaArray`. It allows pandas objects (Series, DataFrame) to store and manipulate large distributed float64 arrays without materializing them on the client. .. method:: construct_array_type() Returns the :class:`ArkoudaArray` class used for storage. .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray subclass that handles storage for this dtype. :returns: The :class:`ArkoudaArray` class associated with this dtype. :rtype: type .. py:attribute:: kind :value: 'f' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value Default NA value to use for this type. This is used in e.g. ExtensionArray.take. 
This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'float64' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaIndexAccessor(pandas_obj: Union[pandas.Index, pandas.MultiIndex]) Arkouda-backed index accessor for pandas ``Index`` and ``MultiIndex``. This accessor provides methods for converting between: * NumPy-backed pandas indexes * pandas indexes backed by :class:`ArkoudaExtensionArray` (zero-copy EA mode) * legacy Arkouda ``ak.Index`` and ``ak.MultiIndex`` objects The ``.ak`` namespace mirrors the DataFrame accessor, providing a consistent interface for distributed index operations. All conversions avoid unnecessary NumPy materialization unless explicitly requested via :meth:`collect`. :param pandas_obj: The pandas ``Index`` or ``MultiIndex`` instance that this accessor wraps. :type pandas_obj: Union[pd.Index, pd.MultiIndex] .. rubric:: Notes * ``to_ak`` → pandas object, Arkouda-backed (ExtensionArrays). * ``to_ak_legacy`` → legacy Arkouda index objects. * ``collect`` → NumPy-backed pandas object. * ``is_arkouda`` → reports whether the index is Arkouda-backed. .. 
rubric:: Examples Basic single-level Index conversion: >>> import pandas as pd >>> import arkouda as ak >>> idx = pd.Index([10, 20, 30], name="vals") Convert to Arkouda-backed: >>> ak_idx = idx.ak.to_ak() >>> ak_idx.ak.is_arkouda True Materialize back: >>> restored = ak_idx.ak.collect() >>> restored.equals(idx) True Convert to legacy Arkouda: >>> ak_legacy = idx.ak.to_ak_legacy() >>> type(ak_legacy) MultiIndex conversion: >>> arrays = [[1, 1, 2], ["red", "blue", "red"]] >>> midx = pd.MultiIndex.from_arrays(arrays, names=["num", "color"]) >>> ak_midx = midx.ak.to_ak() >>> ak_midx.ak.is_arkouda True .. py:method:: collect() -> Union[pandas.Index, pandas.MultiIndex] Materialize this Index or MultiIndex back to a plain NumPy-backed pandas index. :returns: An Index whose underlying data are plain NumPy arrays. :rtype: Union[pd.Index, pd.MultiIndex] :raises TypeError: If the index is Arkouda-backed but does not expose the expected ``_data`` attribute, or if the index type is unsupported. .. rubric:: Examples Single-level Index round-trip: >>> import pandas as pd >>> import arkouda as ak >>> idx = pd.Index([1, 2, 3], name="x") >>> ak_idx = idx.ak.to_ak() >>> np_idx = ak_idx.ak.collect() >>> np_idx Index([1, 2, 3], dtype='int64', name='x') >>> np_idx.equals(idx) True Behavior when already NumPy-backed (no-op except shallow copy): >>> plain = pd.Index([10, 20, 30]) >>> plain2 = plain.ak.collect() >>> plain2.equals(plain) True Verifying that Arkouda-backed values materialize to NumPy: >>> ak_idx = pd.Index([5, 6, 7]).ak.to_ak() >>> type(ak_idx.array) >>> out = ak_idx.ak.collect() >>> type(out.array) .. py:method:: concat(other: Union[pandas.Index, pandas.MultiIndex]) -> Union[pandas.Index, pandas.MultiIndex] Concatenate this index with another Arkouda-backed index. Both ``self._obj`` and ``other`` must be convertible to legacy Arkouda :class:`ak_Index` / :class:`ak_MultiIndex`. 
The concatenation is performed in Arkouda and the result is wrapped back into an Arkouda-backed pandas Index or MultiIndex. :param other: The other index to concatenate with ``self._obj``. It must be a :class:`pandas.Index` or :class:`pandas.MultiIndex`. :type other: Union[pd.Index, pd.MultiIndex] :returns: A pandas Index or MultiIndex backed by Arkouda, containing the concatenated values from ``self._obj`` and ``other``. :rtype: Union[pd.Index, pd.MultiIndex] :raises TypeError: If ``other`` is not a :class:`pandas.Index` or :class:`pandas.MultiIndex`. .. py:method:: from_ak_legacy(akidx: Union[arkouda.pandas.index.Index, arkouda.pandas.index.MultiIndex]) -> Union[pandas.Index, pandas.MultiIndex] :staticmethod: Convert a legacy Arkouda ``ak.Index`` or ``ak.MultiIndex`` into a pandas Index/MultiIndex backed by Arkouda ExtensionArrays. This is the index analogue of ``df.ak.from_ak_legacy_ea()``: it performs a zero-copy-style wrapping of Arkouda server-side arrays into :class:`ArkoudaExtensionArray` objects, producing a pandas Index or MultiIndex whose levels remain distributed on the Arkouda server. No materialization to NumPy occurs. :param akidx: The legacy Arkouda Index or MultiIndex to wrap. :type akidx: Union[ak_Index, ak_MultiIndex] :returns: A pandas index object whose underlying data are :class:`ArkoudaExtensionArray` instances referencing the Arkouda server-side arrays. :rtype: Union[pd.Index, pd.MultiIndex] .. rubric:: Notes * ``ak.Index`` → ``pd.Index`` with Arkouda-backed values. * ``ak.MultiIndex`` → ``pd.MultiIndex`` where each level is backed by an :class:`ArkoudaExtensionArray`. * This function does not validate whether the input is already wrapped; callers should ensure the argument is a legacy Arkouda index object. .. 
rubric:: Examples >>> import arkouda as ak >>> import pandas as pd Wrap a legacy ``ak.Index`` into a pandas ``Index`` without copying: >>> ak_idx = ak.Index(ak.arange(5)) >>> pd_idx = pd.Index.ak.from_ak_legacy(ak_idx) >>> pd_idx Index([0, 1, 2, 3, 4], dtype='int64') The resulting index stores its values on the Arkouda server: >>> type(pd_idx.array) MultiIndex example: >>> ak_lvl1 = ak.array(['a', 'a', 'b', 'b']) >>> ak_lvl2 = ak.array([1, 2, 1, 2]) >>> ak_mi = ak.MultiIndex([ak_lvl1, ak_lvl2], names=['letter', 'number']) >>> pd_mi = pd.Index.ak.from_ak_legacy(ak_mi) >>> pd_mi MultiIndex([('a', 1), ('a', 2), ('b', 1), ('b', 2)], names=['letter', 'number']) Each level is backed by an Arkouda ExtensionArray and remains distributed: >>> [type(level._data) for level in pd_mi.levels] [, ] No NumPy materialization occurs; the underlying data stay on the Arkouda server. .. py:property:: is_arkouda :type: bool Return whether the underlying Index is Arkouda-backed. An Index or MultiIndex is considered Arkouda-backed if its underlying storage uses :class:`ArkoudaExtensionArray`. This applies to both single-level and multi-level indices. :returns: True if the Index/MultiIndex is backed by Arkouda server-side arrays, False otherwise. :rtype: bool .. rubric:: Examples NumPy-backed Index: >>> import pandas as pd >>> idx = pd.Index([1, 2, 3]) >>> idx.ak.is_arkouda False Arkouda-backed single-level Index: >>> import arkouda as ak >>> ak_idx = pd.Index([10, 20, 30]).ak.to_ak() >>> ak_idx.ak.is_arkouda True Arkouda-backed MultiIndex: >>> arrays = [[1, 1, 2], ["a", "b", "a"]] >>> midx = pd.MultiIndex.from_arrays(arrays) >>> ak_midx = midx.ak.to_ak() >>> ak_midx.ak.is_arkouda True .. py:method:: lookup(key: object) -> arkouda.numpy.pdarrayclass.pdarray Perform a server-side lookup on the underlying Arkouda index. This is a thin convenience wrapper around the legacy :meth:`arkouda.pandas.index.Index.lookup` / :meth:`arkouda.pandas.index.MultiIndex.lookup` methods. 
It converts the pandas index to a legacy Arkouda index, performs the lookup on the server, and returns the resulting boolean mask. :param key: Lookup key or keys, interpreted in the same way as the legacy Arkouda ``Index`` / ``MultiIndex`` ``lookup`` method. For a single-level index this may be a scalar or an Arkouda ``pdarray``; for MultiIndex it may be a tuple or sequence of values/arrays. :type key: object :returns: A boolean Arkouda array indicating which positions in the index match the given ``key``. :rtype: pdarray .. py:method:: to_ak() -> Union[pandas.Index, pandas.MultiIndex] Convert this pandas Index or MultiIndex to an Arkouda-backed index. Unlike :meth:`to_ak_legacy`, which returns a legacy Arkouda Index object, this method returns a *pandas* Index or MultiIndex whose data reside on the Arkouda server and are wrapped in :class:`ArkoudaExtensionArray` ExtensionArrays. The conversion is zero-copy with respect to NumPy: no materialization to local NumPy arrays occurs. :returns: An Index whose underlying data live on the Arkouda server. :rtype: Union[pd.Index, pd.MultiIndex] .. rubric:: Examples Convert a simple Index to Arkouda-backed form: >>> import pandas as pd >>> import arkouda as ak >>> idx = pd.Index([10, 20, 30], name="values") >>> ak_idx = idx.ak.to_ak() >>> type(ak_idx.array) Round-trip back to NumPy-backed pandas objects: >>> restored = ak_idx.ak.collect() >>> restored.equals(idx) True .. py:method:: to_ak_legacy() -> Union[arkouda.pandas.index.Index, arkouda.pandas.index.MultiIndex] Convert this pandas Index or MultiIndex into a legacy Arkouda ``ak.Index`` or ``ak.MultiIndex`` object. This is the index analogue of ``df.ak.to_ak_legacy()``, returning the *actual* Arkouda index objects on the server, rather than a pandas wrapper backed by :class:`ArkoudaExtensionArray`. The conversion is zero-copy with respect to NumPy: values are transferred directly into Arkouda arrays without materializing to local NumPy. 
:returns: A legacy Arkouda Index/MultiIndex whose data live on the Arkouda server. :rtype: Union[ak_Index, ak_MultiIndex] .. rubric:: Examples Convert a simple pandas Index into a legacy Arkouda Index: >>> import pandas as pd >>> import arkouda as ak >>> idx = pd.Index([10, 20, 30], name="numbers") >>> ak_idx = idx.ak.to_ak_legacy() >>> type(ak_idx) >>> ak_idx.name 'numbers' .. py:method:: to_csv(prefix_path: str, dataset: str = 'index') -> str Save this index to CSV via the legacy ``to_csv`` implementation and return the server response message. .. py:method:: to_dict(labels=None) Convert this index to a dictionary representation if supported. For MultiIndex, this delegates to ``MultiIndex.to_dict`` and returns a mapping of label -> Index. For single-level Indexes, this will raise a TypeError, since the legacy API only defines ``to_dict`` on MultiIndex. .. py:method:: to_hdf(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate', file_type: Literal['single', 'distribute'] = 'distribute') -> str Save this index to HDF5 via the legacy ``to_hdf`` implementation and return the server response message. .. py:method:: to_parquet(prefix_path: str, dataset: str = 'index', mode: Literal['truncate', 'append'] = 'truncate') -> str Save this index to Parquet via the legacy ``to_parquet`` implementation and return the server response message. .. py:method:: update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True) Overwrite or append this index into an existing HDF5 dataset via the legacy ``update_hdf`` implementation. .. py:class:: ArkoudaInt64Dtype Bases: :py:obj:`_ArkoudaBaseDtype` Extension dtype for Arkouda-backed 64-bit integers. This dtype allows seamless use of Arkouda's distributed ``int64`` arrays inside pandas objects (``Series``, ``Index``, ``DataFrame``). 
It is backed by :class:`arkouda.pdarray` with ``dtype='int64'`` and integrates with pandas via the :class:`~arkouda.pandas.extension._arkouda_array.ArkoudaArray` extension array. .. method:: construct_array_type() Return the associated extension array class (:class:`ArkoudaArray`). .. py:method:: construct_array_type() :classmethod: Return the associated pandas ExtensionArray type. This is part of the pandas ExtensionDtype interface and is used internally by pandas when constructing arrays of this dtype. It ensures that operations like ``Series(..., dtype=ArkoudaInt64Dtype())`` produce the correct Arkouda-backed extension array. :returns: The :class:`ArkoudaArray` class that implements the storage and behavior for this dtype. :rtype: type .. rubric:: Notes - This hook tells pandas which ExtensionArray to instantiate whenever this dtype is requested. - All Arkouda dtypes defined in this module will return :class:`ArkoudaArray` (or a subclass thereof). .. rubric:: Examples >>> from arkouda.pandas.extension import ArkoudaInt64Dtype >>> ArkoudaInt64Dtype.construct_array_type() .. py:attribute:: kind :value: 'i' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value :value: -1 Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'int64' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. 
``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaSeriesAccessor(pandas_obj: pandas.Series) Arkouda-backed Series accessor. Provides a symmetric API to the Index accessor for Series-level conversion and materialization. :param pandas_obj: The Series this accessor wraps. :type pandas_obj: pd.Series .. rubric:: Examples >>> import pandas as pd >>> import arkouda as ak >>> s = pd.Series([1, 2, 3], name="nums") Convert to Arkouda-backed: >>> ak_s = s.ak.to_ak() >>> ak_s.ak.is_arkouda True Materialize back: >>> restored = ak_s.ak.collect() >>> restored.equals(s) True Convert to legacy Arkouda: >>> ak_arr = s.ak.to_ak_legacy() >>> type(ak_arr) .. py:method:: apply(func: Union[Callable[[Any], Any], str], result_dtype: Optional[Union[numpy.dtype, str]] = None) -> pandas.Series Apply a Python function element-wise to this Arkouda-backed Series. This delegates to :func:`arkouda.apply.apply`, executing the function on the Arkouda server without materializing to NumPy. :param func: A Python callable or a specially formatted lambda string (e.g. ``"lambda x,: x+1"``). :type func: Union[Callable[[Any], Any], str] :param result_dtype: The dtype of the resulting array. Required if the function changes dtype. Must be compatible with :func:`arkouda.apply.apply`. Default is None. :type result_dtype: Optional[Union[np.dtype, str]] :returns: A new Arkouda-backed Series containing the transformed values. :rtype: pd.Series :raises TypeError: If the Series is not Arkouda-backed or if its values are not a numeric pdarray. .. py:method:: argsort(*, ascending: bool = True, **kwargs: object) -> pandas.Series Return the integer indices that would sort the Series values. This mirrors ``pandas.Series.argsort`` but returns an Arkouda-backed pandas Series (distributed), not a NumPy-backed result. 
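The ordering that ``argsort`` is documented to produce can be sketched in plain Python. This is a reference illustration only, not Arkouda's server-side implementation: ``argsort_reference`` is a hypothetical helper name, it handles float values only, and placing ``NaN`` according to ``na_position`` when ``ascending=False`` is an assumption rather than documented behavior.

```python
import math
from typing import List, Sequence


def argsort_reference(
    values: Sequence[float],
    ascending: bool = True,
    na_position: str = "last",
) -> List[int]:
    """Return the positions that would sort ``values`` (hypothetical helper)."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")

    # Split positions into NaN and non-NaN groups.
    nan_idx = [i for i, v in enumerate(values) if math.isnan(v)]
    ok_idx = [i for i, v in enumerate(values) if not math.isnan(v)]

    # Sort only the non-NaN positions by their values.
    ok_idx.sort(key=lambda i: values[i], reverse=not ascending)

    # Assumption: NaN placement follows na_position in both sort directions.
    return nan_idx + ok_idx if na_position == "first" else ok_idx + nan_idx


data = [3.0, float("nan"), 1.0]
print(argsort_reference(data))                       # [2, 0, 1]
print(argsort_reference(data, na_position="first"))  # [1, 2, 0]
```

The two printed permutations match the ``ArkoudaArray.argsort`` doctest for the same input; the accessor method returns such indices as an Arkouda-backed Series rather than a Python list.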
:param ascending: Sort values in ascending order if True, descending order if False. Default is True. :type ascending: bool :param \*\*kwargs: Additional keyword arguments. Supported keyword arguments ~~~~~~~~~~~~~~~~~~~~~~~~~~~ na_position : {"first", "last"}, default "last" Where to place NaN values in the sorted result. Currently only applied for floating-point ``pdarray`` data; for ``Strings`` and ``Categorical`` it has no effect. :type \*\*kwargs: object :returns: An Arkouda-backed Series of integer permutation indices. The returned Series has the same index as the original. :rtype: pd.Series :raises TypeError: If the Series is not Arkouda-backed, or the underlying dtype does not support sorting. :raises ValueError: If ``na_position`` is not "first" or "last". .. py:method:: collect() -> pandas.Series Materialize this Series back to a NumPy-backed pandas Series. :returns: A NumPy-backed Series. :rtype: pd.Series .. rubric:: Examples >>> s = pd.Series([1,2,3]).ak.to_ak() >>> out = s.ak.collect() >>> type(out.array) .. py:method:: from_ak_legacy(akarr: Any, name: str | None = None) -> pandas.Series :staticmethod: Construct an Arkouda-backed pandas Series directly from a legacy Arkouda array. This performs zero-copy wrapping using ArkoudaExtensionArray and does not materialize data. :param akarr: A legacy Arkouda array (pdarray, Strings, or Categorical). :type akarr: Any :param name: Optional. Name of the resulting Series. :type name: str | None :returns: A pandas Series backed by ArkoudaExtensionArray. :rtype: pd.Series .. 
rubric:: Examples >>> import arkouda as ak >>> import pandas as pd Basic example with a legacy ``pdarray``: >>> ak_arr = ak.arange(5) >>> s = pd.Series.ak.from_ak_legacy(ak_arr, name="values") >>> s 0 0 1 1 2 2 3 3 4 4 Name: values, dtype: int64 The underlying data remain on the Arkouda server: >>> type(s._values) Using a legacy ``Strings`` object: >>> ak_str = ak.array(["a", "b", "c"]) >>> s_str = pd.Series.ak.from_ak_legacy(ak_str, name="letters") >>> s_str 0 a 1 b 2 c Name: letters, dtype: string Using a legacy ``Categorical``: >>> ak_cat = ak.Categorical(ak.array(["red", "blue", "red"])) >>> s_cat = pd.Series.ak.from_ak_legacy(ak_cat, name="color") >>> s_cat 0 red 1 blue 2 red Name: color, dtype: category No NumPy copies are made; the Series is a zero-copy wrapper over Arkouda server-side arrays. .. py:method:: groupby() -> arkouda.pandas.groupbyclass.GroupBy Return an Arkouda GroupBy object for this Series without materializing its data. :rtype: GroupBy :raises TypeError: If the Series is not Arkouda-backed. .. rubric:: Examples >>> import arkouda as ak >>> import pandas as pd >>> s = pd.Series([80, 443, 80]).ak.to_ak() >>> g = s.ak.groupby() >>> keys, counts = g.size() .. py:property:: is_arkouda :type: bool Return True if this Series is fully Arkouda-backed. A Series is considered Arkouda-backed when both of the following hold: 1. Its values are stored in an ``ArkoudaExtensionArray``. 2. Its index (including each level of a MultiIndex) is backed by ``ArkoudaExtensionArray``. :returns: True if both data and index are Arkouda-backed, otherwise False. :rtype: bool .. rubric:: Examples >>> s = pd.Series([1, 2, 3]) >>> s.ak.is_arkouda False >>> ak_s = s.ak.to_ak() >>> ak_s.ak.is_arkouda True .. py:method:: locate(key: object) -> pandas.Series Look up values by index label on the Arkouda server. This is a thin wrapper around the legacy :meth:`arkouda.pandas.series.Series.locate` method.
It converts the pandas Series to a legacy Arkouda ``ak.Series``, performs the locate operation on the server, and wraps the result back into an Arkouda-backed pandas Series (ExtensionArray-backed) without NumPy materialization. :param key: Lookup key or keys. Interpreted in the same way as the legacy Arkouda ``Series.locate`` method. This may be: - a scalar - a list/tuple of scalars - an Arkouda ``pdarray`` - an Arkouda ``Index`` / ``MultiIndex`` - an Arkouda ``Series`` (special case: preserves key index) :type key: object :returns: A pandas Series backed by Arkouda ExtensionArrays containing the located values. The returned Series remains distributed (no NumPy materialization) and is sorted by index. :rtype: pd.Series .. rubric:: Notes * This method is Arkouda-specific; pandas does not define ``Series.locate``. * If ``key`` is a pandas Index/MultiIndex, consider converting it via ``key.ak.to_ak_legacy()`` before calling ``locate`` for the most direct path. .. rubric:: Examples >>> import arkouda as ak >>> import pandas as pd >>> s = pd.Series([10, 20, 30], index=pd.Index([1, 2, 3])).ak.to_ak() >>> out = s.ak.locate([3, 1]) >>> out.tolist() [np.int64(10), np.int64(30)] .. py:method:: to_ak() -> pandas.Series Convert this pandas Series into an Arkouda-backed Series. This method produces a pandas ``Series`` whose underlying storage uses :class:`~arkouda.pandas.extension.ArkoudaExtensionArray`, meaning the data reside on the Arkouda server rather than in local NumPy buffers. The conversion is zero-copy with respect to NumPy: data are only materialized if the original Series is NumPy-backed. The returned Series preserves the original index (including index names) and the original Series ``name``. :returns: A Series backed by an :class:`ArkoudaExtensionArray`, referencing Arkouda server-side arrays. The resulting Series retains the original index and name. :rtype: pd.Series .. 
rubric:: Notes * If the Series is already Arkouda-backed, this method returns a new Series that is semantically equivalent and still Arkouda-backed. * If the Series is NumPy-backed, values are transferred to Arkouda server-side arrays via ``ak.array``. * No NumPy-side materialization occurs when converting an already Arkouda-backed Series. .. rubric:: Examples Basic numeric conversion: >>> import pandas as pd >>> import arkouda as ak >>> s = pd.Series([1, 2, 3], name="nums") >>> s_ak = s.ak.to_ak() >>> type(s_ak.array) >>> s_ak.tolist() [np.int64(1), np.int64(2), np.int64(3)] Preserving the index and name: >>> idx = pd.Index([10, 20, 30], name="id") >>> s = pd.Series([100, 200, 300], index=idx, name="values") >>> s_ak = s.ak.to_ak() >>> s_ak.name 'values' >>> s_ak.index.name 'id' String data: >>> s = pd.Series(["red", "blue", "green"], name="colors") >>> s_ak = s.ak.to_ak() >>> s_ak.tolist() [np.str_('red'), np.str_('blue'), np.str_('green')] Idempotence (calling ``to_ak`` repeatedly stays Arkouda-backed): >>> s_ak2 = s_ak.ak.to_ak() >>> s_ak2.ak.is_arkouda True >>> s_ak2.tolist() == s_ak.tolist() True .. py:method:: to_ak_legacy() -> arkouda.pandas.series.Series Convert this Series into a legacy Arkouda Series. :returns: The legacy Arkouda Series. :rtype: ak_Series .. rubric:: Examples >>> import pandas as pd >>> s = pd.Series([10, 20, 30]) >>> ak_arr = s.ak.to_ak_legacy() >>> type(ak_arr) .. py:class:: ArkoudaStringArray(data: arkouda.numpy.strings.Strings | numpy.ndarray | Sequence[Any] | ArkoudaStringArray) Bases: :py:obj:`arkouda.pandas.extension._arkouda_extension_array.ArkoudaExtensionArray`, :py:obj:`pandas.api.extensions.ExtensionArray` Arkouda-backed string pandas ExtensionArray. Ensures the underlying data is an Arkouda ``Strings`` object. Accepts existing ``Strings`` or converts from NumPy arrays and Python sequences of strings. :param data: Input to wrap or convert. - If ``Strings``, it is used directly. - If a NumPy array or Python sequence, it is converted via ``ak.array``.
- If another ``ArkoudaStringArray``, its backing ``Strings`` is reused. :type data: Strings | ndarray | Sequence[Any] | ArkoudaStringArray :raises TypeError: If ``data`` cannot be converted to Arkouda ``Strings``. .. attribute:: default_fill_value Sentinel used when filling missing values (default: ""). :type: str .. py:method:: all(*args, **kwargs) .. py:method:: any(*args, **kwargs) .. py:method:: argpartition(*args, **kwargs) .. py:method:: astype(dtype: numpy.dtype[Any], copy: bool = True) -> numpy.typing.NDArray[Any] astype(dtype: pandas.core.dtypes.dtypes.ExtensionDtype, copy: bool = True) -> pandas.api.extensions.ExtensionArray astype(dtype: Any, copy: bool = True) -> Union[pandas.api.extensions.ExtensionArray, numpy.typing.NDArray[Any]] Cast to a specified dtype. Casting rules: * If ``dtype`` requests ``object``, returns a NumPy ``NDArray[Any]`` of dtype ``object`` containing the string values. * If ``dtype`` is a string dtype (e.g. pandas ``StringDtype``, NumPy unicode, or Arkouda string dtype), returns an ``ArkoudaStringArray``. If ``copy=True``, attempts to copy the underlying Arkouda ``Strings`` data. * For all other dtypes, casts the underlying Arkouda ``Strings`` using ``Strings.astype`` and returns an Arkouda-backed ``ArkoudaExtensionArray`` constructed from the result. :param dtype: Target dtype. May be a NumPy dtype, pandas dtype, or Arkouda dtype. :type dtype: Any :param copy: Whether to force a copy when the result is an ``ArkoudaStringArray``. Default is True. :type copy: bool :returns: The cast result. Returns a NumPy array only when casting to ``object``; otherwise returns an Arkouda-backed ExtensionArray. :rtype: Union[ExtensionArray, NDArray[Any]] .. 
rubric:: Examples Casting to a string dtype returns an Arkouda-backed string array: >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaStringArray >>> s = ArkoudaStringArray(ak.array(["a", "b", "c"])) >>> out = s.astype("string") >>> out is s False Forcing a copy when casting to a string dtype returns a new array: >>> out2 = s.astype("string", copy=True) >>> out2 is s False >>> out2.to_ndarray() array(['a', 'b', 'c'], dtype='<U1') Casting to ``object`` returns a NumPy object array: >>> s.astype(object) array(['a', 'b', 'c'], dtype=object) Casting to a non-string dtype uses Arkouda to cast the underlying strings and returns an Arkouda-backed ExtensionArray: >>> s_num = ArkoudaStringArray(ak.array(["1", "2", "3"])) >>> a = s_num.astype("int64") >>> a.to_ndarray() array([1, 2, 3]) NumPy and pandas dtype objects are also accepted: >>> import numpy as np >>> a = s_num.astype(np.dtype("float64")) >>> a.to_ndarray() array([1., 2., 3.]) .. py:method:: byteswap(*args, **kwargs) .. py:method:: choose(*args, **kwargs) .. py:method:: clip(*args, **kwargs) .. py:method:: compress(*args, **kwargs) .. py:method:: conj(*args, **kwargs) .. py:method:: conjugate(*args, **kwargs) .. py:method:: cumprod(*args, **kwargs) .. py:method:: cumsum(*args, **kwargs) .. py:attribute:: default_fill_value :type: str :value: '' .. py:method:: diagonal(*args, **kwargs) .. py:method:: dot(*args, **kwargs) .. py:property:: dtype An instance of ExtensionDtype. .. seealso:: :py:obj:`api.extensions.ExtensionDtype` Base class for extension dtypes. :py:obj:`api.extensions.ExtensionArray` Base class for extension array types. :py:obj:`api.extensions.ExtensionArray.dtype` The dtype of an ExtensionArray. :py:obj:`Series.dtype` The dtype of a Series. :py:obj:`DataFrame.dtype` The dtype of a DataFrame. .. rubric:: Examples >>> pd.array([1, 2, 3]).dtype Int64Dtype() .. py:method:: dump(*args, **kwargs) .. py:method:: dumps(*args, **kwargs) .. py:method:: fill(*args, **kwargs) .. py:method:: flatten(*args, **kwargs) ..
py:method:: getfield(*args, **kwargs) .. py:method:: isna() A 1-D array indicating if each value is missing. :returns: In most cases, this should return a NumPy ndarray. For exceptional cases like ``SparseArray``, where returning an ndarray would be expensive, an ExtensionArray may be returned. :rtype: numpy.ndarray or pandas.api.extensions.ExtensionArray .. seealso:: :py:obj:`ExtensionArray.dropna` Return ExtensionArray without NA values. :py:obj:`ExtensionArray.fillna` Fill NA/NaN values using the specified method. .. rubric:: Notes If returning an ExtensionArray, then * ``na_values._is_boolean`` should be True * ``na_values`` should implement :func:`ExtensionArray._reduce` * ``na_values`` should implement :func:`ExtensionArray._accumulate` * ``na_values.any`` and ``na_values.all`` should be implemented .. rubric:: Examples >>> arr = pd.array([1, 2, np.nan, np.nan]) >>> arr.isna() array([False, False, True, True]) .. py:method:: item(*args, **kwargs) Return the array element at the specified position as a Python scalar. :param index: Position of the element. If not provided, the array must contain exactly one element. :type index: int, optional :returns: The element at the specified position. :rtype: scalar :raises ValueError: If no index is provided and the array does not have exactly one element. :raises IndexError: If the specified position is out of bounds. .. seealso:: :py:obj:`numpy.ndarray.item` Return the item of an array as a scalar. .. rubric:: Examples >>> arr = pd.array([1], dtype="Int64") >>> arr.item() np.int64(1) >>> arr = pd.array([1, 2, 3], dtype="Int64") >>> arr.item(0) np.int64(1) >>> arr.item(2) np.int64(3) .. py:method:: max(*args, **kwargs) .. py:method:: mean(*args, **kwargs) .. py:method:: min(*args, **kwargs) .. py:method:: nonzero(*args, **kwargs) .. py:method:: partition(*args, **kwargs) .. py:method:: prod(*args, **kwargs) .. py:method:: put(*args, **kwargs) .. py:method:: resize(*args, **kwargs) .. 
py:method:: round(*args, **kwargs) .. py:method:: setfield(*args, **kwargs) .. py:method:: setflags(*args, **kwargs) .. py:method:: sort(*args, **kwargs) .. py:method:: std(*args, **kwargs) .. py:method:: sum(*args, **kwargs) .. py:method:: swapaxes(*args, **kwargs) .. py:method:: to_device(*args, **kwargs) .. py:method:: tobytes(*args, **kwargs) .. py:method:: tofile(*args, **kwargs) .. py:method:: trace(*args, **kwargs) .. py:method:: value_counts(dropna: bool = True) -> pandas.Series Return counts of unique strings as a pandas Series. This method computes the frequency of each distinct string value in the underlying Arkouda ``Strings`` object and returns the result as a pandas ``Series``, with the unique string values as the index and their counts as the data. :param dropna: Whether to exclude missing values. Missing-value handling for Arkouda string arrays is not yet implemented, so this parameter is accepted for pandas compatibility but currently has no effect. Default is True. :type dropna: bool :returns: A Series containing the counts of unique string values. The index is an ``ArkoudaStringArray`` of unique values, and the values are an ``ArkoudaArray`` of counts. :rtype: pd.Series .. rubric:: Notes - The following pandas options are not yet implemented: ``normalize``, ``sort``, and ``bins``. - Counting is performed server-side in Arkouda; only the small result (unique values and counts) is materialized on the client. .. rubric:: Examples Basic usage: >>> import arkouda as ak >>> from arkouda.pandas.extension import ArkoudaStringArray >>> >>> s = ArkoudaStringArray(["red", "blue", "red", "green", "blue", "red"]) >>> s.value_counts() red 3 blue 2 green 1 dtype: int64 Empty input: >>> empty = ArkoudaStringArray([]) >>> empty.value_counts() Series([], dtype: int64) .. py:method:: var(*args, **kwargs) .. py:class:: ArkoudaStringDtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed string dtype. 
This dtype integrates Arkouda's distributed ``Strings`` type with the pandas ExtensionArray interface via :class:`ArkoudaStringArray`. It enables pandas objects (Series, DataFrame) to hold large, server-backed string columns without converting to NumPy or Python objects. .. method:: construct_array_type() Returns the :class:`ArkoudaStringArray` used as the storage class. .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray subclass that handles storage for this dtype. :returns: The :class:`ArkoudaStringArray` class associated with this dtype. :rtype: type .. py:attribute:: kind :value: 'O' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value :value: '' Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'string' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaUint64Dtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed unsigned 64-bit integer dtype. This dtype integrates Arkouda’s ``uint64`` arrays with pandas, allowing users to create :class:`pandas.Series` or :class:`pandas.DataFrame` objects that store their data on the Arkouda server while still conforming to the pandas ExtensionArray API. .. 
method:: construct_array_type() Return the :class:`ArkoudaArray` class used as the storage container for this dtype. .. rubric:: Examples >>> import arkouda as ak >>> import pandas as pd >>> from arkouda.pandas.extension import ArkoudaUint64Dtype, ArkoudaArray >>> arr = ArkoudaArray(ak.array([1, 2, 3], dtype="uint64")) >>> s = pd.Series(arr, dtype=ArkoudaUint64Dtype()) >>> s 0 1 1 2 2 3 dtype: uint64 .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray class associated with this dtype. This is required by the pandas ExtensionDtype API. It tells pandas which :class:`~pandas.api.extensions.ExtensionArray` subclass should be used to hold data of this dtype inside a :class:`pandas.Series` or :class:`pandas.DataFrame`. :returns: The :class:`ArkoudaArray` class, which implements the storage and operations for Arkouda-backed arrays. :rtype: type .. py:attribute:: kind :value: 'u' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value :value: -1 Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'uint64' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`. .. py:class:: ArkoudaUint8Dtype Bases: :py:obj:`_ArkoudaBaseDtype` Arkouda-backed unsigned 8-bit integer dtype. 
This dtype integrates Arkouda's ``uint8`` arrays with the pandas ExtensionArray API, allowing pandas ``Series`` and ``DataFrame`` objects to store and operate on Arkouda-backed unsigned 8-bit integers. The underlying storage is an Arkouda ``pdarray``, exposed through the :class:`ArkoudaArray` extension array. .. method:: construct_array_type() Returns the :class:`ArkoudaArray` type that provides the storage and behavior for this dtype. .. py:method:: construct_array_type() :classmethod: Return the ExtensionArray subclass that handles storage for this dtype. This method is required by the pandas ExtensionDtype interface. It tells pandas which ExtensionArray class to use when creating arrays of this dtype (for example, when calling ``Series(..., dtype="arkouda.uint8")``). :returns: The :class:`ArkoudaArray` class associated with this dtype. :rtype: type .. py:attribute:: kind :value: 'u' A character code (one of 'biufcmMOSUV'), default 'O' This should match the NumPy dtype used when the array is converted to an ndarray, which is probably 'O' for object if the extension type cannot be represented as a built-in NumPy type. .. seealso:: :py:obj:`numpy.dtype.kind` .. py:attribute:: na_value :value: -1 Default NA value to use for this type. This is used in e.g. ExtensionArray.take. This should be the user-facing "boxed" version of the NA value, not the physical NA value for storage. e.g. for JSONArray, this is an empty dictionary. .. py:attribute:: name :value: 'uint8' A string identifying the data type. Will be used for display in, e.g. ``Series.dtype`` .. py:attribute:: type The scalar type for the array, e.g. ``int`` It's expected ``ExtensionArray[item]`` returns an instance of ``ExtensionDtype.type`` for scalar ``item``, assuming that value is valid (not NA). NA values do not need to be instances of `type`.