arkouda.pandas.io
=================

.. py:module:: arkouda.pandas.io

.. autoapi-nested-parse::

   Input/output utilities for Arkouda.

   The ``arkouda.io`` module provides an interface for reading from and writing to
   several file formats including HDF5, Parquet, CSV, and Zarr. It supports
   importing and exporting data between Arkouda and pandas, checkpointing data,
   and snapshot/restore functionality for Arkouda server state.

   Core functionality includes

   - File format detection and dataset inspection
   - Reading and writing structured datasets using HDF5 and Parquet
   - CSV read/write support with header parsing
   - Zarr format support for chunked array storage
   - pandas interoperability via ``import_data`` and ``export``
   - Checkpointing via ``save_checkpoint`` and ``load_checkpoint``
   - Serialization and deserialization of Arkouda objects via ``snapshot`` and ``restore``
   - Dataset tagging for provenance tracking during read operations
   - Transferring arrays and DataFrames between Arkouda server instances
     (``receive`` and ``receive_dataframe``)

   Supported data types include ``pdarray``, ``Strings``, ``SegArray``,
   ``Categorical``, ``DataFrame``, ``Index``, and ``MultiIndex``. Many operations
   also support compatibility with standard pandas file formats for interoperability.

   Functions
   ---------
   File inspection
       ``get_filetype``, ``ls``, ``ls_csv``, ``get_datasets``, ``get_columns``

   Data import/export
       ``read_hdf``, ``read_parquet``, ``read_csv``, ``read_zarr``, ``read``,
       ``to_hdf``, ``to_parquet``, ``to_csv``, ``to_zarr``, ``import_data``, ``export``

   Snapshotting
       ``snapshot``, ``restore``, ``save_checkpoint``, ``load_checkpoint``

   Advanced features
       ``update_hdf``, ``load``, ``load_all``, ``read_tagged_data``,
       ``receive``, ``receive_dataframe``

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> from arkouda.pandas.io import to_parquet, read_parquet
   >>> import os.path
   >>> from pathlib import Path
   >>> my_path = os.path.join(os.getcwd(), "output")
   >>> Path(my_path).mkdir(parents=True, exist_ok=True)

   Create and save a DataFrame:

   >>> data = [ak.arange(10), ak.linspace(0, 1, 10)]
   >>> Path(my_path + "/parquet_data").mkdir(parents=True, exist_ok=True)
   >>> to_parquet(data, my_path + "/parquet_data/data.parquet")

   Load the DataFrame back:

   >>> data2 = read_parquet(my_path + "/parquet_data/data*")

   Save to HDF5:

   >>> ak.to_hdf(data, my_path + "data.hdf5")

   Read from HDF5 with explicit dataset name:

   >>> data3 = ak.read_hdf(my_path + "data*")

   Export to pandas-compatible Parquet:

   >>> df = ak.DataFrame({"a": ak.arange(10), "b": ak.linspace(0, 1, 10)})
   >>> df2 = ak.export(my_path + "/parquet_data/data.parquet")

   .. seealso:: :py:obj:`arkouda.DataFrame`, :py:obj:`arkouda.pdarray`, :py:obj:`arkouda.strings.Strings`, :py:obj:`arkouda.segarray.SegArray`, :py:obj:`arkouda.categorical.Categorical`, :py:obj:`arkouda.index.Index`, :py:obj:`arkouda.index.MultiIndex`


Functions
---------

.. autoapisummary::

   arkouda.pandas.io.export
   arkouda.pandas.io.get_columns
   arkouda.pandas.io.get_datasets
   arkouda.pandas.io.get_filetype
   arkouda.pandas.io.get_null_indices
   arkouda.pandas.io.import_data
   arkouda.pandas.io.load
   arkouda.pandas.io.load_all
   arkouda.pandas.io.load_checkpoint
   arkouda.pandas.io.ls
   arkouda.pandas.io.ls_csv
   arkouda.pandas.io.read
   arkouda.pandas.io.read_csv
   arkouda.pandas.io.read_hdf
   arkouda.pandas.io.read_parquet
   arkouda.pandas.io.read_tagged_data
   arkouda.pandas.io.read_zarr
   arkouda.pandas.io.receive
   arkouda.pandas.io.receive_dataframe
   arkouda.pandas.io.restore
   arkouda.pandas.io.save_checkpoint
   arkouda.pandas.io.snapshot
   arkouda.pandas.io.to_csv
   arkouda.pandas.io.to_hdf
   arkouda.pandas.io.to_parquet
   arkouda.pandas.io.to_zarr
   arkouda.pandas.io.update_hdf


Module Contents
---------------

.. py:function:: export(read_path: str, dataset_name: str = 'ak_data', write_file: Optional[str] = None, return_obj: bool = True, index: bool = False)

   Export data from arkouda to pandas.

   Export data from Arkouda file (Parquet/HDF5)
   to Pandas object or file formatted to be readable by Pandas.

   :param read_path: path to file where arkouda data is stored.
   :type read_path: str
   :param dataset_name: name to store dataset under
   :type dataset_name: str
   :param write_file: path to file to write pandas formatted data to. Only write the file if this is set.
                      Default is None.
   :type write_file: str
   :param return_obj: When True (default) return the Pandas DataFrame object, otherwise return None.
   :type return_obj: bool
   :param index: Default False. When True, maintain the indexes loaded from the pandas file
   :type index: bool

   :raises RuntimeError: - Unsupported file type

   :returns: When `return_obj=True`
   :rtype: pd.DataFrame

   .. seealso:: :py:obj:`pandas.DataFrame.to_parquet`, :py:obj:`pandas.DataFrame.to_hdf`, :py:obj:`pandas.DataFrame.read_parquet`, :py:obj:`pandas.DataFrame.read_hdf`, :py:obj:`ak.import_data`

   .. rubric:: Notes

   - If Arkouda file is exported for pandas, the format will not change. This mean parquet files
     will remain parquet and hdf5 will remain hdf5.
   - Export can only be performed from hdf5 or parquet files written by Arkouda. The result will be
     the same file type, but formatted to be read by Pandas.


.. py:function:: get_columns(filenames: Union[str, List[str]], col_delim: str = ',', allow_errors: bool = False) -> List[str]

   Get a list of column names from CSV file(s).


.. py:function:: get_datasets(filenames: Union[str, List[str]], allow_errors: bool = False, column_delim: str = ',', read_nested: bool = True) -> List[str]

   Get the names of the datasets in the provide files.

   :param filenames: Name of the file/s from which to return datasets
   :type filenames: str or List[str]
   :param allow_errors: Default: False
                        Whether or not to allow errors while accessing datasets
   :type allow_errors: bool
   :param column_delim: Column delimiter to be used if dataset is CSV. Otherwise, unused.
   :type column_delim: str
   :param read_nested: Default True, when True, SegArray objects will be read from the file. When False,
                       SegArray (or other nested Parquet columns) will be ignored.
                       Only used for Parquet Files.
   :type read_nested: bool

   :rtype: List[str] of names of the datasets

   :raises RuntimeError: - If no datasets are returned

   .. rubric:: Notes

   - This function currently supports HDF5 and Parquet formats.
   - Future updates to Parquet will deprecate this functionality on that format,
   but similar support will be added for Parquet at that time.
   - If a list of files is provided, only the datasets in the first file will be returned

   .. seealso:: :py:obj:`ls`


.. py:function:: get_filetype(filenames: Union[str, List[str]]) -> str

   Get the type of a file accessible to the server.

   Supported file types and possible return strings are 'HDF5' and 'Parquet'.

   :param filenames: A file or list of files visible to the arkouda server
   :type filenames: Union[str, List[str]]

   :returns: Type of the file returned as a string, either 'HDF5', 'Parquet' or 'CSV
   :rtype: str

   :raises ValueError: Raised if filename is empty or contains only whitespace

   .. rubric:: Notes

   - When list provided, it is assumed that all files are the same type
   - CSV Files without the Arkouda Header are not supported

   .. seealso:: :py:obj:`read_parquet`, :py:obj:`read_hdf`


.. py:function:: get_null_indices(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None) -> Union[arkouda.numpy.pdarrayclass.pdarray, Mapping[str, arkouda.numpy.pdarrayclass.pdarray]]

   Get null indices of a string column in a Parquet file.

   :param filenames: Either a list of filenames or shell expression
   :type filenames: list or str
   :param datasets: (List of) name(s) of dataset(s) to read. Each dataset must be a string
                    column. There is no default value for this function, the datasets to be
                    read must be specified.
   :type datasets: list or str or None

   :returns: Dictionary of {datasetName: pdarray}
   :rtype: returns a dictionary of Arkouda pdarrays

   :raises RuntimeError: Raised if one or more of the specified files cannot be opened.
   :raises TypeError: Raised if we receive an unknown arkouda_type returned from the server

   .. seealso:: :py:obj:`get_datasets`, :py:obj:`ls`


.. py:function:: import_data(read_path: str, write_file: Optional[str] = None, return_obj: bool = True, index: bool = False)

   Import data from a file saved by Pandas (HDF5/Parquet).

   Import data from a file saved by Pandas (HDF5/Parquet) to Arkouda object and/or
   a file formatted to be read by Arkouda.

   :param read_path: path to file where pandas data is stored. This can be glob expression for parquet formats.
   :type read_path: str
   :param write_file: path to file to write arkouda formatted data to. Only write file if provided
   :type write_file: str, optional
   :param return_obj: If True (default), return the Arkouda DataFrame object. If False, return None.
   :type return_obj: bool
   :param index: If True, maintain the indexes loaded from the pandas file. Default is False.
   :type index: bool

   :raises RuntimeWarning: - Export attempted on Parquet file. Arkouda formatted Parquet files are readable by pandas.
   :raises RuntimeError: - Unsupported file type

   :returns: When `return_obj=True`
   :rtype: pd.DataFrame

   .. seealso:: :py:obj:`pandas.DataFrame.to_parquet`, :py:obj:`pandas.DataFrame.to_hdf`, :py:obj:`pandas.DataFrame.read_parquet`, :py:obj:`pandas.DataFrame.read_hdf`, :py:obj:`ak.export`

   .. rubric:: Notes

   - Import can only be performed from hdf5 or parquet files written by pandas.


.. py:function:: load(path_prefix: str, file_format: str = 'INFER', dataset: str = 'array', calc_string_offsets: bool = False, column_delim: str = ',') -> Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray, arkouda.pandas.categorical.Categorical, arkouda.pandas.dataframe.DataFrame, arkouda.client_dtypes.IPv4, arkouda.numpy.timeclass.Datetime, arkouda.numpy.timeclass.Timedelta, arkouda.pandas.index.Index]]]

   Load objects previously saved with ``pdarray.save()``.

   :param path_prefix: Filename prefix used when saving the original object.
   :type path_prefix: str
   :param file_format: File format to load. One of ``"INFER"``, ``"HDF5"``, or ``"Parquet"``.
                       If ``"INFER"``, the format will be detected automatically.
   :type file_format: str, default="INFER"
   :param dataset: Dataset name where the object was saved.
   :type dataset: str, default="array"
   :param calc_string_offsets: If ``True``, the server ignores the segmented ``Strings`` ``offsets``
                               array and derives offsets from null-byte terminators.
   :type calc_string_offsets: bool, default=False
   :param column_delim: Column delimiter used if the dataset is CSV. Otherwise unused.
   :type column_delim: str, default=","

   :returns: DataFrame, IPv4, Datetime, Timedelta, Index]]
             Dictionary mapping ``datasetName`` to the loaded object. The values may
             be ``pdarray``, ``Strings``, ``SegArray``, ``Categorical``,
             ``DataFrame``, ``IPv4``, ``Datetime``, ``Timedelta``, or ``Index``.
   :rtype: Mapping[str, Union[pdarray, Strings, SegArray, Categorical,

   :raises TypeError: Raised if either ``path_prefix`` or ``dataset`` is not a ``str``.
   :raises ValueError: Raised if an invalid ``file_format`` is given, if the dataset is not
       present in all HDF5 files, or if ``path_prefix`` does not correspond
       to files accessible to Arkouda.
   :raises RuntimeError: Raised if the HDF5 files are present but an error occurs while opening
       one or more of them.

   .. seealso:: :py:obj:`to_parquet`, :py:obj:`to_hdf`, :py:obj:`load_all`, :py:obj:`read`

   .. rubric:: Notes

   If a previously saved Parquet file raises a ``FileNotFoundError``, try
   loading it with ``.parquet`` appended to ``path_prefix``. Older versions
   of Arkouda always stored Parquet files with a ``.parquet`` extension.

   ``ak.load`` does not support loading a single file. To load a single HDF5
   file without the ``_LOCALE####`` suffix, use ``ak.read()``.

   CSV files without the Arkouda header are not supported.

   .. rubric:: Examples

   >>> import arkouda as ak

   Loading from file without extension:

   >>> obj = ak.load("path/prefix")  # doctest: +SKIP

   This loads the array from ``numLocales`` files with the name
   ``cwd/path/name_prefix_LOCALE####``. The file type is inferred
   automatically.

   Loading with an extension (HDF5):

   >>> obj = ak.load("path/prefix.test")  # doctest: +SKIP

   This loads the object from ``numLocales`` files with the name
   ``cwd/path/name_prefix_LOCALE####.test`` where ``####`` corresponds
   to each locale number. Because the file type is inferred, the
   extension does not need to correspond to a specific format.


.. py:function:: load_all(path_prefix: str, file_format: str = 'INFER', column_delim: str = ',', read_nested: bool = True) -> Mapping[str, arkouda.numpy.pdarrayclass.pdarray | arkouda.numpy.strings.Strings | arkouda.numpy.segarray.SegArray | arkouda.pandas.categorical.Categorical | arkouda.pandas.dataframe.DataFrame | arkouda.client_dtypes.IPv4 | arkouda.numpy.timeclass.Datetime | arkouda.numpy.timeclass.Timedelta | arkouda.pandas.index.Index]

   Load multiple pdarrays, Strings, SegArrays, or Categoricals previously saved with ``save_all()``.

   :param path_prefix: Filename prefix used to save the original pdarray
   :type path_prefix: str
   :param file_format: 'INFER', 'HDF5', 'Parquet', or 'CSV'. Defaults to 'INFER'. Indicates the format being loaded.
                       When 'INFER' the processing will detect the format
                       Defaults to 'INFER'
   :type file_format: str
   :param column_delim: Column delimiter to be used if dataset is CSV. Otherwise, unused.
   :type column_delim: str
   :param read_nested: Default True, when True, SegArray objects will be read from the file. When False,
                       SegArray (or other nested Parquet columns) will be ignored.
                       Parquet files only
   :type read_nested: bool

   :returns: Dictionary of {datsetName: Union[pdarray, Strings, SegArray, Categorical]}
             with the previously saved pdarrays, Strings, SegArrays, or Categoricals
   :rtype: Mapping[str, Union[pdarray, Strings, SegArray, Categorical]]

   :raises TypeError: Raised if path_prefix is not a str
   :raises ValueError: Raised if file_format/extension is encountered that is not hdf5 or parquet or
       if all datasets are not present in all hdf5/parquet files or if the
       path_prefix does not correspond to files accessible to Arkouda
   :raises RuntimeError: Raised if the hdf5 files are present but there is an error in opening
       one or more of them

   .. seealso:: :py:obj:`to_parquet`, :py:obj:`to_hdf`, :py:obj:`load`, :py:obj:`read`

   .. rubric:: Notes

   This function has been updated to determine the file extension based on the file format variable

   This function will be deprecated when glob flags are added to read_* methods

   CSV files without the Arkouda Header are not supported.


.. py:function:: load_checkpoint(name, path='.akdata')

   Load server's state.

   The server metadata must match the current
   configuration (e.g. same number of locales must be used).

   :param name: Name of the checkpoint. ``<path>/<name>`` must be a directory.
   :type name: str
   :param path: The directory to save the checkpoint.
   :type path: str

   :returns: The checkpoint name, which will be the same as the ``name`` argument.
   :rtype: str

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> arr = ak.zeros(10, int)
   >>> arr[2] = 2
   >>> arr[2]
   np.int64(2)
   >>> cp_name = ak.save_checkpoint()
   >>> arr[2] = 3
   >>> arr[2]
   np.int64(3)
   >>> ak.load_checkpoint(cp_name) # doctest: +SKIP
   >>> arr[2]
   np.int64(3)

   .. seealso:: :py:obj:`save_checkpoint`


.. py:function:: ls(filename: str, col_delim: str = ',', read_nested: bool = True) -> List[str]

   List the contents of an HDF5 or Parquet file on the Arkouda server.

   This function invokes the HDF5 `h5ls` utility on a file visible to the
   Arkouda server, or simulates a similar listing for Parquet files. For CSV
   files without headers, see `ls_csv`.

   :param filename: Path to the file on the Arkouda server. Must be a non-empty string.
   :type filename: str
   :param col_delim: Delimiter to use when interpreting CSV files. Default is ",".
   :type col_delim: str
   :param read_nested: If True, include nested Parquet columns (e.g., `SegArray`). If False,
                       nested columns are ignored. Only applies to Parquet files.
                       Default is True.
   :type read_nested: bool

   :returns: A list of lines describing each dataset or column in the file.
   :rtype: List[str]

   :raises TypeError: If `filename` is not a string.
   :raises ValueError: If `filename` is empty or contains only whitespace.
   :raises RuntimeError: If an error occurs when running `h5ls` or simulating the Parquet listing.

   .. rubric:: Notes

   - Parquet support is limited and may change in future releases.
   - Output lines mirror the format of the HDF5 `h5ls` output.
   - For CSV files lacking headers, use `ls_csv`.

   .. seealso::

      :py:obj:`ls_csv`
          List the contents of CSV files without headers.


.. py:function:: ls_csv(filename: str, col_delim: str = ',') -> List[str]

   List the datasets within a file when a CSV does not have a header.

   :param filename: The name of the file to pass to the server
   :type filename: str
   :param col_delim: The delimiter used to separate columns if the file is a csv
   :type col_delim: str

   :returns: The string output of the datasets from the server
   :rtype: str

   .. seealso:: :py:obj:`ls`


.. py:function:: read(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False, column_delim: str = ',', read_nested: bool = True, has_non_float_nulls: bool = False, fixed_len: int = -1) -> Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray, arkouda.pandas.categorical.Categorical, arkouda.pandas.dataframe.DataFrame, arkouda.client_dtypes.IPv4, arkouda.numpy.timeclass.Datetime, arkouda.numpy.timeclass.Timedelta, arkouda.pandas.index.Index]]]

   Read datasets from files.

   The file type is determined automatically.

   :param filenames: Either a list of filenames or a shell expression.
   :type filenames: Union[str, List[str]]
   :param datasets: Name or list of names of datasets to read. If ``None``, all
                    available datasets are read.
   :type datasets: Optional[Union[str, List[str]]], default=None
   :param iterative: If ``True``, make iterative function calls to the server. If
                     ``False``, make a single function call to the server.
   :type iterative: bool, default=False
   :param strictTypes: If ``True``, require all dtypes of a given dataset to have the
                       same precision and sign. If ``False``, allow dtypes of different
                       precision and sign across different files. For example, if one
                       file contains a ``uint32`` dataset and another contains an
                       ``int64`` dataset with the same name, the contents of both will
                       be read into an ``int64`` ``pdarray``.
   :type strictTypes: bool, default=True
   :param allow_errors: If ``True``, files with read errors may be skipped instead of
                        causing the operation to fail. A warning will be included in the
                        return containing the total number of files skipped due to failure
                        and up to 10 filenames.
   :type allow_errors: bool, default=False
   :param calc_string_offsets: If ``True``, instruct the server to calculate the offsets or
                               segments array instead of loading it from HDF5 files.
   :type calc_string_offsets: bool, default=False
   :param column_delim: Column delimiter to use if the dataset is CSV. Otherwise unused.
   :type column_delim: str, default=","
   :param read_nested: If ``True``, ``SegArray`` objects are read from the file. If
                       ``False``, ``SegArray`` objects and other nested Parquet columns
                       are ignored. Ignored if ``datasets`` is not ``None``. Parquet
                       only.
   :type read_nested: bool, default=True
   :param has_non_float_nulls: Must be set to ``True`` to read non-float Parquet columns that
                               contain null values.
   :type has_non_float_nulls: bool, default=False
   :param fixed_len: Fixed string length to use when reading Parquet string columns
                     if the length of each string is known at runtime. This can avoid
                     byte calculation and may improve performance.
   :type fixed_len: int, default=-1

   :returns: DataFrame, IPv4, Datetime, Timedelta, Index]]
             Dictionary mapping ``datasetName`` to the loaded object. The values may
             be ``pdarray``, ``Strings``, ``SegArray``, ``Categorical``,
             ``DataFrame``, ``IPv4``, ``Datetime``, ``Timedelta``, or ``Index``.
   :rtype: Mapping[str, Union[pdarray, Strings, SegArray, Categorical,

   :raises RuntimeError: Raised if an invalid file type is detected.

   .. seealso:: :py:obj:`get_datasets`, :py:obj:`ls`, :py:obj:`read_parquet`, :py:obj:`read_hdf`

   .. rubric:: Notes

   If ``filenames`` is a string, it is interpreted as a shell expression.
   A single filename is a valid expression, so it will also work. The
   expression is expanded with ``glob`` to read all matching files.

   If ``iterative=True``, each dataset name and filename is passed to the
   server independently in sequence. If ``iterative=False``, all dataset
   names and filenames are passed to the server in a single string.

   If ``datasets`` is ``None``, dataset names are inferred from the first
   file and all datasets are read. Use ``get_datasets`` to show the names
   of datasets in HDF5 or Parquet files.

   CSV files without the Arkouda header are not supported.

   .. rubric:: Examples

   >>> import arkouda as ak

   Read a file with an extension:

   >>> x = ak.read("path/name_prefix.h5")  # doctest: +SKIP

   The file type is determined from file contents, not the extension.

   Read a Parquet file:

   >>> x = ak.read("path/name_prefix.parquet")  # doctest: +SKIP

   Read files matching a glob expression:

   >>> x = ak.read("path/name_prefix*")  # doctest: +SKIP


.. py:function:: read_csv(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, column_delim: str = ',', allow_errors: bool = False) -> Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray, arkouda.pandas.categorical.Categorical, arkouda.pandas.dataframe.DataFrame, arkouda.client_dtypes.IPv4, arkouda.numpy.timeclass.Datetime, arkouda.numpy.timeclass.Timedelta, arkouda.pandas.index.Index]]]

   Read CSV file(s) into Arkouda objects.

   If more than one dataset is found, the objects
   will be returned in a dictionary mapping the dataset name to the Arkouda object
   containing the data. If the file contains the appropriately formatted header, typed
   data will be returned. Otherwise, all data will be returned as a Strings object.

   :param filenames: The filenames to read data from
   :type filenames: str or List[str]
   :param datasets: names of the datasets to read. When `None`, all datasets will be read.
   :type datasets: str or List[str] (Optional)
   :param column_delim: The delimiter for column names and data. Defaults to ",".
   :type column_delim: str
   :param allow_errors: Default False, if True will allow files with read errors to be skipped
                        instead of failing.  A warning will be included in the return containing
                        the total number of files skipped due to failure and up to 10 filenames.
   :type allow_errors: bool

   :returns: Dictionary of {datasetName: pdarray, String, or SegArray}
   :rtype: Returns a dictionary of Arkouda pdarrays, Arkouda Strings, or Arkouda Segarrays.

   :raises ValueError: Raised if all datasets are not present in all parquet files or if one or
       more of the specified files do not exist
   :raises RuntimeError: Raised if one or more of the specified files cannot be opened.
       If `allow_errors` is true this may be raised if no values are returned
       from the server.
   :raises TypeError: Raised if we receive an unknown arkouda_type returned from the server

   .. seealso:: :py:obj:`to_csv`

   .. rubric:: Notes

   - CSV format is not currently supported by load/load_all operations
   - The column delimiter is expected to be the same for column names and data
   - Be sure that column delimiters are not found within your data.
   - All CSV files must delimit rows using newline (``\\n``) at this time.
   - Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing
     bytes as uint(8).


.. py:function:: read_hdf(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strict_types: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False, tag_data: bool = False) -> Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray, arkouda.pandas.categorical.Categorical, arkouda.pandas.dataframe.DataFrame, arkouda.client_dtypes.IPv4, arkouda.numpy.timeclass.Datetime, arkouda.numpy.timeclass.Timedelta, arkouda.pandas.index.Index]]]

   Read Arkouda objects from HDF5 files.

   :param filenames: Filename or list of filenames to read objects from.
   :type filenames: Union[str, List[str]]
   :param datasets: Dataset name or list of dataset names to read from the provided
                    files. If ``None``, all datasets are read.
   :type datasets: Optional[Union[str, List[str]]], default=None
   :param iterative: If ``True``, make iterative function calls to the server. If
                     ``False``, make a single function call to the server.
   :type iterative: bool, default=False
   :param strict_types: If ``True``, require all dtypes of a given dataset to have the
                        same precision and sign. If ``False``, allow dtypes of different
                        precision and sign across different files. For example, if one
                        file contains a ``uint32`` dataset and another contains an
                        ``int64`` dataset with the same name, the contents of both will
                        be read into an ``int64`` ``pdarray``.
   :type strict_types: bool, default=True
   :param allow_errors: If ``True``, files with read errors may be skipped instead of
                        causing the operation to fail. A warning will be included in the
                        return containing the total number of files skipped due to failure
                        and up to 10 filenames.
   :type allow_errors: bool, default=False
   :param calc_string_offsets: If ``True``, instruct the server to calculate the offsets or
                               segments array instead of loading it from HDF5 files. In the
                               future, this option may become the default.
   :type calc_string_offsets: bool, default=False
   :param tag_data: If ``True``, tag the returned data with the code associated with
                    the filename from which it was read.
   :type tag_data: bool, default=False

   :returns: * *Mapping[* -- str,
               Union[
                   pdarray,
                   Strings,
                   SegArray,
                   Categorical,
                   DataFrame,
                   IPv4,
                   Datetime,
                   Timedelta,
                   Index,
               ],
             * *]*
             * Dictionary mapping ``datasetName`` to the loaded object. The values
             * may be ``pdarray``, ``Strings``, ``SegArray``, ``Categorical``,
             * ``DataFrame``, ``IPv4``, ``Datetime``, ``Timedelta``, or ``Index``. -- Dictionary mapping ``datasetName`` to the loaded object. The values
               may be ``pdarray``, ``Strings``, ``SegArray``, ``Categorical``,
               ``DataFrame``, ``IPv4``, ``Datetime``, ``Timedelta``, or ``Index``.

   :raises ValueError: Raised if not all datasets are present in all HDF5 files or if one
       or more of the specified files do not exist.
   :raises RuntimeError: Raised if one or more of the specified files cannot be opened.
       If ``allow_errors`` is ``True``, this may also be raised if no
       values are returned from the server.
   :raises TypeError: Raised if an unknown Arkouda type is returned from the server.

   .. rubric:: Notes

   If ``filenames`` is a string, it is interpreted as a shell expression.
   A single filename is a valid expression, so it will also work. The
   expression is expanded with ``glob`` to read all matching files.

   If ``iterative=True``, each dataset name and filename is passed to the
   server independently in sequence. If ``iterative=False``, all dataset
   names and filenames are passed to the server in a single string.

   If ``datasets`` is ``None``, dataset names are inferred from the first
   file and all datasets are read. Use ``get_datasets`` to show dataset
   names in HDF5 files.

   .. seealso:: :py:obj:`read_tagged_data`

   .. rubric:: Examples

   >>> import arkouda as ak

   Read a file with an extension:

   >>> x = ak.read_hdf("path/name_prefix.h5")  # doctest: +SKIP

   Read files matching a glob expression:

   >>> x = ak.read_hdf("path/name_prefix*")  # doctest: +SKIP


.. py:function:: read_parquet(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, iterative: bool = False, strict_types: bool = True, allow_errors: bool = False, tag_data: bool = False, read_nested: bool = True, has_non_float_nulls: bool = False, null_handling: Optional[str] = None, fixed_len: int = -1) -> Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray, arkouda.pandas.categorical.Categorical, arkouda.pandas.dataframe.DataFrame, arkouda.client_dtypes.IPv4, arkouda.numpy.timeclass.Datetime, arkouda.numpy.timeclass.Timedelta, arkouda.pandas.index.Index]]]

   Read Arkouda objects from Parquet files.

   :param filenames: Filename or list of filenames to read objects from.
   :type filenames: Union[str, List[str]]
   :param datasets: Dataset name or list of dataset names to read from the provided files.
                    If ``None``, all datasets are read.
   :type datasets: Optional[Union[str, List[str]]], default=None
   :param iterative: If ``True``, make iterative function calls to the server. If
                     ``False``, make a single function call to the server.
   :type iterative: bool, default=False
   :param strict_types: If ``True``, require all dtypes of a given dataset to have the
                        same precision and sign. If ``False``, allow dtypes of different
                        precision and sign across different files. For example, if one
                        file contains a ``uint32`` dataset and another contains an
                        ``int64`` dataset with the same name, the contents of both will
                        be read into an ``int64`` ``pdarray``.
   :type strict_types: bool, default=True
   :param allow_errors: If ``True``, files with read errors may be skipped instead of
                        causing the operation to fail. A warning will be included in the
                        return containing the total number of files skipped due to failure
                        and up to 10 filenames.
   :type allow_errors: bool, default=False
   :param tag_data: If ``True``, tag the data with the code associated with the
                    filename from which the data was read.
   :type tag_data: bool, default=False
   :param read_nested: If ``True``, ``SegArray`` objects are read from the file. If
                       ``False``, ``SegArray`` objects and other nested Parquet columns
                       are ignored. If ``datasets`` is not ``None``, this parameter is
                       ignored.
   :type read_nested: bool, default=True
   :param has_non_float_nulls: Deprecated. Use ``null_handling`` instead.

                               This flag must be set to ``True`` to read non-float Parquet columns
                               that contain null values.
   :type has_non_float_nulls: bool, default=False
   :param null_handling: Null-handling mode. Supported values are ``"none"``,
                         ``"only floats"``, and ``"all"``. If ``None``, the default is
                         ``"only floats"``.

                         If ``"none"``, the data is assumed to contain no nulls. This gives
                         the best performance, but behavior is undefined if nulls are
                         present.

                         If ``"only floats"``, only floating-point columns may contain
                         nulls. This improves performance for other data types.

                         If ``"all"``, any column may contain nulls. This is the most
                         general mode, but it is slower overall.
   :type null_handling: Optional[str], default=None
   :param fixed_len: Fixed string length to use when reading Parquet string columns if
                     the length of each string is known at runtime. This can avoid byte
                     calculation and may improve performance.
   :type fixed_len: int, default=-1

   :returns: DataFrame, IPv4, Datetime, Timedelta, Index]]
             Dictionary mapping ``datasetName`` to the loaded object. The values may
             be ``pdarray``, ``Strings``, ``SegArray``, ``Categorical``,
             ``DataFrame``, ``IPv4``, ``Datetime``, ``Timedelta``, or ``Index``.
   :rtype: Mapping[str, Union[pdarray, Strings, SegArray, Categorical,

   :raises ValueError: Raised if not all datasets are present in all Parquet files or if
       one or more of the specified files do not exist.
   :raises RuntimeError: Raised if one or more of the specified files cannot be opened.
       If ``allow_errors`` is ``True``, this may also be raised if no
       values are returned from the server.
   :raises TypeError: Raised if an unknown Arkouda type is returned from the server.

   .. rubric:: Notes

   If ``filenames`` is a string, it is interpreted as a shell expression.
   A single filename is a valid expression, so it will also work. The
   expression is expanded with ``glob`` to read all matching files.

   If ``iterative=True``, each dataset name and filename is passed to the
   server independently in sequence. If ``iterative=False``, all dataset
   names and filenames are passed to the server in a single string.

   If ``datasets`` is ``None``, dataset names are inferred from the first
   file and all datasets are read. Use ``get_datasets`` to show the names
   of datasets in Parquet files.

   Parquet currently always recomputes offsets. This note should be
   updated when the Parquet workflow changes.

   .. seealso:: :py:obj:`read_tagged_data`

   .. rubric:: Examples

   >>> import arkouda as ak

   Read a Parquet file:

   >>> x = ak.read_parquet("path/name_prefix.parquet")  # doctest: +SKIP

   Read files matching a glob expression:

   >>> x = ak.read_parquet("path/name_prefix*")  # doctest: +SKIP


.. py:function:: read_tagged_data(filenames: Union[str, List[str]], datasets: Optional[Union[str, List[str]]] = None, strictTypes: bool = True, allow_errors: bool = False, calc_string_offsets: bool = False, read_nested: bool = True, has_non_float_nulls: bool = False)

   Read datasets from files and tag each record with the file it was read from.

   The file type is determined automatically.

   :param filenames: Either a list of filenames or a shell expression.
   :type filenames: Union[str, List[str]]
   :param datasets: Dataset name or list of dataset names to read. If ``None``, all
                    available datasets are read.
   :type datasets: Optional[Union[str, List[str]]], default=None
   :param strictTypes: If ``True``, require all dtypes of a given dataset to have the
                       same precision and sign. If ``False``, allow dtypes of different
                       precision and sign across different files. For example, if one
                       file contains a ``uint32`` dataset and another contains an
                       ``int64`` dataset with the same name, the contents of both will
                       be read into an ``int64`` ``pdarray``.
   :type strictTypes: bool, default=True
   :param allow_errors: If ``True``, files with read errors may be skipped instead of
                        causing the operation to fail. A warning will be included in the
                        return containing the total number of files skipped due to failure
                        and up to 10 filenames.
   :type allow_errors: bool, default=False
   :param calc_string_offsets: If ``True``, instruct the server to calculate the offsets or
                               segments array instead of loading it from HDF5 files. In the
                               future, this option may become the default.
   :type calc_string_offsets: bool, default=False
   :param read_nested: If ``True``, ``SegArray`` objects are read from the file. If
                       ``False``, ``SegArray`` objects and other nested Parquet columns
                       are ignored. Ignored if ``datasets`` is not ``None``. Parquet only.
   :type read_nested: bool, default=True
   :param has_non_float_nulls: Must be set to ``True`` to read non-float Parquet columns that
                               contain null values.
   :type has_non_float_nulls: bool, default=False

   .. rubric:: Notes

   This function is not currently supported for ``Categorical`` or
   ``GroupBy`` datasets.

   .. rubric:: Examples

   >>> import arkouda as ak

   Read files and return the data along with tagging information:

   >>> data, cat = ak.read_tagged_data("path/name")  # doctest: +SKIP

   The codes in ``cat`` map each record in ``data`` to the file it came
   from. The returned data includes a ``"Filename_Codes"`` array.

   >>> data  # doctest: +SKIP
   {"Filename_Codes": array([0 3 6 9 12]), "col_name": array([0 0 0 1])}


.. py:function:: read_zarr(store_path: str, ndim: int, dtype)

   Read a Zarr store from disk into a pdarray.

   Supports multi-dimensional pdarrays of numeric types.
   To use this function, ensure you have installed the blosc dependency (`make install-blosc`)
   and have included `ZarrMsg.chpl` in the `ServerModules.cfg` file.

   :param store_path: The path to the Zarr store. The path must be to a directory that contains a `.zarray`
                      file containing the Zarr store metadata.
   :type store_path: str
   :param ndim: The number of dimensions in the array
   :type ndim: int
   :param dtype: The data type of the array
   :type dtype: str

   :returns: The pdarray read from the Zarr store.
   :rtype: pdarray


.. py:function:: receive(hostname: str, port)

   Receive a pdarray sent by `pdarray.transfer()`.

   :param hostname: The hostname of the pdarray that sent the array
   :type hostname: str
   :param port: The port to send the array over. This needs to be an
                open port (i.e., not one that the Arkouda server is
                running on). This will open up `numLocales` ports,
                each of which in succession, so will use ports of the
                range {port..(port+numLocales)} (e.g., running an
                Arkouda server of 4 nodes, port 1234 is passed as
                `port`, Arkouda will use ports 1234, 1235, 1236,
                and 1237 to send the array data).
                This port much match the port passed to the call to
                `pdarray.transfer()`.
   :type port: int_scalars

   :returns: The pdarray sent from the sending server to the current
             receiving server.
   :rtype: pdarray

   :raises ValueError: Raised if the op is not within the pdarray.BinOps set
   :raises TypeError: Raised if other is not a pdarray or the pdarray.dtype is not
       a supported dtype


.. py:function:: receive_dataframe(hostname: str, port)

   Receive a pdarray sent by `dataframe.transfer()`.

   :param hostname: The hostname of the dataframe that sent the array
   :type hostname: str
   :param port: The port to send the dataframe over. This needs to be an
                open port (i.e., not one that the Arkouda server is
                running on). This will open up `numLocales` ports,
                each of which in succession, so will use ports of the
                range {port..(port+numLocales)} (e.g., running an
                Arkouda server of 4 nodes, port 1234 is passed as
                `port`, Arkouda will use ports 1234, 1235, 1236,
                and 1237 to send the array data).
                This port much match the port passed to the call to
                `pdarray.send_array()`.
   :type port: int_scalars

   :returns: The dataframe sent from the sending server to the
             current receiving server.
   :rtype: pdarray

   :raises ValueError: Raised if the op is not within the pdarray.BinOps set
   :raises TypeError: Raised if other is not a pdarray or the pdarray.dtype is not
       a supported dtype


.. py:function:: restore(filename)

   Return data saved using `ak.snapshot`.

   :param filename: Name used to create snapshot to be read
   :type filename: str

   :rtype: Dict

   .. rubric:: Notes

   Unlike other save/load methods using snapshot restore will save DataFrames alongside other
   objects in HDF5. Thus, they are returned within the dictionary as a dataframe.


.. py:function:: save_checkpoint(name='', path='.akdata', mode: Literal['overwrite', 'preserve_previous', 'error'] = 'overwrite')

   Save the server's state.

   Records some metadata about the server, and saves
   all pdarrays into parquet files.

   :param name: Name of the checkpoint. The default will be the server session ID, which
                is typically in format ``id_<hash>_``. A directory will be created in
                ``path`` with this name.
   :type name: str
   :param path: The directory to save the checkpoint. If the directory doesn't exist, it
                will be created. If it exists, a new directory for the checkpoint
                instance will be created inside this directory.
   :type path: str
   :param mode: How to handle an existing checkpoint with the same name.
                - ``'overwrite'`` (default): overwrite the checkpoint files.
                - ``'preserve_previous'``: rename existing checkpoint to ``<name>.prev``,
                  overwriting that if it exists.
                - ``'error'``: raise an error if the checkpoint exists.
   :type mode: {'overwrite', 'preserve_previous', 'error'}

   .. rubric:: Notes

   Only ``pdarray``s are saved. Other data structures will not be recorded. We
   expect to expand the coverage in the future.

   :returns: The checkpoint name, which will be the same as the ``name`` argument if
             it was passed.
   :rtype: str

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> arr = ak.zeros(10, int)
   >>> arr[2] = 2
   >>> arr[2]
   np.int64(2)
   >>> cp_name = ak.save_checkpoint()
   >>> arr[2] = 3
   >>> arr[2]
   np.int64(3)
   >>> ak.load_checkpoint(cp_name) # doctest: +SKIP
   >>> arr[2]
   np.int64(3)

   .. seealso:: :py:obj:`load_checkpoint`


.. py:function:: snapshot(filename)

   Create a snapshot of the current Arkouda namespace.

   All currently accessible variables containing
   Arkouda objects will be written to an HDF5 file.

   Unlike other save/load functions, this maintains the integrity of dataframes.

   Current Variable names are used as the dataset name when saving.

   :param filename: Name to use when storing file
   :type filename: str

   .. seealso:: :py:obj:`ak.restore`


.. py:function:: to_csv(columns: Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings]], List[Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings]]], prefix_path: str, names: Optional[List[str]] = None, col_delim: str = ',', overwrite: bool = False)

   Write Arkouda object(s) to CSV file(s).

   All CSV Files written by Arkouda
   include a header denoting data types of the columns.

   :param columns: The objects to be written to CSV file. If a mapping is used and `names` is None
                   the keys of the mapping will be used as the dataset names.
   :type columns: Mapping[str, pdarray] or List[pdarray]
   :param prefix_path: The filename prefix to be used for saving files. Files will have _LOCALE#### appended
                       when they are written to disk.
   :type prefix_path: str
   :param names: names of dataset to be written. Order should correspond to the order of data
                 provided in `columns`.
   :type names: List[str] (Optional)
   :param col_delim: Defaults to ",". Value to be used to separate columns within the file.
                     Please be sure that the value used DOES NOT appear in your dataset.
   :type col_delim: str
   :param overwrite: Defaults to False. If True, any existing files matching your provided prefix_path will
                     be overwritten. If False, an error will be returned if existing files are found.
   :type overwrite: bool

   :raises ValueError: Raised if any datasets are present in all csv files or if one or
       more of the specified files do not exist
   :raises RuntimeError: Raised if one or more of the specified files cannot be opened.
       If `allow_errors` is true this may be raised if no values are returned
       from the server.
   :raises TypeError: Raised if we receive an unknown arkouda_type returned from the server

   .. seealso:: :py:obj:`read_csv`

   .. rubric:: Notes

   - CSV format is not currently supported by load/load_all operations
   - The column delimiter is expected to be the same for column names and data
   - Be sure that column delimiters are not found within your data.
   - All CSV files must delimit rows using newline (``\\n``) at this time.
   - Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing
     bytes as uint(8).


.. py:function:: to_hdf(columns: Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray]], List[Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray]]], prefix_path: str, names: Optional[List[str]] = None, mode: Literal['truncate', 'append'] = 'truncate', file_type: Literal['single', 'distribute'] = 'distribute') -> None

   Save multiple named pdarrays to HDF5 files.

   :param columns: Collection of arrays to save
   :type columns: dict or list of pdarrays
   :param prefix_path: Directory and filename prefix for output files
   :type prefix_path: str
   :param names: Dataset names for the pdarrays
   :type names: list of str
   :param mode: By default, truncate (overwrite) the output files if they exist.
                If 'append', attempt to create new dataset in existing files.
   :type mode: {"truncate", "append"}
   :param file_type: Default: distribute
                     Single writes the dataset to a single file
                     Distribute writes the dataset to a file per locale.
   :type file_type: {"single", "distribute"}

   :raises ValueError: Raised if (1) the lengths of columns and values differ or (2) the mode
       is not 'truncate' or 'append'
   :raises RuntimeError: Raised if a server-side error is thrown saving the pdarray

   .. seealso:: :py:obj:`to_parquet`, :py:obj:`load`, :py:obj:`load_all`, :py:obj:`read`

   .. rubric:: Notes

   Creates one file per locale containing that locale's chunk of each pdarray.
   If columns is a dictionary, the keys are used as the HDF5 dataset names.
   Otherwise, if no names are supplied, 0-up integers are used. By default,
   any existing files at path_prefix will be overwritten, unless the user
   specifies the 'append' mode, in which case arkouda will attempt to add
   <columns> as new datasets to existing files. If the wrong number of files
   is present or dataset names already exist, a RuntimeError is raised.

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> a = ak.arange(25)
   >>> b = ak.arange(25)

   Save with mapping defining dataset names

   >>> ak.to_hdf({'a': a, 'b': b}, 'path/name_prefix') # doctest: +SKIP

   Save using names instead of mapping

   >>> ak.to_hdf([a, b], 'path/name_prefix', names=['a', 'b']) # doctest: +SKIP


.. py:function:: to_parquet(columns: Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray]], List[Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray]]], prefix_path: str, names: Optional[List[str]] = None, mode: Literal['truncate', 'append'] = 'truncate', compression: Optional[str] = None, convert_categoricals: bool = False) -> None

   Save multiple named arrays to Parquet files.

   :param columns:             List[Union[pdarray, Strings, SegArray]]]
                   Collection of arrays to save.
   :type columns: Union[Mapping[str, Union[pdarray, Strings, SegArray]],
   :param prefix_path: Directory and filename prefix for the output files.
   :type prefix_path: str
   :param names: Dataset names for the arrays when ``columns`` is provided as a list.
   :type names: Optional[List[str]], default=None
   :param mode: If ``"truncate"``, overwrite any existing output files. If
                ``"append"``, attempt to create a new dataset in existing files.

                ``"append"`` is deprecated. Use the multi-column write instead.
   :type mode: Literal["truncate", "append"], default="truncate"
   :param compression: Compression type to use when writing the file. Supported values
                       include ``"snappy"``, ``"gzip"``, ``"brotli"``, ``"zstd"``,
                       and ``"lz4"``.
   :type compression: Optional[str], default=None
   :param convert_categoricals: Parquet requires all columns to have the same size, and
                                ``Categorical`` objects do not satisfy that requirement. If set
                                to ``True``, write the equivalent ``Strings`` in place of any
                                ``Categorical`` columns.
   :type convert_categoricals: bool, default=False

   :raises ValueError: Raised if the lengths of ``columns`` and ``names`` differ, or if
       ``mode`` is not ``"truncate"`` or ``"append"``.
   :raises RuntimeError: Raised if a server-side error occurs while saving the arrays.

   .. seealso:: :py:obj:`to_hdf`, :py:obj:`load`, :py:obj:`load_all`, :py:obj:`read`

   .. rubric:: Notes

   Creates one file per locale containing that locale's chunk of each array.

   If ``columns`` is a dictionary, its keys are used as the Parquet
   column names. Otherwise, if no ``names`` are supplied, integer names
   starting at ``0`` are used.

   By default, any existing files at ``prefix_path`` are deleted
   regardless of whether they would be overwritten. If ``mode="append"``,
   Arkouda attempts to add ``columns`` as new datasets to existing files.
   If the wrong number of files is present or dataset names already
   exist, a ``RuntimeError`` is raised.

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> a = ak.arange(25)
   >>> b = ak.arange(25)

   Save with a mapping defining dataset names:

   >>> ak.to_parquet({"a": a, "b": b}, "path/name_prefix")  # doctest: +SKIP

   Save using ``names`` instead of a mapping:

   >>> ak.to_parquet([a, b], "path/name_prefix", names=["a", "b"])  # doctest: +SKIP


.. py:function:: to_zarr(store_path: str, arr: arkouda.numpy.pdarrayclass.pdarray, chunk_shape)

   Write a pdarray to disk as a Zarr store.

   Supports multi-dimensional pdarrays of numeric types.
   To use this function, ensure you have installed the blosc dependency (`make install-blosc`)
   and have included `ZarrMsg.chpl` in the `ServerModules.cfg` file.

   :param store_path: The path at which Zarr store should be written
   :type store_path: str
   :param arr: The pdarray to be written to disk
   :type arr: pdarray
   :param chunk_shape: The shape of the chunks to be used in the Zarr store
   :type chunk_shape: tuple

   :raises ValueError: Raised if the number of dimensions in the chunk shape does not match
       the number of dimensions in the array or if the array is not a 32 or 64 bit numeric type


.. py:function:: update_hdf(columns: Union[Mapping[str, Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray]], List[Union[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.numpy.segarray.SegArray]]], prefix_path: str, names: Optional[List[str]] = None, repack: bool = True)

   Overwrite the datasets with name appearing in names or keys in columns if columns is a dictionary.

   :param columns: Collection of arrays to save
   :type columns: dict or list of pdarrays
   :param prefix_path: Directory and filename prefix for output files
   :type prefix_path: str
   :param names: Dataset names for the pdarrays
   :type names: list of str
   :param repack: Default: True
                  HDF5 does not release memory on delete. When True, the inaccessible
                  data (that was overwritten) is removed. When False, the data remains, but is
                  inaccessible. Setting to false will yield better performance, but will cause
                  file sizes to expand.
   :type repack: bool

   :raises RuntimeError: Raised if a server-side error is thrown saving the datasets

   .. rubric:: Notes

   - If file does not contain File_Format attribute to indicate how it was saved,
     the file name is checked for _LOCALE#### to determine if it is distributed.
   - If the datasets provided do not exist, they will be added
   - Because HDF5 deletes do not release memory, this will create a copy of the
     file with the new data
   - This workflow is slightly different from `to_hdf` to prevent reading and
     creating a copy of the file for each dataset