arkouda.index
=============

.. py:module:: arkouda.index


Classes
-------

.. autoapisummary::

   arkouda.index.Index
   arkouda.index.MultiIndex


Package Contents
----------------

.. py:class:: Index(values: Union[List, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.strings.Strings, arkouda.pandas.categorical.Categorical, pandas.Index, Index, pandas.Categorical], name: Optional[str] = None, allow_list=False, max_list_size=1000)

   .. py:method:: argsort(ascending=True)


   .. py:method:: concat(other)


   .. py:method:: equals(other: Index) -> arkouda.numpy.dtypes.bool_scalars

      Whether Indexes are the same size, and all entries are equal.

      :param other: object to compare.
      :type other: object

      :returns: True if the Indexes are the same, otherwise False.
      :rtype: bool_scalars

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> ak.connect()
      >>> i = ak.Index([1, 2, 3])
      >>> i_cpy = ak.Index([1, 2, 3])
      >>> i.equals(i_cpy)
      True
      >>> i2 = ak.Index([1, 2, 4])
      >>> i.equals(i2)
      False

      MultiIndex case:

      >>> arrays = [ak.array([1, 1, 2, 2]), ak.array(["red", "blue", "red", "blue"])]
      >>> m = ak.MultiIndex(arrays, names=["numbers2", "colors2"])
      >>> m.equals(m)
      True
      >>> arrays2 = [ak.array([1, 1, 2, 2]), ak.array(["red", "blue", "red", "green"])]
      >>> m2 = ak.MultiIndex(arrays2, names=["numbers2", "colors2"])
      >>> m.equals(m2)
      False
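
      The comparison rule described above (same size, and every entry equal) can be
      sketched in plain Python. The helper name ``indexes_equal`` is hypothetical and
      works on ordinary lists; it only illustrates the rule and is not Arkouda's
      server-side implementation.

      .. code-block:: python

         def indexes_equal(left, right):
             """Return True when both sequences have the same length and equal entries."""
             if len(left) != len(right):
                 return False
             return all(a == b for a, b in zip(left, right))

         same = indexes_equal([1, 2, 3], [1, 2, 3])       # equal size, equal entries
         different = indexes_equal([1, 2, 3], [1, 2, 4])  # one entry differs
         shorter = indexes_equal([1, 2, 3], [1, 2])       # sizes differ
         print(same, different, shorter)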


   .. py:method:: factory(index)
      :staticmethod:


   .. py:method:: from_return_msg(rep_msg)
      :classmethod:


   .. py:property:: index

      Deprecated alias for `values`.

      This property is maintained for backward compatibility and returns the same
      array as the `values` attribute. It will be removed in a future release;
      use `values` directly instead.

      :returns: The underlying values of this object (same as `values`).
      :rtype: arkouda.numpy.pdarray

      .. note::
         Deprecated: use the `values` attribute directly. This alias will be removed in a future release.

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> idx = ak.Index(ak.array([1, 2, 3]))
      >>> idx.index
      array([1 2 3])


   .. py:property:: inferred_type
      :type: str


      Return a string of the type inferred from the values.


   .. py:method:: is_registered()

      Return whether the object is registered.

      Return True iff the object is contained in the registry or is a component of a
      registered object.

      :returns: Indicates if the object is contained in the registry
      :rtype: numpy.bool

      :raises RegistrationError: Raised if there's a server-side error or a mismatch of registered components

      .. seealso:: :py:obj:`register`, :py:obj:`attach`, :py:obj:`unregister`

      .. rubric:: Notes

      Objects registered with the server are immune to deletion until
      they are unregistered.


   .. py:property:: is_unique

      Property indicating if all values in the index are unique.

      :returns: True if all values are unique, False otherwise.
      :rtype: bool


   .. py:method:: lookup(key)


   .. py:method:: map(arg: Union[dict, arkouda.pandas.series.Series]) -> Index

      Map values of Index according to an input mapping.

      :param arg: The mapping correspondence.
      :type arg: dict or Series

      :returns: A new index with the values transformed by the mapping correspondence.
      :rtype: arkouda.index.Index

      :raises TypeError: Raised if arg is not of type dict or arkouda.pandas.Series.
          Raised if index values not of type pdarray, Categorical, or Strings.

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> ak.connect()
      >>> idx = ak.Index(ak.array([2, 3, 2, 3, 4]))
      >>> display(idx)
      Index(array([2 3 2 3 4]), dtype='int64')
      >>> idx.map({4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
      Index(array([30.00000000000000000 5.00000000000000000 30.00000000000000000
      5.00000000000000000 25.00000000000000000]), dtype='float64')
      >>> s2 = ak.Series(ak.array(["a","b","c","d"]), index = ak.array([4,2,1,3]))
      >>> idx.map(s2)
      Index(array(['b', 'b', 'd', 'd', 'a']), dtype='<U0')
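
      The dictionary case above amounts to a per-value lookup. A minimal sketch on a
      plain Python list follows; ``map_values`` is a hypothetical helper, and it assumes
      every value has an entry in the mapping (the behavior for missing keys is not
      described here, so the sketch does not model it).

      .. code-block:: python

         def map_values(values, mapping):
             """Replace each value with its entry in ``mapping`` (all keys assumed present)."""
             return [mapping[v] for v in values]

         # Same inputs as the dictionary example above; the key 1 is simply unused.
         mapped = map_values([2, 3, 2, 3, 4], {4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
         print(mapped)  # [30.0, 5.0, 30.0, 5.0, 25.0]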


   .. py:attribute:: max_list_size
      :value: 1000


   .. py:method:: memory_usage(unit='B')

      Return the memory usage of the Index values.

      :param unit: Unit to return. One of {'B', 'KB', 'MB', 'GB'}.
      :type unit: str, default = "B"

      :returns: Bytes of memory consumed.
      :rtype: int

      .. seealso:: :py:obj:`arkouda.numpy.pdarrayclass.nbytes`, :py:obj:`arkouda.index.MultiIndex.memory_usage`, :py:obj:`arkouda.pandas.series.Series.memory_usage`, :py:obj:`arkouda.pandas.dataframe.DataFrame.memory_usage`

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> ak.connect()
      >>> idx = ak.Index(ak.array([1, 2, 3]))
      >>> idx.memory_usage()
      24
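
      The value 24 in the example is consistent with three ``int64`` entries at 8 bytes
      each. The arithmetic can be sketched as follows; ``expected_nbytes`` is a
      hypothetical helper used only for illustration and does not query the server.

      .. code-block:: python

         INT64_ITEMSIZE = 8  # bytes per int64 element

         def expected_nbytes(num_values, itemsize=INT64_ITEMSIZE):
             """Bytes needed to hold ``num_values`` fixed-width elements."""
             return num_values * itemsize

         index_bytes = expected_nbytes(3)
         print(index_bytes)  # 24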


   .. py:property:: names

      Return Index or MultiIndex names.


   .. py:property:: ndim

      Number of dimensions of the underlying data, by definition 1.

      .. seealso:: :py:obj:`MultiIndex.ndim`


   .. py:property:: nlevels

      Integer number of levels in this Index.

      An Index will always have 1 level.

      .. seealso:: :py:obj:`MultiIndex.nlevels`


   .. py:attribute:: objType
      :value: 'Index'


      Sequence used for indexing and alignment.

      The basic object storing axis labels for all DataFrame objects.

      :param values:
      :type values: List, pdarray, Strings, Categorical, pandas.Categorical, pandas.Index, or Index
      :param name: Name to be stored in the index.
      :type name: str, default=None
      :param allow_list: If False, list values will be converted to a pdarray.
                         If True, list values will remain as a list, provided the data length does not exceed max_list_size.
      :type allow_list: bool, default=False
      :param max_list_size: The maximum allowed data length for the values to be stored as a list object.
      :type max_list_size: int, default=1000

      :raises ValueError: Raised if allow_list=True and the size of values is > max_list_size.

      .. seealso:: :py:obj:`MultiIndex`

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> ak.Index([1, 2, 3])
      Index(array([1 2 3]), dtype='int64')

      >>> ak.Index(list('abc'))
      Index(array(['a', 'b', 'c']), dtype='<U0')

      >>> ak.Index([1, 2, 3], allow_list=True)
      Index([1, 2, 3], dtype='int64')
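
      The interaction between ``allow_list`` and ``max_list_size`` can be sketched as a
      simple guard. The function ``check_list_size`` and its error message are
      hypothetical; the sketch only mirrors the documented rule that a ``ValueError``
      is raised when ``allow_list=True`` and the data is longer than ``max_list_size``.

      .. code-block:: python

         def check_list_size(values, allow_list=False, max_list_size=1000):
             """Raise ValueError when a list is kept as a list but is too long."""
             if allow_list and len(values) > max_list_size:
                 # Illustrative message, not the text Arkouda emits.
                 raise ValueError(
                     f"list of size {len(values)} exceeds max_list_size={max_list_size}"
                 )
             return values

         kept = check_list_size([1, 2, 3], allow_list=True)  # short list passes

         try:
             check_list_size(list(range(1001)), allow_list=True)
             too_long_message = None
         except ValueError as err:
             too_long_message = str(err)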


   .. py:method:: register(user_defined_name)

      Register this Index object and underlying components with the Arkouda server.

      :param user_defined_name: user defined name the Index is to be registered under,
                                this will be the root name for underlying components
      :type user_defined_name: str

      :returns: The same Index, which is now registered with the arkouda server and has an updated name.
                This is an in-place modification; the original is returned to support
                a fluent programming style.
                Note that you cannot register two different Indexes with the same name.
      :rtype: Index

      :raises TypeError: Raised if user_defined_name is not a str
      :raises RegistrationError: If the server was unable to register the Index with the user_defined_name

      .. seealso:: :py:obj:`unregister`, :py:obj:`attach`, :py:obj:`is_registered`

      .. rubric:: Notes

      Objects registered with the server are immune to deletion until
      they are unregistered.


   .. py:attribute:: registered_name
      :type:  Optional[str]
      :value: None


   .. py:method:: set_dtype(dtype)

      Change the data type of the index.

      Currently only aku.ip_address and ak.array are supported.


   .. py:property:: shape


   .. py:method:: to_csv(prefix_path: str, dataset: str = 'index', col_delim: str = ',', overwrite: bool = False)

      Write Index to CSV file(s).

      File will contain a single column with the pdarray data.
      All CSV Files written by Arkouda include a header denoting data types of the columns.

      :param prefix_path: The filename prefix to be used for saving files. Files will have _LOCALE#### appended
                          when they are written to disk.
      :type prefix_path: str
      :param dataset: Column name to save the Index values under. Defaults to "index".
      :type dataset: str
      :param col_delim: Defaults to ",". Value to be used to separate columns within the file.
                        Please be sure that the value used DOES NOT appear in your dataset.
      :type col_delim: str
      :param overwrite: Defaults to False. If True, any existing files matching your provided prefix_path will
                        be overwritten. If False, an error will be returned if existing files are found.
      :type overwrite: bool

      :returns: Response message.
      :rtype: str

      :raises ValueError: Raised if all datasets are not present in all parquet files or if one or
          more of the specified files do not exist.
      :raises RuntimeError: Raised if one or more of the specified files cannot be opened.
          If `allow_errors` is true this may be raised if no values are returned
          from the server.
      :raises TypeError: Raised if we receive an unknown arkouda_type returned from the server.
          Raised if the Index values are a list.

      .. rubric:: Notes

      - CSV format is not currently supported by load/load_all operations
      - The column delimiter is expected to be the same for column names and data
      - Be sure that column delimiters are not found within your data.
      - All CSV files must delimit rows using newline (`\n`) at this time.
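
      Since the column delimiter must not occur in the data, a client-side check before
      writing can catch problems early. The sketch below runs on a plain Python list of
      values; ``delimiter_is_safe`` is a hypothetical helper and is not part of Arkouda.

      .. code-block:: python

         def delimiter_is_safe(values, col_delim=","):
             """Return True when ``col_delim`` appears in none of the values."""
             return all(col_delim not in str(v) for v in values)

         plain_ok = delimiter_is_safe(["red", "blue"])
         comma_clash = delimiter_is_safe(["red,dark", "blue"])
         pipe_ok = delimiter_is_safe(["red,dark", "blue"], col_delim="|")
         print(plain_ok, comma_clash, pipe_ok)  # True False True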


   .. py:method:: to_dict(label)


   .. py:method:: to_hdf(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', file_type: str = 'distribute') -> str

      Save the Index to HDF5.

      The object can be saved to a collection of files or single file.

      :param prefix_path: Directory and filename prefix that all output files share
      :type prefix_path: str
      :param dataset: Name of the dataset to create in files (must not already exist)
      :type dataset: str
      :param mode: By default, truncate (overwrite) output files, if they exist.
                   If 'append', attempt to create new dataset in existing files.
      :type mode: str {'truncate' | 'append'}
      :param file_type: Default: "distribute"
                        When set to single, dataset is written to a single file.
                        When distribute, dataset is written on a file per locale.
                        This is only supported by HDF5 files and will have no impact on Parquet files.
      :type file_type: str ("single" | "distribute")

      :returns: Message indicating the result of the save operation.
      :rtype: str

      :raises RuntimeError: Raised if a server-side error is thrown saving the pdarray
      :raises TypeError: Raised if the Index values are a list.

      .. rubric:: Notes

      - The prefix_path must be visible to the arkouda server and the user must
        have write permission.
      - Output files have names of the form ``<prefix_path>_LOCALE<i>``, where ``<i>``
        ranges from 0 to ``numLocales - 1`` for ``file_type='distribute'``. Otherwise,
        the file name will be ``prefix_path``.
      - If any of the output files already exist and
        the mode is 'truncate', they will be overwritten. If the mode is 'append'
        and the number of output files is less than the number of locales or a
        dataset with the same name already exists, a ``RuntimeError`` will result.
      - Any file extension can be used. The file I/O does not rely on the extension to
        determine the file format.
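
      The distributed naming scheme can be sketched as follows. ``locale_filenames`` is a
      hypothetical helper; it assumes one file per locale, locale indices starting at 0,
      and a four-digit zero-padded index (matching the ``_LOCALE####`` pattern mentioned
      elsewhere in this module). It does not model how a file extension in the prefix
      is handled.

      .. code-block:: python

         def locale_filenames(prefix_path, num_locales):
             """Build the per-locale file names for a distributed write."""
             return [f"{prefix_path}_LOCALE{i:04d}" for i in range(num_locales)]

         names = locale_filenames("/tmp/my_index", 3)
         for name in names:
             print(name)
         # /tmp/my_index_LOCALE0000
         # /tmp/my_index_LOCALE0001
         # /tmp/my_index_LOCALE0002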


   .. py:method:: to_ndarray()


   .. py:method:: to_pandas()

      Return the equivalent Pandas Index.


   .. py:method:: to_parquet(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', compression: Optional[str] = None)

      Save the Index to Parquet.

      The result is a collection of files,
      one file per locale of the arkouda server, where each filename starts
      with prefix_path. Each locale saves its chunk of the array to its
      corresponding file.

      :param prefix_path: Directory and filename prefix that all output files share
      :type prefix_path: str
      :param dataset: Name of the dataset to create in files (must not already exist)
      :type dataset: str
      :param mode: By default, truncate (overwrite) output files, if they exist.
                   If 'append', attempt to create new dataset in existing files.
      :type mode: str {'truncate' | 'append'}
      :param compression: Compression type used with Parquet files.
                          One of None, "snappy", "gzip", "brotli", "zstd", or "lz4".
      :type compression: str, optional

      :returns: Message indicating the result of the save operation.
      :rtype: str

      :raises RuntimeError: Raised if a server-side error is thrown saving the pdarray
      :raises TypeError: Raised if the Index values are a list.

      .. rubric:: Notes

      - The prefix_path must be visible to the arkouda server and the user must
        have write permission.
      - Output files have names of the form ``<prefix_path>_LOCALE<i>``, where ``<i>``
        ranges from 0 to ``numLocales - 1``.
      - 'append' write mode is supported, but is not efficient.
      - If any of the output files already exist and
        the mode is 'truncate', they will be overwritten. If the mode is 'append'
        and the number of output files is less than the number of locales or a
        dataset with the same name already exists, a ``RuntimeError`` will result.
      - Any file extension can be used. The file I/O does not rely on the extension to
        determine the file format.


   .. py:method:: tolist()


   .. py:method:: unregister()

      Unregister this Index object in the arkouda server.

      Unregister this Index object in the arkouda server, which was previously
      registered using register() and/or attached to using attach().

      :raises RegistrationError: If the object is already unregistered or if there is a server error
          when attempting to unregister

      .. seealso:: :py:obj:`register`, :py:obj:`attach`, :py:obj:`is_registered`

      .. rubric:: Notes

      Objects registered with the server are immune to deletion until
      they are unregistered.


   .. py:method:: update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True)

      Overwrite the dataset with the name provided with this Index object.

      If the dataset does not exist it is added.

      :param prefix_path: Directory and filename prefix that all output files share
      :type prefix_path: str
      :param dataset: Name of the dataset to create in files
      :type dataset: str
      :param repack: Default: True
                     HDF5 does not release memory on delete. When True, the inaccessible
                     data (that was overwritten) is removed. When False, the data remains, but is
                     inaccessible. Setting to false will yield better performance, but will cause
                     file sizes to expand.
      :type repack: bool

      :raises RuntimeError: Raised if a server-side error is thrown saving the index

      .. rubric:: Notes

      - If file does not contain File_Format attribute to indicate how it was saved,
        the file name is checked for _LOCALE#### to determine if it is distributed.
      - If the dataset provided does not exist, it will be added
      - Because HDF5 deletes do not release memory, this will create a copy of the
        file with the new data
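
      The file-name fallback described in the first note can be sketched with a regular
      expression. ``looks_distributed`` is a hypothetical helper, and the pattern assumes
      that ``####`` stands for exactly four digits; it illustrates the idea only and is
      not the check Arkouda performs.

      .. code-block:: python

         import re

         # "_LOCALE" followed by four digits, e.g. "_LOCALE0003".
         _LOCALE_SUFFIX = re.compile(r"_LOCALE\d{4}")

         def looks_distributed(filename):
             """Return True when the file name carries a _LOCALE#### marker."""
             return _LOCALE_SUFFIX.search(filename) is not None

         distributed = looks_distributed("/tmp/my_index_LOCALE0000")
         single = looks_distributed("/tmp/my_index")
         print(distributed, single)  # True False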


.. py:class:: MultiIndex(data: Union[list, tuple, pandas.MultiIndex, MultiIndex], name: Optional[str] = None, names: Optional[list[str]] = None)

   Bases: :py:obj:`Index`


   .. py:method:: argsort(ascending=True)


   .. py:method:: concat(other)


   .. py:property:: dtype
      :type: numpy.dtype


      Return the dtype object of the underlying data.


   .. py:method:: equal_levels(other: MultiIndex) -> bool

      Return True if the levels of both MultiIndex objects are the same.


   .. py:method:: get_level_values(level: Union[str, int])


   .. py:property:: index

      Deprecated alias for `values`.

      This property is maintained for backward compatibility and returns the same
      array as the `values` attribute. It will be removed in a future release;
      use `values` directly instead.

      :returns: The underlying values of this object (same as `values`).
      :rtype: arkouda.numpy.pdarray

      .. note::
         Deprecated: use the `values` attribute directly. This alias will be removed in a future release.

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> idx = ak.Index(ak.array([1, 2, 3]))
      >>> idx.index
      array([1 2 3])


   .. py:property:: inferred_type
      :type: str


      Return a string of the type inferred from the values.


   .. py:method:: is_registered()

      Return whether the object is registered.

      Return True iff the object is contained in the registry or is a component of a
      registered object.

      :returns: Indicates if the object is contained in the registry
      :rtype: numpy.bool

      :raises RegistrationError: Raised if there's a server-side error or a mismatch of registered components

      .. seealso:: :py:obj:`register`, :py:obj:`attach`, :py:obj:`unregister`

      .. rubric:: Notes

      Objects registered with the server are immune to deletion until
      they are unregistered.


   .. py:attribute:: levels
      :type:  list


   .. py:method:: lookup(key)


   .. py:method:: memory_usage(unit='B')

      Return the memory usage of the MultiIndex levels.

      :param unit: Unit to return. One of {'B', 'KB', 'MB', 'GB'}.
      :type unit: str, default = "B"

      :returns: Bytes of memory consumed.
      :rtype: int

      .. seealso:: :py:obj:`arkouda.numpy.pdarrayclass.nbytes`, :py:obj:`arkouda.index.Index.memory_usage`, :py:obj:`arkouda.pandas.series.Series.memory_usage`, :py:obj:`arkouda.pandas.dataframe.DataFrame.memory_usage`

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> ak.connect()
      >>> m = ak.index.MultiIndex([ak.array([1,2,3]),ak.array([4,5,6])])
      >>> m.memory_usage()
      48
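
      The value 48 in the example is consistent with summing the bytes of each level:
      two levels of three ``int64`` entries at 8 bytes each. A sketch of that arithmetic
      follows; ``expected_multiindex_nbytes`` is a hypothetical helper, and the
      assumption that the total is a plain sum over levels is inferred from the example
      output rather than stated by the documentation.

      .. code-block:: python

         def expected_multiindex_nbytes(level_sizes, itemsize=8):
             """Sum the bytes of every level, assuming fixed-width elements."""
             return sum(size * itemsize for size in level_sizes)

         total_bytes = expected_multiindex_nbytes([3, 3])
         print(total_bytes)  # 48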


   .. py:property:: name

      Return Index or MultiIndex name.


   .. py:property:: names

      Return Index or MultiIndex names.


   .. py:property:: ndim

      Number of dimensions of the underlying data, by definition 1.

      .. seealso:: :py:obj:`Index.ndim`


   .. py:property:: nlevels
      :type: int


      Integer number of levels in this MultiIndex.

      .. seealso:: :py:obj:`Index.nlevels`


   .. py:attribute:: objType
      :value: 'MultiIndex'


      Sequence used for indexing and alignment.

      The basic object storing axis labels for all DataFrame objects.

      :param values:
      :type values: List, pdarray, Strings, Categorical, pandas.Categorical, pandas.Index, or Index
      :param name: Name to be stored in the index.
      :type name: str, default=None
      :param allow_list: If False, list values will be converted to a pdarray.
                         If True, list values will remain as a list, provided the data length does not exceed max_list_size.
      :type allow_list: bool, default=False
      :param max_list_size: The maximum allowed data length for the values to be stored as a list object.
      :type max_list_size: int, default=1000

      :raises ValueError: Raised if allow_list=True and the size of values is > max_list_size.

      .. seealso:: :py:obj:`MultiIndex`

      .. rubric:: Examples

      >>> import arkouda as ak
      >>> ak.Index([1, 2, 3])
      Index(array([1 2 3]), dtype='int64')

      >>> ak.Index(list('abc'))
      Index(array(['a', 'b', 'c']), dtype='<U0')

      >>> ak.Index([1, 2, 3], allow_list=True)
      Index([1, 2, 3], dtype='int64')


   .. py:method:: register(user_defined_name)

      Register this Index object and underlying components with the Arkouda server.

      :param user_defined_name: user defined name the Index is to be registered under,
                                this will be the root name for underlying components
      :type user_defined_name: str

      :returns: The same Index, which is now registered with the arkouda server and has an updated name.
                This is an in-place modification; the original is returned to support
                a fluent programming style.
                Note that you cannot register two different Indexes with the same name.
      :rtype: MultiIndex

      :raises TypeError: Raised if user_defined_name is not a str
      :raises RegistrationError: If the server was unable to register the Index with the user_defined_name

      .. seealso:: :py:obj:`unregister`, :py:obj:`attach`, :py:obj:`is_registered`

      .. rubric:: Notes

      Objects registered with the server are immune to deletion until
      they are unregistered.


   .. py:attribute:: registered_name
      :type:  Optional[str]
      :value: None


   .. py:method:: set_dtype(dtype)

      Change the data type of the index.

      Currently only aku.ip_address and ak.array are supported.


   .. py:method:: to_dict(labels=None)


   .. py:method:: to_hdf(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', file_type: str = 'distribute') -> str

      Save the Index to HDF5.

      The object can be saved to a collection of files or single file.

      :param prefix_path: Directory and filename prefix that all output files share
      :type prefix_path: str
      :param dataset: Name of the dataset to create in files (must not already exist)
      :type dataset: str
      :param mode: By default, truncate (overwrite) output files, if they exist.
                   If 'append', attempt to create new dataset in existing files.
      :type mode: str {'truncate' | 'append'}
      :param file_type: Default: "distribute"
                        When set to single, dataset is written to a single file.
                        When distribute, dataset is written on a file per locale.
                        This is only supported by HDF5 files and will have no impact on Parquet files.
      :type file_type: str ("single" | "distribute")

      :returns: Message indicating the result of the save operation.
      :rtype: str

      :raises RuntimeError: Raised if a server-side error is thrown saving the pdarray.

      .. rubric:: Notes

      - The prefix_path must be visible to the arkouda server and the user must
        have write permission.
      - Output files have names of the form ``<prefix_path>_LOCALE<i>``, where ``<i>``
        ranges from 0 to ``numLocales - 1`` for ``file_type='distribute'``. Otherwise,
        the file name will be ``prefix_path``.
      - If any of the output files already exist and
        the mode is 'truncate', they will be overwritten. If the mode is 'append'
        and the number of output files is less than the number of locales or a
        dataset with the same name already exists, a ``RuntimeError`` will result.
      - Any file extension can be used. The file I/O does not rely on the extension to
        determine the file format.


   .. py:method:: to_ndarray()


   .. py:method:: to_pandas()

      Return the equivalent Pandas Index.


   .. py:method:: tolist()


   .. py:method:: unregister()

      Unregister this Index object in the arkouda server.

      Unregister this Index object in the arkouda server, which was previously
      registered using register() and/or attached to using attach().

      :raises RegistrationError: If the object is already unregistered or if there is a server error
          when attempting to unregister

      .. seealso:: :py:obj:`register`, :py:obj:`attach`, :py:obj:`is_registered`

      .. rubric:: Notes

      Objects registered with the server are immune to deletion until
      they are unregistered.


   .. py:method:: update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True)

      Overwrite the dataset with the name provided with this Index object.

      If the dataset does not exist it is added.

      :param prefix_path: Directory and filename prefix that all output files share
      :type prefix_path: str
      :param dataset: Name of the dataset to create in files
      :type dataset: str
      :param repack: Default: True
                     HDF5 does not release memory on delete. When True, the inaccessible
                     data (that was overwritten) is removed. When False, the data remains, but is
                     inaccessible. Setting to false will yield better performance, but will cause
                     file sizes to expand.
      :type repack: bool

      :raises RuntimeError: Raised if a server-side error is thrown saving the index
      :raises TypeError: Raised if the Index levels are a list.

      .. rubric:: Notes

      - If file does not contain File_Format attribute to indicate how it was saved,
        the file name is checked for _LOCALE#### to determine if it is distributed.
      - If the dataset provided does not exist, it will be added
      - Because HDF5 deletes do not release memory, this will create a copy of the
        file with the new data