arkouda.index
Classes
Module Contents
- class arkouda.index.Index(values: List | arkouda.pdarrayclass.pdarray | arkouda.Strings | arkouda.Categorical | pandas.Index | Index | pandas.Categorical, name: str | None = None, allow_list=False, max_list_size=1000)[source]¶
-
- equals(other: Index) bool [source]¶
Whether Indexes are the same size, and all entries are equal.
- Parameters:
other (object) – object to compare.
- Returns:
True if the Indexes are the same, otherwise False.
- Return type:
bool
Examples
>>> import arkouda as ak
>>> ak.connect()
>>> i = ak.Index([1, 2, 3])
>>> i_cpy = ak.Index([1, 2, 3])
>>> i.equals(i_cpy)
True
>>> i2 = ak.Index([1, 2, 4])
>>> i.equals(i2)
False
MultiIndex case:
>>> arrays = [ak.array([1, 1, 2, 2]), ak.array(["red", "blue", "red", "blue"])]
>>> m = ak.MultiIndex(arrays, names=["numbers2", "colors2"])
>>> m.equals(m)
True
>>> arrays2 = [ak.array([1, 1, 2, 2]), ak.array(["red", "blue", "red", "green"])]
>>> m2 = ak.MultiIndex(arrays2, names=["numbers2", "colors2"])
>>> m.equals(m2)
False
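The comparison rule described for equals (same size, all entries equal) can be modeled without a running server. This is a minimal plain-Python sketch, not Arkouda's implementation; `index_equals` is a hypothetical helper that works on ordinary Python sequences.

```python
from typing import Sequence


def index_equals(left: Sequence, right: Sequence) -> bool:
    """Model of the documented rule: equal length and pairwise-equal entries."""
    if len(left) != len(right):
        return False
    return all(a == b for a, b in zip(left, right))


print(index_equals([1, 2, 3], [1, 2, 3]))  # True
print(index_equals([1, 2, 3], [1, 2, 4]))  # False
```

The length check comes first so that sequences of different sizes are never reported as equal just because their shared prefix matches.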
- property index¶
- This is maintained to support older code
- property inferred_type: str¶
Return a string of the type inferred from the values.
- is_registered()[source]¶
Return True iff the object is contained in the registry or is a component of a registered object.
- Returns:
Indicates if the object is contained in the registry
- Return type:
numpy.bool
- Raises:
RegistrationError – Raised if there’s a server-side error or a mismatch of registered components
See also
register, attach, unregister
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- map(arg: dict | arkouda.series.Series) Index [source]¶
Map values of Index according to an input mapping.
- Parameters:
arg (dict or Series) – The mapping correspondence.
- Returns:
A new index with the values transformed by the mapping correspondence.
- Return type:
Index
- Raises:
TypeError – Raised if arg is not of type dict or arkouda.Series. Raised if the index values are not of type pdarray, Categorical, or Strings.
Examples
>>> import arkouda as ak
>>> ak.connect()
>>> idx = ak.Index(ak.array([2, 3, 2, 3, 4]))
>>> display(idx)
Index(array([2 3 2 3 4]), dtype='int64')
>>> idx.map({4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
Index(array([30.00000000000000000 5.00000000000000000 30.00000000000000000 5.00000000000000000 25.00000000000000000]), dtype='float64')
>>> s2 = ak.Series(ak.array(["a","b","c","d"]), index = ak.array([4,2,1,3]))
>>> idx.map(s2)
Index(array(['b', 'b', 'd', 'd', 'a']), dtype='<U0')
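The dict form of map can be pictured as a per-element lookup. The sketch below is a plain-Python model of that idea and mirrors the dict example; it is not how Arkouda performs the mapping on the server. `map_values` is a hypothetical helper. It assumes every value has an entry in the mapping, because the text does not say how missing keys are treated.

```python
from typing import Hashable, Mapping, Sequence


def map_values(values: Sequence[Hashable], mapping: Mapping) -> list:
    """Replace each value with mapping[value], keeping the input order.

    Assumption: every value is a key of `mapping`. A missing key raises
    KeyError here; Arkouda's behavior in that case is not modeled.
    """
    return [mapping[v] for v in values]


result = map_values([2, 3, 2, 3, 4], {4: 25.0, 2: 30.0, 1: 7.0, 3: 5.0})
print(result)  # [30.0, 5.0, 30.0, 5.0, 25.0]
```

Keys that never occur in the values, such as `1` here, are simply unused.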
- max_list_size¶
- memory_usage(unit='B')[source]¶
Return the memory usage of the Index values.
- Parameters:
unit (str, default = "B") – Unit to return. One of {‘B’, ‘KB’, ‘MB’, ‘GB’}.
- Returns:
Bytes of memory consumed.
- Return type:
int
See also
arkouda.pdarrayclass.nbytes, arkouda.index.MultiIndex.memory_usage, arkouda.series.Series.memory_usage, arkouda.dataframe.DataFrame.memory_usage
Examples
>>> import arkouda as ak
>>> ak.connect()
>>> idx = ak.Index(ak.array([1, 2, 3]))
>>> idx.memory_usage()
24
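The value 24 in the example is consistent with three int64 entries at 8 bytes each. The following back-of-the-envelope sketch reproduces that arithmetic in plain Python. `ITEMSIZE` and `estimated_bytes` are hypothetical names, and the per-element sizes are assumed NumPy-style widths rather than values read from the server.

```python
# Assumed bytes per element for a few fixed-width dtypes (NumPy-style sizes).
ITEMSIZE = {"int64": 8, "float64": 8, "bool": 1}


def estimated_bytes(length: int, dtype: str = "int64") -> int:
    """Estimate the storage for `length` fixed-width elements of `dtype`."""
    return length * ITEMSIZE[dtype]


print(estimated_bytes(3))  # 24
```

Variable-width data such as Strings would need a different estimate and is left out of this sketch.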
- property names¶
- Return Index or MultiIndex names.
- property ndim¶
- Number of dimensions of the underlying data, by definition 1.
See also
- property nlevels¶
- Integer number of levels in this Index.
- An Index will always have 1 level.
See also
MultiIndex.nlevels
- objType = 'Index'¶
Sequence used for indexing and alignment.
The basic object storing axis labels for all DataFrame objects.
- Parameters:
values (List, pdarray, Strings, Categorical, pandas.Categorical, pandas.Index, or Index)
name (str, default=None) – Name to be stored in the index.
allow_list (bool, default=False) – If False, list values will be converted to a pdarray. If True, list values will remain as a list, provided the data length is less than max_list_size.
max_list_size (int, default=1000) – The maximum allowed data length for the values to be stored as a list object.
- Raises:
ValueError – Raised if allow_list=True and the size of values is > max_list_size.
See also
Examples
>>> ak.Index([1, 2, 3])
Index(array([1 2 3]), dtype='int64')
>>> ak.Index(list('abc'))
Index(array(['a', 'b', 'c']), dtype='<U0')
>>> ak.Index([1, 2, 3], allow_list=True)
Index([1, 2, 3], dtype='int64')
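The interaction between allow_list and max_list_size can be summarized as a small decision rule. The sketch below models only that rule in plain Python. `prepare_values` is a hypothetical helper, and the string "pdarray" is just a label, since no server-side array is created here.

```python
from typing import Any, List, Tuple


def prepare_values(values: List[Any], allow_list: bool = False,
                   max_list_size: int = 1000) -> Tuple[str, List[Any]]:
    """Return (storage_kind, values) following the documented rule."""
    if allow_list:
        # Documented error case: allow_list=True with too many values.
        if len(values) > max_list_size:
            raise ValueError(
                f"list of size {len(values)} exceeds max_list_size={max_list_size}"
            )
        return "list", values
    # allow_list=False: the real constructor converts the list to a pdarray.
    return "pdarray", values


print(prepare_values([1, 2, 3])[0])                   # pdarray
print(prepare_values([1, 2, 3], allow_list=True)[0])  # list
```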
- register(user_defined_name)[source]¶
Register this Index object and underlying components with the Arkouda server.
- Parameters:
user_defined_name (str) – User-defined name the Index is to be registered under. This will be the root name for underlying components.
- Returns:
The same Index, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different Indexes with the same name.
- Return type:
Index
- Raises:
TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the Index with the user_defined_name
See also
unregister, attach, is_registered
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- registered_name: str | None = None¶
- save(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', compression: str | None = None, file_format: str = 'HDF5', file_type: str = 'distribute') str [source]¶
DEPRECATED Save the index to HDF5 or Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.
- Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files
file_format (str {'HDF5', 'Parquet'}) – By default, saved files will be written to the HDF5 file format. If ‘Parquet’, the files will be written to the Parquet file format. This is case insensitive.
file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, dataset is written to a single file. When distribute, dataset is written on a file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.
- Return type:
string message indicating result of save operation
- Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
ValueError – Raised if there is an error in parsing the prefix path pointing to file write location or if the mode parameter is neither truncate nor append
TypeError – Raised if any one of the prefix_path, dataset, or mode parameters is not a string. Raised if the Index values are a list.
See also
save_all, load, read, to_parquet, to_hdf
Notes
The prefix_path must be visible to the arkouda server and the user must have write permission.
Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales.
If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
Previously all files saved in Parquet format were saved with a .parquet file extension. This will require you to use load as if you saved the file with the extension. Try this if an older file is not being found.
Any file extension can be used. The file I/O does not rely on the extension to determine the file format.
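Because save is marked deprecated and lists to_hdf and to_parquet under See also, code that still passes a file_format value may want to pick one of those two methods instead. The sketch below shows one way to make that choice. `replacement_method` is a hypothetical helper, treating to_hdf and to_parquet as the replacements is an assumption, and raising ValueError for other formats is a choice made for this sketch.

```python
def replacement_method(file_format: str = "HDF5") -> str:
    """Name the method that matches a save() file_format value."""
    fmt = file_format.lower()  # file_format is documented as case insensitive
    if fmt == "hdf5":
        return "to_hdf"
    if fmt == "parquet":
        return "to_parquet"
    # Assumption: anything else is rejected; the real error type may differ.
    raise ValueError(f"unsupported file_format: {file_format!r}")


print(replacement_method("Parquet"))  # to_parquet
```

A caller could then use `getattr(idx, replacement_method(fmt))(prefix_path, dataset=...)`, where `idx` is an existing Index.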
- set_dtype(dtype)[source]¶
Change the data type of the index
Currently only aku.ip_address and ak.array are supported.
- property shape¶
- to_csv(prefix_path: str, dataset: str = 'index', col_delim: str = ',', overwrite: bool = False)[source]¶
Write Index to CSV file(s). File will contain a single column with the pdarray data. All CSV Files written by Arkouda include a header denoting data types of the columns.
- Parameters:
prefix_path (str) – The filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.
dataset (str) – Column name to save the pdarray under. Defaults to “index”.
col_delim (str) – Defaults to “,”. Value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.
overwrite (bool) – Defaults to False. If True, any existing files matching your provided prefix_path will be overwritten. If False, an error will be returned if existing files are found.
- Returns:
Response message.
- Return type:
str
- Raises:
ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist.
RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is true this may be raised if no values are returned from the server.
TypeError – Raised if we receive an unknown arkouda_type returned from the server. Raised if the Index values are a list.
Notes
CSV format is not currently supported by load/load_all operations.
The column delimiter is expected to be the same for column names and data.
Be sure that column delimiters are not found within your data.
All CSV files must delimit rows using newline (`\n`) at this time.
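Since the column delimiter must not occur in the data, a quick client-side check before calling to_csv can catch problems early. This is a minimal sketch that works on values already available as Python objects. `delimiter_is_safe` is a hypothetical helper and is not part of Arkouda.

```python
from typing import Iterable


def delimiter_is_safe(values: Iterable, col_delim: str = ",") -> bool:
    """Return True when col_delim appears in none of the stringified values."""
    return all(col_delim not in str(v) for v in values)


print(delimiter_is_safe(["red", "blue"]))            # True
print(delimiter_is_safe(["red,blue", "green"]))      # False
print(delimiter_is_safe(["red,blue", "green"], "|")) # True
```

For large server-side data this check would pull values to the client, so it is best suited to small indexes or samples.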
- to_hdf(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', file_type: str = 'distribute') str [source]¶
Save the Index to HDF5. The object can be saved to a collection of files or single file.
- Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, dataset is written to a single file. When distribute, dataset is written on a file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.
- Return type:
string message indicating result of save operation
- Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
TypeError – Raised if the Index values are a list.
Notes
- The prefix_path must be visible to the arkouda server and the user must have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.
- If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to determine the file format.
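The naming rule in these notes can be sketched as a small function that lists the files a write is expected to produce. `expected_hdf_files` is a hypothetical helper. It assumes locale indices run from 0 to num_locales - 1 and are zero-padded to four digits, which is inferred from the `_LOCALE####` pattern used elsewhere on this page and not confirmed here.

```python
from typing import List


def expected_hdf_files(prefix_path: str, num_locales: int,
                       file_type: str = "distribute") -> List[str]:
    """List the output file names implied by the documented naming rule."""
    if file_type == "single":
        # A single-file write uses prefix_path itself as the file name.
        return [prefix_path]
    # Assumptions: indices 0 .. num_locales - 1, zero-padded to 4 digits.
    return [f"{prefix_path}_LOCALE{i:04d}" for i in range(num_locales)]


print(expected_hdf_files("/tmp/idx", 2))
# ['/tmp/idx_LOCALE0000', '/tmp/idx_LOCALE0001']
print(expected_hdf_files("/tmp/idx", 2, file_type="single"))
# ['/tmp/idx']
```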
- to_parquet(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', compression: str | None = None)[source]¶
Save the Index to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.
- Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”) Sets the compression type used with Parquet files
- Return type:
string message indicating result of save operation
- Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
TypeError – Raised if the Index values are a list.
Notes
- The prefix_path must be visible to the arkouda server and the user must have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’.
- ‘append’ write mode is supported, but is not efficient.
- If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to determine the file format.
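The accepted mode and compression values listed for to_parquet can be checked on the client before a request is sent. The sketch below is one possible pre-flight check. `check_parquet_args` and the two constants are hypothetical names, and raising ValueError is a choice made for this sketch, not documented behavior of to_parquet.

```python
from typing import Optional

# Values taken from the parameter descriptions above.
PARQUET_COMPRESSION = (None, "snappy", "gzip", "brotli", "zstd", "lz4")
WRITE_MODES = ("truncate", "append")


def check_parquet_args(mode: str = "truncate",
                       compression: Optional[str] = None) -> None:
    """Raise ValueError if mode or compression is not a documented value."""
    if mode not in WRITE_MODES:
        raise ValueError(f"mode must be one of {WRITE_MODES}, got {mode!r}")
    if compression not in PARQUET_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression!r}")


check_parquet_args("append", "zstd")  # passes silently
```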
- unregister()[source]¶
Unregister this Index object in the arkouda server which was previously registered using register() and/or attached to using attach()
- Raises:
RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister
See also
register, attach, is_registered
Notes
Objects registered with the server are immune to deletion until they are unregistered.
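The register / is_registered / unregister life cycle can be illustrated with a toy in-memory registry. This is only a model of the rules stated in this section (one object per name, an error when unregistering something that is not registered). `ToyRegistry` and the local `RegistrationError` class are stand-ins defined for the sketch and do not talk to an Arkouda server.

```python
class RegistrationError(Exception):
    """Stand-in for Arkouda's RegistrationError, local to this sketch."""


class ToyRegistry:
    """In-memory model of name-based registration."""

    def __init__(self) -> None:
        self._names = {}

    def register(self, obj, name):
        if not isinstance(name, str):
            raise TypeError("user_defined_name must be a str")
        # Two different objects may not share one registered name.
        if name in self._names and self._names[name] is not obj:
            raise RegistrationError(f"name already in use: {name}")
        self._names[name] = obj
        return obj  # returning the object allows call chaining

    def is_registered(self, obj) -> bool:
        return any(o is obj for o in self._names.values())

    def unregister(self, obj) -> None:
        names = [n for n, o in self._names.items() if o is obj]
        if not names:
            raise RegistrationError("object is not registered")
        for n in names:
            del self._names[n]


registry = ToyRegistry()
idx = [1, 2, 3]  # placeholder for an Index
registry.register(idx, "my_index")
print(registry.is_registered(idx))  # True
registry.unregister(idx)
print(registry.is_registered(idx))  # False
```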
- update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True)[source]¶
Overwrite the dataset with the name provided with this Index object. If the dataset does not exist it is added.
- Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files
repack (bool) – Default: True HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to false will yield better performance, but will cause file sizes to expand.
- Return type:
str - success message if successful
- Raises:
RuntimeError – Raised if a server-side error is thrown saving the index
Notes
If the file does not contain a File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.
If the dataset provided does not exist, it will be added.
Because HDF5 deletes do not release memory, this will create a copy of the file with the new data.
- class arkouda.index.MultiIndex(data: list | tuple | pandas.MultiIndex | MultiIndex, name: str | None = None, names: list[str] | None = None)[source]¶
Bases: Index
- property dtype: numpy.dtype¶
Return the dtype object of the underlying data.
- equal_levels(other: MultiIndex) bool [source]¶
Return True if the levels of both MultiIndex objects are the same
- first = True¶
- property index¶
- This is maintained to support older code
- property inferred_type: str¶
Return a string of the type inferred from the values.
- is_registered()[source]¶
Return True iff the object is contained in the registry or is a component of a registered object.
- Returns:
Indicates if the object is contained in the registry
- Return type:
numpy.bool
- Raises:
RegistrationError – Raised if there’s a server-side error or a mismatch of registered components
See also
register, attach, unregister
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- levels: list¶
- memory_usage(unit='B')[source]¶
Return the memory usage of the MultiIndex levels.
- Parameters:
unit (str, default = "B") – Unit to return. One of {‘B’, ‘KB’, ‘MB’, ‘GB’}.
- Returns:
Bytes of memory consumed.
- Return type:
int
See also
arkouda.pdarrayclass.nbytes, arkouda.index.Index.memory_usage, arkouda.series.Series.memory_usage, arkouda.dataframe.DataFrame.memory_usage
Examples
>>> import arkouda as ak
>>> ak.connect()
>>> m = ak.index.MultiIndex([ak.array([1,2,3]),ak.array([4,5,6])])
>>> m.memory_usage()
48
- property name¶
- Return Index or MultiIndex name.
- property names¶
- Return Index or MultiIndex names.
- property ndim¶
- Number of dimensions of the underlying data, by definition 1.
See also
- property nlevels: int¶
Integer number of levels in this MultiIndex.
See also
- objType = 'MultiIndex'¶
Sequence used for indexing and alignment.
The basic object storing axis labels for all DataFrame objects.
- Parameters:
values (List, pdarray, Strings, Categorical, pandas.Categorical, pandas.Index, or Index)
name (str, default=None) – Name to be stored in the index.
allow_list (bool, default=False) – If False, list values will be converted to a pdarray. If True, list values will remain as a list, provided the data length is less than max_list_size.
max_list_size (int, default=1000) – The maximum allowed data length for the values to be stored as a list object.
- Raises:
ValueError – Raised if allow_list=True and the size of values is > max_list_size.
See also
Examples
>>> ak.Index([1, 2, 3])
Index(array([1 2 3]), dtype='int64')
>>> ak.Index(list('abc'))
Index(array(['a', 'b', 'c']), dtype='<U0')
>>> ak.Index([1, 2, 3], allow_list=True)
Index([1, 2, 3], dtype='int64')
- register(user_defined_name)[source]¶
Register this Index object and underlying components with the Arkouda server.
- Parameters:
user_defined_name (str) – User-defined name the Index is to be registered under. This will be the root name for underlying components.
- Returns:
The same Index, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different Indexes with the same name.
- Return type:
MultiIndex
- Raises:
TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the Index with the user_defined_name
See also
unregister, attach, is_registered
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- registered_name: str | None = None¶
- set_dtype(dtype)[source]¶
Change the data type of the index
Currently only aku.ip_address and ak.array are supported.
- to_hdf(prefix_path: str, dataset: str = 'index', mode: str = 'truncate', file_type: str = 'distribute') str [source]¶
Save the Index to HDF5. The object can be saved to a collection of files or single file.
- Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode (str) – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.
file_type (str ("single" | "distribute")) – Default: “distribute”. When set to single, dataset is written to a single file. When distribute, dataset is written on a file per locale. This is only supported by HDF5 files and will have no impact on Parquet files.
- Return type:
string message indicating result of save operation
- Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray.
Notes
- The prefix_path must be visible to the arkouda server and the user must have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.
- If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to determine the file format.
- unregister()[source]¶
Unregister this Index object in the arkouda server which was previously registered using register() and/or attached to using attach()
- Raises:
RegistrationError – If the object is already unregistered or if there is a server error when attempting to unregister
See also
register, attach, is_registered
Notes
Objects registered with the server are immune to deletion until they are unregistered.
- update_hdf(prefix_path: str, dataset: str = 'index', repack: bool = True)[source]¶
Overwrite the dataset with the name provided with this Index object. If the dataset does not exist it is added.
- Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files
repack (bool) – Default: True HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to false will yield better performance, but will cause file sizes to expand.
- Return type:
str - success message if successful
- Raises:
RuntimeError – Raised if a server-side error is thrown saving the index
TypeError – Raised if the Index levels are a list.
Notes
If the file does not contain a File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.
If the dataset provided does not exist, it will be added.
Because HDF5 deletes do not release memory, this will create a copy of the file with the new data.