Datetime is the Arkouda analog to pandas DatetimeIndex and
other timeseries data types.
Parameters:
pda (int64 pdarray, pd.DatetimeIndex, pd.Series, or np.datetime64 array)
unit (str, default 'ns') –
For int64 pdarray, denotes the unit of the input. Ignored for pandas
and numpy arrays, which carry their own unit. Not case-sensitive;
prefixes of full names (like ‘sec’) are accepted.
Possible values:
‘weeks’ or ‘w’
‘days’ or ‘d’
‘hours’ or ‘h’
‘minutes’, ‘m’, or ‘t’
‘seconds’ or ‘s’
‘milliseconds’, ‘ms’, or ‘l’
‘microseconds’, ‘us’, or ‘u’
‘nanoseconds’, ‘ns’, or ‘n’
Unlike in pandas, units cannot be combined or mixed with integers.
Notes
The .values attribute is always in nanoseconds with int64 dtype.
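The unit handling above amounts to scaling integers into nanoseconds. The sketch below uses plain NumPy rather than Arkouda, so it runs without a server; the NS_PER_UNIT table and to_ns helper are hypothetical and only illustrate that an int64 input in a given unit corresponds to int64 nanoseconds, which is what .values holds.

```python
import numpy as np

# Hypothetical table of nanoseconds per unit, keyed by the short names
# listed above; it is not Arkouda's internal lookup.
NS_PER_UNIT = {
    "w": 7 * 24 * 60 * 60 * 10**9,
    "d": 24 * 60 * 60 * 10**9,
    "h": 60 * 60 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}


def to_ns(values, unit="ns"):
    """Scale int64 values given in `unit` to int64 nanoseconds (case-insensitive unit)."""
    return np.asarray(values, dtype=np.int64) * NS_PER_UNIT[unit.lower()]


# NumPy applies the same scaling when casting datetime64[s] to datetime64[ns].
secs = np.array([0, 1, 60], dtype="datetime64[s]")
ns_view = secs.astype("datetime64[ns]").view(np.int64)
print(to_ns([0, 1, 60], unit="s"), ns_view)
```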
Register this Datetime object and underlying components with the Arkouda server.
Parameters:
user_defined_name (str) – user defined name the Datetime is to be registered under,
this will be the root name for underlying components
Returns:
The same Datetime which is now registered with the arkouda server and has an updated name.
This is an in-place modification; the original is returned to support
a fluent programming style.
Please note you cannot register two different Datetimes with the same name.
Return sum of array elements along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
numeric_scalars if axis is omitted, in which case the operation is done over the entire array
pdarray if axis is supplied, in which case the operation is done along that axis
The bool type is not a subclass of the int_ type
(bool is not even a numeric type). This is different
from Python’s default implementation of bool as a
subclass of int.
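Assuming these parameters follow NumPy's reduction conventions (their names and defaults match numpy.sum), the NumPy sketch below shows what axis and keepdims do. It also checks the bool note above against NumPy's bool_, which that note appears to describe.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

total = np.sum(a)                             # axis=None: one scalar for the whole array
col_sums = np.sum(a, axis=0)                  # reduce over rows: one value per column
row_sums = np.sum(a, axis=1, keepdims=True)   # reduced axis kept as length 1: shape (2, 1)

# NumPy's bool_ is neither an integer type nor a numeric type...
print(issubclass(np.bool_, np.integer), issubclass(np.bool_, np.number))
# ...whereas Python's built-in bool is a subclass of int.
print(issubclass(bool, int))
```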
D.update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
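This is the standard Python dict.update behavior. A short run shows both forms of E (a mapping with .keys(), and an iterable of pairs) followed by the keyword arguments F:

```python
d = {"a": 1}

# E has .keys(): copy each key from the mapping, then apply keyword F.
d.update({"b": 2}, c=3)

# E lacks .keys(): treat it as an iterable of (key, value) pairs.
d.update([("a", 10), ("d", 4)])

print(d)
```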
Exception raised when duplicate values are found in a set of keys that are expected to be unique.
This is typically raised in lookup and alignment operations that assume
a one-to-one mapping between keys and values.
Examples
>>> from arkouda.numpy.alignment import NonUniqueError
>>> raise NonUniqueError("Duplicate values found in key array.")
Traceback (most recent call last):
...
arkouda.numpy.alignment.NonUniqueError: Duplicate values found in key array.
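A typical trigger is a key array that is expected to be unique. The sketch below uses NumPy and a local stand-in exception class (hypothetical, not imported from Arkouda) to show that kind of check; Arkouda's own alignment code may detect duplicates differently.

```python
import numpy as np


class NonUniqueError(Exception):
    """Local stand-in for arkouda.numpy.alignment.NonUniqueError."""


def require_unique(keys):
    """Raise NonUniqueError if `keys` contains any repeated value."""
    keys = np.asarray(keys)
    uniq, counts = np.unique(keys, return_counts=True)
    if uniq.size != keys.size:
        dupes = uniq[counts > 1].tolist()
        raise NonUniqueError(f"Duplicate values found in key array: {dupes}")
    return keys


require_unique([3, 1, 2])  # all distinct: no error
try:
    require_unique([3, 1, 3, 2, 1])
except NonUniqueError as err:
    message = str(err)
```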
Append other to self, either vertically (axis=0, length of resulting SegArray
increases), or horizontally (axis=1, each sub-array of other appends to the
corresponding sub-array of self).
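To make the two axes concrete, this sketch models a SegArray as a plain Python list of lists. The helper append_segments is hypothetical and only illustrates the described result shapes, not how Arkouda stores segments; the equal-count check for axis=1 is an assumption of the sketch.

```python
def append_segments(self_segs, other_segs, axis=0):
    """Model the vertical/horizontal append described above on lists of lists."""
    if axis == 0:
        # Vertical: other's sub-arrays become additional sub-arrays.
        return self_segs + other_segs
    if axis == 1:
        # This sketch requires one sub-array of other per sub-array of self.
        if len(self_segs) != len(other_segs):
            raise ValueError("axis=1 needs the same number of sub-arrays")
        # Horizontal: each sub-array of other extends the matching one in self.
        return [a + b for a, b in zip(self_segs, other_segs)]
    raise ValueError("axis must be 0 or 1")


left = [[1, 2], [3]]
right = [[4], [5, 6]]
vertical = append_segments(left, right, axis=0)
horizontal = append_segments(left, right, axis=1)
```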
Select the j-th element of each sub-array, where possible.
Parameters:
j (int) – The index of the value to get from each sub-array. If j is negative,
it counts backwards from the end of each sub-array.
return_origins (bool) – If True, return a logical index indicating where j is in bounds
compressed (bool) – If False, the returned array is the same size as self, with the default value
where j is out of bounds. If True, the returned array only contains
values where j is in bounds.
default (scalar) – When compressed=False, the value to return when j is out of bounds
for the sub-array
Returns:
val : pdarray
compressed=False: The j-th value of each sub-array where j is in
bounds and the default value where j is out of bounds.
compressed=True: The j-th values of only the sub-arrays where j is
in bounds
origin_indices : pdarray, bool
A Boolean array that is True where j is in bounds for the sub-array.
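The selection rules above can be modeled on a list of lists. In this hypothetical helper, jth_element, an index is in bounds when it addresses the sub-array from the front or, if negative, from the back:

```python
def jth_element(segments, j, compressed=False, default=0):
    """Model the j-th-element selection described above on a list of lists."""
    # In bounds if j indexes the sub-array from the front (j >= 0) or back (j < 0).
    origins = [-len(seg) <= j < len(seg) for seg in segments]
    if compressed:
        vals = [seg[j] for seg, ok in zip(segments, origins) if ok]
    else:
        vals = [seg[j] if ok else default for seg, ok in zip(segments, origins)]
    return vals, origins


segs = [[10, 11, 12], [20], [30, 31]]
vals, origins = jth_element(segs, 1)
```

With compressed=False the output keeps one slot per sub-array, so it can be aligned with other per-sub-array data; compressed=True is the natural choice when only the valid values matter.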
Return all sub-arrays of length n, as a list of columns.
Parameters:
n (int) – Length of sub-arrays to select
return_origins (bool) – Return a logical index indicating which sub-arrays are length n
Returns:
columns : list of pdarray
An n-long list of pdarray, where each row is one of the n-long
sub-arrays from the SegArray. The number of rows is the number of
True values in the returned mask.
origin_indices : pdarray, bool
Array of bool for each element of the SegArray, True where sub-array
has length n.
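The "list of columns" layout can be confusing, so here is a pure-Python model with a hypothetical helper, length_n_columns: the selected sub-arrays form rows, and column k collects element k of every selected row.

```python
def length_n_columns(segments, n):
    """Model the length-n selection described above: n columns plus a mask."""
    mask = [len(seg) == n for seg in segments]
    rows = [seg for seg, keep in zip(segments, mask) if keep]
    # Column k holds element k of every selected sub-array.
    columns = [[row[k] for row in rows] for k in range(n)]
    return columns, mask


segs = [[1, 2], [3, 4, 5], [6, 7], [8]]
columns, mask = length_n_columns(segs, 2)
```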
Return all sub-array prefixes of length n (when proper=True, only for sub-arrays that are at least n+1 long).
Parameters:
n (int) – Length of prefix
return_origins (bool) – If True, return a logical index indicating which sub-arrays
were long enough to return an n-prefix
proper (bool) – If True, only return proper prefixes, i.e. from sub-arrays
that are at least n+1 long. If False, allow the entire
sub-array to be returned as a prefix.
Returns:
prefixes : list of pdarray
An n-long list of pdarrays, essentially a table where each row is an n-prefix.
The number of rows is the number of True values in the returned mask.
origin_indices : pdarray, bool
Boolean array that is True where the sub-array was long enough to return
an n-prefix, False otherwise.
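The effect of proper can be seen in a small list-of-lists model. The helper prefix_columns is hypothetical; it returns the prefixes as n columns plus the origin mask, mirroring the layout described above.

```python
def prefix_columns(segments, n, proper=True):
    """Model the n-prefix selection described above on a list of lists."""
    # Proper prefixes need at least n+1 elements; otherwise n is enough.
    min_len = n + 1 if proper else n
    mask = [len(seg) >= min_len for seg in segments]
    rows = [seg[:n] for seg, keep in zip(segments, mask) if keep]
    columns = [[row[k] for row in rows] for k in range(n)]
    return columns, mask


segs = [[1, 2, 3], [4, 5], [6]]
strict = prefix_columns(segs, 2)
loose = prefix_columns(segs, 2, proper=False)
```

The suffix variant differs only in slicing seg[-n:] instead of seg[:n].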
Return the n-long suffix of each sub-array, where possible.
Parameters:
n (int) – Length of suffix
return_origins (bool) – If True, return a logical index indicating which sub-arrays
were long enough to return an n-suffix
proper (bool) – If True, only return proper suffixes, i.e. from sub-arrays
that are at least n+1 long. If False, allow the entire
sub-array to be returned as a suffix.
Returns:
suffixes : list of pdarray
An n-long list of pdarrays, essentially a table where each row is an n-suffix.
The number of rows is the number of True values in the returned mask.
origin_indices : pdarray, bool
Boolean array that is True where the sub-array was long enough to return
an n-suffix, False otherwise.
Register this SegArray object and underlying components with the Arkouda server.
Parameters:
user_defined_name (str) – user defined name which this SegArray object will be registered under
Returns:
The same SegArray which is now registered with the arkouda server and has an updated name.
This is an in-place modification; the original is returned to support
a fluent programming style.
Please note you cannot register two different SegArrays with the same name.
Save the SegArray to HDF5. By default (file_type=“distribute”), the result is a collection of HDF5 files,
one file per locale of the arkouda server, where each filename starts with prefix_path.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files will share
dataset (str) – Name prefix for saved data within the HDF5 file
mode ({'truncate', 'append'}) – By default, truncate (overwrite) output files, if they exist.
If ‘append’, add data as a new column to existing files.
file_type ({"single", "distribute"}) – Default: “distribute”
When set to “single”, the dataset is written to a single file.
When set to “distribute”, the dataset is written to one file per locale.
This option applies only to HDF5 files and has no effect on Parquet files.
Save the SegArray object to Parquet. The result is a collection of files,
one file per locale of the arkouda server, where each filename starts
with prefix_path. Each locale saves its chunk of the object to its
corresponding file.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode ({'truncate', 'append'}) – Deprecated.
Parameter kept to maintain functionality of other calls. Only Truncate
supported.
By default, truncate (overwrite) output files, if they exist.
If ‘append’, attempt to create new dataset in existing files.
compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”)
Sets the compression type used with Parquet files
Returns:
String message indicating the result of the save operation
Return type:
str
Raises:
RuntimeError – Raised if a server-side error is thrown saving the SegArray
ValueError – If write mode is not Truncate.
Notes
Append mode for Parquet has been deprecated. It was not implemented for SegArray.
The prefix_path must be visible to the arkouda server and the user must
have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i>
ranges from 0 to numLocales - 1 for file_type=‘distribute’.
- If any of the output files already exist and
the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’
and the number of output files is less than the number of locales or a
dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to
determine the file format.
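A sketch of the per-locale naming scheme, assuming the four-digit zero-padded locale number suggested by the _LOCALE#### pattern used elsewhere in these docs; the helper distributed_filenames is hypothetical.

```python
def distributed_filenames(prefix_path, num_locales):
    """One output file per locale, numbered 0 .. num_locales - 1 (assumed 4-digit padding)."""
    return [f"{prefix_path}_LOCALE{i:04d}" for i in range(num_locales)]


names = distributed_filenames("/data/segs", 4)
```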
Send a Segmented Array to a different Arkouda server.
Parameters:
hostname (str) – The hostname where the Arkouda server intended to
receive the Segmented Array is running.
port (int_scalars) – The port to send the array over. This needs to be an
open port (i.e., not one that the Arkouda server is
running on). This will open up numLocales ports
in succession, so it will use ports in the
range {port..(port+numLocales-1)} (e.g., for an
Arkouda server running on 4 nodes with 1234 passed as
port, Arkouda will use ports 1234, 1235, 1236,
and 1237 to send the array data).
This port must match the port passed to the call to
ak.receive_array().
Returns:
A message indicating a complete transfer
Return type:
str
Raises:
ValueError – Raised if the op is not within the pdarray.BinOps set
TypeError – Raised if other is not a pdarray or the pdarray.dtype is not
a supported dtype
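The port arithmetic in the example above can be written out as a one-line range. The helper transfer_ports is hypothetical; it only reproduces the documented rule that each locale uses the next port in succession.

```python
def transfer_ports(port, num_locales):
    """Ports used when each of num_locales locales opens one port in succession."""
    return list(range(port, port + num_locales))


ports = transfer_ports(1234, 4)
```

The highest port used is therefore port + numLocales - 1, and every port in that range needs to be available.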
Overwrite the dataset with the provided name using this SegArray object. If
the dataset does not exist, it is added.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files
repack (bool) – Default: True
HDF5 does not release memory on delete. When True, the inaccessible
data (that was overwritten) is removed. When False, the data remains, but is
inaccessible. Setting to false will yield better performance, but will cause
file sizes to expand.
Raises:
RuntimeError – Raised if a server-side error is thrown saving the SegArray
Notes
If file does not contain File_Format attribute to indicate how it was saved,
the file name is checked for _LOCALE#### to determine if it is distributed.
If the dataset provided does not exist, it will be added.
Because HDF5 deletes do not release memory, this will create a copy of the
file with the new data.
Represents an array of strings whose data resides on the
arkouda server. The user should not call this class directly;
rather its instances are created by other arkouda functions.
Strings is composed of two pdarrays: (1) offsets, which contains the
starting indices for each string and (2) bytes, which contains the
raw bytes of all strings, delimited by nulls.
>>> import arkouda as ak
>>> strings = ak.array([f'StrINgS aRe Here {i}' for i in range(5)])
>>> strings
array(['StrINgS aRe Here 0', 'StrINgS aRe Here 1', 'StrINgS aRe Here 2', 'StrINgS aRe Here 3', 'StrINgS aRe Here 4'])
>>> strings.title()
array(['Strings Are Here 0', 'Strings Are Here 1', 'Strings Are Here 2', 'Strings Are Here 3', 'Strings Are Here 4'])
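The two-array layout (offsets plus null-delimited bytes) can be mimicked in NumPy. This is a sketch based only on the description above; the helpers to_segmented and from_segmented are hypothetical, and the server's actual encoding details may differ.

```python
import numpy as np


def to_segmented(strs):
    """Encode strings as (offsets, bytes): null-terminated UTF-8 plus start indices."""
    encoded = [s.encode("utf-8") + b"\x00" for s in strs]
    lengths = np.array([len(e) for e in encoded], dtype=np.int64)
    # Each string starts where the previous one (including its null) ended.
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1])).astype(np.int64)
    raw = np.frombuffer(b"".join(encoded), dtype=np.uint8)
    return offsets, raw


def from_segmented(offsets, raw):
    """Decode by slicing from each offset up to, but excluding, its null byte."""
    ends = np.append(offsets[1:], raw.size)
    return [raw[o:e - 1].tobytes().decode("utf-8") for o, e in zip(offsets, ends)]


offsets, raw = to_segmented(["ab", "", "xyz"])
```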
Check whether each element contains the given substring.
Parameters:
substr (bytes or str_scalars) – The substring in the form of string or byte array to search for
regex (bool, default=False) – Indicates whether substr is a regular expression
Note: only handles regular expressions supported by re2
(does not support lookaheads/lookbehinds)
Returns:
True for elements that contain substr, False otherwise
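Element-wise, this check behaves like Python's in operator (regex=False) or re.search (regex=True). The hypothetical helper contains_each below uses Python's re module as a stand-in; unlike re2, re does support lookarounds, so it accepts some patterns Arkouda would reject.

```python
import re


def contains_each(strings, substr, regex=False):
    """Model the element-wise substring/regex check on a list of Python strings."""
    if regex:
        pattern = re.compile(substr)
        return [pattern.search(s) is not None for s in strings]
    return [substr in s for s in strings]


words = ["apple pie", "banana", "cherry tart"]
literal = contains_each(words, "a.")              # "a." as plain text
pattern = contains_each(words, "a.", regex=True)  # "a" followed by any character
```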
Check whether each element ends with the given substring.
Parameters:
substr (bytes or str_scalars) – The suffix to search for
regex (bool, default=False) – Indicates whether substr is a regular expression
Note: only handles regular expressions supported by re2
(does not support lookaheads/lookbehinds)
Returns:
True for elements that end with substr, False otherwise
Assemble a Strings object from separate offset and bytes arrays.
This factory method constructs a segmented Strings array by sending two
separate components—offsets and values—to the Arkouda server and instructing
it to assemble them into a single Strings object. Use this when offsets
and byte data are created or transported independently.
Parameters:
offset_attrib (pdarray or str) – The array of starting positions for each string, or a string
expression that can be passed to create_pdarray to build it.
bytes_attrib (pdarray or str) – The array of raw byte values (e.g., uint8 character codes), or a string
expression that can be passed to create_pdarray to build it.
Returns:
A Strings object representing the assembled segmented strings array
on the Arkouda server.
Create a Strings object from an Arkouda server response message.
Parse the server’s response descriptor and construct a Strings array
with its underlying pdarray and total byte size.
Parameters:
rep_msg (str) – Server response message of the form:
`created <name> <type> <size> <ndim> <shape> <itemsize>+... bytes.size <total_bytes>`
For example:
`"created foo Strings 3 1 (3,) 8+created bytes.size 24"`
Returns:
A Strings object representing the segmented strings array on the server,
initialized with the returned pdarray and byte-size metadata.
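The descriptor can be unpacked field by field. The parser below is hypothetical and not Arkouda's implementation; it assumes the fields are separated by whitespace and that the byte count is the last token after the "+".

```python
def parse_strings_msg(rep_msg):
    """Hypothetical parser for the 'created ...+created bytes.size N' descriptor."""
    head, tail = rep_msg.split("+", 1)
    fields = head.split()
    # fields: ["created", name, type, size, ndim, shape..., itemsize]
    return {
        "name": fields[1],
        "type": fields[2],
        "size": int(fields[3]),
        "ndim": int(fields[4]),
        "shape": " ".join(fields[5:-1]),
        "itemsize": int(fields[-1]),
        "total_bytes": int(tail.split()[-1]),
    }


info = parse_strings_msg("created foo Strings 3 1 (3,) 8+created bytes.size 24")
```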
return_origins (bool, default=True) – If True, return a logical index indicating which strings
were long enough to return an n-prefix
proper (bool, default=True) – If True, only return proper prefixes, i.e. from strings
that are at least n+1 long. If False, allow the entire
string to be returned as a prefix.
Returns:
prefixes : Strings
The array of n-character prefixes; the number of elements is the number of
True values in the returned mask.
origin_indices : pdarray, bool
Boolean array that is True where the string was long enough to return
an n-character prefix, False otherwise.
return_origins (bool, default=True) – If True, return a logical index indicating which strings
were long enough to return an n-suffix
proper (bool, default=True) – If True, only return proper suffixes, i.e. from strings
that are at least n+1 long. If False, allow the entire
string to be returned as a suffix.
Returns:
suffixes : Strings
The array of n-character suffixes; the number of elements is the number of
True values in the returned mask.
origin_indices : pdarray, bool
Boolean array that is True where the string was long enough to return
an n-character suffix, False otherwise.
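For strings, the proper flag decides whether a string of exactly n characters counts. The hypothetical helper suffixes_of models this on Python strings (it assumes n >= 1 and counts Python characters):

```python
def suffixes_of(strings, n, proper=True):
    """Model n-character suffix selection on a list of Python strings."""
    # Proper suffixes need at least n+1 characters; otherwise n is enough.
    min_len = n + 1 if proper else n
    mask = [len(s) >= min_len for s in strings]
    suffixes = [s[-n:] for s, keep in zip(strings, mask) if keep]
    return suffixes, mask


strict = suffixes_of(["abcd", "abc", "ab"], 3)
loose = suffixes_of(["abcd", "abc", "ab"], 3, proper=False)
```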
Return the permutation that groups the array, placing equivalent
strings together. All instances of the same string are guaranteed to lie
in one contiguous block of the permuted array, but the blocks are not
necessarily ordered.
If the arkouda server is compiled with “-sSegmentedString.useHash=true”,
then arkouda uses 128-bit hash values to group strings, rather than sorting
the strings directly. This method is fast, but the resulting permutation
merely groups equivalent strings and does not sort them. If the “useHash”
parameter is false, then a full sort is performed.
Raises:
RuntimeError – Raised if there is a server-side error in executing group request or
creating the pdarray encapsulating the return message
The implementation uses SipHash128, a fast and balanced hash function (a member
of the SipHash family, which Python uses for dictionaries and sets). For realistic numbers of strings (up
to about 10**15), the probability of a collision between two 128-bit hash
values is negligible.
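The difference between hash-based grouping and a full sort can be sketched in pure Python. Here a dict (which hashes keys internally, though not with SipHash128) stands in for the server's hash grouping; the helper names are hypothetical. collision_bound applies the standard union (birthday) bound n(n-1)/2 / 2^128, which for n = 10**15 comes to about 1.5e-9.

```python
def group_permutation(strings):
    """Hash-based grouping: equal strings become contiguous, blocks are unsorted."""
    buckets = {}  # dict hashing stands in for SipHash128 here
    for i, s in enumerate(strings):
        buckets.setdefault(s, []).append(i)
    return [i for idxs in buckets.values() for i in idxs]


def sort_permutation(strings):
    """Sort-based alternative: groups and also orders the strings."""
    return sorted(range(len(strings)), key=strings.__getitem__)


def collision_bound(n, bits=128):
    """Upper bound on P(any collision) among n uniformly random `bits`-bit hashes."""
    return n * (n - 1) / 2 / 2**bits


words = ["b", "a", "b", "c", "a"]
grouped = [words[i] for i in group_permutation(words)]  # blocks in first-seen order
ordered = [words[i] for i in sort_permutation(words)]
```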
Return a boolean pdarray where index i indicates whether string i of the
Strings is alphabetic. This means there is at least one character,
and all the characters are alphabetic.
Returns:
True for elements that are alphabetic, False otherwise
Return a boolean pdarray where index i indicates whether string i of the
Strings has all numeric characters. There are 1922 Unicode characters that
qualify as numeric, including the digits 0 through 9, superscripts and
subscripted digits, special characters with the digits encircled or
enclosed in parens, “vulgar fractions,” and more.
Returns:
True for elements that are numerics, False otherwise
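Python's str.isalpha and str.isnumeric use comparable Unicode-based rules, so the sketch below uses them to illustrate which strings qualify. This assumes Arkouda's classification agrees with Python's; the exact set of numeric characters depends on the Unicode version, so the 1922 figure may not match a given Python build.

```python
# Escapes used for clarity: "\u00bd" is "½", "\u00b2" is "²", "\u2460" is "①".
samples = ["abc", "", "ab1", "123", "\u00bd", "\u00b2", "\u2460", "1.5"]

alpha = [s.isalpha() for s in samples]      # needs >= 1 char, all alphabetic
numeric = [s.isnumeric() for s in samples]  # needs >= 1 char, all numeric

for s, a, n in zip(samples, alpha, numeric):
    print(repr(s), a, n)
```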
Join the strings from another array onto the left of the strings
of this array, optionally inserting a delimiter.
Warning: This function is experimental and not guaranteed to work.
Parameters:
other (Strings) – The strings to join onto self’s strings
delimiter (bytes or str_scalars, default="") – String inserted between self and other
Peel off one or more delimited fields from each string (similar
to string.partition), returning two new arrays of strings.
Warning: This function is experimental and not guaranteed to work.
Parameters:
delimiter (bytes or str_scalars) – The separator where the split will occur
times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over
the first (times-1) delimiters
includeDelimiter (bool, default=False) – If true, append the delimiter to the end of the first return
array. By default, it is prepended to the beginning of the
second return array.
keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of
the delimiter will be returned in the first array. By default,
such strings are returned in the second array.
fromRight (bool, default=False) – If true, peel from the right instead of the left (see also rpeel)
regex (bool, default=False) – Indicates whether delimiter is a regular expression
Note: only handles regular expressions supported by re2
(does not support lookaheads/lookbehinds)
Returns:
left: Strings
The field(s) peeled from the start of each string (unless
fromRight is true)
right: Strings
The remainder of each string after peeling (unless fromRight
is true)
Raises:
TypeError – Raised if the delimiter parameter is not byte or str_scalars, if
times is not int64, or if includeDelimiter, keepPartial, or
fromRight is not bool
ValueError – Raised if times is < 1 or if delimiter is not a valid regex
RuntimeError – Raised if there is a server-side error thrown
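A pure-Python sketch of the peel semantics, using a hypothetical helper peel_fields. It drops the delimiter from both outputs, omits includeDelimiter and regex, and places strings with too few delimiters according to the keepPartial entries for peel and rpeel; Arkouda's own results may differ in edge cases.

```python
def peel_fields(s, delimiter, times=1, keep_partial=False, from_right=False):
    """Split one string around its times-th delimiter (from the left or right)."""
    parts = s.split(delimiter)
    if len(parts) - 1 < times:
        # Too few delimiters: from the left, the whole string goes to the second
        # output unless keep_partial; from the right, the reverse.
        whole_goes_first = keep_partial != from_right
        return (s, "") if whole_goes_first else ("", s)
    cut = len(parts) - times if from_right else times
    return delimiter.join(parts[:cut]), delimiter.join(parts[cut:])


pairs = [peel_fields(s, ".") for s in ["a.b", "c.d", "e.f.g"]]
left, right = [p[0] for p in pairs], [p[1] for p in pairs]
```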
Register this Strings object with a user defined name in the arkouda server
so it can be attached to later using Strings.attach().
This is an in-place operation; registering a Strings object more than once will
update the name in the registry and remove the previously registered name.
A name can only be registered to one object at a time.
Parameters:
user_defined_name (str) – user defined name which the Strings object is to be registered under
Returns:
The same Strings object which is now registered with the arkouda server and
has an updated name.
This is an in-place modification; the original is returned to support a
fluent programming style.
Please note you cannot register two different objects with the same name.
Raises:
TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the Strings object with the user_defined_name
If the user is attempting to register more than one object with the same name,
the former should be unregistered first to free up the registration name.
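The naming rules above can be modeled with a toy in-memory registry. Everything here (ToyRegistry and the local RegistrationError stand-in) is hypothetical and runs without a server; it only mirrors the documented rules: a name maps to one object, registering an object again moves it to the new name, and a taken name must be unregistered before reuse.

```python
class RegistrationError(Exception):
    """Local stand-in for the registration failure described above."""


class ToyRegistry:
    """Hypothetical model of the registration rules described above."""

    def __init__(self):
        self.names = {}  # registered name -> object

    def register(self, obj, name):
        owner = self.names.get(name)
        if owner is not None and owner is not obj:
            raise RegistrationError(f"{name!r} is already registered")
        # Registering again releases the object's previous name.
        for old_name in [n for n, o in self.names.items() if o is obj]:
            del self.names[old_name]
        self.names[name] = obj
        return obj  # returning the object enables call chaining

    def unregister(self, name):
        self.names.pop(name, None)


reg = ToyRegistry()
a, b = object(), object()
reg.register(a, "first")
reg.register(a, "second")       # "first" is released
try:
    reg.register(b, "second")   # name already held by a
    clash = False
except RegistrationError:
    clash = True
reg.unregister("second")
reg.register(b, "second")       # succeeds once the name is free
```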
Peel off one or more delimited fields from the end of each string
(similar to string.rpartition), returning two new arrays of strings.
Warning: This function is experimental and not guaranteed to work.
Parameters:
delimiter (bytes or str_scalars) – The separator where the split will occur
times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over
the last (times-1) delimiters
includeDelimiter (bool, default=False) – If true, prepend the delimiter to the start of the first return
array. By default, it is appended to the end of the
second return array.
keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of
the delimiter will be returned in the second array. By default,
such strings are returned in the first array.
regex (bool, default=False) – Indicates whether delimiter is a regular expression
Note: only handles regular expressions supported by re2
(does not support lookaheads/lookbehinds)
Returns:
left: Strings
The remainder of the string after peeling
right: Strings
The field(s) that were peeled from the right of each string
Return a match object with the first location in each element where pattern produces a match.
Elements match if any part of the string matches the regular expression pattern.
Parameters:
pattern (bytes or str_scalars) – Regex used to find matches
Returns:
Match object where elements match if any part of the string matches the
regular expression pattern
Unpack delimiter-joined substrings into a flat array.
Parameters:
delimiter (str) – Characters used to split strings into substrings
return_segments (bool, default=False) – If True, also return mapping of original strings to first substring
in return array.
regex (bool, default=False) – Indicates whether delimiter is a regular expression
Note: only handles regular expressions supported by re2
(does not support lookaheads/lookbehinds)
Returns:
Strings
Flattened substrings with delimiters removed
pdarray, int64 (optional)
For each original string, the index of first corresponding substring
in the return array
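The flattening and the optional segments output can be modeled in pure Python. The helper split_flat is hypothetical, and Python's re stands in for re2 when regex=True.

```python
import re


def split_flat(strings, delimiter, return_segments=False, regex=False):
    """Split every string and concatenate the pieces into one flat list."""
    flat, segments = [], []
    for s in strings:
        segments.append(len(flat))  # index of this string's first piece
        pieces = re.split(delimiter, s) if regex else s.split(delimiter)
        flat.extend(pieces)
    return (flat, segments) if return_segments else flat


flat, segs = split_flat(["a.b", "c", "d.e.f"], ".", return_segments=True)
```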
Check whether each element starts with the given substring.
Parameters:
substr (bytes or str_scalars) – The prefix to search for
regex (bool, default=False) – Indicates whether substr is a regular expression
Note: only handles regular expressions supported by re2
(does not support lookaheads/lookbehinds)
Returns:
True for elements that start with substr, False otherwise
Join the strings from another array onto one end of the strings
of this array, optionally inserting a delimiter.
Warning: This function is experimental and not guaranteed to work.
Parameters:
other (Strings) – The strings to join onto self’s strings
delimiter (bytes or str_scalars, default="") – String inserted between self and other
toLeft (bool, default=False) – If true, join other strings to the left of self. By default,
other is joined to the right of self.
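Element-wise, the join is plain concatenation with the delimiter in between; to_left=True corresponds to the lstick description. The helper join_arrays is hypothetical, and its equal-length check is an assumption of the sketch.

```python
def join_arrays(strings, others, delimiter="", to_left=False):
    """Concatenate matching elements of two equal-length lists of strings."""
    if len(strings) != len(others):
        raise ValueError("both arrays must have the same length")
    if to_left:
        return [o + delimiter + s for s, o in zip(strings, others)]
    return [s + delimiter + o for s, o in zip(strings, others)]


right_joined = join_arrays(["a", "b"], ["x", "y"], "-")
left_joined = join_arrays(["a", "b"], ["x", "y"], "-", to_left=True)
```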
Return a new Strings object with all leading and trailing occurrences of characters contained
in chars removed. The chars argument is a string specifying the set of characters to be removed.
If omitted, the chars argument defaults to removing whitespace. The chars argument is not a
prefix or suffix; rather, all combinations of its values are stripped.
Parameters:
chars (bytes or str_scalars, optional) – the set of characters to be removed
Returns:
Strings object with the leading and trailing characters matching the set of characters in
the chars argument removed
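Python's str.strip follows the same character-set rule, so it is a quick way to check what a given chars argument removes (this assumes Arkouda's strip matches Python's here, as the description suggests):

```python
examples = {
    "default": "  hello  ".strip(),                # whitespace removed from both ends
    "charset": "www.example.com".strip("cmowz."),  # any mix of these chars is stripped
    "inner_kept": "xxaxbxx".strip("x"),            # interior occurrences are untouched
}
print(examples)
```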
Return new Strings obtained by replacing non-overlapping occurrences of pattern with the
replacement repl.
If count is nonzero, at most count substitutions occur.
Parameters:
pattern (bytes or str_scalars) – The regex to substitute
repl (bytes or str_scalars) – The substring to replace pattern matches with
count (int, default=0) – The max number of pattern match occurrences in each element to replace.
The default count=0 replaces all occurrences of pattern with repl
Perform the same operation as sub(), but return a tuple (new_Strings, number_of_substitutions).
Parameters:
pattern (bytes or str_scalars) – The regex to substitute
repl (bytes or str_scalars) – The substring to replace pattern matches with
count (int, default=0) – The max number of pattern match occurrences in each element to replace.
The default count=0 replaces all occurrences of pattern with repl
Returns:
Strings
Strings with pattern matches replaced
pdarray, int64
The number of substitutions made for each element of Strings
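Per element, this corresponds to Python's re.subn. The hypothetical helper sub_counts applies it to a list of strings to show how count limits substitutions and what the per-element counts look like; Python's re engine stands in for Arkouda's re2-based one.

```python
import re


def sub_counts(strings, pattern, repl, count=0):
    """Apply re.subn element-wise; return new strings and per-element counts."""
    results = [re.subn(pattern, repl, s, count=count) for s in strings]
    return [r[0] for r in results], [r[1] for r in results]


new_all, n_all = sub_counts(["a1b2", "c3", "d"], r"\d", "#")
new_one, n_one = sub_counts(["a1b2", "c3", "d"], r"\d", "#", count=1)
```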
When axis is not None, this function does the same thing as “fancy” indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements along a given axis.
A call such as np.take(arr, indices, axis=3) is equivalent to arr[:, :, :, indices, ...].
Parameters:
indices (numeric_scalars or pdarray) – The indices of the values to extract. Also allow scalars for indices.
axis (int, optional) – The axis over which to select values. By default, the flattened input array is used.
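The equivalence can be checked directly with NumPy's np.take, which the sentence above cites; the last line shows the default axis=None, which indexes the flattened array.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
indices = [0, 2]

taken = np.take(arr, indices, axis=2)  # select positions 0 and 2 along the last axis
fancy = arr[:, :, indices]             # the same selection via fancy indexing
flat = np.take(arr, [0, 5, 23])        # axis=None: indices refer to arr.ravel()
```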
Write Strings to CSV file(s). File will contain a single column with the Strings data.
All CSV Files written by Arkouda include a header denoting data types of the columns.
Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing
bytes as uint(8).
Parameters:
prefix_path (str) – The filename prefix to be used for saving files. Files will have _LOCALE#### appended
when they are written to disk.
dataset (str, default="strings_array") – Column name to save the Strings under. Defaults to “strings_array”.
col_delim (str, default=",") – Defaults to “,”. Value to be used to separate columns within the file.
Please be sure that the value used DOES NOT appear in your dataset.
overwrite (bool, default=False) – Defaults to False. If True, any existing files matching your provided prefix_path will
be overwritten. If False, an error will be returned if existing files are found.
Returns:
response message
Return type:
str
Raises:
ValueError – Raised if all datasets are not present in all parquet files or if one or
more of the specified files do not exist
RuntimeError – Raised if one or more of the specified files cannot be opened.
If allow_errors is true this may be raised if no values are returned
from the server.
TypeError – Raised if we receive an unknown arkouda_type returned from the server
Notes
CSV format is not currently supported by load/load_all operations.
The column delimiter is expected to be the same for column names and data.
Be sure that column delimiters are not found within your data.
All CSV files must delimit rows using newline (\n) at this time.
Save the Strings object to HDF5.
The object can be saved to a collection of files or single file.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array
mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist.
If ‘append’, create a new Strings dataset within existing files.
save_offsets (bool, default=True) – Defaults to True which will instruct the server to save the offsets array to HDF5
If False, the offsets array will not be saved and will be derived from the string values
upon load/read.
file_type ({"single", "distribute"}, default = "distribute") – Default: Distribute
Distribute the dataset over a file per locale.
Single file will save the dataset to one file
Returns:
String message indicating result of save operation
Return type:
str
Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
Notes
Parquet files do not store the segments, only the values.
Strings state is saved as two datasets within an HDF5 group:
one for the string characters and one for the
segments corresponding to the start of each string.
The HDF5 group is named via the dataset parameter.
The prefix_path must be visible to the arkouda server and the user must
have write permission.
Output files have names of the form <prefix_path>_LOCALE<i>, where <i>
ranges from 0 to numLocales - 1 for file_type=‘distribute’. Otherwise,
the file name will be prefix_path.
If any of the output files already exist and
the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’
and the number of output files is less than the number of locales or a
dataset with the same name already exists, a RuntimeError will result.
Any file extension can be used. The file I/O does not rely on the extension to
determine the file format.
Convert the array to a np.ndarray, transferring array data from the
arkouda server to Python. If the array exceeds a built-in size limit,
a RuntimeError is raised.
Returns:
A numpy ndarray with the same strings as this array
Return type:
np.ndarray
Notes
The number of bytes in the array cannot exceed ak.client.maxTransferBytes,
otherwise a RuntimeError will be raised. This is to protect the user
from overflowing the memory of the system on which the Python client
is running, under the assumption that the server is running on a
distributed system with much more memory than the client. The user
may override this limit by setting ak.client.maxTransferBytes to a larger
value, but proceed with caution.
Save the Strings object to Parquet. The result is a collection of files,
one file per locale of the arkouda server, where each filename starts
with prefix_path. Each locale saves its chunk of the array to its
corresponding file.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – Name of the dataset to create in files (must not already exist)
mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist.
If ‘append’, attempt to create new dataset in existing files.
compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files
Returns:
string message indicating result of save operation
Return type:
str
Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
Notes
The prefix_path must be visible to the arkouda server and the user must
have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i>
ranges from 0 to numLocales - 1 for file_type=‘distribute’.
- ‘append’ write mode is supported, but is not efficient.
- If any of the output files already exist and
the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’
and the number of output files is less than the number of locales or a
dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to
determine the file format.
Convert the SegString to a list, transferring data from the
arkouda server to Python. If the SegString exceeds a built-in size limit,
a RuntimeError is raised.
Returns:
A list with the same strings as this SegString
Return type:
List[str]
Notes
The number of bytes in the array cannot exceed ak.client.maxTransferBytes,
otherwise a RuntimeError will be raised. This is to protect the user
from overflowing the memory of the system on which the Python client
is running, under the assumption that the server is running on a
distributed system with much more memory than the client. The user
may override this limit by setting ak.client.maxTransferBytes to a larger
value, but proceed with caution.
Send a Strings object to a different Arkouda server.
Parameters:
hostname (str) – The hostname where the Arkouda server intended to
receive the Strings object is running.
port (int_scalars) – The port to send the array over. This needs to be an
open port (i.e., not one that the Arkouda server is
running on). This will open up numLocales ports
in succession, so it will use ports in the
range {port..(port+numLocales-1)} (e.g., for an
Arkouda server running on 4 nodes with 1234 passed as
port, Arkouda will use ports 1234, 1235, 1236,
and 1237 to send the array data).
This port must match the port passed to the call to
ak.receive_array().
Returns:
A message indicating a complete transfer
Return type:
str
Raises:
ValueError – Raised if the op is not within the pdarray.BinOps set
TypeError – Raised if other is not a pdarray or the pdarray.dtype is not
a supported dtype
Overwrite the dataset with the name provided with this Strings object.
If the dataset does not exist, it is added.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str, default="strings_array") – Name of the dataset to create in files
save_offsets (bool, default=True) – Defaults to True which will instruct the server to save the offsets array to HDF5
If False, the offsets array will not be saved and will be derived from the string values
upon load/read.
repack (bool, default=True) – Default: True
HDF5 does not release memory on delete. When True, the inaccessible
data (that was overwritten) is removed. When False, the data remains, but is
inaccessible. Setting to false will yield better performance, but will cause
file sizes to expand.
Returns:
success message if successful
Return type:
str
Raises:
RuntimeError – Raised if a server-side error is thrown saving the Strings object
Notes
If the file does not contain a File_Format attribute indicating how it was saved,
the file name is checked for _LOCALE#### to determine whether it is distributed.
If the dataset provided does not exist, it will be added.
Represents a duration, the difference between two dates or times.
Timedelta is the Arkouda equivalent of pandas.TimedeltaIndex.
Parameters:
pda (int64 pdarray, pd.TimedeltaIndex, pd.Series, or np.timedelta64 array)
unit (str, default 'ns') –
For int64 pdarray, denotes the unit of the input. Ignored for pandas
and numpy arrays, which carry their own unit. Not case-sensitive;
prefixes of full names (like ‘sec’) are accepted.
Possible values:
’weeks’ or ‘w’
’days’ or ‘d’
’hours’ or ‘h’
’minutes’, ‘m’, or ‘t’
’seconds’ or ‘s’
’milliseconds’, ‘ms’, or ‘l’
’microseconds’, ‘us’, or ‘u’
’nanoseconds’, ‘ns’, or ‘n’
Unlike in pandas, units cannot be combined or mixed with integers.
Notes
The .values attribute is always in nanoseconds with int64 dtype.
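The unit handling described above can be sketched in plain Python. This is a minimal illustration, not Arkouda's implementation: the tables and the helpers resolve_unit and to_nanoseconds are hypothetical, and the resolution rule (exact alias first, then a unique prefix of a full unit name such as 'sec') is an assumption about how prefixes are matched.

```python
from typing import Iterable, List

# Nanoseconds per unit, keyed by full unit name (hypothetical table).
NS_PER_UNIT = {
    "weeks": 7 * 24 * 60 * 60 * 10**9,
    "days": 24 * 60 * 60 * 10**9,
    "hours": 60 * 60 * 10**9,
    "minutes": 60 * 10**9,
    "seconds": 10**9,
    "milliseconds": 10**6,
    "microseconds": 10**3,
    "nanoseconds": 1,
}

# Short aliases listed in the docstring.
ALIASES = {
    "w": "weeks", "d": "days", "h": "hours",
    "m": "minutes", "t": "minutes", "s": "seconds",
    "ms": "milliseconds", "l": "milliseconds",
    "us": "microseconds", "u": "microseconds",
    "ns": "nanoseconds", "n": "nanoseconds",
}


def resolve_unit(unit: str) -> str:
    """Map a case-insensitive alias or prefix (e.g. 'sec') to a full unit name."""
    u = unit.lower()
    if u in ALIASES:
        return ALIASES[u]
    # Assumption: otherwise accept only a prefix that matches exactly one full name.
    matches = [name for name in NS_PER_UNIT if name.startswith(u)]
    if len(matches) != 1:
        raise ValueError(f"unrecognized or ambiguous unit: {unit!r}")
    return matches[0]


def to_nanoseconds(values: Iterable[int], unit: str = "ns") -> List[int]:
    """Scale integer values given in `unit` to nanosecond counts."""
    factor = NS_PER_UNIT[resolve_unit(unit)]
    return [v * factor for v in values]


print(to_nanoseconds([1, 2], "sec"))
```

Note that a prefix such as 'mi' is rejected here because it matches minutes, milliseconds, and microseconds alike.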
Register this Timedelta object and underlying components with the Arkouda server.
Parameters:
user_defined_name (str) – user defined name the timedelta is to be registered under,
this will be the root name for underlying components
Returns:
The same Timedelta which is now registered with the arkouda server and has an updated name.
This is an in-place modification, the original is returned to support
a fluid programming style.
Please note you cannot register two different Timedeltas with the same name.
Return sum of array elements along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
numeric_scalars if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
The bool type is not a subclass of the int_ type
(the bool is not even a number type). This differs from
Python’s default implementation of bool as a
sub-class of int.
Returns True if all elements of a and b are equal within a tolerance.
This function compares two arrays elementwise and returns True if they are
equal within the tolerance defined by the parameters rtol and atol.
The comparison uses the formula: absolute(a - b) <= (atol + rtol * absolute(b))
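A minimal pure-Python sketch of the comparison formula above. The default tolerances (rtol=1e-05, atol=1e-08) are NumPy's and are assumed here; NaN handling is not modeled.

```python
from typing import Sequence


def allclose(a: Sequence[float], b: Sequence[float],
             rtol: float = 1e-05, atol: float = 1e-08) -> bool:
    """Elementwise check of absolute(a - b) <= atol + rtol * absolute(b)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


print(allclose([1.0, 2.0], [1.0, 2.0000001]))
```

Because the tolerance scales with absolute(b) only, allclose(a, b) and allclose(b, a) can disagree in borderline cases.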
arr (pdarray) – Values are appended to a copy of this array.
values (pdarray) – These values are appended to a copy of arr.
It must be of the correct shape (the same shape as arr, excluding axis).
If axis is not specified, values can be any shape and will be flattened before use.
axis (Optional[int], default=None) – The axis along which values are appended.
If axis is not given, both arr and values are flattened before use.
Returns:
A copy of arr with values appended to axis.
Note that append does not occur in-place: a new array is allocated and filled.
If axis is None, out is a flattened array.
Create a pdarray of consecutive integers within the interval [start, stop).
Called as: arange([start,] stop[, step,] dtype=int64).
If only one arg is given then arg is the stop parameter. If two args are
given, then the first arg is start and second is stop. If three args are
given, then the first arg is start, second is stop, third is step.
step (int_scalars, optional) – If only one of these three is supplied, it is used as stop, with start = 0 and step = 1.
If two are supplied, they are used as start and stop, with step = 1.
If all three are supplied, they are used as start, stop, and step.
dtype (np.dtype, type, or str) – The target dtype to cast values to
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
Returns:
Integers from start (inclusive) to stop (exclusive) by step
Negative steps result in decreasing values. Currently, only int64
pdarrays can be created with this method. For float64 arrays, use
the linspace method.
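The positional-argument convention arange([start,] stop[, step]) can be sketched with Python's built-in range, which has the same half-open [start, stop) semantics. The helpers resolve_arange_args and arange_values are hypothetical and do not call Arkouda.

```python
from typing import List, Tuple


def resolve_arange_args(*args: int) -> Tuple[int, int, int]:
    """Map arange([start,] stop[, step]) positional args to (start, stop, step)."""
    if len(args) == 1:
        return 0, args[0], 1
    if len(args) == 2:
        return args[0], args[1], 1
    if len(args) == 3:
        if args[2] == 0:
            raise ValueError("step must be nonzero")
        return args[0], args[1], args[2]
    raise TypeError("arange takes 1 to 3 positional arguments")


def arange_values(*args: int) -> List[int]:
    start, stop, step = resolve_arange_args(*args)
    # Start is inclusive, stop is exclusive; a negative step counts down.
    return list(range(start, stop, step))


print(arange_values(5), arange_values(2, 5), arange_values(5, 0, -2))
```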
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse cosine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing inverse cosine for each element
of the original pdarray
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse hyperbolic cosine will be applied to the corresponding value. Elsewhere, it will
retain its original value. Default set to True.
Returns:
A pdarray containing inverse hyperbolic cosine for each element
of the original pdarray
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse sine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing inverse sine for each element
of the original pdarray
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse hyperbolic sine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing inverse hyperbolic sine for each element
of the original pdarray
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse tangent will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing inverse tangent for each element
of the original pdarray
Return the element-wise inverse tangent of the array pair. The result chosen is the
signed angle in radians between the ray ending at the origin and passing through the
point (1,0), and the ray ending at the origin and passing through the point (denom, num).
The result is between -pi and pi.
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse tangent will be applied to the corresponding values. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing inverse tangent for each corresponding element pair
of the original pdarray, using the signs of the numerator and
denominator to determine the correct quadrant on the unit circle.
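The quadrant handling described above matches Python's standard math.atan2(y, x), with y as the numerator and x as the denominator. This sketch uses only the standard library (it does not call ak.arctan2) to show why the two-argument form is needed: a one-argument arctangent of num/denom cannot tell (1, 1) from (-1, -1).

```python
import math
from typing import List, Tuple


def compare_atan(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """For each (num, denom), return (atan(num/denom), atan2(num, denom))."""
    results = []
    for num, denom in points:
        naive = math.atan(num / denom)   # range (-pi/2, pi/2): quadrant is lost
        signed = math.atan2(num, denom)  # range [-pi, pi]: uses both signs
        results.append((naive, signed))
    return results


points = [(1, 1), (1, -1), (-1, -1)]
for (num, denom), (naive, signed) in zip(points, compare_atan(points)):
    print(f"num={num:+d}, denom={denom:+d}: atan={naive:+.4f}, atan2={signed:+.4f}")
```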
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the inverse hyperbolic tangent will be applied to the corresponding value. Elsewhere,
it will retain its original value. Default set to True.
Returns:
A pdarray containing inverse hyperbolic tangent for each element
of the original pdarray
TypeError – Raised if pda is not a pdarray or k is not an integer
ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1
Notes
This call returns the same values as ak.argsort(a)[-k:]
and generally outperforms that operation.
Performance drops significantly once k grows beyond a
system-dependent threshold; degradation has generally been
observed at around k = 5 million.
TypeError – Raised if pda is not a pdarray or k is not an integer
ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1
Notes
This call returns the same values as ak.argsort(a)[:k]
and generally outperforms that operation.
Performance drops significantly once k grows beyond a
system-dependent threshold; degradation has generally been
observed at around k = 5 million.
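The argsort equivalences stated for argmink and argmaxk can be checked on a small list with a heap-based selection, a common way to pick the top k without a full sort. This is a single-machine sketch with hypothetical helper names, not a description of what Arkouda does server-side; with tied values, the heap may order equal elements differently from argsort.

```python
import heapq
from typing import List, Sequence


def _check(a: Sequence[float], k: int) -> None:
    if len(a) == 0:
        raise ValueError("a must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")


def argsort(a: Sequence[float]) -> List[int]:
    # Stable ascending argsort.
    return sorted(range(len(a)), key=a.__getitem__)


def argmink(a: Sequence[float], k: int) -> List[int]:
    """Indices of the k smallest values, in ascending order of value."""
    _check(a, k)
    return heapq.nsmallest(k, range(len(a)), key=a.__getitem__)


def argmaxk(a: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest values, in ascending order of value."""
    _check(a, k)
    return heapq.nlargest(k, range(len(a)), key=a.__getitem__)[::-1]


a = [5, 1, 4, 2, 3]
print(argmink(a, 2), argsort(a)[:2])
print(argmaxk(a, 2), argsort(a)[-2:])
```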
Convert a Python, NumPy, or Arkouda array-like into a pdarray or Strings object,
transferring data to the Arkouda server.
Parameters:
a (Union[pdarray, np.ndarray, Iterable, Strings]) – The array-like input to convert. Supported types include Arkouda Strings, pdarray,
NumPy ndarray, or Python iterables such as list, tuple, range, or deque.
dtype (Union[np.dtype, type, str], optional) – The target dtype to cast values to. This may be a NumPy dtype object,
a NumPy scalar type (e.g. np.int64), or a string (e.g. ‘int64’, ‘str’).
copy (bool, default=False) – If True, a deep copy of the array is made. If False, no copy is made if the input
is already a pdarray. Note: Arkouda does not currently support views or shallow copies.
This differs from NumPy. Also, the default (False) is chosen to reduce performance overhead.
max_bits (int, optional) – The maximum number of bits for bigint arrays. Ignored for other dtypes.
Returns:
A pdarray stored on the Arkouda server, or a Strings object.
Arkouda does not currently support shallow copies or views; all copies are deep.
The number of bytes transferred to the server is limited by ak.client.maxTransferBytes.
This prevents saturating the network during large transfers. To increase this limit,
set ak.client.maxTransferBytes to a larger value manually.
If the input is a Unicode string array (dtype.kind == ‘U’ or dtype=‘str’),
this function recursively creates a Strings object from two internal pdarrays
(one for offsets and one for the concatenated string bytes).
Compares two pdarrays for equality.
If neither array contains any NaN elements, it returns True if all elements
are pairwise equal.
If equal_nan is False, any NaN element in either array results in False.
If equal_nan is True, NaNs in corresponding positions are considered equal.
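A minimal pure-Python sketch of the NaN rules above, applied to lists rather than pdarrays. Treating arrays of different lengths as unequal is an assumption of this sketch.

```python
import math
from typing import Sequence


def _is_nan(v: float) -> bool:
    return isinstance(v, float) and math.isnan(v)


def array_equal(a: Sequence[float], b: Sequence[float], equal_nan: bool = False) -> bool:
    """Pairwise equality with the NaN handling described above."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if _is_nan(x) or _is_nan(y):
            # A NaN only matches a NaN in the same position, and only if equal_nan.
            if not (equal_nan and _is_nan(x) and _is_nan(y)):
                return False
        elif x != y:
            return False
    return True


nan = float("nan")
print(array_equal([1.0, nan], [1.0, nan]), array_equal([1.0, nan], [1.0, nan], equal_nan=True))
```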
Attach a previously created Arkouda object by its registered name.
This function retrieves an Arkouda object (e.g., pdarray, DataFrame,
Series, etc.) associated with a given name. It returns the corresponding
object based on the type of object stored under that name.
Parameters:
name (str) – The name of the object to attach.
Returns:
The Arkouda object associated with the given name. The returned object
could be of any supported type, such as pdarray, DataFrame, Series,
etc.
Return type:
object
Raises:
ValueError – If the object type in the response message does not match any known types.
Dtype sentinel for Arkouda’s variable-width (arbitrary-precision) integers.
This class represents the dtype object used by Arkouda arrays storing
arbitrary-precision integers. It behaves similarly to NumPy’s dtype
objects (such as np.int64), but corresponds to an unbounded,
variable-width integer type. Instances of bigint are singletons
created through __new__ so that all dtype references share the same
object.
Construction semantics follow NumPy’s pattern: calling ak.bigint()
returns the dtype sentinel, and calling ak.bigint(value) returns a
bigint_ scalar constructed from value.
bigint instances compare equal to the bigint class, to themselves,
to strings such as "bigint", and to other dtype-like objects whose
name attribute equals "bigint". This allows interoperability
across Arkouda’s dtype-resolution system.
Notes
This class represents only the dtype. Scalar values use bigint_,
which inherits from int.
Scalar type for Arkouda’s variable-width integer dtype.
bigint_ represents an individual arbitrary-precision integer value within
Arkouda arrays that use the bigint dtype. It inherits directly from
Python’s built-in int, ensuring full arbitrary-precision semantics
while providing dtype metadata compatible with Arkouda.
This class is typically constructed indirectly via ak.bigint(value),
which invokes the _BigIntMeta metaclass. Direct instantiation is
also supported.
Return the underlying Python int value, matching NumPy scalar
semantics.
Notes
bigint_ values behave exactly like Python int objects in arithmetic,
hashing, comparison, and formatting. Arkouda arrays wrap and distribute many
such scalars but do not impose fixed-width limits.
Create a bigint pdarray from an iterable of uint pdarrays.
The first item in arrays will be the highest 64 bits and
the last item will be the lowest 64 bits.
Parameters:
arrays (Sequence[pdarray]) – An iterable of uint pdarrays used to construct the bigint pdarray.
The first item in arrays will be the highest 64 bits and
the last item will be the lowest 64 bits.
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
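The limb ordering described above (first array holds the highest 64 bits) can be illustrated with Python's arbitrary-precision int. This sketch uses plain lists and a hypothetical helper name; it does not model max_bits.

```python
from typing import List, Sequence


def combine_uint_limbs(arrays: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-element 64-bit limbs, most significant array first."""
    if not arrays:
        raise ValueError("need at least one array")
    n = len(arrays[0])
    if any(len(arr) != n for arr in arrays):
        raise ValueError("all arrays must have the same length")
    result = [0] * n
    for arr in arrays:  # highest 64 bits come first
        for i, limb in enumerate(arr):
            if not 0 <= limb < 2**64:
                raise ValueError("each limb must fit in an unsigned 64-bit integer")
            result[i] = (result[i] << 64) | limb
    return result


print(combine_uint_limbs([[0, 1], [5, 2**64 - 1]]))
```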
Return the binary representation of the input number as a string.
For negative numbers, if width is not given, a minus sign is added to the
front. If width is given, the two’s complement of the number is
returned, with respect to that width.
In a two’s-complement system negative numbers are represented by the two’s
complement of the absolute value. This is the most common method of
representing signed integers on computers [1]. An N-bit two’s-complement
system can represent every integer in the range
\(-2^{N-1}\) to \(+2^{N-1}-1\).
Parameters:
num (int) – Only an integer decimal number can be used.
width (int, optional) – The length of the returned string if num is positive, or the length
of the two’s complement if num is negative, provided that width is
at least a sufficient number of bits for num to be represented in
the designated form. If the width value is insufficient, an error is
raised.
Returns:
bin – Binary representation of num or two’s complement of num.
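A pure-Python sketch of the rules above. The width check follows the two's-complement range stated in this entry and may differ from NumPy's own check in edge cases; it is an illustration, not NumPy's or Arkouda's code.

```python
from typing import Optional


def binary_repr(num: int, width: Optional[int] = None) -> str:
    """Binary string of num; two's complement when num < 0 and width is given."""
    if num >= 0:
        bits = bin(num)[2:]
        if width is None:
            return bits
        if len(bits) > width:
            raise ValueError(f"width={width} is too small for {num}")
        return bits.zfill(width)
    if width is None:
        return "-" + bin(-num)[2:]
    # An N-bit two's-complement system covers -2**(N-1) .. 2**(N-1) - 1.
    if num < -(1 << (width - 1)):
        raise ValueError(f"width={width} is too small for {num}")
    # Masking to `width` bits yields the two's-complement bit pattern.
    return format(num & ((1 << width) - 1), f"0{width}b")


print(binary_repr(5), binary_repr(-3), binary_repr(-3, width=4))
```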
The bool type is not a subclass of the int_ type
(the bool is not even a number type). This differs from
Python’s default implementation of bool as a
sub-class of int.
x (int, pdarray) – The int or array to be broadcast.
shape (int, Tuple[int, ...]) – The shape to which the array is to be broadcast.
Notes
If x and shape are both integers, the result has shape (shape,).
If x is an int and shape is a tuple, the result has the given shape tuple.
If x is a pdarray and shape is an int, then x is unchanged if x.shape == (shape,);
otherwise a ValueError is raised.
If x is a pdarray and shape is a tuple, then x is broadcast to shape, if possible.
Returns:
A new array which is x broadcast to the provided shape.
Determine whether a value of one dtype can be safely cast to another,
following NumPy-like rules but including Arkouda-specific handling for
bigint and bigint_.
bigint → fixed-width signed/unsigned integers: not allowed, due to
potential overflow.
int64 / uint64 → bigint: allowed (widening).
float → bigint: not allowed (information loss).
All other cases fall back to numpy.can_cast semantics.
Parameters:
from_dt (Any) – Source dtype or scalar-like object.
to_dt (Any) – Target dtype or scalar-like object.
casting (str, optional) – Casting rule, matching NumPy’s can_cast API. Only "safe"
is currently implemented. Other values are accepted for API
compatibility but routed through the same logic.
The cast is performed according to Chapel’s casting rules and is NOT safe
from overflows or underflows. The user must ensure that the target dtype
has the precision and capacity to hold the desired result.
where (bool or pdarray, default=True) – This condition is applied over the input. At locations where the condition is True, the
corresponding value will be acted on by the function. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing ceiling values of the input array elements
hi (numeric_scalars or pdarray) – the higher value of the clipping range
If lo or hi (or both) are pdarrays, the check is by pairwise elements.
See examples.
Returns:
A pdarray matching pda, except that element x remains x if lo <= x <= hi,
or else becomes lo if x < lo, or hi if x > hi.
Either lo or hi may be None, but not both.
If lo > hi, all x = hi.
If all inputs are int64, output is int64, but if any input is float64, output is float64.
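A pure-Python sketch of the clipping rules above, for scalar lo and hi only (pairwise pdarray bounds are not modeled). Applying hi last is what makes every element equal hi when lo > hi.

```python
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def clip(values: Sequence[Number], lo: Optional[Number], hi: Optional[Number]) -> List[Number]:
    """Scalar-bounds clip following the documented rules."""
    if lo is None and hi is None:
        raise ValueError("Either lo or hi must be supplied")
    # Any float input promotes the whole output to float.
    any_float = any(isinstance(v, float) for v in (*values, lo, hi))
    out = []
    for x in values:
        if lo is not None:
            x = max(x, lo)
        if hi is not None:
            x = min(x, hi)  # applied last, so lo > hi yields hi everywhere
        out.append(float(x) if any_float else x)
    return out


print(clip([1, 5, 10], 2, 8), clip([1, 5], 9, 3))
```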
Return the permutation that groups the rows (left-to-right), if the
input arrays are treated as columns. The permutation sorts numeric
columns, but not Strings or Categoricals — those are grouped, not ordered.
Parameters:
arrays (Sequence of Strings, pdarray, or Categorical) – The columns (int64, uint64, float64, Strings, or Categorical) to sort by row.
algorithm (SortingAlgorithm, default=SortingAlgorithm.RadixSortLSD) – The algorithm to be used for sorting the arrays.
ascending (bool, default=True) – Whether to sort in ascending order. Ignored when arrays have ndim > 1.
Returns:
The indices that permute the rows into grouped order.
Uses a least-significant-digit radix sort, which is stable and resilient
to non-uniformity in data but communication intensive. Starts with the
last array and moves forward.
For Strings, sorting is based on a hash. This ensures grouping of identical strings,
but not lexicographic order. For Categoricals, sorting is based on the internal codes.
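The least-significant-first strategy described above can be sketched with Python's stable sort standing in for each radix pass: sort by the last column, then by each earlier column, so earlier columns take precedence while ties keep the order from later ones. This covers numeric columns only and does not reproduce the hash-based grouping of Strings or the distributed communication; the helper name is hypothetical.

```python
from typing import List, Sequence


def coargsort_lsd(columns: Sequence[Sequence[int]]) -> List[int]:
    """Row permutation via repeated stable sorts, last column first (LSD order)."""
    if not columns:
        raise ValueError("need at least one column")
    perm = list(range(len(columns[0])))
    for col in reversed(columns):
        # Python's sort is stable, so ties keep the order from earlier passes.
        perm.sort(key=col.__getitem__)
    return perm


# Rows: (1, 2), (0, 3), (1, 1), (0, 3)
print(coargsort_lsd([[1, 0, 1, 0], [2, 3, 1, 3]]))
```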
axis (int, default = 0) – The axis along which the arrays will be joined.
If axis is None, arrays are flattened before use. Only for use with pdarray, and when
ordered is True. Default is 0.
ordered (bool) – If True (default), the arrays will be appended in the
order given. If False, array data may be interleaved
in blocks, which can greatly improve performance but
results in non-deterministic ordering of elements.
Returns:
Single pdarray or Strings object containing all values, returned in
the original order
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the cosine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing cosine for each element
of the original pdarray
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the hyperbolic cosine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing hyperbolic cosine for each element
of the original pdarray
Create a fixed frequency Datetime range. Alias for
ak.Datetime(pd.date_range(args)). Subject to size limit
imposed by client.maxTransferBytes.
Parameters:
start (str or datetime-like, optional) – Left bound for generating dates.
end (str or datetime-like, optional) – Right bound for generating dates.
periods (int, optional) – Number of periods to generate.
freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. ‘5H’. See
timeseries.offset_aliases for a list of
frequency aliases.
tz (str or tzinfo, optional) – Time zone name for returning localized DatetimeIndex, for example
‘Asia/Hong_Kong’. By default, the resulting DatetimeIndex is
timezone-naive.
normalize (bool, default False) – Normalize start/end dates to midnight before generating date range.
name (str, default None) – Name of the resulting DatetimeIndex.
inclusive ({"both", "neither", "left", "right"}, default "both") – Include boundaries. Whether to set each bound as closed or open.
**kwargs – For compatibility. Has no effect on the result.
Returns:
rng
Return type:
DatetimeIndex
Notes
Of the four parameters start, end, periods, and freq,
exactly three must be specified. If freq is omitted, the resulting
DatetimeIndex will have periods linearly spaced elements between
start and end (closed on both sides).
To learn more about the frequency strings, see the pandas documentation on offset aliases.
If created from a 64-bit integer, it represents an offset from
1970-01-01T00:00:00.
If created from string, the string can be in ISO 8601 date
or datetime format.
When parsing a string to create a datetime object, if the string contains
a trailing timezone (a ‘Z’ or a timezone offset), the timezone is
dropped and a UserWarning is issued.
Datetime64 objects should be considered to be UTC and therefore have an
offset of +0000.
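As a sketch of the epoch-offset interpretation above, assuming a nanosecond unit (as used by Arkouda's Datetime .values), an integer can be mapped to a naive, UTC-style Python datetime. The helper name is hypothetical, and Python datetimes only resolve microseconds.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ns_offset_to_datetime(ns: int) -> datetime:
    """Interpret an integer nanosecond count as an offset from 1970-01-01T00:00:00."""
    # Floor to whole microseconds, the finest resolution of Python datetimes.
    return EPOCH + timedelta(microseconds=ns // 1000)


print(ns_offset_to_datetime(86_400 * 10**9).isoformat())
```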
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the
corresponding value will be converted from degrees to radians. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing an angle converted to radians, from degrees, for each element
of the original pdarray
obj (slice, int, Sequence of int, Sequence of bool, or pdarray) – The indices to remove from ‘arr’. If obj is a pdarray, it must
have an integer or bool dtype.
axis (Optional[int], optional) – The axis along which to remove elements. If None, the array will
be flattened before removing elements. Defaults to None.
Calculate the n-th discrete difference along the given axis.
The first difference is given by out[i]=a[i+1]-a[i] along the given axis,
higher differences are calculated by using diff iteratively.
Parameters:
a (pdarray) – The array to calculate the difference
n (int, optional) – The order of the finite difference. Default is 1.
axis (int, optional) – The axis along which to calculate the difference. Default is the last axis.
prepend (pdarray, optional) – The pdarray to prepend to a along axis before calculating the difference.
append (pdarray, optional) – The pdarray to append to a along axis before calculating the difference.
Returns:
The n-th differences. The shape of the output is the same as a
except along axis where the dimension is smaller by n. The
type of the output is the same as the type of the difference
between any two elements of a. This is the same as the type of
a in most cases. A notable exception is datetime64, which
results in a timedelta64 output array.
Type is preserved for boolean arrays, so the result will contain
False when consecutive elements are the same and True when they
differ.
For unsigned integer arrays, the results will also be unsigned. This
should not be surprising, as the result is consistent with
calculating the difference directly.
If this is not desirable, the array should first be cast to a larger
integer type.
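For example, the unsigned wraparound can be reproduced in plain Python by reducing each difference modulo 2**bits, which is what fixed-width unsigned arithmetic does. In this sketch, diff and its bits parameter are hypothetical (not Arkouda's API); omitting bits corresponds to first casting to a wider signed type.

```python
from typing import List, Optional, Sequence


def diff(values: Sequence[int], n: int = 1, bits: Optional[int] = None) -> List[int]:
    """n-th discrete difference; with bits set, emulate unsigned wraparound."""
    out = list(values)
    for _ in range(n):
        out = [b - a for a, b in zip(out, out[1:])]
        if bits is not None:
            out = [d % (1 << bits) for d in out]  # wrap like an unsigned bits-wide integer
    return out


print(diff([1, 0], bits=8), diff([1, 0]))
```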
x (numeric_scalars(float_scalars, int_scalars) or pdarray) – The dividend array: the values that are the numerator of the floor division and
are acted on by the bases for modular division.
where (Boolean or pdarray) – This condition is broadcast over the input. At locations where the condition is True, the
corresponding value will be divided using floor and modular division. Elsewhere, it will retain
its original value. Default set to True.
Returns:
Returns a tuple that contains quotient and remainder of the division
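A sketch of the returned pair, assuming floor-division semantics like Python's built-in divmod (so the remainder takes the sign of the divisor and x == y * q + r holds). The where mask is not modeled, and the helper name is hypothetical.

```python
from typing import List, Sequence, Tuple


def elementwise_divmod(x: Sequence[int], y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (quotients, remainders) using floor division and modulo."""
    pairs = [divmod(a, b) for a, b in zip(x, y)]
    return [q for q, _ in pairs], [r for _, r in pairs]


print(elementwise_divmod([7, -7, 7], [2, 2, -2]))
```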
Return a pair of integers, whose ratio is exactly equal to the original
floating point number, and with a positive denominator.
Raise OverflowError on infinities and a ValueError on NaNs.
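This entry describes the same behavior as Python's built-in float.as_integer_ratio, so it can be checked directly with the standard library. The ratio is exact for the stored binary value, which is why 0.1 does not come out as 1/10.

```python
from fractions import Fraction

for value in (0.75, -0.5, 0.1):
    num, den = value.as_integer_ratio()
    # The ratio reproduces the stored binary value exactly; den is always positive.
    assert Fraction(num, den) == Fraction(value) and den > 0
    print(value, "->", (num, den))

for bad in (float("inf"), float("nan")):
    try:
        bad.as_integer_ratio()
    except (OverflowError, ValueError) as exc:
        print(repr(bad), "->", type(exc).__name__)
```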
Normalize a dtype-like input into an Arkouda dtype sentinel or a NumPy dtype.
This function accepts many dtype-like forms—including Python scalars,
NumPy scalar types, Arkouda bigint sentinels, and strings—and resolves
them to the canonical Arkouda/NumPy dtype object. The resolution rules
include special handling of the bigint family and magnitude-aware routing
for Python integers.
Parameters:
x (Any) – The dtype-like object to normalize. May be a Python scalar, a NumPy
dtype or scalar, the bigint sentinel or scalar, a dtype-specifying
string, or any object accepted by numpy.dtype.
Raises:
TypeError – If x cannot be interpreted as either an Arkouda dtype or a NumPy
dtype. This includes cases where numpy.dtype(x) itself fails.
Return indices of query items in a search list of items.
Parameters:
query ((sequence of) array-like) – The items to search for. If multiple arrays, each “row” is an item.
space ((sequence of) array-like) – The set of items in which to search. Must have same shape/dtype as query.
all_occurrences (bool) – When duplicate terms are present in search space, if all_occurrences is True,
return all occurrences found as a SegArray, otherwise return only the first
occurrences as a pdarray. Defaults to only finding the first occurrence.
Finding all occurrences is not yet supported on sequences of arrays.
remove_missing (bool) – If all_occurrences is True, remove_missing is automatically enabled.
If False, return -1 for any items in query not found in space. If True,
remove these and only return indices of items that are found.
Returns:
indices – For each item in query, its index in space. If all_occurrences is False,
the return will be a pdarray of the first index where each value in the
query appears in the space. If all_occurrences is True, the return will be
a SegArray containing every index where each value in the query appears in
the space. If all_occurrences is True, remove_missing is automatically enabled.
If remove_missing is True, exclude missing values, otherwise return -1.
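A single-machine, dictionary-based sketch of the first-occurrence mode (all_occurrences=False). It is not Arkouda's distributed implementation, and the helper is hypothetical; tuples can stand in for multi-array "rows".

```python
from typing import Hashable, List, Sequence


def find(query: Sequence[Hashable], space: Sequence[Hashable],
         remove_missing: bool = False) -> List[int]:
    """First index of each query item in space, or -1 when absent."""
    first_index = {}
    for i, item in enumerate(space):
        first_index.setdefault(item, i)  # keep only the first occurrence
    indices = [first_index.get(q, -1) for q in query]
    if remove_missing:
        indices = [i for i in indices if i != -1]
    return indices


print(find(["b", "z", "a"], ["a", "b", "a"]))
```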
Returns the dtype for which finfo returns information. For complex
input, the returned dtype is the associated float* dtype for its
real and complex components.
The difference between 1.0 and the next smallest representable float
larger than 1.0. For example, for 64-bit binary floats in the IEEE-754
standard, eps=2**-52, approximately 2.22e-16.
The difference between 1.0 and the next smallest representable float
less than 1.0. For example, for 64-bit binary floats in the IEEE-754
standard, epsneg=2**-53, approximately 1.11e-16.
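Both gaps can be measured for IEEE-754 doubles with the standard library (math.nextafter requires Python 3.9 or later):

```python
import math
import sys

# Gap from 1.0 up to the next representable double (eps).
eps = math.nextafter(1.0, 2.0) - 1.0
# Gap from 1.0 down to the next representable double (epsneg).
epsneg = 1.0 - math.nextafter(1.0, 0.0)

print(eps, epsneg, eps == sys.float_info.epsilon)
```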
For developers of NumPy: do not instantiate this at the module level.
The initial calculation of these parameters is expensive and negatively
impacts import times. These objects are cached, so calling finfo()
repeatedly inside your functions is not a problem.
Note that smallest_normal is not actually the smallest positive
representable value in a NumPy floating point type. As in the IEEE-754
standard [1], NumPy floating point types make use of subnormal numbers to
fill the gap between 0 and smallest_normal. However, subnormal numbers
may have significantly reduced precision [2].
This function can also be used for complex data types. If used,
the output will be the same as the corresponding real float type
(e.g. numpy.finfo(numpy.csingle) is the same as numpy.finfo(numpy.single)).
However, the output is true for the real and imaginary components.
Return a pair of integers, whose ratio is exactly equal to the original
floating point number, and with a positive denominator.
Raise OverflowError on infinities and a ValueError on NaNs.
where (bool or pdarray, default=True) – This condition is applied over the input. At locations where the condition is True, the
corresponding value will be acted on by the function. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing floor values of the input array elements
TypeError – Raised if neither dividend nor divisor is a pdarray (at least one must be)
or if any scalar or pdarray element is not one of int, uint, float, bigint
Format a floating-point scalar as a decimal string in positional notation.
Provides control over rounding, trimming and padding. Uses and assumes
IEEE unbiased rounding. Uses the “Dragon4” algorithm.
Parameters:
x (python float or numpy floating scalar) – Value to format.
precision (non-negative integer or None, optional) – Maximum number of digits to print. May be None if unique is
True, but must be an integer if unique is False.
unique (boolean, optional) – If True, use a digit-generation strategy which gives the shortest
representation which uniquely identifies the floating-point number from
other values of the same type, by judicious rounding. If precision
is given, fewer digits than necessary can be printed; if min_digits
is given, more can be printed. In either case, the last digit is rounded
with unbiased rounding.
If False, digits are generated as if printing an infinite-precision
value and stopping after precision digits, rounding the remaining
value with unbiased rounding.
fractional (boolean, optional) – If True, the cutoffs of precision and min_digits refer to the
total number of digits after the decimal point, including leading
zeros.
If False, precision and min_digits refer to the total number of
significant digits, before or after the decimal point, ignoring leading
zeros.
trim (one of 'k', '.', '0', '-', optional) –
Controls post-processing trimming of trailing digits, as follows:
’k’ : keep trailing zeros, keep decimal point (no trimming)
’.’ : trim all trailing zeros, leave decimal point
’0’ : trim all but the zero before the decimal point. Insert the
zero if it is missing.
’-’ : trim trailing zeros and any trailing decimal point
sign (boolean, optional) – Whether to show the sign for positive values.
pad_left (non-negative integer, optional) – Pad the left side of the string with whitespace until at least that
many characters are to the left of the decimal point.
pad_right (non-negative integer, optional) – Pad the right side of the string with whitespace until at least that
many characters are to the right of the decimal point.
min_digits (non-negative integer or None, optional) –
Minimum number of digits to print. Only has an effect if unique=True
in which case additional digits past those necessary to uniquely
identify the value may be printed, rounding the last additional digit.
Added in version 1.21.0.
Returns:
rep – The string representation of the floating point value
Format a floating-point scalar as a decimal string in scientific notation.
Provides control over rounding, trimming and padding. Uses and assumes
IEEE unbiased rounding. Uses the “Dragon4” algorithm.
Parameters:
x (python float or numpy floating scalar) – Value to format.
precision (non-negative integer or None, optional) – Maximum number of digits to print. May be None if unique is
True, but must be an integer if unique is False.
unique (boolean, optional) – If True, use a digit-generation strategy which gives the shortest
representation which uniquely identifies the floating-point number from
other values of the same type, by judicious rounding. If precision
is given, fewer digits than necessary can be printed; if min_digits
is given, more can be printed. In either case, the last digit is rounded
with unbiased rounding.
If False, digits are generated as if printing an infinite-precision
value and stopping after precision digits, rounding the remaining
value with unbiased rounding.
trim (one of 'k', '.', '0', '-', optional) –
Controls post-processing trimming of trailing digits, as follows:
’k’ : keep trailing zeros, keep decimal point (no trimming)
’.’ : trim all trailing zeros, leave decimal point
’0’ : trim all but the zero before the decimal point. Insert the
zero if it is missing.
’-’ : trim trailing zeros and any trailing decimal point
sign (boolean, optional) – Whether to show the sign for positive values.
pad_left (non-negative integer, optional) – Pad the left side of the string with whitespace until at least that
many characters are to the left of the decimal point.
exp_digits (non-negative integer, optional) – Pad the exponent with zeros until it contains at least this
many digits. If omitted, the exponent will be at least 2 digits.
min_digits (non-negative integer or None, optional) –
Minimum number of digits to print. This only has an effect for
unique=True. In that case more digits than necessary to uniquely
identify the value may be printed and rounded unbiased.
Added in version 1.21.0.
Returns:
rep – The string representation of the floating point value
Return a pair of integers, whose ratio is exactly equal to the original
floating point number, and with a positive denominator.
Raise OverflowError on infinities and a ValueError on NaNs.
full (bool, default=True) – Only used when a single pdarray is passed into hash.
By default, a 128-bit hash is computed and returned as
two int64 arrays. If full=False, a 64-bit hash
is computed and returned as a single int64 array.
Returns:
If full=True or a list of pdarrays is passed,
a 2-tuple of pdarrays containing the high
and low 64 bits of each hash, respectively.
If full=False and a single pdarray is passed,
a single pdarray containing a 64-bit hash
Return type:
hashes
Raises:
TypeError – Raised if the parameter is not a pdarray
In the case of a single pdarray being passed, this function
uses the SipHash algorithm, which can output either a 64-bit
or 128-bit hash. However, the 64-bit hash runs a significant
risk of collisions when applied to more than a few million
unique values. Unless the number of unique values is known to
be small, the 128-bit hash is strongly recommended.
Note that this hash should not be used for security, or for
any cryptographic application. Not only is SipHash not
intended for such uses, but this implementation employs a
fixed key for the hash, which makes it possible for an
adversary with control over input to engineer collisions.
In the case of a list of pdarrays, Strings, Categoricals, or SegArrays
being passed, a non-linear function must be applied to each
array before combining. The hashes of successive arrays cannot simply be
XORed, because equal values in different arrays would cancel each other
out. Instead, each array's hash is rotated by the ordinal of the array.
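The sketch below shows why a plain XOR of per-array hashes fails and how rotating by the array ordinal helps. BLAKE2b from hashlib stands in for SipHash. The helpers h64, rotl64, combine_xor and combine_rotated are hypothetical illustrations, not Arkouda's implementation, and a rotation of one bit per ordinal is an assumption.

```python
import hashlib

MASK64 = (1 << 64) - 1


def h64(value: int) -> int:
    # Stand-in 64-bit hash (BLAKE2b), NOT Arkouda's SipHash.
    data = value.to_bytes(8, "little", signed=True)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def rotl64(x: int, r: int) -> int:
    # Rotate a 64-bit value left by r bits.
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def combine_xor(row) -> int:
    # Naive combination: equal values in different columns cancel out.
    out = 0
    for v in row:
        out ^= h64(v)
    return out


def combine_rotated(row) -> int:
    # Rotate each column's hash by its ordinal before XOR-ing.
    out = 0
    for i, v in enumerate(row):
        out ^= rotl64(h64(v), i)
    return out


# Rows (1, 1) and (2, 2) both XOR to 0, so the naive scheme collides.
print(combine_xor((1, 1)) == combine_xor((2, 2)))  # True
# Rotation keeps them apart and also makes the result order-sensitive.
print(combine_rotated((1, 1)), combine_rotated((2, 2)))
```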
To plot, export the left edges and the histogram to NumPy:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> b_np = b.to_ndarray()
>>> b_widths = np.diff(b_np)
>>> plt.bar(b_np[:-1], h.to_ndarray(), width=b_widths, align='edge', edgecolor='black')
<BarContainer object of 3 artists>
>>> plt.show()  # doctest: +SKIP
Compute the bi-dimensional histogram of two data samples with evenly spaced bins.
Parameters:
x (pdarray) – A pdarray containing the x coordinates of the points to be histogrammed.
y (pdarray) – A pdarray containing the y coordinates of the points to be histogrammed.
bins (int_scalars or [int, int], default=10) – The number of equal-size bins to use.
If int, the number of bins for the two dimensions (nx=ny=bins).
If [int, int], the number of bins in each dimension (nx, ny = bins).
Defaults to 10
range (((x_min, x_max), (y_min, y_max)), optional) – The ranges of the values in x and y to count.
Values outside of these ranges are dropped.
By default, all values are counted.
Returns:
hist (pdarray, shape (nx, ny)) – The bi-dimensional histogram of samples x and y.
Values in x are histogrammed along the first dimension and
values in y are histogrammed along the second dimension.
The x bins are evenly spaced in the interval [x.min(), x.max()]
and y bins are evenly spaced in the interval [y.min(), y.max()].
If the range parameter is provided, the intervals are given
by range[0] for x and range[1] for y.
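NumPy's np.histogram2d uses the same convention of evenly spaced bins over [min, max] per dimension, with the last bin closed. The sketch below uses it as an analog, on the assumption that Arkouda's binning matches. It also shows range dropping out-of-range values.

```python
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([0, 0, 3, 3])

# Two bins per dimension, evenly spaced over [x.min(), x.max()] and [y.min(), y.max()].
hist, x_edges, y_edges = np.histogram2d(x, y, bins=2)
print(hist)     # rows follow x bins, columns follow y bins
print(x_edges)  # bin edges along x

# With an explicit range, the point with x == 3 falls outside [0, 2] and is dropped.
hist_r, _, _ = np.histogram2d(x, y, bins=2, range=[[0, 2], [0, 3]])
print(hist_r)
```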
Compute the multidimensional histogram of data in sample with evenly spaced bins.
Parameters:
sample (Sequence of pdarray) – A sequence of pdarrays containing the coordinates of the points to be histogrammed.
bins (int_scalars or Sequence of int_scalars, default=10) – The number of equal-size bins to use.
If int, the number of bins for all dimensions (nx=ny=…=bins).
If [int, int, …], the number of bins in each dimension (nx, ny, … = bins).
Defaults to 10
range (Sequence[optional (min_val, max_val)], optional) – The ranges of the values to count for each array in sample.
Values outside of these ranges are dropped.
By default, all values are counted.
Returns:
hist (pdarray, shape (nx, ny, …, nd)) – The multidimensional histogram of pdarrays in sample.
Values in the first pdarray are histogrammed along the first dimension.
Values in the second pdarray are histogrammed along the second dimension, and so on.
edges (List[pdarray]) – A list of pdarrays containing the bin edges for each dimension.
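Evenly spaced binning reduces to one arithmetic step per dimension: bin = floor((v - min) / (max - min) * nbins), with the maximum clipped into the last bin. The helper evenly_spaced_histdd below is hypothetical and checked against NumPy's np.histogramdd. It assumes every dimension has a nonzero range, and it is not Arkouda's server-side code.

```python
import numpy as np


def evenly_spaced_histdd(sample, bins):
    """Hypothetical sketch: count points in equal-width bins over [min, max] per dimension."""
    sample = [np.asarray(s, dtype=float) for s in sample]
    hist = np.zeros(bins, dtype=np.int64)
    idx, edges = [], []
    for s, nb in zip(sample, bins):
        lo, hi = s.min(), s.max()  # assumes hi > lo
        edges.append(np.linspace(lo, hi, nb + 1))
        i = np.floor((s - lo) / (hi - lo) * nb).astype(np.int64)
        idx.append(np.clip(i, 0, nb - 1))  # the maximum value lands in the last (closed) bin
    np.add.at(hist, tuple(idx), 1)  # unbuffered add handles repeated bin indices
    return hist, edges


sample = ([0, 1, 2, 3, 4], [0, 4, 0, 4, 2], [1, 1, 1, 1, 5])
hist, edges = evenly_spaced_histdd(sample, (2, 2, 2))
ref, ref_edges = np.histogramdd(np.column_stack(sample), bins=(2, 2, 2))
print(np.array_equal(hist, ref.astype(np.int64)))
```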
Stack arrays in sequence horizontally (column wise).
This is equivalent to concatenation along the second axis, except for 1-D arrays
where it concatenates along the first axis. Rebuilds arrays divided by hsplit.
This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data
with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions
concatenate, stack and block provide more general stacking and concatenation operations.
Parameters:
tup (sequence of pdarray) – The arrays must have the same shape along all but the second axis, except 1-D arrays which
can be any length. In the case of a single array_like input, it will be treated as a sequence of
arrays; i.e., each element along the zeroth axis is treated as a separate array.
dtype (str or type, optional) – If provided, the destination array will have this type.
casting ({‘no’, ‘equiv’, ‘safe’, ‘same_kind’, ‘unsafe’}, optional) – Controls what kind of data casting may occur. Defaults to ‘same_kind’. Currently unused.
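The text describes NumPy's hstack semantics. The sketch below uses np.hstack to show the 1-D and 2-D cases and assumes Arkouda's version behaves the same way.

```python
import numpy as np

# 1-D inputs: concatenated along the first (only) axis; lengths may differ.
a = np.array([1, 2, 3])
b = np.array([4, 5])
flat = np.hstack((a, b))

# 2-D inputs: concatenated along the second axis (columns); row counts must match.
m = np.array([[1], [2]])
n = np.array([[3, 4], [5, 6]])
wide = np.hstack((m, n))
print(flat)
print(wide)
```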
B (list of pdarrays, pdarray, Strings, or Categorical) – The set of elements in which to test membership
assume_unique (bool, optional, defaults to False) – If True, assume rows of a and b are each unique and sorted.
By default, sort and unique them explicitly.
symmetric (bool, optional, defaults to False) – Return in1d(A, B), in1d(B, A) when A and B are single items.
invert (bool, optional, defaults to False) – If True, the values in the returned array are inverted (that is,
False where an element of A is in B and True otherwise).
ak.in1d(a, b, invert=True) is equivalent
to (but is faster than) ~ak.in1d(a, b).
Returns:
True for each row in a that is contained in b
Return type:
groupable
Raises:
TypeError – Raised if either A or B is not a pdarray, Strings, or Categorical
object, or if both are pdarrays and either has rank > 1,
or if invert is not a bool
RuntimeError – Raised if the dtype of either array is not supported
in1d can be considered as an element-wise function version of the
Python keyword in, for 1-D sequences. in1d(a, b) is logically
equivalent to ak.array([item in b for item in a]), but is much
faster and scales to arbitrarily large a.
ak.in1d is not supported for bool or float64 pdarrays
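The equivalence with Python's in keyword can be checked on the client side with NumPy. np.isin stands in for ak.in1d here because np.in1d is deprecated in NumPy 2.0. This is a NumPy analog, not an Arkouda call.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4, 6])

mask = np.isin(a, b)
# Logically equivalent to the pure-Python membership comprehension.
assert mask.tolist() == [item in b for item in a]
# invert=True yields the complement of the mask.
assert np.array_equal(np.isin(a, b, invert=True), ~mask)
print(mask)
```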
Return indices of query items in a search list of items. Items not found will be excluded.
When duplicate terms are present in the search space, the indices of all occurrences are returned.
Parameters:
query ((sequence of) pdarray or Strings or Categorical) – The items to search for. If multiple arrays, each “row” is an item.
space ((sequence of) pdarray or Strings or Categorical) – The set of items in which to search. Must have same shape/dtype as query.
Returns:
For each item in query that is found in space, its index in space.
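The semantics described above (skip misses, return every occurrence of a hit) fit in a few lines of NumPy. indexof_sketch is a hypothetical helper for 1-D inputs. It returns results in query order, and the output order of the Arkouda function is not stated here.

```python
import numpy as np


def indexof_sketch(query, space):
    """Hypothetical helper: indices in `space` of every query item found there."""
    query = np.asarray(query)
    space = np.asarray(space)
    # All occurrences per query item; items not found contribute nothing.
    hits = [np.flatnonzero(space == q) for q in query]
    return np.concatenate(hits) if hits else np.array([], dtype=np.int64)


print(indexof_sketch([3, 7, 1], [1, 3, 3, 5]))  # [1 2 0]
```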
Apply a function defined over intervals to an array of arguments.
Parameters:
keys (2-tuple of (sequences of) pdarrays) – Tuple of closed intervals expressed as (lower_bounds_inclusive, upper_bounds_inclusive).
Must have same dtype(s) as vals.
values (pdarray) – Function value to return for each entry in keys.
arguments ((sequences of) pdarray) – Values to search for in intervals. If multiple arrays, each “row” is an item.
fillvalue (scalar) – Default value to return when argument is not in any interval.
tiebreak ((optional) pdarray, numeric) – When an argument is present in more than one key interval, the interval with the
lowest tiebreak value will be chosen. If no tiebreak is given, the
first valid key interval will be chosen.
Returns:
Value of function corresponding to the keys interval
containing each argument, or fillvalue if argument not
in any interval.
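A brute-force sketch makes the tiebreak rule concrete: closed intervals, the lowest tiebreak wins, otherwise the first matching interval, and fillvalue on a miss. interval_lookup_sketch is hypothetical, handles only single (non-tuple) keys, and is O(n·m), unlike the server-side search.

```python
import numpy as np


def interval_lookup_sketch(lowers, uppers, values, arguments, fillvalue, tiebreak=None):
    """Hypothetical helper: evaluate a piecewise function defined on closed intervals."""
    lowers, uppers, values = map(np.asarray, (lowers, uppers, values))
    out = np.full(len(arguments), fillvalue, dtype=values.dtype)
    for j, arg in enumerate(arguments):
        hits = np.flatnonzero((lowers <= arg) & (arg <= uppers))  # closed intervals
        if hits.size == 0:
            continue  # keep fillvalue
        if tiebreak is not None:
            best = hits[np.argmin(np.asarray(tiebreak)[hits])]  # lowest tiebreak wins
        else:
            best = hits[0]  # first valid interval wins
        out[j] = values[best]
    return out


lowers, uppers, vals = [0, 5, 10], [4, 12, 20], [100, 200, 300]
print(interval_lookup_sketch(lowers, uppers, vals, [3, 11, 25, 5], -1))
print(interval_lookup_sketch(lowers, uppers, vals, [3, 11, 25, 5], -1, tiebreak=[0, 5, 1]))
```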
Determine if the provided name is associated with a registered Arkouda object.
This function checks if the name is found in the registry of objects,
and optionally checks if it is registered as a component of a registered object.
Parameters:
name (str) – The name to check for in the registry.
as_component (bool, default=False) – When True, the function checks if the name is registered as a component
of a registered object (rather than as a standalone object).
Returns:
True if the name is found in the registry, False otherwise.
Raises:
KeyError – If the registry query encounters an issue (e.g., invalid registry data or access issues).
Examples
>>> import arkouda as ak
Check if a name is registered as an object
>>> obj = ak.array([1, 2, 3])
>>> registered_obj = obj.register("my_array")
>>> result = ak.is_registered("my_array")
>>> print(result)
True
>>> registered_obj.unregister()
Check if a name is registered as a component
>>> result = ak.is_registered("my_component", as_component=True)
>>> print(result)
False
Return a boolean pdarray where index i indicates whether string i of the
Strings has all numeric characters. There are 1922 Unicode characters that
qualify as numeric, including the digits 0 through 9, superscripts and
subscripted digits, special characters with the digits encircled or
enclosed in parens, “vulgar fractions,” and more.
Returns:
True for elements that are numerics, False otherwise
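The character classes listed above match Python's str.isnumeric. The sketch uses plain Python to show which strings qualify, assuming (not confirmed here) that the server-side check agrees element by element. Note that signs, decimal points and empty strings do not count as numeric.

```python
samples = ["123", "½", "²", "٣", "1.5", "-1", ""]
flags = [s.isnumeric() for s in samples]
print(flags)  # [True, True, True, True, False, False, False]
```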
If you need a stricter way to identify a numerical scalar, use
isinstance(x, numbers.Number), as that returns False for most
non-numerical elements such as strings.
In most cases np.ndim(x) == 0 should be used instead of this function,
as that will also return True for 0d arrays. This is how NumPy overloads
functions in the style of the dx arguments to gradient and
the bins argument to histogram. Some key differences:
x | isscalar(x) | np.ndim(x) == 0
PEP 3141 numeric objects (including builtins) | True | True
builtin string and buffer objects | True | True
other builtin objects, like pathlib.Path, Exception, the result of re.compile
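The difference between isscalar, np.ndim(x) == 0 and numbers.Number is easiest to see on a few concrete inputs. This uses NumPy's np.isscalar, which the notes above describe.

```python
import numbers

import numpy as np

zero_d = np.array(3.0)  # a 0-d array, not a scalar
print(np.isscalar(3.0), np.ndim(3.0) == 0)        # True True
print(np.isscalar("abc"), np.ndim("abc") == 0)    # True True
print(np.isscalar(zero_d), np.ndim(zero_d) == 0)  # False True
print(isinstance("abc", numbers.Number))          # False
```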
stop (Union[numeric_scalars, pdarray]) – The end value of the sequence, unless endpoint is set to False.
In that case, the sequence consists of all but the last of num+1
evenly spaced samples, so that stop is excluded. Note that the step
size changes when endpoint is False.
num (int, optional) – Number of samples to generate. Default is 50. Must be non-negative.
endpoint (bool, optional) – If True, stop is the last sample. Otherwise, it is not included.
Default is True.
dtype (dtype, optional) – Allowed for compatibility with numpy linspace, but anything entered
is ignored. The output is always ak.float64.
axis (int, optional) – The axis in the result to store the samples. Relevant only if start
or stop are array-like. By default (0), the samples will be along a
new axis inserted at the beginning. Use -1 to get an axis at the end.
Returns:
There are num equally spaced samples in the closed interval
[start,stop] or the half-open interval [start,stop)
(depending on whether endpoint is True or False).
Raises:
TypeError – Raised if start or stop is not a float or a pdarray, or if num
is not an int, or if endpoint is not a bool, or if dtype is anything
other than None or float64, or axis is not an integer.
ValueError – Raised if axis is not a valid axis for the given data.
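The note that the step size changes with endpoint follows from the divisor. It is (stop - start) / (num - 1) when stop is included and (stop - start) / num when it is excluded. NumPy's np.linspace, which this function mirrors, shows both cases.

```python
import numpy as np

closed = np.linspace(0.0, 1.0, num=5)                     # step = (1 - 0) / (5 - 1) = 0.25
half_open = np.linspace(0.0, 1.0, num=5, endpoint=False)  # step = (1 - 0) / 5 = 0.2
print(closed)
print(half_open)
```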
This function returns a boolean pdarray where each element is
the logical negation of the corresponding element in x. For
boolean arrays, this is equivalent to applying the unary ~
operator. For numeric arrays, zero is treated as False and
non-zero as True.
Parameters:
x (pdarray) – Input array on which to compute element-wise logical NOT.
Returns:
A boolean pdarray with the same shape as x containing
the result of the NOT operation applied element-wise.
This is a simplified version of numpy.logical_not(). It
currently does not support keyword arguments such as out or
where, and always allocates a new result array.
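NumPy's numpy.logical_not, which this function simplifies, shows the numeric rule (zero is False, non-zero is True) and the equivalence with ~ on booleans.

```python
import numpy as np

x = np.array([0, 2, -1, 0])
not_x = np.logical_not(x)  # zero -> True, any non-zero -> False
print(not_x)

b = np.array([True, False])
assert np.array_equal(np.logical_not(b), ~b)  # same as unary ~ for booleans
```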
stop (Union[numeric_scalars, pdarray]) – The end value of the sequence, unless endpoint is set to False.
In that case, the sequence consists of all but the last of num+1
evenly spaced samples, so that stop is excluded. Note that the step
size changes when endpoint is False.
num (int, optional) – Number of samples to generate. Default is 50. Must be non-negative.
base (numeric_scalars, optional) – The base of the log space. Defaults to 10.0.
endpoint (bool, optional) – If True, stop is the last sample. Otherwise, it is not included.
Default is True.
dtype (Union[None, float64]) – Allowed for compatibility with numpy, but ignored. Outputs are always float.
axis (int, optional) – The axis in the result to store the samples. Relevant only if start
or stop are array-like. By default (0), the samples will be along a
new axis inserted at the beginning. Use -1 to get an axis at the end.
Returns:
There are num logarithmically spaced samples in the closed interval
base ** [start, stop] or the half-open interval base ** [start, stop)
(depending on whether endpoint is True or False).
Raises:
TypeError – Raised if start or stop is not a float or a pdarray, or if num
is not an int, or if endpoint is not a bool, or if dtype is anything
other than None or float64, or axis is not an integer.
ValueError – Raised if axis is not a valid axis for the given data, or if base < 0.
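A log-spaced sequence is simply base raised to a linearly spaced exponent. The NumPy check below confirms that identity for np.logspace, which this function mirrors.

```python
import numpy as np

start, stop, num = 0.0, 3.0, 4
samples = np.logspace(start, stop, num)  # base defaults to 10.0
# Equivalent definition: base ** linspace(start, stop, num)
same = 10.0 ** np.linspace(start, stop, num)
powers_of_two = np.logspace(0, 4, 5, base=2.0)
print(samples, powers_of_two)
```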
Return a pair of integers, whose ratio is exactly equal to the original
floating point number, and with a positive denominator.
Raise OverflowError on infinities and a ValueError on NaNs.
Apply the function defined by the mapping keys -> values to arguments.
Parameters:
keys ((sequence of) array-like) – The domain of the function. Entries must be unique (if a sequence of
arrays is given, each row is treated as a tuple-valued entry).
values (pdarray) – The range of the function. Must be the same length as keys.
arguments ((sequence of) array-like) – The arguments on which to evaluate the function. Must have same dtype
(or tuple of dtypes, for a sequence) as keys.
fillvalue (scalar) – The default value to return for arguments not in keys.
Returns:
evaluated – The result of evaluating the function over arguments.
While the values cannot be Strings (or other complex objects), the same
result can be achieved by passing an arange as the values, then using
the return as indices into the desired object.
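The sketch below shows the mapping semantics and the arange trick for string-valued functions. It uses a client-side sort and np.searchsorted. lookup_sketch is a hypothetical helper for non-empty 1-D numeric keys and is not how Arkouda implements lookup.

```python
import numpy as np


def lookup_sketch(keys, values, arguments, fillvalue):
    """Hypothetical helper: evaluate the mapping keys -> values at `arguments`."""
    keys, values, arguments = map(np.asarray, (keys, values, arguments))
    order = np.argsort(keys)
    sorted_keys = keys[order]
    pos = np.minimum(np.searchsorted(sorted_keys, arguments), len(keys) - 1)
    found = sorted_keys[pos] == arguments
    out = np.full(arguments.shape, fillvalue, dtype=values.dtype)
    out[found] = values[order[pos[found]]]
    return out


keys = [10, 30, 20]
names = ["ten", "thirty", "twenty"]  # Strings cannot be values directly...
idx = lookup_sketch(keys, np.arange(len(keys)), [20, 40, 10], -1)
# ...so look up positions, then index the desired object with them.
result = [names[i] if i >= 0 else None for i in idx]
print(result)  # ['twenty', None, 'ten']
```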
Compute the product of two matrices.
If both are 1D, this returns a simple dot product.
If both are 2D, it returns a conventional matrix multiplication.
If only one is 1D, the result matches the “dot” function, so we use that.
If neither is 1D and at least one is > 2D, then broadcasting is involved.
If pda_L’s shape is [(leftshape),m,n] and pda_R’s shape is [(rightshape),n,k],
then the result will have shape [(common shape),m,k] where common shape is a
shape that both leftshape and rightshape can be broadcast to.
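The shape rule above is the same one NumPy's np.matmul follows. The NumPy shapes below illustrate the 1-D, stacked, and broadcast cases, on the assumption that Arkouda's result shapes agree.

```python
import numpy as np

# Both 1-D: a plain dot product.
dot = np.matmul(np.array([1, 2, 3]), np.array([4, 5, 6]))  # 32

# (2, 3, 4) @ (4, 5): the 2-D right operand is applied to each of the 2 stacked matrices.
s1 = (np.ones((2, 3, 4)) @ np.ones((4, 5))).shape  # (2, 3, 5)

# leftshape (2, 1) and rightshape (5,) broadcast to (2, 5); m = 3, k = 6.
s2 = np.matmul(np.ones((2, 1, 3, 4)), np.ones((5, 4, 6))).shape  # (2, 5, 3, 6)
print(dot, s1, s2)
```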
Return the element-wise maximum of x1 and x2. Where either is a nan, return nan,
else the greater of x1, x2. If x1 and x2 are not the same shape, they are first
broadcast to a mutual shape, if possible.
Returns:
The element-wise maximum of x1 and x2. If both are scalars, it invokes
numpy maximum; otherwise, where either is a nan, the returned pdarray
stores nan. Where neither is a nan, it stores the maximum of x1 and x2.
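NaN propagation is the main difference from a plain comparison. NumPy's np.maximum, which is invoked in the scalar case, shows it.

```python
import numpy as np

x1 = np.array([1.0, np.nan, 3.0])
x2 = np.array([2.0, 2.0, np.nan])
out = np.maximum(x1, x2)  # NaN in either input propagates to the result
print(out)
```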
Raises:
TypeError – Raised if pda is not a pdarray or k is not an integer
ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1
Notes
This call is equivalent in value to a[ak.argsort(a)[-k:]]
and generally outperforms this operation.
This reduction will see a significant drop in performance as k grows
beyond a certain value. This threshold is system dependent, but performance
degradation has generally been observed at around k = 5 million.
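Taking the k largest values does not require a full sort. On a single machine, NumPy's np.argpartition selects them in linear time. The sketch below checks that the result equals a[np.argsort(a)[-k:]]. maxk_sketch is hypothetical, returns values in ascending order by assumption, and does not reflect Arkouda's distributed algorithm.

```python
import numpy as np


def maxk_sketch(a, k):
    # argpartition moves the k largest values into the last k slots (unordered).
    idx = np.argpartition(a, -k)[-k:]
    return np.sort(a[idx])


a = np.array([7, 1, 9, 4, 3, 8])
k = 3
top = maxk_sketch(a, k)
assert np.array_equal(top, a[np.argsort(a)[-k:]])  # same values as the full-sort approach
print(top)  # [7 8 9]
```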
Compute the median of a given array. Only the 1-D case is currently supported.
Parameters:
pda (pdarray) – The input data, in pdarray form, numeric type or boolean
Returns:
The median of the entire pdarray
The array is sorted, and then if the number of elements is odd,
the return value is the middle element. If even, it is the
mean of the two middle elements.
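The procedure described above (sort, then take the middle element or average the two middle elements) can be written directly. median_sketch is a hypothetical client-side helper and is cross-checked against np.median.

```python
import numpy as np


def median_sketch(values):
    s = np.sort(np.asarray(values, dtype=float))
    mid = s.size // 2
    if s.size % 2 == 1:
        return s[mid]                   # odd count: the middle element
    return (s[mid - 1] + s[mid]) / 2.0  # even count: mean of the two middle elements


print(median_sketch([3, 1, 2]))     # 2.0
print(median_sketch([4, 1, 3, 2]))  # 2.5
```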
Return the element-wise minimum of x1 and x2. Where either is a nan, return nan,
else the lesser of x1, x2. If x1 and x2 are not the same shape, they are first
broadcast to a mutual shape, if possible.
Returns:
The element-wise minimum of x1 and x2. If both are scalars, it invokes
numpy minimum; otherwise, where either is a nan, the returned pdarray
stores nan. Where neither is a nan, it stores the minimum of x1 and x2.
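Broadcasting to a mutual shape happens before the comparison. The sketch uses NumPy's np.minimum with a 1-D operand stretched across the rows of a 2-D one, on the assumption that Arkouda broadcasts the same way.

```python
import numpy as np

x1 = np.array([[1.0, 5.0], [np.nan, 2.0]])
x2 = np.array([3.0, 3.0])  # shape (2,) broadcasts against shape (2, 2)
out = np.minimum(x1, x2)
print(out)
```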
Raises:
ValueError – Raised if the pda is empty, or pda.ndim > 1, or k < 1
Notes
This call is equivalent in value to a[ak.argsort(a)[:k]]
and generally outperforms this operation.
This reduction will see a significant drop in performance as k grows
beyond a certain value. This threshold is system dependent, but performance
degradation has generally been observed at around k = 5 million.
x2 (pdarray, numeric_scalars, or bigint) – The direction where to look for the next representable value of x1.
If x1.shape != x2.shape, they must be broadcastable to a common shape
(which becomes the shape of the output).
Returns:
The next representable values of x1 in the direction of x2.
This is a scalar if both x1 and x2 are scalars.
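The spacing between adjacent floats depends on magnitude and direction. NumPy's np.nextafter shows the step just above and just below 1.0 for float64.

```python
import numpy as np

eps = np.finfo(np.float64).eps  # 2**-52
up = np.nextafter(1.0, 2.0)     # next float above 1.0
down = np.nextafter(1.0, 0.0)   # next float below 1.0
print(up - 1.0 == eps, 1.0 - down == eps / 2)  # True True: spacing halves below 1.0
```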
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays.
Included for consistency: the binary representation of one is all zeros
ending in a single one bit, so the result does not depend on max_bits.
The basic arkouda array class. This class contains only the
attributes of the array; the data resides on the arkouda
server. When a server operation results in a new array, arkouda
will create a pdarray instance that points to the array data on
the server. As such, the user should not initialize pdarray
instances directly.
Return True iff all elements of the array along the given axis evaluate to True.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
boolean if axis is omitted, pdarray if axis is supplied
Return True iff any element of the array along the given axis evaluates to True.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
boolean if axis is omitted, else pdarray if axis is supplied
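The axis and keepdims parameters shared by these reductions behave like their NumPy counterparts. The sketch uses NumPy booleans to show a whole-array result, a per-axis result, and a kept singleton dimension.

```python
import numpy as np

m = np.array([[True, False, True],
              [True, True, True]])
whole = m.all()                          # single boolean over the entire array
per_column = m.all(axis=0)               # one result per column
kept = m.any(axis=1, keepdims=True)      # reduced axis kept with length 1
print(whole, per_column, kept.shape)
```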
Return index of the first occurrence of the maximum along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
int64 or uint64 if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
Return index of the first occurrence of the minimum along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
int64 or uint64 if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
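With ties, the first occurrence is reported. NumPy's argmax and argmin follow the same rule, shown below both over the whole array and along an axis.

```python
import numpy as np

a = np.array([1, 5, 2, 5, 0, 0])
print(np.argmax(a))  # 1 (first of the two 5s)
print(np.argmin(a))  # 4 (first of the two 0s)

m = np.array([[3, 9], [9, 1]])
print(np.argmax(m, axis=0))  # [1 0]
```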
algorithm (SortingAlgorithm, default SortingAlgorithm.RadixSortLSD) – The algorithm to use for sorting.
axis (int_scalars, default 0) – The axis to sort along. Must be between -1 and the array rank.
ascending (bool, default True) – Whether to sort in ascending order. If False, returns a reversed permutation.
Note: ascending=False is only supported for 1D arrays.
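A "reversed permutation" can be read as the ascending permutation reversed, and the sketch below assumes that reading. A side effect is that tied elements come out in reverse of their original order. It uses NumPy's stable argsort, not Arkouda's RadixSortLSD.

```python
import numpy as np

a = np.array([3, 1, 2, 1])
asc = np.argsort(a, kind="stable")  # [1, 3, 2, 0]: the tied 1s keep original order
desc = asc[::-1]                    # [0, 2, 3, 1]: the tied 1s are now reversed
print(a[desc])                      # [3 2 1 1]
```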
Create a list of uint pdarrays from a bigint pdarray.
The first item in return will be the highest 64 bits of the
bigint pdarray and the last item will be the lowest 64 bits.
Returns:
A list of uint pdarrays in which the first item holds the highest 64 bits
of the bigint pdarray and the last item holds the lowest 64 bits.
Return type:
List[pdarray]
Raises:
RuntimeError – Raised if there is a server-side error thrown
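The split into 64-bit limbs, highest first, is plain shifting and masking. bigint_to_uint_lists is a hypothetical pure-Python helper for non-negative integers. It mimics the ordering described above and is not Arkouda code.

```python
def bigint_to_uint_lists(values):
    """Hypothetical helper: split non-negative ints into 64-bit limbs, highest limb first."""
    mask = (1 << 64) - 1
    widest = max(v.bit_length() for v in values)
    n_limbs = max(1, -(-widest // 64))  # ceiling division
    return [[(v >> (64 * i)) & mask for v in values] for i in reversed(range(n_limbs))]


vals = [5, (1 << 64) + 7, 3 << 64]
limbs = bigint_to_uint_lists(vals)
print(limbs)  # [[0, 1, 3], [5, 7, 0]]
```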
Attempt to cast scalar other to the element dtype of this pdarray,
and print the resulting value to a string (e.g. for sending to a
server command). The user should not call this function directly.
Parameters:
other (object) – The scalar to be cast to the pdarray.dtype
Return type:
string representation of np.dtype corresponding to the other parameter
Raises:
TypeError – Raised if the other parameter cannot be converted to
a NumPy dtype
Return True iff the array (or given axis of the array) is monotonically non-decreasing.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
boolean if axis is omitted, else pdarray if axis is supplied
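"Monotonically non-decreasing" means every element is less than or equal to its successor, so repeated values are allowed. is_sorted_sketch is a hypothetical 1-D NumPy helper that expresses this check.

```python
import numpy as np


def is_sorted_sketch(a):
    a = np.asarray(a)
    # Compare each element with its successor; empty and length-1 inputs count as sorted.
    return bool(np.all(a[:-1] <= a[1:]))


print(is_sorted_sketch([1, 2, 2, 5]), is_sorted_sketch([1, 3, 2]))  # True False
```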
This method is equivalent to arkouda.logical_not(self).
It returns a boolean pdarray where each element is the
logical negation of the corresponding element in self.
Returns:
A boolean pdarray with the same shape as self
containing the result of the NOT operation.
Works as a method of a pdarray (e.g. a.logical_not())
or as a standalone function (e.g. ak.logical_not(a)). For
boolean arrays, this is equivalent to applying the unary ~
operator. For numeric arrays, zero is treated as False and
non-zero as True.
Return max of array elements along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
numeric_scalar if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
The mean calculated from the pda sum and size, along the axis/axes if
those are given.
Return min of array elements along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
numeric_scalar if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
Return prod of array elements along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
numeric_scalars if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
Register this pdarray with a user defined name in the arkouda server
so it can be attached to later using pdarray.attach().
This is an in-place operation; registering a pdarray more than once will
update the name in the registry and remove the previously registered name.
A name can only be registered to one pdarray at a time.
Parameters:
user_defined_name (str) – user defined name array is to be registered under
Returns:
The same pdarray which is now registered with the arkouda server and has an updated name.
This is an in-place modification, the original is returned to support a
fluid programming style.
Please note you cannot register two different pdarrays with the same name.
Raises:
TypeError – Raised if user_defined_name is not a str
RegistrationError – If the server was unable to register the pdarray with the user_defined_name
If the user is attempting to register more than one pdarray with the same name,
the former should be unregistered first to free up the registration name.
Return the standard deviation of values in the array. The standard
deviation is implemented as the square root of the variance.
Parameters:
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
The scalar standard deviation of the array, or the standard deviation
along the axis/axes if supplied.
Notes
The standard deviation is the square root of the average of the squared
deviations from the mean, i.e., std = sqrt(mean((x - x.mean())**2)).
The average squared deviation is normally calculated as
((x - x.mean())**2).sum() / N, where N = len(x). If, however, ddof is specified,
the divisor N - ddof is used instead. In standard statistical
practice, ddof=1 provides an unbiased estimator of the variance
of a hypothetical infinite population. ddof=0 provides a maximum likelihood
estimate of the variance for normally distributed variables. The
standard deviation computed in this function is the square root of
the estimated variance, so even with ddof=1, it will not be an
unbiased estimate of the standard deviation per se.
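The role of ddof is easiest to see with a small worked case. For [1, 2, 3, 4] the squared deviations sum to 5, so the variance is 5/4 with ddof=0 and 5/3 with ddof=1. The sketch computes both by hand and checks them against NumPy's np.std.

```python
import math

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
sq_dev = (x - x.mean()) ** 2                   # [2.25, 0.25, 0.25, 2.25], sum = 5.0
N = len(x)
std_ddof0 = math.sqrt(sq_dev.sum() / N)        # divisor N:     sqrt(5 / 4)
std_ddof1 = math.sqrt(sq_dev.sum() / (N - 1))  # divisor N - 1: sqrt(5 / 3)
print(std_ddof0, std_ddof1)
```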
Return sum of array elements along the given axis.
Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
numeric_scalars if axis is omitted, in which case operation is done over entire array
pdarray if axis is supplied, in which case the operation is done along that axis
When axis is not None, this function does the same thing as “fancy” indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements along a given axis.
A call such as np.take(arr, indices, axis=3) is equivalent to arr[:, :, :, indices, ...].
Parameters:
indices (numeric_scalars or pdarray) – The indices of the values to extract. Scalar indices are also allowed.
axis (int, optional) – The axis over which to select values. By default, the flattened input array is used.
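NumPy's np.take shows both modes described above. With an axis it matches fancy indexing, and without one it indexes the flattened array.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
idx = np.array([0, 3])

cols = np.take(arr, idx, axis=1)
assert np.array_equal(cols, arr[:, idx])  # same as fancy indexing along axis 1

flat_pick = np.take(arr, [5, 11])  # no axis: indices refer to the flattened array
print(cols)
print(flat_pick)
```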
Write pdarray to CSV file(s). File will contain a single column
with the pdarray data. All CSV files written by Arkouda include
a header denoting data types of the columns.
Parameters:
prefix_path (str) – Filename prefix to be used for saving files. Files will have
_LOCALE#### appended when they are written to disk.
dataset (str, defaults to "array") – Column name to save the pdarray under.
col_delim (str, defaults to ",") – Value to be used to separate columns within the file. Please
be sure that the value used DOES NOT appear in your dataset.
overwrite (bool, defaults to False) – If True, existing files matching the provided path will be overwritten.
If False and existing files are found, an error will be returned.
Returns:
response message
Return type:
str
Raises:
ValueError – Raised if all datasets are not present in all parquet files or if one
or more of the specified files do not exist
RuntimeError – Raised if one or more of the specified files cannot be opened.
If 'allow_errors' is True, this may be raised if no values are returned
from the server.
TypeError – Raised if the server returns an unknown arkouda_type
Notes
CSV format is not currently supported by load/load_all operations.
The column delimiter is expected to be the same for all column names and data.
Be sure that column delimiters are not found within your data.
All CSV files must delimit rows using newline ("\n") at this time.
Convert the array to a Numba DeviceNDArray, transferring array data from the
arkouda server to Python via ndarray. If the array exceeds a built-in size limit,
a RuntimeError is raised.
Returns:
A Numba device array, on the GPU, with the same attributes and data as the pdarray
Return type:
numba.DeviceNDArray
Raises:
ImportError – Raised if CUDA is not available
ModuleNotFoundError – Raised if Numba is either not installed or not enabled
RuntimeError – Raised if there is a server-side error thrown in the course of retrieving
the pdarray.
Notes
The number of bytes in the array cannot exceed client.maxTransferBytes,
otherwise a RuntimeError will be raised. This is to protect the user
from overflowing the memory of the system on which the Python client
is running, under the assumption that the server is running on a
distributed system with much more memory than the client. The user
may override this limit by setting client.maxTransferBytes to a larger
value, but proceed with caution.
Save the pdarray to HDF5.
The object can be saved to a collection of files or single file.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode ({'truncate', 'append'}) – By default, truncate (overwrite) output files, if they exist.
If ‘append’, attempt to create new dataset in existing files.
file_type ({"single", "distribute"}) – Default: “distribute”
When set to single, dataset is written to a single file.
When distribute, dataset is written on a file per locale.
This is only supported by HDF5 files and will have no impact on Parquet files.
Return type:
string message indicating result of save operation
Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
Notes
The prefix_path must be visible to the arkouda server and the user must
have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i>
ranges from 0 to numLocales - 1 for file_type='distribute'. Otherwise,
the file name will be prefix_path.
- If any of the output files already exist and
the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’
and the number of output files is less than the number of locales or a
dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to
determine the file format.
Examples
>>> import arkouda as ak
>>> a = ak.arange(25)
Saving without an extension
>>> a.to_hdf('path/prefix', dataset='array')  # doctest: +SKIP
Saves the array to numLocales HDF5 files with the name cwd/path/name_prefix_LOCALE####
Saving with an extension (HDF5)
>>> a.to_hdf('path/prefix.h5', dataset='array')  # doctest: +SKIP
Saves the array to numLocales HDF5 files with the name
cwd/path/name_prefix_LOCALE####.h5 where #### is replaced by each locale number
Saving to a single file
>>> a.to_hdf('path/prefix.hdf5', dataset='array', file_type='single')  # doctest: +SKIP
Saves the array into a single HDF5 file on the root node:
cwd/path/name_prefix.hdf5
Convert the array to a np.ndarray, transferring array data from the
Arkouda server to client-side Python. Note: if the pdarray size exceeds
client.maxTransferBytes, a RuntimeError is raised.
Returns:
A numpy ndarray with the same attributes and data as the pdarray
Return type:
np.ndarray
Raises:
RuntimeError – Raised if there is a server-side error thrown, if the pdarray size
exceeds the built-in client.maxTransferBytes size limit, or if the bytes
received do not match the expected number of bytes
Notes
The number of bytes in the array cannot exceed client.maxTransferBytes,
otherwise a RuntimeError will be raised. This is to protect the user
from overflowing the memory of the system on which the Python client
is running, under the assumption that the server is running on a
distributed system with much more memory than the client. The user
may override this limit by setting client.maxTransferBytes to a larger
value, but proceed with caution.
Save the pdarray to Parquet. The result is a collection of files,
one file per locale of the arkouda server, where each filename starts
with prefix_path. Each locale saves its chunk of the array to its
corresponding file.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files (must not already exist)
mode ({'truncate', 'append'}) – By default, truncate (overwrite) output files, if they exist.
If ‘append’, attempt to create new dataset in existing files.
compression (str (Optional)) – (None | “snappy” | “gzip” | “brotli” | “zstd” | “lz4”)
Sets the compression type used with Parquet files
Return type:
string message indicating result of save operation
Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
Notes
The prefix_path must be visible to the arkouda server and the user must
have write permission.
- Output files have names of the form <prefix_path>_LOCALE<i>, where <i>
ranges from 0 to numLocales - 1 for file_type='distribute'.
- ‘append’ write mode is supported, but is not efficient.
- If any of the output files already exist and
the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’
and the number of output files is less than the number of locales or a
dataset with the same name already exists, a RuntimeError will result.
- Any file extension can be used. The file I/O does not rely on the extension to
determine the file format.
Examples
>>> import arkouda as ak
>>> a = ak.arange(25)
Saving without an extension
>>> a.to_parquet('path/prefix', dataset='array')  # doctest: +SKIP
Saves the array to numLocales Parquet files with the name cwd/path/name_prefix_LOCALE####
Saving with an extension (Parquet)
>>> a.to_parquet('path/prefix.parquet', dataset='array')  # doctest: +SKIP
Saves the array to numLocales Parquet files with the name
cwd/path/name_prefix_LOCALE####.parquet where #### is replaced by each locale number
Convert the array to a list, transferring array data from the
Arkouda server to client-side Python. Note: if the pdarray size exceeds
client.maxTransferBytes, a RuntimeError is raised.
Raises:
RuntimeError – Raised if there is a server-side error thrown, if the pdarray size
exceeds the built-in client.maxTransferBytes size limit, or if the number of bytes
received does not match the expected number of bytes
Notes
The number of bytes in the array cannot exceed client.maxTransferBytes,
otherwise a RuntimeError will be raised. This is to protect the user
from overflowing the memory of the system on which the Python client
is running, under the assumption that the server is running on a
distributed system with much more memory than the client. The user
may override this limit by setting client.maxTransferBytes to a larger
value, but proceed with caution.
hostname (str) – The hostname where the Arkouda server intended to
receive the pdarray is running.
port (int_scalars) – The port to send the array over. This needs to be an
open port (i.e., not one that the Arkouda server is
running on). This will open numLocales consecutive ports,
using the range {port..(port+numLocales-1)} (e.g., when running an
Arkouda server on 4 nodes with 1234 passed as port,
Arkouda will use ports 1234, 1235, 1236,
and 1237 to send the array data).
This port must match the port passed to the call to
ak.receive_array().
Return type:
A message indicating a complete transfer
Raises:
ValueError – Raised if the op is not within the pdarray.BinOps set
TypeError – Raised if other is not a pdarray or the pdarray.dtype is not
a supported dtype
Overwrite the dataset with the name provided with this pdarray. If
the dataset does not exist it is added.
Parameters:
prefix_path (str) – Directory and filename prefix that all output files share
dataset (str) – Name of the dataset to create in files
repack (bool) – Default: True
HDF5 does not release memory on delete. When True, the inaccessible
data (that was overwritten) is removed. When False, the data remains, but is
inaccessible. Setting to False will yield better performance, but will cause
file sizes to expand.
Return type:
str - success message if successful
Raises:
RuntimeError – Raised if a server-side error is thrown saving the pdarray
Notes
If the file does not contain a File_Format attribute indicating how it was saved,
the file name is checked for _LOCALE#### to determine whether it is distributed.
If the dataset provided does not exist, it will be added.
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation
If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
Returns:
The scalar variance of the array, or the variance along the axis/axes
if supplied
The variance is the average of the squared deviations from the mean,
i.e., var=mean((x-x.mean())**2).
The mean of the squared deviations is normally calculated by dividing their sum by N, where N=len(x).
If, however, ddof is specified, the divisor N-ddof is used
instead. In standard statistical practice, ddof=1 provides an
unbiased estimator of the variance of a hypothetical infinite population.
ddof=0 provides a maximum likelihood estimate of the variance for
normally distributed variables.
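The effect of ddof on the divisor can be checked with NumPy, whose var uses the same definition. This is a NumPy illustration under that assumption, not an Arkouda call, so it runs without a server.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = len(x)
sq_dev = (x - x.mean()) ** 2          # squared deviations from the mean (2.5)

pop_var = sq_dev.sum() / n            # ddof=0: divide by N, gives 1.25
sample_var = sq_dev.sum() / (n - 1)   # ddof=1: divide by N - 1, gives 5/3

print(np.var(x), pop_var)
print(np.var(x, ddof=1), sample_var)
```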
Compute the q-th percentile of the data along the specified axis.
Parameters:
a (pdarray) – data whose percentile will be computed
q (pdarray, Tuple, or np.ndarray) – a scalar, tuple, or np.ndarray of q values for the computation. All values
must be in the range 0 <= q <= 100
axis (None, int scalar, or tuple of int scalars) – the axis or axes along which the percentiles are computed. The default is None,
which computes the percentile along a flattened version of the array.
method (string) – one of "inverted_cdf", "averaged_inverted_cdf", "closest_observation",
"interpolated_inverted_cdf", "hazen", "weibull", "linear", "median_unbiased",
"normal_unbiased", "lower", "higher", "midpoint"
keepdims (bool) – True if the degenerate axes are to be retained after slicing, False if not
Returns:
If q is a scalar and axis is None, the result is a scalar.
If q is a scalar and axis is supplied, the result is a pdarray of rank len(axis)
less than the rank of a.
If q is an array and axis is None, the result is a pdarray of shape q.shape.
If q is an array and axis is supplied, the result is a pdarray of rank q.ndim +
a.ndim - len(axis). However, there is an intermediate result of rank
q.ndim + a.ndim. If that rank is not among the compiled ranks, an error is raised
even if the final result would be in the compiled ranks.
Notes
np.percentile also supports the method "nearest"; however, its behavior does not match
the numpy documentation, so it is not supported here.
np.percentile also allows weighted inputs, but only for the method "inverted_cdf".
That is not supported here either.
Raises:
ValueError – Raised if scalar q or any value of array q is outside the range [0,100].
Raised if the method is not one of the 12 supported methods.
Raised if the result would have a rank not in the compiled ranks.
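The interpolation methods differ only when the requested percentile falls between two data points. The NumPy sketch below (assuming NumPy 1.22 or later, where percentile accepts method=) shows four of the supported methods for the 50th percentile of four values, where the virtual index is (n - 1) * 0.5 = 1.5. It illustrates the method names, not Arkouda's implementation.

```python
import numpy as np

data = np.array([1, 2, 3, 4])

# Virtual index 1.5 sits between data[1] == 2 and data[2] == 3
linear = np.percentile(data, 50, method="linear")      # interpolates: 2.5
lower = np.percentile(data, 50, method="lower")        # lower neighbor: 2
higher = np.percentile(data, 50, method="higher")      # upper neighbor: 3
midpoint = np.percentile(data, 50, method="midpoint")  # average of neighbors: 2.5

# An array q with axis=None gives a result of shape q.shape
quartiles = np.percentile(data, [25, 75])              # [1.75, 3.25]
```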
Raises an array to a power. If where is given, the operation will only take place in the positions
where the where condition is True.
Note:
Our implementation of the where argument deviates from numpy. The difference in behavior occurs
at positions where the where argument contains a False. In numpy, these positions will have
uninitialized memory (which can contain anything and will vary between runs). We have chosen to
instead return the value of the original array in these positions.
Parameters:
pda (pdarray) – A pdarray of values that will be raised to a power (pwr)
pwr (integer, float, or pdarray) – The power(s) that pda is raised to
where (Boolean or pdarray) – This condition is broadcast over the input. At locations where the condition is True, the
corresponding value will be raised to the respective power. Elsewhere, it will retain its
original value. Default set to True.
Returns:
a pdarray of values raised to a power, under the boolean where condition.
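In NumPy, the Arkouda behavior described in the note can be reproduced by passing a copy of the input as out, so that positions where the condition is False keep their original values instead of uninitialized memory. This is a NumPy analogue for illustration only.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
mask = np.array([True, False, True, False])

# out starts as a copy of x, so masked-out positions retain their original values
result = np.power(x, 2, out=x.copy(), where=mask)
print(result)  # [1 2 9 4]
```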
Overwrite elements of A with elements from Values based upon a mask array.
Similar to numpy.putmask, where mask = False, A retains its original value,
but where mask = True, A is overwritten with the corresponding entry from Values.
This is similar to ak.where, except that (1) no new pdarray is created, and
(2) Values does not have to be the same size as A and mask.
Parameters:
A (pdarray) – Value(s) used when mask is False (see Notes for allowed dtypes)
mask (pdarray) – Used to choose values from A or Values; must be the same size as A and of type ak.bool_
Values (pdarray) – Value(s) used when mask is True (see Notes for allowed dtypes)
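For comparison, numpy.putmask modifies its first argument in place and, when the values array is shorter than A, repeats it so that position i receives values[i % len(values)]. The snippet shows only NumPy's rule; whether Arkouda assigns a shorter Values array in the same order is an assumption not confirmed by this page.

```python
import numpy as np

a = np.arange(5)
mask = a > 1

# In place; positions 2, 3, 4 receive values[2 % 2], values[3 % 2], values[4 % 2]
np.putmask(a, mask, np.array([-33, -44]))
print(a)  # [  0   1 -33 -44 -33]
```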
Compute the q-th quantile of the data along the specified axis.
Parameters:
a (pdarray) – data whose quantile will be computed
q (pdarray, Tuple, or np.ndarray) – a scalar, tuple, or np.ndarray of q values for the computation. All values
must be in the range 0 <= q <= 1
axis (None, int scalar, or tuple of int scalars) – the axis or axes along which the quantiles are computed. The default is None,
which computes the quantile along a flattened version of the array.
method (string) – one of "inverted_cdf", "averaged_inverted_cdf", "closest_observation",
"interpolated_inverted_cdf", "hazen", "weibull", "linear", "median_unbiased",
"normal_unbiased", "lower", "higher", "midpoint"
keepdims (bool) – True if the degenerate axes are to be retained after slicing, False if not
Returns:
If q is a scalar and axis is None, the result is a scalar.
If q is a scalar and axis is supplied, the result is a pdarray of rank len(axis)
less than the rank of a.
If q is an array and axis is None, the result is a pdarray of shape q.shape.
If q is an array and axis is supplied, the result is a pdarray of rank q.ndim +
a.ndim - len(axis). However, there is an intermediate result of rank
q.ndim + a.ndim. If that rank is not among the compiled ranks, an error is raised
even if the final result would be in the compiled ranks.
Notes
np.quantile also supports the method "nearest"; however, its behavior does not match
the numpy documentation, so it is not supported here.
np.quantile also allows weighted inputs, but only for the method "inverted_cdf".
That is not supported here either.
Raises:
ValueError – Raised if scalar q or any value of array q is outside the range [0,1].
Raised if the method is not one of the 12 supported methods.
Raised if the result would have a rank not in the compiled ranks.
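Quantile and percentile describe the same cut points on different scales, and the rank rule q.ndim + a.ndim - len(axis) can be seen from the result shape. The sketch uses NumPy as a stand-in, assuming Arkouda follows the same conventions as the text above states.

```python
import numpy as np

data = np.array([7.0, 1.0, 4.0, 10.0, 3.0])
q = np.array([0.1, 0.5, 0.9])

# quantile takes q in [0, 1]; percentile takes the same cut points in [0, 100]
by_quantile = np.quantile(data, q)
by_percentile = np.percentile(data, q * 100)

# Array q with a supplied axis: rank is q.ndim + m.ndim - len(axis) = 1 + 2 - 1
m = np.arange(12.0).reshape(3, 4)
per_column = np.quantile(m, q, axis=0)  # shape (3, 4)

# q outside [0, 1] is rejected
try:
    np.quantile(data, 1.5)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
```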
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True, the
corresponding value will be converted from radians to degrees. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing each element of the original pdarray
converted from radians to degrees
Raises:
TypeError – Raised if logmean is neither a float nor an int, logstd is not a float,
seed is not an int, size is not an int, or characters is not a str
Notes
The lengths of the generated strings are distributed \(\mathrm{Lognormal}(\mu, \sigma^2)\),
with \(\mu = \mathrm{logmean}\) and \(\sigma = \mathrm{logstd}\). Thus, the strings will
have an average length of \(\exp(\mu + \sigma^2/2)\), a minimum length of
zero, and a heavy tail towards longer strings.
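The average-length formula can be checked numerically by sampling a lognormal distribution with NumPy. The sketch draws continuous values and does not reproduce how Arkouda turns them into integer string lengths, so it only illustrates the exp(mu + sigma^2/2) mean.

```python
import numpy as np

logmean, logstd = 1.0, 0.5
expected_mean = np.exp(logmean + 0.5 * logstd**2)  # exp(1.125), about 3.08

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=logmean, sigma=logstd, size=200_000)
print(expected_mean, samples.mean())  # the two should agree closely
```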
This function iterates through the dictionary data, registering each object
with its corresponding name. It is useful for batch registering multiple
objects in Arkouda.
Parameters:
data (dict) – A dictionary that maps the name to register the object to the object itself.
For example, {"MyArray": ak.array([0, 1, 2])}.
After calling this function, “array1” and “array2” are registered
in Arkouda, and can be accessed by their names.
>>> ak.unregister_all(["array1", "array2"])
Raises:
ValueError – Raised if repeats is not an int or a 1-dimensional array, if it contains
negative values, or if its size does not match the size of the input array along
axis.
RuntimeError – Raised if the operation fails server-side.
TypeError – Raised if axis is anything but None or an int, or if either a or repeats is invalid
(the a and repeats cases should be impossible).
IndexError – Raised if axis is invalid for the given rank.
Determine the result dtype from one or more inputs, following NumPy’s
promotion rules but extended to support Arkouda bigint semantics.
This function mirrors numpy.result_type for standard NumPy dtypes,
scalars, and arrays, but additionally recognizes Arkouda bigint and
bigint_ values, promoting them according to Arkouda-specific rules.
In mixed-type expressions, the following logic is applied:
- Any presence of bigint or bigint_ promotes the result to:
  - float64 if any float is also present,
  - otherwise bigint.
- Python integers first pass through Arkouda's magnitude-aware dtype()
routing, so extremely large integers may promote to bigint.
- Booleans promote to bool as in NumPy.
- Mixed signed/unsigned integers follow NumPy rules, except that a
non-negative signed scalar combined with unsigned scalars promotes to
the widest unsigned dtype.
- All remaining cases defer to numpy.result_type.
Parameters:
*args (Any) – One or more dtype-like objects, scalars, NumPy arrays, Arkouda arrays,
or any value accepted by numpy.result_type or Arkouda’s
dtype() conversion.
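The bigint rule above can be sketched in plain NumPy. This is a hypothetical illustration only: the string "bigint" stands in for Arkouda's bigint dtype, sketch_result_type is an invented helper rather than Arkouda's implementation, and the Python-integer and signed/unsigned rules are not modeled.

```python
import numpy as np

BIGINT = "bigint"  # hypothetical stand-in for Arkouda's bigint dtype


def sketch_result_type(*args):
    """Hypothetical sketch: apply the bigint rule, otherwise defer to NumPy."""
    others = [a for a in args if not (isinstance(a, str) and a == BIGINT)]
    if len(others) < len(args):  # at least one bigint is present
        if any(np.issubdtype(np.result_type(o), np.floating) for o in others):
            return np.dtype(np.float64)  # bigint mixed with a float
        return BIGINT                    # bigint mixed with non-floats
    return np.result_type(*args)         # all remaining cases


print(sketch_result_type(BIGINT, np.float32))   # float64
print(sketch_result_type(BIGINT, np.int64))     # bigint
print(sketch_result_type(np.int32, np.float32)) # same as numpy.result_type
```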
where (bool or pdarray, default=True) – This condition is applied over the input. At locations where the condition is True, the
corresponding value will be acted on by the function. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing input array elements rounded to the nearest integer
D.update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E is present and has a .keys() method, then does: for k in E.keys(): D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
Return the index of the best interval containing each query value.
Given an array of query vals and non-overlapping, closed intervals, return
the index of the best (see tiebreak) interval containing each query value,
or -1 if not present in any interval.
Parameters:
vals ((sequence of) pdarray(int, uint, float)) – Values to search for in intervals. If multiple arrays, each “row” is an item.
intervals (2-tuple of (sequences of) pdarrays) – Non-overlapping, half-open intervals, as a tuple of
(lower_bounds_inclusive, upper_bounds_exclusive)
Must have same dtype(s) as vals.
tiebreak ((optional) pdarray, numeric) – When a value is present in more than one interval, the interval with the
lowest tiebreak value will be chosen. If no tiebreak is given, the
first containing interval will be chosen.
hierarchical (boolean) – When True, sequences of pdarrays will be treated as components specifying
a single dimension (i.e., hierarchical).
When False, sequences of pdarrays will be treated as specifying multi-dimensional intervals.
Returns:
idx – Index of interval containing each query value, or -1 if not found
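For a single dimension, the lookup can be sketched with a binary search over the sorted lower bounds. The summary above calls the intervals closed while the intervals parameter describes the upper bounds as exclusive; this sketch assumes inclusive upper bounds (change <= to < for half-open intervals). The helper search_intervals_1d is hypothetical and ignores tiebreak and hierarchical.

```python
import numpy as np


def search_intervals_1d(vals, lowers, uppers):
    """Hypothetical sketch: index of the interval containing each value, or -1.

    Assumes non-overlapping intervals with inclusive lower and upper bounds.
    """
    vals = np.asarray(vals)
    lowers = np.asarray(lowers)
    uppers = np.asarray(uppers)
    order = np.argsort(lowers)  # sort intervals by lower bound
    sorted_lo = lowers[order]
    sorted_hi = uppers[order]
    # Rightmost interval whose lower bound is <= each value
    pos = np.searchsorted(sorted_lo, vals, side="right") - 1
    valid = pos >= 0
    clipped = np.clip(pos, 0, None)
    inside = valid & (vals <= sorted_hi[clipped])
    result = np.full(vals.shape, -1, dtype=np.int64)
    result[inside] = order[pos[inside]]  # map back to the original interval index
    return result


# Intervals: index 0 is [10, 15], index 1 is [0, 5]
idx = search_intervals_1d([0, 5, 6, 10, 16, -1], [10, 0], [15, 5])
print(idx)  # [ 1  1 -1  0 -1 -1]
```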
Find indices where elements should be inserted to maintain order.
Find the indices into a sorted array a such that, if the corresponding
elements in v were inserted before the indices, the order of a would be preserved.
Parameters:
a (pdarray) – 1-D input array. Must be sorted in ascending order. sorter is not currently supported.
side ({'left', 'right'}, default='left') – If ‘left’, the index of the first suitable location found is given.
If ‘right’, return the last such index.
x2_sorted (bool, default=False) – If True, assumes that v (x2) is already sorted in ascending order. This can improve performance
for large, sorted search arrays. If False, no assumption is made about the order of v.
Returns:
indices – If v is an array, returns an array of insertion points with the same shape.
If v is a scalar, returns a single integer index.
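The effect of side can be seen with numpy.searchsorted, which this function mirrors for the parameters shown. The example is NumPy only; x2_sorted has no NumPy counterpart.

```python
import numpy as np

a = np.array([1, 2, 2, 3])  # must be sorted in ascending order

left = np.searchsorted(a, 2, side="left")    # 1: before the first 2
right = np.searchsorted(a, 2, side="right")  # 3: after the last 2
many = np.searchsorted(a, [0, 2, 4])         # array v gives an array of indices
print(left, right, many)                     # 1 3 [0 1 4]
```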
Return True if a and b share any Arkouda server-side buffers.
This is an Arkouda analogue of numpy.shares_memory with a simpler definition:
it checks for identical backing buffer identities (same server object names).
Notes
Because Arkouda commonly materializes results (rather than views),
aliasing is rare and usually only true when objects literally reference
the same backing buffers.
For compound containers (e.g., SegArray, Strings, Categorical), we check
all of their component buffers.
If you introduce true view semantics in the future, teach _ak_buffer_names
to surface the base buffer name(s) and view descriptors, and compare bases.
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the sine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing sin for each element
of the original pdarray
Return a pair of integers, whose ratio is exactly equal to the original
floating point number, and with a positive denominator.
Raise OverflowError on infinities and a ValueError on NaNs.
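This method comes from Python's float, which numpy.float64 subclasses. A short demonstration of the exact ratio, the positive denominator, and the two error cases:

```python
import math

import numpy as np

ratio = (0.75).as_integer_ratio()             # (3, 4)
negative = (-2.5).as_integer_ratio()          # (-5, 2): the sign stays in the numerator
from_numpy = np.float64(0.75).as_integer_ratio()

# Non-finite inputs raise instead of returning a ratio
try:
    math.inf.as_integer_ratio()
    inf_raised = False
except OverflowError:
    inf_raised = True

try:
    math.nan.as_integer_ratio()
    nan_raised = False
except ValueError:
    nan_raised = True
```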
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the hyperbolic sine will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing hyperbolic sine for each element
of the original pdarray
Takes the square root of an array. If where is given, the operation will only take place in
the positions where the where condition is True.
Parameters:
pda (pdarray) – A pdarray of values the square roots of which will be computed
where (Boolean or pdarray) – This condition is broadcast over the input. At locations where the condition is True, the
corresponding value will be square rooted. Elsewhere, it will retain its original value.
Default set to True.
Returns:
a pdarray of square roots of the original values, or the original values themselves,
subject to the boolean where condition.
where (bool or pdarray, default=True) – This condition is applied over the input. At locations where the condition is True, the
corresponding value will be acted on by the function. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing the squares of the input
array elements
When axis is not None, this function does the same thing as “fancy” indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements along a given axis.
A call such as np.take(arr,indices,axis=3) is equivalent to arr[:,:,:,indices,...].
Parameters:
a (pdarray or Strings) – The array from which to take elements
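The equivalence with fancy indexing can be checked in NumPy on a 3-D array, where axis=2 corresponds to arr[:, :, indices]. This is a NumPy illustration of the stated rule.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
indices = [0, 2]

taken = np.take(arr, indices, axis=2)  # positions 0 and 2 along the last axis
fancy = arr[:, :, indices]             # the same selection via fancy indexing
```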
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the tangent will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing tangent for each element
of the original pdarray
where (bool or pdarray, default=True) – This condition is broadcast over the input. At locations where the condition is True,
the hyperbolic tangent will be applied to the corresponding value. Elsewhere, it will retain
its original value. Default set to True.
Returns:
A pdarray containing hyperbolic tangent for each element
of the original pdarray
Construct an array by repeating A the number of times given by reps.
If reps has length d, the result will have dimension of max(d,A.ndim).
If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.
If A.ndim > d, reps is promoted to A.ndim by prepending 1s to it. Thus for an A of shape (2, 3, 4, 5), a reps of (2, 2) is treated as (1, 1, 2, 2).
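Both promotion rules can be observed with numpy.tile, which the description above follows. This is a NumPy illustration of the result shapes.

```python
import numpy as np

a = np.array([0, 1, 2])
repeated = np.tile(a, 2)          # [0 1 2 0 1 2]
promoted = np.tile(a, (2, 2))     # a treated as shape (1, 3): result shape (2, 6)

b = np.zeros((2, 3, 4, 5))
padded_reps = np.tile(b, (2, 2))  # reps treated as (1, 1, 2, 2): shape (2, 3, 8, 10)
```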
Return a fixed frequency TimedeltaIndex, with day as the default
frequency. Alias for ak.Timedelta(pd.timedelta_range(args)).
Subject to size limit imposed by client.maxTransferBytes.
Parameters:
start (str or timedelta-like, default None) – Left bound for generating timedeltas.
end (str or timedelta-like, default None) – Right bound for generating timedeltas.
periods (int, default None) – Number of periods to generate.
freq (str or DateOffset, default 'D') – Frequency strings can have multiples, e.g. ‘5H’.
name (str, default None) – Name of the resulting TimedeltaIndex.
closed (str, default None) – Make the interval closed with respect to the given frequency to
the ‘left’, ‘right’, or both sides (None).
Returns:
rng
Return type:
TimedeltaIndex
Notes
Of the four parameters start, end, periods, and freq,
exactly three must be specified. If freq is omitted, the resulting
TimedeltaIndex will have periods linearly spaced elements between
start and end (closed on both sides).
To learn more about the frequency strings, see the pandas documentation on offset aliases.
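Because this function is described as wrapping pd.timedelta_range, the generated values can be previewed with pandas alone (assuming pandas is installed; the ak.Timedelta wrapper needs a running server and is omitted). The second call omits freq, so the elements are linearly spaced and both ends are included.

```python
import pandas as pd

daily = pd.timedelta_range(start="1 day", periods=4)  # freq defaults to 'D'
spaced = pd.timedelta_range(start="1 day", end="2 days", periods=5)  # 6-hour steps

print(list(daily))   # 1, 2, 3, 4 days
print(list(spaced))  # 24h, 30h, 36h, 42h, 48h
```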
axes (Tuple[int,...] Optional, defaults to None) – If specified, must be a tuple which contains a permutation of the axes of pda.
Returns:
the transpose of the input matrix
For a 1-D array, this is the original array.
For a 2-D array, this is the standard matrix transpose.
For an n-D array, if axes are given, their order indicates how the axes are permuted.
If axes is None, the axes are reversed.
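The three cases above can be checked by their shapes using numpy.transpose as a stand-in:

```python
import numpy as np

v = np.arange(3)
x = np.zeros((2, 3, 4))

same = np.transpose(v)                      # 1-D: unchanged
reversed_axes = np.transpose(x)             # axes=None reverses: shape (4, 3, 2)
permuted = np.transpose(x, axes=(1, 0, 2))  # swap the first two axes: shape (3, 2, 4)
```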
where (bool or pdarray, default=True) – This condition is applied over the input. At locations where the condition is True, the
corresponding value will be acted on by the function. Elsewhere, it will retain its
original value. Default set to True.
Returns:
A pdarray containing input array elements truncated toward zero
This function sends a request to unregister the Arkouda object associated
with the specified name. It returns a response message indicating the
success or failure of the operation.
Parameters:
name (str) – The name of the object to unregister.
Returns:
A message indicating the result of the unregister operation.
Return type:
str
Raises:
RuntimeError – If the object associated with the given name does not exist or cannot
be unregistered.
After calling this function, “array1” and “array2” are registered
in Arkouda, and can be accessed by their names.
>>> ak.unregister_all(["array1", "array2"])
Ensure that the input is returned as a list.
If the input is a single pdarray, Strings, or Categorical object, wrap it in a list.
Otherwise, return the input unchanged.
This function differs from histogram() in that it only returns
counts for values that are present, leaving out empty “bins”. This
function delegates all logic to the unique() method where the
return_counts parameter is set to True.
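The difference from a histogram can be seen in NumPy: unique with return_counts reports only the values that occur, while a unit-width histogram over the same span also reports the empty bins. NumPy is used here as an analogue.

```python
import numpy as np

x = np.array([1, 1, 3, 3, 3, 7])

# Only values that occur are reported
values, counts = np.unique(x, return_counts=True)  # [1 3 7], [2 3 1]

# Unit-width bins over [1, 8) also report zeros for 2, 4, 5, 6
hist, edges = np.histogram(x, bins=7, range=(1, 8))  # [2 0 3 0 0 0 1]
```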
ValueError – Raised if x1 and x2 can not be broadcast to a compatible shape
or if the last dimensions of x1 and x2 don’t match.
Notes
This matches the behavior of numpy vecdot, but as commented above, it is not the
behavior of the deprecated vecdot, which calls the chapel-side vecdot function.
This function only uses broadcast_to, broadcast_shapes, ak.sum, and the
binops pdarray multiplication function. The last dimension of x1 and x2 must
match, and it must be possible to broadcast them to a compatible shape.
The deprecated vecdot can be computed via ak.vecdot(a,b,axis=0) on pdarrays
of matching shape.
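The approach described above (check the last dimension, broadcast, multiply, sum over the last axis) can be sketched in NumPy. vecdot_sketch is a hypothetical helper for illustration, not Arkouda's function.

```python
import numpy as np


def vecdot_sketch(x1, x2):
    """Hypothetical helper: broadcast both inputs, multiply, and sum the last axis."""
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    if x1.shape[-1] != x2.shape[-1]:
        raise ValueError("the last dimensions of x1 and x2 must match")
    # Raises ValueError if the shapes cannot be broadcast together
    shape = np.broadcast_shapes(x1.shape, x2.shape)
    product = np.broadcast_to(x1, shape) * np.broadcast_to(x2, shape)
    return np.sum(product, axis=-1)


result = vecdot_sketch([[1, 2, 3], [4, 5, 6]], [1, 0, 1])
print(result)  # [ 4 10]
```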
Create a new structured or unstructured void scalar.
length_or_data : int, array-like, bytes-like, object
One of multiple meanings (see notes). The length or
bytes data of an unstructured void. Or alternatively,
the data to be stored in the new scalar when dtype
is provided.
This can be an array-like, in which case an array may
be returned.
dtype : dtype, optional
If provided the dtype of the new scalar. This dtype must
be “void” dtype (i.e. a structured or unstructured void,
see also defining-structured-types).
Added in version 1.24.
For historical reasons and because void scalars can represent both
arbitrary byte data and structured dtypes, the void constructor
has three calling conventions:
- np.void(5) creates a dtype="V5" scalar filled with five
\0 bytes. The 5 can be a Python or NumPy integer.
- np.void(b"bytes-like") creates a void scalar from the byte string.
The dtype itemsize will match the byte string length, here "V10".
- When a dtype= is passed, the call is roughly the same as an
array creation. However, a void scalar rather than an array is returned.
Please see the examples which show all three different conventions.
This is equivalent to concatenation along the first axis after
1-D arrays of shape (N,) have been reshaped to (1,N). Rebuilds arrays divided by vsplit.
This function makes most sense for arrays with up to 3 dimensions.
For instance, for pixel-data with a height (first axis), width (second axis),
and r/g/b channels (third axis). The functions concatenate, stack and block
provide more general stacking and concatenation operations.
Parameters:
tup (sequence of pdarray) – The arrays must have the same shape along all but the first axis. 1-D arrays
must have the same length. In the case of a single array_like input, it will be
treated as a sequence of arrays; i.e., each element along the zeroth axis is treated
as a separate array.
dtype (str or type, optional) – If provided, the destination array will have this dtype.
casting ({"no", "equiv", "safe", "same_kind", "unsafe"}, optional) – Controls what kind of data casting may occur. Defaults to 'same_kind'. Currently unused.
Returns:
The array formed by stacking the given arrays; it will be at least 2-D.
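The reshaping of 1-D inputs into rows, and the round trip with vsplit, can be seen with numpy.vstack, which this function mirrors:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.vstack((a, b))                         # 1-D inputs become rows: (2, 3)
grown = np.vstack((stacked, np.array([[7, 8, 9]])))  # shapes agree past axis 0: (3, 3)
rebuilt = np.vstack(np.vsplit(stacked, 2))          # undoes vsplit
```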
Return an array with elements chosen from A and B based upon a
conditioning array. As is the case with numpy.where, the return array
consists of values from the first array (A) where the conditioning array
elements are True and from the second array (B) where the conditioning
array elements are False.
Parameters:
condition (pdarray) – Used to choose values from A or B
Raises:
TypeError – Raised if the condition object is not a pdarray; if A or B is not
an int, np.int64, float, np.float64, bool, pdarray, str, Strings, or Categorical;
if pdarray dtypes are not supported or do not match; or if multiple
condition clauses (see Notes section) are applied
ValueError – Raised if the shapes of the condition, A, and B pdarrays are unequal
Notes
A and B must have the same dtype, and only one conditional clause
is supported; e.g., n < 5, n > 1, which is supported in numpy,
is not currently supported in Arkouda.
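The selection rule matches numpy.where: True positions take from A and False positions take from B. The NumPy example below uses a single condition and A and B of the same dtype, in line with the restrictions above.

```python
import numpy as np

condition = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])  # same dtype as A

picked = np.where(condition, A, B)
print(picked)  # [ 1 20  3]
```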
dtype (all_scalars) – Type of resulting array, default ak.float64
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
Included for consistency, as zeros are represented as all zeros, regardless
of the value of max_bits