arkouda.numpy.strings

Classes

Strings

Represents an array of strings whose data resides on the arkouda server.

Module Contents

class arkouda.numpy.strings.Strings(strings_pdarray: arkouda.numpy.pdarrayclass.pdarray, bytes_size: arkouda.numpy.dtypes.int_scalars)[source]

Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions.

entry

Encapsulation of a Segmented Strings array contained on the arkouda server. This is a composite of

  • offsets array: starting indices for each string

  • bytes array: raw bytes of all strings joined by nulls

Type:

pdarray

size

The number of strings in the array

Type:

int_scalars

nbytes

The total number of bytes in all strings

Type:

int_scalars

ndim

The rank of the array (currently only rank 1 arrays supported)

Type:

int_scalars

shape

The sizes of each dimension of the array

Type:

tuple

dtype

The dtype is ak.str_

Type:

type

logger

Used for all logging operations

Type:

ArkoudaLogger

Notes

Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string and (2) bytes, which contains the raw bytes of all strings, delimited by nulls.
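The layout described in this note can be modeled in plain Python. The sketch below is a client-side illustration only, not arkouda's server implementation, and the helper name `to_segments` is hypothetical. It assumes UTF-8 bytes with one null byte after each string, which agrees with the values shown in the get_offsets and get_bytes examples for ['one', 'two', 'three'].

```python
from typing import List, Tuple


def to_segments(strings: List[str]) -> Tuple[List[int], List[int]]:
    """Model the offsets/bytes layout: UTF-8 bytes, each string followed by a null."""
    offsets: List[int] = []
    data: List[int] = []
    for s in strings:
        offsets.append(len(data))       # starting index of this string
        data.extend(s.encode("utf-8"))  # raw bytes of the string
        data.append(0)                  # null byte that delimits the string
    return offsets, data


offsets, data = to_segments(["one", "two", "three"])
print(offsets)  # [0, 4, 8]
print(data)     # [111, 110, 101, 0, 116, 119, 111, 0, 116, 104, 114, 101, 101, 0]
```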

BinOps
argsort(algorithm: arkouda.numpy.sorting.SortingAlgorithm = SortingAlgorithm.RadixSortLSD, ascending: bool = True) arkouda.numpy.pdarrayclass.pdarray[source]

Return the permutation that sorts the Strings.

Parameters:
  • algorithm (SortingAlgorithm, default SortingAlgorithm.RadixSortLSD) – The algorithm to use for sorting.

  • ascending (bool, default True) – Whether to sort in ascending order.

Returns:

The indices that sort the Strings.

Return type:

pdarray
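To make "the permutation that sorts the Strings" concrete, here is a minimal plain-Python model. The function name `argsort_model` is hypothetical, and comparing by UTF-8 bytes is an assumption of the sketch; it does not reproduce the server-side radix sort.

```python
from typing import List


def argsort_model(strings: List[str], ascending: bool = True) -> List[int]:
    """Return indices that put `strings` in sorted order (byte-wise comparison assumed)."""
    order = sorted(range(len(strings)), key=lambda i: strings[i].encode("utf-8"))
    # Descending order is modeled by reversing the ascending permutation.
    return order if ascending else order[::-1]


values = ["pear", "apple", "fig"]
perm = argsort_model(values)
print(perm)                       # [1, 2, 0]
print([values[i] for i in perm])  # ['apple', 'fig', 'pear']
```

Indexing the original array with the returned permutation yields the sorted array, which is how the result of argsort is typically used.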

astype(dtype: numpy.dtype | str) arkouda.numpy.pdarrayclass.pdarray | Strings[source]

Cast values of Strings object to provided dtype.

Parameters:

dtype (np.dtype or str) – Dtype to cast to

Returns:

An arkouda pdarray with values converted to the specified data type

Return type:

pdarray

Notes

This is essentially shorthand for ak.cast(x, ‘<dtype>’) where x is a pdarray.

cached_regex_patterns() List[source]

Returns the regex patterns for which Match objects have been cached.

capitalize() Strings[source]

Return a new Strings in which the first letter of each string is capitalized and the remaining letters are lowercase.

Returns:

Strings from the original replaced with the capitalized equivalent.

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper, Strings.title

Examples

>>> import arkouda as ak
>>> strings = ak.array([f'StrINgS aRe Here {i}' for i in range(5)])
>>> strings
array(['StrINgS aRe Here 0', 'StrINgS aRe Here 1', 'StrINgS aRe Here 2', 'StrINgS aRe Here 3', 'StrINgS aRe Here 4'])
>>> strings.capitalize()
array(['Strings are here 0', 'Strings are here 1', 'Strings are here 2', 'Strings are here 3', 'Strings are here 4'])
static concatenate_uniquely(strings: List[Strings]) Strings[source]

Concatenates a list of Strings into a single Strings object containing only unique strings. Order may not be preserved.

Parameters:

strings (List[Strings]) – List of segmented string objects to concatenate.

Returns:

A new Strings object containing the unique values.

Return type:

Strings

contains(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element contains the given substring.

Parameters:
  • substr (bytes or str_scalars) – The substring in the form of string or byte array to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

True for elements that contain substr, False otherwise

Return type:

pdarray

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> strings
array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5'])
>>> strings.contains('string')
array([True True True True True])
>>> strings.contains('string \\d', regex=True)
array([True True True True True])
copy() Strings[source]

Return a deep copy of the Strings object.

Returns:

A deep copy of the Strings.

Return type:

Strings

decode(fromEncoding: str, toEncoding: str = 'UTF-8') Strings[source]

Return a new Strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding.

Parameters:
  • fromEncoding (str) – The current encoding of the strings object

  • toEncoding (str, default="UTF-8") – The encoding that the strings will be converted to, default to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

property dtype: numpy.dtype

Return the dtype object of the underlying data.

encode(toEncoding: str, fromEncoding: str = 'UTF-8') Strings[source]

Return a new strings object in toEncoding, expecting that the current Strings is encoded in fromEncoding.

Parameters:
  • toEncoding (str) – The encoding that the strings will be converted to

  • fromEncoding (str, default="UTF-8") – The current encoding of the strings object, default to UTF-8

Returns:

A new Strings object in toEncoding

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

endswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element ends with the given substring.

Parameters:
  • substr (bytes or str_scalars) – The suffix to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

True for elements that end with substr, False otherwise

Return type:

pdarray

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.endswith('ing')
array([True True True True True])
>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.endswith('ing \\d', regex = True)
array([True True True True True])
entry: arkouda.numpy.pdarrayclass.pdarray
equals(other) arkouda.numpy.dtypes.bool_scalars[source]

Whether Strings are the same size and all entries are equal.

Parameters:

other (Any) – object to compare.

Returns:

True if the Strings are the same, otherwise False.

Return type:

bool_scalars

Examples

>>> import arkouda as ak
>>> s = ak.array(["a", "b", "c"])
>>> s_cpy = ak.array(["a", "b", "c"])
>>> s.equals(s_cpy)
np.True_
>>> s2 = ak.array(["a", "x", "c"])
>>> s.equals(s2)
np.False_
find_locations(pattern: bytes | arkouda.numpy.dtypes.str_scalars) Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Finds pattern matches and returns pdarrays containing the number, start positions, and lengths of matches.

Parameters:

pattern (bytes or str_scalars) – The regex pattern used to find matches

Returns:

pdarray, int64

For each original string, the number of pattern matches

pdarray, int64

The start positions of pattern matches

pdarray, int64

The lengths of pattern matches

Return type:

Tuple[pdarray, pdarray, pdarray]

Raises:
  • TypeError – Raised if the pattern parameter is not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)])
>>> num_matches, starts, lens = strings.find_locations('\\d')
>>> num_matches
array([2 2 2 2 2])
>>> starts
array([0 9 0 9 0 9 0 9 0 9])
>>> lens
array([1 1 1 1 1 1 1 1 1 1])
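The shape of the three return values can be seen in a small model built on Python's `re` module. This is only a sketch: `find_locations_model` is a hypothetical name, Python's `re` stands in for re2, and positions are character indices (equal to byte offsets for ASCII input). Note that `starts` and `lens` are flat over all matches, while `num_matches` has one entry per input string.

```python
import re
from typing import List, Tuple


def find_locations_model(
    strings: List[str], pattern: str
) -> Tuple[List[int], List[int], List[int]]:
    """Count matches per string and collect the start and length of every match."""
    regex = re.compile(pattern)
    num_matches: List[int] = []
    starts: List[int] = []
    lens: List[int] = []
    for s in strings:
        matches = list(regex.finditer(s))
        num_matches.append(len(matches))                   # one entry per string
        starts.extend(m.start() for m in matches)          # one entry per match
        lens.extend(m.end() - m.start() for m in matches)  # one entry per match
    return num_matches, starts, lens


strings = [f"{i} string {i}" for i in range(1, 6)]
num_matches, starts, lens = find_locations_model(strings, r"\d")
print(num_matches)  # [2, 2, 2, 2, 2]
print(starts)       # [0, 9, 0, 9, 0, 9, 0, 9, 0, 9]
print(lens)         # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```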
findall(pattern: bytes | arkouda.numpy.dtypes.str_scalars, return_match_origins: bool = False) Strings | Tuple[source]

Return a new Strings containing all non-overlapping matches of pattern.

Parameters:
  • pattern (bytes or str_scalars) – Regex used to find matches

  • return_match_origins (bool, default=False) – If True, return a pdarray containing the index of the original string each pattern match is from

Returns:

Strings

Strings object containing only pattern matches

pdarray, int64 (optional)

The index of the original string each pattern match is from

Return type:

Union[Strings, Tuple]

Raises:
  • TypeError – Raised if the pattern parameter is not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.findall('_+', return_match_origins=True)
(array(['_', '___', '____', '__', '___', '____', '___']), array([0 0 1 3 3 3 3]))
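The relationship between the matches and their origins can be modeled with Python's `re` module. This sketch uses the hypothetical helper `findall_model` and assumes Python's `re` behaves like re2 for this simple pattern; it reproduces the example output for the same input.

```python
import re
from typing import List, Tuple


def findall_model(strings: List[str], pattern: str) -> Tuple[List[str], List[int]]:
    """Collect every non-overlapping match plus the index of the string it came from."""
    regex = re.compile(pattern)
    matches: List[str] = []
    origins: List[int] = []
    for idx, s in enumerate(strings):
        for m in regex.finditer(s):
            matches.append(m.group(0))
            origins.append(idx)
    return matches, origins


strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
matches, origins = findall_model(strings, "_+")
print(matches)  # ['_', '___', '____', '__', '___', '____', '___']
print(origins)  # [0, 0, 1, 3, 3, 3, 3]
```

Strings with no match (here indices 2 and 4) contribute nothing to either list, which is why the origins array is needed to map matches back to their source strings.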
flatten() Strings[source]

Return a copy of the array collapsed into one dimension.

Returns:

A copy of the input array, flattened to one dimension.

Return type:

Strings

Note

As multidimensional Strings are not currently supported, flatten on a Strings object will always return itself.

static from_parts(offset_attrib: arkouda.numpy.pdarrayclass.pdarray | str, bytes_attrib: arkouda.numpy.pdarrayclass.pdarray | str) Strings[source]

Assemble a Strings object from separate offset and bytes arrays.

This factory method constructs a segmented Strings array by sending two separate components—offsets and values—to the Arkouda server and instructing it to assemble them into a single Strings object. Use this when offsets and byte data are created or transported independently.

Parameters:
  • offset_attrib (pdarray or str) – The array of starting positions for each string, or a string expression that can be passed to create_pdarray to build it.

  • bytes_attrib (pdarray or str) – The array of raw byte values (e.g., uint8 character codes), or a string expression that can be passed to create_pdarray to build it.

Returns:

A Strings object representing the assembled segmented strings array on the Arkouda server.

Return type:

Strings

Raises:

RuntimeError – If conversion of offset_attrib or bytes_attrib to pdarray fails, or if the server is unable to assemble the parts into a Strings.

Notes

  • Both inputs can be existing pdarray instances or arguments suitable for create_pdarray.

  • Internally uses the CMD_ASSEMBLE command to merge offsets and values.

static from_return_msg(rep_msg: str) Strings[source]

Create a Strings object from an Arkouda server response message.

Parse the server’s response descriptor and construct a Strings array with its underlying pdarray and total byte size.

Parameters:

rep_msg (str) – Server response message of the form "created <name> <type> <size> <ndim> <shape> <itemsize>+... bytes.size <total_bytes>". For example: "created foo Strings 3 1 (3,) 8+created bytes.size 24"

Returns:

A Strings object representing the segmented strings array on the server, initialized with the returned pdarray and byte-size metadata.

Return type:

Strings

Raises:

RuntimeError – If the response message cannot be parsed or does not match the expected format.

Examples

>>> import arkouda as ak
>>> # Example response message (typically from generic_msg)
>>> rep_msg = "created foo Strings 3 1 (3,) 8+created bytes.size 24"
>>> s = ak.Strings.from_return_msg(rep_msg)
>>> isinstance(s, ak.Strings)
True

fullmatch(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.pandas.match.Match[source]

Return a match object where elements match only if the whole string matches the regular expression pattern.

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match only if the whole string matches the regular expression pattern

Return type:

Match

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.fullmatch('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=False; matched=False>
get_bytes() arkouda.numpy.pdarrayclass.pdarray[source]

Getter for the bytes component (uint8 pdarray) of this Strings.

Returns:

Pdarray of bytes of the string accessed

Return type:

pdarray

Example

>>> import arkouda as ak
>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_bytes()
array([111 110 101 0 116 119 111 0 116 104 114 101 101 0])
get_lengths() arkouda.numpy.pdarrayclass.pdarray[source]

Return the length of each string in the array.

Returns:

The length of each string

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

get_offsets() arkouda.numpy.pdarrayclass.pdarray[source]

Getter for the offsets component (int64 pdarray) of this Strings.

Returns:

Pdarray of offsets of the string accessed

Return type:

pdarray

Example

>>> import arkouda as ak
>>> x = ak.array(['one', 'two', 'three'])
>>> x.get_offsets()
array([0 4 8])
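Given the offsets and bytes components, the original strings can be recovered by slicing between consecutive offsets. The sketch below is a plain-Python model with the hypothetical helper `from_segments`; it assumes UTF-8 data with one null byte after each string, as in the get_bytes example.

```python
from typing import List


def from_segments(offsets: List[int], data: List[int]) -> List[str]:
    """Rebuild strings from an offsets list and a null-delimited byte list."""
    out: List[str] = []
    for i, start in enumerate(offsets):
        # A string stops one byte before the next offset (or the end of the data),
        # because that final byte is the null delimiter.
        next_start = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        out.append(bytes(data[start:next_start - 1]).decode("utf-8"))
    return out


offsets = [0, 4, 8]
data = [111, 110, 101, 0, 116, 119, 111, 0, 116, 104, 114, 101, 101, 0]
strings = from_segments(offsets, data)
lengths = [len(s) for s in strings]
print(strings)  # ['one', 'two', 'three']
print(lengths)  # [3, 3, 5]
```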
get_prefixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Return the n-long prefix of each string, where possible.

Parameters:
  • n (int_scalars) – Length of prefix

  • return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-prefix

  • proper (bool, default=True) – If True, only return proper prefixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a prefix.

Returns:

prefixes: Strings

The array of n-character prefixes; the number of elements is the number of True values in the returned mask.

origin_indices: pdarray, bool

Boolean array that is True where the string was long enough to return an n-character prefix, False otherwise.

Return type:

Union[Strings, Tuple[Strings, pdarray]]
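The interaction of n and proper can be illustrated with a short plain-Python model. The helper `get_prefixes_model` is hypothetical and measures length in characters, which matches byte length only for ASCII input; it follows the parameter descriptions above rather than the server code.

```python
from typing import List, Tuple


def get_prefixes_model(
    strings: List[str], n: int, proper: bool = True
) -> Tuple[List[str], List[bool]]:
    """Return n-long prefixes of the qualifying strings and the mask of which qualified."""
    # A proper prefix requires at least one character beyond the prefix itself.
    min_len = n + 1 if proper else n
    mask = [len(s) >= min_len for s in strings]
    prefixes = [s[:n] for s, keep in zip(strings, mask) if keep]
    return prefixes, mask


values = ["ab", "abc", "abcd"]
print(get_prefixes_model(values, 3))                # (['abc'], [False, False, True])
print(get_prefixes_model(values, 3, proper=False))  # (['abc', 'abc'], [False, True, True])
```

The prefixes list is shorter than the input whenever some strings are too short, so the boolean mask is what lets the caller align prefixes with the original positions.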

get_suffixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) Strings | Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Return the n-long suffix of each string, where possible.

Parameters:
  • n (int_scalars) – Length of suffix

  • return_origins (bool, default=True) – If True, return a logical index indicating which strings were long enough to return an n-suffix

  • proper (bool, default=True) – If True, only return proper suffixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a suffix.

Returns:

suffixes: Strings

The array of n-character suffixes; the number of elements is the number of True values in the returned mask.

origin_indices: pdarray, bool

Boolean array that is True where the string was long enough to return an n-character suffix, False otherwise.

Return type:

Union[Strings, Tuple[Strings, pdarray]]

group() arkouda.numpy.pdarrayclass.pdarray[source]

Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered.

Returns:

The permutation that groups the array by value

Return type:

pdarray

See also

GroupBy, unique

Notes

If the arkouda server is compiled with “-sSegmentedString.useHash=true”, then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the “useHash” parameter is false, then a full sort is performed.

Raises:

RuntimeError – Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message
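A grouping permutation differs from a sorting permutation in that it only guarantees contiguity of equal values. The following plain-Python sketch shows one such permutation; `group_model` is a hypothetical helper that buckets by first appearance and is not how the server (hash-based or sort-based) computes its result.

```python
from typing import Dict, List


def group_model(strings: List[str]) -> List[int]:
    """Return a permutation that places equal strings in contiguous blocks."""
    buckets: Dict[str, List[int]] = {}
    for i, s in enumerate(strings):
        buckets.setdefault(s, []).append(i)
    # Blocks appear in first-seen order, which is not a sorted order in general.
    return [i for indices in buckets.values() for i in indices]


values = ["b", "a", "b", "c", "a"]
perm = group_model(values)
print(perm)                       # [0, 2, 1, 4, 3]
print([values[i] for i in perm])  # ['b', 'b', 'a', 'a', 'c']
```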

hash() Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray][source]

Compute a 128-bit hash of each string.

Returns:

A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array.

Return type:

Tuple[pdarray,pdarray]

Notes

The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible.
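The two claims in this note can be checked with simple arithmetic. The sketch below makes two assumptions that are not stated in the source: that the first array holds the upper 64 bits of each hash, and that hash values are uniformly distributed, so the birthday (union) bound applies. The names `combine` and `collision_bound` are hypothetical.

```python
def combine(upper: int, lower: int) -> int:
    """Concatenate two 64-bit halves into one 128-bit value (upper half first, by assumption)."""
    return (upper << 64) | lower


def collision_bound(n: int, bits: int = 128) -> float:
    """Upper bound on the chance that any two of n uniform hashes collide.

    There are n*(n-1)/2 pairs, and each pair collides with probability 2**-bits.
    """
    return (n * (n - 1) // 2) / 2**bits


print(hex(combine(1, 2)))      # 0x10000000000000002
p = collision_bound(10**15)
print(p)                       # on the order of 1e-9
```

Even at 10**15 strings the bound stays near one in a billion, which is the sense in which the collision probability is negligible.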

property inferred_type: str

Return a string of the type inferred from the values.

info() str[source]

Return a JSON formatted string containing information about all components of self.

Returns:

JSON string containing information about all components of self

Return type:

str

is_registered() numpy.bool_[source]

Return True iff the object is contained in the registry.

Returns:

Indicates if the object is contained in the registry

Return type:

bool

Raises:

RuntimeError – Raised if there’s a server-side error thrown

isalnum() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings is alphanumeric.

Returns:

True for elements that are alphanumeric, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> not_alnum = ak.array([f'%Strings {i}' for i in range(3)])
>>> alnum = ak.array([f'Strings{i}' for i in range(3)])
>>> strings = ak.concatenate([not_alnum, alnum])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'Strings0', 'Strings1', 'Strings2'])
>>> strings.isalnum()
array([False False False True True True])
isalpha() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings is alphabetic. This means there is at least one character, and all the characters are alphabetic.

Returns:

True for elements that are alphabetic, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> not_alpha = ak.array([f'%Strings {i}' for i in range(3)])
>>> alpha = ak.array(['StringA','StringB','StringC'])
>>> strings = ak.concatenate([not_alpha, alpha])
>>> strings
array(['%Strings 0', '%Strings 1', '%Strings 2', 'StringA', 'StringB', 'StringC'])
>>> strings.isalpha()
array([False False False True True True])
isdecimal() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings has all decimal characters.

Returns:

True for elements that are decimals, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isdigit

Examples

>>> import arkouda as ak
>>> not_decimal = ak.array([f'Strings {i}' for i in range(3)])
>>> decimal = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_decimal, decimal])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdecimal()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdecimal()
array([False True False False False])
isdigit() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings has all digit characters.

Returns:

True for elements that are digits, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> not_digit = ak.array([f'Strings {i}' for i in range(3)])
>>> digit = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_digit, digit])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isdigit()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isdigit()
array([False True True True False])
isempty() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings is empty.

Returns:

True for elements that are the empty string, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> not_empty = ak.array([f'Strings {i}' for i in range(3)])
>>> empty = ak.array(['' for i in range(3)])
>>> strings = ak.concatenate([not_empty, empty])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '', '', ''])
>>> strings.isempty()
array([False False False True True True])
islower() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings is entirely lowercase.

Returns:

True for elements that are entirely lowercase, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.isupper

Examples

>>> import arkouda as ak
>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.islower()
array([True True True False False False])
isnumeric() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings has all numeric characters. There are 1922 unicode characters that qualify as numeric, including the digits 0 through 9, superscripts and subscripted digits, special characters with the digits encircled or enclosed in parens, “vulgar fractions,” and more.

Returns:

True for elements that are numerics, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> not_numeric = ak.array([f'Strings {i}' for i in range(3)])
>>> numeric = ak.array([f'12{i}' for i in range(3)])
>>> strings = ak.concatenate([not_numeric, numeric])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122'])
>>> strings.isnumeric()
array([False False False True True True])

Special Character Examples

>>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"])
>>> special_strings
array(['3.14', '0', '²', '2³₇', '2³x₇'])
>>> special_strings.isnumeric()
array([False True True True False])
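The differences among isdecimal, isdigit, and isnumeric only show up on non-ASCII characters. As a rough guide, Python's built-in `str` methods of the same names can be compared side by side; they reproduce the outputs documented above for these inputs, although it is an assumption that they agree with arkouda on every Unicode character. The extra sample '½' is added here for illustration.

```python
from typing import Dict, Tuple

samples = ["3.14", "0", "²", "2³₇", "2³x₇", "½"]

# Map each sample to (isdecimal, isdigit, isnumeric) using Python's str methods.
results: Dict[str, Tuple[bool, bool, bool]] = {
    s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples
}

for s, flags in results.items():
    print(repr(s), flags)
# '3.14' (False, False, False)
# '0' (True, True, True)
# '²' (False, True, True)
# '2³₇' (False, True, True)
# '2³x₇' (False, False, False)
# '½' (False, False, True)
```

Each predicate accepts a superset of the previous one: decimal characters are digits, and digits are numeric, but superscripts are not decimal and fractions are not digits.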
isspace() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i has all whitespace characters (' ', '\t', '\n', '\v', '\f', '\r').

Returns:

True for elements that are whitespace, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.islower, Strings.isupper, Strings.istitle

Examples

>>> import arkouda as ak
>>> not_space = ak.array([f'Strings {i}' for i in range(3)])
>>> space = ak.array([' ', '\t', '\n', '\v', '\f', '\r', ' \t\n\v\f\r'])
>>> strings = ak.concatenate([not_space, space])
>>> strings
array(['Strings 0', 'Strings 1', 'Strings 2', ' ', 'u0009', 'n', 'u000B', 'u000C', 'u000D', ' u0009nu000Bu000Cu000D'])
>>> strings.isspace()
array([False False False True True True True True True True])
istitle() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings is titlecase.

Returns:

True for elements that are titlecase, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> mixed = ak.array([f'sTrINgs {i}' for i in range(3)])
>>> title = ak.array([f'Strings {i}' for i in range(3)])
>>> strings = ak.concatenate([mixed, title])
>>> strings
array(['sTrINgs 0', 'sTrINgs 1', 'sTrINgs 2', 'Strings 0', 'Strings 1', 'Strings 2'])
>>> strings.istitle()
array([False False False True True True])
isupper() arkouda.numpy.pdarrayclass.pdarray[source]

Return a boolean pdarray where index i indicates whether string i of the Strings is entirely uppercase.

Returns:

True for elements that are entirely uppercase, False otherwise

Return type:

pdarray

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.islower

Examples

>>> import arkouda as ak
>>> lower = ak.array([f'strings {i}' for i in range(3)])
>>> upper = ak.array([f'STRINGS {i}' for i in range(3)])
>>> strings = ak.concatenate([lower, upper])
>>> strings
array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2'])
>>> strings.isupper()
array([False False False True True True])
logger: arkouda.core.logger.ArkoudaLogger
lower() Strings[source]

Return a new Strings with all uppercase characters from the original replaced with their lowercase equivalent.

Returns:

Strings with all uppercase characters from the original replaced with their lowercase equivalent

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.upper

Examples

>>> import arkouda as ak
>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.lower()
array(['strings 0', 'strings 1', 'strings 2', 'strings 3', 'strings 4'])
lstick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '') Strings[source]

Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • other (Strings) – The strings to join onto self’s strings

  • delimiter (bytes or str_scalars, default="") – String inserted between self and other

Returns:

The array of joined strings, as other + self

Return type:

Strings

Raises:
  • TypeError – Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance

  • RuntimeError – Raised if there is a server-side error thrown

See also

stick, peel, rpeel

Examples

>>> import arkouda as ak
>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.lstick(t, delimiter='.')
array(['b.a', 'd.c', 'f.e'])
match(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.pandas.match.Match[source]

Return a match object where elements match only if the beginning of the string matches the regular expression pattern.

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match only if the beginning of the string matches the regular expression pattern

Return type:

Match

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.match('_+')
<ak.Match object: matched=False; matched=True, span=(0, 4); matched=False;
matched=True, span=(0, 2); matched=False>
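The difference between match and fullmatch can be seen by running both on the same input. This sketch uses Python's `re` module as a stand-in for re2 and a hypothetical helper `spans`; it reports the span of each hit, or None where the documented Match object shows matched=False.

```python
import re
from typing import Callable, List, Optional, Tuple


def spans(
    strings: List[str], matcher: Callable[[str], Optional[re.Match]]
) -> List[Optional[Tuple[int, int]]]:
    """Apply a regex matcher to each string and record the matched span, if any."""
    out: List[Optional[Tuple[int, int]]] = []
    for s in strings:
        m = matcher(s)
        out.append(m.span() if m else None)
    return out


strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
regex = re.compile("_+")

match_spans = spans(strings, regex.match)          # anchored at the start only
fullmatch_spans = spans(strings, regex.fullmatch)  # must cover the whole string

print(match_spans)      # [None, (0, 4), None, (0, 2), None]
print(fullmatch_spans)  # [None, (0, 4), None, None, None]
```

The fourth string begins with underscores, so it satisfies match but not fullmatch.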
nbytes: arkouda.numpy.dtypes.int_scalars
ndim: arkouda.numpy.dtypes.int_scalars
objType = 'Strings'
peel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) Tuple[Strings, Strings][source]

Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • delimiter (bytes or str_scalars) – The separator where the split will occur

  • times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters

  • includeDelimiter (bool, default=False) – If true, append the delimiter to the end of the first return array. By default, it is prepended to the beginning of the second return array.

  • keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array.

  • fromRight (bool, default=False) – If true, peel from the right instead of the left (see also rpeel)

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds)

Returns:

left: Strings

The field(s) peeled from the start of each string (unless fromRight is true)

right: Strings

The remainder of each string after peeling (unless fromRight is true)

Return type:

Tuple[Strings, Strings]

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool

  • ValueError – Raised if times is < 1 or if delimiter is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

rpeel, stick, lstick

Examples

>>> import arkouda as ak
>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
>>> s.peel('.', includeDelimiter=True)
(array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g']))
>>> s.peel('.', times=2)
(array(['', '', 'e.f']), array(['a.b', 'c.d', 'g']))
>>> s.peel('.', times=2, keepPartial=True)
(array(['a.b', 'c.d', 'e.f']), array(['', '', 'g']))
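The four calls in the example can be reproduced with a small plain-Python model, which may help when reasoning about times and keepPartial together. The function `peel_model` is hypothetical; it covers only left-to-right peeling with a literal (non-regex) delimiter and mirrors the example outputs rather than the server implementation.

```python
from typing import List, Tuple


def peel_model(
    strings: List[str],
    delimiter: str,
    times: int = 1,
    include_delimiter: bool = False,
    keep_partial: bool = False,
) -> Tuple[List[str], List[str]]:
    """Split each string at the `times`-th occurrence of `delimiter`, scanning from the left."""
    left: List[str] = []
    right: List[str] = []
    for s in strings:
        pos, start = -1, 0
        for _ in range(times):
            pos = s.find(delimiter, start)
            if pos < 0:
                break
            start = pos + len(delimiter)
        if pos >= 0:
            head = s[:pos] + (delimiter if include_delimiter else "")
            left.append(head)
            right.append(s[pos + len(delimiter):])
        elif keep_partial:
            # Too few delimiters: the whole string goes to the first array.
            left.append(s)
            right.append("")
        else:
            # Too few delimiters: the whole string goes to the second array.
            left.append("")
            right.append(s)
    return left, right


s = ["a.b", "c.d", "e.f.g"]
print(peel_model(s, "."))                              # (['a', 'c', 'e'], ['b', 'd', 'f.g'])
print(peel_model(s, ".", include_delimiter=True))      # (['a.', 'c.', 'e.'], ['b', 'd', 'f.g'])
print(peel_model(s, ".", times=2))                     # (['', '', 'e.f'], ['a.b', 'c.d', 'g'])
print(peel_model(s, ".", times=2, keep_partial=True))  # (['a.b', 'c.d', 'e.f'], ['', '', 'g'])
```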
pretty_print_info() None[source]

Print information about all components of self in a human readable format.

purge_cached_regex_patterns() None[source]

Purges cached regex patterns.

regex_split(pattern: bytes | arkouda.numpy.dtypes.str_scalars, maxsplit: int = 0, return_segments: bool = False) Strings | Tuple[source]

Return a new Strings split by the occurrences of pattern.

If maxsplit is nonzero, at most maxsplit splits occur.

Parameters:
  • pattern (bytes or str_scalars) – Regex used to split strings into substrings

  • maxsplit (int, default=0) – The max number of pattern match occurrences in each element to split. The default maxsplit=0 splits on all occurrences

  • return_segments (bool, default=False) – If True, return mapping of original strings to first substring in return array.

Returns:

Strings

Substrings with pattern matches removed

pdarray, int64 (optional)

For each original string, the index of first corresponding substring in the return array

Return type:

Union[Strings, Tuple]

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.regex_split('_+', maxsplit=2, return_segments=True)
(array(['1', '2', '', '', '', '3', '', '4', '5____6___7', '']), array([0 3 5 6 9]))
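The segments array is easiest to understand as the running count of substrings produced so far. The sketch below models this with Python's `re.split`, which uses the same maxsplit convention (0 means no limit); `regex_split_model` is a hypothetical name and Python's `re` is assumed to behave like re2 for this pattern.

```python
import re
from typing import List, Tuple


def regex_split_model(
    strings: List[str], pattern: str, maxsplit: int = 0
) -> Tuple[List[str], List[int]]:
    """Split each string on `pattern` and record where each string's pieces begin."""
    regex = re.compile(pattern)
    pieces: List[str] = []
    segments: List[int] = []
    for s in strings:
        segments.append(len(pieces))  # index of this string's first substring
        pieces.extend(regex.split(s, maxsplit=maxsplit))
    return pieces, segments


strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
pieces, segments = regex_split_model(strings, "_+", maxsplit=2)
print(pieces)    # ['1', '2', '', '', '', '3', '', '4', '5____6___7', '']
print(segments)  # [0, 3, 5, 6, 9]
```

Because every input string yields at least one substring (possibly empty), the segments array always has one entry per input string.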
register(user_defined_name: str) Strings[source]

Register this Strings object with a user defined name in the arkouda server so it can be attached to later using Strings.attach().

This is an in-place operation. Registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time.

Parameters:

user_defined_name (str) – user defined name which the Strings object is to be registered under

Returns:

The same Strings object, which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different objects with the same name.

Return type:

Strings

Raises:
  • TypeError – Raised if user_defined_name is not a str

  • RegistrationError – Raised if the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name.

See also

attach, unregister

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

registered_name: str | None = None
rpeel(delimiter: bytes | arkouda.numpy.dtypes.str_scalars, times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False) Tuple[Strings, Strings][source]

Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • delimiter (bytes or str_scalars) – The separator where the split will occur

  • times (int_scalars, default=1) – The number of times the delimiter is sought, i.e. skip over the last (times-1) delimiters

  • includeDelimiter (bool, default=False) – If true, prepend the delimiter to the start of the first return array. By default, it is appended to the end of the second return array.

  • keepPartial (bool, default=False) – If true, a string that does not contain <times> instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array.

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

left: Strings

The remainder of the string after peeling

right: Strings

The field(s) that were peeled from the right of each string

Return type:

Tuple[Strings, Strings]

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64

  • ValueError – Raised if times is < 1 or if delimiter is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

peel, stick, lstick

Examples

>>> import arkouda as ak
>>> s = ak.array(['a.b', 'c.d', 'e.f.g'])
>>> s.rpeel('.')
(array(['a', 'c', 'e.f']), array(['b', 'd', 'g']))

Compared against peel

>>> s.peel('.')
(array(['a', 'c', 'e']), array(['b', 'd', 'f.g']))
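
The effect of times and keepPartial can be illustrated with a pure-Python analogue built on str.rsplit. This is only a sketch under stated assumptions: the name rpeel_sketch is hypothetical, includeDelimiter and regex are not modelled, and the sketch assumes that the array not receiving an unsplit string holds an empty string in that position.

```python
from typing import List, Tuple


def rpeel_sketch(strings: List[str], delimiter: str, times: int = 1,
                 keep_partial: bool = False) -> Tuple[List[str], List[str]]:
    """Peel `times` delimited fields off the right end of each string."""
    if times < 1:
        raise ValueError("times must be >= 1")
    left: List[str] = []
    right: List[str] = []
    for s in strings:
        parts = s.rsplit(delimiter, times)
        if len(parts) <= times:
            # Fewer than `times` delimiters: the whole string goes to one side
            # (assumption: the other side receives an empty string).
            left.append("" if keep_partial else s)
            right.append(s if keep_partial else "")
        else:
            left.append(parts[0])                     # remainder after peeling
            right.append(delimiter.join(parts[1:]))   # the peeled field(s)
    return left, right


print(rpeel_sketch(['a.b', 'c.d', 'e.f.g'], '.'))
# (['a', 'c', 'e.f'], ['b', 'd', 'g'])
print(rpeel_sketch(['a.b', 'c.d', 'e.f.g'], '.', times=2))
# (['a.b', 'c.d', 'e'], ['', '', 'f.g'])
```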
search(pattern: bytes | arkouda.numpy.dtypes.str_scalars) arkouda.pandas.match.Match[source]

Return a match object with the first location in each element where pattern produces a match. Elements match if any part of the string matches the regular expression pattern.

Parameters:

pattern (bytes or str_scalars) – Regex used to find matches

Returns:

Match object where elements match if any part of the string matches the regular expression pattern

Return type:

Match

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+')
<ak.Match object: matched=True, span=(1, 2); matched=True, span=(0, 4);
matched=False; matched=True, span=(0, 2); matched=False>
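
For comparison, the same first-match spans can be computed on the client with Python's re.search. The helper name first_match_spans is hypothetical, and the sketch only mirrors the matched/span information printed above, not the Match object itself.

```python
import re
from typing import List, Optional, Tuple


def first_match_spans(strings: List[str], pattern: str) -> List[Optional[Tuple[int, int]]]:
    """Return the (start, end) span of the first match in each string, or None."""
    compiled = re.compile(pattern)
    spans: List[Optional[Tuple[int, int]]] = []
    for s in strings:
        m = compiled.search(s)  # matches anywhere in the string, not only at the start
        spans.append(m.span() if m else None)
    return spans


spans = first_match_spans(['1_2___', '____', '3', '__4___5____6___7', ''], '_+')
print(spans)  # [(1, 2), (0, 4), None, (0, 2), None]
```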
shape: Tuple[int]
size: arkouda.numpy.dtypes.int_scalars
split(delimiter: str, return_segments: bool = False, regex: bool = False) Strings | Tuple[source]

Unpack delimiter-joined substrings into a flat array.

Parameters:
  • delimiter (str) – Characters used to split strings into substrings

  • return_segments (bool, default=False) – If True, also return mapping of original strings to first substring in return array.

  • regex (bool, default=False) – Indicates whether delimiter is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

Strings

Flattened substrings with delimiters removed

pdarray, int64 (optional)

For each original string, the index of first corresponding substring in the return array

Return type:

Union[Strings, Tuple]

See also

peel, rpeel

Examples

>>> import arkouda as ak
>>> orig = ak.array(['one|two', 'three|four|five', 'six'])
>>> orig.split('|')
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> flat, mapping = orig.split('|', return_segments=True)
>>> mapping
array([0 2 5])
>>> under = ak.array(['one_two', 'three_____four____five', 'six'])
>>> under_split, under_map = under.split('_+', return_segments=True, regex=True)
>>> under_split
array(['one', 'two', 'three', 'four', 'five', 'six'])
>>> under_map
array([0 2 5])
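
The segment mapping makes it possible to regroup the flat substrings by their original string. A minimal sketch on plain Python lists (that is, after the values have been transferred to the client); the helper name regroup is hypothetical.

```python
from typing import List


def regroup(flat: List[str], segments: List[int]) -> List[List[str]]:
    """Rebuild one list of substrings per original string."""
    # Each group runs from its own start offset to the next string's start,
    # or to the end of the flat list for the last string.
    ends = segments[1:] + [len(flat)]
    return [flat[start:end] for start, end in zip(segments, ends)]


groups = regroup(['one', 'two', 'three', 'four', 'five', 'six'], [0, 2, 5])
print(groups)  # [['one', 'two'], ['three', 'four', 'five'], ['six']]
```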
startswith(substr: bytes | arkouda.numpy.dtypes.str_scalars, regex: bool = False) arkouda.numpy.pdarrayclass.pdarray[source]

Check whether each element starts with the given substring.

Parameters:
  • substr (bytes or str_scalars) – The prefix to search for

  • regex (bool, default=False) – Indicates whether substr is a regular expression. Note: only regular expressions supported by re2 are handled (lookaheads and lookbehinds are not supported).

Returns:

True for elements that start with substr, False otherwise

Return type:

pdarray

Raises:
  • TypeError – Raised if the substr parameter is not bytes or str_scalars

  • ValueError – Raised if substr is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> strings_end = ak.array([f'string {i}' for i in range(1, 6)])
>>> strings_end
array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5'])
>>> strings_end.startswith('string')
array([True True True True True])
>>> strings_start = ak.array([f'{i} string' for i in range(1,6)])
>>> strings_start
array(['1 string', '2 string', '3 string', '4 string', '5 string'])
>>> strings_start.startswith('\\d str', regex = True)
array([True True True True True])
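
The difference between the literal and regex modes can be sketched in plain Python: a literal prefix test corresponds to str.startswith, while the regex mode behaves like a pattern anchored at the start of each string. The helper name startswith_sketch is hypothetical, and Python's re accepts some constructs that re2 does not.

```python
import re
from typing import List


def startswith_sketch(strings: List[str], substr: str, regex: bool = False) -> List[bool]:
    """Return True for each string that begins with `substr`."""
    if regex:
        compiled = re.compile(substr)
        # re.match only succeeds if the pattern matches at position 0.
        return [compiled.match(s) is not None for s in strings]
    return [s.startswith(substr) for s in strings]


print(startswith_sketch(['1 string', '2 string'], r'\d str', regex=True))  # [True, True]
print(startswith_sketch(['string 1', '1 string'], 'string'))               # [True, False]
```

The pattern '\\d str' used in the example above is the same string as the raw literal r'\d str'.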
stick(other: Strings, delimiter: bytes | arkouda.numpy.dtypes.str_scalars = '', toLeft: bool = False) Strings[source]

Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. Warning: This function is experimental and not guaranteed to work.

Parameters:
  • other (Strings) – The strings to join onto self’s strings

  • delimiter (bytes or str_scalars, default="") – String inserted between self and other

  • toLeft (bool, default=False) – If true, join other strings to the left of self. By default, other is joined to the right of self.

Returns:

The array of joined strings

Return type:

Strings

Raises:
  • TypeError – Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance

  • ValueError – Raised if times is < 1

  • RuntimeError – Raised if there is a server-side error thrown

See also

lstick, peel, rpeel

Examples

>>> import arkouda as ak
>>> s = ak.array(['a', 'c', 'e'])
>>> t = ak.array(['b', 'd', 'f'])
>>> s.stick(t, delimiter='.')
array(['a.b', 'c.d', 'e.f'])
strip(chars: bytes | arkouda.numpy.dtypes.str_scalars | None = '') Strings[source]

Return a new Strings object with all leading and trailing occurrences of characters contained in chars removed. The chars argument is a string specifying the set of characters to be removed. If omitted, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.

Parameters:

chars (bytes or str_scalars, optional) – the set of characters to be removed

Returns:

Strings object with the leading and trailing characters matching the set of characters in the chars argument removed

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

Examples

>>> import arkouda as ak
>>> strings = ak.array(['Strings ', '  StringS  ', 'StringS   '])
>>> s = strings.strip()
>>> s
array(['Strings', 'StringS', 'StringS'])
>>> strings = ak.array(['Strings 1', '1 StringS  ', '  1StringS  12 '])
>>> s = strings.strip(' 12')
>>> s
array(['Strings', 'StringS', 'StringS'])
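
Python's built-in str.strip treats its argument as a set of characters in the same way, so it can be used to check expectations on small samples. This is a client-side comparison only, not a description of the server implementation.

```python
samples = ['Strings 1', '1 StringS  ', '  1StringS  12 ']

# ' 12' is a set of three characters (space, '1', '2'), not a prefix or suffix.
stripped = [s.strip(' 12') for s in samples]
print(stripped)  # ['Strings', 'StringS', 'StringS']

# Characters are removed in any order and combination until one outside the set is met.
set_semantics = 'xyxhelloyx'.strip('yx')
print(set_semantics)  # hello
```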
sub(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) Strings[source]

Return new Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl.

If count is nonzero, at most count substitutions occur.

Parameters:
  • pattern (bytes or str_scalars) – The regex to substitute

  • repl (bytes or str_scalars) – The substring to replace pattern matches with

  • count (int, default=0) – The maximum number of pattern matches to replace in each element. The default count=0 replaces all occurrences of pattern with repl.

Returns:

Strings with pattern matches replaced

Return type:

Strings

Raises:
  • TypeError – Raised if pattern or repl are not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

Strings.subn

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.sub(pattern='_+', repl='-', count=2)
array(['1-2-', '-', '3', '-4-5____6___7', ''])
subn(pattern: bytes | arkouda.numpy.dtypes.str_scalars, repl: bytes | arkouda.numpy.dtypes.str_scalars, count: int = 0) Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray][source]

Perform the same operation as sub(), but return a tuple (new_Strings, number_of_substitutions).

Parameters:
  • pattern (bytes or str_scalars) – The regex to substitute

  • repl (bytes or str_scalars) – The substring to replace pattern matches with

  • count (int, default=0) – The maximum number of pattern matches to replace in each element. The default count=0 replaces all occurrences of pattern with repl.

Returns:

Strings

Strings with pattern matches replaced

pdarray, int64

The number of substitutions made for each element of Strings

Return type:

Tuple[Strings, pdarray]

Raises:
  • TypeError – Raised if pattern or repl are not bytes or str_scalars

  • ValueError – Raised if pattern is not a valid regex

  • RuntimeError – Raised if there is a server-side error thrown

See also

Strings.sub

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.subn(pattern='_+', repl='-', count=2)
(array(['1-2-', '-', '3', '-4-5____6___7', '']), array([2 1 0 2 0]))
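
The per-element substitution counts follow the same convention as Python's re.subn, which gives a convenient way to check expectations on small inputs. A minimal sketch; the helper name subn_sketch is hypothetical, and Python interprets backslash escapes and group references in repl, which may differ from the server's behavior.

```python
import re
from typing import List, Tuple


def subn_sketch(strings: List[str], pattern: str, repl: str, count: int = 0) -> Tuple[List[str], List[int]]:
    """Replace up to `count` matches per string (0 means all) and count the replacements."""
    compiled = re.compile(pattern)
    results = [compiled.subn(repl, s, count=count) for s in strings]
    new_strings = [text for text, _ in results]
    num_subs = [n for _, n in results]
    return new_strings, num_subs


new_strings, num_subs = subn_sketch(['1_2___', '____', '3', '__4___5____6___7', ''], '_+', '-', count=2)
print(new_strings)  # ['1-2-', '-', '3', '-4-5____6___7', '']
print(num_subs)     # [2, 1, 0, 2, 0]
```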
take(indices: arkouda.numpy.dtypes.numeric_scalars | arkouda.numpy.pdarrayclass.pdarray, axis: int | None = None) Strings[source]

Take elements from the array along an axis.

When axis is not None, this function does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis. A call such as np.take(arr, indices, axis=3) is equivalent to arr[:,:,:,indices,...].

Parameters:
  • indices (numeric_scalars or pdarray) – The indices of the values to extract. Scalars are also accepted.

  • axis (int, optional) – The axis over which to select values. By default, the flattened input array is used.

Returns:

A Strings containing the selected elements.

Return type:

Strings

Examples

>>> import arkouda as ak
>>> a = ak.array(["a","b","c"])
>>> indices = [0, 1]
>>> a.take(indices)
array(['a', 'b'])
title() Strings[source]

Return a new Strings with each string from the original replaced by its titlecase equivalent.

Returns:

Strings with each string from the original replaced by its titlecase equivalent.

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown.

See also

Strings.lower, Strings.upper

Examples

>>> import arkouda as ak
>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.title()
array(['Strings 0', 'Strings 1', 'Strings 2', 'Strings 3', 'Strings 4'])
to_csv(prefix_path: str, dataset: str = 'strings_array', col_delim: str = ',', overwrite: bool = False) str[source]

Write Strings to CSV file(s). Each file contains a single column with the Strings data. All CSV files written by Arkouda include a header denoting the data types of the columns. Unlike other file formats, CSV files store Strings as UTF-8 text rather than as bytes of type uint(8).

Parameters:
  • prefix_path (str) – The filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk.

  • dataset (str, default="strings_array") – Column name to save the Strings under. Defaults to “strings_array”.

  • col_delim (str, default=",") – Defaults to “,”. Value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset.

  • overwrite (bool, default=False) – Defaults to False. If True, any existing files matching your provided prefix_path will be overwritten. If False, an error will be returned if existing files are found.

Returns:

response message

Return type:

str

Raises:
  • ValueError – Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist

  • RuntimeError – Raised if one or more of the specified files cannot be opened. If allow_errors is true this may be raised if no values are returned from the server.

  • TypeError – Raised if we receive an unknown arkouda_type returned from the server

Notes

  • CSV format is not currently supported by load/load_all operations

  • The column delimiter is expected to be the same for column names and data

  • Be sure that column delimiters are not found within your data.

  • All CSV files must delimit rows using newline (\n) at this time.
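
Because the column delimiter and the newline row separator must not occur inside the data, a quick client-side check on a sample of values can catch problems before writing. This is a hypothetical helper operating on a plain Python list, not part of arkouda.

```python
from typing import Iterable


def delimiter_is_safe(values: Iterable[str], col_delim: str = ',') -> bool:
    """Return True if no value contains the column delimiter or a newline."""
    return not any(col_delim in v or '\n' in v for v in values)


print(delimiter_is_safe(['one', 'two', 'three']))          # True
print(delimiter_is_safe(['one,two', 'three']))             # False
print(delimiter_is_safe(['one,two', 'three'], col_delim='|'))  # True
```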

to_hdf(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, file_type: Literal['single', 'distribute'] = 'distribute') str[source]

Save the Strings object to HDF5. The object can be saved to a collection of files or single file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – The name of the Strings dataset to be written, defaults to strings_array

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, create a new Strings dataset within existing files.

  • save_offsets (bool, default=True) – If True, the server saves the offsets array to HDF5. If False, the offsets array is not saved and is derived from the string values upon load/read.

  • file_type ({"single", "distribute"}, default = "distribute") – With “distribute”, the dataset is distributed over one file per locale. With “single”, the dataset is saved to one file.

Returns:

String message indicating result of save operation

Return type:

str

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • Parquet files do not store the segments, only the values.

  • Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string

  • the hdf5 group is named via the dataset parameter.

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’. Otherwise, the file name will be prefix_path.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

See also

to_hdf

to_ndarray() numpy.ndarray[source]

Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A numpy ndarray with the same strings as this array

Return type:

np.ndarray

Notes

The number of bytes in the array cannot exceed ak.core.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.core.client.maxTransferBytes to a larger value, but proceed with caution.
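
The size rule described above amounts to a simple comparison that can also be made by the caller before requesting a transfer, for example against the nbytes attribute of the Strings object. The sketch below only illustrates that rule; the function name ensure_transferable and the numbers used are hypothetical, and it is not the arkouda client code.

```python
def ensure_transferable(nbytes: int, max_transfer_bytes: int) -> None:
    """Raise RuntimeError when the payload is larger than the client limit."""
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"Array of {nbytes} bytes exceeds the transfer limit of {max_transfer_bytes} bytes"
        )


# Within the limit: returns without raising.
ensure_transferable(nbytes=1_000, max_transfer_bytes=2_000)

# Over the limit: raises RuntimeError.
try:
    ensure_transferable(nbytes=3_000, max_transfer_bytes=2_000)
except RuntimeError as err:
    print(err)
```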

See also

array, tolist

Examples

>>> import arkouda as ak
>>> a = ak.array(["hello", "my", "world"])
>>> a.to_ndarray()
array(['hello', 'my', 'world'], dtype='<U5')
>>> type(a.to_ndarray())
<class 'numpy.ndarray'>
to_parquet(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', compression: Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4'] | None = None) str[source]

Save the Strings object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – Name of the dataset to create in files (must not already exist)

  • mode ({"truncate", "append"}, default = "truncate") – By default, truncate (overwrite) output files, if they exist. If ‘append’, attempt to create new dataset in existing files.

  • compression ({"snappy", "gzip", "brotli", "zstd", "lz4"}, optional) – Sets the compression type used with Parquet files

Returns:

string message indicating result of save operation

Return type:

str

Raises:

RuntimeError – Raised if a server-side error is thrown saving the pdarray

Notes

  • The prefix_path must be visible to the arkouda server and the user must have write permission.

  • Output files have names of the form <prefix_path>_LOCALE<i>, where <i> ranges from 0 to numLocales for file_type=’distribute’.

  • ‘append’ write mode is supported, but is not efficient.

  • If any of the output files already exist and the mode is ‘truncate’, they will be overwritten. If the mode is ‘append’ and the number of output files is less than the number of locales or a dataset with the same name already exists, a RuntimeError will result.

  • Any file extension can be used. The file I/O does not rely on the extension to determine the file format.

tolist() List[str][source]

Convert the Strings to a list, transferring data from the arkouda server to Python. If the Strings exceeds a built-in size limit, a RuntimeError is raised.

Returns:

A list with the same strings as this Strings

Return type:

List[str]

Notes

The number of bytes in the array cannot exceed ak.core.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.core.client.maxTransferBytes to a larger value, but proceed with caution.

See also

to_ndarray

Examples

>>> import arkouda as ak
>>> a = ak.array(["hello", "my", "world"])
>>> a.tolist()
['hello', 'my', 'world']
>>> type(a.tolist())
<class 'list'>
transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars) str | memoryview[source]

Send a Strings object to a different Arkouda server.

Parameters:
  • hostname (str) – The hostname where the Arkouda server intended to receive the Strings object is running.

  • port (int_scalars) – The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens numLocales ports in succession, so ports in the range {port..(port+numLocales)} are used (e.g., when running an Arkouda server on 4 nodes with 1234 passed as port, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to ak.receive_array().

Returns:

A message indicating a complete transfer

Return type:

str

Raises:
  • ValueError – Raised if the op is not within the pdarray.BinOps set

  • TypeError – Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype
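
When opening firewall rules for a transfer, it can help to list the ports that will be needed. The sketch below follows the worked example in the port description (one consecutive port per locale, starting at port); the helper name transfer_ports is hypothetical and the one-port-per-locale count is an assumption taken from that example.

```python
from typing import List


def transfer_ports(port: int, num_locales: int) -> List[int]:
    """List the consecutive ports used when each locale sends over its own port."""
    return list(range(port, port + num_locales))


print(transfer_ports(1234, 4))  # [1234, 1235, 1236, 1237]
```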

unregister() None[source]

Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach().

Raises:

RuntimeError – Raised if the server could not find the internal name/symbol to remove

See also

register, attach

Notes

Registered names/Strings objects in the server are immune to deletion until they are unregistered.

update_hdf(prefix_path: str, dataset: str = 'strings_array', save_offsets: bool = True, repack: bool = True) str[source]

Overwrite the dataset with the name provided with this Strings object.

If the dataset does not exist it is added.

Parameters:
  • prefix_path (str) – Directory and filename prefix that all output files share

  • dataset (str, default="strings_array") – Name of the dataset to create in files

  • save_offsets (bool, default=True) – If True, the server saves the offsets array to HDF5. If False, the offsets array is not saved and is derived from the string values upon load/read.

  • repack (bool, default=True) – HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains but is inaccessible. Setting this to False yields better performance but causes file sizes to grow.

Returns:

success message if successful

Return type:

str

Raises:

RuntimeError – Raised if a server-side error is thrown saving the Strings object

Notes

  • If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed.

  • If the dataset provided does not exist, it will be added

upper() Strings[source]

Return a new Strings with all lowercase characters from the original replaced with their uppercase equivalent.

Returns:

Strings with all lowercase characters from the original replaced with their uppercase equivalent

Return type:

Strings

Raises:

RuntimeError – Raised if there is a server-side error thrown

See also

Strings.lower

Examples

>>> import arkouda as ak
>>> strings = ak.array([f'StrINgS {i}' for i in range(5)])
>>> strings
array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4'])
>>> strings.upper()
array(['STRINGS 0', 'STRINGS 1', 'STRINGS 2', 'STRINGS 3', 'STRINGS 4'])