arkouda.numpy.strings ===================== .. py:module:: arkouda.numpy.strings Classes ------- .. autoapisummary:: arkouda.numpy.strings.Strings Module Contents --------------- .. py:class:: Strings(strings_pdarray: arkouda.numpy.pdarrayclass.pdarray, bytes_size: arkouda.numpy.dtypes.int_scalars) Represents an array of strings whose data resides on the arkouda server. The user should not call this class directly; rather its instances are created by other arkouda functions. .. attribute:: entry Encapsulation of a Segmented Strings array contained on the arkouda server. This is a composite of - offsets array: starting indices for each string - bytes array: raw bytes of all strings joined by nulls :type: pdarray .. attribute:: size The number of strings in the array :type: int_scalars .. attribute:: nbytes The total number of bytes in all strings :type: int_scalars .. attribute:: ndim The rank of the array (currently only rank 1 arrays supported) :type: int_scalars .. attribute:: shape The sizes of each dimension of the array :type: tuple .. attribute:: dtype The dtype is ak.str_ :type: type .. attribute:: logger Used for all logging operations :type: ArkoudaLogger .. rubric:: Notes Strings is composed of two pdarrays: (1) offsets, which contains the starting indices for each string and (2) bytes, which contains the raw bytes of all strings, delimited by nulls. .. py:attribute:: BinOps .. py:method:: argsort(algorithm: arkouda.numpy.sorting.SortingAlgorithm = SortingAlgorithm.RadixSortLSD, ascending: bool = True) -> arkouda.numpy.pdarrayclass.pdarray Return the permutation that sorts the Strings. :param algorithm: The algorithm to use for sorting. :type algorithm: SortingAlgorithm, default SortingAlgorithm.RadixSortLSD :param ascending: Whether to sort in ascending order. :type ascending: bool, default True :returns: The indices that sort the Strings. :rtype: pdarray .. 
py:method:: astype(dtype: Union[numpy.dtype, str]) -> Union[arkouda.numpy.pdarrayclass.pdarray, Strings] Cast values of Strings object to provided dtype. :param dtype: Dtype to cast to :type dtype: np.dtype or str :returns: An arkouda pdarray with values converted to the specified data type :rtype: pdarray .. rubric:: Notes This is essentially shorthand for ak.cast(x, dtype), where x is the Strings object. .. py:method:: cached_regex_patterns() -> List Return the regex patterns for which Match objects have been cached. .. py:method:: capitalize() -> Strings Return a new Strings in which the first letter of each string is capitalized and the remaining letters are lowercase. :returns: Strings with each element replaced by its capitalized equivalent. :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown. .. seealso:: :py:obj:`Strings.lower`, :py:obj:`Strings.upper`, :py:obj:`Strings.title` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array([f'StrINgS aRe Here {i}' for i in range(5)]) >>> strings array(['StrINgS aRe Here 0', 'StrINgS aRe Here 1', 'StrINgS aRe Here 2', 'StrINgS aRe Here 3', 'StrINgS aRe Here 4']) >>> strings.capitalize() array(['Strings are here 0', 'Strings are here 1', 'Strings are here 2', 'Strings are here 3', 'Strings are here 4']) .. py:method:: concatenate_uniquely(strings: List[Strings]) -> Strings :staticmethod: Concatenate a list of Strings into a single Strings object containing only unique strings. Order may not be preserved. :param strings: List of segmented string objects to concatenate. :type strings: List[Strings] :returns: A new Strings object containing the unique values. :rtype: Strings .. py:method:: contains(substr: Union[bytes, arkouda.numpy.dtypes.str_scalars], regex: bool = False) -> arkouda.numpy.pdarrayclass.pdarray Check whether each element contains the given substring. 
:param substr: The substring to search for, given as a string or a byte array :type substr: bytes or str_scalars :param regex: Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds) :type regex: bool, default=False :returns: True for elements that contain substr, False otherwise :rtype: pdarray :raises TypeError: Raised if the substr parameter is not bytes or str_scalars :raises ValueError: Raised if substr is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.startswith`, :py:obj:`Strings.endswith` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)]) >>> strings array(['1 string 1', '2 string 2', '3 string 3', '4 string 4', '5 string 5']) >>> strings.contains('string') array([True True True True True]) >>> strings.contains('string \\d', regex=True) array([True True True True True]) .. py:method:: copy() -> Strings Return a deep copy of the Strings object. :returns: A deep copy of the Strings. :rtype: Strings .. py:method:: decode(fromEncoding: str, toEncoding: str = 'UTF-8') -> Strings Return a new Strings object in `toEncoding`, expecting that the current Strings is encoded in `fromEncoding`. :param fromEncoding: The current encoding of the strings object :type fromEncoding: str :param toEncoding: The encoding that the strings will be converted to, defaults to UTF-8 :type toEncoding: str, default="UTF-8" :returns: A new Strings object in `toEncoding` :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown .. py:property:: dtype :type: numpy.dtype Return the dtype object of the underlying data. .. py:method:: encode(toEncoding: str, fromEncoding: str = 'UTF-8') -> Strings Return a new Strings object in `toEncoding`, expecting that the current Strings is encoded in `fromEncoding`. 
:param toEncoding: The encoding that the strings will be converted to :type toEncoding: str :param fromEncoding: The current encoding of the strings object, defaults to UTF-8 :type fromEncoding: str, default="UTF-8" :returns: A new Strings object in `toEncoding` :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown .. py:method:: endswith(substr: Union[bytes, arkouda.numpy.dtypes.str_scalars], regex: bool = False) -> arkouda.numpy.pdarrayclass.pdarray Check whether each element ends with the given substring. :param substr: The suffix to search for :type substr: bytes or str_scalars :param regex: Indicates whether substr is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds) :type regex: bool, default=False :returns: True for elements that end with substr, False otherwise :rtype: pdarray :raises TypeError: Raised if the substr parameter is not bytes or str_scalars :raises ValueError: Raised if substr is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.contains`, :py:obj:`Strings.startswith` .. rubric:: Examples >>> import arkouda as ak >>> strings_start = ak.array([f'{i} string' for i in range(1, 6)]) >>> strings_start array(['1 string', '2 string', '3 string', '4 string', '5 string']) >>> strings_start.endswith('ing') array([True True True True True]) >>> strings_end = ak.array([f'string {i}' for i in range(1, 6)]) >>> strings_end array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5']) >>> strings_end.endswith('ing \\d', regex=True) array([True True True True True]) .. py:attribute:: entry :type: arkouda.numpy.pdarrayclass.pdarray .. py:method:: equals(other) -> arkouda.numpy.dtypes.bool_scalars Check whether two Strings are the same size and all entries are equal. :param other: Object to compare. :type other: Any :returns: True if the Strings are the same, otherwise False. :rtype: bool_scalars .. 
rubric:: Examples >>> import arkouda as ak >>> s = ak.array(["a", "b", "c"]) >>> s_cpy = ak.array(["a", "b", "c"]) >>> s.equals(s_cpy) np.True_ >>> s2 = ak.array(["a", "x", "c"]) >>> s.equals(s2) np.False_ .. py:method:: find_locations(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars]) -> Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray] Find pattern matches and return pdarrays containing the number, start positions, and lengths of matches. :param pattern: The regex pattern used to find matches :type pattern: bytes or str_scalars :returns: pdarray, int64 For each original string, the number of pattern matches pdarray, int64 The start positions of pattern matches pdarray, int64 The lengths of pattern matches :rtype: Tuple[pdarray, pdarray, pdarray] :raises TypeError: Raised if the pattern parameter is not bytes or str_scalars :raises ValueError: Raised if pattern is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.findall`, :py:obj:`Strings.match` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array([f'{i} string {i}' for i in range(1, 6)]) >>> num_matches, starts, lens = strings.find_locations('\\d') >>> num_matches array([2 2 2 2 2]) >>> starts array([0 9 0 9 0 9 0 9 0 9]) >>> lens array([1 1 1 1 1 1 1 1 1 1]) .. py:method:: findall(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars], return_match_origins: bool = False) -> Union[Strings, Tuple] Return a new Strings containing all non-overlapping matches of pattern. 
:param pattern: Regex used to find matches :type pattern: bytes or str_scalars :param return_match_origins: If True, return a pdarray containing the index of the original string each pattern match is from :type return_match_origins: bool, default=False :returns: Strings Strings object containing only pattern matches pdarray, int64 (optional) The index of the original string each pattern match is from :rtype: Union[Strings, Tuple] :raises TypeError: Raised if the pattern parameter is not bytes or str_scalars :raises ValueError: Raised if pattern is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.find_locations` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.findall('_+', return_match_origins=True) (array(['_', '___', '____', '__', '___', '____', '___']), array([0 0 1 3 3 3 3])) .. py:method:: flatten() -> Strings Return a copy of the array collapsed into one dimension. :returns: A copy of the input array, flattened to one dimension. :rtype: Strings .. note:: As multidimensional Strings are not currently supported, flatten on a Strings object will always return itself. .. py:method:: from_parts(offset_attrib: Union[arkouda.numpy.pdarrayclass.pdarray, str], bytes_attrib: Union[arkouda.numpy.pdarrayclass.pdarray, str]) -> Strings :staticmethod: Assemble a Strings object from separate offset and bytes arrays. This factory method constructs a segmented `Strings` array by sending two separate components (offsets and values) to the Arkouda server and instructing it to assemble them into a single `Strings` object. Use this when offsets and byte data are created or transported independently. :param offset_attrib: The array of starting positions for each string, or a string expression that can be passed to `create_pdarray` to build it. 
:type offset_attrib: pdarray or str :param bytes_attrib: The array of raw byte values (e.g., uint8 character codes), or a string expression that can be passed to `create_pdarray` to build it. :type bytes_attrib: pdarray or str :returns: A `Strings` object representing the assembled segmented strings array on the Arkouda server. :rtype: Strings :raises RuntimeError: If conversion of `offset_attrib` or `bytes_attrib` to `pdarray` fails, or if the server is unable to assemble the parts into a `Strings`. .. rubric:: Notes - Both inputs can be existing `pdarray` instances or arguments suitable for `create_pdarray`. - Internally uses the `CMD_ASSEMBLE` command to merge offsets and values. .. py:method:: from_return_msg(rep_msg: str) -> Strings :staticmethod: Create a Strings object from an Arkouda server response message. Parse the server’s response descriptor and construct a `Strings` array with its underlying pdarray and total byte size. :param rep_msg: Server response message of the form: ``` created +... bytes.size ``` For example: ``` "created foo Strings 3 1 (3,) 8+created bytes.size 24" ``` :type rep_msg: str :returns: A `Strings` object representing the segmented strings array on the server, initialized with the returned pdarray and byte-size metadata. :rtype: Strings :raises RuntimeError: If the response message cannot be parsed or does not match the expected format. .. rubric:: Examples >>> import arkouda as ak # Example response message (typically from `generic_msg`) >>> rep_msg = "created foo Strings 3 1 (3,) 8+created bytes.size 24" >>> s = ak.Strings.from_return_msg(rep_msg) >>> isinstance(s, ak.Strings) True .. py:method:: fullmatch(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars]) -> arkouda.pandas.match.Match Return a match object where elements match only if the whole string matches the regular expression pattern. 
:param pattern: Regex used to find matches :type pattern: bytes or str_scalars :returns: Match object where elements match only if the whole string matches the regular expression pattern :rtype: Match .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.fullmatch('_+') .. py:method:: get_bytes() -> arkouda.numpy.pdarrayclass.pdarray Getter for the bytes component (uint8 pdarray) of this Strings. :returns: Pdarray of bytes of the string accessed :rtype: pdarray .. rubric:: Example >>> import arkouda as ak >>> x = ak.array(['one', 'two', 'three']) >>> x.get_bytes() array([111 110 101 0 116 119 111 0 116 104 114 101 101 0]) .. py:method:: get_lengths() -> arkouda.numpy.pdarrayclass.pdarray Return the length of each string in the array. :returns: The length of each string :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. py:method:: get_offsets() -> arkouda.numpy.pdarrayclass.pdarray Getter for the offsets component (int64 pdarray) of this Strings. :returns: Pdarray of offsets of the string accessed :rtype: pdarray .. rubric:: Example >>> import arkouda as ak >>> x = ak.array(['one', 'two', 'three']) >>> x.get_offsets() array([0 4 8]) .. py:method:: get_prefixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) -> Union[Strings, Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray]] Return the n-long prefix of each string, where possible. :param n: Length of prefix :type n: int_scalars :param return_origins: If True, return a logical index indicating which strings were long enough to return an n-prefix :type return_origins: bool, default=True :param proper: If True, only return proper prefixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a prefix. 
:type proper: bool, default=True :returns: prefixes : Strings The array of n-character prefixes; the number of elements is the number of True values in the returned mask. origin_indices : pdarray, bool Boolean array that is True where the string was long enough to return an n-character prefix, False otherwise. :rtype: Union[Strings, Tuple[Strings, pdarray]] .. py:method:: get_suffixes(n: arkouda.numpy.dtypes.int_scalars, return_origins: bool = True, proper: bool = True) -> Union[Strings, Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray]] Return the n-long suffix of each string, where possible. :param n: Length of suffix :type n: int_scalars :param return_origins: If True, return a logical index indicating which strings were long enough to return an n-suffix :type return_origins: bool, default=True :param proper: If True, only return proper suffixes, i.e. from strings that are at least n+1 long. If False, allow the entire string to be returned as a suffix. :type proper: bool, default=True :returns: suffixes : Strings The array of n-character suffixes; the number of elements is the number of True values in the returned mask. origin_indices : pdarray, bool Boolean array that is True where the string was long enough to return an n-character suffix, False otherwise. :rtype: Union[Strings, Tuple[Strings, pdarray]] .. py:method:: group() -> arkouda.numpy.pdarrayclass.pdarray Return the permutation that groups the array, placing equivalent strings together. All instances of the same string are guaranteed to lie in one contiguous block of the permuted array, but the blocks are not necessarily ordered. :returns: The permutation that groups the array by value :rtype: pdarray .. seealso:: :py:obj:`GroupBy`, :py:obj:`unique` .. rubric:: Notes If the arkouda server is compiled with "-sSegmentedString.useHash=true", then arkouda uses 128-bit hash values to group strings, rather than sorting the strings directly. 
This method is fast, but the resulting permutation merely groups equivalent strings and does not sort them. If the "useHash" parameter is false, then a full sort is performed. :raises RuntimeError: Raised if there is a server-side error in executing group request or creating the pdarray encapsulating the return message .. py:method:: hash() -> Tuple[arkouda.numpy.pdarrayclass.pdarray, arkouda.numpy.pdarrayclass.pdarray] Compute a 128-bit hash of each string. :returns: A tuple of two int64 pdarrays. The ith hash value is the concatenation of the ith values from each array. :rtype: Tuple[pdarray,pdarray] .. rubric:: Notes The implementation uses SipHash128, a fast and balanced hash function (used by Python for dictionaries and sets). For realistic numbers of strings (up to about 10**15), the probability of a collision between two 128-bit hash values is negligible. .. py:property:: inferred_type :type: str Return a string of the type inferred from the values. .. py:method:: info() -> str Return a JSON formatted string containing information about all components of self. :returns: JSON string containing information about all components of self :rtype: str .. py:method:: is_registered() -> numpy.bool_ Return True iff the object is contained in the registry. :returns: Indicates if the object is contained in the registry :rtype: bool :raises RuntimeError: Raised if there's a server-side error thrown .. py:method:: isalnum() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings is alphanumeric. :returns: True for elements that are alphanumeric, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.islower`, :py:obj:`Strings.isupper`, :py:obj:`Strings.istitle` .. 
rubric:: Examples >>> import arkouda as ak >>> not_alnum = ak.array([f'%Strings {i}' for i in range(3)]) >>> alnum = ak.array([f'Strings{i}' for i in range(3)]) >>> strings = ak.concatenate([not_alnum, alnum]) >>> strings array(['%Strings 0', '%Strings 1', '%Strings 2', 'Strings0', 'Strings1', 'Strings2']) >>> strings.isalnum() array([False False False True True True]) .. py:method:: isalpha() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings is alphabetic. This means there is at least one character, and all the characters are alphabetic. :returns: True for elements that are alphabetic, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.islower`, :py:obj:`Strings.isupper`, :py:obj:`Strings.istitle`, :py:obj:`Strings.isalnum` .. rubric:: Examples >>> import arkouda as ak >>> not_alpha = ak.array([f'%Strings {i}' for i in range(3)]) >>> alpha = ak.array(['StringA','StringB','StringC']) >>> strings = ak.concatenate([not_alpha, alpha]) >>> strings array(['%Strings 0', '%Strings 1', '%Strings 2', 'StringA', 'StringB', 'StringC']) >>> strings.isalpha() array([False False False True True True]) .. py:method:: isdecimal() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings has all decimal characters. :returns: True for elements that are decimals, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.isdigit` .. 
rubric:: Examples >>> import arkouda as ak >>> not_decimal = ak.array([f'Strings {i}' for i in range(3)]) >>> decimal = ak.array([f'12{i}' for i in range(3)]) >>> strings = ak.concatenate([not_decimal, decimal]) >>> strings array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122']) >>> strings.isdecimal() array([False False False True True True]) Special Character Examples >>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"]) >>> special_strings array(['3.14', '0', '²', '2³₇', '2³x₇']) >>> special_strings.isdecimal() array([False True False False False]) .. py:method:: isdigit() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings has all digit characters. :returns: True for elements that are digits, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.islower`, :py:obj:`Strings.isupper`, :py:obj:`Strings.istitle` .. rubric:: Examples >>> import arkouda as ak >>> not_digit = ak.array([f'Strings {i}' for i in range(3)]) >>> digit = ak.array([f'12{i}' for i in range(3)]) >>> strings = ak.concatenate([not_digit, digit]) >>> strings array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122']) >>> strings.isdigit() array([False False False True True True]) Special Character Examples >>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"]) >>> special_strings array(['3.14', '0', '²', '2³₇', '2³x₇']) >>> special_strings.isdigit() array([False True True True False]) .. py:method:: isempty() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings is empty. :returns: True for elements that are the empty string, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. 
seealso:: :py:obj:`Strings.islower`, :py:obj:`Strings.isupper`, :py:obj:`Strings.istitle` .. rubric:: Examples >>> import arkouda as ak >>> not_empty = ak.array([f'Strings {i}' for i in range(3)]) >>> empty = ak.array(['' for i in range(3)]) >>> strings = ak.concatenate([not_empty, empty]) >>> strings array(['Strings 0', 'Strings 1', 'Strings 2', '', '', '']) >>> strings.isempty() array([False False False True True True]) .. py:method:: islower() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings is entirely lowercase. :returns: True for elements that are entirely lowercase, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.isupper` .. rubric:: Examples >>> import arkouda as ak >>> lower = ak.array([f'strings {i}' for i in range(3)]) >>> upper = ak.array([f'STRINGS {i}' for i in range(3)]) >>> strings = ak.concatenate([lower, upper]) >>> strings array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2']) >>> strings.islower() array([True True True False False False]) .. py:method:: isnumeric() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings has all numeric characters. There are 1922 unicode characters that qualify as numeric, including the digits 0 through 9, superscripts and subscripted digits, special characters with the digits encircled or enclosed in parens, "vulgar fractions," and more. :returns: True for elements that are numerics, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.isdecimal` .. 
rubric:: Examples >>> import arkouda as ak >>> not_numeric = ak.array([f'Strings {i}' for i in range(3)]) >>> numeric = ak.array([f'12{i}' for i in range(3)]) >>> strings = ak.concatenate([not_numeric, numeric]) >>> strings array(['Strings 0', 'Strings 1', 'Strings 2', '120', '121', '122']) >>> strings.isnumeric() array([False False False True True True]) Special Character Examples >>> special_strings = ak.array(["3.14", "0", "²", "2³₇", "2³x₇"]) >>> special_strings array(['3.14', '0', '²', '2³₇', '2³x₇']) >>> special_strings.isnumeric() array([False True True True False]) .. py:method:: isspace() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i has all whitespace characters (``' '``, ``'\t'``, ``'\n'``, ``'\v'``, ``'\f'``, ``'\r'``). :returns: True for elements that are whitespace, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.islower`, :py:obj:`Strings.isupper`, :py:obj:`Strings.istitle` .. rubric:: Examples >>> import arkouda as ak >>> not_space = ak.array([f'Strings {i}' for i in range(3)]) >>> space = ak.array([' ', '\t', '\n', '\v', '\f', '\r', ' \t\n\v\f\r']) >>> strings = ak.concatenate([not_space, space]) >>> strings array(['Strings 0', 'Strings 1', 'Strings 2', ' ', 'u0009', 'n', 'u000B', 'u000C', 'u000D', ' u0009nu000Bu000Cu000D']) >>> strings.isspace() array([False False False True True True True True True True]) .. py:method:: istitle() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings is titlecase. :returns: True for elements that are titlecase, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.islower`, :py:obj:`Strings.isupper` .. 
rubric:: Examples >>> import arkouda as ak >>> mixed = ak.array([f'sTrINgs {i}' for i in range(3)]) >>> title = ak.array([f'Strings {i}' for i in range(3)]) >>> strings = ak.concatenate([mixed, title]) >>> strings array(['sTrINgs 0', 'sTrINgs 1', 'sTrINgs 2', 'Strings 0', 'Strings 1', 'Strings 2']) >>> strings.istitle() array([False False False True True True]) .. py:method:: isupper() -> arkouda.numpy.pdarrayclass.pdarray Return a boolean pdarray where index i indicates whether string i of the Strings is entirely uppercase. :returns: True for elements that are entirely uppercase, False otherwise :rtype: pdarray :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.islower` .. rubric:: Examples >>> import arkouda as ak >>> lower = ak.array([f'strings {i}' for i in range(3)]) >>> upper = ak.array([f'STRINGS {i}' for i in range(3)]) >>> strings = ak.concatenate([lower, upper]) >>> strings array(['strings 0', 'strings 1', 'strings 2', 'STRINGS 0', 'STRINGS 1', 'STRINGS 2']) >>> strings.isupper() array([False False False True True True]) .. py:attribute:: logger :type: arkouda.core.logger.ArkoudaLogger .. py:method:: lower() -> Strings Return a new Strings with all uppercase characters from the original replaced with their lowercase equivalent. :returns: Strings with all uppercase characters from the original replaced with their lowercase equivalent :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.upper` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array([f'StrINgS {i}' for i in range(5)]) >>> strings array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4']) >>> strings.lower() array(['strings 0', 'strings 1', 'strings 2', 'strings 3', 'strings 4']) .. 
py:method:: lstick(other: Strings, delimiter: Union[bytes, arkouda.numpy.dtypes.str_scalars] = '') -> Strings Join the strings from another array onto the left of the strings of this array, optionally inserting a delimiter. *Warning*: This function is experimental and not guaranteed to work. :param other: The strings to join onto self's strings :type other: Strings :param delimiter: String inserted between self and other :type delimiter: bytes or str_scalars, default="" :returns: The array of joined strings, as other + self :rtype: Strings :raises TypeError: Raised if the delimiter parameter is neither bytes nor a str or if the other parameter is not a Strings instance :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`stick`, :py:obj:`peel`, :py:obj:`rpeel` .. rubric:: Examples >>> import arkouda as ak >>> s = ak.array(['a', 'c', 'e']) >>> t = ak.array(['b', 'd', 'f']) >>> s.lstick(t, delimiter='.') array(['b.a', 'd.c', 'f.e']) .. py:method:: match(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars]) -> arkouda.pandas.match.Match Return a match object where elements match only if the beginning of the string matches the regular expression pattern. :param pattern: Regex used to find matches :type pattern: bytes or str_scalars :returns: Match object where elements match only if the beginning of the string matches the regular expression pattern :rtype: Match .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.match('_+') .. py:attribute:: nbytes :type: arkouda.numpy.dtypes.int_scalars .. py:attribute:: ndim :type: arkouda.numpy.dtypes.int_scalars .. py:attribute:: objType :value: 'Strings' .. 
py:method:: peel(delimiter: Union[bytes, arkouda.numpy.dtypes.str_scalars], times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, fromRight: bool = False, regex: bool = False) -> Tuple[Strings, Strings] Peel off one or more delimited fields from each string (similar to string.partition), returning two new arrays of strings. *Warning*: This function is experimental and not guaranteed to work. :param delimiter: The separator where the split will occur :type delimiter: bytes or str_scalars :param times: The number of times the delimiter is sought, i.e. skip over the first (times-1) delimiters :type times: int_scalars, default=1 :param includeDelimiter: If true, append the delimiter to the end of the first return array. By default, it is prepended to the beginning of the second return array. :type includeDelimiter: bool, default=False :param keepPartial: If true, a string that does not contain instances of the delimiter will be returned in the first array. By default, such strings are returned in the second array. :type keepPartial: bool, default=False :param fromRight: If true, peel from the right instead of the left (see also rpeel) :type fromRight: bool, default=False :param regex: Indicates whether delimiter is a regular expression. Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds) :type regex: bool, default=False :returns: left: Strings The field(s) peeled from the start of each string (unless fromRight is true) right: Strings The remainder of each string after peeling (unless fromRight is true) :rtype: Tuple[Strings, Strings] :raises TypeError: Raised if the delimiter parameter is not bytes or str_scalars, if times is not int64, or if includeDelimiter, keepPartial, or fromRight is not bool :raises ValueError: Raised if times is < 1 or if delimiter is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. 
seealso:: :py:obj:`rpeel`, :py:obj:`stick`, :py:obj:`lstick` .. rubric:: Examples >>> import arkouda as ak >>> s = ak.array(['a.b', 'c.d', 'e.f.g']) >>> s.peel('.') (array(['a', 'c', 'e']), array(['b', 'd', 'f.g'])) >>> s.peel('.', includeDelimiter=True) (array(['a.', 'c.', 'e.']), array(['b', 'd', 'f.g'])) >>> s.peel('.', times=2) (array(['', '', 'e.f']), array(['a.b', 'c.d', 'g'])) >>> s.peel('.', times=2, keepPartial=True) (array(['a.b', 'c.d', 'e.f']), array(['', '', 'g'])) .. py:method:: pretty_print_info() -> None Print information about all components of self in a human-readable format. .. py:method:: purge_cached_regex_patterns() -> None Purge cached regex patterns. .. py:method:: regex_split(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars], maxsplit: int = 0, return_segments: bool = False) -> Union[Strings, Tuple] Return a new Strings split by the occurrences of pattern. If maxsplit is nonzero, at most maxsplit splits occur. :param pattern: Regex used to split strings into substrings :type pattern: bytes or str_scalars :param maxsplit: The maximum number of pattern match occurrences in each element to split on. The default maxsplit=0 splits on all occurrences :type maxsplit: int, default=0 :param return_segments: If True, return a mapping from each original string to its first substring in the return array. :type return_segments: bool, default=False :returns: Strings Substrings with pattern matches removed pdarray, int64 (optional) For each original string, the index of first corresponding substring in the return array :rtype: Union[Strings, Tuple] .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.regex_split('_+', maxsplit=2, return_segments=True) (array(['1', '2', '', '', '', '3', '', '4', '5____6___7', '']), array([0 3 5 6 9])) .. 
py:method:: register(user_defined_name: str) -> Strings Register this Strings object with a user defined name in the arkouda server so it can be attached to later using Strings.attach(). This is an in-place operation; registering a Strings object more than once will update the name in the registry and remove the previously registered name. A name can only be registered to one object at a time. :param user_defined_name: user defined name which the Strings object is to be registered under :type user_defined_name: str :returns: The same Strings object which is now registered with the arkouda server and has an updated name. This is an in-place modification; the original is returned to support a fluent programming style. Please note you cannot register two different objects with the same name. :rtype: Strings :raises TypeError: Raised if user_defined_name is not a str :raises RegistrationError: Raised if the server was unable to register the Strings object with the user_defined_name. If the user is attempting to register more than one object with the same name, the former should be unregistered first to free up the registration name. .. seealso:: :py:obj:`attach`, :py:obj:`unregister` .. rubric:: Notes Registered names/Strings objects in the server are immune to deletion until they are unregistered. .. py:attribute:: registered_name :type: Optional[str] :value: None .. py:method:: rpeel(delimiter: Union[bytes, arkouda.numpy.dtypes.str_scalars], times: arkouda.numpy.dtypes.int_scalars = 1, includeDelimiter: bool = False, keepPartial: bool = False, regex: bool = False) -> Tuple[Strings, Strings] Peel off one or more delimited fields from the end of each string (similar to string.rpartition), returning two new arrays of strings. *Warning*: This function is experimental and not guaranteed to work. :param delimiter: The separator where the split will occur :type delimiter: bytes or str_scalars :param times: The number of times the delimiter is sought, i.e. 
skip over the last (times-1) delimiters :type times: int_scalars, default=1 :param includeDelimiter: If true, prepend the delimiter to the start of the first return array. By default, the delimiter is discarded and appears in neither return array. :type includeDelimiter: bool, default=False :param keepPartial: If true, a string that contains fewer than ``times`` instances of the delimiter will be returned in the second array. By default, such strings are returned in the first array. :type keepPartial: bool, default=False :param regex: Indicates whether delimiter is a regular expression Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds) :type regex: bool, default=False :returns: left: Strings The remainder of the string after peeling right: Strings The field(s) that were peeled from the right of each string :rtype: Tuple[Strings, Strings] :raises TypeError: Raised if the delimiter parameter is not bytes or str_scalars or if times is not int64 :raises ValueError: Raised if times is < 1 or if delimiter is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`peel`, :py:obj:`stick`, :py:obj:`lstick` .. rubric:: Examples >>> import arkouda as ak >>> s = ak.array(['a.b', 'c.d', 'e.f.g']) >>> s.rpeel('.') (array(['a', 'c', 'e.f']), array(['b', 'd', 'g'])) Compared against peel >>> s.peel('.') (array(['a', 'c', 'e']), array(['b', 'd', 'f.g'])) .. py:method:: search(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars]) -> arkouda.pandas.match.Match Return a match object with the first location in each element where pattern produces a match. Elements match if any part of the string matches the regular expression pattern. :param pattern: Regex used to find matches :type pattern: bytes or str_scalars :returns: Match object where elements match if any part of the string matches the regular expression pattern :rtype: Match .. 
rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.search('_+') .. py:attribute:: shape :type: Tuple[int] .. py:attribute:: size :type: arkouda.numpy.dtypes.int_scalars .. py:method:: split(delimiter: str, return_segments: bool = False, regex: bool = False) -> Union[Strings, Tuple] Unpack delimiter-joined substrings into a flat array. :param delimiter: Characters used to split strings into substrings :type delimiter: str :param return_segments: If True, also return mapping of original strings to first substring in return array. :type return_segments: bool, default=False :param regex: Indicates whether delimiter is a regular expression Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds) :type regex: bool, default=False :returns: Strings Flattened substrings with delimiters removed pdarray, int64 (optional) For each original string, the index of first corresponding substring in the return array :rtype: Union[Strings, Tuple] .. seealso:: :py:obj:`peel`, :py:obj:`rpeel` .. rubric:: Examples >>> import arkouda as ak >>> orig = ak.array(['one|two', 'three|four|five', 'six']) >>> orig.split('|') array(['one', 'two', 'three', 'four', 'five', 'six']) >>> flat, mapping = orig.split('|', return_segments=True) >>> mapping array([0 2 5]) >>> under = ak.array(['one_two', 'three_____four____five', 'six']) >>> under_split, under_map = under.split('_+', return_segments=True, regex=True) >>> under_split array(['one', 'two', 'three', 'four', 'five', 'six']) >>> under_map array([0 2 5]) .. py:method:: startswith(substr: Union[bytes, arkouda.numpy.dtypes.str_scalars], regex: bool = False) -> arkouda.numpy.pdarrayclass.pdarray Check whether each element starts with the given substring. 
:param substr: The prefix to search for :type substr: bytes or str_scalars :param regex: Indicates whether substr is a regular expression Note: only handles regular expressions supported by re2 (does not support lookaheads/lookbehinds) :type regex: bool, default=False :returns: True for elements that start with substr, False otherwise :rtype: pdarray :raises TypeError: Raised if the substr parameter is not bytes or str_scalars :raises ValueError: Raised if substr is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.contains`, :py:obj:`Strings.endswith` .. rubric:: Examples >>> import arkouda as ak >>> strings_end = ak.array([f'string {i}' for i in range(1, 6)]) >>> strings_end array(['string 1', 'string 2', 'string 3', 'string 4', 'string 5']) >>> strings_end.startswith('string') array([True True True True True]) >>> strings_start = ak.array([f'{i} string' for i in range(1,6)]) >>> strings_start array(['1 string', '2 string', '3 string', '4 string', '5 string']) >>> strings_start.startswith('\\d str', regex = True) array([True True True True True]) .. py:method:: stick(other: Strings, delimiter: Union[bytes, arkouda.numpy.dtypes.str_scalars] = '', toLeft: bool = False) -> Strings Join the strings from another array onto one end of the strings of this array, optionally inserting a delimiter. *Warning*: This function is experimental and not guaranteed to work. :param other: The strings to join onto self's strings :type other: Strings :param delimiter: String inserted between self and other :type delimiter: bytes or str_scalars, default="" :param toLeft: If true, join other strings to the left of self. By default, other is joined to the right of self. 
:type toLeft: bool, default=False :returns: The array of joined strings :rtype: Strings :raises TypeError: Raised if the delimiter parameter is not bytes or str_scalars or if the other parameter is not a Strings instance :raises ValueError: Raised if times is < 1 :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`lstick`, :py:obj:`peel`, :py:obj:`rpeel` .. rubric:: Examples >>> import arkouda as ak >>> s = ak.array(['a', 'c', 'e']) >>> t = ak.array(['b', 'd', 'f']) >>> s.stick(t, delimiter='.') array(['a.b', 'c.d', 'e.f']) .. py:method:: strip(chars: Optional[Union[bytes, arkouda.numpy.dtypes.str_scalars]] = '') -> Strings Return a new Strings object with all leading and trailing occurrences of characters contained in chars removed. The chars argument is a string specifying the set of characters to be removed. If omitted, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped. :param chars: the set of characters to be removed :type chars: bytes or str_scalars, optional :returns: Strings object with the leading and trailing characters matching the set of characters in the chars argument removed :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['Strings ', ' StringS ', 'StringS ']) >>> s = strings.strip() >>> s array(['Strings', 'StringS', 'StringS']) >>> strings = ak.array(['Strings 1', '1 StringS ', ' 1StringS 12 ']) >>> s = strings.strip(' 12') >>> s array(['Strings', 'StringS', 'StringS']) .. py:method:: sub(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars], repl: Union[bytes, arkouda.numpy.dtypes.str_scalars], count: int = 0) -> Strings Return new Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl. If count is nonzero, at most count substitutions occur. 
:param pattern: The regex to substitute :type pattern: bytes or str_scalars :param repl: The substring to replace pattern matches with :type repl: bytes or str_scalars :param count: The maximum number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl :type count: int, default=0 :returns: Strings with pattern matches replaced :rtype: Strings :raises TypeError: Raised if pattern or repl are not bytes or str_scalars :raises ValueError: Raised if pattern is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.subn` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.sub(pattern='_+', repl='-', count=2) array(['1-2-', '-', '3', '-4-5____6___7', '']) .. py:method:: subn(pattern: Union[bytes, arkouda.numpy.dtypes.str_scalars], repl: Union[bytes, arkouda.numpy.dtypes.str_scalars], count: int = 0) -> Tuple[Strings, arkouda.numpy.pdarrayclass.pdarray] Perform the same operation as sub(), but return a tuple (new_Strings, number_of_substitutions). :param pattern: The regex to substitute :type pattern: bytes or str_scalars :param repl: The substring to replace pattern matches with :type repl: bytes or str_scalars :param count: The maximum number of pattern match occurrences in each element to replace. The default count=0 replaces all occurrences of pattern with repl :type count: int, default=0 :returns: Strings Strings with pattern matches replaced pdarray, int64 The number of substitutions made for each element of Strings :rtype: Tuple[Strings, pdarray] :raises TypeError: Raised if pattern or repl are not bytes or str_scalars :raises ValueError: Raised if pattern is not a valid regex :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.sub` .. 
rubric:: Examples >>> import arkouda as ak >>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', '']) >>> strings.subn(pattern='_+', repl='-', count=2) (array(['1-2-', '-', '3', '-4-5____6___7', '']), array([2 1 0 2 0])) .. py:method:: take(indices: Union[arkouda.numpy.dtypes.numeric_scalars, arkouda.numpy.pdarrayclass.pdarray], axis: Optional[int] = None) -> Strings Take elements from the array along an axis. When axis is not None, this function does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis. A call such as ``np.take(arr, indices, axis=3)`` is equivalent to ``arr[:,:,:,indices,...]``. :param indices: The indices of the values to extract. Scalars are also accepted. :type indices: numeric_scalars or pdarray :param axis: The axis over which to select values. By default, the flattened input array is used. :type axis: int, optional :returns: A Strings containing the selected elements. :rtype: Strings .. rubric:: Examples >>> import arkouda as ak >>> a = ak.array(["a","b","c"]) >>> indices = [0, 1] >>> a.take(indices) array(['a', 'b']) .. py:method:: title() -> Strings Return a new Strings from the original replaced with their titlecase equivalent. :returns: Strings from the original replaced with their titlecase equivalent. :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown. .. seealso:: :py:obj:`Strings.lower`, :py:obj:`Strings.upper` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array([f'StrINgS {i}' for i in range(5)]) >>> strings array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4']) >>> strings.title() array(['Strings 0', 'Strings 1', 'Strings 2', 'Strings 3', 'Strings 4']) .. py:method:: to_csv(prefix_path: str, dataset: str = 'strings_array', col_delim: str = ',', overwrite: bool = False) -> str Write Strings to CSV file(s). File will contain a single column with the Strings data. 
All CSV Files written by Arkouda include a header denoting data types of the columns. Unlike other file formats, CSV files store Strings as their UTF-8 format instead of storing bytes as uint(8). :param prefix_path: The filename prefix to be used for saving files. Files will have _LOCALE#### appended when they are written to disk. :type prefix_path: str :param dataset: Column name to save the Strings under. Defaults to "strings_array". :type dataset: str, default="strings_array" :param col_delim: Defaults to ",". Value to be used to separate columns within the file. Please be sure that the value used DOES NOT appear in your dataset. :type col_delim: str, default="," :param overwrite: Defaults to False. If True, any existing files matching your provided prefix_path will be overwritten. If False, an error will be returned if existing files are found. :type overwrite: bool, default=False :returns: response message :rtype: str :raises ValueError: Raised if all datasets are not present in all parquet files or if one or more of the specified files do not exist :raises RuntimeError: Raised if one or more of the specified files cannot be opened. If `allow_errors` is true this may be raised if no values are returned from the server. :raises TypeError: Raised if we receive an unknown arkouda_type returned from the server .. rubric:: Notes - CSV format is not currently supported by load/load_all operations - The column delimiter is expected to be the same for column names and data - Be sure that column delimiters are not found within your data. - All CSV files must delimit rows using newline (``\\n``) at this time. .. py:method:: to_hdf(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', save_offsets: bool = True, file_type: Literal['single', 'distribute'] = 'distribute') -> str Save the Strings object to HDF5. The object can be saved to a collection of files or single file. 
:param prefix_path: Directory and filename prefix that all output files share :type prefix_path: str :param dataset: The name of the Strings dataset to be written, defaults to strings_array :type dataset: str, default="strings_array" :param mode: By default, truncate (overwrite) output files, if they exist. If 'append', create a new Strings dataset within existing files. :type mode: {"truncate", "append"}, default = "truncate" :param save_offsets: Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read. :type save_offsets: bool, default=True :param file_type: Default: Distribute Distribute the dataset over a file per locale. Single file will save the dataset to one file :type file_type: {"single", "distribute"}, default = "distribute" :returns: String message indicating result of save operation :rtype: str :raises RuntimeError: Raised if a server-side error is thrown saving the pdarray .. rubric:: Notes - Parquet files do not store the segments, only the values. - Strings state is saved as two datasets within an hdf5 group: one for the string characters and one for the segments corresponding to the start of each string - the hdf5 group is named via the dataset parameter. - The prefix_path must be visible to the arkouda server and the user must have write permission. - Output files have names of the form ``<prefix_path>_LOCALE<i>``, where ``<i>`` ranges from 0 to ``numLocales`` for `file_type='distribute'`. Otherwise, the file name will be `prefix_path`. - If any of the output files already exist and the mode is 'truncate', they will be overwritten. If the mode is 'append' and the number of output files is less than the number of locales or a dataset with the same name already exists, a ``RuntimeError`` will result. - Any file extension can be used. The file I/O does not rely on the extension to determine the file format. .. seealso:: :py:obj:`to_hdf` .. 
py:method:: to_ndarray() -> numpy.ndarray Convert the array to a np.ndarray, transferring array data from the arkouda server to Python. If the array exceeds a built-in size limit, a RuntimeError is raised. :returns: A numpy ndarray with the same strings as this array :rtype: np.ndarray .. rubric:: Notes The number of bytes in the array cannot exceed ``ak.core.client.maxTransferBytes``, otherwise a ``RuntimeError`` will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.core.client.maxTransferBytes to a larger value, but proceed with caution. .. seealso:: :py:obj:`array`, :py:obj:`tolist` .. rubric:: Examples >>> import arkouda as ak >>> a = ak.array(["hello", "my", "world"]) >>> a.to_ndarray() array(['hello', 'my', 'world'], dtype='<U5') >>> type(a.to_ndarray()) <class 'numpy.ndarray'> .. py:method:: to_parquet(prefix_path: str, dataset: str = 'strings_array', mode: Literal['truncate', 'append'] = 'truncate', compression: Optional[Literal['snappy', 'gzip', 'brotli', 'zstd', 'lz4']] = None) -> str Save the Strings object to Parquet. The result is a collection of files, one file per locale of the arkouda server, where each filename starts with prefix_path. Each locale saves its chunk of the array to its corresponding file. :param prefix_path: Directory and filename prefix that all output files share :type prefix_path: str :param dataset: Name of the dataset to create in files (must not already exist) :type dataset: str, default="strings_array" :param mode: By default, truncate (overwrite) output files, if they exist. If 'append', attempt to create new dataset in existing files. 
:type mode: {"truncate", "append"}, default = "truncate" :param compression: Sets the compression type used with Parquet files :type compression: {"snappy", "gzip", "brotli", "zstd", "lz4"}, optional :returns: string message indicating result of save operation :rtype: str :raises RuntimeError: Raised if a server-side error is thrown saving the pdarray .. rubric:: Notes - The prefix_path must be visible to the arkouda server and the user must have write permission. - Output files have names of the form ``<prefix_path>_LOCALE<i>``, where ``<i>`` ranges from 0 to ``numLocales`` for `file_type='distribute'`. - 'append' write mode is supported, but is not efficient. - If any of the output files already exist and the mode is 'truncate', they will be overwritten. If the mode is 'append' and the number of output files is less than the number of locales or a dataset with the same name already exists, a ``RuntimeError`` will result. - Any file extension can be used. The file I/O does not rely on the extension to determine the file format. .. py:method:: tolist() -> List[str] Convert the SegString to a list, transferring data from the arkouda server to Python. If the SegString exceeds a built-in size limit, a RuntimeError is raised. :returns: A list with the same strings as this SegString :rtype: List[str] .. rubric:: Notes The number of bytes in the array cannot exceed ``ak.core.client.maxTransferBytes``, otherwise a ``RuntimeError`` will be raised. This is to protect the user from overflowing the memory of the system on which the Python client is running, under the assumption that the server is running on a distributed system with much more memory than the client. The user may override this limit by setting ak.core.client.maxTransferBytes to a larger value, but proceed with caution. .. seealso:: :py:obj:`to_ndarray` .. rubric:: Examples >>> import arkouda as ak >>> a = ak.array(["hello", "my", "world"]) >>> a.tolist() ['hello', 'my', 'world'] >>> type(a.tolist()) <class 'list'> .. 
py:method:: transfer(hostname: str, port: arkouda.numpy.dtypes.int_scalars) -> Union[str, memoryview] Send a Strings object to a different Arkouda server. :param hostname: The hostname where the Arkouda server intended to receive the Strings object is running. :type hostname: str :param port: The port to send the array over. This needs to be an open port (i.e., not one that the Arkouda server is running on). The transfer opens `numLocales` ports in succession, using the ports in the range {port..(port+numLocales-1)} (e.g., when running an Arkouda server of 4 nodes with 1234 passed as `port`, Arkouda will use ports 1234, 1235, 1236, and 1237 to send the array data). This port must match the port passed to the call to `ak.receive_array()`. :type port: int_scalars :returns: A message indicating a complete transfer :rtype: str :raises ValueError: Raised if the op is not within the pdarray.BinOps set :raises TypeError: Raised if other is not a pdarray or the pdarray.dtype is not a supported dtype .. py:method:: unregister() -> None Unregister a Strings object in the arkouda server which was previously registered using register() and/or attached to using attach(). :raises RuntimeError: Raised if the server could not find the internal name/symbol to remove .. seealso:: :py:obj:`register`, :py:obj:`attach` .. rubric:: Notes Registered names/Strings objects in the server are immune to deletion until they are unregistered. .. py:method:: update_hdf(prefix_path: str, dataset: str = 'strings_array', save_offsets: bool = True, repack: bool = True) -> str Overwrite the dataset with the name provided with this Strings object. If the dataset does not exist it is added. 
:param prefix_path: Directory and filename prefix that all output files share :type prefix_path: str :param dataset: Name of the dataset to create in files :type dataset: str, default="strings_array" :param save_offsets: Defaults to True, which instructs the server to save the offsets array to HDF5. If False, the offsets array will not be saved and will be derived from the string values upon load/read. :type save_offsets: bool, default=True :param repack: Default: True HDF5 does not release memory on delete. When True, the inaccessible data (that was overwritten) is removed. When False, the data remains, but is inaccessible. Setting to false will yield better performance, but will cause file sizes to expand. :type repack: bool, default=True :returns: success message if successful :rtype: str :raises RuntimeError: Raised if a server-side error is thrown saving the Strings object .. rubric:: Notes - If file does not contain File_Format attribute to indicate how it was saved, the file name is checked for _LOCALE#### to determine if it is distributed. - If the dataset provided does not exist, it will be added .. py:method:: upper() -> Strings Return a new Strings with all lowercase characters from the original replaced with their uppercase equivalent. :returns: Strings with all lowercase characters from the original replaced with their uppercase equivalent :rtype: Strings :raises RuntimeError: Raised if there is a server-side error thrown .. seealso:: :py:obj:`Strings.lower` .. rubric:: Examples >>> import arkouda as ak >>> strings = ak.array([f'StrINgS {i}' for i in range(5)]) >>> strings array(['StrINgS 0', 'StrINgS 1', 'StrINgS 2', 'StrINgS 3', 'StrINgS 4']) >>> strings.upper() array(['STRINGS 0', 'STRINGS 1', 'STRINGS 2', 'STRINGS 3', 'STRINGS 4'])
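.. rubric:: Client-side sketch of the peel semantics

The interaction of ``times``, ``includeDelimiter`` and ``keepPartial`` in ``Strings.peel`` is easiest to see on plain Python strings. The sketch below is only a model inferred from the documented ``peel`` examples; it is not arkouda's implementation and needs no server. The helper names ``peel_one`` and ``peel_all`` are hypothetical, and the model covers literal delimiters peeled from the left only (no ``regex``, no ``fromRight``).

```python
from typing import Iterable, List, Tuple


def peel_one(s: str, delimiter: str, times: int = 1,
             include_delimiter: bool = False,
             keep_partial: bool = False) -> Tuple[str, str]:
    """Model of peeling one plain Python string from the left (hypothetical helper)."""
    if times < 1:
        raise ValueError("times must be >= 1")
    start = 0   # where the next search begins
    pos = -1    # index of the most recently found delimiter
    for _ in range(times):
        pos = s.find(delimiter, start)
        if pos == -1:
            # Fewer than `times` delimiters: the whole string lands on one side.
            return (s, "") if keep_partial else ("", s)
        start = pos + len(delimiter)
    # Keep the delimiter on the left part only when asked to; otherwise drop it.
    left_end = start if include_delimiter else pos
    return s[:left_end], s[start:]


def peel_all(strings: Iterable[str], delimiter: str,
             **kwargs) -> Tuple[List[str], List[str]]:
    """Apply peel_one to every string and return (lefts, rights)."""
    pairs = [peel_one(s, delimiter, **kwargs) for s in strings]
    return [left for left, _ in pairs], [right for _, right in pairs]


if __name__ == "__main__":
    data = ["a.b", "c.d", "e.f.g"]
    print(peel_all(data, "."))
    print(peel_all(data, ".", times=2, keep_partial=True))
```

Run on ``['a.b', 'c.d', 'e.f.g']``, the model gives the same left and right values as the documented ``peel`` examples, which makes it a convenient way to reason about a call before sending it to the server.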