arkouda.match

Regular expression match results for segmented string arrays in Arkouda.

The arkouda.match module defines the Match class, which encapsulates results from regex-based operations such as search, match, and fullmatch on Arkouda Strings arrays.

This class provides methods to retrieve:

  • Match booleans (matched)

  • Start and end positions of matches

  • Capture groups

  • Matched substrings

  • Origin indices of matched elements (optional)
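To make these pieces concrete, here is a minimal pure-Python model of the per-element data a Match carries. It uses the standard `re` module, not Arkouda. The helper `match_fields` and its return layout are hypothetical. The sketch assumes the server's regex semantics agree with Python's for simple patterns like this one. The constructor signature takes starts and lengths, so the sketch derives ends as start + length.

```python
import re

def match_fields(strings, pattern, method="search"):
    """Hypothetical helper: model matched/starts/lengths/origins for a list of str.

    `method` names a compiled-pattern method in Python's re module:
    "search", "match", or "fullmatch".
    """
    regex_method = getattr(re.compile(pattern), method)
    matched, starts, lengths, origins = [], [], [], []
    for i, s in enumerate(strings):
        m = regex_method(s)
        matched.append(m is not None)  # one boolean per input element
        if m is not None:
            # Positions and origins are recorded only for elements that matched.
            starts.append(m.start())
            lengths.append(m.end() - m.start())
            origins.append(i)
    return matched, starts, lengths, origins

strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
matched, starts, lengths, origins = match_fields(strings, "_+")
ends = [st + ln for st, ln in zip(starts, lengths)]
print(matched)                # [True, True, False, True, False]
print(starts, ends, origins)  # [1, 0, 0] [2, 4, 2] [0, 1, 3]
```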

These operations support pattern matching and substring extraction on large segmented string arrays. The regex work itself runs on the Arkouda server.

Exports

  • Match : Object representing a regex match result

  • MatchType : Enum indicating the regex method used (SEARCH, MATCH, FULLMATCH)
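The three MatchType values correspond to the anchoring rules of Python's `re.search`, `re.match`, and `re.fullmatch`. The sketch below uses a stand-in enum to compare which elements match under each rule. `MatchTypeSketch` is hypothetical and is not `arkouda.match.MatchType`. The sketch assumes Arkouda follows the same anchoring semantics as Python.

```python
import re
from enum import Enum

class MatchTypeSketch(Enum):
    """Hypothetical stand-in mapping each match type to a compiled-pattern method name."""
    SEARCH = "search"        # match may occur anywhere in the string
    MATCH = "match"          # match must begin at position 0
    FULLMATCH = "fullmatch"  # match must span the entire string

strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
regex = re.compile("_+")

results = {}
for mt in MatchTypeSketch:
    method = getattr(regex, mt.value)
    results[mt.name] = [method(s) is not None for s in strings]

for name, flags in results.items():
    print(name, flags)
# SEARCH [True, True, False, True, False]
# MATCH [False, True, False, True, False]
# FULLMATCH [False, True, False, False, False]
```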

Examples

>>> import arkouda as ak
>>> strings = ak.array(["Isaac Newton", "Ada Lovelace", ""])
>>> m = strings.search("(\\w+) (\\w+)")
>>> m.matched()
array([True True False])
>>> m.group(1)
array(['Isaac', 'Ada'])
>>> m.group(2, return_group_origins=True)
(array(['Newton', 'Lovelace']), array([0 1]))
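For readers familiar with Python's `re` module, the following sketch reproduces the example above in pure Python. Capture groups are collected only for matching elements, and origins record which input each group came from. This is an illustration, not Arkouda's implementation. It assumes Arkouda's search semantics match `re.search` for this pattern.

```python
import re

strings = ["Isaac Newton", "Ada Lovelace", ""]
regex = re.compile(r"(\w+) (\w+)")

matches = [regex.search(s) for s in strings]
matched = [m is not None for m in matches]

# Keep (origin index, match) pairs for elements that matched.
hits = [(i, m) for i, m in enumerate(matches) if m is not None]
group1 = [m.group(1) for _, m in hits]
group2 = [m.group(2) for _, m in hits]
group2_origins = [i for i, _ in hits]

print(matched)                 # [True, True, False]
print(group1)                  # ['Isaac', 'Ada']
print(group2, group2_origins)  # ['Newton', 'Lovelace'] [0, 1]
```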

Notes

  • group() defaults to group_num=0, which returns the full match.

  • If a pattern has more capture groups than regexMaxCaptures allows, the server must be recompiled with a higher limit.

  • Match objects are typically obtained via Strings.search(), Strings.match(), or Strings.fullmatch().
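The capture limit is fixed when the server is compiled. Because of this, it can be worth checking a pattern's group count on the client before calling search(). Below is a minimal sketch using Python's `re` module. `check_capture_limit` and the `max_captures` values are hypothetical placeholders for whatever limit your server was built with (see arkouda.client.regexMaxCaptures). The sketch assumes Python and the server count capturing groups the same way; non-capturing `(?:...)` groups are not counted.

```python
import re

def check_capture_limit(pattern: str, max_captures: int) -> int:
    """Hypothetical helper: raise if `pattern` has more capture groups than allowed."""
    n_groups = re.compile(pattern).groups  # number of capturing groups in the pattern
    if n_groups > max_captures:
        raise ValueError(
            f"pattern has {n_groups} capture groups; limit is {max_captures}"
        )
    return n_groups

print(check_capture_limit(r"(\w+) (\w+)", max_captures=20))  # 2
try:
    check_capture_limit(r"(a)(b)(c)", max_captures=2)
except ValueError as err:
    print(err)  # pattern has 3 capture groups; limit is 2
```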

See also

arkouda.strings.Strings, arkouda.client.regexMaxCaptures


Module Contents

class arkouda.match.Match(matched: arkouda.numpy.pdarrayclass.pdarray, starts: arkouda.numpy.pdarrayclass.pdarray, lengths: arkouda.numpy.pdarrayclass.pdarray, indices: arkouda.numpy.pdarrayclass.pdarray, parent_entry_name: str, match_type: MatchType, pattern: str)
end() → arkouda.numpy.pdarrayclass.pdarray

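Return the end positions of matches. Only elements that matched contribute an entry. Each end is exclusive (one past the last matched character).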

Returns:

The end positions of matches

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').end()
array([2 4 2])
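The ends in this example behave like Python's `re.Match.end()`: each is one past the last matched character, so s[start:end] recovers the match. Below is a pure-Python check on the same data. It assumes Arkouda's end positions follow Python's convention; the example output above is consistent with that.

```python
import re

strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
regex = re.compile("_+")

ends = []
for s in strings:
    m = regex.search(s)
    if m is not None:           # non-matching elements contribute no entry
        ends.append(m.end())    # exclusive: index just past the match
        # The last matched character sits at end - 1, and slicing recovers the match.
        assert s[m.end() - 1] == "_"
        assert s[m.start():m.end()] == m.group()

print(ends)  # [2, 4, 2]
```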
find_matches(return_match_origins: bool = False)

Return the matched substrings as a new Strings object, one per element that matched.

Parameters:

return_match_origins (bool) – If True, also return a pdarray containing the index of the original string each match came from. Default False.

Returns:

  • Strings – Strings object containing only matches

  • pdarray, int64 (optional) – The index of the original string each pattern match is from

Raises:

RuntimeError – Raised if the server reports an error

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').find_matches(return_match_origins=True)
(array(['_', '____', '__']), array([0 1 3]))
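On a search result, find_matches returns one substring per matching element, not every non-overlapping occurrence. For example, '1_2___' contributes only '_'. The sketch below reproduces the example output with Python's `re` and contrasts it with `re.findall`. It illustrates the semantics shown above and does not reflect the server implementation.

```python
import re

strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
regex = re.compile("_+")

found, origins = [], []
for i, s in enumerate(strings):
    m = regex.search(s)
    if m is not None:
        found.append(m.group())  # the single (first) match for this element
        origins.append(i)        # index of the originating string

print((found, origins))  # (['_', '____', '__'], [0, 1, 3])

# For contrast, re.findall returns every non-overlapping occurrence:
print(regex.findall(strings[0]))  # ['_', '___']
```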
group(group_num: int = 0, return_group_origins: bool = False)

Return a new Strings containing the capture group corresponding to group_num.

With the default group_num=0, the full match is returned.

Parameters:
  • group_num (int) – The index of the capture group to be returned

  • return_group_origins (bool) – If True, return a pdarray containing the index of the original string each capture group is from

Returns:

  • Strings – Strings object containing only the capture groups corresponding to group_num

  • pdarray, int64 (optional) – The index of the original string each group is from

Examples

>>> import arkouda as ak
>>> strings = ak.array(["Isaac Newton, physics", '<-calculus->', 'Gottfried Leibniz, math'])
>>> m = strings.search("(\\w+) (\\w+)")
>>> m.group()
array(['Isaac Newton', 'Gottfried Leibniz'])
>>> m.group(1)
array(['Isaac', 'Gottfried'])
>>> m.group(2, return_group_origins=True)
(array(['Newton', 'Leibniz']), array([0 2]))
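With return_group_origins=True, the returned origins let you align group values with the original rows. Below is a pure-Python sketch of that alignment that fills non-matching rows with an empty string. The helper name `align_to_rows` and the fill value are illustrative choices and are not part of Arkouda.

```python
def align_to_rows(values, origins, n_rows, fill=""):
    """Hypothetical helper: place each value at its origin row; other rows get `fill`."""
    out = [fill] * n_rows
    for value, row in zip(values, origins):
        out[row] = value
    return out

# Values and origins taken from the m.group(2, return_group_origins=True) example above.
surnames = ["Newton", "Leibniz"]
origins = [0, 2]
aligned = align_to_rows(surnames, origins, n_rows=3)
print(aligned)  # ['Newton', '', 'Leibniz']
```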
match_type() → str

Return the name of the MatchType (SEARCH, MATCH, or FULLMATCH) used to produce this Match.

Returns:

The MatchType name of the Match object

Return type:

str

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').match_type()
'SEARCH'
matched() → arkouda.numpy.pdarrayclass.pdarray

Return a boolean array indicating whether each element matched.

Returns:

True for elements that match, False otherwise

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').matched()
array([True True False True False])
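The boolean array from matched() lines up one-to-one with the input, so it can serve as a mask to select the elements that matched. Its count of True values equals the number of entries start() and end() return. Below is a pure-Python sketch of that filtering; it uses plain lists and `re` in place of Arkouda arrays.

```python
import re

strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
regex = re.compile("_+")

matched = [regex.search(s) is not None for s in strings]
# Keep only the elements whose mask entry is True.
selected = [s for s, keep in zip(strings, matched) if keep]

print(matched)       # [True, True, False, True, False]
print(selected)      # ['1_2___', '____', '__4___5____6___7']
print(sum(matched))  # 3
```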
re

The regular expression pattern this Match was created from.
start() → arkouda.numpy.pdarrayclass.pdarray

Return the start positions of matches. Only elements that matched contribute an entry.

Returns:

The start positions of matches

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').start()
array([1 0 0])
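The example above covers only SEARCH. Under MATCH and FULLMATCH the pattern is anchored at position 0, so every start would be 0, and start() is mainly informative for SEARCH results. Below is a pure-Python comparison on the same data. It assumes Arkouda's anchoring matches Python's `re`.

```python
import re

strings = ["1_2___", "____", "3", "__4___5____6___7", ""]
regex = re.compile("_+")

def starts_for(method_name):
    """Start positions of matching elements under the given re method."""
    method = getattr(regex, method_name)
    return [m.start() for m in map(method, strings) if m is not None]

print(starts_for("search"))     # [1, 0, 0]
print(starts_for("match"))      # [0, 0]
print(starts_for("fullmatch"))  # [0]
```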