arkouda.match
Regular expression match results for segmented string arrays in Arkouda.
The arkouda.match module defines the Match class, which encapsulates results from regex-based operations such as search, match, and fullmatch on Arkouda Strings arrays.
This class provides methods to retrieve:
- Match booleans (matched)
- Start and end positions of matches
- Capture groups
- Matched substrings
- Origin indices of matched elements (optional)
These operations enable powerful pattern recognition and substring extraction on large-scale segmented string arrays, implemented efficiently in the Arkouda server.
Exports
Match : Object representing a regex match result
MatchType : Enum indicating the regex method used (SEARCH, MATCH, FULLMATCH)
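The three match types can be illustrated with Python's standard re module. This is only an analogy, assuming Arkouda's search, match, and fullmatch follow the semantics of the identically named re functions; the plain Python list stands in for a Strings array, and no Arkouda server is involved.

```python
import re

# Plain-Python analogy; this list stands in for an Arkouda Strings array.
words = ["abc", "xabc", "abcx", ""]
pattern = re.compile("abc")

# SEARCH: the pattern may match anywhere in the string.
search_hits = [pattern.search(w) is not None for w in words]
# MATCH: the pattern must match at the start of the string.
match_hits = [pattern.match(w) is not None for w in words]
# FULLMATCH: the pattern must match the entire string.
fullmatch_hits = [pattern.fullmatch(w) is not None for w in words]

print(search_hits)     # [True, True, True, False]
print(match_hits)      # [True, False, True, False]
print(fullmatch_hits)  # [True, False, False, False]
```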
Examples
>>> import arkouda as ak
>>> strings = ak.array(["Isaac Newton", "Ada Lovelace", ""])
>>> m = strings.search("(\\w+) (\\w+)")
>>> m.matched()
array([True True False])
>>> m.group(1)
array(['Isaac', 'Ada'])
>>> m.group(2, return_group_origins=True)
(array(['Newton', 'Lovelace']), array([0 1]))
Notes
group() defaults to group_num=0, which returns the full match.
If a pattern needs more capture groups than regexMaxCaptures allows, the server must be recompiled with a higher limit.
Match objects are typically obtained via Strings.search(), Strings.match(), or Strings.fullmatch().
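One way to anticipate the capture-group limit is to count the groups in a pattern on the client before sending it. The sketch below uses Python's re module for the count; the helper name count_capture_groups and the limit ASSUMED_MAX_CAPTURES are hypothetical, and the server's regex engine may parse or count groups differently from Python.

```python
import re


def count_capture_groups(pattern: str) -> int:
    """Return the number of capture groups as counted by Python's re module."""
    return re.compile(pattern).groups


# Hypothetical limit for illustration only; the real value is set when the
# server is built and is not read from Arkouda here.
ASSUMED_MAX_CAPTURES = 20

pattern = "(\\w+) (\\w+)"
n_groups = count_capture_groups(pattern)
if n_groups > ASSUMED_MAX_CAPTURES:
    raise ValueError(f"pattern uses {n_groups} capture groups, limit is {ASSUMED_MAX_CAPTURES}")

print(n_groups)  # 2
```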
See also
arkouda.strings.Strings, arkouda.client.regexMaxCaptures
Classes
Module Contents
- class arkouda.match.Match(matched: arkouda.numpy.pdarrayclass.pdarray, starts: arkouda.numpy.pdarrayclass.pdarray, lengths: arkouda.numpy.pdarrayclass.pdarray, indices: arkouda.numpy.pdarrayclass.pdarray, parent_entry_name: str, match_type: MatchType, pattern: str)
- end() → arkouda.numpy.pdarrayclass.pdarray
Return the ends of matches.
- Returns:
The end positions of matches
- Return type:
pdarray
Examples
>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').end()
array([2 4 2])
- find_matches(return_match_origins: bool = False)
Return all matches as a new Strings object.
- Parameters:
return_match_origins (bool) – If True, return a pdarray containing the index of the original string each pattern match is from
- Returns:
Strings – Strings object containing only matches
pdarray, int64 (optional) – The index of the original string each pattern match is from
- Raises:
RuntimeError – Raised if the server reports an error
Examples
>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').find_matches(return_match_origins=True)
(array(['_', '____', '__']), array([0 1 3]))
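The pairing of matches with their origin indices can be reproduced in plain Python. This is a sketch of the behavior shown in the example above (one match per matching string for a search), not Arkouda's server-side implementation; the function name first_matches_with_origins is hypothetical.

```python
import re


def first_matches_with_origins(strings, pattern):
    """Collect the first match in each string plus the index it came from."""
    compiled = re.compile(pattern)
    matches, origins = [], []
    for i, s in enumerate(strings):
        m = compiled.search(s)
        if m is not None:
            matches.append(m.group(0))  # full matched substring
            origins.append(i)           # index of the source string
    return matches, origins


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
matches, origins = first_matches_with_origins(strings, '_+')
print(matches)  # ['_', '____', '__']
print(origins)  # [0, 1, 3]
```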
- group(group_num: int = 0, return_group_origins: bool = False)
Return a new Strings containing the capture group corresponding to group_num.
For the default, group_num=0, return the full match.
- Parameters:
group_num (int) – The index of the capture group to be returned
return_group_origins (bool) – If True, return a pdarray containing the index of the original string each capture group is from
- Returns:
Strings – Strings object containing only the capture groups corresponding to group_num
pdarray, int64 (optional) – The index of the original string each group is from
Examples
>>> import arkouda as ak
>>> strings = ak.array(["Isaac Newton, physics", '<-calculus->', 'Gottfried Leibniz, math'])
>>> m = strings.search("(\\w+) (\\w+)")
>>> m.group()
array(['Isaac Newton', 'Gottfried Leibniz'])
>>> m.group(1)
array(['Isaac', 'Gottfried'])
>>> m.group(2, return_group_origins=True)
(array(['Newton', 'Leibniz']), array([0 2]))
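Because group() returns only the entries that matched, the origin indices are what let the result be realigned with the original array. The sketch below does this with plain Python lists, using the values from the example above; scatter_to_original is a hypothetical helper, not part of Arkouda.

```python
def scatter_to_original(values, origins, size, fill=None):
    """Place each value at its origin index; unmatched positions keep `fill`."""
    out = [fill] * size
    for value, index in zip(values, origins):
        out[index] = value
    return out


# Values copied from the group(2, return_group_origins=True) example.
surnames = ['Newton', 'Leibniz']
origins = [0, 2]

aligned = scatter_to_original(surnames, origins, size=3)
print(aligned)  # ['Newton', None, 'Leibniz']
```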
- match_type() → str
Return the type of the Match object.
- Returns:
MatchType of the Match object
- Return type:
str
Examples
>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').match_type()
'SEARCH'
- matched() → arkouda.numpy.pdarrayclass.pdarray
Return a boolean array indicating whether each element matched.
- Returns:
True for elements that match, False otherwise
- Return type:
Examples
>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').matched()
array([True True False True False])
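In the examples on this page, matched() has one entry per input string, while start(), end(), and the origin arrays have one entry per match. A plain-Python sketch of how the two relate, using the boolean values shown above (origins_from_mask is a hypothetical helper):

```python
def origins_from_mask(matched):
    """Return the indices of the True entries in a boolean match mask."""
    return [i for i, hit in enumerate(matched) if hit]


# Values copied from the matched() example.
matched = [True, True, False, True, False]

origins = origins_from_mask(matched)
print(origins)       # [0, 1, 3]
print(sum(matched))  # 3, the length of the compacted per-match arrays
```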
- re
- start() → arkouda.numpy.pdarrayclass.pdarray
Return the starts of matches.
- Returns:
The start positions of matches
- Return type:
pdarray
Examples
>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').start()
array([1 0 0])
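The start and end positions in these examples can be checked against Python's re module, where each match's span gives (start, end) and the length is their difference. This sketch assumes end is exclusive, as in Python, which is consistent with the outputs documented above; it runs on plain lists rather than on an Arkouda server.

```python
import re

strings = ['1_2___', '____', '3', '__4___5____6___7', '']
compiled = re.compile('_+')

# Keep only strings with a match, mirroring the compacted results above.
found = [m for m in (compiled.search(s) for s in strings) if m is not None]

starts = [m.start() for m in found]
ends = [m.end() for m in found]            # exclusive end offsets
lengths = [e - b for b, e in zip(starts, ends)]

print(starts)   # [1, 0, 0]
print(ends)     # [2, 4, 2]
print(lengths)  # [1, 4, 2]
```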