arkouda.match

Classes

Match

Encapsulates regular expression match results on Arkouda segmented string arrays.

Package Contents

class arkouda.match.Match(matched: arkouda.numpy.pdarrayclass.pdarray, starts: arkouda.numpy.pdarrayclass.pdarray, lengths: arkouda.numpy.pdarrayclass.pdarray, indices: arkouda.numpy.pdarrayclass.pdarray, parent_entry_name: str, match_type: MatchType, pattern: str)

Encapsulates regular expression match results on Arkouda segmented string arrays.

Created by calling search(), match(), or fullmatch() on a Strings object. Provides access to per-element match booleans, match spans, capture groups, and the indices of the original strings that each match came from.

re

The regular expression pattern used to produce the match.

Type:

str

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> m = strings.search('_+')
>>> m
<ak.Match object: matched=True, span=(1, 2); matched=True, span=(0, 4);
matched=False; matched=True, span=(0, 2); matched=False>
>>> type(m)
<class 'arkouda.pandas.match.Match'>
>>> m.matched()
array([True True False True False])
>>> m.start()
array([1 0 0])
>>> m.end()
array([2 4 2])
>>> m.match_type()
'SEARCH'
>>> m.re
'_+'
>>> m[1]
'matched=True, span=(0, 4)'
end() -> arkouda.numpy.pdarrayclass.pdarray

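Return the end position of each match. End positions are exclusive offsets within each string, and only matched elements contribute an entry, so the result can be shorter than the original array.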

Returns:

The end positions of matches

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').end()
array([2 4 2])
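
Arkouda needs a running server, so the compact layout returned by start() and end() may be easier to see in a pure-Python analogue. The sketch below uses the standard-library re module, not Arkouda itself; the helper name compact_spans is hypothetical, and it assumes that search() behaves like Python's re.search for this pattern.

```python
import re


def compact_spans(strings, pattern):
    """Hypothetical helper: mimic matched()/start()/end() with plain lists."""
    regex = re.compile(pattern)
    matched, starts, ends = [], [], []
    for s in strings:
        m = regex.search(s)
        matched.append(m is not None)
        if m is not None:
            # Only matched elements contribute a start/end entry,
            # so these lists can be shorter than the input.
            starts.append(m.start())
            ends.append(m.end())  # exclusive end offset
    return matched, starts, ends


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
matched, starts, ends = compact_spans(strings, '_+')
print(matched)  # one flag per input string
print(starts)   # one entry per matched string
print(ends)
```

The three entries in starts and ends correspond to the three True values in matched, which is why the documented arrays have length 3 for a 5-element input.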
find_matches(return_match_origins: bool = False)

Return all matches as a new Strings object.

Parameters:

return_match_origins (bool) – If True, return a pdarray containing the index of the original string each pattern match is from

Returns:

  • Strings – Strings object containing only matches

  • pdarray, int64 (optional) – The index of the original string each pattern match is from

Raises:

RuntimeError – Raised if the server reports an error

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').find_matches(return_match_origins=True)
(array(['_', '____', '__']), array([0 1 3]))
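
The origins array is what lets a match be traced back to the string that produced it. As a rough illustration only, the following pure-Python sketch reproduces the same pairing with the standard-library re module; find_first_matches is a hypothetical name, not part of Arkouda, and it assumes search-style matching (first match per string).

```python
import re


def find_first_matches(strings, pattern):
    """Hypothetical analogue of find_matches(return_match_origins=True)."""
    regex = re.compile(pattern)
    matches, origins = [], []
    for i, s in enumerate(strings):
        m = regex.search(s)
        if m is not None:
            matches.append(m.group(0))  # the matched substring
            origins.append(i)           # index of the source string
    return matches, origins


strings = ['1_2___', '____', '3', '__4___5____6___7', '']
matches, origins = find_first_matches(strings, '_+')

# Use the origins to look up the string each match came from.
sources = [strings[i] for i in origins]
print(matches, origins, sources)
```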
group(group_num: int = 0, return_group_origins: bool = False)

Return a new Strings containing the capture group corresponding to group_num.

For the default, group_num=0, return the full match.

Parameters:

  • group_num (int) – The index of the capture group to be returned

  • return_group_origins (bool) – If True, return a pdarray containing the index of the original string each capture group is from

Returns:

  • Strings – Strings object containing only the capture groups corresponding to group_num

  • pdarray, int64 (optional) – The index of the original string each group is from

Examples

>>> import arkouda as ak
>>> strings = ak.array(["Isaac Newton, physics", '<-calculus->', 'Gottfried Leibniz, math'])
>>> m = strings.search("(\\w+) (\\w+)")
>>> m.group()
array(['Isaac Newton', 'Gottfried Leibniz'])
>>> m.group(1)
array(['Isaac', 'Gottfried'])
>>> m.group(2, return_group_origins=True)
(array(['Newton', 'Leibniz']), array([0 2]))
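
For readers without a server at hand, the group numbering can be checked against Python's own re module. This is only a sketch under the assumption that capture groups are numbered as in Python (0 is the full match, 1 and up are the parenthesized groups); capture_group is a hypothetical helper, not an Arkouda function.

```python
import re


def capture_group(strings, pattern, group_num=0):
    """Hypothetical analogue of group(group_num, return_group_origins=True)."""
    regex = re.compile(pattern)
    groups, origins = [], []
    for i, s in enumerate(strings):
        m = regex.search(s)
        if m is not None:
            groups.append(m.group(group_num))
            origins.append(i)
    return groups, origins


strings = ["Isaac Newton, physics", '<-calculus->', 'Gottfried Leibniz, math']
pattern = r"(\w+) (\w+)"

full, full_origins = capture_group(strings, pattern)       # group 0: full match
first, _ = capture_group(strings, pattern, group_num=1)    # first capture group
second, second_origins = capture_group(strings, pattern, group_num=2)
print(full, first, second, second_origins)
```

The middle string contains only one word, so it produces no match and index 1 is absent from the origins.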
match_type() -> str

Return the type of the Match object.

Returns:

Name of the MatchType used to create the Match object (for example, 'SEARCH')

Return type:

str

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').match_type()
'SEARCH'
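
The match type matters because the three entry points anchor the pattern differently. The sketch below compares them with the standard-library re module, assuming that search(), match(), and fullmatch() on a Strings object follow the same anchoring rules as their Python namesakes. The labels 'MATCH' and 'FULLMATCH' are assumed by analogy with the 'SEARCH' value shown above.

```python
import re

strings = ['1_2___', '____', '3', '__4___5____6___7', '']
regex = re.compile('_+')

# Map an (assumed) match-type label to the corresponding re method.
modes = {
    'SEARCH': regex.search,        # pattern may occur anywhere
    'MATCH': regex.match,          # pattern must start at position 0
    'FULLMATCH': regex.fullmatch,  # pattern must cover the whole string
}

results = {
    name: [fn(s) is not None for s in strings]
    for name, fn in modes.items()
}

for name, flags in results.items():
    print(name, flags)
```

With this input, only '____' satisfies all three modes, while '1_2___' matches under search alone.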
matched() -> arkouda.numpy.pdarrayclass.pdarray

Return a boolean array indicating whether each element matched.

Returns:

True for elements that match, False otherwise

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').matched()
array([True True False True False])
re: str
start() -> arkouda.numpy.pdarrayclass.pdarray

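Return the start position of each match. Only matched elements contribute an entry, so the result can be shorter than the original array.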

Returns:

The start positions of matches

Return type:

pdarray

Examples

>>> import arkouda as ak
>>> strings = ak.array(['1_2___', '____', '3', '__4___5____6___7', ''])
>>> strings.search('_+').start()
array([1 0 0])
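
Because the start and end arrays hold entries only for matched elements, lining them up with the original strings requires the boolean array from matched(). A minimal pure-Python sketch of that alignment follows; expand_spans is a hypothetical helper, and the input lists are copied from the example outputs above rather than fetched from a server.

```python
def expand_spans(matched, starts, ends):
    """Hypothetical helper: one (start, end) span per element, None if unmatched."""
    compact = iter(zip(starts, ends))
    spans = []
    for flag in matched:
        # Consume the next compact entry only for matched elements.
        spans.append(next(compact) if flag else None)
    return spans


# Values taken from the search('_+') example outputs.
matched = [True, True, False, True, False]
starts = [1, 0, 0]
ends = [2, 4, 2]

spans = expand_spans(matched, starts, ends)
print(spans)
```

The resulting list has one slot per original string, which mirrors the per-element view shown by the Match repr (for example, span=(0, 4) for the second element).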