arkouda.pandas.matcher

Matching utilities for Arkouda arrays.

The arkouda.pandas.matcher module provides functions for efficiently aligning, matching, and comparing Arkouda arrays. These tools are used internally to support operations such as joins, merges, and reindexing across arrays, particularly when working with categorical or structured data.

Functions in this module generally operate on one or more pdarray or Categorical inputs and return index mappings or boolean masks indicating matches or relationships between datasets.

Features

  • Index resolution for join operations

  • Efficient value-matching across large arrays

  • Support for type-aware and multi-column matching

  • Underpins DataFrame merge, Series alignment, and MultiIndex comparisons

Typical Use Cases

  • Determining where values from one array appear in another

  • Generating permutation indices for aligning arrays

  • Supporting categorical equivalence testing

Notes

  • Functions in this module are primarily intended for internal use in arkouda.pandas-style functionality.

  • Most functions assume inputs are distributed pdarray, Strings, or Categorical types with compatible shapes and data types.

Classes

Matcher

Utility class for storing and standardizing information about pattern matches.

Module Contents

class arkouda.pandas.matcher.Matcher(pattern: arkouda.numpy.dtypes.str_scalars, parent_entry_name: str)[source]

Utility class for storing and standardizing information about pattern matches.

The Matcher class defines a standard set of location-related fields that can be used to represent the results of search or match operations, typically involving string or pattern matching over Arkouda arrays.

LocationsInfo = frozenset({...})

A set of standardized string keys describing match-related metadata. These include:

  • 'num_matches' – total number of matches found.

  • 'starts' – start positions of matches.

  • 'lengths' – lengths of matches.

  • 'search_bool' – boolean array indicating matches in the search space.

  • 'search_ind' – indices of matches in the search space.

  • 'match_bool' – boolean array indicating actual matches.

  • 'match_ind' – indices of actual matches.

  • 'full_match_bool' – boolean array for full string matches.

  • 'full_match_ind' – indices of full matches.
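The fields above can be illustrated with a minimal pure-Python sketch. It is not Arkouda's implementation: it assumes that the search/match/full-match distinction follows Python's re.search, re.match, and re.fullmatch, and that starts and lengths are flattened across all strings as per-string offsets. The function name locations_info is hypothetical, and the '*_ind' fields are left out because their exact indexing is not specified here.

```python
import re


def locations_info(strings, pattern):
    """Hypothetical helper: per-string match metadata via Python's re.

    Assumption: search/match/full-match mirror re.search, re.match,
    and re.fullmatch. Offsets in 'starts' are relative to each string.
    """
    rx = re.compile(pattern)
    num_matches, starts, lengths = [], [], []
    search_bool, match_bool, full_match_bool = [], [], []
    for s in strings:
        found = list(rx.finditer(s))  # non-overlapping matches in s
        num_matches.append(len(found))
        starts.extend(m.start() for m in found)
        lengths.extend(m.end() - m.start() for m in found)
        search_bool.append(rx.search(s) is not None)         # match anywhere
        match_bool.append(rx.match(s) is not None)           # match at start
        full_match_bool.append(rx.fullmatch(s) is not None)  # whole string
    return {
        "num_matches": num_matches,
        "starts": starts,
        "lengths": lengths,
        "search_bool": search_bool,
        "match_bool": match_bool,
        "full_match_bool": full_match_bool,
    }


info = locations_info(["baaba", "b", "aa"], "a+")
```

Here num_matches has one entry per input string, while starts and lengths have one entry per match.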

find_locations() -> None[source]

Populate Matcher object by finding the positions of matches.

findall(return_match_origins: bool = False)[source]

Return all non-overlapping matches of pattern in Strings as a new Strings object.
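As a rough illustration of these semantics, the sketch below reproduces the behaviour on a plain Python list using the re module. It is a model under assumptions, not the Arkouda code path: findall_with_origins is a hypothetical name, and the assumption is that return_match_origins yields, for each match, the index of the string it came from.

```python
import re


def findall_with_origins(strings, pattern):
    """Hypothetical helper: flattened non-overlapping matches plus,
    for each match, the index of the string it was found in."""
    rx = re.compile(pattern)
    matches, origins = [], []
    for i, s in enumerate(strings):
        for m in rx.finditer(s):
            matches.append(m.group(0))  # full matched text
            origins.append(i)           # index of the source string
    return matches, origins


matches, origins = findall_with_origins(
    ["1_2___", "____", "3", "__4___5____6___7", ""], "_+"
)
```

Strings with no match contribute nothing to either list, so the origins list is what lets a caller map matches back to their source rows.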

full_match_bool: arkouda.numpy.pdarrayclass.pdarray
full_match_ind: arkouda.numpy.pdarrayclass.pdarray
get_match(match_type: arkouda.pandas.match.MatchType, parent: object = None) -> arkouda.pandas.match.Match[source]

Create a Match object of type match_type.

indices: arkouda.numpy.pdarrayclass.pdarray
lengths: arkouda.numpy.pdarrayclass.pdarray
logger
match_bool: arkouda.numpy.pdarrayclass.pdarray
match_ind: arkouda.numpy.pdarrayclass.pdarray
num_matches: arkouda.numpy.pdarrayclass.pdarray
objType = 'Matcher'
parent_entry_name
populated = False
search_bool: arkouda.numpy.pdarrayclass.pdarray
search_ind: arkouda.numpy.pdarrayclass.pdarray
split(maxsplit: int = 0, return_segments: bool = False)[source]

Split string by the occurrences of pattern.

If maxsplit is nonzero, at most maxsplit splits occur.
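The maxsplit rule can be sketched with Python's re.split, which has the same "zero means no limit" convention. This is an illustrative model only: split_with_segments is a hypothetical name, and it assumes that return_segments yields the offset at which each original string's pieces begin in the flattened result.

```python
import re


def split_with_segments(strings, pattern, maxsplit=0):
    """Hypothetical helper: split each string on pattern and flatten.

    segments[i] is the index in 'pieces' where string i's pieces begin
    (an assumption about what return_segments reports).
    """
    rx = re.compile(pattern)
    pieces, segments = [], []
    for s in strings:
        segments.append(len(pieces))
        # maxsplit=0 means "no limit", as in Python's re.split
        pieces.extend(rx.split(s, maxsplit=maxsplit))
    return pieces, segments


all_pieces, all_segments = split_with_segments(["a_b__c", "d", "_e"], "_+")
one_pieces, one_segments = split_with_segments(["a_b__c", "d", "_e"], "_+", maxsplit=1)
```

With maxsplit=1 the first string is split only once, so the remainder "b__c" is kept intact.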

starts: arkouda.numpy.pdarrayclass.pdarray
sub(repl: str, count: int = 0, return_num_subs: bool = False)[source]

Return the Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl.

If count is nonzero, at most count substitutions occur. If return_num_subs is True, also return the number of substitutions that occurred.
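The count and return_num_subs behaviour can be modelled per string with Python's re.subn. The sketch below is an assumption-labelled illustration, not Arkouda's implementation: sub_with_counts is a hypothetical name, and it assumes the substitution count is reported per input string.

```python
import re


def sub_with_counts(strings, pattern, repl, count=0):
    """Hypothetical helper: replace matches in each string and report
    how many substitutions were made per string.

    count=0 means "replace all", as in Python's re.sub.
    """
    rx = re.compile(pattern)
    results, num_subs = [], []
    for s in strings:
        new_s, n = rx.subn(repl, s, count=count)
        results.append(new_s)
        num_subs.append(n)
    return results, num_subs


all_res, all_n = sub_with_counts(["a_b__c", "d", "_e_"], "_+", "-")
one_res, one_n = sub_with_counts(["a_b__c", "d", "_e_"], "_+", "-", count=1)
```

With count=1 only the leftmost match in each string is replaced.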