arkouda.pandas.matcher
Matching utilities for Arkouda arrays.
The arkouda.pandas.matcher module provides functions for efficiently aligning, matching, and comparing Arkouda arrays. These tools are used internally to support operations such as joins, merges, and reindexing across arrays, particularly when working with categorical or structured data.
Functions in this module generally operate on one or more pdarray or Categorical inputs and return index mappings or boolean masks indicating matches or relationships between datasets.
Features
Index resolution for join operations
Efficient value-matching across large arrays
Support for type-aware and multi-column matching
Underpins DataFrame merge, Series alignment, and MultiIndex comparisons
Typical Use Cases
Determining where values from one array appear in another
Generating permutation indices for aligning arrays
Supporting categorical equivalence testing
Notes
Functions in this module are primarily intended for internal use in arkouda.pandas-style functionality.
Most functions assume inputs are distributed pdarray, Strings, or Categorical types with compatible shapes and data types.
Classes
Matcher: Utility class for storing and standardizing information about pattern matches.
Module Contents
- class arkouda.pandas.matcher.Matcher(pattern: arkouda.numpy.dtypes.str_scalars, parent_entry_name: str)
Utility class for storing and standardizing information about pattern matches.
The Matcher class defines a standard set of location-related fields that can be used to represent the results of search or match operations, typically involving string or pattern matching over Arkouda arrays.
- LocationsInfo = frozenset({...})
A set of standardized string keys describing match-related metadata. These include:
  - 'num_matches' – total number of matches found.
  - 'starts' – start positions of matches.
  - 'lengths' – lengths of matches.
  - 'search_bool' – boolean array indicating matches in the search space.
  - 'search_ind' – indices of matches in the search space.
  - 'match_bool' – boolean array indicating actual matches.
  - 'match_ind' – indices of actual matches.
  - 'full_match_bool' – boolean array for full string matches.
  - 'full_match_ind' – indices of full matches.
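To make these fields more concrete, here is a minimal plain-Python sketch that computes comparable values for a small list of strings with the standard re module. It is an analogy, not Arkouda's implementation. The helper name location_fields is hypothetical. The mapping of 'search_bool', 'match_bool', and 'full_match_bool' onto re.search, re.match, and re.fullmatch is an assumption, and the '*_ind' fields are left out because their exact layout is not described here.

```python
import re


def location_fields(strings, pattern):
    """Hypothetical helper: compute match metadata with Python's re module.

    This only mirrors the field names listed above; it does not call Arkouda.
    """
    regex = re.compile(pattern)
    num_matches, starts, lengths = [], [], []
    for s in strings:
        found = list(regex.finditer(s))
        # One count per input string.
        num_matches.append(len(found))
        # Start offsets and lengths are flattened across all strings.
        starts.extend(m.start() for m in found)
        lengths.extend(m.end() - m.start() for m in found)
    return {
        "num_matches": num_matches,
        "starts": starts,
        "lengths": lengths,
        # Assumed correspondence: search = anywhere, match = at the start,
        # full match = the entire string.
        "search_bool": [regex.search(s) is not None for s in strings],
        "match_bool": [regex.match(s) is not None for s in strings],
        "full_match_bool": [regex.fullmatch(s) is not None for s in strings],
    }


fields = location_fields(["ab12", "345", "x"], r"\d+")
```

Under these assumptions, "ab12" contains digits but does not start with them, so it counts toward 'search_bool' only, while "345" satisfies all three boolean fields.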
- LocationsInfo
- findall(return_match_origins: bool = False)
Return all non-overlapping matches of pattern in Strings as a new Strings object.
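The flattening behavior can be illustrated with a plain-Python analogue built on re.finditer. This is a sketch, not the Arkouda implementation. The helper name findall_flat is hypothetical, and the reading of return_match_origins as "also return, for each match, the index of the string it came from" is an assumption.

```python
import re


def findall_flat(strings, pattern):
    """Hypothetical helper: collect every non-overlapping match across all
    strings into one flat list, together with each match's origin index."""
    regex = re.compile(pattern)
    matches, origins = [], []
    for i, s in enumerate(strings):
        for m in regex.finditer(s):
            matches.append(m.group(0))
            origins.append(i)  # index of the source string (assumed meaning)
    return matches, origins


matches, origins = findall_flat(["1_2___", "____", "3", "__4___5____6___7"], "_+")
```

Here the third string contributes no matches, so index 2 never appears in origins; the origin list is what lets a caller map the flat result back to the input.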
- full_match_bool: arkouda.numpy.pdarrayclass.pdarray
- full_match_ind: arkouda.numpy.pdarrayclass.pdarray
- get_match(match_type: arkouda.pandas.match.MatchType, parent: object = None) -> arkouda.pandas.match.Match
Create a Match object of type match_type.
- indices: arkouda.numpy.pdarrayclass.pdarray
- lengths: arkouda.numpy.pdarrayclass.pdarray
- logger
- match_bool: arkouda.numpy.pdarrayclass.pdarray
- match_ind: arkouda.numpy.pdarrayclass.pdarray
- num_matches: arkouda.numpy.pdarrayclass.pdarray
- objType = 'Matcher'
- parent_entry_name
- populated = False
- search_bool: arkouda.numpy.pdarrayclass.pdarray
- search_ind: arkouda.numpy.pdarrayclass.pdarray
- split(maxsplit: int = 0, return_segments: bool = False)
Split string by the occurrences of pattern.
If maxsplit is nonzero, at most maxsplit splits occur.
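The effect of maxsplit can be sketched with Python's re.split, which uses the same convention that 0 means "no limit". This is an analogy in plain Python, not Arkouda code. The helper name split_flat is hypothetical, and treating return_segments as "offsets where each input string's pieces begin in the flattened output" is an assumption.

```python
import re


def split_flat(strings, pattern, maxsplit=0):
    """Hypothetical helper: split every string on the pattern and flatten
    the pieces, recording where each input string's pieces start."""
    regex = re.compile(pattern)
    pieces, segments = [], []
    for s in strings:
        segments.append(len(pieces))  # start offset of this string's pieces
        pieces.extend(regex.split(s, maxsplit=maxsplit))
    return pieces, segments


data = ["a_b__c", "d", "e_f"]
pieces, segments = split_flat(data, "_+")
limited, limited_segments = split_flat(data, "_+", maxsplit=1)
```

With maxsplit=1, the first string is split only once, so the remainder "b__c" is kept intact as a single piece.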
- sub(repl: str, count: int = 0, return_num_subs: bool = False)
Return the Strings obtained by replacing non-overlapping occurrences of pattern with the replacement repl.
If count is nonzero, at most count substitutions occur. If return_num_subs is True, return the number of substitutions that occurred.
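The interaction between count and the substitution tally can be illustrated with Python's re.subn, which likewise treats count=0 as "replace all". This is a plain-Python sketch rather than Arkouda's implementation; the helper name sub_all is hypothetical, and returning one substitution count per input string is an assumption.

```python
import re


def sub_all(strings, pattern, repl, count=0):
    """Hypothetical helper: replace non-overlapping matches in every string
    and report how many substitutions were made in each one."""
    regex = re.compile(pattern)
    results = [regex.subn(repl, s, count=count) for s in strings]
    replaced = [text for text, _ in results]
    num_subs = [n for _, n in results]
    return replaced, num_subs


data = ["a_b__c_d", "xyz", "__"]
replaced, num_subs = sub_all(data, "_+", "-")
capped, capped_subs = sub_all(data, "_+", "-", count=2)
```

With count=2, the first string keeps its last underscore because only the first two matches are replaced.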