arkouda.alignment
=================

.. py:module:: arkouda.alignment


Exceptions
----------

.. autoapisummary::

   arkouda.alignment.NonUniqueError


Functions
---------

.. autoapisummary::

   arkouda.alignment.align
   arkouda.alignment.find
   arkouda.alignment.in1d_intervals
   arkouda.alignment.interval_lookup
   arkouda.alignment.is_cosorted
   arkouda.alignment.left_align
   arkouda.alignment.lookup
   arkouda.alignment.right_align
   arkouda.alignment.search_intervals
   arkouda.alignment.unsqueeze
   arkouda.alignment.zero_up


Module Contents
---------------

.. py:exception:: NonUniqueError

   Bases: :py:obj:`ValueError`

   Inappropriate argument value (of correct type).


.. py:function:: align(*args)

   Map multiple arrays of sparse identifiers to a common 0-up index.

   :param \*args: Arrays to map to dense index
   :type \*args: pdarrays or sequences of pdarrays

   :returns: **aligned** -- Arrays with values replaced by 0-up indices
   :rtype: list of pdarrays


.. py:function:: find(query, space, all_occurrences=False, remove_missing=False)

   Return indices of query items in a search list of items.

   :param query: The items to search for. If multiple arrays, each "row" is an item.
   :type query: (sequence of) array-like
   :param space: The set of items in which to search. Must have same shape/dtype as query.
   :type space: (sequence of) array-like
   :param all_occurrences: When duplicate terms are present in search space, if all_occurrences is True,
                           return all occurrences found as a SegArray, otherwise return only the first
                           occurrences as a pdarray. Defaults to only finding the first occurrence.
                           Finding all occurrences is not yet supported on sequences of arrays.
   :type all_occurrences: bool
   :param remove_missing: If all_occurrences is True, remove_missing is automatically enabled.
                          If False, return -1 for any items in query not found in space.
                          If True, remove these and only return indices of items that are found.
   :type remove_missing: bool

   :returns: **indices** -- For each item in query, its index in space.
             If all_occurrences is False, the return will be a pdarray of the first index
             where each value in the query appears in the space.
             If all_occurrences is True, the return will be a SegArray containing every index
             where each value in the query appears in the space.
             If all_occurrences is True, remove_missing is automatically enabled.
             If remove_missing is True, exclude missing values, otherwise return -1.
   :rtype: pdarray or SegArray

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> select_from = ak.arange(10)
   >>> arr1 = select_from[ak.randint(0, select_from.size, 20, seed=10)]
   >>> arr2 = select_from[ak.randint(0, select_from.size, 20, seed=11)]
   >>> # remove some values to ensure we have some values
   >>> # which don't appear in the search space
   >>> arr2 = arr2[arr2 != 9]
   >>> arr2 = arr2[arr2 != 3]

   >>> # find with defaults (all_occurrences and remove_missing both False)
   >>> ak.find(arr1, arr2)
   array([-1 -1 -1 0 1 -1 -1 -1 2 -1 5 -1 8 -1 5 -1 -1 11 5 0])

   >>> # set remove_missing to True, only difference from default
   >>> # is missing values are excluded
   >>> ak.find(arr1, arr2, remove_missing=True)
   array([0 1 2 5 8 5 11 5 0])

   >>> # set both remove_missing and all_occurrences to True, missing values
   >>> # will be empty segments
   >>> ak.find(arr1, arr2, remove_missing=True, all_occurrences=True).to_list()
   [[], [], [], [0, 4], [1, 3, 10], [], [], [], [2, 6, 12, 13], [], [5, 7], [], [8, 9, 14], [], [5, 7], [], [], [11, 15], [5, 7], [0, 4]]


.. py:function:: in1d_intervals(vals, intervals, symmetric=False)

   Test each value for membership in *any* of a set of half-open (pythonic) intervals.

   :param vals: Values to test for membership in intervals
   :type vals: pdarray(int, float)
   :param intervals: Non-overlapping, half-open intervals, as a tuple of
                     (lower_bounds_inclusive, upper_bounds_exclusive)
   :type intervals: 2-tuple of pdarrays
   :param symmetric: If True, also return boolean pdarray indicating which intervals
                     contained one or more query values.
   :type symmetric: bool

   :returns: * *pdarray(bool)* -- Array of same length as ``vals``, True if the corresponding value is
               included in any of the half-open ranges [low[i], high[i]).
             * *pdarray(bool) (if symmetric=True)* -- Array of same length as the number of intervals,
               True if the corresponding interval contains any of the values in ``vals``.

   .. rubric:: Notes

   First return array is equivalent to the following::

       ((vals >= intervals[0][0]) & (vals < intervals[1][0])) |
       ((vals >= intervals[0][1]) & (vals < intervals[1][1])) |
       ...
       ((vals >= intervals[0][-1]) & (vals < intervals[1][-1]))

   But much faster when testing many ranges.

   Second (optional) return array is equivalent to::

       ((intervals[0] <= vals[0]) & (intervals[1] > vals[0])) |
       ((intervals[0] <= vals[1]) & (intervals[1] > vals[1])) |
       ...
       ((intervals[0] <= vals[-1]) & (intervals[1] > vals[-1]))

   But much faster when vals is non-trivial size.


.. py:function:: interval_lookup(keys, values, arguments, fillvalue=-1, tiebreak=None, hierarchical=False)

   Apply a function defined over intervals to an array of arguments.

   :param keys: Tuple of closed intervals expressed as (lower_bounds_inclusive, upper_bounds_inclusive).
                Must have same dtype(s) as arguments.
   :type keys: 2-tuple of (sequences of) pdarrays
   :param values: Function value to return for each entry in keys.
   :type values: pdarray
   :param arguments: Values to search for in intervals. If multiple arrays, each "row" is an item.
   :type arguments: (sequences of) pdarray
   :param fillvalue: Default value to return when argument is not in any interval.
   :type fillvalue: scalar
   :param tiebreak: When an argument is present in more than one key interval, the interval with the
                    lowest tiebreak value will be chosen. If no tiebreak is given, the
                    first valid key interval will be chosen.
   :type tiebreak: (optional) pdarray, numeric

   :returns: Value of function corresponding to the keys interval
             containing each argument, or fillvalue if argument not
             in any interval.
   :rtype: pdarray


.. py:function:: is_cosorted(arrays)

   Return True iff the arrays are cosorted, i.e., if the arrays were columns in a table
   then the rows are sorted.

   :param arrays: Arrays to check for cosortedness
   :type arrays: list-like of pdarrays

   :returns: True iff arrays are cosorted.
   :rtype: bool

   :raises ValueError: Raised if arrays are not the same length
   :raises TypeError: Raised if arrays is not a list-like of pdarrays


.. py:function:: left_align(left, right)

   Map two arrays of sparse identifiers to the 0-up index set implied by the left array,
   discarding values from right that do not appear in left.


.. py:function:: lookup(keys, values, arguments, fillvalue=-1)

   Apply the function defined by the mapping keys --> values to arguments.

   :param keys: The domain of the function. Entries must be unique (if a sequence of
                arrays is given, each row is treated as a tuple-valued entry).
   :type keys: (sequence of) array-like
   :param values: The range of the function. Must be same length as keys.
   :type values: pdarray
   :param arguments: The arguments on which to evaluate the function. Must have same dtype
                     (or tuple of dtypes, for a sequence) as keys.
   :type arguments: (sequence of) array-like
   :param fillvalue: The default value to return for arguments not in keys.
   :type fillvalue: scalar

   :returns: **evaluated** -- The result of evaluating the function over arguments.
   :rtype: pdarray

   .. rubric:: Notes

   While the values cannot be Strings (or other complex objects), the same
   result can be achieved by passing an arange as the values, then using
   the return as indices into the desired object.

   .. rubric:: Examples

   >>> import arkouda as ak
   >>> # Lookup numbers by two-word name
   >>> keys1 = ak.array(['twenty' for _ in range(5)])
   >>> keys2 = ak.array(['one', 'two', 'three', 'four', 'five'])
   >>> values = ak.array([21, 22, 23, 24, 25])
   >>> args1 = ak.array(['twenty', 'thirty', 'twenty'])
   >>> args2 = ak.array(['four', 'two', 'two'])
   >>> ak.lookup([keys1, keys2], values, [args1, args2])
   array([24, -1, 22])

   >>> # Other direction requires an intermediate index
   >>> revkeys = values
   >>> revindices = ak.arange(values.size)
   >>> revargs = ak.array([24, 21, 22])
   >>> idx = ak.lookup(revkeys, revindices, revargs)
   >>> keys1[idx], keys2[idx]
   (array(['twenty', 'twenty', 'twenty']),
   array(['four', 'one', 'two']))


.. py:function:: right_align(left, right)

   Map two arrays of sparse values to the 0-up index set implied by the right array,
   discarding values from left that do not appear in right.

   :param left: Left-hand identifiers
   :type left: pdarray or a sequence of pdarrays
   :param right: Right-hand identifiers that define the index
   :type right: pdarray or a sequence of pdarrays

   :returns: * **keep** (*pdarray, bool*) -- Logical index of left-hand values that survived
             * **aligned** (*(pdarray, pdarray)*) -- Left and right arrays with values replaced by 0-up indices


.. py:function:: search_intervals(vals, intervals, tiebreak=None, hierarchical=True)

   Given an array of query vals and non-overlapping, closed intervals, return
   the index of the best (see tiebreak) interval containing each query value,
   or -1 if not present in any interval.

   :param vals: Values to search for in intervals. If multiple arrays, each "row" is an item.
   :type vals: (sequence of) pdarray(int, uint, float)
   :param intervals: Non-overlapping, closed intervals, as a tuple of
                     (lower_bounds_inclusive, upper_bounds_inclusive).
                     Must have same dtype(s) as vals.
   :type intervals: 2-tuple of (sequences of) pdarrays
   :param tiebreak: When a value is present in more than one interval, the interval with the
                    lowest tiebreak value will be chosen.
                    If no tiebreak is given, the first containing interval will be chosen.
   :type tiebreak: (optional) pdarray, numeric
   :param hierarchical: When True, sequences of pdarrays will be treated as components specifying
                        a single dimension (i.e. hierarchical).
                        When False, sequences of pdarrays will be specifying multi-dimensional intervals.
   :type hierarchical: boolean

   :returns: **idx** -- Index of interval containing each query value, or -1 if not found
   :rtype: pdarray(int64)

   .. rubric:: Notes

   The return idx satisfies the following condition::

       present = idx > -1
       ((intervals[0][idx[present]] <= vals[present]) &
        (intervals[1][idx[present]] >= vals[present])).all()

   .. rubric:: Examples

   >>> starts = (ak.array([0, 5]), ak.array([0, 11]))
   >>> ends = (ak.array([5, 9]), ak.array([10, 20]))
   >>> vals = (ak.array([0, 0, 2, 5, 5, 6, 6, 9]), ak.array([0, 20, 1, 5, 15, 0, 12, 30]))
   >>> ak.search_intervals(vals, (starts, ends), hierarchical=False)
   array([0 -1 0 0 1 -1 1 -1])
   >>> ak.search_intervals(vals, (starts, ends))
   array([0 0 0 0 1 1 1 -1])
   >>> bi_starts = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in starts])
   >>> bi_ends = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in ends])
   >>> bi_vals = ak.bigint_from_uint_arrays([ak.cast(a, ak.uint64) for a in vals])
   >>> bi_starts, bi_ends, bi_vals
   (array(["0" "92233720368547758091"]),
   array(["92233720368547758090" "166020696663385964564"]),
   array(["0" "20" "36893488147419103233" "92233720368547758085" "92233720368547758095" "110680464442257309696" "110680464442257309708" "166020696663385964574"]))
   >>> ak.search_intervals(bi_vals, (bi_starts, bi_ends))
   array([0 0 0 0 1 1 1 -1])


.. py:function:: unsqueeze(p)


.. py:function:: zero_up(vals)

   Map an array of sparse values to 0-up indices.

   :param vals: Array to map to dense index
   :type vals: pdarray

   :returns: **aligned** -- Array with values replaced by 0-up indices
   :rtype: pdarray
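
The "0-up index" idea behind ``zero_up`` and ``align`` may be easier to see in a small plain-Python sketch. This is only an illustration of the mapping on ordinary lists, not arkouda's implementation: the helper names ``zero_up_sketch`` and ``align_sketch`` are hypothetical, and assigning indices in sorted order of the distinct values is an assumption made here for determinism.

```python
def zero_up_sketch(vals):
    """Replace each value with a dense index in 0..k-1, where k is the
    number of distinct values. Sorted ordering is an assumption of this sketch."""
    index_of = {v: i for i, v in enumerate(sorted(set(vals)))}
    return [index_of[v] for v in vals]


def align_sketch(*arrays):
    """Map several lists of sparse identifiers to one shared 0-up index,
    so that equal identifiers get equal indices across all lists."""
    # Build the index over the union of all identifiers.
    all_vals = [v for arr in arrays for v in arr]
    index_of = {v: i for i, v in enumerate(sorted(set(all_vals)))}
    return [[index_of[v] for v in arr] for arr in arrays]


# Sparse identifiers collapse onto the dense range 0..2.
dense = zero_up_sketch([1000, 42, 1000, 7, 42])   # [2, 1, 2, 0, 1]

# The identifier 30 receives the same index in both outputs.
left, right = align_sketch([10, 30], [30, 20])    # [0, 2] and [2, 1]
```

In both helpers, equal inputs always receive equal indices and the indices cover a contiguous range starting at 0, which is the property the descriptions above rely on.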