SegArrays in Arkouda¶
In NumPy, arrays containing variable-length sub-arrays are supported as an array containing a single column, where each entry holds another ndarray of some length. Depending on the chosen approach in NumPy, this can result in a loss of functionality. In Arkouda, the array containing variable-length sub-arrays is its own class: SegArray.
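As a rough illustration of the NumPy side of this comparison, the sketch below builds a ragged array as a one-dimensional object array. This is only one of several possible NumPy approaches, and the name `ragged` is chosen here for illustration.

```python
import numpy as np

# Build a 1-D object array whose entries are ndarrays of different lengths.
# Filling a preallocated object array stops NumPy from inferring a 2-D shape.
ragged = np.empty(3, dtype=object)
ragged[0] = np.array([1, 2, 3])
ragged[1] = np.array([4])
ragged[2] = np.array([5, 6])

print(ragged.dtype)   # object
print(ragged.shape)   # (3,)

# Per-row work falls back to a Python-level loop over the entries.
row_lengths = [len(row) for row in ragged]
print(row_lengths)    # [3, 1, 2]
```

Because the outer array has dtype `object`, NumPy no longer sees a numeric element type for the data as a whole, which is one way the loss of functionality mentioned above can show up.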
To efficiently store arrays with varying row and column dimensions, Arkouda uses a “segmented array” data structure:
segments: An int64 array containing the start index of each sub-array within the flattened values array
values: The flattened values of all sub-arrays
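The layout described above can be sketched with plain NumPy. This is a minimal model of the two components, not Arkouda's implementation; `rows` and the helper `get_row` are hypothetical names used only for this example.

```python
import numpy as np

# Three sub-arrays of different lengths.
rows = [[10, 11, 12], [20], [30, 31]]

# values: every sub-array laid end to end in one flat array.
values = np.concatenate([np.asarray(r, dtype=np.int64) for r in rows])

# segments: the start offset of each sub-array inside `values`.
lengths = np.array([len(r) for r in rows], dtype=np.int64)
segments = np.cumsum(lengths) - lengths


def get_row(i):
    """Return sub-array i by slicing `values` between consecutive segment starts."""
    start = segments[i]
    # The last sub-array runs to the end of `values`.
    end = segments[i + 1] if i + 1 < len(segments) else len(values)
    return values[start:end]


print(segments)    # [0 3 4]
print(get_row(2))  # [30 31]
```

Only the start offsets are stored; the length of each sub-array follows from the next offset, or from the end of `values` for the last one.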
Performance¶
SegArray objects are currently processed entirely on the Arkouda client side. The data structure mirrors the one that will be used for Arkouda server-side processing.
Iteration¶
Because SegArray is currently processed entirely on the Arkouda client side, iteration is natively supported. Thus, for row in segarr will iterate over each sub-array. Each of these sub-arrays is currently returned as a numpy.ndarray.
Similar to Strings, SegArrays will be moved to server-side processing. This will remove the ability to iterate natively, to discourage transferring all of the object's data to the client. To support iteration going forward, SegArray includes a to_ndarray() function. It is recommended that this function be used for iteration over SegArray objects, to prevent issues associated with moving processing server side. See the to_ndarray documentation for more information on its usage with SegArray.
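With Arkouda, the recommended pattern amounts to looping over the result of `segarr.to_ndarray()` rather than over `segarr` itself. Since that requires a running Arkouda server, the sketch below uses a hypothetical NumPy stand-in, `rows_as_ndarray`, and assumes the conversion yields one NumPy array per sub-array. It is not Arkouda's implementation.

```python
import numpy as np


def rows_as_ndarray(segments, values):
    """Hypothetical stand-in: split flat `values` into one ndarray per sub-array."""
    # np.split cuts `values` at every segment start except the first one (0).
    pieces = np.split(values, segments[1:])
    out = np.empty(len(pieces), dtype=object)
    for i, piece in enumerate(pieces):
        out[i] = piece
    return out


segments = np.array([0, 3, 4], dtype=np.int64)
values = np.array([10, 11, 12, 20, 30, 31], dtype=np.int64)

# Convert once, then iterate over the client-side result.
row_sums = [int(row.sum()) for row in rows_as_ndarray(segments, values)]
print(row_sums)  # [33, 20, 61]
```

Converting once up front makes the data transfer explicit, so the loop keeps working whether the SegArray is processed on the client or the server.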
Operation¶
Arkouda SegArray objects support the following operations:
Indexing with integer, slice, integer pdarray, and boolean pdarray (see Indexing and Assignment)
Comparison (==): Provides an Arkouda pdarray containing bool values indicating the equality of each sub-array in the SegArray.
Array Set Operations, e.g. unique
Concatenation with other SegArrays. Horizontal and vertical axis supported.
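To make the semantics of comparison and concatenation concrete, the sketch below models them on plain NumPy `(segments, values)` pairs. It assumes that vertical concatenation appends the rows of one array after the other and that horizontal concatenation joins corresponding rows. The helpers `to_rows` and `from_rows` are hypothetical and do not reflect Arkouda's API or implementation.

```python
import numpy as np


def to_rows(segments, values):
    """Split flat values into a list of per-row arrays."""
    return np.split(values, segments[1:])


def from_rows(rows):
    """Rebuild (segments, values) from a list of per-row arrays."""
    lengths = np.array([len(r) for r in rows], dtype=np.int64)
    segments = np.cumsum(lengths) - lengths
    values = np.concatenate(rows) if rows else np.array([], dtype=np.int64)
    return segments, values


a = from_rows([np.array([1, 2]), np.array([3]), np.array([4, 5, 6])])
b = from_rows([np.array([1, 2]), np.array([9]), np.array([4, 5])])

# Comparison: one bool per sub-array, True only when the whole row matches.
equal = np.array([np.array_equal(x, y) for x, y in zip(to_rows(*a), to_rows(*b))])
print(equal)  # [ True False False]

# Vertical concatenation: rows of b are appended after rows of a.
v_segments, v_values = from_rows(to_rows(*a) + to_rows(*b))
print(v_segments)  # [0 2 3 6 8 9]

# Horizontal concatenation: row i of b is appended to row i of a.
h_segments, h_values = from_rows(
    [np.concatenate([x, y]) for x, y in zip(to_rows(*a), to_rows(*b))]
)
print(h_segments)  # [0 4 6]
```

Note that the comparison yields one result per sub-array rather than one per element, which matches the description of == above.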