SegArrays in Arkouda¶

In NumPy, arrays containing variable-length sub-arrays are supported as an array containing a single column, where each entry in that column is another ndarray of some length. Depending on the chosen approach in NumPy, this can result in a loss of functionality. In Arkouda, an array containing variable-length sub-arrays is its own class: SegArray.

To efficiently store arrays with varying row and column dimensions, Arkouda uses a “segmented array” data structure with two components:

segments: An int64 array containing the start index of each sub-array within the flattened values array
values: The flattened values of all sub-arrays
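
The layout described above can be sketched in plain Python. This is only an illustration of the segments/values idea, not Arkouda's implementation; the helper names `to_segmented` and `get_row` are hypothetical and do not exist in Arkouda.

```python
def to_segmented(rows):
    """Flatten a list of variable-length rows into (segments, values)."""
    segments = []
    values = []
    for row in rows:
        # The start index of this row is the current length of values.
        segments.append(len(values))
        values.extend(row)
    return segments, values


def get_row(segments, values, i):
    """Recover sub-array i by slicing between consecutive segment starts."""
    start = segments[i]
    # The last row runs to the end of the flattened values.
    end = segments[i + 1] if i + 1 < len(segments) else len(values)
    return values[start:end]


rows = [[10, 11, 12], [20], [], [30, 31]]
segments, values = to_segmented(rows)
print(segments)                      # [0, 3, 4, 4]
print(values)                        # [10, 11, 12, 20, 30, 31]
print(get_row(segments, values, 3))  # [30, 31]
```

Note that an empty sub-array shares its start index with the following one, so its length comes out as zero when the two starts are subtracted.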

Performance¶

SegArray objects are currently processed entirely on the Arkouda client side. The data structure mirrors the one that will be used for Arkouda server-side processing.

Iteration¶

Because SegArray is currently processed entirely on the Arkouda client side, iteration is natively supported. Thus, for row in segarr will iterate over each sub-array. Each of these sub-arrays is currently returned as a numpy.ndarray.

Similar to Strings, SegArrays will be moved to server-side processing. This will remove native iteration, to discourage transferring all of the object's data to the client. To support iteration going forward, SegArray includes a to_ndarray() function. It is recommended that this function be used for iteration over SegArray objects, to avoid issues when processing moves server side. For more information on the usage of to_ndarray with SegArray
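
As a rough illustration of what row-by-row iteration means for a segmented layout, the plain-Python sketch below walks the sub-arrays once segments and values are available on the client. It does not call Arkouda; `iter_rows` is a hypothetical helper, and the example data is made up.

```python
def iter_rows(segments, values):
    """Yield each sub-array in order, using segment starts as slice bounds."""
    # Each row ends where the next one starts; the last row ends at len(values).
    ends = list(segments[1:]) + [len(values)]
    for start, end in zip(segments, ends):
        yield values[start:end]


segments = [0, 2, 5]
values = [1, 2, 3, 4, 5, 6]
collected = [row for row in iter_rows(segments, values)]
print(collected)  # [[1, 2], [3, 4, 5], [6]]
```

Every row yielded this way is copied out of the flattened values, which is why iterating over a large server-side object would imply moving all of its data to the client.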

Operation¶

Arkouda SegArray objects support the following operations:

Indexing with integer, slice, integer pdarray, and boolean pdarray (see Indexing and Assignment)
Comparison (==): returns an Arkouda pdarray of bool values indicating, for each sub-array in the SegArray, whether it equals the corresponding sub-array of the other operand.
Array Set Operations, e.g. unique
Concatenation with other SegArrays. Both horizontal and vertical axes are supported.
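
To make the comparison and concatenation semantics concrete, here is a plain-Python sketch on the segments/values layout. It is not Arkouda code, and all function names are hypothetical. It assumes that "vertical" means appending the rows of one array after the other and that "horizontal" means joining row i of one array with row i of the other; that reading is an assumption, not something stated above.

```python
def row_bounds(segments, n_values):
    """Return (start, end) slice bounds for every row."""
    ends = list(segments[1:]) + [n_values]
    return list(zip(segments, ends))


def rows_equal(seg_a, val_a, seg_b, val_b):
    """One bool per row: True where the two sub-arrays match exactly."""
    if len(seg_a) != len(seg_b):
        raise ValueError("both arrays must have the same number of rows")
    pairs = zip(row_bounds(seg_a, len(val_a)), row_bounds(seg_b, len(val_b)))
    return [val_a[s1:e1] == val_b[s2:e2] for (s1, e1), (s2, e2) in pairs]


def concat_vertical(seg_a, val_a, seg_b, val_b):
    """Append the rows of b after the rows of a (assumed 'vertical')."""
    offset = len(val_a)  # b's segment starts shift by the size of a's values
    return seg_a + [s + offset for s in seg_b], val_a + val_b


def concat_horizontal(seg_a, val_a, seg_b, val_b):
    """Join row i of a with row i of b (assumed 'horizontal')."""
    if len(seg_a) != len(seg_b):
        raise ValueError("both arrays must have the same number of rows")
    segments, values = [], []
    pairs = zip(row_bounds(seg_a, len(val_a)), row_bounds(seg_b, len(val_b)))
    for (s1, e1), (s2, e2) in pairs:
        segments.append(len(values))
        values.extend(val_a[s1:e1])
        values.extend(val_b[s2:e2])
    return segments, values


# a holds rows [1, 2] and [3]; b holds rows [4] and [5, 6]; c holds [1, 2] and [9]
seg_a, val_a = [0, 2], [1, 2, 3]
seg_b, val_b = [0, 1], [4, 5, 6]
seg_c, val_c = [0, 2], [1, 2, 9]

print(rows_equal(seg_a, val_a, seg_c, val_c))         # [True, False]
print(concat_vertical(seg_a, val_a, seg_b, val_b))    # ([0, 2, 3, 4], [1, 2, 3, 4, 5, 6])
print(concat_horizontal(seg_a, val_a, seg_b, val_b))  # ([0, 3], [1, 2, 4, 3, 5, 6])
```

Vertical concatenation only needs an offset applied to the second segments array, while horizontal concatenation has to interleave the values row by row and rebuild the segments.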

SegArray Specific Methods¶

Prefix & Suffix¶

NGrams¶

Sub-array of Size¶

Access/Set Specific Elements in Sub-Array¶

Append & Prepend¶

Deduplication¶

SegArray SetOps¶

Union¶

Intersect¶

Set Difference¶

Symmetric Difference¶