.. _examples-label:

*************
Examples
*************

Arkouda Arrays
====================

Arkouda arrays function similarly to NumPy arrays but are designed to operate at a much larger scale. In Arkouda, arrays are referred to as `pdarray` objects. A `pdarray` can be generated from a Python `list`, from a NumPy `ndarray`, or with a generator function similar to those found in NumPy. This document introduces some of the most commonly used elements of Arkouda; it is not an exhaustive description of its functionality.

`pdarray` Creation
-------------------

Like `ndarray` objects in NumPy, Arkouda `pdarray` objects can be generated from a Python `list`.

.. code-block:: python

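    # import Arkouda and connect to a running Arkouda server
    # (default host and port assumed)
    >> import arkouda as ak
    >> ak.connect()

    # create the Python list
    >> l = [0, 1, 2, 3, 4]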
    
    # generate a pdarray
    >> ak_arr = ak.array(l)
    >> ak_arr
    array([0 1 2 3 4])

`pdarray` objects can be generated directly from an `ndarray`. This allows you to easily move objects into Arkouda from NumPy.

.. code-block:: python

    # create an ndarray
    >> import numpy as np
    >> np_arr = np.array([0, 1, 2, 3, 4])

    # generate a pdarray
    >> ak_arr = ak.array(np_arr)
    >> ak_arr
    array([0 1 2 3 4])

`pdarray` objects can also be created with generator functions such as `arange` and `randint`.

.. code-block:: python

    # arange
    >> ak_arr = ak.arange(10)
    >> ak_arr
    array([0 1 2 3 4 5 6 7 8 9])

    # randint(low, high, size)
    >> r = ak.randint(0, 100, 10)
    >> r # output will vary
    array([52 84 1 52 80 71 27 20 7 7])

Exporting `pdarray` Objects
---------------------------

Arkouda allows users to export `pdarray` objects to other formats to aid in transitioning between toolsets. A `pdarray` can be exported to a NumPy `ndarray` or a Python `list`.

.. code-block:: python

    # create pdarray
    >> ak_arr = ak.array([0, 1, 2, 3, 4])

    # export to ndarray
    >> np_arr = ak_arr.to_ndarray()
    >> np_arr
    array([0, 1, 2, 3, 4])

    # export to a Python list
    >> l = ak_arr.to_list()
    >> l
    [0, 1, 2, 3, 4]

`pdarray` Set Operations
------------------------

Like NumPy, Arkouda supports set operations on `pdarray` objects. The supported set operations are:

- **IN** (`in1d`) : Test whether each element of a 1-D array is also present in a second array.
- **UNION** (`union1d`) : Compute the unique union of the arrays.
- **INTERSECT** (`intersect1d`) : Compute the unique intersection of the arrays.
- **SET DIFFERENCE** (`setdiff1d`) : Compute the difference between the two arrays.
- **SYMMETRIC DIFFERENCE** (`setxor1d`) : Compute the exclusive-or of the two arrays.
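
As a rough mental model of the five operations listed above, the plain-Python sketch below reproduces their 1-D semantics with built-in sets. It is an illustration only and makes no Arkouda calls. The helper names simply mirror the Arkouda function names for readability, and returning results in sorted order is an assumption borrowed from NumPy's behavior.

.. code-block:: python

    # Plain-Python model of the 1-D set operations (illustration only).
    def in1d(a, b):
        # one boolean per element of a: is that value present anywhere in b?
        lookup = set(b)
        return [x in lookup for x in a]

    def union1d(a, b):
        # unique values found in either input
        return sorted(set(a) | set(b))

    def intersect1d(a, b):
        # unique values found in both inputs
        return sorted(set(a) & set(b))

    def setdiff1d(a, b):
        # unique values of a that do not appear in b
        return sorted(set(a) - set(b))

    def setxor1d(a, b):
        # unique values that appear in exactly one of the inputs
        return sorted(set(a) ^ set(b))

    a = [4, 2, 5, 6, 4, 7, 2]
    b = [1, 5, 4, 11, 9, 6]

    print(in1d(a, b))         # [True, False, True, True, True, False, False]
    print(union1d(a, b))      # [1, 2, 4, 5, 6, 7, 9, 11]
    print(intersect1d(a, b))  # [4, 5, 6]
    print(setdiff1d(a, b))    # [2, 7]
    print(setxor1d(a, b))     # [1, 2, 7, 9, 11]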

Importantly, Arkouda extends this functionality beyond a single dimension: these operations can also be performed on lists of `pdarray` objects. The code block below demonstrates `in1d` and `intersect1d` in both the single-array and the multi-array case.

.. code-block:: python

    # configure 2 pdarrays to run against
    >> a = ak.array([4, 2, 5, 6, 4, 7, 2])
    >> b = ak.array([1, 5, 4, 11, 9, 6])

    # compute a boolean array indicating which values of a are found in b
    >> ak_in1d = ak.in1d(a, b)
    >> ak_in1d
    array([True False True True True False False])

    # compute the array of unique values found in both a and b
    >> ak_int = ak.intersect1d(a, b)
    >> ak_int
    array([4 5 6])

    # Arkouda can perform this operation on multiple arrays at once
    >> m1 = [
        ak.array([0, 1, 3, 4, 8, 5, 0]),
        ak.array([0, 9, 5, 1, 8, 5, 0])
    ]
    >> m2 = [
        ak.array([0, 1, 3, 4, 8, 7]),
        ak.array([0, 2, 5, 9, 8, 5])
    ]

    
    >> ak_in1dmult = ak.in1d(m1, m2)
    >> ak_in1dmult
    array([True False True False True False True])
    
    >> ak_intmult = ak.intersect1d(m1, m2)
    >> ak_intmult
    [array([0 3 8]), array([0 5 8])]

There are a few things to keep in mind when working in the multi-dimension case. First, `m1` and `m2` must be Python `list` objects containing the same number of `pdarray` elements. Second, the values at the same position across the arrays in a list are treated as a single tuple. In the example above, the first value of `m1` is viewed as `(0, 0)` during computation, the second as `(1, 9)`, and so on.
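
The tuple interpretation can be sketched in plain Python by zipping the parallel arrays into tuples before comparing them. This is only a model of the semantics described above, not Arkouda's implementation; `as_tuples`, `in1d_multi`, and `intersect1d_multi` are hypothetical helper names, and the sorted result order is an assumption.

.. code-block:: python

    # Plain-Python model of the multi-array case (illustration only).
    def as_tuples(arrays):
        # position i across all arrays becomes one tuple key
        return list(zip(*arrays))

    def in1d_multi(m1, m2):
        # one boolean per tuple of m1: does the same tuple occur in m2?
        lookup = set(as_tuples(m2))
        return [t in lookup for t in as_tuples(m1)]

    def intersect1d_multi(m1, m2):
        # unique tuples present in both inputs, split back into one list per array
        common = sorted(set(as_tuples(m1)) & set(as_tuples(m2)))
        return [list(col) for col in zip(*common)]

    m1 = [[0, 1, 3, 4, 8, 5, 0],
          [0, 9, 5, 1, 8, 5, 0]]
    m2 = [[0, 1, 3, 4, 8, 7],
          [0, 2, 5, 9, 8, 5]]

    print(as_tuples(m1)[0])           # (0, 0)
    print(in1d_multi(m1, m2))         # [True, False, True, False, True, False, True]
    print(intersect1d_multi(m1, m2))  # [[0, 3, 8], [0, 5, 8]]

Note that `(1, 9)` in `m1` does not match `(1, 2)` in `m2` even though the value `1` appears in both first arrays, because the whole tuple must match.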

Arkouda DataFrames
====================

As in Pandas, Arkouda supports the `DataFrame` construct. The structure of these objects is very similar, though some functionality may vary. A `DataFrame` is especially useful when working with multiple related `pdarray` objects. In Arkouda, a `DataFrame` consists of an `Index` (which uses an `Arkouda.Index`), `Column Names`, and `Column Data`.

Creating & Using a DataFrame
-----------------------------

Let's take a look at creating a `DataFrame` in Arkouda. There are two ways to do so:

- Importing a Pandas `DataFrame`
- Passing a Python mapping `{column_name: column_data}`. Each `column_data` value must be a `pdarray`; the constructor uses each `column_name` key as the name of the corresponding column in the `DataFrame`.

The most important thing to remember is that each column of an Arkouda `DataFrame` is a `pdarray` and must be provided as such. The only exception is when a Pandas DataFrame is being imported because the constructor will generate the `pdarray` objects for you from the columns of the Pandas `DataFrame`. 

Importing Pandas DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

    # construct the Pandas DataFrame
    >> import pandas as pd
    >> fname = ['John', 'Jane', 'John', 'Jake']
    >> lname = ['Doe', 'Doe', 'Smith', 'Brown']
    >> age = [37, 35, 50, 32]
    >> salary = [75000, 77000, 100000, 35000]
    >> pd_df = pd.DataFrame({
        'F_Name': fname,
        'L_Name': lname,
        'Age': age,
        'Salary': salary
    })
    >> pd_df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000

    # call the Arkouda DataFrame constructor
    >> df = ak.DataFrame(pd_df)
    >> df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000 (4 rows x 4 columns)

Python Mapping
^^^^^^^^^^^^^^^

.. code-block:: python

    >> fname = ak.array(['John', 'Jane', 'John', 'Jake'])
    >> lname = ak.array(['Doe', 'Doe', 'Smith', 'Brown'])
    >> age = ak.array([37, 35, 50, 32])
    >> salary = ak.array([75000, 77000, 100000, 35000])
    >> df = ak.DataFrame({
        'F_Name': fname,
        'L_Name': lname,
        'Age': age,
        'Salary': salary
    })

    >> df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000 (4 rows x 4 columns)

**NOTICE**: The Arkouda `DataFrame` constructor takes nearly the same information as the Pandas constructor, with one key difference: each column is an Arkouda `pdarray`.

Basic Interaction
^^^^^^^^^^^^^^^^^

**Please Note:** This section uses the same `DataFrame` generated in the creation examples above.

In this section, we will highlight some of the basics of `DataFrame` interaction in Arkouda. You should notice that it is very similar to interacting with a Pandas `DataFrame`.

.. code-block:: python

    # the DataFrame created earlier, repeated here for reference
    >> df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000 (4 rows x 4 columns)

    # accessing a column
    >> df['Age']
    array([37 35 50 32])

    # accessing multiple columns at once
    >> df['L_Name', 'Age'] # equivalent to df[['L_Name', 'Age']]
        L_Name  Age
    0    Doe   37
    1    Doe   35
    2  Smith   50
    3  Brown   32 (4 rows x 2 columns)

    # accessing a row
    >> df[0]
    {'F_Name': 'John', 'L_Name': 'Doe', 'Age': 37, 'Salary': 75000}

    # accessing a row slice
    >> df[0:2]
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000 (2 rows x 4 columns)

    # accessing multiple indexes
    >> idx = ak.array([0, 2, 3])
    >> df[idx]
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000 (3 rows x 4 columns)

Exporting to Pandas
--------------------

An Arkouda `DataFrame` can be exported to Pandas with the `to_pandas` method.

.. code-block:: python

    # the DataFrame created earlier, repeated here for reference
    >> df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000 (4 rows x 4 columns)

    >> pd_df = df.to_pandas()
    >> pd_df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000

GroupBy
====================

In Pandas, groupby-aggregate is a very useful pattern, but it can be computationally intensive. Arkouda supports grouping by key along with most of the aggregations available in Pandas. `GroupBy` functionality in Arkouda is supported on both `pdarray` and `DataFrame` objects.

`pdarrays`
-----------

.. code-block:: python

    # use randint for more interesting results; values will vary
    >> x = ak.randint(0, 10, 100)
    >> g = ak.GroupBy(x)
    >> g.count()
    (array([0 1 2 3 4 5 6 7 8 9]), array([14 5 8 17 14 8 5 9 11 9]))
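
To make the shape of that result concrete, the sketch below models `count` in plain Python: it returns a pair of parallel sequences, the unique keys and the number of occurrences of each. It is an illustration of the semantics only and makes no Arkouda calls; `groupby_count` is a hypothetical helper name, and returning the keys in sorted order is an assumption based on the output shown above.

.. code-block:: python

    from collections import Counter

    # Plain-Python model of a group-by count (illustration only).
    def groupby_count(values):
        # tally occurrences, then report keys and counts as parallel lists
        tally = Counter(values)
        keys = sorted(tally)
        return keys, [tally[k] for k in keys]

    x = [3, 1, 3, 0, 1, 3]
    keys, counts = groupby_count(x)
    print(keys)    # [0, 1, 3]
    print(counts)  # [1, 2, 3]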

DataFrames
-----------

.. code-block:: python

    # the DataFrame created earlier, repeated here for reference
    >> df
        F_Name L_Name  Age  Salary
    0   John    Doe   37   75000
    1   Jane    Doe   35   77000
    2   John  Smith   50  100000
    3   Jake  Brown   32   35000 (4 rows x 4 columns)

    >> g = df.groupby("L_Name")
    >> g.count()
    Doe      2
    Brown    1
    Smith    1
    dtype: int64
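
The general groupby-aggregate pattern behind this call can be sketched in plain Python: rows are bucketed by key, and then a reduction is applied to each bucket. This is a conceptual model only, not Arkouda's implementation; `groupby_aggregate` is a hypothetical helper, and the column lists reuse the sample data from the `DataFrame` above.

.. code-block:: python

    from collections import defaultdict

    # Plain-Python model of groupby-aggregate (illustration only).
    def groupby_aggregate(keys, values, agg):
        # bucket each value under its key, then reduce every bucket with agg
        groups = defaultdict(list)
        for k, v in zip(keys, values):
            groups[k].append(v)
        return {k: agg(vs) for k, vs in groups.items()}

    lname = ['Doe', 'Doe', 'Smith', 'Brown']
    age = [37, 35, 50, 32]
    salary = [75000, 77000, 100000, 35000]

    print(groupby_aggregate(lname, salary, len))  # {'Doe': 2, 'Smith': 1, 'Brown': 1}
    print(groupby_aggregate(lname, salary, sum))  # {'Doe': 152000, 'Smith': 100000, 'Brown': 35000}
    print(groupby_aggregate(lname, age, max))     # {'Doe': 37, 'Smith': 50, 'Brown': 32}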