Summarizing Data

Descriptive Statistics

Simple descriptive statistics are available as reduction methods on pdarray objects.

>>> A = ak.randint(-10, 11, 1000)
>>> A.min()
-10
>>> A.max()
10
>>> A.sum()
13
>>> A.mean()
0.013
>>> A.var()
36.934176000000015
>>> A.std()
6.07734942223993

The list of reductions supported on pdarray objects is:

pdarray.any(axis=None, keepdims=False)[source]

Return True iff any element of the array along the given axis evaluates to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.any(ak.array([True,False,False]))
True
>>> ak.any(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([True True True])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([True True True])])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).any()
True
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.any()) or a standalone function (e.g. ak.any(a))

pdarray.all(axis=None, keepdims=False)[source]

Return True iff all elements of the array along the given axis evaluate to True.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, pdarray if axis is supplied

Return type:

boolean or pdarray

Examples

>>> ak.all(ak.array([True,False,False]))
False
>>> ak.all(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([False True False])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([False False False])])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).all()
False
Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Notes

Works as a method of a pdarray (e.g. a.any()) or a standalone function (e.g. ak.all(a))

pdarray.is_sorted(axis=None, keepdims=False)[source]

Return True iff the array (or given axis of the array) is monotonically non-decreasing.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

boolean if axis is omitted, else pdarray if axis is supplied

Return type:

boolean or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.is_sorted(ak.array([1,2,3,4,5]))
True
>>> ak.is_sorted(ak.array([5,4,3,2,1]))
False
>>> ak.array([[1,2,3],[5,4,3]]).is_sorted(axis=1)
array([True False])

Notes

Works as a method of a pdarray (e.g. a.is_sorted()) or a standalone function (e.g. ak.is_sorted(a))

pdarray.sum(axis=None, keepdims=False)[source]

Return sum of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case operation is done over entire array pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])

Notes

Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))

pdarray.prod(axis=None, keepdims=False)[source]

Return prod of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, defalt = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case operation is done over entire array pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.prod(ak.array([1,2,3,4,5]))
120
>>> ak.prod(ak.array([5.5,4.5,3.5,2.5,1.5]))
324.84375
>>> ak.array([[1,2,3],[5,4,3]]).prod(axis=1)
array([6 60])

Notes

Works as a method of a pdarray (e.g. a.prod()) or a standalone function (e.g. ak.prod(a))

pdarray.min(axis=None, keepdims=False)[source]

Return min of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case operation is done over entire array pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.min(ak.array([1,2,3,4,5]))
1
>>> ak.min(ak.array([5.5,4.5,3.5,2.5,1.5]))
1.5
>>> ak.array([[1,2,3],[5,4,3]]).min(axis=1)
array([1 3])

Notes

Works as a method of a pdarray (e.g. a.min()) or a standalone function (e.g. ak.min(a))

pdarray.max(axis=None, keepdims=False)[source]

Return max of array elements along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

numpy_scalar if axis is omitted, in which case operation is done over entire array pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

numpy_scalar or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.max(ak.array([1,2,3,4,5]))
5
>>> ak.max(ak.array([5.5,4.5,3.5,2.5,1.5]))
5.5
>>> ak.array([[1,2,3],[5,4,3]]).max(axis=1)
array([3 5])

Notes

Works as a method of a pdarray (e.g. a.max()) or a standalone function (e.g. ak.max(a))

pdarray.argmin(axis=None, keepdims=False)[source]

Return index of the first occurrence of the minimum along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case operation is done over entire array pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmin(ak.array([1,2,3,4,5]))
0
>>> ak.argmin(ak.array([5.5,4.5,3.5,2.5,1.5]))
4
>>> ak.array([[1,2,3],[5,4,3]]).argmin(axis=1)
array([0 2])

Notes

Works as a method of a pdarray (e.g. a.argmin()) or a standalone function (e.g. ak.argmin(a))

pdarray.argmax(axis=None, keepdims=False)[source]

Return index of the first occurrence of the maximum along the given axis.

Parameters:
  • axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation If None, the computation is done across the entire array.

  • keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.

Returns:

int64 or uint64 if axis is omitted, in which case operation is done over entire array pdarray if axis is supplied, in which case the operation is done along that axis

Return type:

int64, uint64 or pdarray

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown

Examples

>>> ak.argmax(ak.array([1,2,3,4,5]))
4
>>> ak.argmax(ak.array([5.5,4.5,3.5,2.5,1.5]))
0
>>> ak.array([[1,2,3],[5,4,3]]).argmax(axis=1)
array([2 0])

Notes

Works as a method of a pdarray (e.g. a.argmax()) or a standalone function (e.g. ak.argmax(a))

pdarray.mean()[source]

Compute the mean. See arkouda.mean for details.

Return type:

float64

pdarray.var(ddof=0)[source]

Compute the variance. See arkouda.var for details.

Return type:

float64

pdarray.std(ddof=0)[source]

Compute the standard deviation. See arkouda.std for details.

Return type:

float64

pdarray.mink(k)[source]

Compute the minimum “k” values. See arkouda.mink for details.

Return type:

pdarray

pdarray.maxk(k)[source]

Compute the maximum “k” values. See arkouda.maxk for details.

Return type:

pdarray

pdarray.argmink(k)[source]

Finds the indices corresponding to the k minimum values of an array. See arkouda.argmink for details.

Return type:

pdarray

pdarray.argmaxk(k)[source]

Finds the indices corresponding to the k maximum values of an array. See arkouda.argmaxk for details.

Return type:

pdarray

Histogram

Arkouda can compute simple histograms on pdarray data. Currently, this function can only create histograms over evenly spaced bins between the min and max of the data. In the future, we plan to support using a pdarray to define custom bin edges.

arkouda.histogram(pda, bins=10)[source]

Compute a histogram of evenly spaced bins over the range of an array.

Parameters:
  • pda (pdarray) – The values to histogram

  • bins (int_scalars, default=10) – The number of equal-size bins to use (default: 10)

Returns:

The number of values present in each bin and the bin edges

Return type:

(pdarray, Union[pdarray, int64 or float64])

Raises:
  • TypeError – Raised if the parameter is not a pdarray or if bins is not an int.

  • ValueError – Raised if bins < 1

  • NotImplementedError – Raised if pdarray dtype is bool or uint8

Notes

The bins are evenly spaced in the interval [pda.min(), pda.max()].

Examples

>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h, b = ak.histogram(A, bins=nbins)
>>> h
array([3 3 4])
>>> b
array([0.00000000000000000 3.00000000000000000 6.00000000000000000 9.00000000000000000])
# To plot, export the left edges and the histogram to NumPy
>>> b_np = b.to_ndarray()
>>> import numpy as np
>>> b_widths = np.diff(b_np)
>>> plt.bar(b_np[:-1], h.to_ndarray(), width=b_widths, align='edge', edgecolor='black')
<BarContainer object of 3 artists>
>>> plt.show()

Value Counts

For int64 pdarray objects, it is often useful to count only the unique values that appear. This function finds all unique values and their counts.

arkouda.value_counts(pda)[source]

Count the occurrences of the unique values of an array.

Parameters:

pda (pdarray) – The array of values to count

Return type:

tuple[Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]], pdarray]

Returns:

  • unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order

  • counts (pdarray, int64) – The number of times the corresponding unique value occurs

Raises:

TypeError – Raised if the parameter is not a pdarray

See also

unique, histogram

Notes

This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.

Examples

>>> A = ak.array([2, 0, 2, 4, 0, 0])
>>> ak.value_counts(A)
(array([0 2 4]), array([3 2 1]))