Summarizing Data

Descriptive Statistics

Simple descriptive statistics are available as reduction methods on pdarray objects.

>>> A = ak.randint(-10, 11, 1000)
>>> A.min()
-10
>>> A.max()
10
>>> A.sum()
13
>>> A.mean()
0.013
>>> A.var()
36.934176000000015
>>> A.std()
6.07734942223993
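As a rough cross-check of how these reductions relate to one another, the sketch below uses NumPy as a local stand-in. This is an assumption made so the snippet runs without an Arkouda server; it is not Arkouda code, and the seed is arbitrary, so the values will differ from those shown above.

```python
import numpy as np

# NumPy stand-in for the pdarray example above; the seed is arbitrary.
rng = np.random.default_rng(0)
# The upper bound is exclusive here, consistent with the max of 10 shown above.
a = rng.integers(-10, 11, size=1000)

total = a.sum()
mean = a.mean()

# The mean is the sum divided by the number of elements.
mean_from_sum = total / a.size

# Every value lies in the closed range [-10, 10].
in_range = bool(a.min() >= -10 and a.max() <= 10)
```

In the output above, a sum of 13 over 1000 elements gives the reported mean of 0.013 in the same way.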

The list of reductions supported on pdarray objects is:

pdarray.any()

Return True iff any element of the array evaluates to True.

Return type:

bool_

pdarray.all()

Return True iff all elements of the array evaluate to True.

Return type:

bool_

pdarray.is_sorted()

Return True iff the array is monotonically non-decreasing.

Parameters:

None

Returns:

Indicates if the array is monotonically non-decreasing

Return type:

bool

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown
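To make "monotonically non-decreasing" concrete, here is a minimal sketch of the check in NumPy. The helper name is_non_decreasing is hypothetical and this is not Arkouda's server-side implementation; it only illustrates the condition being tested.

```python
import numpy as np

def is_non_decreasing(values):
    """Return True if every element is >= the element before it."""
    arr = np.asarray(values)
    # Compare each element with its successor. An empty or single-element
    # array has no pairs to compare, so it counts as sorted.
    return bool(np.all(arr[:-1] <= arr[1:]))

print(is_non_decreasing([1, 2, 2, 5]))  # True: ties are allowed
print(is_non_decreasing([1, 3, 2]))     # False: 3 is followed by 2
```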

pdarray.sum()

Return the sum of all elements in the array.

Return type:

Union[bool, bool_, float, float64, float32, int, int8, int16, int32, int64, uint8, uint16, uint32, uint64]

pdarray.prod()

Return the product of all elements in the array. The return value is always an np.float64 or np.int64.

Return type:

float64

pdarray.min()

Return the minimum value of the array.

Return type:

Union[float64, float32, int8, int16, int32, int64, bool_, str_, uint8, uint16, uint32, uint64]

pdarray.max()

Return the maximum value of the array.

Return type:

Union[float64, float32, int8, int16, int32, int64, bool_, str_, uint8, uint16, uint32, uint64]

pdarray.argmin()

Return the index of the first occurrence of the array min value.

Return type:

Union[int64, uint64]

pdarray.argmax()

Return the index of the first occurrence of the array max value.

Return type:

Union[int64, uint64]
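The "first occurrence" rule for argmin and argmax matters when the extreme value is repeated. The sketch below shows the same rule with NumPy's argmin and argmax, used here as a stand-in rather than as Arkouda calls.

```python
import numpy as np

a = np.array([4, 1, 7, 1, 7])

# The minimum (1) appears at indices 1 and 3; the first one is reported.
first_min = int(np.argmin(a))

# The maximum (7) appears at indices 2 and 4; the first one is reported.
first_max = int(np.argmax(a))

print(first_min, first_max)  # 1 2
```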

pdarray.mean()

Return the mean of the array.

Return type:

float64

pdarray.var(ddof=0)

Compute the variance. See arkouda.var for details.

Parameters:

ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var

Returns:

The scalar variance of the array

Return type:

np.float64

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • ValueError – Raised if the ddof >= pdarray size

  • RuntimeError – Raised if there’s a server-side error thrown

pdarray.std(ddof=0)

Compute the standard deviation. See arkouda.std for details.

Parameters:

ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std

Returns:

The scalar standard deviation of the array

Return type:

np.float64

Raises:
  • TypeError – Raised if pda is not a pdarray instance

  • RuntimeError – Raised if there’s a server-side error thrown
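The effect of ddof on var and std can be sketched as follows. The sketch assumes the conventional definition in which the sum of squared deviations is divided by N - ddof; the helper name variance is hypothetical and the code is a NumPy illustration, not Arkouda's implementation.

```python
import numpy as np

def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by (N - ddof)."""
    arr = np.asarray(values, dtype=np.float64)
    n = arr.size
    if ddof >= n:
        # Mirrors the ValueError documented for var above.
        raise ValueError("ddof must be smaller than the array size")
    deviations = arr - arr.mean()
    return float(np.sum(deviations ** 2) / (n - ddof))

data = [2, 4, 4, 4, 5, 5, 7, 9]            # mean is 5, squared deviations sum to 32
population_var = variance(data)             # 32 / 8 = 4.0
sample_var = variance(data, ddof=1)         # 32 / 7
population_std = float(np.sqrt(population_var))  # 2.0
```

With ddof=0 the divisor is the full element count; ddof=1 gives the usual sample estimate. The standard deviation is the square root of the variance computed with the same ddof.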

pdarray.mink(k)

Compute the minimum “k” values.

Parameters:

k (int_scalars) – The desired count of minimum values to be returned by the output.

Returns:

The minimum k values from pda

Return type:

pdarray, int

Raises:

TypeError – Raised if pda is not a pdarray

pdarray.maxk(k)

Compute the maximum “k” values.

Parameters:

k (int_scalars) – The desired count of maximum values to be returned by the output.

Returns:

The maximum k values from pda

Return type:

pdarray, int

Raises:

TypeError – Raised if pda is not a pdarray

pdarray.argmink(k)

Find the indices corresponding to the minimum “k” values.

Parameters:

k (int_scalars) – The desired count of minimum values to be returned by the output.

Returns:

Indices corresponding to the minimum k values from pda

Return type:

pdarray, int

Raises:

TypeError – Raised if pda is not a pdarray

pdarray.argmaxk(k)

Find the indices corresponding to the maximum “k” values.

Parameters:

k (int_scalars) – The desired count of maximum values to be returned by the output.

Returns:

Indices corresponding to the maximum k values, sorted

Return type:

pdarray, int

Raises:

TypeError – Raised if pda is not a pdarray
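The four "k" reductions above are closely related: the values returned by mink and maxk are the array elements at the positions returned by argmink and argmaxk. A minimal NumPy sketch of that relationship follows. The helper name k_extremes is hypothetical, and the ascending order of the results is an assumption of this sketch, not a statement about Arkouda's output order.

```python
import numpy as np

def k_extremes(values, k):
    """Return (mink, maxk, argmink, argmaxk) using one ascending sort."""
    arr = np.asarray(values)
    if k < 1 or k > arr.size:
        raise ValueError("k must be between 1 and the array size")
    # Indices that would sort arr in ascending order.
    order = np.argsort(arr, kind="stable")
    argmink = order[:k]    # positions of the k smallest values
    argmaxk = order[-k:]   # positions of the k largest values
    return arr[argmink], arr[argmaxk], argmink, argmaxk

mink, maxk, argmink, argmaxk = k_extremes([5, 1, 9, 3, 7], 2)
print(mink, argmink)  # [1 3] [1 3]
print(maxk, argmaxk)  # [7 9] [4 2]
```

A full sort is the simplest way to show the idea; it is not meant to suggest how a distributed implementation would do it.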

Histogram

Arkouda can compute simple histograms on pdarray data. Currently, this function can only create histograms over evenly spaced bins between the min and max of the data. In the future, we plan to support using a pdarray to define custom bin edges.

arkouda.histogram(pda, bins=10)

Compute a histogram of evenly spaced bins over the range of an array.

Parameters:
  • pda (pdarray) – The values to histogram

  • bins (int_scalars) – The number of equal-size bins to use (default: 10)

Returns:

The number of values present in each bin and the bin edges

Return type:

(pdarray, Union[pdarray, int64 or float64])

Raises:
  • TypeError – Raised if the parameter is not a pdarray or if bins is not an int.

  • ValueError – Raised if bins < 1

  • NotImplementedError – Raised if pdarray dtype is bool or uint8

See also

value_counts, histogram2d

Notes

The bins are evenly spaced in the interval [pda.min(), pda.max()].

Examples

>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h, b = ak.histogram(A, bins=nbins)
>>> h
array([3, 3, 4])
>>> b
array([0., 3., 6., 9.])

# To plot, export the left edges and the histogram to NumPy
>>> plt.plot(b.to_ndarray()[:-1], h.to_ndarray())
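The evenly spaced binning described in the Notes can be reproduced by hand. The sketch below uses NumPy as a stand-in and is not Arkouda's implementation. It assumes that the maximum value is counted in the last bin, which is what the counts in the example output suggest.

```python
import numpy as np

a = np.arange(0, 10, 1)
nbins = 3

# nbins + 1 evenly spaced edges from the minimum to the maximum.
edges = np.linspace(a.min(), a.max(), nbins + 1)

# Map each value to a bin index by its offset from the minimum.
width = (a.max() - a.min()) / nbins
idx = ((a - a.min()) / width).astype(np.int64)

# The maximum lands exactly on the last edge; clamp it into the last bin.
idx = np.minimum(idx, nbins - 1)

counts = np.bincount(idx, minlength=nbins)
print(counts)  # [3 3 4]
print(edges)   # [0. 3. 6. 9.]
```

There are always nbins + 1 edges for nbins counts, which is why the last edge is dropped before plotting the counts against the left edges.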

Value Counts

For int64 pdarray objects, it is often useful to count only the unique values that appear. This function finds all unique values and their counts.

arkouda.value_counts(pda)

Count the occurrences of the unique values of an array.

Parameters:

pda (pdarray, int64) – The array of values to count

Return type:

Union[Categorical, Tuple[Union[pdarray, Strings], Optional[pdarray]]]

Returns:

  • unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order

  • counts (pdarray, int64) – The number of times the corresponding unique value occurs

Raises:

TypeError – Raised if the parameter is not a pdarray

See also

unique, histogram

Notes

This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.

Examples

>>> A = ak.array([2, 0, 2, 4, 0, 0])
>>> ak.value_counts(A)
(array([0, 2, 4]), array([3, 2, 1]))
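To illustrate the difference from a histogram noted above, the sketch below counts the same data two ways in NumPy. This is a local stand-in, not Arkouda code: np.unique with return_counts=True plays the role of the unique values and counts, while np.bincount keeps the empty slots.

```python
import numpy as np

a = np.array([2, 0, 2, 4, 0, 0])

# Only values that actually occur are reported, in ascending order.
unique_values, counts = np.unique(a, return_counts=True)
print(unique_values, counts)  # [0 2 4] [3 2 1]

# A dense count over 0..max keeps the empty entries for 1 and 3.
dense = np.bincount(a)
print(dense)  # [3 0 2 0 1]
```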