Summarizing Data
Descriptive Statistics
Simple descriptive statistics are available as reduction methods on pdarray objects.
>>> A = ak.randint(-10, 11, 1000)
>>> A.min()
-10
>>> A.max()
10
>>> A.sum()
13
>>> A.mean()
0.013
>>> A.var()
36.934176000000015
>>> A.std()
6.07734942223993
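The values above are linked by simple identities: the mean is the sum divided by the length, and the standard deviation is the square root of the variance. The following pure-Python sketch needs no Arkouda server and uses the standard library's `statistics` module on a small made-up sample. It assumes the population convention (`ddof=0`), which is the default shown in the `pdarray.std(ddof=0)` signature. The sample data and variable names are illustrative only.

```python
import math
import statistics

# Small illustrative sample (hypothetical data, not the random array above).
data = [2, 4, 4, 4, 5, 5, 7, 9]

total = sum(data)
mean = total / len(data)

# Population variance and standard deviation: divide by N (ddof=0).
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample variance: divide by N - 1 (ddof=1); it is larger than pop_var.
sample_var = statistics.variance(data)

# The same identities hold for the figures printed in the session above.
doc_mean_ok = (13 / 1000 == 0.013)
doc_std_ok = math.isclose(math.sqrt(36.934176000000015), 6.07734942223993, rel_tol=1e-9)

print(total, mean, pop_var, pop_std, sample_var)
print(doc_mean_ok, doc_std_ok)
```

Because `ak.randint` draws random values, the exact numbers in the session will differ from run to run, but these relationships between sum, mean, variance, and standard deviation should still hold.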
The reductions supported on pdarray objects are:
- pdarray.any(axis=None, keepdims=False)
Return True iff any element of the array along the given axis evaluates to True.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
boolean if axis is omitted, else pdarray if axis is supplied
- Return type:
boolean or pdarray
Examples
>>> ak.any(ak.array([True,False,False]))
True
>>> ak.any(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([True True True])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([True True True])])
>>> ak.any(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).any()
True
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Notes
Works as a method of a pdarray (e.g. a.any()) or a standalone function (e.g. ak.any(a))
- pdarray.all(axis=None, keepdims=False)
Return True iff all elements of the array along the given axis evaluate to True.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
boolean if axis is omitted, pdarray if axis is supplied
- Return type:
boolean or pdarray
Examples
>>> ak.all(ak.array([True,False,False]))
False
>>> ak.all(ak.array([[True,True,False],[False,True,True]]),axis=0)
array([False True False])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=0,keepdims=True)
array([array([False False False])])
>>> ak.all(ak.array([[True,True,True],[False,False,False]]),axis=1,keepdims=True)
array([array([True]) array([False])])
>>> ak.array([True,False,False]).all()
False
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Notes
Works as a method of a pdarray (e.g. a.all()) or a standalone function (e.g. ak.all(a))
- pdarray.is_sorted(axis=None, keepdims=False)
Return True iff the array (or given axis of the array) is monotonically non-decreasing.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
boolean if axis is omitted, else pdarray if axis is supplied
- Return type:
boolean or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.is_sorted(ak.array([1,2,3,4,5]))
True
>>> ak.is_sorted(ak.array([5,4,3,2,1]))
False
>>> ak.array([[1,2,3],[5,4,3]]).is_sorted(axis=1)
array([True False])
Notes
Works as a method of a pdarray (e.g. a.is_sorted()) or a standalone function (e.g. ak.is_sorted(a))
- pdarray.sum(axis=None, keepdims=False)
Return the sum of array elements along the given axis.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis
- Return type:
numpy_scalar or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.sum(ak.array([1,2,3,4,5]))
15
>>> ak.sum(ak.array([5.5,4.5,3.5,2.5,1.5]))
17.5
>>> ak.array([[1,2,3],[5,4,3]]).sum(axis=1)
array([6 12])
Notes
Works as a method of a pdarray (e.g. a.sum()) or a standalone function (e.g. ak.sum(a))
- pdarray.prod(axis=None, keepdims=False)
Return the product of array elements along the given axis.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis
- Return type:
numpy_scalar or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.prod(ak.array([1,2,3,4,5]))
120
>>> ak.prod(ak.array([5.5,4.5,3.5,2.5,1.5]))
324.84375
>>> ak.array([[1,2,3],[5,4,3]]).prod(axis=1)
array([6 60])
Notes
Works as a method of a pdarray (e.g. a.prod()) or a standalone function (e.g. ak.prod(a))
- pdarray.min(axis=None, keepdims=False)
Return the minimum of array elements along the given axis.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis
- Return type:
numpy_scalar or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.min(ak.array([1,2,3,4,5]))
1
>>> ak.min(ak.array([5.5,4.5,3.5,2.5,1.5]))
1.5
>>> ak.array([[1,2,3],[5,4,3]]).min(axis=1)
array([1 3])
Notes
Works as a method of a pdarray (e.g. a.min()) or a standalone function (e.g. ak.min(a))
- pdarray.max(axis=None, keepdims=False)
Return the maximum of array elements along the given axis.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
numpy_scalar if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis
- Return type:
numpy_scalar or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.max(ak.array([1,2,3,4,5]))
5
>>> ak.max(ak.array([5.5,4.5,3.5,2.5,1.5]))
5.5
>>> ak.array([[1,2,3],[5,4,3]]).max(axis=1)
array([3 5])
Notes
Works as a method of a pdarray (e.g. a.max()) or a standalone function (e.g. ak.max(a))
- pdarray.argmin(axis=None, keepdims=False)
Return the index of the first occurrence of the minimum along the given axis.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis
- Return type:
int64, uint64, or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.argmin(ak.array([1,2,3,4,5]))
0
>>> ak.argmin(ak.array([5.5,4.5,3.5,2.5,1.5]))
4
>>> ak.array([[1,2,3],[5,4,3]]).argmin(axis=1)
array([0 2])
Notes
Works as a method of a pdarray (e.g. a.argmin()) or a standalone function (e.g. ak.argmin(a))
- pdarray.argmax(axis=None, keepdims=False)
Return the index of the first occurrence of the maximum along the given axis.
- Parameters:
axis (int, Tuple[int, ...], optional, default = None) – The axis or axes along which to do the operation. If None, the computation is done across the entire array.
keepdims (bool, optional, default = False) – Whether to keep the singleton dimension(s) along axis in the result.
- Returns:
int64 or uint64 if axis is omitted, in which case the operation is done over the entire array; pdarray if axis is supplied, in which case the operation is done along that axis
- Return type:
int64, uint64, or pdarray
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
Examples
>>> ak.argmax(ak.array([1,2,3,4,5]))
4
>>> ak.argmax(ak.array([5.5,4.5,3.5,2.5,1.5]))
0
>>> ak.array([[1,2,3],[5,4,3]]).argmax(axis=1)
array([2 0])
Notes
Works as a method of a pdarray (e.g. a.argmax()) or a standalone function (e.g. ak.argmax(a))
- pdarray.std(ddof=0)
Compute the standard deviation. See arkouda.std for details.
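The axis and keepdims parameters behave the same way across the reductions listed above. As a rough mental model only, the sketch below reproduces several of the documented results on plain nested Python lists. The helper names (`reduce_2d`, `argmin_first`, `is_sorted_seq`) are hypothetical and not part of Arkouda, the sketch handles two-dimensional input only, and the real reductions run on the Arkouda server rather than in the client.

```python
def reduce_2d(rows, op, axis=None, keepdims=False):
    """Apply the reduction `op` to a list of equal-length rows.

    Hypothetical helper for illustration; not part of Arkouda.
    """
    if axis is None:
        # Reduce over every element and return one scalar.
        # keepdims is ignored in this branch to keep the sketch small.
        return op([x for row in rows for x in row])
    if axis == 0:
        # Collapse the rows: one result per column.
        out = [op(list(col)) for col in zip(*rows)]
        return [out] if keepdims else out
    if axis == 1:
        # Collapse the columns: one result per row.
        out = [op(list(row)) for row in rows]
        return [[v] for v in out] if keepdims else out
    raise ValueError("this sketch only handles axis in (None, 0, 1)")


def argmin_first(values):
    # min() returns the first minimal item, so ties resolve to the lowest index.
    return min(range(len(values)), key=lambda i: values[i])


def is_sorted_seq(values):
    # Monotonically non-decreasing: each element is <= its successor.
    return all(a <= b for a, b in zip(values, values[1:]))


flags = [[True, True, True], [False, False, False]]
nums = [[1, 2, 3], [5, 4, 3]]

any_cols = reduce_2d(flags, any, axis=0)
any_cols_kept = reduce_2d(flags, any, axis=0, keepdims=True)
any_rows_kept = reduce_2d(flags, any, axis=1, keepdims=True)
row_sums = reduce_2d(nums, sum, axis=1)
row_argmin = reduce_2d(nums, argmin_first, axis=1)
row_sorted = reduce_2d(nums, is_sorted_seq, axis=1)
grand_total = reduce_2d(nums, sum)

print(any_cols, any_cols_kept, any_rows_kept)
print(row_sums, row_argmin, row_sorted, grand_total)
```

With keepdims=True the reduced axis is kept with length one, which is why the axis=1 result is a column of single-element rows rather than a flat list.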
Histogram
Arkouda can compute simple histograms on pdarray data. Currently, this function can only create histograms over evenly spaced bins between the min and max of the data. In the future, we plan to support using a pdarray to define custom bin edges.
- arkouda.histogram(pda, bins=10)
Compute a histogram of evenly spaced bins over the range of an array.
- Parameters:
pda (pdarray) – The values to histogram
bins (int_scalars, default=10) – The number of equal-size bins to use (default: 10)
- Returns:
The number of values present in each bin and the bin edges
- Return type:
tuple of (pdarray, pdarray)
- Raises:
TypeError – Raised if the parameter is not a pdarray or if bins is not an int.
ValueError – Raised if bins < 1
NotImplementedError – Raised if pdarray dtype is bool or uint8
Notes
The bins are evenly spaced in the interval [pda.min(), pda.max()].
Examples
>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h, b = ak.histogram(A, bins=nbins)
>>> h
array([3 3 4])
>>> b
array([0.00000000000000000 3.00000000000000000 6.00000000000000000 9.00000000000000000])
>>> # To plot, export the left edges and the histogram to NumPy
>>> b_np = b.to_ndarray()
>>> import numpy as np
>>> b_widths = np.diff(b_np)
>>> plt.bar(b_np[:-1], h.to_ndarray(), width=b_widths, align='edge', edgecolor='black')
<BarContainer object of 3 artists>
>>> plt.show()
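To make the binning rule concrete, here is a minimal pure-Python sketch of an evenly spaced histogram over [min, max]. It is written to reproduce the counts and edges in the example above. The function name `even_histogram` is hypothetical, and placing the maximum value in the last bin is an assumption chosen to match that output, not a statement about Arkouda's server-side implementation.

```python
def even_histogram(values, bins=10):
    """Count values in `bins` equal-width bins spanning [min, max].

    Hypothetical illustration; not Arkouda's implementation.
    """
    if bins < 1:
        raise ValueError("bins must be >= 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    # bins + 1 edges, from the minimum to the maximum inclusive.
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Guard against zero width when all values are equal.
        idx = int((v - lo) / width) if width > 0 else 0
        # Assumption: the maximum falls in the last bin, not a new one.
        counts[min(idx, bins - 1)] += 1
    return counts, edges


counts, edges = even_histogram(list(range(10)), bins=3)
print(counts)
print(edges)
```

There is always one more edge than there are bins, which is why the plotting code above drops the final edge with `b_np[:-1]` before passing the left edges to `plt.bar`.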
Value Counts
For int64 pdarray objects, it is often useful to count only the unique values that appear. This function finds all unique values and their counts.
- arkouda.value_counts(pda)
Count the occurrences of the unique values of an array.
- Parameters:
pda (pdarray) – The array of values to count
- Return type:
tuple[Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]], pdarray]
- Returns:
unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order
counts (pdarray, int64) – The number of times the corresponding unique value occurs
- Raises:
TypeError – Raised if the parameter is not a pdarray
Notes
This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.
Examples
>>> A = ak.array([2, 0, 2, 4, 0, 0])
>>> ak.value_counts(A)
(array([0 2 4]), array([3 2 1]))
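The contrast with a histogram can be sketched in plain Python using `collections.Counter`. The helper `value_counts_sketch` is hypothetical and only mirrors the documented return shape (sorted unique values, then their counts); it does not call Arkouda. The dense count list is included purely to show the empty bins that value counts leave out.

```python
from collections import Counter


def value_counts_sketch(values):
    """Return (sorted unique values, matching counts) for a list of ints.

    Hypothetical illustration; not part of Arkouda.
    """
    tally = Counter(values)
    unique = sorted(tally)
    return unique, [tally[u] for u in unique]


data = [2, 0, 2, 4, 0, 0]
unique_values, counts = value_counts_sketch(data)

# For comparison: one unit-wide bin per integer from min to max,
# which includes zero counts for the absent values 1 and 3.
tally = Counter(data)
dense_counts = [tally.get(v, 0) for v in range(min(data), max(data) + 1)]

print(unique_values, counts)
print(dense_counts)
```

The first result lists only the values that occur, while the dense list carries a zero for every missing integer in the range.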