Summarizing Data
Descriptive Statistics
Simple descriptive statistics are available as reduction methods on pdarray
objects.
>>> A = ak.randint(-10, 11, 1000)
>>> A.min()
-10
>>> A.max()
10
>>> A.sum()
13
>>> A.mean()
0.013
>>> A.var()
36.934176000000015
>>> A.std()
6.07734942223993
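The var and std values above can be read against the usual definitions. The following is a minimal plain-Python sketch, not Arkouda code; it assumes that the ddof parameter documented in the reduction list follows the common convention of dividing by (N - ddof), so the default ddof=0 gives the population variance. The function names variance and std are illustrative only.

```python
import math


def variance(values, ddof=0):
    """Variance with divisor (N - ddof); ddof=0 is the population variance."""
    n = len(values)
    if ddof >= n:
        # Mirrors the documented ValueError raised when ddof >= array size.
        raise ValueError("ddof must be smaller than the number of values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def std(values, ddof=0):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(values, ddof))


sample = [2, 0, 2, 4, 0, 0]
population_var = variance(sample)        # divisor N
sample_var = variance(sample, ddof=1)    # divisor N - 1 (assumed convention)
population_std = std(sample)
```

Under that assumption, the standard deviation is always the square root of the variance computed with the same ddof, which is also the relationship between the A.var() and A.std() outputs shown above.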
The list of reductions supported on pdarray
objects is:
- pdarray.any(axis=None, keepdims=False)
Return True iff any element of the array evaluates to True.
- pdarray.all(axis=None, keepdims=False)
Return True iff all elements of the array evaluate to True.
- pdarray.is_sorted(axis=None, keepdims=False)
Return True iff the array is monotonically non-decreasing.
- Parameters:
None
- Returns:
Indicates if the array is monotonically non-decreasing
- Return type:
bool
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
- pdarray.prod(axis=None, keepdims=False)
Return the product of all elements in the array. Return value is always a np.float64 or np.int64.
- pdarray.argmin(axis=None, keepdims=False)
Return the index of the first occurrence of the array min value.
- pdarray.argmax(axis=None, keepdims=False)
Return the index of the first occurrence of the array max value.
- pdarray.var(ddof=0)
Compute the variance. See arkouda.var for details.
- Parameters:
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating var
- Returns:
The scalar variance of the array
- Return type:
np.float64
- Raises:
TypeError – Raised if pda is not a pdarray instance
ValueError – Raised if the ddof >= pdarray size
RuntimeError – Raised if there’s a server-side error thrown
- pdarray.std(ddof=0)
Compute the standard deviation. See arkouda.std for details.
- Parameters:
ddof (int_scalars) – “Delta Degrees of Freedom” used in calculating std
- Returns:
The scalar standard deviation of the array
- Return type:
np.float64
- Raises:
TypeError – Raised if pda is not a pdarray instance
RuntimeError – Raised if there’s a server-side error thrown
- pdarray.mink(k)
Compute the minimum “k” values.
- Parameters:
k (int_scalars) – The desired count of minimum values to be returned by the output.
- Returns:
The minimum k values from pda
- Return type:
pdarray, int
- Raises:
TypeError – Raised if pda is not a pdarray
- pdarray.maxk(k)
Compute the maximum “k” values.
- Parameters:
k (int_scalars) – The desired count of maximum values to be returned by the output.
- Returns:
The maximum k values from pda
- Return type:
pdarray, int
- Raises:
TypeError – Raised if pda is not a pdarray
- pdarray.argmink(k)
Find the indices corresponding to the minimum “k” values.
- Parameters:
k (int_scalars) – The desired count of minimum values to be returned by the output.
- Returns:
Indices corresponding to the minimum k values from pda
- Return type:
pdarray, int
- Raises:
TypeError – Raised if pda is not a pdarray
- pdarray.argmaxk(k)
Finds the indices corresponding to the maximum “k” values.
- Parameters:
k (int_scalars) – The desired count of maximum values to be returned by the output.
- Returns:
Indices corresponding to the maximum k values, sorted
- Return type:
pdarray, int
- Raises:
TypeError – Raised if pda is not a pdarray
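To make the relationship between mink, maxk, argmink, and argmaxk concrete, here is a plain-Python reference sketch. It is not Arkouda code and does not use the Arkouda API. It assumes that results are ordered by ascending value (the reference above only says "sorted") and that k must be at least 1; tie-breaking among equal values is also an assumption of this sketch.

```python
def _check_k(values, k):
    # Assumption of this sketch: k must be a positive integer.
    if k < 1:
        raise ValueError("k must be at least 1")
    return min(k, len(values))


def mink(values, k):
    """The k smallest values, in ascending order."""
    k = _check_k(values, k)
    return sorted(values)[:k]


def maxk(values, k):
    """The k largest values, in ascending order."""
    k = _check_k(values, k)
    return sorted(values)[len(values) - k:]


def argmink(values, k):
    """Indices of the k smallest values, ordered by ascending value."""
    k = _check_k(values, k)
    order = sorted(range(len(values)), key=values.__getitem__)
    return order[:k]


def argmaxk(values, k):
    """Indices of the k largest values, ordered by ascending value."""
    k = _check_k(values, k)
    order = sorted(range(len(values)), key=values.__getitem__)
    return order[len(values) - k:]


data = [10, 5, 1, 3, 7, 2, 9, 0]
smallest = mink(data, 3)          # [0, 1, 2]
largest = maxk(data, 3)           # [7, 9, 10]
smallest_idx = argmink(data, 3)   # [7, 2, 5]
largest_idx = argmaxk(data, 3)    # [4, 6, 0]
```

In this sketch, indexing the input with the result of argmink or argmaxk reproduces the output of mink or maxk, which is the property the "arg" variants are meant to provide.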
Histogram
Arkouda can compute simple histograms on pdarray
data. Currently, this function can only create histograms over evenly spaced bins between the min and max of the data. In the future, we plan to support using a pdarray
to define custom bin edges.
- arkouda.histogram(pda, bins=10)
Compute a histogram of evenly spaced bins over the range of an array.
- Parameters:
pda (pdarray) – The values to histogram
bins (int_scalars) – The number of equal-size bins to use (default: 10)
- Returns:
The number of values present in each bin and the bin edges, in that order (matching the h, b unpacking in the example below)
- Return type:
- Raises:
TypeError – Raised if the parameter is not a pdarray or if bins is not an int.
ValueError – Raised if bins < 1
NotImplementedError – Raised if pdarray dtype is bool or uint8
See also
Notes
The bins are evenly spaced in the interval [pda.min(), pda.max()].
Examples
>>> import matplotlib.pyplot as plt
>>> A = ak.arange(0, 10, 1)
>>> nbins = 3
>>> h, b = ak.histogram(A, bins=nbins)
>>> h
array([3, 3, 4])
>>> b
array([0., 3., 6., 9.])
# To plot, export the left edges and the histogram to NumPy
>>> plt.plot(b.to_ndarray()[:-1], h.to_ndarray())
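The binning rule described in the Notes can be sketched in plain Python. This is an illustrative reference, not Arkouda's server-side implementation. It assumes that each bin is closed on the left and open on the right, except the last bin, which also includes the maximum value. That assumption reproduces the counts in the example above.

```python
def histogram(values, bins=10):
    """Count values in `bins` evenly spaced bins over [min(values), max(values)]."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Degenerate case (all values equal): put everything in the first bin.
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value falls exactly on the last edge; clamp it into the last bin.
        counts[min(idx, bins - 1)] += 1
    return counts, edges


counts, edges = histogram(list(range(10)), bins=3)
# counts -> [3, 3, 4]
# edges  -> [0.0, 3.0, 6.0, 9.0]
```

Because there are always bins + 1 edges for bins counts, plotting against the left edges requires dropping the final edge, as the slice [:-1] does in the plotting line above.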
Value Counts
For int64 pdarray
objects, it is often useful to count only the unique values that appear. This function finds all unique values and their counts.
- arkouda.value_counts(pda)
Count the occurrences of the unique values of an array.
- Parameters:
- Return type:
tuple[Union[pdarray, Strings, Categorical, Sequence[Union[pdarray, Strings, Categorical]]], pdarray]
- Returns:
unique_values (pdarray, int64 or Strings) – The unique values, sorted in ascending order
counts (pdarray, int64) – The number of times the corresponding unique value occurs
- Raises:
TypeError – Raised if the parameter is not a pdarray
Notes
This function differs from histogram() in that it only returns counts for values that are present, leaving out empty “bins”. This function delegates all logic to the unique() method where the return_counts parameter is set to True.
Examples
>>> A = ak.array([2, 0, 2, 4, 0, 0])
>>> ak.value_counts(A)
(array([0, 2, 4]), array([3, 2, 1]))
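The documented behavior (unique values in ascending order, each paired with its count, and no entries for absent values) can be mirrored in plain Python. The sketch below is for illustration only and does not call Arkouda; the helper name value_counts is reused here purely for readability.

```python
from collections import Counter


def value_counts(values):
    """Return (sorted unique values, count of each unique value)."""
    tally = Counter(values)
    unique_values = sorted(tally)
    return unique_values, [tally[v] for v in unique_values]


unique_values, counts = value_counts([2, 0, 2, 4, 0, 0])
# unique_values -> [0, 2, 4]
# counts        -> [3, 2, 1]
```

The values 1 and 3 do not appear in the result at all, whereas a histogram over the same range would report them as empty bins.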