arkouda.plotting
Plotting utilities for Arkouda data structures.
The arkouda.plotting module provides lightweight, matplotlib-based
visualization functions for Arkouda arrays and DataFrames. These tools are
intended for exploratory data analysis, especially for understanding
distributions and skew across numeric or categorical data columns.
Functions
- plot_dist
Plot the histogram and cumulative distribution for binned data. Useful for visualizing data generated from ak.histogram.
- hist_all
Generate histograms for all numeric columns in an Arkouda DataFrame (or a specified subset of columns). Automatically computes the number of bins using Doane’s formula and handles missing values, datetime, and categorical data appropriately.
Notes
These functions require matplotlib.pyplot and are meant for interactive Python sessions or Jupyter notebooks.
plot_dist does not call plt.show() by default; pass show=True or call plt.show() yourself to display the plot.
hist_all handles categorical grouping via Arkouda’s GroupBy and supports Datetime and Timedelta plotting by converting them to numeric types.
Examples
>>> import arkouda as ak
>>> import numpy as np
>>> from arkouda.plotting import hist_all, plot_dist
>>> df = ak.DataFrame({'x': ak.array(np.random.randn(100))})
>>> fig, axes = hist_all(df)
>>> fig.savefig("hist_all.png")
>>> b, h = ak.histogram(ak.arange(10), 3)
>>> plot_dist(b.to_ndarray(), h[:-1].to_ndarray())
(<Figure size 1200x500 with 2 Axes>, array([<Axes: title={'center': 'distribution'}>,
<Axes: title={'center': 'cumulative distribution'}>], dtype=object))
>>> import matplotlib.pyplot as plt
>>> plt.show()
See also
matplotlib.pyplot, arkouda.DataFrame, arkouda.histogram
Module Contents
- arkouda.plotting.hist_all(ak_df: arkouda.pandas.dataframe.DataFrame, cols: list[str] | None = None)
Create a grid of histograms for numeric columns in an Arkouda DataFrame.
- Parameters:
ak_df (DataFrame) – An Arkouda DataFrame containing the data to visualize.
cols (list, optional) – A list of column names to plot. If empty or not provided, all columns in the DataFrame are considered.
- Returns:
(fig, axes) where fig is the matplotlib Figure and axes is an array of Axes objects.
- Return type:
tuple[matplotlib.figure.Figure, numpy.ndarray]
Notes
This function uses matplotlib to display a grid of histograms. It attempts to select a suitable number of bins using Doane’s formula. Columns with non-numeric types are grouped and encoded before plotting.
Examples
Basic usage with all columns:
>>> import arkouda as ak
>>> import numpy as np
>>> from arkouda.plotting import hist_all
>>> ak_df = ak.DataFrame({
...     "a": ak.array(np.random.randn(100)),
...     "b": ak.array(np.random.randn(100)),
...     "c": ak.array(np.random.randn(100)),
...     "d": ak.array(np.random.randn(100)),
... })
>>> fig, axes = hist_all(ak_df)
Plot a subset of columns and save the figure to disk:
>>> fig, axes = hist_all(ak_df, cols=["a", "b"])
>>> fig.savefig("hist_all.png")
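The Notes for hist_all mention Doane’s formula without stating it. The usual form is k = 1 + log2(n) + log2(1 + |g1| / s), where g1 is the sample skewness and s = sqrt(6(n - 2) / ((n + 1)(n + 3))). The following is a minimal sketch in plain NumPy, not Arkouda’s implementation: the helper name doane_bins, the rounding up, and the NaN filtering are assumptions made for illustration, and hist_all may differ in these details.

```python
import math

import numpy as np


def doane_bins(values) -> int:
    """Hypothetical helper: bin count suggested by Doane's formula."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # assumption: missing values are simply ignored
    n = x.size
    if n < 3:
        # The skewness standard error is zero or undefined for tiny samples.
        return 1

    std = x.std()
    # Sample skewness g1; treat constant data as having zero skewness.
    g1 = 0.0 if std == 0 else float(np.mean(((x - x.mean()) / std) ** 3))

    # Standard error of the skewness for a sample of size n.
    sigma_g1 = math.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))

    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return int(math.ceil(k))  # assumption: round up to a whole number of bins


symmetric = np.arange(100.0)
skewed = np.random.default_rng(0).exponential(size=100)
print(doane_bins(symmetric), doane_bins(skewed))
```

For symmetric data the skewness term vanishes and the result reduces to Sturges’ rule, 1 + log2(n); skewed data of the same size can only receive the same number of bins or more.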
- arkouda.plotting.plot_dist(b: arkouda.numpy.pdarrayclass.pdarray | numpy.typing.NDArray[numpy.floating], h: arkouda.numpy.pdarrayclass.pdarray | numpy.typing.NDArray[numpy.floating], *, log: bool = True, xlabel: str | None = None, newfig: bool = True, show: bool = False) -> Tuple[matplotlib.figure.Figure, numpy.ndarray]
Plot the distribution and cumulative distribution of histogram data.
- Parameters:
b (arkouda.pdarray or numpy.ndarray) – Histogram bin edges (length N+1) or bin centers (length N).
h (arkouda.pdarray or numpy.ndarray) – Histogram counts. Accepts length N or N+1 (Arkouda-like extra last bin).
log (bool, default True) – If True, use a log scale for the y-axis of the distribution plot.
xlabel (str, optional) – Label for the x-axis.
newfig (bool, default True) – If True, create a new figure; otherwise draw into the current figure.
show (bool, default False) – If True, call plt.show() before returning.
- Returns:
(fig, axes) where axes[0] is the distribution plot and axes[1] is the cumulative distribution plot.
- Return type:
tuple[matplotlib.figure.Figure, numpy.ndarray]
Notes
If h is one element longer than expected (as with ak.histogram), the final element is dropped automatically.
Examples
Using Arkouda’s histogram:
>>> import arkouda as ak
>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> from arkouda.plotting import plot_dist
>>> edges, counts = ak.histogram(ak.arange(10), 3)
>>> fig, axes = plot_dist(edges, counts)
>>> fig.savefig("dist.png")
Using NumPy’s histogram:
>>> data = np.random.randn(1000)
>>> counts, edges = np.histogram(data, bins=20)
>>> fig, axes = plot_dist(edges, counts, xlabel="Value")
>>> plt.show()
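The Parameters and Notes for plot_dist describe aligning counts with bins and plotting a cumulative distribution. As a rough sketch of that kind of preprocessing, the NumPy-only helper below takes bin edges and counts, drops a trailing extra count when the two arrays have equal length, and returns bin centers with a running total. The name prepare_dist is hypothetical, the input is assumed to be edges (not centers), and the cumulative curve is assumed to be an unnormalized running sum; plot_dist itself may handle these cases differently.

```python
import numpy as np


def prepare_dist(edges, counts):
    """Hypothetical helper: align counts with bins and accumulate them."""
    edges = np.asarray(edges, dtype=float)
    counts = np.asarray(counts, dtype=float)

    # Assumption: `edges` has length N+1. If `counts` has the same length,
    # treat its last entry as the extra trailing bin and drop it.
    if counts.size == edges.size:
        counts = counts[:-1]
    if counts.size != edges.size - 1:
        raise ValueError("counts must have length N or N+1 for N+1 edges")

    centers = 0.5 * (edges[:-1] + edges[1:])  # midpoint of each bin
    cumulative = np.cumsum(counts)            # unnormalized running total
    return centers, counts, cumulative


centers, counts, cumulative = prepare_dist([0, 3, 6, 9], [3, 3, 4, 0])
print(centers, counts, cumulative)
```

The returned arrays can be passed to any two-panel matplotlib layout, with counts against centers in the first panel and the running total in the second.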