arkouda.plotting

Plotting utilities for Arkouda data structures.

The arkouda.plotting module provides lightweight, matplotlib-based visualization functions for Arkouda arrays and DataFrames. These tools are intended for exploratory data analysis, especially for understanding distributions and skew across numeric or categorical data columns.

Functions

plot_dist

Plot the histogram and cumulative distribution for binned data. Useful for visualizing data generated from ak.histogram.

hist_all

Generate histograms for all numeric columns in an Arkouda DataFrame (or a specified subset of columns). The number of bins is computed automatically using Doane’s formula. Missing values are handled, Datetime and Timedelta columns are converted to numeric types, and categorical columns are grouped before plotting.

Notes

  • These functions require matplotlib.pyplot and are meant for interactive Python sessions or Jupyter notebooks.

  • By default (show=False), plot_dist does not call plt.show(); call it yourself or pass show=True to display the plot.

  • hist_all handles categorical grouping via Arkouda’s GroupBy and supports Datetime and Timedelta plotting by converting them to numeric types.
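The conversions mentioned in the last note can be pictured with a small stand-in written in plain Python. This is only a sketch of the general idea, not Arkouda's implementation: the helper names below are hypothetical, the choice of microseconds as the unit is an assumption, and the standard library is used in place of Arkouda's Datetime, Timedelta, and GroupBy so that it runs without a server.

```python
from datetime import datetime, timedelta

# Hypothetical helpers for illustration only; hist_all's real conversion
# code is not shown in this reference and may differ.

def datetimes_to_numeric(stamps, epoch=datetime(1970, 1, 1)):
    """Map naive datetimes to integer microseconds since `epoch` (assumed unit)."""
    return [(s - epoch) // timedelta(microseconds=1) for s in stamps]

def timedeltas_to_numeric(deltas):
    """Map timedeltas to integer microseconds (assumed unit)."""
    return [d // timedelta(microseconds=1) for d in deltas]

def encode_categories(labels):
    """Group equal labels and give each group an integer code (sorted order)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes

encoded, mapping = encode_categories(["b", "a", "b"])
```

Once every column is numeric, a histogram routine can treat all of them the same way; the mapping returned by the hypothetical encode_categories would be needed to label the bins with the original category names.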

Examples

>>> import arkouda as ak
>>> import numpy as np
>>> from arkouda.plotting import hist_all, plot_dist
>>> df = ak.DataFrame({'x': ak.array(np.random.randn(100))})
>>> fig, axes = hist_all(df)
>>> fig.savefig("hist_all.png")
>>> b, h = ak.histogram(ak.arange(10), 3)
>>> plot_dist(b.to_ndarray(), h[:-1].to_ndarray())
(<Figure size 1200x500 with 2 Axes>, array([<Axes: title={'center': 'distribution'}>,
       <Axes: title={'center': 'cumulative distribution'}>], dtype=object))
>>> import matplotlib.pyplot as plt
>>> plt.show()

See also

matplotlib.pyplot, arkouda.DataFrame, arkouda.histogram

Functions

hist_all(ak_df[, cols])

Create a grid of histograms for numeric columns in an Arkouda DataFrame.

plot_dist(b, h, *[, log, xlabel, newfig, show]) → Tuple[matplotlib.figure.Figure, numpy.ndarray]

Plot the distribution and cumulative distribution of histogram data.

Module Contents

arkouda.plotting.hist_all(ak_df: arkouda.pandas.dataframe.DataFrame, cols: list[str] | None = None)[source]

Create a grid of histograms for numeric columns in an Arkouda DataFrame.

Parameters:
  • ak_df (DataFrame) – An Arkouda DataFrame containing the data to visualize.

  • cols (list[str], optional) – A list of column names to plot. If empty or not provided, all columns in the DataFrame are considered.

Returns:

(fig, axes) where fig is the matplotlib Figure and axes is an array of Axes objects.

Return type:

tuple[matplotlib.figure.Figure, numpy.ndarray]

Notes

This function uses matplotlib to display a grid of histograms. It attempts to select a suitable number of bins using Doane’s formula. Columns with non-numeric types are grouped and encoded before plotting.
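For reference, Doane’s formula sets the bin count to k = 1 + log2(n) + log2(1 + |g1| / σ), where g1 is the sample skewness and σ = sqrt(6(n − 2) / ((n + 1)(n + 3))). The sketch below evaluates it on a plain Python sequence. The helper name doane_bins is hypothetical, and rounding up with ceil and returning 1 for degenerate input are assumptions; how hist_all rounds or guards these cases is not stated here.

```python
import math

def doane_bins(values):
    """Hypothetical helper: bin count from Doane's formula for a sequence of numbers."""
    n = len(values)
    if n < 3:
        return 1  # the skewness standard error is zero or undefined for n < 3
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 1  # constant data: a single bin is enough
    g1 = m3 / m2 ** 1.5  # sample skewness
    sigma_g1 = math.sqrt(6.0 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return math.ceil(k)  # assumption: round up to a whole number of bins

symmetric_bins = doane_bins(list(range(8)))
skewed_bins = doane_bins([0] * 7 + [10])
```

The skewness term is what distinguishes Doane’s formula from Sturges’ rule: symmetric data gets 1 + log2(n) bins, while skewed data gets additional bins so that a long tail is not collapsed into one bar.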

Examples

Basic usage with all columns:

>>> import arkouda as ak
>>> import numpy as np
>>> from arkouda.plotting import hist_all
>>> ak_df = ak.DataFrame({
...     "a": ak.array(np.random.randn(100)),
...     "b": ak.array(np.random.randn(100)),
...     "c": ak.array(np.random.randn(100)),
...     "d": ak.array(np.random.randn(100)),
... })
>>> fig, axes = hist_all(ak_df)

Plot a subset of columns and save the figure to disk:

>>> fig, axes = hist_all(ak_df, cols=["a", "b"])
>>> fig.savefig("hist_all.png")

arkouda.plotting.plot_dist(b: arkouda.numpy.pdarrayclass.pdarray | numpy.typing.NDArray[numpy.floating], h: arkouda.numpy.pdarrayclass.pdarray | numpy.typing.NDArray[numpy.floating], *, log: bool = True, xlabel: str | None = None, newfig: bool = True, show: bool = False) → Tuple[matplotlib.figure.Figure, numpy.ndarray][source]

Plot the distribution and cumulative distribution of histogram data.

Parameters:
  • b (arkouda.pdarray or numpy.ndarray) – Histogram bin edges (length N+1) or bin centers (length N).

  • h (arkouda.pdarray or numpy.ndarray) – Histogram counts. Accepts length N or N+1 (Arkouda-like extra last bin).

  • log (bool, default True) – If True, use a log scale for the y-axis of the distribution plot.

  • xlabel (str, optional) – Label for the x-axis.

  • newfig (bool, default True) – If True, create a new figure; otherwise draw into the current figure.

  • show (bool, default False) – If True, call plt.show() before returning.

Returns:

(fig, axes) where axes[0] is the distribution plot and axes[1] is the cumulative distribution plot.

Return type:

tuple[matplotlib.figure.Figure, numpy.ndarray]

Notes

If h is one element longer than expected (as with ak.histogram), the final element is dropped automatically.
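The length handling and the cumulative panel can be illustrated with a short, server-free sketch. It covers only the case where b holds N + 1 bin edges, and both helper names are hypothetical. Treating the cumulative curve as the running sum of counts divided by their total, and using bin midpoints as x positions, are assumptions made for illustration; this reference does not state which x values plot_dist uses internally.

```python
from itertools import accumulate

def align_to_edges(edges, counts):
    """Hypothetical helper: return (centers, counts) for N bins, given N + 1 edges."""
    edges, counts = list(edges), list(counts)
    n_bins = len(edges) - 1
    if len(counts) == n_bins + 1:
        counts = counts[:-1]  # drop the extra trailing element
    if len(counts) != n_bins:
        raise ValueError("counts must have length N or N + 1 for N + 1 edges")
    centers = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    return centers, counts

def cumulative_fraction(counts):
    """Running sum of counts divided by the total (all zeros if the total is 0)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [running / total for running in accumulate(counts)]

centers, counts = align_to_edges([0.0, 3.0, 6.0, 9.0], [3, 3, 4, 0])
cdf = cumulative_fraction(counts)
```

Normalizing by the total keeps the cumulative curve between 0 and 1 regardless of how many samples were binned, which makes curves from differently sized datasets directly comparable.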

Examples

Using Arkouda’s histogram:

>>> import arkouda as ak
>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> from arkouda.plotting import plot_dist
>>> edges, counts = ak.histogram(ak.arange(10), 3)
>>> fig, axes = plot_dist(edges, counts)
>>> fig.savefig("dist.png")

Using NumPy’s histogram:

>>> data = np.random.randn(1000)
>>> counts, edges = np.histogram(data, bins=20)
>>> fig, axes = plot_dist(edges, counts, xlabel="Value")
>>> plt.show()