arkouda.comm_diagnostics
Communication diagnostics and instrumentation utilities for Arkouda.
This module provides tools to collect, reset, and report Chapel communication statistics used in Arkouda operations. It is useful for profiling and debugging distributed communication patterns in both blocking and non-blocking modes. Diagnostics can be queried at a per-locale level and printed as a markdown-formatted summary.
Features
Start, stop, and reset communication diagnostics tracking
Enable verbose reporting of communication events
Retrieve statistics on blocking and non-blocking gets, puts, AMOs, and remote execution
Inspect remote cache usage (hits, misses, prefetch, readahead)
Aggregate results into a DataFrame
Export markdown summary tables
Functions
- start_comm_diagnostics()
Start communication diagnostics tracking.
- stop_comm_diagnostics()
Stop communication diagnostics tracking.
- reset_comm_diagnostics()
Reset all collected diagnostics.
- print_comm_diagnostics_table(print_empty_columns=False)
Print a markdown-formatted summary table of diagnostics.
- start_verbose_comm()
Enable verbose communication reporting.
- stop_verbose_comm()
Disable verbose communication reporting.
Getters for specific metrics
get_comm_diagnostics_put
get_comm_diagnostics_get
get_comm_diagnostics_put_nb
get_comm_diagnostics_get_nb
get_comm_diagnostics_try_nb
get_comm_diagnostics_wait_nb
get_comm_diagnostics_amo
get_comm_diagnostics_execute_on
get_comm_diagnostics_execute_on_fast
get_comm_diagnostics_execute_on_nb
Cache diagnostics
get_comm_diagnostics_cache_get_hits
get_comm_diagnostics_cache_get_misses
get_comm_diagnostics_cache_put_hits
get_comm_diagnostics_cache_put_misses
get_comm_diagnostics_cache_num_prefetches
get_comm_diagnostics_cache_num_page_readaheads
get_comm_diagnostics_cache_prefetch_unused
get_comm_diagnostics_cache_prefetch_waited
get_comm_diagnostics_cache_readahead_unused
get_comm_diagnostics_cache_readahead_waited
- get_comm_diagnostics() -> DataFrame
Collect all diagnostics into a single DataFrame.
Examples
>>> import arkouda as ak
>>> import arkouda.comm_diagnostics as cd
>>> from arkouda.comm_diagnostics import (
... start_comm_diagnostics,
... stop_comm_diagnostics,
... get_comm_diagnostics,
... print_comm_diagnostics_table,
... )
>>> start_comm_diagnostics()
'commDiagnostics started.'
>>> a = ak.randint(0, 100, 1_000_000)
>>> b = ak.sort(a)
>>> stop_comm_diagnostics()
'commDiagnostics stopped.'
>>> df = get_comm_diagnostics()
>>> list(df.columns)
['put', 'get', 'put_nb', 'get_nb', 'try_nb', 'amo',
'execute_on', 'execute_on_fast', 'execute_on_nb',
'cache_get_hits', 'cache_get_misses',
'cache_put_hits', 'cache_put_misses',
'cache_num_prefetches', 'cache_num_page_readaheads',
'cache_prefetch_unused', 'cache_prefetch_waited',
'cache_readahead_unused', 'cache_readahead_waited',
'wait_nb']
>>> df[["put", "get"]]
   put  get
0  162  118
1  170  198
2  170  198
3  170  198 (4 rows x 2 columns)
>>> print_comm_diagnostics_table()
+----+-------+-------+--------------+-----------------+
| | put | get | execute_on | execute_on_nb |
+====+=======+=======+==============+=================+
| 0 | 162 | 118 | 180 | 126 |
+----+-------+-------+--------------+-----------------+
| 1 | 170 | 198 | 184 | 0 |
+----+-------+-------+--------------+-----------------+
| 2 | 170 | 198 | 184 | 0 |
+----+-------+-------+--------------+-----------------+
| 3 | 170 | 198 | 184 | 0 |
+----+-------+-------+--------------+-----------------+
'commDiagnostics printed.'
…
Notes
Printed tables and verbose messages appear in the server-side Chapel logs.
See also
arkouda.pandas.dataframe.DataFrame, arkouda.core.client.generic_msg
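The start/stop pattern in the example above can be wrapped in a context manager so that counting is always stopped, even if the profiled code raises. The following is a minimal sketch and not part of Arkouda: `comm_diagnostics_session` is a hypothetical helper, and the callables are passed in explicitly (an assumption of this sketch) so that it runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def comm_diagnostics_session(start, stop, reset=None):
    """Hypothetical helper: count communication only inside a `with` block.

    `start`, `stop`, and `reset` are zero-argument callables. They are
    parameters (an assumption of this sketch) so the pattern can be
    exercised without a running Arkouda server.
    """
    if reset is not None:
        reset()  # clear counts left over from earlier runs
    start()
    try:
        yield
    finally:
        stop()  # always stop counting, even if the body raises


# Stand-in callables that only record the order in which they are called.
calls = []
with comm_diagnostics_session(
    start=lambda: calls.append("start"),
    stop=lambda: calls.append("stop"),
    reset=lambda: calls.append("reset"),
):
    calls.append("work")
```

With a connected client, one would presumably pass `start_comm_diagnostics`, `stop_comm_diagnostics`, and `reset_comm_diagnostics` from this module in place of the stand-ins.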
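Functions
- get_comm_diagnostics()
Return a DataFrame with the communication diagnostics statistics.
- get_comm_diagnostics_amo()
Return the atomic memory operations (AMO) statistic.
- get_comm_diagnostics_cache_get_hits()
Return the number of gets that were handled by the cache.
- get_comm_diagnostics_cache_get_misses()
Return the number of gets that were not handled by the cache.
- get_comm_diagnostics_cache_num_page_readaheads()
Return the number of readaheads issued to the remote cache at the granularity of cache pages.
- get_comm_diagnostics_cache_num_prefetches()
Return the number of prefetches issued to the remote cache at the granularity of cache pages.
- get_comm_diagnostics_cache_prefetch_unused()
Return the number of cache pages that were prefetched but unused.
- get_comm_diagnostics_cache_prefetch_waited()
Return the number of cache pages that were prefetched but waited on.
- get_comm_diagnostics_cache_put_hits()
Return the number of puts that were stored in cache pages that already existed.
- get_comm_diagnostics_cache_put_misses()
Return the number of puts that required the cache to create a new page to store them.
- get_comm_diagnostics_cache_readahead_unused()
Return the number of cache pages that were read ahead but unused.
- get_comm_diagnostics_cache_readahead_waited()
Return the number of cache pages that were read ahead but waited on.
- get_comm_diagnostics_execute_on()
Return the number of blocking remote executions, in which the initiator waits for completion.
- get_comm_diagnostics_execute_on_fast()
Return the number of blocking remote executions performed by the target locale’s Active Message handler.
- get_comm_diagnostics_execute_on_nb()
Return the number of non-blocking remote executions.
- get_comm_diagnostics_get()
Return the number of blocking gets, in which the initiator waits for completion.
- get_comm_diagnostics_get_nb()
Return the number of non-blocking gets.
- get_comm_diagnostics_put()
Return the number of blocking puts, in which the initiator waits for completion.
- get_comm_diagnostics_put_nb()
Return the number of non-blocking puts.
- get_comm_diagnostics_try_nb()
Return test statistics for non-blocking get/put completions.
- get_comm_diagnostics_wait_nb()
Return the number of blocking waits for non-blocking get/put completions.
- print_comm_diagnostics_table(print_empty_columns=False)
Print the current communication counts in a markdown table.
- reset_comm_diagnostics()
Reset aggregate communication counts across the whole program.
- start_comm_diagnostics()
Start counting communication operations across the whole program.
- start_verbose_comm()
Start on-the-fly reporting of communication initiated on any locale.
- stop_comm_diagnostics()
Stop counting communication operations across the whole program.
- stop_verbose_comm()
Stop on-the-fly reporting of communication initiated on any locale.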
Module Contents
- arkouda.comm_diagnostics.get_comm_diagnostics() -> arkouda.pandas.dataframe.DataFrame
Return a DataFrame with the communication diagnostics statistics.
- Return type:
arkouda.pandas.dataframe.DataFrame
- arkouda.comm_diagnostics.get_comm_diagnostics_amo()
Return the atomic memory operations (AMO) statistic.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_get_hits()
Return the number of gets that were handled by the cache.
Gets counted here did not require the cache to communicate in order to return the result.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_get_misses()
Return the number of gets that were not handled by the cache.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
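Cache hits and misses are easiest to interpret as a per-locale hit rate. The sketch below assumes the two per-locale results have already been converted to plain Python lists; `cache_hit_rate` is a hypothetical helper, not an Arkouda function, and the counts are made up for illustration.

```python
def cache_hit_rate(hits, misses):
    """Per-locale hit rate: hits / (hits + misses), or None if no accesses."""
    rates = []
    for h, m in zip(hits, misses):
        total = h + m
        rates.append(h / total if total else None)
    return rates


# Made-up counts for a 4-locale run (not real measurements).
get_hits = [90, 30, 0, 75]
get_misses = [10, 70, 0, 25]
rates = cache_hit_rate(get_hits, get_misses)
```

A locale with no cached gets at all is reported as `None` rather than 0, so that "no traffic" is not mistaken for "every get missed".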
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_num_page_readaheads()
Return the number of readaheads issued to the remote cache at the granularity of cache pages.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_num_prefetches()
Return the number of prefetches issued to the remote cache at the granularity of cache pages.
This counter is specifically triggered via calls to chpl_comm_remote_prefetch.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_prefetch_unused()
Return the number of cache pages that were prefetched but unused.
Counts cache pages that were prefetched but evicted from the cache before being accessed (i.e., the prefetches were too early).
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_prefetch_waited()
Return the number of cache pages that were prefetched but waited on.
Counts cache pages that were prefetched but did not arrive in the cache before being accessed (i.e., the prefetches were too late).
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
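The "unused" and "waited" counters describe prefetches that were issued too early or too late, respectively. One way to read them is as fractions of all prefetches issued on each locale. The sketch below assumes the three per-locale counts are comparable page counts and have been converted to plain lists; `prefetch_timing` is a hypothetical helper and the numbers are made up.

```python
def prefetch_timing(num_prefetches, unused, waited):
    """Per-locale fractions of prefetched pages that came too early or too late."""
    report = []
    for issued, early, late in zip(num_prefetches, unused, waited):
        if issued == 0:
            report.append(None)  # nothing was prefetched on this locale
        else:
            report.append({"too_early": early / issued, "too_late": late / issued})
    return report


# Made-up counts for a 2-locale run (not real measurements).
timing = prefetch_timing(num_prefetches=[200, 0], unused=[20, 0], waited=[50, 0])
```

The same calculation would apply to the readahead counters, under the same assumptions.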
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_put_hits()
Return the number of puts that were stored in cache pages that already existed.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_put_misses()
Return the number of puts that required the cache to create a new page to store them.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_readahead_unused()
Return the number of cache pages that were read ahead but unused.
Counts cache pages that were read ahead but evicted from the cache before being accessed (i.e., the readaheads were too early).
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_readahead_waited()
Return the number of cache pages that were read ahead but waited on.
Counts cache pages that were read ahead but did not arrive in the cache before being accessed (i.e., the readaheads were too late).
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_execute_on()
Return the number of blocking remote executions, in which the initiator waits for completion.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_execute_on_fast()
Return the number of blocking remote executions performed by the target locale’s Active Message handler.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_execute_on_nb()
Return the number of non-blocking remote executions.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_get()
Return the number of blocking gets, in which the initiator waits for completion.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_get_nb()
Return the number of non-blocking gets.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_put()
Return the number of blocking puts, in which the initiator waits for completion.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_put_nb()
Return the number of non-blocking puts.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_try_nb()
Return test statistics for non-blocking get/put completions.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_wait_nb()
Return the number of blocking waits for non-blocking get/put completions.
- Returns:
A pdarray with one element per locale, holding that locale's value of the statistic.
- Return type:
pdarray
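Because every getter above returns one value per locale, a common next step is to reduce that vector to a total and a measure of imbalance across locales. The following sketch is only an illustration: `summarize_per_locale` is a hypothetical helper, it operates on a plain Python list rather than a pdarray, and the input reuses the `put` column from the example output earlier on this page.

```python
def summarize_per_locale(values):
    """Summarize one statistic given its per-locale values (index = locale id)."""
    if not values:
        raise ValueError("expected at least one locale")
    total = sum(values)
    mean = total / len(values)
    peak = max(values)
    return {
        "total": total,
        "max_locale": values.index(peak),  # first locale with the highest count
        "imbalance": peak / mean if mean else 0.0,  # 1.0 means perfectly even
    }


# Per-locale blocking puts taken from the example table on this page.
put_summary = summarize_per_locale([162, 170, 170, 170])
```

An imbalance close to 1.0 suggests communication is spread evenly; a much larger value points at a single locale doing most of the work.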
- arkouda.comm_diagnostics.print_comm_diagnostics_table(print_empty_columns=False)
Print the current communication counts in a markdown table.
Uses a row per locale and a column per operation. By default, operations for which every locale has a count of zero are omitted from the table; pass print_empty_columns=True to include them.
- Parameters:
print_empty_columns (bool, default=False): Whether to include columns in which every locale has a count of zero.
Note
The table is printed only to the Chapel logs.
- Returns:
Completion message.
- Return type:
str
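The column-filtering rule described above (one row per locale, one column per operation, all-zero columns hidden by default) can be sketched in a few lines. This is an illustration of the rule only, not Arkouda's implementation: the real table is produced on the server and its exact layout may differ. `render_counts_table` and the sample counts are hypothetical.

```python
def render_counts_table(counts, print_empty_columns=False):
    """Render {operation: [count per locale]} as a markdown table string."""
    num_locales = len(next(iter(counts.values()), []))
    if not print_empty_columns:
        # Drop operations whose count is zero on every locale.
        counts = {name: col for name, col in counts.items() if any(col)}
    header = ["locale"] + list(counts)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * len(header)) + " |",
    ]
    for loc in range(num_locales):
        row = [str(loc)] + [str(col[loc]) for col in counts.values()]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


# Sample counts for two locales; "amo" is zero everywhere.
counts = {"put": [162, 170], "get": [118, 198], "amo": [0, 0]}
table = render_counts_table(counts)
full_table = render_counts_table(counts, print_empty_columns=True)
```

Here `table` omits the `amo` column, while `full_table` keeps it.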
- arkouda.comm_diagnostics.reset_comm_diagnostics()
Reset aggregate communication counts across the whole program.
- Returns:
Completion message.
- Return type:
str
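Resetting discards everything counted so far. When earlier counts should be kept, an alternative is to take a snapshot before and after the region of interest and subtract the two. The sketch below assumes that counters only increase between resets and that each snapshot has been converted to a plain dictionary of per-locale lists; `diff_counts` is a hypothetical helper and the numbers are made up.

```python
def diff_counts(before, after):
    """Subtract two snapshots, each mapping a counter name to per-locale counts."""
    return {
        name: [a - b for a, b in zip(after[name], before[name])]
        for name in after
    }


# Made-up snapshots for a 2-locale run (not real measurements).
before = {"put": [100, 120], "get": [40, 60]}
after = {"put": [162, 170], "get": [118, 198]}
delta = diff_counts(before, after)
```

The result attributes only the communication that happened between the two snapshots to the measured region.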
- arkouda.comm_diagnostics.start_comm_diagnostics()
Start counting communication operations across the whole program.
- Returns:
Completion message.
- Return type:
str
- arkouda.comm_diagnostics.start_verbose_comm()
Start on-the-fly reporting of communication initiated on any locale.
Note
Reporting is printed only to the Chapel logs.
- Returns:
Completion message.
- Return type:
str