arkouda.comm_diagnostics
Communication diagnostics and instrumentation utilities for Arkouda.
This module provides tools to collect, reset, and report Chapel communication statistics used in Arkouda operations. It is useful for profiling and debugging distributed communication patterns in both blocking and non-blocking modes. The diagnostics can be queried at a per-locale level and printed as a markdown-formatted summary.
Features
Start/stop/reset communication diagnostics tracking
Enable verbose reporting of communication events
Retrieve statistics on blocking/non-blocking gets, puts, AMOs, and remote execution
Inspect remote cache usage (hits, misses, prefetch, readahead)
Aggregate results into a DataFrame
Export markdown summary tables
Functions
start_comm_diagnostics()
stop_comm_diagnostics()
reset_comm_diagnostics()
print_comm_diagnostics_table(print_empty_columns=False)
start_verbose_comm()
stop_verbose_comm()
Getters for specific metrics:
- get_comm_diagnostics_{put, get, put_nb, get_nb, try_nb, wait_nb, amo}
- get_comm_diagnostics_{execute_on, execute_on_fast, execute_on_nb}
- get_comm_diagnostics_cache_{get_hits, get_misses, put_hits, put_misses, num_prefetches, num_page_readaheads, prefetch_unused, prefetch_waited, readahead_unused, readahead_waited}
- get_comm_diagnostics() → DataFrame: collect all diagnostics into a single DataFrame.
Examples
>>> import arkouda as ak
>>> import arkouda.comm_diagnostics as cd
>>> ak.connect()
>>> from arkouda.comm_diagnostics import start_comm_diagnostics, stop_comm_diagnostics, get_comm_diagnostics, print_comm_diagnostics_table
>>> start_comm_diagnostics()
'commDiagnostics started.'
>>> a = ak.randint(0, 100, 1_000_000)
>>> b = ak.sort(a)
>>> stop_comm_diagnostics()
'commDiagnostics stopped.'
>>> df = get_comm_diagnostics()
>>> df.columns
Index(['put', 'get', 'put_nb', 'get_nb', 'try_nb', 'amo', 'execute_on', 'execute_on_fast', 'execute_on_nb', 'cache_get_hits', 'cache_get_misses', 'cache_put_hits', 'cache_put_misses', 'cache_num_prefetches', 'cache_num_page_readaheads', 'cache_prefetch_unused', 'cache_prefetch_waited', 'cache_readahead_unused', 'cache_readahead_waited', 'wait_nb'], dtype='<U0')
>>> df[["put","get"]]
   put  get
0  162  118
1  170  198
2  170  198
3  170  198 (4 rows x 2 columns)
>>> print_comm_diagnostics_table()
+----+-------+-------+--------------+-----------------+
|    |   put |   get |   execute_on |   execute_on_nb |
+====+=======+=======+==============+=================+
|  0 |   162 |   118 |          180 |             126 |
+----+-------+-------+--------------+-----------------+
|  1 |   170 |   198 |          184 |               0 |
+----+-------+-------+--------------+-----------------+
|  2 |   170 |   198 |          184 |               0 |
+----+-------+-------+--------------+-----------------+
|  3 |   170 |   198 |          184 |               0 |
+----+-------+-------+--------------+-----------------+
'commDiagnostics printed.'
…
Notes
Printed tables and verbose messages appear in the server-side Chapel logs.
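One way to scope counting to a single region of client code is a small context manager around the start/stop/reset calls. The sketch below is illustrative only: `comm_diagnostics_region` is a hypothetical helper that is not part of Arkouda, and it takes the three functions as arguments so it can be run here with stand-in callables. In real use, with a connected session, one would presumably pass `start_comm_diagnostics`, `stop_comm_diagnostics`, and `reset_comm_diagnostics`.

```python
from contextlib import contextmanager


@contextmanager
def comm_diagnostics_region(start, stop, reset=None):
    """Count communication only while the ``with`` body runs.

    ``start``, ``stop`` and ``reset`` are zero-argument callables. This helper
    is hypothetical; it only orders the calls and never talks to a server.
    """
    if reset is not None:
        reset()  # clear counts left over from earlier regions
    start()
    try:
        yield
    finally:
        stop()  # always stop counting, even if the body raises


# Stand-in callables so the sketch runs without an Arkouda server.
calls = []
with comm_diagnostics_region(
    start=lambda: calls.append("start"),
    stop=lambda: calls.append("stop"),
    reset=lambda: calls.append("reset"),
):
    calls.append("work")

print(calls)  # ['reset', 'start', 'work', 'stop']
```

Resetting before starting keeps counts from an earlier region out of the new measurement, and the `finally` clause ensures counting stops even when the measured code fails.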
See also
arkouda.DataFrame, arkouda.client.generic_msg
Functions
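| get_comm_diagnostics() | Return a DataFrame with the communication diagnostics statistics. |
| get_comm_diagnostics_amo() | Return the atomic memory operations statistic. |
| get_comm_diagnostics_cache_get_hits() | Return number of gets that were handled by the cache. |
| get_comm_diagnostics_cache_get_misses() | Return number of gets that were not handled by the cache. |
| get_comm_diagnostics_cache_num_page_readaheads() | Return number of readaheads issued to the remote cache at the granularity of cache pages. |
| get_comm_diagnostics_cache_num_prefetches() | Return number of prefetches issued to the remote cache at the granularity of cache pages. |
| get_comm_diagnostics_cache_prefetch_unused() | Return number of cache pages that were prefetched but unused. |
| get_comm_diagnostics_cache_prefetch_waited() | Return number of cache pages that were prefetched but had to be waited on. |
| get_comm_diagnostics_cache_put_hits() | Return number of puts that were stored in cache pages that already existed. |
| get_comm_diagnostics_cache_put_misses() | Return number of puts that required the cache to create a new page to store them. |
| get_comm_diagnostics_cache_readahead_unused() | Return number of cache pages that were read ahead but unused. |
| get_comm_diagnostics_cache_readahead_waited() | Return number of cache pages that were read ahead but had to be waited on. |
| get_comm_diagnostics_execute_on() | Return blocking remote executions, in which initiator waits for completion. |
| get_comm_diagnostics_execute_on_fast() | Return blocking remote executions performed by the target locale’s Active Message handler. |
| get_comm_diagnostics_execute_on_nb() | Return non-blocking remote executions. |
| get_comm_diagnostics_get() | Return blocking gets, in which initiator waits for completion. |
| get_comm_diagnostics_get_nb() | Return non-blocking gets. |
| get_comm_diagnostics_put() | Return blocking puts, in which initiator waits for completion. |
| get_comm_diagnostics_put_nb() | Return non-blocking puts. |
| get_comm_diagnostics_try_nb() | Return test statistics for non-blocking get/put completions. |
| get_comm_diagnostics_wait_nb() | Return blocking waits for non-blocking get/put completions. |
| print_comm_diagnostics_table(print_empty_columns=False) | Print the current communication counts in a markdown table. |
| reset_comm_diagnostics() | Reset aggregate communication counts across the whole program. |
| start_comm_diagnostics() | Start counting communication operations across the whole program. |
| start_verbose_comm() | Start on-the-fly reporting of communication initiated on any locale. |
| stop_comm_diagnostics() | Stop counting communication operations across the whole program. |
| stop_verbose_comm() | Stop on-the-fly reporting of communication initiated on any locale. |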
Module Contents
- arkouda.comm_diagnostics.get_comm_diagnostics() → arkouda.pandas.dataframe.DataFrame
Return a DataFrame with the communication diagnostics statistics.
- Return type:
arkouda.pandas.dataframe.DataFrame
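Since the result holds one row per locale and one column per counter, a common next step is to total each counter and check how evenly it is spread across locales. The following is a minimal sketch that works on plain Python lists rather than an Arkouda DataFrame, so it runs without a server. `summarize_counts` and the `imbalance` measure (maximum divided by mean) are illustrative choices, not part of the module, and the sketch assumes at least one locale.

```python
def summarize_counts(counts):
    """Summarize per-locale counts.

    ``counts`` maps a counter name to a list with one value per locale.
    Returns {name: {"total", "max", "imbalance"}}, where imbalance is
    max / mean (1.0 means perfectly even; 0.0 when nothing was counted).
    """
    summary = {}
    for name, per_locale in counts.items():
        total = sum(per_locale)
        mean = total / len(per_locale)
        peak = max(per_locale)
        summary[name] = {
            "total": total,
            "max": peak,
            "imbalance": peak / mean if mean else 0.0,
        }
    return summary


# Values copied from the put/get columns of the module example (4 locales).
counts = {"put": [162, 170, 170, 170], "get": [118, 198, 198, 198]}
summary = summarize_counts(counts)
print(summary["put"]["total"], summary["get"]["total"])  # 672 712
```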
- arkouda.comm_diagnostics.get_comm_diagnostics_amo()
Return the atomic memory operations statistic.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_get_hits()
Return number of gets that were handled by the cache.
Gets counted here did not require the cache to communicate in order to return the result.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_get_misses()
Return number of gets that were not handled by the cache.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
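The get-hit and get-miss counters are often easier to read as a ratio. The sketch below computes a per-locale hit rate from two equal-length sequences of counts, assuming that hits plus misses accounts for every get that went through the remote cache. `cache_get_hit_rates` is a hypothetical helper and the input numbers are made up; with real data the two sequences would be the values of the corresponding pdarrays after copying them to the client.

```python
def cache_get_hit_rates(hits, misses):
    """Return hits / (hits + misses) for each locale.

    A locale with no recorded cached gets yields None instead of a rate,
    which avoids dividing by zero.
    """
    if len(hits) != len(misses):
        raise ValueError("hits and misses must have one entry per locale")
    rates = []
    for hit_count, miss_count in zip(hits, misses):
        total = hit_count + miss_count
        rates.append(hit_count / total if total else None)
    return rates


# Made-up counts for three locales.
hits = [90, 0, 30]
misses = [10, 0, 30]
rates = cache_get_hit_rates(hits, misses)
print(rates)  # [0.9, None, 0.5]
```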
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_num_page_readaheads()
Return number of readaheads issued to the remote cache at the granularity of cache pages.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_num_prefetches()
Return number of prefetches issued to the remote cache at the granularity of cache pages.
This counter is specifically triggered via calls to chpl_comm_remote_prefetch.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_prefetch_unused()
Return number of cache pages that were prefetched but unused.
Counts cache pages that were prefetched but evicted from the cache before being accessed (i.e., the prefetches were too early).
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_prefetch_waited()
Return number of cache pages that were prefetched but had to be waited on.
Counts cache pages that were prefetched but did not arrive in the cache before being accessed (i.e., the prefetches were too late).
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_put_hits()
Return number of puts that were stored in cache pages that already existed.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_put_misses()
Return number of puts that required the cache to create a new page to store them.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_readahead_unused()
Return number of cache pages that were read ahead but unused.
Counts cache pages that were read ahead but evicted from the cache before being accessed (i.e., the readaheads were too early).
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_cache_readahead_waited()
Return number of cache pages that were read ahead but had to be waited on.
Counts cache pages that were read ahead but did not arrive in the cache before being accessed (i.e., the readaheads were too late).
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_execute_on()
Return blocking remote executions, in which initiator waits for completion.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_execute_on_fast()
Return blocking remote executions performed by the target locale’s Active Message handler.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_execute_on_nb()
Return non-blocking remote executions.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_get()
Return blocking gets, in which initiator waits for completion.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_get_nb()
Return non-blocking gets.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_put()
Return blocking puts, in which initiator waits for completion.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_put_nb()
Return non-blocking puts.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_try_nb()
Return test statistics for non-blocking get/put completions.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.get_comm_diagnostics_wait_nb()
Return blocking waits for non-blocking get/put completions.
- Returns:
A pdarray, where the size is the number of locales, populated with the statistic value from each locale.
- Return type:
pdarray
- arkouda.comm_diagnostics.print_comm_diagnostics_table(print_empty_columns=False)
Print the current communication counts in a markdown table.
Uses a row per locale and a column per operation. By default, operations for which all locales have a count of zero are not displayed in the table; set print_empty_columns=True to include them.
- Parameters:
print_empty_columns (bool=False) – If True, also display operations for which every locale has a count of zero.
Note
The table is printed only to the Chapel logs.
- Returns:
Completion message.
- Return type:
str
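The column-hiding rule described above can be illustrated on plain Python data. This is only a sketch of the documented behavior, not the server's implementation. `drop_empty_columns` is a hypothetical helper, the `put`, `get`, and `execute_on_nb` values are taken from the module example, and the all-zero `amo` column is added purely for illustration.

```python
def drop_empty_columns(counts, print_empty_columns=False):
    """Filter a {operation: per-locale counts} mapping.

    By default, operations whose count is zero on every locale are dropped;
    with print_empty_columns=True every operation is kept.
    """
    if print_empty_columns:
        return dict(counts)
    return {
        op: per_locale
        for op, per_locale in counts.items()
        if any(value != 0 for value in per_locale)
    }


counts = {
    "put": [162, 170, 170, 170],
    "get": [118, 198, 198, 198],
    "amo": [0, 0, 0, 0],  # illustrative all-zero column
    "execute_on_nb": [126, 0, 0, 0],  # nonzero on one locale, so it stays
}
visible = drop_empty_columns(counts)
print(list(visible))  # ['put', 'get', 'execute_on_nb']
```

Note that a column is kept as soon as any single locale reports a nonzero count, which is why `execute_on_nb` appears in the example table even though three of its four rows are zero.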
- arkouda.comm_diagnostics.reset_comm_diagnostics()
Reset aggregate communication counts across the whole program.
- Returns:
Completion message.
- Return type:
str
- arkouda.comm_diagnostics.start_comm_diagnostics()
Start counting communication operations across the whole program.
- Returns:
Completion message.
- Return type:
str
- arkouda.comm_diagnostics.start_verbose_comm()
Start on-the-fly reporting of communication initiated on any locale.
Note
Reporting is printed only to the Chapel logs.
- Returns:
Completion message.
- Return type:
str