PyTest Benchmarks

Arkouda uses pytest-benchmark for performance benchmarking. This document provides an overview of running pytest-benchmark and the configurations available to the user.

More information on pytest-benchmark can be found in the pytest-benchmark documentation.

Running The Full Suite

In most cases, you will want to run the full benchmark suite. The simplest way to do this is to navigate to the root directory of Arkouda and run make benchmark.

This will run the entire benchmark suite with the following command:

python3 -m pytest -c benchmark.ini --benchmark-autosave --benchmark-storage=file://benchmark_v2/.benchmarks

The results are written to the benchmark storage path, which by default is benchmark_v2/.benchmarks under the Arkouda root directory. Inside it, results are placed in a platform-specific subdirectory (for example, Linux-CPython-3.9-64bit), where you will find JSON files with the details of all the benchmarks.
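Because every run adds another file, it is handy to locate the newest one programmatically. The sketch below is a hypothetical helper (latest_result is not part of Arkouda or pytest-benchmark). It assumes all results were produced on one platform, so they share a single subdirectory, and that each file name starts with a numeric counter such as 0014_, as in the saved file names shown in this document.

```python
from pathlib import Path


def latest_result(storage_dir="benchmark_v2/.benchmarks"):
    """Return the saved result with the highest counter prefix, or None if there are none.

    Assumes every result sits in one platform subdirectory (for example
    Linux-CPython-3.9-64bit) and starts with a numeric counter such as 0014_.
    """
    candidates = [
        p for p in Path(storage_dir).glob("*/*.json")
        if p.name.split("_", 1)[0].isdigit()
    ]
    if not candidates:
        return None
    # The counter increases with every saved run, so the largest one is the newest.
    return max(candidates, key=lambda p: int(p.name.split("_", 1)[0]))
```

If results from several machines share one storage directory, their counters are not comparable, so filter on the platform subdirectory first.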

The -c flag tells PyTest to use benchmark.ini as the configuration file for this set of tests. The configuration file specifies which files contain benchmarks, as well as a set of environment variables used by the benchmarks.

--benchmark-autosave tells pytest-benchmark to save the benchmark results in a JSON file stored in the path specified by --benchmark-storage.
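The same invocation can also be driven from Python (for example, from a notebook or a CI script) through PyTest's pytest.main(args) entry point, which accepts the same arguments as the command line. This is a minimal sketch, assuming pytest and pytest-benchmark are installed and the working directory is the Arkouda root; the helpers benchmark_args and run_benchmarks are hypothetical and not part of Arkouda.

```python
# Default storage location used by `make benchmark`, relative to the Arkouda root.
DEFAULT_STORAGE = "file://benchmark_v2/.benchmarks"


def benchmark_args(keyword=None, extra=(), storage=DEFAULT_STORAGE):
    """Build the argument list that `make benchmark` passes to PyTest.

    keyword: optional -k expression selecting a subset of benchmarks.
    extra:   additional options, e.g. ["--size=100"].
    """
    args = ["-c", "benchmark.ini", "--benchmark-autosave", f"--benchmark-storage={storage}"]
    if keyword is not None:
        args += ["-k", keyword]
    args += list(extra)
    return args


def run_benchmarks(**kwargs):
    """Run the benchmarks in-process and return PyTest's exit code."""
    import pytest  # imported lazily so building the arguments needs no dependencies

    return pytest.main(benchmark_args(**kwargs))
```

For instance, run_benchmarks(keyword="bench_encode", extra=["--size=100"]) corresponds to the single-test command used later in this document with a reduced problem size.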

Benchmark Arguments

A large number of command-line arguments are available for configuring the benchmarks to fit any use case.

--benchmark-autosave

Used by default when running make benchmark

Save the benchmark JSON results to the provided storage location

--benchmark-storage

Sets location to “file://benchmark_v2/.benchmarks” when using make benchmark

Storage location for benchmark output JSON

--benchmark-save

Name to save the output JSON as. The file will be saved as “counter_NAME.json”.

Example:

0001_0d4865d7c9453adc6af6409568da326845c358b9_20230406_165330.json

-c

Specify the configuration file to be used by PyTest

benchmark.ini is our benchmarking configuration file

-k

Only run tests whose names match the given expression (case-insensitive substring matching). The expression can combine terms with the Python operators and, or, and not, where file names, class names, and function names are the terms being matched.

--size

Default: 10**8

Problem size: length of array to use for benchmarks.

--trials

Default: 5

Number of times to run each test before averaging the results. For tests that run as many trials as possible in a given time, this value is treated as the number of seconds to run for.
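The two interpretations of --trials can be illustrated with a small, self-contained sketch. It only demonstrates the semantics described above; it is not the code Arkouda's benchmarks use, and both function names are hypothetical.

```python
import time


def run_fixed_trials(fn, trials=5):
    """Run fn exactly `trials` times and return the mean wall-clock time in seconds."""
    elapsed = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


def run_for_duration(fn, seconds=5):
    """Run fn as many times as possible within `seconds`; return (count, mean time)."""
    count = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        fn()
        count += 1
    total = time.perf_counter() - start
    return count, (total / count if count else 0.0)
```

In the first mode the value is a repetition count; in the second the same number becomes a time budget, so the number of completed repetitions depends on how fast each call is.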

--seed

Value used to initialize the random number generator.

--dtype

Comma-separated list (NO SPACES) of dtypes to run benchmarks against; multiple values are allowed. Accepted values: int64, uint64, bigint, float64, bool, str, and mixed. mixed generates sets containing multiple types.

Example:

--dtype="int64,bigint,bool,str"
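To make the comma-separated, NO SPACES format concrete, here is a hypothetical helper that splits and validates such a value as the description above implies. It is an illustration only, not Arkouda's actual option parsing.

```python
# Accepted --dtype values, copied from the description above.
ACCEPTED_DTYPES = {"int64", "uint64", "bigint", "float64", "bool", "str", "mixed"}


def parse_list_option(value, accepted):
    """Split a comma-separated option value (no spaces) and validate each entry."""
    if " " in value:
        raise ValueError(f"remove spaces from {value!r}")
    items = value.split(",")
    unknown = [item for item in items if item not in accepted]
    if unknown:
        raise ValueError(f"unsupported value(s): {unknown}")
    return items
```

The same list format is used by --encoding and --io_compression, so the helper works for those options with their own sets of accepted values.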

--numpy

True if --numpy is provided, False if omitted

When set, runs NumPy comparison benchmarks

--maxbits

Default: -1

Maximum number of bits, so values of 2**max_bits or greater wrap around. -1 is interpreted as no maximum

Only applies to BigInt benchmarks; other benchmarks are unaffected
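The wraparound behaves like modular arithmetic: with a bit limit set, a value is reduced modulo 2**max_bits, so 2**max_bits itself wraps to 0. The sketch below uses Python's arbitrary-precision integers to show this, assuming non-negative values; the function wrap is hypothetical, and how Arkouda treats negative bigint values under a bit limit is not covered here.

```python
def wrap(value, max_bits=-1):
    """Reduce a non-negative integer modulo 2**max_bits; -1 means no maximum."""
    if max_bits == -1:
        return value
    # 1 << max_bits == 2**max_bits; the remainder keeps only the low max_bits bits.
    return value % (1 << max_bits)
```

For example, with max_bits=64 the largest value that survives unchanged is 2**64 - 1, while 2**64 + 5 becomes 5.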

--alpha

Default: 1.0

Scalar Multiple

--randomize

True if --randomize is provided, False if omitted

Fill arrays with random values instead of ones

--index_size

Length of index array (number of gathers to perform)

Only used by Gather and Scatter benchmarks; other benchmarks are unaffected

--value_size

Length of array from which values are gathered

Only used by Gather and Scatter benchmarks; other benchmarks are unaffected
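In a gather, --index_size is the number of reads and --value_size is the length of the array they read from; a scatter performs index_size writes into an array of length value_size. The plain-Python sketch below illustrates these semantics with small, hypothetical sizes. Arkouda's benchmarks perform the operations on distributed arrays, which this sketch does not model.

```python
import random


def gather(values, index):
    """Gather: read values[i] for every i in index (len(index) reads)."""
    return [values[i] for i in index]


def scatter(target, index, updates):
    """Scatter: write updates[k] to target[index[k]] (len(index) writes)."""
    for i, u in zip(index, updates):
        target[i] = u
    return target


# Hypothetical sizes standing in for --value_size and --index_size.
value_size, index_size = 10, 4
rng = random.Random(0)
values = list(range(value_size))
index = [rng.randrange(value_size) for _ in range(index_size)]
gathered = gather(values, index)
```

Because the indices are random, each gather or scatter touches an unpredictable location, which is what makes these operations a useful stress test at large sizes.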

--encoding

Comma-separated list (NO SPACES) of encodings to benchmark; multiple values are allowed. Accepted values: idna, ascii

Example:

--encoding="idna,ascii"   

Only used by Encoding benchmarks; other benchmarks are unaffected
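For context on the two accepted values, Python's standard library provides codecs with the same names. This sketch uses those stdlib codecs only to illustrate what the encodings do; it does not call Arkouda's Strings.encode, and the sample domain name is arbitrary.

```python
# Python's built-in codecs with the same names as the two accepted encodings.
domain = "münchen.de"

# IDNA converts non-ASCII labels to an ASCII-compatible "xn--" (Punycode) form.
idna_bytes = domain.encode("idna")

# Plain ASCII cannot represent "ü", so strict encoding raises an error.
try:
    domain.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Here idna_bytes is b"xn--mnchen-3ya.de", and decoding it with the idna codec restores the original string.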

--io_only_write

True if --io_only_write is provided, False if omitted

Only write the files; files will not be removed

Only applies to IO benchmarks

--io_only_read

True if --io_only_read is provided, False if omitted

Only read the files; files will not be removed

Only applies to IO benchmarks

--io_only_delete

True if --io_only_delete is provided, False if omitted

Only delete the files written by this benchmark

Only applies to IO benchmarks
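Because --io_only_write leaves its files in place, the three flags can be used in separate runs, for example to time reads against files written earlier. The sketch below builds the argument lists for a write, read, delete sequence and runs them with pytest.main. The -k expression io_benchmark is a placeholder assumption, since this document does not name the IO benchmark file, and the helpers io_phase_args and run_io_phases are hypothetical.

```python
# Placeholder -k expression for the IO benchmarks; adjust to the real file name.
IO_KEYWORD = "io_benchmark"


def io_phase_args(phase, path=None, keyword=IO_KEYWORD):
    """Return PyTest arguments for one IO phase: "write", "read", or "delete"."""
    if phase not in ("write", "read", "delete"):
        raise ValueError(f"unknown phase: {phase}")
    args = ["-c", "benchmark.ini", "-k", keyword, f"--io_only_{phase}"]
    if path is not None:
        args.append(f"--io_path={path}")
    return args


def run_io_phases(path=None):
    """Write, then read, then delete, as three separate PyTest runs."""
    import pytest  # lazy import: only needed when actually running

    for phase in ("write", "read", "delete"):
        exit_code = pytest.main(io_phase_args(phase, path))
        if exit_code != 0:
            return exit_code  # stop early so later phases do not run on missing files
    return 0
```

Passing the same --io_path to every phase keeps the read and delete runs pointed at the files the write run created.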

--io_files_per_loc

Default: 1

Number of files to create per locale

Only applies to IO benchmarks

--io_compression

Default: All types

Compression types to run Parquet IO benchmarks against. Comma-separated list (NO SPACES) allowing for multiple values. Accepted values: none, snappy, gzip, brotli, zstd, and lz4

Example:

--io_compression="none,snappy,brotli,lz4"

Only applies to IO benchmarks

--io_path

Default: //benchmark_v2/ak_io_benchmark

Target path for measuring read/write rates

Only applies to IO benchmarks

Running Single Files or Tests

To run a single test or a subset of tests, use the -k <expression> flag.

python3 -m pytest -c benchmark.ini --benchmark-autosave --benchmark-storage=file://benchmark_v2/.benchmarks -k encoding_benchmark.py

Running this command produces output similar to the following:

benchmark_v2/encoding_benchmark.py ....                                                                                                                                       [100%]
Saved benchmark data in: <Arkouda_root>/benchmark_v2/.benchmarks/Linux-CPython-3.9-64bit/0014_31de39be8b19c76d073a8999def6673a305c250d_20230405_145759_uncommited-changes.json

-------------------------------------------------------------------- benchmark 'Strings_EncodeDecode': 4 tests ---------------------------------------------------------------------
Name (time in ms)          Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_encode[idna]      3.3304 (1.0)      9.2561 (2.10)     4.7544 (1.27)     2.5306 (6.18)     3.8075 (1.11)     1.9012 (3.62)          1;1  210.3306 (0.79)          5           1
bench_encode[ascii]     3.3805 (1.02)     4.8800 (1.10)     3.7336 (1.0)      0.6465 (1.58)     3.4231 (1.0)      0.5246 (1.0)           1;1  267.8380 (1.0)           5           1
bench_decode[idna]      3.4444 (1.03)     4.4177 (1.0)      3.7852 (1.01)     0.4097 (1.0)      3.5622 (1.04)     0.5837 (1.11)          1;0  264.1882 (0.99)          5           1
bench_decode[ascii]     3.4621 (1.04)     4.9177 (1.11)     4.2250 (1.13)     0.6125 (1.50)     4.0197 (1.17)     0.9991 (1.90)          2;0  236.6864 (0.88)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Similarly, to run only a single test within a file, pass the test name to the -k flag instead of a file name. The following example runs only the bench_encode benchmark, once for each of its encodings (idna and ascii).

python3 -m pytest -c benchmark.ini --benchmark-autosave --benchmark-storage=file://benchmark_v2/.benchmarks -k bench_encode

Results:

benchmark_v2/encoding_benchmark.py ..                                                                                                                                         [100%]
Saved benchmark data in: <Arkouda_root>/benchmark_v2/.benchmarks/Linux-CPython-3.9-64bit/0015_31de39be8b19c76d073a8999def6673a305c250d_20230405_145947_uncommited-changes.json

-------------------------------------------------------------------- benchmark 'Strings_EncodeDecode': 2 tests ---------------------------------------------------------------------
Name (time in ms)          Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bench_encode[ascii]     3.4298 (1.0)      3.6450 (1.0)      3.5541 (1.0)      0.0889 (1.0)      3.5801 (1.00)     0.1436 (1.0)           2;0  281.3620 (1.0)           5           1
bench_encode[idna]      3.4875 (1.02)     4.5255 (1.24)     3.7912 (1.07)     0.4328 (4.87)     3.5652 (1.0)      0.4869 (3.39)          1;0  263.7659 (0.94)          5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

More information on running single files, sets of files, or individual benchmarks can be found in the PyTest documentation under “Specifying which tests to run”.

Reading the JSON Output

The output JSON contains a lot of information, not all of which is useful for our purposes. The main area of interest is the stats section of each entry in benchmarks, along with its associated information. To a lesser extent, the machine information is useful for seeing how different CPU architectures perform.

The benchmarks section contains one entry for every benchmark that is run. A run of the full suite produces 350 entries. Each entry contains the name of the benchmark and a group name that makes it easy to associate related benchmarks.
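Since every entry carries a group name, related benchmarks can be collected with a few lines of standard-library Python. This is a minimal sketch assuming the layout shown in the example JSON (a top-level benchmarks list whose entries have group and name keys); benchmarks_by_group is a hypothetical helper.

```python
import json
from collections import defaultdict


def benchmarks_by_group(path):
    """Map each benchmark group name to the names of the benchmarks in it."""
    with open(path) as f:
        data = json.load(f)
    groups = defaultdict(list)
    for entry in data["benchmarks"]:
        groups[entry["group"]].append(entry["name"])
    return dict(groups)
```

For the bench_encode example, this returns a single Strings_EncodeDecode group containing bench_encode[idna] and bench_encode[ascii].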

The JSON below is the output of the bench_encode example above, rerun with a smaller problem size:

python3 -m pytest -c benchmark.ini --benchmark-autosave --benchmark-storage=file://benchmark_v2/.benchmarks -k bench_encode --size=100

Full example output JSON:
    {
        "machine_info": {
            "node": "MSI",
            "processor": "x86_64",
            "machine": "x86_64",
            "python_compiler": "GCC 9.3.0",
            "python_implementation": "CPython",
            "python_implementation_version": "3.9.0",
            "python_version": "3.9.0",
            "python_build": [
                "default",
                "Nov 26 2020 07:57:39"
            ],
            "release": "5.10.16.3-microsoft-standard-WSL2",
            "system": "Linux",
            "cpu": {
                "python_version": "3.9.0.final.0 (64 bit)",
                "cpuinfo_version": [
                    9,
                    0,
                    0
                ],
                "cpuinfo_version_string": "9.0.0",
                "arch": "X86_64",
                "bits": 64,
                "count": 12,
                "arch_string_raw": "x86_64",
                "vendor_id_raw": "GenuineIntel",
                "brand_raw": "Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz",
                "hz_advertised_friendly": "2.2000 GHz",
                "hz_actual_friendly": "2.2080 GHz",
                "hz_advertised": [
                    2200000000,
                    0
                ],
                "hz_actual": [
                    2207999000,
                    0
                ],
                "stepping": 10,
                "model": 158,
                "family": 6,
                "flags": [
                    "3dnowprefetch",
                    "abm",
                    "adx",
                    "aes",
                    "apic",
                    "arch_capabilities",
                    "avx",
                    "avx2",
                    "bmi1",
                    "bmi2",
                    "clflush",
                    "clflushopt",
                    "cmov",
                    "constant_tsc",
                    "cpuid",
                    "cx16",
                    "cx8",
                    "de",
                    "erms",
                    "f16c",
                    "flush_l1d",
                    "fma",
                    "fpu",
                    "fsgsbase",
                    "fxsr",
                    "ht",
                    "hypervisor",
                    "ibpb",
                    "ibrs",
                    "invpcid",
                    "invpcid_single",
                    "lahf_lm",
                    "lm",
                    "mca",
                    "mce",
                    "mmx",
                    "movbe",
                    "msr",
                    "mtrr",
                    "nopl",
                    "nx",
                    "osxsave",
                    "pae",
                    "pat",
                    "pcid",
                    "pclmulqdq",
                    "pdpe1gb",
                    "pge",
                    "pni",
                    "popcnt",
                    "pse",
                    "pse36",
                    "pti",
                    "rdrand",
                    "rdrnd",
                    "rdseed",
                    "rdtscp",
                    "rep_good",
                    "sep",
                    "smap",
                    "smep",
                    "ss",
                    "ssbd",
                    "sse",
                    "sse2",
                    "sse4_1",
                    "sse4_2",
                    "ssse3",
                    "stibp",
                    "syscall",
                    "tsc",
                    "vme",
                    "xgetbv1",
                    "xsave",
                    "xsavec",
                    "xsaveopt",
                    "xsaves",
                    "xtopology"
                ],
                "l3_cache_size": 9437184,
                "l2_cache_size": "1.5 MiB",
                "l1_data_cache_size": 196608,
                "l1_instruction_cache_size": 196608,
                "l2_cache_line_size": 256,
                "l2_cache_associativity": 6
            }
        },
        "commit_info": {
            "id": "31de39be8b19c76d073a8999def6673a305c250d",
            "time": "2023-04-04T16:26:14+00:00",
            "author_time": "2023-04-04T12:26:14-04:00",
            "dirty": true,
            "project": "arkouda",
            "branch": "2324_pytest_benchmark_docs"
        },
        "benchmarks": [
            {
                "group": "Strings_EncodeDecode",
                "name": "bench_encode[idna]",
                "fullname": "benchmark_v2/encoding_benchmark.py::bench_encode[idna]",
                "params": {
                    "encoding": "idna"
                },
                "param": "idna",
                "extra_info": {
                    "description": "Measures the performance of Strings.encode",
                    "problem_size": 100,
                    "transfer_rate": "0.0002 GiB/sec"
                },
                "options": {
                    "disable_gc": false,
                    "timer": "perf_counter",
                    "min_rounds": 5,
                    "max_time": 1.0,
                    "min_time": 5e-06,
                    "warmup": false
                },
                "stats": {
                    "min": 0.004066600000442122,
                    "max": 0.007168699999965611,
                    "mean": 0.0048064200000226265,
                    "stddev": 0.001326192548940973,
                    "rounds": 5,
                    "median": 0.004246700000294368,
                    "iqr": 0.0009575499998391024,
                    "q1": 0.004131924999910552,
                    "q3": 0.005089474999749655,
                    "iqr_outliers": 1,
                    "stddev_outliers": 1,
                    "outliers": "1;1",
                    "ld15iqr": 0.004066600000442122,
                    "hd15iqr": 0.007168699999965611,
                    "ops": 208.0550596900172,
                    "total": 0.024032100000113132,
                    "iterations": 1
                }
            },
            {
                "group": "Strings_EncodeDecode",
                "name": "bench_encode[ascii]",
                "fullname": "benchmark_v2/encoding_benchmark.py::bench_encode[ascii]",
                "params": {
                    "encoding": "ascii"
                },
                "param": "ascii",
                "extra_info": {
                    "description": "Measures the performance of Strings.encode",
                    "problem_size": 100,
                    "transfer_rate": "0.0002 GiB/sec"
                },
                "options": {
                    "disable_gc": false,
                    "timer": "perf_counter",
                    "min_rounds": 5,
                    "max_time": 1.0,
                    "min_time": 5e-06,
                    "warmup": false
                },
                "stats": {
                    "min": 0.00383609999971668,
                    "max": 0.0043372999998609885,
                    "mean": 0.004057779999857303,
                    "stddev": 0.00018361238254747651,
                    "rounds": 5,
                    "median": 0.0040258999997604406,
                    "iqr": 0.0002090000002681336,
                    "q1": 0.0039507749997937935,
                    "q3": 0.004159775000061927,
                    "iqr_outliers": 0,
                    "stddev_outliers": 2,
                    "outliers": "2;0",
                    "ld15iqr": 0.00383609999971668,
                    "hd15iqr": 0.0043372999998609885,
                    "ops": 246.44017172817806,
                    "total": 0.020288899999286514,
                    "iterations": 1
                }
            }
        ],
        "datetime": "2023-04-05T15:32:09.097392",
        "version": "4.0.0"
    }

A simplified version of the JSON, containing only the sections we care about:

{
    "machine_info": {
        "python_version": "3.9.0",
        "release": "5.10.16.3-microsoft-standard-WSL2",
        "system": "Linux",
        "cpu": {
            "arch": "X86_64",
            "count": 12,
            "brand_raw": "Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz"
        }
    },
    "benchmarks": [
        {
            "group": "Strings_EncodeDecode",
            "name": "bench_encode[ascii]",
            "params": {
                "encoding": "ascii"
            },
            "extra_info": {
                "description": "Measures the performance of Strings.encode",
                "problem_size": 100,
                "transfer_rate": "0.0002 GiB/sec"
            },
            "stats": {
                "min": 0.00383609999971668,
                "max": 0.0043372999998609885,
                "mean": 0.004057779999857303,
                "stddev": 0.00018361238254747651,
                "rounds": 5,
                "median": 0.0040258999997604406,
                "iqr": 0.0002090000002681336,
                "q1": 0.0039507749997937935,
                "q3": 0.004159775000061927,
                "iqr_outliers": 0,
                "stddev_outliers": 2,
                "outliers": "2;0",
                "ld15iqr": 0.00383609999971668,
                "hd15iqr": 0.0043372999998609885,
                "ops": 246.44017172817806,
                "total": 0.020288899999286514
            }
        }
    ],
    "datetime": "2023-04-05T15:32:09.097392"
}

The components to pay attention to are benchmarks.extra_info, which contains details on the problem size and data transfer rate, and benchmarks.stats, which contains the timing statistics calculated over the trials that were run (the number of trials is recorded in benchmarks.stats.rounds). Times in benchmarks.stats are recorded in seconds.
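As a starting point for post-processing, the sketch below pulls these fields out of a saved result file. It assumes the layout shown in the examples above; the function summarize and its output keys are hypothetical. Two relationships visible in the example data are worth knowing: pytest-benchmark reports ops as 1 / mean, and total is the sum of all rounds (approximately mean * rounds).

```python
import json


def summarize(path):
    """Yield one summary row per benchmark from a pytest-benchmark JSON file."""
    with open(path) as f:
        data = json.load(f)
    for entry in data["benchmarks"]:
        stats = entry["stats"]
        extra = entry.get("extra_info", {})
        yield {
            "name": entry["name"],
            "problem_size": extra.get("problem_size"),
            "transfer_rate": extra.get("transfer_rate"),
            "rounds": stats["rounds"],
            "mean_ms": stats["mean"] * 1000,  # stats are stored in seconds
            "ops": stats["ops"],  # operations per second, i.e. 1 / mean
        }
```

For the bench_encode[ascii] entry above, mean_ms is about 4.06 ms and ops is about 246.44, matching 1 / 0.00405778.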