Here's a (strongly NUMA-centric) performance comparison of the three NUMA kernels: the 'balancenuma-v10' tree from Mel, the AutoNUMA-v28 kernel from Andrea and the unified NUMA -v3 tree Peter and me are working on. The goal of these measurements is to specifically quantify the NUMA optimization qualities of each of the three NUMA-optimizing kernels. There are lots of numbers in this mail and lot of material to read - sorry about that! :-/ I used the latest available kernel versions everywhere: furthermore the AutoNUMA-v28 tree has been patched with Hugh Dickin's THP-migration support patch, to make it a fair apples-to-apples comparison. I have used the 'perf bench numa' tool to do the measurements, which tool can be found at: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/bench # to build it install numactl-dev[el] and do "cd tools/perf; make -j install' To get the raw numbers I ran "perf bench numa mem -a" multiple times on each kernel, on a 32-way, 64 GB RAM, 4-node Opteron test-system. Each kernel used the same base .config, copied from a Fedora RPM kernel, with the NUMA-balancing options enabled. ( Note that the testcases are tailored to my test-system: on a smaller system you'd want to run slightly smaller testcases, on a larger system you'd want to run a couple of larger testcases as well. ) NUMA convergence latency measurements ------------------------------------- 'NUMA convergence' latency is the number of seconds a workload takes to reach 'perfectly NUMA balanced' state. This is measured on the CPU placement side: once it has converged then memory typically follows within a couple of seconds. Because convergence is not guaranteed, a 100 seconds latency time-out is used in the benchmark. If you see a 100 seconds result in the table it means that that particular NUMA kernel did not manage to converge that workload unit test within 100 seconds. The NxM denotion means process/thread relationship: a 1x4 test is 1 process with 4 thread that share a workload - a 4x6 test are 4 processes with 6 threads in each process, the processes isolated from each other but the threads working on the same working set. I used a wide set of test-cases I collected in the past: [ Lower numbers are better. ] [test unit] : v3.7 |balancenuma-v10| AutoNUMA-v28 | numa-u-v3 | ------------------------------------------------------------------------------------------ 1x3-convergence : 100.1 | 100.0 | 0.2 | 2.3 | secs 1x4-convergence : 100.2 | 100.1 | 100.2 | 2.1 | secs 1x6-convergence : 100.3 | 100.4 | 100.8 | 7.3 | secs 2x3-convergence : 100.6 | 100.6 | 100.5 | 4.1 | secs 3x3-convergence : 100.6 | 100.5 | 100.5 | 7.6 | secs 4x4-convergence : 100.6 | 100.5 | 4.1 | 7.4 | secs 4x4-convergence-NOTHP : 101.1 | 100.5 | 12.2 | 9.2 | secs 4x6-convergence : 5.4 | 101.2 | 16.6 | 11.7 | secs 4x8-convergence : 101.1 | 101.3 | 3.4 | 3.9 | secs 8x4-convergence : 100.9 | 100.8 | 18.3 | 8.9 | secs 8x4-convergence-NOTHP : 101.9 | 101.0 | 15.7 | 12.1 | secs 3x1-convergence : 0.7 | 1.0 | 0.8 | 0.9 | secs 4x1-convergence : 0.6 | 0.8 | 0.8 | 0.7 | secs 8x1-convergence : 2.8 | 2.9 | 2.9 | 1.2 | secs 16x1-convergence : 3.5 | 3.7 | 2.5 | 2.0 | secs 32x1-convergence : 3.6 | 2.8 | 3.0 | 1.9 | secs As expected, mainline only manages to converge workloads where each worker process is isolated and the default spread-to-all-nodes scheduling policy creates an ideal layout, regardless of task ordering. [ Note that the mainline kernel got a 'lucky strike' convergence in the 4x6 workload: it's always possible for the workload to accidentally converge. On a repeat test this did not occur, but I did not erase the outlier because luck is a valid and existing phenomenon. ] The 'balancenuma' kernel does not converge any of the workloads where worker threads or processes relate to each other. AutoNUMA does pretty well, but it did not manage to converge for 4 testcases of shared, under-loaded workloads. The unified NUMA-v3 tree converged well in every testcase. NUMA workload bandwidth measurements ------------------------------------ The other set of numbers I've collected are workload bandwidth measurements, run over 20 seconds. Using 20 seconds gives a healthy mix of pre-convergence and post-convergence bandwidth, giving the (non-trivial) expense of convergence and memory migraton a weight in the result as well. So these are not 'ideal' results with long runtimes where migration cost gets averaged out. [ The denotion of the workloads is similar to the latency measurements: for example "2x3" means 2 processes, 3 threads per process. See the 'perf bench' tool for details. ] The 'numa02' and 'numa01-THREAD' tests are AutoNUMA-benchmark work-alike workloads, with a shorter runtime for numa01. The results are: [ Higher numbers are better. ] [test unit] : v3.7 |balancenuma-v10| AutoNUMA-v28 | numa-u-v3 | ------------------------------------------------------------------------------------------ 2x1-bw-process : 6.248| 6.136: -1.8%| 8.073: 29.2%| 9.647: 54.4%| GB/sec 3x1-bw-process : 7.292| 7.250: -0.6%| 12.583: 72.6%| 14.528: 99.2%| GB/sec 4x1-bw-process : 6.007| 6.867: 14.3%| 12.313: 105.0%| 18.903: 214.7%| GB/sec 8x1-bw-process : 6.100| 7.974: 30.7%| 20.237: 231.8%| 26.829: 339.8%| GB/sec 8x1-bw-process-NOTHP : 5.944| 5.937: -0.1%| 17.831: 200.0%| 22.237: 274.1%| GB/sec 16x1-bw-process : 5.607| 5.592: -0.3%| 5.959: 6.3%| 29.294: 422.5%| GB/sec 4x1-bw-thread : 6.035| 13.598: 125.3%| 17.443: 189.0%| 19.290: 219.6%| GB/sec 8x1-bw-thread : 5.941| 16.356: 175.3%| 22.433: 277.6%| 26.391: 344.2%| GB/sec 16x1-bw-thread : 5.648| 24.608: 335.7%| 20.204: 257.7%| 29.557: 423.3%| GB/sec 32x1-bw-thread : 5.929| 25.477: 329.7%| 18.230: 207.5%| 30.232: 409.9%| GB/sec 2x3-bw-thread : 5.756| 8.785: 52.6%| 14.652: 154.6%| 15.327: 166.3%| GB/sec 4x4-bw-thread : 5.605| 6.366: 13.6%| 9.835: 75.5%| 27.957: 398.8%| GB/sec 4x6-bw-thread : 5.771| 6.287: 8.9%| 15.372: 166.4%| 27.877: 383.1%| GB/sec 4x8-bw-thread : 5.858| 5.860: 0.0%| 11.865: 102.5%| 28.439: 385.5%| GB/sec 4x8-bw-thread-NOTHP : 5.645| 6.167: 9.2%| 9.224: 63.4%| 25.067: 344.1%| GB/sec 3x3-bw-thread : 5.937| 8.235: 38.7%| 6.635: 11.8%| 21.560: 263.1%| GB/sec 5x5-bw-thread : 5.771| 5.762: -0.2%| 9.575: 65.9%| 26.081: 351.9%| GB/sec 2x16-bw-thread : 5.953| 5.920: -0.6%| 5.945: -0.1%| 23.269: 290.9%| GB/sec 1x32-bw-thread : 5.879| 5.828: -0.9%| 5.848: -0.5%| 18.985: 222.9%| GB/sec numa02-bw : 6.049| 29.054: 380.3%| 24.744: 309.1%| 31.431: 419.6%| GB/sec numa02-bw-NOTHP : 5.850| 27.064: 362.6%| 20.415: 249.0%| 29.104: 397.5%| GB/sec numa01-bw-thread : 5.834| 20.338: 248.6%| 15.169: 160.0%| 28.607: 390.3%| GB/sec numa01-bw-thread-NOTHP : 5.581| 18.528: 232.0%| 12.108: 117.0%| 21.119: 278.4%| GB/sec ------------------------------------------------------------------------------------------ The first column shows mainline kernel bandwidth in GB/sec, the following 3 colums show pairs of GB/sec bandwidth and percentage results, where percentage shows the speed difference to the mainline kernel. Noise is 1-2% in these tests with these durations, so the good news is that none of the NUMA kernels regresses on these workloads against the mainline kernel. Perhaps balancenuma's "2x1-bw-process" and "3x1-bw-process" results might be worth a closer look. No kernel shows particular vulnerability to the NOTHP tests that were mixed into the test stream. As can be expected from the convergence latency results, the 'balancenuma' tree does well with workloads where there's no relationship between threads - but even there it's outperformed by the AutoNUMA kernel, and outperformed by an even larger margin by the NUMA-v3 kernel. Workloads like the 4x JVM SPECjbb on the other hand pose a challenge to the balancenuma kernel, both the AutoNUMA and the NUMA-v3 kernels are several times faster in those tests. The AutoNUMA kernel does well in most workloads - its weakness are system-wide shared workloads like 2x16-bw-thread and 1x32-bw-thread, where it falls back to mainline performance. The NUMA-v3 kernel outperforms every other NUMA kernel. Here's a direct comparison between the two fastest kernels, the AutoNUMA and the NUMA-v3 kernels: [ Higher numbers are better. ] [test unit] :AutoNUMA| numa-u-v3 | ---------------------------------------------------------- 2x1-bw-process : 8.073| 9.647: 19.5%| GB/sec 3x1-bw-process : 12.583| 14.528: 15.5%| GB/sec 4x1-bw-process : 12.313| 18.903: 53.5%| GB/sec 8x1-bw-process : 20.237| 26.829: 32.6%| GB/sec 8x1-bw-process-NOTHP : 17.831| 22.237: 24.7%| GB/sec 16x1-bw-process : 5.959| 29.294: 391.6%| GB/sec 4x1-bw-thread : 17.443| 19.290: 10.6%| GB/sec 8x1-bw-thread : 22.433| 26.391: 17.6%| GB/sec 16x1-bw-thread : 20.204| 29.557: 46.3%| GB/sec 32x1-bw-thread : 18.230| 30.232: 65.8%| GB/sec 2x3-bw-thread : 14.652| 15.327: 4.6%| GB/sec 4x4-bw-thread : 9.835| 27.957: 184.3%| GB/sec 4x6-bw-thread : 15.372| 27.877: 81.3%| GB/sec 4x8-bw-thread : 11.865| 28.439: 139.7%| GB/sec 4x8-bw-thread-NOTHP : 9.224| 25.067: 171.8%| GB/sec 3x3-bw-thread : 6.635| 21.560: 224.9%| GB/sec 5x5-bw-thread : 9.575| 26.081: 172.4%| GB/sec 2x16-bw-thread : 5.945| 23.269: 291.4%| GB/sec 1x32-bw-thread : 5.848| 18.985: 224.6%| GB/sec numa02-bw : 24.744| 31.431: 27.0%| GB/sec numa02-bw-NOTHP : 20.415| 29.104: 42.6%| GB/sec numa01-bw-thread : 15.169| 28.607: 88.6%| GB/sec numa01-bw-thread-NOTHP : 12.108| 21.119: 74.4%| GB/sec NUMA workload "spread" measurements ----------------------------------- A third, somewhat obscure category of measurements deals with the 'execution spread' between threads. Workloads that have to wait for the result of every thread before they can declare a result are directly limited by this spread. The 'spread' is measured by the percentage difference between the slowest and fastest thread's execution time in a workload: [ Lower numbers are better. ] [test unit] : v3.7 |balancenuma-v10| AutoNUMA-v28 | numa-u-v3 | ------------------------------------------------------------------------------------------ RAM-bw-local : 0.0% | 0.0% | 0.0% | 0.0% | % RAM-bw-local-NOTHP : 0.2% | 0.2% | 0.2% | 0.2% | % RAM-bw-remote : 0.0% | 0.0% | 0.0% | 0.0% | % RAM-bw-local-2x : 0.3% | 0.0% | 0.2% | 0.3% | % RAM-bw-remote-2x : 0.0% | 0.2% | 0.0% | 0.2% | % RAM-bw-cross : 0.4% | 0.2% | 0.0% | 0.1% | % 2x1-bw-process : 0.5% | 0.2% | 0.2% | 0.2% | % 3x1-bw-process : 0.6% | 0.2% | 0.2% | 0.1% | % 4x1-bw-process : 0.4% | 0.8% | 0.2% | 0.3% | % 8x1-bw-process : 0.8% | 0.1% | 0.2% | 0.2% | % 8x1-bw-process-NOTHP : 0.9% | 0.7% | 0.4% | 0.5% | % 16x1-bw-process : 1.0% | 0.9% | 0.6% | 0.1% | % 4x1-bw-thread : 0.1% | 0.1% | 0.1% | 0.1% | % 8x1-bw-thread : 0.2% | 0.1% | 0.1% | 0.2% | % 16x1-bw-thread : 0.3% | 0.1% | 0.1% | 0.1% | % 32x1-bw-thread : 0.3% | 0.1% | 0.1% | 0.1% | % 2x3-bw-thread : 0.4% | 0.3% | 0.3% | 0.3% | % 4x4-bw-thread : 2.3% | 1.4% | 0.8% | 0.4% | % 4x6-bw-thread : 2.5% | 2.2% | 1.0% | 0.6% | % 4x8-bw-thread : 3.9% | 3.7% | 1.3% | 0.9% | % 4x8-bw-thread-NOTHP : 6.0% | 2.5% | 1.5% | 1.0% | % 3x3-bw-thread : 0.5% | 0.4% | 0.5% | 0.3% | % 5x5-bw-thread : 1.8% | 2.7% | 1.3% | 0.7% | % 2x16-bw-thread : 3.7% | 4.1% | 3.6% | 1.1% | % 1x32-bw-thread : 2.9% | 7.3% | 3.5% | 4.4% | % numa02-bw : 0.1% | 0.0% | 0.1% | 0.1% | % numa02-bw-NOTHP : 0.4% | 0.3% | 0.3% | 0.3% | % numa01-bw-thread : 1.3% | 0.4% | 0.3% | 0.3% | % numa01-bw-thread-NOTHP : 1.8% | 0.8% | 0.8% | 0.9% | % The results are pretty good because the runs were relatively short with 20 seconds runtime. Both mainline and balancenuma has trouble with the spread of shared workloads - possibly signalling memory allocation assymetries. Longer - 60 seconds or more - runs of the key workloads would certainly be informative there. NOTHP (4K ptes) increases the spread and non-determinism of every NUMA kernel. The AutoNUMA and NUMA-v3 kernels have the lowest spread, signalling stable NUMA convergence in most scenarios. Finally, below is the (long!) dump of all the raw data, in case someone wants to double-check my results. The perf/bench tool can be used to double check the measurements on other systems. Thanks, Ingo --------------------> Here are the exact kernel versions used: # kernel 1: {v3.7-rc8-18a2f371f5ed} # kernel 2: {balancenuma-v10} # kernel 3: {autonuma-v28-c4bba428cc5c} # kernel 4: {numa/base-v3} --------------------> # # Running test on: Linux vega 3.7.0-rc8+ #3 SMP Fri Dec 7 18:29:16 CET 2012 x86_64 x86_64 x86_64 GNU/Linux # # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local, 20.111, secs, runtime-max/thread RAM-bw-local, 20.106, secs, runtime-min/thread RAM-bw-local, 20.106, secs, runtime-avg/thread RAM-bw-local, 0.013, %, spread-runtime/thread RAM-bw-local, 169.651, GB, data/thread RAM-bw-local, 169.651, GB, data-total RAM-bw-local, 0.119, nsecs, runtime/byte/thread RAM-bw-local, 8.436, GB/sec, thread-speed RAM-bw-local, 8.436, GB/sec, total-speed # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1" RAM-bw-local-NOTHP, 20.125, secs, runtime-max/thread RAM-bw-local-NOTHP, 20.050, secs, runtime-min/thread RAM-bw-local-NOTHP, 20.050, secs, runtime-avg/thread RAM-bw-local-NOTHP, 0.187, %, spread-runtime/thread RAM-bw-local-NOTHP, 169.651, GB, data/thread RAM-bw-local-NOTHP, 169.651, GB, data-total RAM-bw-local-NOTHP, 0.119, nsecs, runtime/byte/thread RAM-bw-local-NOTHP, 8.430, GB/sec, thread-speed RAM-bw-local-NOTHP, 8.430, GB/sec, total-speed # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote, 20.141, secs, runtime-max/thread RAM-bw-remote, 20.134, secs, runtime-min/thread RAM-bw-remote, 20.134, secs, runtime-avg/thread RAM-bw-remote, 0.017, %, spread-runtime/thread RAM-bw-remote, 135.291, GB, data/thread RAM-bw-remote, 135.291, GB, data-total RAM-bw-remote, 0.149, nsecs, runtime/byte/thread RAM-bw-remote, 6.717, GB/sec, thread-speed RAM-bw-remote, 6.717, GB/sec, total-speed # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local-2x, 20.128, secs, runtime-max/thread RAM-bw-local-2x, 20.006, secs, runtime-min/thread RAM-bw-local-2x, 20.064, secs, runtime-avg/thread RAM-bw-local-2x, 0.302, %, spread-runtime/thread RAM-bw-local-2x, 132.607, GB, data/thread RAM-bw-local-2x, 265.214, GB, data-total RAM-bw-local-2x, 0.152, nsecs, runtime/byte/thread RAM-bw-local-2x, 6.588, GB/sec, thread-speed RAM-bw-local-2x, 13.177, GB/sec, total-speed # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote-2x, 20.102, secs, runtime-max/thread RAM-bw-remote-2x, 20.094, secs, runtime-min/thread RAM-bw-remote-2x, 20.094, secs, runtime-avg/thread RAM-bw-remote-2x, 0.021, %, spread-runtime/thread RAM-bw-remote-2x, 74.088, GB, data/thread RAM-bw-remote-2x, 148.176, GB, data-total RAM-bw-remote-2x, 0.271, nsecs, runtime/byte/thread RAM-bw-remote-2x, 3.686, GB/sec, thread-speed RAM-bw-remote-2x, 7.371, GB/sec, total-speed # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-cross, 20.159, secs, runtime-max/thread RAM-bw-cross, 20.011, secs, runtime-min/thread RAM-bw-cross, 20.081, secs, runtime-avg/thread RAM-bw-cross, 0.369, %, spread-runtime/thread RAM-bw-cross, 122.407, GB, data/thread RAM-bw-cross, 244.813, GB, data-total RAM-bw-cross, 0.165, nsecs, runtime/byte/thread RAM-bw-cross, 6.072, GB/sec, thread-speed RAM-bw-cross, 12.144, GB/sec, total-speed # Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1" 1x3-convergence, 100.103, secs, NUMA-convergence-latency 1x3-convergence, 100.103, secs, runtime-max/thread 1x3-convergence, 100.082, secs, runtime-min/thread 1x3-convergence, 100.093, secs, runtime-avg/thread 1x3-convergence, 0.010, %, spread-runtime/thread 1x3-convergence, 278.636, GB, data/thread 1x3-convergence, 835.908, GB, data-total 1x3-convergence, 0.359, nsecs, runtime/byte/thread 1x3-convergence, 2.784, GB/sec, thread-speed 1x3-convergence, 8.351, GB/sec, total-speed # Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1x4-convergence, 100.211, secs, NUMA-convergence-latency 1x4-convergence, 100.211, secs, runtime-max/thread 1x4-convergence, 100.070, secs, runtime-min/thread 1x4-convergence, 100.140, secs, runtime-avg/thread 1x4-convergence, 0.070, %, spread-runtime/thread 1x4-convergence, 154.887, GB, data/thread 1x4-convergence, 619.549, GB, data-total 1x4-convergence, 0.647, nsecs, runtime/byte/thread 1x4-convergence, 1.546, GB/sec, thread-speed 1x4-convergence, 6.182, GB/sec, total-speed # Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 1x6-convergence, 100.343, secs, NUMA-convergence-latency 1x6-convergence, 100.343, secs, runtime-max/thread 1x6-convergence, 100.235, secs, runtime-min/thread 1x6-convergence, 100.303, secs, runtime-avg/thread 1x6-convergence, 0.054, %, spread-runtime/thread 1x6-convergence, 95.725, GB, data/thread 1x6-convergence, 574.347, GB, data-total 1x6-convergence, 1.048, nsecs, runtime/byte/thread 1x6-convergence, 0.954, GB/sec, thread-speed 1x6-convergence, 5.724, GB/sec, total-speed # Running 2x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 2x3-convergence, 100.601, secs, NUMA-convergence-latency 2x3-convergence, 100.601, secs, runtime-max/thread 2x3-convergence, 100.054, secs, runtime-min/thread 2x3-convergence, 100.307, secs, runtime-avg/thread 2x3-convergence, 0.272, %, spread-runtime/thread 2x3-convergence, 65.837, GB, data/thread 2x3-convergence, 592.529, GB, data-total 2x3-convergence, 1.528, nsecs, runtime/byte/thread 2x3-convergence, 0.654, GB/sec, thread-speed 2x3-convergence, 5.890, GB/sec, total-speed # Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 3x3-convergence, 100.572, secs, NUMA-convergence-latency 3x3-convergence, 100.572, secs, runtime-max/thread 3x3-convergence, 100.095, secs, runtime-min/thread 3x3-convergence, 100.330, secs, runtime-avg/thread 3x3-convergence, 0.238, %, spread-runtime/thread 3x3-convergence, 65.837, GB, data/thread 3x3-convergence, 592.529, GB, data-total 3x3-convergence, 1.528, nsecs, runtime/byte/thread 3x3-convergence, 0.655, GB/sec, thread-speed 3x3-convergence, 5.892, GB/sec, total-speed # Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 4x4-convergence, 100.571, secs, NUMA-convergence-latency 4x4-convergence, 100.571, secs, runtime-max/thread 4x4-convergence, 100.122, secs, runtime-min/thread 4x4-convergence, 100.386, secs, runtime-avg/thread 4x4-convergence, 0.223, %, spread-runtime/thread 4x4-convergence, 35.266, GB, data/thread 4x4-convergence, 564.251, GB, data-total 4x4-convergence, 2.852, nsecs, runtime/byte/thread 4x4-convergence, 0.351, GB/sec, thread-speed 4x4-convergence, 5.610, GB/sec, total-speed # Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 4x4-convergence-NOTHP, 101.051, secs, NUMA-convergence-latency 4x4-convergence-NOTHP, 101.051, secs, runtime-max/thread 4x4-convergence-NOTHP, 100.066, secs, runtime-min/thread 4x4-convergence-NOTHP, 100.683, secs, runtime-avg/thread 4x4-convergence-NOTHP, 0.487, %, spread-runtime/thread 4x4-convergence-NOTHP, 35.769, GB, data/thread 4x4-convergence-NOTHP, 572.304, GB, data-total 4x4-convergence-NOTHP, 2.825, nsecs, runtime/byte/thread 4x4-convergence-NOTHP, 0.354, GB/sec, thread-speed 4x4-convergence-NOTHP, 5.664, GB/sec, total-speed # Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 4x6-convergence, 5.444, secs, NUMA-convergence-latency 4x6-convergence, 5.444, secs, runtime-max/thread 4x6-convergence, 2.853, secs, runtime-min/thread 4x6-convergence, 4.531, secs, runtime-avg/thread 4x6-convergence, 23.794, %, spread-runtime/thread 4x6-convergence, 1.292, GB, data/thread 4x6-convergence, 31.017, GB, data-total 4x6-convergence, 4.212, nsecs, runtime/byte/thread 4x6-convergence, 0.237, GB/sec, thread-speed 4x6-convergence, 5.698, GB/sec, total-speed # Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1" 4x8-convergence, 101.133, secs, NUMA-convergence-latency 4x8-convergence, 101.133, secs, runtime-max/thread 4x8-convergence, 100.455, secs, runtime-min/thread 4x8-convergence, 100.803, secs, runtime-avg/thread 4x8-convergence, 0.335, %, spread-runtime/thread 4x8-convergence, 18.522, GB, data/thread 4x8-convergence, 592.705, GB, data-total 4x8-convergence, 5.460, nsecs, runtime/byte/thread 4x8-convergence, 0.183, GB/sec, thread-speed 4x8-convergence, 5.861, GB/sec, total-speed # Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 8x4-convergence, 100.878, secs, NUMA-convergence-latency 8x4-convergence, 100.878, secs, runtime-max/thread 8x4-convergence, 100.021, secs, runtime-min/thread 8x4-convergence, 100.567, secs, runtime-avg/thread 8x4-convergence, 0.425, %, spread-runtime/thread 8x4-convergence, 18.388, GB, data/thread 8x4-convergence, 588.411, GB, data-total 8x4-convergence, 5.486, nsecs, runtime/byte/thread 8x4-convergence, 0.182, GB/sec, thread-speed 8x4-convergence, 5.833, GB/sec, total-speed # Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 8x4-convergence-NOTHP, 101.868, secs, NUMA-convergence-latency 8x4-convergence-NOTHP, 101.868, secs, runtime-max/thread 8x4-convergence-NOTHP, 100.499, secs, runtime-min/thread 8x4-convergence-NOTHP, 101.118, secs, runtime-avg/thread 8x4-convergence-NOTHP, 0.672, %, spread-runtime/thread 8x4-convergence-NOTHP, 17.851, GB, data/thread 8x4-convergence-NOTHP, 571.231, GB, data-total 8x4-convergence-NOTHP, 5.707, nsecs, runtime/byte/thread 8x4-convergence-NOTHP, 0.175, GB/sec, thread-speed 8x4-convergence-NOTHP, 5.608, GB/sec, total-speed # Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 3x1-convergence, 0.652, secs, NUMA-convergence-latency 3x1-convergence, 0.652, secs, runtime-max/thread 3x1-convergence, 0.471, secs, runtime-min/thread 3x1-convergence, 0.584, secs, runtime-avg/thread 3x1-convergence, 13.878, %, spread-runtime/thread 3x1-convergence, 1.432, GB, data/thread 3x1-convergence, 4.295, GB, data-total 3x1-convergence, 0.456, nsecs, runtime/byte/thread 3x1-convergence, 2.195, GB/sec, thread-speed 3x1-convergence, 6.584, GB/sec, total-speed # Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 4x1-convergence, 0.643, secs, NUMA-convergence-latency 4x1-convergence, 0.643, secs, runtime-max/thread 4x1-convergence, 0.479, secs, runtime-min/thread 4x1-convergence, 0.562, secs, runtime-avg/thread 4x1-convergence, 12.750, %, spread-runtime/thread 4x1-convergence, 1.074, GB, data/thread 4x1-convergence, 4.295, GB, data-total 4x1-convergence, 0.599, nsecs, runtime/byte/thread 4x1-convergence, 1.669, GB/sec, thread-speed 4x1-convergence, 6.677, GB/sec, total-speed # Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 8x1-convergence, 2.803, secs, NUMA-convergence-latency 8x1-convergence, 2.803, secs, runtime-max/thread 8x1-convergence, 2.509, secs, runtime-min/thread 8x1-convergence, 2.664, secs, runtime-avg/thread 8x1-convergence, 5.250, %, spread-runtime/thread 8x1-convergence, 2.147, GB, data/thread 8x1-convergence, 17.180, GB, data-total 8x1-convergence, 1.305, nsecs, runtime/byte/thread 8x1-convergence, 0.766, GB/sec, thread-speed 8x1-convergence, 6.129, GB/sec, total-speed # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1" 16x1-convergence, 3.482, secs, NUMA-convergence-latency 16x1-convergence, 3.482, secs, runtime-max/thread 16x1-convergence, 3.162, secs, runtime-min/thread 16x1-convergence, 3.328, secs, runtime-avg/thread 16x1-convergence, 4.603, %, spread-runtime/thread 16x1-convergence, 1.242, GB, data/thread 16x1-convergence, 19.864, GB, data-total 16x1-convergence, 2.805, nsecs, runtime/byte/thread 16x1-convergence, 0.357, GB/sec, thread-speed 16x1-convergence, 5.704, GB/sec, total-speed # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1" 32x1-convergence, 3.612, secs, NUMA-convergence-latency 32x1-convergence, 3.612, secs, runtime-max/thread 32x1-convergence, 3.170, secs, runtime-min/thread 32x1-convergence, 3.456, secs, runtime-avg/thread 32x1-convergence, 6.118, %, spread-runtime/thread 32x1-convergence, 0.671, GB, data/thread 32x1-convergence, 21.475, GB, data-total 32x1-convergence, 5.382, nsecs, runtime/byte/thread 32x1-convergence, 0.186, GB/sec, thread-speed 32x1-convergence, 5.945, GB/sec, total-speed # Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 2x1-bw-process, 20.280, secs, runtime-max/thread 2x1-bw-process, 20.059, secs, runtime-min/thread 2x1-bw-process, 20.166, secs, runtime-avg/thread 2x1-bw-process, 0.546, %, spread-runtime/thread 2x1-bw-process, 63.351, GB, data/thread 2x1-bw-process, 126.702, GB, data-total 2x1-bw-process, 0.320, nsecs, runtime/byte/thread 2x1-bw-process, 3.124, GB/sec, thread-speed 2x1-bw-process, 6.248, GB/sec, total-speed # Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 3x1-bw-process, 20.320, secs, runtime-max/thread 3x1-bw-process, 20.078, secs, runtime-min/thread 3x1-bw-process, 20.202, secs, runtime-avg/thread 3x1-bw-process, 0.595, %, spread-runtime/thread 3x1-bw-process, 49.392, GB, data/thread 3x1-bw-process, 148.176, GB, data-total 3x1-bw-process, 0.411, nsecs, runtime/byte/thread 3x1-bw-process, 2.431, GB/sec, thread-speed 3x1-bw-process, 7.292, GB/sec, total-speed # Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 4x1-bw-process, 20.379, secs, runtime-max/thread 4x1-bw-process, 20.210, secs, runtime-min/thread 4x1-bw-process, 20.291, secs, runtime-avg/thread 4x1-bw-process, 0.413, %, spread-runtime/thread 4x1-bw-process, 30.602, GB, data/thread 4x1-bw-process, 122.407, GB, data-total 4x1-bw-process, 0.666, nsecs, runtime/byte/thread 4x1-bw-process, 1.502, GB/sec, thread-speed 4x1-bw-process, 6.007, GB/sec, total-speed # Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1" 8x1-bw-process, 20.419, secs, runtime-max/thread 8x1-bw-process, 20.073, secs, runtime-min/thread 8x1-bw-process, 20.328, secs, runtime-avg/thread 8x1-bw-process, 0.848, %, spread-runtime/thread 8x1-bw-process, 15.569, GB, data/thread 8x1-bw-process, 124.554, GB, data-total 8x1-bw-process, 1.311, nsecs, runtime/byte/thread 8x1-bw-process, 0.762, GB/sec, thread-speed 8x1-bw-process, 6.100, GB/sec, total-speed # Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 8x1-bw-process-NOTHP, 20.502, secs, runtime-max/thread 8x1-bw-process-NOTHP, 20.113, secs, runtime-min/thread 8x1-bw-process-NOTHP, 20.307, secs, runtime-avg/thread 8x1-bw-process-NOTHP, 0.950, %, spread-runtime/thread 8x1-bw-process-NOTHP, 15.234, GB, data/thread 8x1-bw-process-NOTHP, 121.870, GB, data-total 8x1-bw-process-NOTHP, 1.346, nsecs, runtime/byte/thread 8x1-bw-process-NOTHP, 0.743, GB/sec, thread-speed 8x1-bw-process-NOTHP, 5.944, GB/sec, total-speed # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1" 16x1-bw-process, 20.539, secs, runtime-max/thread 16x1-bw-process, 20.145, secs, runtime-min/thread 16x1-bw-process, 20.407, secs, runtime-avg/thread 16x1-bw-process, 0.959, %, spread-runtime/thread 16x1-bw-process, 7.197, GB, data/thread 16x1-bw-process, 115.159, GB, data-total 16x1-bw-process, 2.854, nsecs, runtime/byte/thread 16x1-bw-process, 0.350, GB/sec, thread-speed 16x1-bw-process, 5.607, GB/sec, total-speed # Running 4x1-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1" 4x1-bw-thread, 20.105, secs, runtime-max/thread 4x1-bw-thread, 20.047, secs, runtime-min/thread 4x1-bw-thread, 20.071, secs, runtime-avg/thread 4x1-bw-thread, 0.144, %, spread-runtime/thread 4x1-bw-thread, 30.333, GB, data/thread 4x1-bw-thread, 121.333, GB, data-total 4x1-bw-thread, 0.663, nsecs, runtime/byte/thread 4x1-bw-thread, 1.509, GB/sec, thread-speed 4x1-bw-thread, 6.035, GB/sec, total-speed # Running 8x1-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1" 8x1-bw-thread, 20.106, secs, runtime-max/thread 8x1-bw-thread, 20.021, secs, runtime-min/thread 8x1-bw-thread, 20.062, secs, runtime-avg/thread 8x1-bw-thread, 0.213, %, spread-runtime/thread 8x1-bw-thread, 14.932, GB, data/thread 8x1-bw-thread, 119.454, GB, data-total 8x1-bw-thread, 1.347, nsecs, runtime/byte/thread 8x1-bw-thread, 0.743, GB/sec, thread-speed 8x1-bw-thread, 5.941, GB/sec, total-speed # Running 16x1-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1" 16x1-bw-thread, 20.176, secs, runtime-max/thread 16x1-bw-thread, 20.049, secs, runtime-min/thread 16x1-bw-thread, 20.125, secs, runtime-avg/thread 16x1-bw-thread, 0.314, %, spread-runtime/thread 16x1-bw-thread, 7.122, GB, data/thread 16x1-bw-thread, 113.951, GB, data-total 16x1-bw-thread, 2.833, nsecs, runtime/byte/thread 16x1-bw-thread, 0.353, GB/sec, thread-speed 16x1-bw-thread, 5.648, GB/sec, total-speed # Running 32x1-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1" 32x1-bw-thread, 20.159, secs, runtime-max/thread 32x1-bw-thread, 20.034, secs, runtime-min/thread 32x1-bw-thread, 20.120, secs, runtime-avg/thread 32x1-bw-thread, 0.309, %, spread-runtime/thread 32x1-bw-thread, 3.735, GB, data/thread 32x1-bw-thread, 119.521, GB, data-total 32x1-bw-thread, 5.397, nsecs, runtime/byte/thread 32x1-bw-thread, 0.185, GB/sec, thread-speed 32x1-bw-thread, 5.929, GB/sec, total-speed # Running 2x3-bw-thread, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1" 2x3-bw-thread, 20.239, secs, runtime-max/thread 2x3-bw-thread, 20.092, secs, runtime-min/thread 2x3-bw-thread, 20.183, secs, runtime-avg/thread 2x3-bw-thread, 0.363, %, spread-runtime/thread 2x3-bw-thread, 19.417, GB, data/thread 2x3-bw-thread, 116.501, GB, data-total 2x3-bw-thread, 1.042, nsecs, runtime/byte/thread 2x3-bw-thread, 0.959, GB/sec, thread-speed 2x3-bw-thread, 5.756, GB/sec, total-speed # Running 4x4-bw-thread, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1" 4x4-bw-thread, 20.978, secs, runtime-max/thread 4x4-bw-thread, 20.005, secs, runtime-min/thread 4x4-bw-thread, 20.576, secs, runtime-avg/thread 4x4-bw-thread, 2.321, %, spread-runtime/thread 4x4-bw-thread, 7.348, GB, data/thread 4x4-bw-thread, 117.575, GB, data-total 4x4-bw-thread, 2.855, nsecs, runtime/byte/thread 4x4-bw-thread, 0.350, GB/sec, thread-speed 4x4-bw-thread, 5.605, GB/sec, total-speed # Running 4x6-bw-thread, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1" 4x6-bw-thread, 21.118, secs, runtime-max/thread 4x6-bw-thread, 20.082, secs, runtime-min/thread 4x6-bw-thread, 20.819, secs, runtime-avg/thread 4x6-bw-thread, 2.451, %, spread-runtime/thread 4x6-bw-thread, 5.078, GB, data/thread 4x6-bw-thread, 121.870, GB, data-total 4x6-bw-thread, 4.159, nsecs, runtime/byte/thread 4x6-bw-thread, 0.240, GB/sec, thread-speed 4x6-bw-thread, 5.771, GB/sec, total-speed # Running 4x8-bw-thread, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1" 4x8-bw-thread, 21.994, secs, runtime-max/thread 4x8-bw-thread, 20.290, secs, runtime-min/thread 4x8-bw-thread, 21.387, secs, runtime-avg/thread 4x8-bw-thread, 3.874, %, spread-runtime/thread 4x8-bw-thread, 4.027, GB, data/thread 4x8-bw-thread, 128.849, GB, data-total 4x8-bw-thread, 5.462, nsecs, runtime/byte/thread 4x8-bw-thread, 0.183, GB/sec, thread-speed 4x8-bw-thread, 5.858, GB/sec, total-speed # Running 4x8-bw-thread-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 4x8-bw-thread-NOTHP, 22.728, secs, runtime-max/thread 4x8-bw-thread-NOTHP, 20.013, secs, runtime-min/thread 4x8-bw-thread-NOTHP, 21.968, secs, runtime-avg/thread 4x8-bw-thread-NOTHP, 5.975, %, spread-runtime/thread 4x8-bw-thread-NOTHP, 4.010, GB, data/thread 4x8-bw-thread-NOTHP, 128.312, GB, data-total 4x8-bw-thread-NOTHP, 5.668, nsecs, runtime/byte/thread 4x8-bw-thread-NOTHP, 0.176, GB/sec, thread-speed 4x8-bw-thread-NOTHP, 5.645, GB/sec, total-speed # Running 3x3-bw-thread, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1" 3x3-bw-thread, 20.526, secs, runtime-max/thread 3x3-bw-thread, 20.317, secs, runtime-min/thread 3x3-bw-thread, 20.467, secs, runtime-avg/thread 3x3-bw-thread, 0.510, %, spread-runtime/thread 3x3-bw-thread, 13.541, GB, data/thread 3x3-bw-thread, 121.870, GB, data-total 3x3-bw-thread, 1.516, nsecs, runtime/byte/thread 3x3-bw-thread, 0.660, GB/sec, thread-speed 3x3-bw-thread, 5.937, GB/sec, total-speed # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 21.023, secs, runtime-max/thread 5x5-bw-thread, 20.252, secs, runtime-min/thread 5x5-bw-thread, 20.701, secs, runtime-avg/thread 5x5-bw-thread, 1.833, %, spread-runtime/thread 5x5-bw-thread, 4.853, GB, data/thread 5x5-bw-thread, 121.333, GB, data-total 5x5-bw-thread, 4.332, nsecs, runtime/byte/thread 5x5-bw-thread, 0.231, GB/sec, thread-speed 5x5-bw-thread, 5.771, GB/sec, total-speed # Running 2x16-bw-thread, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1" 2x16-bw-thread, 21.646, secs, runtime-max/thread 2x16-bw-thread, 20.065, secs, runtime-min/thread 2x16-bw-thread, 21.026, secs, runtime-avg/thread 2x16-bw-thread, 3.652, %, spread-runtime/thread 2x16-bw-thread, 4.027, GB, data/thread 2x16-bw-thread, 128.849, GB, data-total 2x16-bw-thread, 5.376, nsecs, runtime/byte/thread 2x16-bw-thread, 0.186, GB/sec, thread-speed 2x16-bw-thread, 5.953, GB/sec, total-speed # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1" 1x32-bw-thread, 23.377, secs, runtime-max/thread 1x32-bw-thread, 22.030, secs, runtime-min/thread 1x32-bw-thread, 22.936, secs, runtime-avg/thread 1x32-bw-thread, 2.881, %, spread-runtime/thread 1x32-bw-thread, 4.295, GB, data/thread 1x32-bw-thread, 137.439, GB, data-total 1x32-bw-thread, 5.443, nsecs, runtime/byte/thread 1x32-bw-thread, 0.184, GB/sec, thread-speed 1x32-bw-thread, 5.879, GB/sec, total-speed # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1" numa02-bw, 20.065, secs, runtime-max/thread numa02-bw, 20.012, secs, runtime-min/thread numa02-bw, 20.050, secs, runtime-avg/thread numa02-bw, 0.132, %, spread-runtime/thread numa02-bw, 3.793, GB, data/thread numa02-bw, 121.366, GB, data-total numa02-bw, 5.290, nsecs, runtime/byte/thread numa02-bw, 0.189, GB/sec, thread-speed numa02-bw, 6.049, GB/sec, total-speed # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1" numa02-bw-NOTHP, 20.132, secs, runtime-max/thread numa02-bw-NOTHP, 19.987, secs, runtime-min/thread numa02-bw-NOTHP, 20.049, secs, runtime-avg/thread numa02-bw-NOTHP, 0.360, %, spread-runtime/thread numa02-bw-NOTHP, 3.681, GB, data/thread numa02-bw-NOTHP, 117.776, GB, data-total numa02-bw-NOTHP, 5.470, nsecs, runtime/byte/thread numa02-bw-NOTHP, 0.183, GB/sec, thread-speed numa02-bw-NOTHP, 5.850, GB/sec, total-speed # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1" numa01-bw-thread, 20.704, secs, runtime-max/thread numa01-bw-thread, 20.185, secs, runtime-min/thread numa01-bw-thread, 20.571, secs, runtime-avg/thread numa01-bw-thread, 1.254, %, spread-runtime/thread numa01-bw-thread, 3.775, GB, data/thread numa01-bw-thread, 120.796, GB, data-total numa01-bw-thread, 5.485, nsecs, runtime/byte/thread numa01-bw-thread, 0.182, GB/sec, thread-speed numa01-bw-thread, 5.834, GB/sec, total-speed # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1" numa01-bw-thread-NOTHP, 20.780, secs, runtime-max/thread numa01-bw-thread-NOTHP, 20.023, secs, runtime-min/thread numa01-bw-thread-NOTHP, 20.418, secs, runtime-avg/thread numa01-bw-thread-NOTHP, 1.821, %, spread-runtime/thread numa01-bw-thread-NOTHP, 3.624, GB, data/thread numa01-bw-thread-NOTHP, 115.964, GB, data-total numa01-bw-thread-NOTHP, 5.734, nsecs, runtime/byte/thread numa01-bw-thread-NOTHP, 0.174, GB/sec, thread-speed numa01-bw-thread-NOTHP, 5.581, GB/sec, total-speed # # Running test on: Linux vega 3.7.0-rc6+ #2 SMP Fri Dec 7 17:59:13 CET 2012 x86_64 x86_64 x86_64 GNU/Linux # # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local, 20.049, secs, runtime-max/thread RAM-bw-local, 20.044, secs, runtime-min/thread RAM-bw-local, 20.044, secs, runtime-avg/thread RAM-bw-local, 0.014, %, spread-runtime/thread RAM-bw-local, 172.872, GB, data/thread RAM-bw-local, 172.872, GB, data-total RAM-bw-local, 0.116, nsecs, runtime/byte/thread RAM-bw-local, 8.622, GB/sec, thread-speed RAM-bw-local, 8.622, GB/sec, total-speed # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1" RAM-bw-local-NOTHP, 20.135, secs, runtime-max/thread RAM-bw-local-NOTHP, 20.059, secs, runtime-min/thread RAM-bw-local-NOTHP, 20.059, secs, runtime-avg/thread RAM-bw-local-NOTHP, 0.189, %, spread-runtime/thread RAM-bw-local-NOTHP, 172.872, GB, data/thread RAM-bw-local-NOTHP, 172.872, GB, data-total RAM-bw-local-NOTHP, 0.116, nsecs, runtime/byte/thread RAM-bw-local-NOTHP, 8.586, GB/sec, thread-speed RAM-bw-local-NOTHP, 8.586, GB/sec, total-speed # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote, 20.080, secs, runtime-max/thread RAM-bw-remote, 20.073, secs, runtime-min/thread RAM-bw-remote, 20.073, secs, runtime-avg/thread RAM-bw-remote, 0.017, %, spread-runtime/thread RAM-bw-remote, 135.291, GB, data/thread RAM-bw-remote, 135.291, GB, data-total RAM-bw-remote, 0.148, nsecs, runtime/byte/thread RAM-bw-remote, 6.738, GB/sec, thread-speed RAM-bw-remote, 6.738, GB/sec, total-speed # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local-2x, 20.127, secs, runtime-max/thread RAM-bw-local-2x, 20.111, secs, runtime-min/thread RAM-bw-local-2x, 20.116, secs, runtime-avg/thread RAM-bw-local-2x, 0.038, %, spread-runtime/thread RAM-bw-local-2x, 130.997, GB, data/thread RAM-bw-local-2x, 261.993, GB, data-total RAM-bw-local-2x, 0.154, nsecs, runtime/byte/thread RAM-bw-local-2x, 6.509, GB/sec, thread-speed RAM-bw-local-2x, 13.017, GB/sec, total-speed # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote-2x, 20.183, secs, runtime-max/thread RAM-bw-remote-2x, 20.110, secs, runtime-min/thread RAM-bw-remote-2x, 20.143, secs, runtime-avg/thread RAM-bw-remote-2x, 0.180, %, spread-runtime/thread RAM-bw-remote-2x, 75.162, GB, data/thread RAM-bw-remote-2x, 150.324, GB, data-total RAM-bw-remote-2x, 0.269, nsecs, runtime/byte/thread RAM-bw-remote-2x, 3.724, GB/sec, thread-speed RAM-bw-remote-2x, 7.448, GB/sec, total-speed # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-cross, 20.159, secs, runtime-max/thread RAM-bw-cross, 20.071, secs, runtime-min/thread RAM-bw-cross, 20.111, secs, runtime-avg/thread RAM-bw-cross, 0.220, %, spread-runtime/thread RAM-bw-cross, 124.017, GB, data/thread RAM-bw-cross, 248.034, GB, data-total RAM-bw-cross, 0.163, nsecs, runtime/byte/thread RAM-bw-cross, 6.152, GB/sec, thread-speed RAM-bw-cross, 12.304, GB/sec, total-speed # Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1" 1x3-convergence, 100.038, secs, NUMA-convergence-latency 1x3-convergence, 100.038, secs, runtime-max/thread 1x3-convergence, 100.005, secs, runtime-min/thread 1x3-convergence, 100.016, secs, runtime-avg/thread 1x3-convergence, 0.016, %, spread-runtime/thread 1x3-convergence, 379.210, GB, data/thread 1x3-convergence, 1137.629, GB, data-total 1x3-convergence, 0.264, nsecs, runtime/byte/thread 1x3-convergence, 3.791, GB/sec, thread-speed 1x3-convergence, 11.372, GB/sec, total-speed # Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1x4-convergence, 100.091, secs, NUMA-convergence-latency 1x4-convergence, 100.091, secs, runtime-max/thread 1x4-convergence, 100.016, secs, runtime-min/thread 1x4-convergence, 100.053, secs, runtime-avg/thread 1x4-convergence, 0.037, %, spread-runtime/thread 1x4-convergence, 162.672, GB, data/thread 1x4-convergence, 650.688, GB, data-total 1x4-convergence, 0.615, nsecs, runtime/byte/thread 1x4-convergence, 1.625, GB/sec, thread-speed 1x4-convergence, 6.501, GB/sec, total-speed # Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 1x6-convergence, 100.366, secs, NUMA-convergence-latency 1x6-convergence, 100.366, secs, runtime-max/thread 1x6-convergence, 100.005, secs, runtime-min/thread 1x6-convergence, 100.144, secs, runtime-avg/thread 1x6-convergence, 0.180, %, spread-runtime/thread 1x6-convergence, 103.924, GB, data/thread 1x6-convergence, 623.546, GB, data-total 1x6-convergence, 0.966, nsecs, runtime/byte/thread 1x6-convergence, 1.035, GB/sec, thread-speed 1x6-convergence, 6.213, GB/sec, total-speed # Running 2x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 2x3-convergence, 100.632, secs, NUMA-convergence-latency 2x3-convergence, 100.632, secs, runtime-max/thread 2x3-convergence, 100.080, secs, runtime-min/thread 2x3-convergence, 100.376, secs, runtime-avg/thread 2x3-convergence, 0.274, %, spread-runtime/thread 2x3-convergence, 87.941, GB, data/thread 2x3-convergence, 791.465, GB, data-total 2x3-convergence, 1.144, nsecs, runtime/byte/thread 2x3-convergence, 0.874, GB/sec, thread-speed 2x3-convergence, 7.865, GB/sec, total-speed # Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 3x3-convergence, 100.474, secs, NUMA-convergence-latency 3x3-convergence, 100.474, secs, runtime-max/thread 3x3-convergence, 100.070, secs, runtime-min/thread 3x3-convergence, 100.338, secs, runtime-avg/thread 3x3-convergence, 0.201, %, spread-runtime/thread 3x3-convergence, 118.363, GB, data/thread 3x3-convergence, 1065.269, GB, data-total 3x3-convergence, 0.849, nsecs, runtime/byte/thread 3x3-convergence, 1.178, GB/sec, thread-speed 3x3-convergence, 10.602, GB/sec, total-speed # Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 4x4-convergence, 100.527, secs, NUMA-convergence-latency 4x4-convergence, 100.527, secs, runtime-max/thread 4x4-convergence, 100.179, secs, runtime-min/thread 4x4-convergence, 100.353, secs, runtime-avg/thread 4x4-convergence, 0.173, %, spread-runtime/thread 4x4-convergence, 65.230, GB, data/thread 4x4-convergence, 1043.677, GB, data-total 4x4-convergence, 1.541, nsecs, runtime/byte/thread 4x4-convergence, 0.649, GB/sec, thread-speed 4x4-convergence, 10.382, GB/sec, total-speed # Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 4x4-convergence-NOTHP, 100.532, secs, NUMA-convergence-latency 4x4-convergence-NOTHP, 100.532, secs, runtime-max/thread 4x4-convergence-NOTHP, 100.095, secs, runtime-min/thread 4x4-convergence-NOTHP, 100.343, secs, runtime-avg/thread 4x4-convergence-NOTHP, 0.217, %, spread-runtime/thread 4x4-convergence-NOTHP, 57.311, GB, data/thread 4x4-convergence-NOTHP, 916.976, GB, data-total 4x4-convergence-NOTHP, 1.754, nsecs, runtime/byte/thread 4x4-convergence-NOTHP, 0.570, GB/sec, thread-speed 4x4-convergence-NOTHP, 9.121, GB/sec, total-speed # Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 4x6-convergence, 101.230, secs, NUMA-convergence-latency 4x6-convergence, 101.230, secs, runtime-max/thread 4x6-convergence, 100.093, secs, runtime-min/thread 4x6-convergence, 100.825, secs, runtime-avg/thread 4x6-convergence, 0.562, %, spread-runtime/thread 4x6-convergence, 28.076, GB, data/thread 4x6-convergence, 673.815, GB, data-total 4x6-convergence, 3.606, nsecs, runtime/byte/thread 4x6-convergence, 0.277, GB/sec, thread-speed 4x6-convergence, 6.656, GB/sec, total-speed # Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1" 4x8-convergence, 101.310, secs, NUMA-convergence-latency 4x8-convergence, 101.310, secs, runtime-max/thread 4x8-convergence, 100.052, secs, runtime-min/thread 4x8-convergence, 100.679, secs, runtime-avg/thread 4x8-convergence, 0.621, %, spread-runtime/thread 4x8-convergence, 18.740, GB, data/thread 4x8-convergence, 599.685, GB, data-total 4x8-convergence, 5.406, nsecs, runtime/byte/thread 4x8-convergence, 0.185, GB/sec, thread-speed 4x8-convergence, 5.919, GB/sec, total-speed # Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 8x4-convergence, 100.849, secs, NUMA-convergence-latency 8x4-convergence, 100.849, secs, runtime-max/thread 8x4-convergence, 100.020, secs, runtime-min/thread 8x4-convergence, 100.570, secs, runtime-avg/thread 8x4-convergence, 0.411, %, spread-runtime/thread 8x4-convergence, 22.364, GB, data/thread 8x4-convergence, 715.649, GB, data-total 8x4-convergence, 4.509, nsecs, runtime/byte/thread 8x4-convergence, 0.222, GB/sec, thread-speed 8x4-convergence, 7.096, GB/sec, total-speed # Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 8x4-convergence-NOTHP, 100.976, secs, NUMA-convergence-latency 8x4-convergence-NOTHP, 100.976, secs, runtime-max/thread 8x4-convergence-NOTHP, 100.066, secs, runtime-min/thread 8x4-convergence-NOTHP, 100.580, secs, runtime-avg/thread 8x4-convergence-NOTHP, 0.451, %, spread-runtime/thread 8x4-convergence-NOTHP, 27.146, GB, data/thread 8x4-convergence-NOTHP, 868.657, GB, data-total 8x4-convergence-NOTHP, 3.720, nsecs, runtime/byte/thread 8x4-convergence-NOTHP, 0.269, GB/sec, thread-speed 8x4-convergence-NOTHP, 8.603, GB/sec, total-speed # Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 3x1-convergence, 1.010, secs, NUMA-convergence-latency 3x1-convergence, 1.010, secs, runtime-max/thread 3x1-convergence, 0.869, secs, runtime-min/thread 3x1-convergence, 0.958, secs, runtime-avg/thread 3x1-convergence, 6.944, %, spread-runtime/thread 3x1-convergence, 2.326, GB, data/thread 3x1-convergence, 6.979, GB, data-total 3x1-convergence, 0.434, nsecs, runtime/byte/thread 3x1-convergence, 2.305, GB/sec, thread-speed 3x1-convergence, 6.914, GB/sec, total-speed # Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 4x1-convergence, 0.782, secs, NUMA-convergence-latency 4x1-convergence, 0.782, secs, runtime-max/thread 4x1-convergence, 0.623, secs, runtime-min/thread 4x1-convergence, 0.689, secs, runtime-avg/thread 4x1-convergence, 10.122, %, spread-runtime/thread 4x1-convergence, 1.208, GB, data/thread 4x1-convergence, 4.832, GB, data-total 4x1-convergence, 0.647, nsecs, runtime/byte/thread 4x1-convergence, 1.545, GB/sec, thread-speed 4x1-convergence, 6.181, GB/sec, total-speed # Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 8x1-convergence, 2.914, secs, NUMA-convergence-latency 8x1-convergence, 2.914, secs, runtime-max/thread 8x1-convergence, 2.533, secs, runtime-min/thread 8x1-convergence, 2.750, secs, runtime-avg/thread 8x1-convergence, 6.538, %, spread-runtime/thread 8x1-convergence, 2.215, GB, data/thread 8x1-convergence, 17.717, GB, data-total 8x1-convergence, 1.316, nsecs, runtime/byte/thread 8x1-convergence, 0.760, GB/sec, thread-speed 8x1-convergence, 6.080, GB/sec, total-speed # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1" 16x1-convergence, 3.688, secs, NUMA-convergence-latency 16x1-convergence, 3.688, secs, runtime-max/thread 16x1-convergence, 3.358, secs, runtime-min/thread 16x1-convergence, 3.533, secs, runtime-avg/thread 16x1-convergence, 4.481, %, spread-runtime/thread 16x1-convergence, 1.292, GB, data/thread 16x1-convergence, 20.670, GB, data-total 16x1-convergence, 2.855, nsecs, runtime/byte/thread 16x1-convergence, 0.350, GB/sec, thread-speed 16x1-convergence, 5.604, GB/sec, total-speed # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1" 32x1-convergence, 2.762, secs, NUMA-convergence-latency 32x1-convergence, 2.762, secs, runtime-max/thread 32x1-convergence, 2.552, secs, runtime-min/thread 32x1-convergence, 2.735, secs, runtime-avg/thread 32x1-convergence, 3.807, %, spread-runtime/thread 32x1-convergence, 0.516, GB, data/thread 32x1-convergence, 16.509, GB, data-total 32x1-convergence, 5.354, nsecs, runtime/byte/thread 32x1-convergence, 0.187, GB/sec, thread-speed 32x1-convergence, 5.976, GB/sec, total-speed # Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 2x1-bw-process, 20.123, secs, runtime-max/thread 2x1-bw-process, 20.053, secs, runtime-min/thread 2x1-bw-process, 20.085, secs, runtime-avg/thread 2x1-bw-process, 0.173, %, spread-runtime/thread 2x1-bw-process, 61.740, GB, data/thread 2x1-bw-process, 123.480, GB, data-total 2x1-bw-process, 0.326, nsecs, runtime/byte/thread 2x1-bw-process, 3.068, GB/sec, thread-speed 2x1-bw-process, 6.136, GB/sec, total-speed # Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 3x1-bw-process, 20.143, secs, runtime-max/thread 3x1-bw-process, 20.043, secs, runtime-min/thread 3x1-bw-process, 20.091, secs, runtime-avg/thread 3x1-bw-process, 0.249, %, spread-runtime/thread 3x1-bw-process, 48.676, GB, data/thread 3x1-bw-process, 146.029, GB, data-total 3x1-bw-process, 0.414, nsecs, runtime/byte/thread 3x1-bw-process, 2.417, GB/sec, thread-speed 3x1-bw-process, 7.250, GB/sec, total-speed # Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 4x1-bw-process, 20.327, secs, runtime-max/thread 4x1-bw-process, 20.020, secs, runtime-min/thread 4x1-bw-process, 20.168, secs, runtime-avg/thread 4x1-bw-process, 0.754, %, spread-runtime/thread 4x1-bw-process, 34.897, GB, data/thread 4x1-bw-process, 139.586, GB, data-total 4x1-bw-process, 0.582, nsecs, runtime/byte/thread 4x1-bw-process, 1.717, GB/sec, thread-speed 4x1-bw-process, 6.867, GB/sec, total-speed # Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1" 8x1-bw-process, 20.063, secs, runtime-max/thread 8x1-bw-process, 20.004, secs, runtime-min/thread 8x1-bw-process, 20.034, secs, runtime-avg/thread 8x1-bw-process, 0.148, %, spread-runtime/thread 8x1-bw-process, 19.998, GB, data/thread 8x1-bw-process, 159.988, GB, data-total 8x1-bw-process, 1.003, nsecs, runtime/byte/thread 8x1-bw-process, 0.997, GB/sec, thread-speed 8x1-bw-process, 7.974, GB/sec, total-speed # Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 8x1-bw-process-NOTHP, 20.435, secs, runtime-max/thread 8x1-bw-process-NOTHP, 20.150, secs, runtime-min/thread 8x1-bw-process-NOTHP, 20.255, secs, runtime-avg/thread 8x1-bw-process-NOTHP, 0.699, %, spread-runtime/thread 8x1-bw-process-NOTHP, 15.167, GB, data/thread 8x1-bw-process-NOTHP, 121.333, GB, data-total 8x1-bw-process-NOTHP, 1.347, nsecs, runtime/byte/thread 8x1-bw-process-NOTHP, 0.742, GB/sec, thread-speed 8x1-bw-process-NOTHP, 5.937, GB/sec, total-speed # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1" 16x1-bw-process, 20.451, secs, runtime-max/thread 16x1-bw-process, 20.078, secs, runtime-min/thread 16x1-bw-process, 20.311, secs, runtime-avg/thread 16x1-bw-process, 0.912, %, spread-runtime/thread 16x1-bw-process, 7.147, GB, data/thread 16x1-bw-process, 114.354, GB, data-total 16x1-bw-process, 2.861, nsecs, runtime/byte/thread 16x1-bw-process, 0.349, GB/sec, thread-speed 16x1-bw-process, 5.592, GB/sec, total-speed # Running 4x1-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1" 4x1-bw-thread, 20.038, secs, runtime-max/thread 4x1-bw-thread, 20.006, secs, runtime-min/thread 4x1-bw-thread, 20.023, secs, runtime-avg/thread 4x1-bw-thread, 0.079, %, spread-runtime/thread 4x1-bw-thread, 68.115, GB, data/thread 4x1-bw-thread, 272.462, GB, data-total 4x1-bw-thread, 0.294, nsecs, runtime/byte/thread 4x1-bw-thread, 3.399, GB/sec, thread-speed 4x1-bw-thread, 13.598, GB/sec, total-speed # Running 8x1-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1" 8x1-bw-thread, 20.055, secs, runtime-max/thread 8x1-bw-thread, 20.001, secs, runtime-min/thread 8x1-bw-thread, 20.033, secs, runtime-avg/thread 8x1-bw-thread, 0.136, %, spread-runtime/thread 8x1-bw-thread, 41.004, GB, data/thread 8x1-bw-thread, 328.028, GB, data-total 8x1-bw-thread, 0.489, nsecs, runtime/byte/thread 8x1-bw-thread, 2.045, GB/sec, thread-speed 8x1-bw-thread, 16.356, GB/sec, total-speed # Running 16x1-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1" 16x1-bw-thread, 20.044, secs, runtime-max/thread 16x1-bw-thread, 19.994, secs, runtime-min/thread 16x1-bw-thread, 20.021, secs, runtime-avg/thread 16x1-bw-thread, 0.124, %, spread-runtime/thread 16x1-bw-thread, 30.828, GB, data/thread 16x1-bw-thread, 493.250, GB, data-total 16x1-bw-thread, 0.650, nsecs, runtime/byte/thread 16x1-bw-thread, 1.538, GB/sec, thread-speed 16x1-bw-thread, 24.608, GB/sec, total-speed # Running 32x1-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1" 32x1-bw-thread, 19.990, secs, runtime-max/thread 32x1-bw-thread, 19.955, secs, runtime-min/thread 32x1-bw-thread, 19.996, secs, runtime-avg/thread 32x1-bw-thread, 0.087, %, spread-runtime/thread 32x1-bw-thread, 15.915, GB, data/thread 32x1-bw-thread, 509.289, GB, data-total 32x1-bw-thread, 1.256, nsecs, runtime/byte/thread 32x1-bw-thread, 0.796, GB/sec, thread-speed 32x1-bw-thread, 25.477, GB/sec, total-speed # Running 2x3-bw-thread, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1" 2x3-bw-thread, 20.168, secs, runtime-max/thread 2x3-bw-thread, 20.028, secs, runtime-min/thread 2x3-bw-thread, 20.103, secs, runtime-avg/thread 2x3-bw-thread, 0.346, %, spread-runtime/thread 2x3-bw-thread, 29.528, GB, data/thread 2x3-bw-thread, 177.167, GB, data-total 2x3-bw-thread, 0.683, nsecs, runtime/byte/thread 2x3-bw-thread, 1.464, GB/sec, thread-speed 2x3-bw-thread, 8.785, GB/sec, total-speed # Running 4x4-bw-thread, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1" 4x4-bw-thread, 20.576, secs, runtime-max/thread 4x4-bw-thread, 20.002, secs, runtime-min/thread 4x4-bw-thread, 20.312, secs, runtime-avg/thread 4x4-bw-thread, 1.394, %, spread-runtime/thread 4x4-bw-thread, 8.187, GB, data/thread 4x4-bw-thread, 130.997, GB, data-total 4x4-bw-thread, 2.513, nsecs, runtime/byte/thread 4x4-bw-thread, 0.398, GB/sec, thread-speed 4x4-bw-thread, 6.366, GB/sec, total-speed # Running 4x6-bw-thread, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1" 4x6-bw-thread, 21.007, secs, runtime-max/thread 4x6-bw-thread, 20.075, secs, runtime-min/thread 4x6-bw-thread, 20.573, secs, runtime-avg/thread 4x6-bw-thread, 2.219, %, spread-runtime/thread 4x6-bw-thread, 5.503, GB, data/thread 4x6-bw-thread, 132.070, GB, data-total 4x6-bw-thread, 3.817, nsecs, runtime/byte/thread 4x6-bw-thread, 0.262, GB/sec, thread-speed 4x6-bw-thread, 6.287, GB/sec, total-speed # Running 4x8-bw-thread, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1" 4x8-bw-thread, 21.986, secs, runtime-max/thread 4x8-bw-thread, 20.359, secs, runtime-min/thread 4x8-bw-thread, 21.300, secs, runtime-avg/thread 4x8-bw-thread, 3.701, %, spread-runtime/thread 4x8-bw-thread, 4.027, GB, data/thread 4x8-bw-thread, 128.849, GB, data-total 4x8-bw-thread, 5.460, nsecs, runtime/byte/thread 4x8-bw-thread, 0.183, GB/sec, thread-speed 4x8-bw-thread, 5.860, GB/sec, total-speed # Running 4x8-bw-thread-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 4x8-bw-thread-NOTHP, 21.155, secs, runtime-max/thread 4x8-bw-thread-NOTHP, 20.115, secs, runtime-min/thread 4x8-bw-thread-NOTHP, 20.705, secs, runtime-avg/thread 4x8-bw-thread-NOTHP, 2.459, %, spread-runtime/thread 4x8-bw-thread-NOTHP, 4.077, GB, data/thread 4x8-bw-thread-NOTHP, 130.460, GB, data-total 4x8-bw-thread-NOTHP, 5.189, nsecs, runtime/byte/thread 4x8-bw-thread-NOTHP, 0.193, GB/sec, thread-speed 4x8-bw-thread-NOTHP, 6.167, GB/sec, total-speed # Running 3x3-bw-thread, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1" 3x3-bw-thread, 20.211, secs, runtime-max/thread 3x3-bw-thread, 20.044, secs, runtime-min/thread 3x3-bw-thread, 20.127, secs, runtime-avg/thread 3x3-bw-thread, 0.413, %, spread-runtime/thread 3x3-bw-thread, 18.492, GB, data/thread 3x3-bw-thread, 166.430, GB, data-total 3x3-bw-thread, 1.093, nsecs, runtime/byte/thread 3x3-bw-thread, 0.915, GB/sec, thread-speed 3x3-bw-thread, 8.235, GB/sec, total-speed # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 21.244, secs, runtime-max/thread 5x5-bw-thread, 20.115, secs, runtime-min/thread 5x5-bw-thread, 20.873, secs, runtime-avg/thread 5x5-bw-thread, 2.657, %, spread-runtime/thread 5x5-bw-thread, 4.896, GB, data/thread 5x5-bw-thread, 122.407, GB, data-total 5x5-bw-thread, 4.339, nsecs, runtime/byte/thread 5x5-bw-thread, 0.230, GB/sec, thread-speed 5x5-bw-thread, 5.762, GB/sec, total-speed # Running 2x16-bw-thread, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1" 2x16-bw-thread, 21.854, secs, runtime-max/thread 2x16-bw-thread, 20.047, secs, runtime-min/thread 2x16-bw-thread, 21.157, secs, runtime-avg/thread 2x16-bw-thread, 4.135, %, spread-runtime/thread 2x16-bw-thread, 4.043, GB, data/thread 2x16-bw-thread, 129.386, GB, data-total 2x16-bw-thread, 5.405, nsecs, runtime/byte/thread 2x16-bw-thread, 0.185, GB/sec, thread-speed 2x16-bw-thread, 5.920, GB/sec, total-speed # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1" 1x32-bw-thread, 23.952, secs, runtime-max/thread 1x32-bw-thread, 20.470, secs, runtime-min/thread 1x32-bw-thread, 22.975, secs, runtime-avg/thread 1x32-bw-thread, 7.268, %, spread-runtime/thread 1x32-bw-thread, 4.362, GB, data/thread 1x32-bw-thread, 139.586, GB, data-total 1x32-bw-thread, 5.491, nsecs, runtime/byte/thread 1x32-bw-thread, 0.182, GB/sec, thread-speed 1x32-bw-thread, 5.828, GB/sec, total-speed # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1" numa02-bw, 19.990, secs, runtime-max/thread numa02-bw, 19.975, secs, runtime-min/thread numa02-bw, 19.995, secs, runtime-avg/thread numa02-bw, 0.037, %, spread-runtime/thread numa02-bw, 18.150, GB, data/thread numa02-bw, 580.794, GB, data-total numa02-bw, 1.101, nsecs, runtime/byte/thread numa02-bw, 0.908, GB/sec, thread-speed numa02-bw, 29.054, GB/sec, total-speed # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1" numa02-bw-NOTHP, 20.072, secs, runtime-max/thread numa02-bw-NOTHP, 19.965, secs, runtime-min/thread numa02-bw-NOTHP, 19.998, secs, runtime-avg/thread numa02-bw-NOTHP, 0.266, %, spread-runtime/thread numa02-bw-NOTHP, 16.975, GB, data/thread numa02-bw-NOTHP, 543.213, GB, data-total numa02-bw-NOTHP, 1.182, nsecs, runtime/byte/thread numa02-bw-NOTHP, 0.846, GB/sec, thread-speed numa02-bw-NOTHP, 27.064, GB/sec, total-speed # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1" numa01-bw-thread, 20.125, secs, runtime-max/thread numa01-bw-thread, 19.980, secs, runtime-min/thread numa01-bw-thread, 20.094, secs, runtime-avg/thread numa01-bw-thread, 0.361, %, spread-runtime/thread numa01-bw-thread, 12.791, GB, data/thread numa01-bw-thread, 409.297, GB, data-total numa01-bw-thread, 1.573, nsecs, runtime/byte/thread numa01-bw-thread, 0.636, GB/sec, thread-speed numa01-bw-thread, 20.338, GB/sec, total-speed # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1" numa01-bw-thread-NOTHP, 20.298, secs, runtime-max/thread numa01-bw-thread-NOTHP, 19.965, secs, runtime-min/thread numa01-bw-thread-NOTHP, 20.055, secs, runtime-avg/thread numa01-bw-thread-NOTHP, 0.820, %, spread-runtime/thread numa01-bw-thread-NOTHP, 11.752, GB, data/thread numa01-bw-thread-NOTHP, 376.078, GB, data-total numa01-bw-thread-NOTHP, 1.727, nsecs, runtime/byte/thread numa01-bw-thread-NOTHP, 0.579, GB/sec, thread-speed numa01-bw-thread-NOTHP, 18.528, GB/sec, total-speed # # Running test on: Linux vega 3.6.0+ #4 SMP Fri Dec 7 19:14:49 CET 2012 x86_64 x86_64 x86_64 GNU/Linux # # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local, 20.080, secs, runtime-max/thread RAM-bw-local, 20.073, secs, runtime-min/thread RAM-bw-local, 20.073, secs, runtime-avg/thread RAM-bw-local, 0.018, %, spread-runtime/thread RAM-bw-local, 170.725, GB, data/thread RAM-bw-local, 170.725, GB, data-total RAM-bw-local, 0.118, nsecs, runtime/byte/thread RAM-bw-local, 8.502, GB/sec, thread-speed RAM-bw-local, 8.502, GB/sec, total-speed # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1" RAM-bw-local-NOTHP, 20.112, secs, runtime-max/thread RAM-bw-local-NOTHP, 20.028, secs, runtime-min/thread RAM-bw-local-NOTHP, 20.028, secs, runtime-avg/thread RAM-bw-local-NOTHP, 0.209, %, spread-runtime/thread RAM-bw-local-NOTHP, 169.651, GB, data/thread RAM-bw-local-NOTHP, 169.651, GB, data-total RAM-bw-local-NOTHP, 0.119, nsecs, runtime/byte/thread RAM-bw-local-NOTHP, 8.435, GB/sec, thread-speed RAM-bw-local-NOTHP, 8.435, GB/sec, total-speed # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote, 20.101, secs, runtime-max/thread RAM-bw-remote, 20.093, secs, runtime-min/thread RAM-bw-remote, 20.093, secs, runtime-avg/thread RAM-bw-remote, 0.021, %, spread-runtime/thread RAM-bw-remote, 134.218, GB, data/thread RAM-bw-remote, 134.218, GB, data-total RAM-bw-remote, 0.150, nsecs, runtime/byte/thread RAM-bw-remote, 6.677, GB/sec, thread-speed RAM-bw-remote, 6.677, GB/sec, total-speed # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local-2x, 20.109, secs, runtime-max/thread RAM-bw-local-2x, 20.011, secs, runtime-min/thread RAM-bw-local-2x, 20.056, secs, runtime-avg/thread RAM-bw-local-2x, 0.243, %, spread-runtime/thread RAM-bw-local-2x, 135.291, GB, data/thread RAM-bw-local-2x, 270.583, GB, data-total RAM-bw-local-2x, 0.149, nsecs, runtime/byte/thread RAM-bw-local-2x, 6.728, GB/sec, thread-speed RAM-bw-local-2x, 13.456, GB/sec, total-speed # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote-2x, 20.292, secs, runtime-max/thread RAM-bw-remote-2x, 20.279, secs, runtime-min/thread RAM-bw-remote-2x, 20.281, secs, runtime-avg/thread RAM-bw-remote-2x, 0.034, %, spread-runtime/thread RAM-bw-remote-2x, 74.625, GB, data/thread RAM-bw-remote-2x, 149.250, GB, data-total RAM-bw-remote-2x, 0.272, nsecs, runtime/byte/thread RAM-bw-remote-2x, 3.677, GB/sec, thread-speed RAM-bw-remote-2x, 7.355, GB/sec, total-speed # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-cross, 20.177, secs, runtime-max/thread RAM-bw-cross, 20.158, secs, runtime-min/thread RAM-bw-cross, 20.163, secs, runtime-avg/thread RAM-bw-cross, 0.048, %, spread-runtime/thread RAM-bw-cross, 122.943, GB, data/thread RAM-bw-cross, 245.887, GB, data-total RAM-bw-cross, 0.164, nsecs, runtime/byte/thread RAM-bw-cross, 6.093, GB/sec, thread-speed RAM-bw-cross, 12.187, GB/sec, total-speed # Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1" 1x3-convergence, 0.224, secs, NUMA-convergence-latency 1x3-convergence, 0.224, secs, runtime-max/thread 1x3-convergence, 0.205, secs, runtime-min/thread 1x3-convergence, 0.214, secs, runtime-avg/thread 1x3-convergence, 4.078, %, spread-runtime/thread 1x3-convergence, 0.537, GB, data/thread 1x3-convergence, 1.611, GB, data-total 1x3-convergence, 0.417, nsecs, runtime/byte/thread 1x3-convergence, 2.401, GB/sec, thread-speed 1x3-convergence, 7.202, GB/sec, total-speed # Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1x4-convergence, 100.173, secs, NUMA-convergence-latency 1x4-convergence, 100.173, secs, runtime-max/thread 1x4-convergence, 100.026, secs, runtime-min/thread 1x4-convergence, 100.067, secs, runtime-avg/thread 1x4-convergence, 0.073, %, spread-runtime/thread 1x4-convergence, 162.672, GB, data/thread 1x4-convergence, 650.688, GB, data-total 1x4-convergence, 0.616, nsecs, runtime/byte/thread 1x4-convergence, 1.624, GB/sec, thread-speed 1x4-convergence, 6.496, GB/sec, total-speed # Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 1x6-convergence, 100.821, secs, NUMA-convergence-latency 1x6-convergence, 100.821, secs, runtime-max/thread 1x6-convergence, 100.428, secs, runtime-min/thread 1x6-convergence, 100.706, secs, runtime-avg/thread 1x6-convergence, 0.195, %, spread-runtime/thread 1x6-convergence, 99.111, GB, data/thread 1x6-convergence, 594.668, GB, data-total 1x6-convergence, 1.017, nsecs, runtime/byte/thread 1x6-convergence, 0.983, GB/sec, thread-speed 1x6-convergence, 5.898, GB/sec, total-speed # Running 2x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 2x3-convergence, 100.539, secs, NUMA-convergence-latency 2x3-convergence, 100.539, secs, runtime-max/thread 2x3-convergence, 100.015, secs, runtime-min/thread 2x3-convergence, 100.273, secs, runtime-avg/thread 2x3-convergence, 0.260, %, spread-runtime/thread 2x3-convergence, 147.954, GB, data/thread 2x3-convergence, 1331.587, GB, data-total 2x3-convergence, 0.680, nsecs, runtime/byte/thread 2x3-convergence, 1.472, GB/sec, thread-speed 2x3-convergence, 13.245, GB/sec, total-speed # Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 3x3-convergence, 100.463, secs, NUMA-convergence-latency 3x3-convergence, 100.463, secs, runtime-max/thread 3x3-convergence, 100.066, secs, runtime-min/thread 3x3-convergence, 100.216, secs, runtime-avg/thread 3x3-convergence, 0.198, %, spread-runtime/thread 3x3-convergence, 132.624, GB, data/thread 3x3-convergence, 1193.615, GB, data-total 3x3-convergence, 0.758, nsecs, runtime/byte/thread 3x3-convergence, 1.320, GB/sec, thread-speed 3x3-convergence, 11.881, GB/sec, total-speed # Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 4x4-convergence, 4.119, secs, NUMA-convergence-latency 4x4-convergence, 4.119, secs, runtime-max/thread 4x4-convergence, 3.751, secs, runtime-min/thread 4x4-convergence, 3.948, secs, runtime-avg/thread 4x4-convergence, 4.462, %, spread-runtime/thread 4x4-convergence, 1.980, GB, data/thread 4x4-convergence, 31.675, GB, data-total 4x4-convergence, 2.081, nsecs, runtime/byte/thread 4x4-convergence, 0.481, GB/sec, thread-speed 4x4-convergence, 7.690, GB/sec, total-speed # Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 4x4-convergence-NOTHP, 12.166, secs, NUMA-convergence-latency 4x4-convergence-NOTHP, 12.166, secs, runtime-max/thread 4x4-convergence-NOTHP, 11.801, secs, runtime-min/thread 4x4-convergence-NOTHP, 11.917, secs, runtime-avg/thread 4x4-convergence-NOTHP, 1.502, %, spread-runtime/thread 4x4-convergence-NOTHP, 5.234, GB, data/thread 4x4-convergence-NOTHP, 83.752, GB, data-total 4x4-convergence-NOTHP, 2.324, nsecs, runtime/byte/thread 4x4-convergence-NOTHP, 0.430, GB/sec, thread-speed 4x4-convergence-NOTHP, 6.884, GB/sec, total-speed # Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 4x6-convergence, 16.592, secs, NUMA-convergence-latency 4x6-convergence, 16.592, secs, runtime-max/thread 4x6-convergence, 15.407, secs, runtime-min/thread 4x6-convergence, 16.109, secs, runtime-avg/thread 4x6-convergence, 3.572, %, spread-runtime/thread 4x6-convergence, 6.729, GB, data/thread 4x6-convergence, 161.502, GB, data-total 4x6-convergence, 2.466, nsecs, runtime/byte/thread 4x6-convergence, 0.406, GB/sec, thread-speed 4x6-convergence, 9.734, GB/sec, total-speed # Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1" 4x8-convergence, 3.385, secs, NUMA-convergence-latency 4x8-convergence, 3.385, secs, runtime-max/thread 4x8-convergence, 1.465, secs, runtime-min/thread 4x8-convergence, 2.846, secs, runtime-avg/thread 4x8-convergence, 28.361, %, spread-runtime/thread 4x8-convergence, 0.638, GB, data/thread 4x8-convergence, 20.401, GB, data-total 4x8-convergence, 5.309, nsecs, runtime/byte/thread 4x8-convergence, 0.188, GB/sec, thread-speed 4x8-convergence, 6.028, GB/sec, total-speed # Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 8x4-convergence, 18.295, secs, NUMA-convergence-latency 8x4-convergence, 18.295, secs, runtime-max/thread 8x4-convergence, 16.808, secs, runtime-min/thread 8x4-convergence, 17.809, secs, runtime-avg/thread 8x4-convergence, 4.064, %, spread-runtime/thread 8x4-convergence, 3.406, GB, data/thread 8x4-convergence, 108.985, GB, data-total 8x4-convergence, 5.372, nsecs, runtime/byte/thread 8x4-convergence, 0.186, GB/sec, thread-speed 8x4-convergence, 5.957, GB/sec, total-speed # Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 8x4-convergence-NOTHP, 15.675, secs, NUMA-convergence-latency 8x4-convergence-NOTHP, 15.675, secs, runtime-max/thread 8x4-convergence-NOTHP, 14.861, secs, runtime-min/thread 8x4-convergence-NOTHP, 15.321, secs, runtime-avg/thread 8x4-convergence-NOTHP, 2.596, %, spread-runtime/thread 8x4-convergence-NOTHP, 5.302, GB, data/thread 8x4-convergence-NOTHP, 169.651, GB, data-total 8x4-convergence-NOTHP, 2.957, nsecs, runtime/byte/thread 8x4-convergence-NOTHP, 0.338, GB/sec, thread-speed 8x4-convergence-NOTHP, 10.823, GB/sec, total-speed # Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 3x1-convergence, 0.811, secs, NUMA-convergence-latency 3x1-convergence, 0.811, secs, runtime-max/thread 3x1-convergence, 0.739, secs, runtime-min/thread 3x1-convergence, 0.782, secs, runtime-avg/thread 3x1-convergence, 4.431, %, spread-runtime/thread 3x1-convergence, 1.969, GB, data/thread 3x1-convergence, 5.906, GB, data-total 3x1-convergence, 0.412, nsecs, runtime/byte/thread 3x1-convergence, 2.428, GB/sec, thread-speed 3x1-convergence, 7.284, GB/sec, total-speed # Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 4x1-convergence, 0.806, secs, NUMA-convergence-latency 4x1-convergence, 0.806, secs, runtime-max/thread 4x1-convergence, 0.728, secs, runtime-min/thread 4x1-convergence, 0.780, secs, runtime-avg/thread 4x1-convergence, 4.838, %, spread-runtime/thread 4x1-convergence, 1.476, GB, data/thread 4x1-convergence, 5.906, GB, data-total 4x1-convergence, 0.546, nsecs, runtime/byte/thread 4x1-convergence, 1.832, GB/sec, thread-speed 4x1-convergence, 7.329, GB/sec, total-speed # Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 8x1-convergence, 2.879, secs, NUMA-convergence-latency 8x1-convergence, 2.879, secs, runtime-max/thread 8x1-convergence, 2.737, secs, runtime-min/thread 8x1-convergence, 2.805, secs, runtime-avg/thread 8x1-convergence, 2.475, %, spread-runtime/thread 8x1-convergence, 3.288, GB, data/thread 8x1-convergence, 26.307, GB, data-total 8x1-convergence, 0.876, nsecs, runtime/byte/thread 8x1-convergence, 1.142, GB/sec, thread-speed 8x1-convergence, 9.137, GB/sec, total-speed # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1" 16x1-convergence, 2.484, secs, NUMA-convergence-latency 16x1-convergence, 2.484, secs, runtime-max/thread 16x1-convergence, 2.169, secs, runtime-min/thread 16x1-convergence, 2.376, secs, runtime-avg/thread 16x1-convergence, 6.353, %, spread-runtime/thread 16x1-convergence, 0.906, GB, data/thread 16x1-convergence, 14.496, GB, data-total 16x1-convergence, 2.742, nsecs, runtime/byte/thread 16x1-convergence, 0.365, GB/sec, thread-speed 16x1-convergence, 5.835, GB/sec, total-speed # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1" 32x1-convergence, 3.039, secs, NUMA-convergence-latency 32x1-convergence, 3.039, secs, runtime-max/thread 32x1-convergence, 2.755, secs, runtime-min/thread 32x1-convergence, 2.983, secs, runtime-avg/thread 32x1-convergence, 4.672, %, spread-runtime/thread 32x1-convergence, 0.579, GB, data/thread 32x1-convergence, 18.522, GB, data-total 32x1-convergence, 5.251, nsecs, runtime/byte/thread 32x1-convergence, 0.190, GB/sec, thread-speed 32x1-convergence, 6.094, GB/sec, total-speed # Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 2x1-bw-process, 20.217, secs, runtime-max/thread 2x1-bw-process, 20.126, secs, runtime-min/thread 2x1-bw-process, 20.168, secs, runtime-avg/thread 2x1-bw-process, 0.224, %, spread-runtime/thread 2x1-bw-process, 81.604, GB, data/thread 2x1-bw-process, 163.209, GB, data-total 2x1-bw-process, 0.248, nsecs, runtime/byte/thread 2x1-bw-process, 4.036, GB/sec, thread-speed 2x1-bw-process, 8.073, GB/sec, total-speed # Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 3x1-bw-process, 20.138, secs, runtime-max/thread 3x1-bw-process, 20.075, secs, runtime-min/thread 3x1-bw-process, 20.105, secs, runtime-avg/thread 3x1-bw-process, 0.156, %, spread-runtime/thread 3x1-bw-process, 84.468, GB, data/thread 3x1-bw-process, 253.403, GB, data-total 3x1-bw-process, 0.238, nsecs, runtime/byte/thread 3x1-bw-process, 4.194, GB/sec, thread-speed 3x1-bw-process, 12.583, GB/sec, total-speed # Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 4x1-bw-process, 20.143, secs, runtime-max/thread 4x1-bw-process, 20.052, secs, runtime-min/thread 4x1-bw-process, 20.079, secs, runtime-avg/thread 4x1-bw-process, 0.227, %, spread-runtime/thread 4x1-bw-process, 62.009, GB, data/thread 4x1-bw-process, 248.034, GB, data-total 4x1-bw-process, 0.325, nsecs, runtime/byte/thread 4x1-bw-process, 3.078, GB/sec, thread-speed 4x1-bw-process, 12.313, GB/sec, total-speed # Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1" 8x1-bw-process, 20.109, secs, runtime-max/thread 8x1-bw-process, 20.013, secs, runtime-min/thread 8x1-bw-process, 20.072, secs, runtime-avg/thread 8x1-bw-process, 0.238, %, spread-runtime/thread 8x1-bw-process, 50.869, GB, data/thread 8x1-bw-process, 406.948, GB, data-total 8x1-bw-process, 0.395, nsecs, runtime/byte/thread 8x1-bw-process, 2.530, GB/sec, thread-speed 8x1-bw-process, 20.237, GB/sec, total-speed # Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 8x1-bw-process-NOTHP, 20.203, secs, runtime-max/thread 8x1-bw-process-NOTHP, 20.033, secs, runtime-min/thread 8x1-bw-process-NOTHP, 20.071, secs, runtime-avg/thread 8x1-bw-process-NOTHP, 0.422, %, spread-runtime/thread 8x1-bw-process-NOTHP, 45.030, GB, data/thread 8x1-bw-process-NOTHP, 360.240, GB, data-total 8x1-bw-process-NOTHP, 0.449, nsecs, runtime/byte/thread 8x1-bw-process-NOTHP, 2.229, GB/sec, thread-speed 8x1-bw-process-NOTHP, 17.831, GB/sec, total-speed # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1" 16x1-bw-process, 20.271, secs, runtime-max/thread 16x1-bw-process, 20.021, secs, runtime-min/thread 16x1-bw-process, 20.175, secs, runtime-avg/thread 16x1-bw-process, 0.615, %, spread-runtime/thread 16x1-bw-process, 7.550, GB, data/thread 16x1-bw-process, 120.796, GB, data-total 16x1-bw-process, 2.685, nsecs, runtime/byte/thread 16x1-bw-process, 0.372, GB/sec, thread-speed 16x1-bw-process, 5.959, GB/sec, total-speed # Running 4x1-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1" 4x1-bw-thread, 20.052, secs, runtime-max/thread 4x1-bw-thread, 20.013, secs, runtime-min/thread 4x1-bw-thread, 20.030, secs, runtime-avg/thread 4x1-bw-thread, 0.097, %, spread-runtime/thread 4x1-bw-thread, 87.443, GB, data/thread 4x1-bw-thread, 349.771, GB, data-total 4x1-bw-thread, 0.229, nsecs, runtime/byte/thread 4x1-bw-thread, 4.361, GB/sec, thread-speed 4x1-bw-thread, 17.443, GB/sec, total-speed # Running 8x1-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1" 8x1-bw-thread, 20.067, secs, runtime-max/thread 8x1-bw-thread, 20.011, secs, runtime-min/thread 8x1-bw-thread, 20.038, secs, runtime-avg/thread 8x1-bw-thread, 0.140, %, spread-runtime/thread 8x1-bw-thread, 56.271, GB, data/thread 8x1-bw-thread, 450.166, GB, data-total 8x1-bw-thread, 0.357, nsecs, runtime/byte/thread 8x1-bw-thread, 2.804, GB/sec, thread-speed 8x1-bw-thread, 22.433, GB/sec, total-speed # Running 16x1-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1" 16x1-bw-thread, 20.029, secs, runtime-max/thread 16x1-bw-thread, 20.002, secs, runtime-min/thread 16x1-bw-thread, 20.020, secs, runtime-avg/thread 16x1-bw-thread, 0.067, %, spread-runtime/thread 16x1-bw-thread, 25.292, GB, data/thread 16x1-bw-thread, 404.666, GB, data-total 16x1-bw-thread, 0.792, nsecs, runtime/byte/thread 16x1-bw-thread, 1.263, GB/sec, thread-speed 16x1-bw-thread, 20.204, GB/sec, total-speed # Running 32x1-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1" 32x1-bw-thread, 19.989, secs, runtime-max/thread 32x1-bw-thread, 19.962, secs, runtime-min/thread 32x1-bw-thread, 20.004, secs, runtime-avg/thread 32x1-bw-thread, 0.068, %, spread-runtime/thread 32x1-bw-thread, 11.388, GB, data/thread 32x1-bw-thread, 364.401, GB, data-total 32x1-bw-thread, 1.755, nsecs, runtime/byte/thread 32x1-bw-thread, 0.570, GB/sec, thread-speed 32x1-bw-thread, 18.230, GB/sec, total-speed # Running 2x3-bw-thread, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1" 2x3-bw-thread, 20.190, secs, runtime-max/thread 2x3-bw-thread, 20.082, secs, runtime-min/thread 2x3-bw-thread, 20.110, secs, runtime-avg/thread 2x3-bw-thread, 0.268, %, spread-runtime/thread 2x3-bw-thread, 49.303, GB, data/thread 2x3-bw-thread, 295.816, GB, data-total 2x3-bw-thread, 0.410, nsecs, runtime/byte/thread 2x3-bw-thread, 2.442, GB/sec, thread-speed 2x3-bw-thread, 14.652, GB/sec, total-speed # Running 4x4-bw-thread, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1" 4x4-bw-thread, 20.307, secs, runtime-max/thread 4x4-bw-thread, 20.002, secs, runtime-min/thread 4x4-bw-thread, 20.202, secs, runtime-avg/thread 4x4-bw-thread, 0.750, %, spread-runtime/thread 4x4-bw-thread, 12.482, GB, data/thread 4x4-bw-thread, 199.716, GB, data-total 4x4-bw-thread, 1.627, nsecs, runtime/byte/thread 4x4-bw-thread, 0.615, GB/sec, thread-speed 4x4-bw-thread, 9.835, GB/sec, total-speed # Running 4x6-bw-thread, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1" 4x6-bw-thread, 20.431, secs, runtime-max/thread 4x6-bw-thread, 20.007, secs, runtime-min/thread 4x6-bw-thread, 20.283, secs, runtime-avg/thread 4x6-bw-thread, 1.036, %, spread-runtime/thread 4x6-bw-thread, 13.086, GB, data/thread 4x6-bw-thread, 314.069, GB, data-total 4x6-bw-thread, 1.561, nsecs, runtime/byte/thread 4x6-bw-thread, 0.641, GB/sec, thread-speed 4x6-bw-thread, 15.372, GB/sec, total-speed # Running 4x8-bw-thread, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1" 4x8-bw-thread, 20.543, secs, runtime-max/thread 4x8-bw-thread, 20.015, secs, runtime-min/thread 4x8-bw-thread, 20.324, secs, runtime-avg/thread 4x8-bw-thread, 1.287, %, spread-runtime/thread 4x8-bw-thread, 7.617, GB, data/thread 4x8-bw-thread, 243.739, GB, data-total 4x8-bw-thread, 2.697, nsecs, runtime/byte/thread 4x8-bw-thread, 0.371, GB/sec, thread-speed 4x8-bw-thread, 11.865, GB/sec, total-speed # Running 4x8-bw-thread-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 4x8-bw-thread-NOTHP, 20.661, secs, runtime-max/thread 4x8-bw-thread-NOTHP, 20.023, secs, runtime-min/thread 4x8-bw-thread-NOTHP, 20.292, secs, runtime-avg/thread 4x8-bw-thread-NOTHP, 1.546, %, spread-runtime/thread 4x8-bw-thread-NOTHP, 5.956, GB, data/thread 4x8-bw-thread-NOTHP, 190.589, GB, data-total 4x8-bw-thread-NOTHP, 3.469, nsecs, runtime/byte/thread 4x8-bw-thread-NOTHP, 0.288, GB/sec, thread-speed 4x8-bw-thread-NOTHP, 9.224, GB/sec, total-speed # Running 3x3-bw-thread, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1" 3x3-bw-thread, 20.310, secs, runtime-max/thread 3x3-bw-thread, 20.116, secs, runtime-min/thread 3x3-bw-thread, 20.202, secs, runtime-avg/thread 3x3-bw-thread, 0.480, %, spread-runtime/thread 3x3-bw-thread, 14.973, GB, data/thread 3x3-bw-thread, 134.755, GB, data-total 3x3-bw-thread, 1.356, nsecs, runtime/byte/thread 3x3-bw-thread, 0.737, GB/sec, thread-speed 3x3-bw-thread, 6.635, GB/sec, total-speed # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 20.578, secs, runtime-max/thread 5x5-bw-thread, 20.039, secs, runtime-min/thread 5x5-bw-thread, 20.379, secs, runtime-avg/thread 5x5-bw-thread, 1.309, %, spread-runtime/thread 5x5-bw-thread, 7.881, GB, data/thread 5x5-bw-thread, 197.032, GB, data-total 5x5-bw-thread, 2.611, nsecs, runtime/byte/thread 5x5-bw-thread, 0.383, GB/sec, thread-speed 5x5-bw-thread, 9.575, GB/sec, total-speed # Running 2x16-bw-thread, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1" 2x16-bw-thread, 21.581, secs, runtime-max/thread 2x16-bw-thread, 20.043, secs, runtime-min/thread 2x16-bw-thread, 20.958, secs, runtime-avg/thread 2x16-bw-thread, 3.564, %, spread-runtime/thread 2x16-bw-thread, 4.010, GB, data/thread 2x16-bw-thread, 128.312, GB, data-total 2x16-bw-thread, 5.382, nsecs, runtime/byte/thread 2x16-bw-thread, 0.186, GB/sec, thread-speed 2x16-bw-thread, 5.945, GB/sec, total-speed # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1" 1x32-bw-thread, 23.503, secs, runtime-max/thread 1x32-bw-thread, 21.850, secs, runtime-min/thread 1x32-bw-thread, 22.953, secs, runtime-avg/thread 1x32-bw-thread, 3.518, %, spread-runtime/thread 1x32-bw-thread, 4.295, GB, data/thread 1x32-bw-thread, 137.439, GB, data-total 1x32-bw-thread, 5.472, nsecs, runtime/byte/thread 1x32-bw-thread, 0.183, GB/sec, thread-speed 1x32-bw-thread, 5.848, GB/sec, total-speed # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1" numa02-bw, 19.948, secs, runtime-max/thread numa02-bw, 19.921, secs, runtime-min/thread numa02-bw, 19.983, secs, runtime-avg/thread numa02-bw, 0.068, %, spread-runtime/thread numa02-bw, 15.425, GB, data/thread numa02-bw, 493.586, GB, data-total numa02-bw, 1.293, nsecs, runtime/byte/thread numa02-bw, 0.773, GB/sec, thread-speed numa02-bw, 24.744, GB/sec, total-speed # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1" numa02-bw-NOTHP, 20.055, secs, runtime-max/thread numa02-bw-NOTHP, 19.948, secs, runtime-min/thread numa02-bw-NOTHP, 19.991, secs, runtime-avg/thread numa02-bw-NOTHP, 0.267, %, spread-runtime/thread numa02-bw-NOTHP, 12.795, GB, data/thread numa02-bw-NOTHP, 409.431, GB, data-total numa02-bw-NOTHP, 1.567, nsecs, runtime/byte/thread numa02-bw-NOTHP, 0.638, GB/sec, thread-speed numa02-bw-NOTHP, 20.415, GB/sec, total-speed # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1" numa01-bw-thread, 20.107, secs, runtime-max/thread numa01-bw-thread, 19.978, secs, runtime-min/thread numa01-bw-thread, 20.067, secs, runtime-avg/thread numa01-bw-thread, 0.320, %, spread-runtime/thread numa01-bw-thread, 9.532, GB, data/thread numa01-bw-thread, 305.010, GB, data-total numa01-bw-thread, 2.110, nsecs, runtime/byte/thread numa01-bw-thread, 0.474, GB/sec, thread-speed numa01-bw-thread, 15.169, GB/sec, total-speed # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1" numa01-bw-thread-NOTHP, 20.319, secs, runtime-max/thread numa01-bw-thread-NOTHP, 19.978, secs, runtime-min/thread numa01-bw-thread-NOTHP, 20.076, secs, runtime-avg/thread numa01-bw-thread-NOTHP, 0.839, %, spread-runtime/thread numa01-bw-thread-NOTHP, 7.688, GB, data/thread numa01-bw-thread-NOTHP, 246.021, GB, data-total numa01-bw-thread-NOTHP, 2.643, nsecs, runtime/byte/thread numa01-bw-thread-NOTHP, 0.378, GB/sec, thread-speed numa01-bw-thread-NOTHP, 12.108, GB/sec, total-speed # # Running test on: Linux vega 3.7.0-rc8+ #2 SMP Fri Dec 7 02:46:02 CET 2012 x86_64 x86_64 x86_64 GNU/Linux # # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local, 20.132, secs, runtime-max/thread RAM-bw-local, 20.123, secs, runtime-min/thread RAM-bw-local, 20.123, secs, runtime-avg/thread RAM-bw-local, 0.024, %, spread-runtime/thread RAM-bw-local, 171.799, GB, data/thread RAM-bw-local, 171.799, GB, data-total RAM-bw-local, 0.117, nsecs, runtime/byte/thread RAM-bw-local, 8.534, GB/sec, thread-speed RAM-bw-local, 8.534, GB/sec, total-speed # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1" RAM-bw-local-NOTHP, 20.133, secs, runtime-max/thread RAM-bw-local-NOTHP, 20.047, secs, runtime-min/thread RAM-bw-local-NOTHP, 20.047, secs, runtime-avg/thread RAM-bw-local-NOTHP, 0.214, %, spread-runtime/thread RAM-bw-local-NOTHP, 169.651, GB, data/thread RAM-bw-local-NOTHP, 169.651, GB, data-total RAM-bw-local-NOTHP, 0.119, nsecs, runtime/byte/thread RAM-bw-local-NOTHP, 8.427, GB/sec, thread-speed RAM-bw-local-NOTHP, 8.427, GB/sec, total-speed # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote, 20.127, secs, runtime-max/thread RAM-bw-remote, 20.117, secs, runtime-min/thread RAM-bw-remote, 20.117, secs, runtime-avg/thread RAM-bw-remote, 0.025, %, spread-runtime/thread RAM-bw-remote, 134.218, GB, data/thread RAM-bw-remote, 134.218, GB, data-total RAM-bw-remote, 0.150, nsecs, runtime/byte/thread RAM-bw-remote, 6.669, GB/sec, thread-speed RAM-bw-remote, 6.669, GB/sec, total-speed # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-local-2x, 20.139, secs, runtime-max/thread RAM-bw-local-2x, 20.011, secs, runtime-min/thread RAM-bw-local-2x, 20.070, secs, runtime-avg/thread RAM-bw-local-2x, 0.319, %, spread-runtime/thread RAM-bw-local-2x, 130.997, GB, data/thread RAM-bw-local-2x, 261.993, GB, data-total RAM-bw-local-2x, 0.154, nsecs, runtime/byte/thread RAM-bw-local-2x, 6.505, GB/sec, thread-speed RAM-bw-local-2x, 13.009, GB/sec, total-speed # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-remote-2x, 20.177, secs, runtime-max/thread RAM-bw-remote-2x, 20.083, secs, runtime-min/thread RAM-bw-remote-2x, 20.125, secs, runtime-avg/thread RAM-bw-remote-2x, 0.233, %, spread-runtime/thread RAM-bw-remote-2x, 74.088, GB, data/thread RAM-bw-remote-2x, 148.176, GB, data-total RAM-bw-remote-2x, 0.272, nsecs, runtime/byte/thread RAM-bw-remote-2x, 3.672, GB/sec, thread-speed RAM-bw-remote-2x, 7.344, GB/sec, total-speed # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" RAM-bw-cross, 20.122, secs, runtime-max/thread RAM-bw-cross, 20.094, secs, runtime-min/thread RAM-bw-cross, 20.103, secs, runtime-avg/thread RAM-bw-cross, 0.070, %, spread-runtime/thread RAM-bw-cross, 121.870, GB, data/thread RAM-bw-cross, 243.739, GB, data-total RAM-bw-cross, 0.165, nsecs, runtime/byte/thread RAM-bw-cross, 6.057, GB/sec, thread-speed RAM-bw-cross, 12.113, GB/sec, total-speed # Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1" 1x3-convergence, 2.333, secs, NUMA-convergence-latency 1x3-convergence, 2.333, secs, runtime-max/thread 1x3-convergence, 2.304, secs, runtime-min/thread 1x3-convergence, 2.313, secs, runtime-avg/thread 1x3-convergence, 0.620, %, spread-runtime/thread 1x3-convergence, 7.516, GB, data/thread 1x3-convergence, 22.549, GB, data-total 1x3-convergence, 0.310, nsecs, runtime/byte/thread 1x3-convergence, 3.222, GB/sec, thread-speed 1x3-convergence, 9.665, GB/sec, total-speed # Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1x4-convergence, 2.057, secs, NUMA-convergence-latency 1x4-convergence, 2.057, secs, runtime-max/thread 1x4-convergence, 1.958, secs, runtime-min/thread 1x4-convergence, 1.998, secs, runtime-avg/thread 1x4-convergence, 2.403, %, spread-runtime/thread 1x4-convergence, 4.429, GB, data/thread 1x4-convergence, 17.717, GB, data-total 1x4-convergence, 0.464, nsecs, runtime/byte/thread 1x4-convergence, 2.154, GB/sec, thread-speed 1x4-convergence, 8.614, GB/sec, total-speed # Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 1x6-convergence, 7.327, secs, NUMA-convergence-latency 1x6-convergence, 7.327, secs, runtime-max/thread 1x6-convergence, 6.879, secs, runtime-min/thread 1x6-convergence, 7.187, secs, runtime-avg/thread 1x6-convergence, 3.063, %, spread-runtime/thread 1x6-convergence, 11.052, GB, data/thread 1x6-convergence, 66.312, GB, data-total 1x6-convergence, 0.663, nsecs, runtime/byte/thread 1x6-convergence, 1.508, GB/sec, thread-speed 1x6-convergence, 9.050, GB/sec, total-speed # Running 2x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 2x3-convergence, 4.086, secs, NUMA-convergence-latency 2x3-convergence, 4.086, secs, runtime-max/thread 2x3-convergence, 3.779, secs, runtime-min/thread 2x3-convergence, 3.960, secs, runtime-avg/thread 2x3-convergence, 3.761, %, spread-runtime/thread 2x3-convergence, 6.774, GB, data/thread 2x3-convergence, 60.964, GB, data-total 2x3-convergence, 0.603, nsecs, runtime/byte/thread 2x3-convergence, 1.658, GB/sec, thread-speed 2x3-convergence, 14.920, GB/sec, total-speed # Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 3x3-convergence, 7.627, secs, NUMA-convergence-latency 3x3-convergence, 7.627, secs, runtime-max/thread 3x3-convergence, 7.380, secs, runtime-min/thread 3x3-convergence, 7.504, secs, runtime-avg/thread 3x3-convergence, 1.624, %, spread-runtime/thread 3x3-convergence, 15.093, GB, data/thread 3x3-convergence, 135.833, GB, data-total 3x3-convergence, 0.505, nsecs, runtime/byte/thread 3x3-convergence, 1.979, GB/sec, thread-speed 3x3-convergence, 17.809, GB/sec, total-speed # Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 4x4-convergence, 7.381, secs, NUMA-convergence-latency 4x4-convergence, 7.381, secs, runtime-max/thread 4x4-convergence, 7.149, secs, runtime-min/thread 4x4-convergence, 7.277, secs, runtime-avg/thread 4x4-convergence, 1.569, %, spread-runtime/thread 4x4-convergence, 7.181, GB, data/thread 4x4-convergence, 114.890, GB, data-total 4x4-convergence, 1.028, nsecs, runtime/byte/thread 4x4-convergence, 0.973, GB/sec, thread-speed 4x4-convergence, 15.566, GB/sec, total-speed # Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 4x4-convergence-NOTHP, 9.200, secs, NUMA-convergence-latency 4x4-convergence-NOTHP, 9.200, secs, runtime-max/thread 4x4-convergence-NOTHP, 8.944, secs, runtime-min/thread 4x4-convergence-NOTHP, 9.047, secs, runtime-avg/thread 4x4-convergence-NOTHP, 1.391, %, spread-runtime/thread 4x4-convergence-NOTHP, 11.778, GB, data/thread 4x4-convergence-NOTHP, 188.442, GB, data-total 4x4-convergence-NOTHP, 0.781, nsecs, runtime/byte/thread 4x4-convergence-NOTHP, 1.280, GB/sec, thread-speed 4x4-convergence-NOTHP, 20.483, GB/sec, total-speed # Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 4x6-convergence, 11.664, secs, NUMA-convergence-latency 4x6-convergence, 11.664, secs, runtime-max/thread 4x6-convergence, 11.155, secs, runtime-min/thread 4x6-convergence, 11.420, secs, runtime-avg/thread 4x6-convergence, 2.180, %, spread-runtime/thread 4x6-convergence, 11.319, GB, data/thread 4x6-convergence, 271.665, GB, data-total 4x6-convergence, 1.030, nsecs, runtime/byte/thread 4x6-convergence, 0.970, GB/sec, thread-speed 4x6-convergence, 23.292, GB/sec, total-speed # Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1" 4x8-convergence, 3.880, secs, NUMA-convergence-latency 4x8-convergence, 3.880, secs, runtime-max/thread 4x8-convergence, 3.613, secs, runtime-min/thread 4x8-convergence, 3.784, secs, runtime-avg/thread 4x8-convergence, 3.440, %, spread-runtime/thread 4x8-convergence, 2.047, GB, data/thread 4x8-convergence, 65.498, GB, data-total 4x8-convergence, 1.896, nsecs, runtime/byte/thread 4x8-convergence, 0.528, GB/sec, thread-speed 4x8-convergence, 16.882, GB/sec, total-speed # Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 8x4-convergence, 8.938, secs, NUMA-convergence-latency 8x4-convergence, 8.938, secs, runtime-max/thread 8x4-convergence, 8.556, secs, runtime-min/thread 8x4-convergence, 8.744, secs, runtime-avg/thread 8x4-convergence, 2.135, %, spread-runtime/thread 8x4-convergence, 4.396, GB, data/thread 8x4-convergence, 140.660, GB, data-total 8x4-convergence, 2.033, nsecs, runtime/byte/thread 8x4-convergence, 0.492, GB/sec, thread-speed 8x4-convergence, 15.738, GB/sec, total-speed # Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 8x4-convergence-NOTHP, 12.123, secs, NUMA-convergence-latency 8x4-convergence-NOTHP, 12.123, secs, runtime-max/thread 8x4-convergence-NOTHP, 11.749, secs, runtime-min/thread 8x4-convergence-NOTHP, 11.936, secs, runtime-avg/thread 8x4-convergence-NOTHP, 1.542, %, spread-runtime/thread 8x4-convergence-NOTHP, 4.480, GB, data/thread 8x4-convergence-NOTHP, 143.345, GB, data-total 8x4-convergence-NOTHP, 2.706, nsecs, runtime/byte/thread 8x4-convergence-NOTHP, 0.370, GB/sec, thread-speed 8x4-convergence-NOTHP, 11.824, GB/sec, total-speed # Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 3x1-convergence, 0.879, secs, NUMA-convergence-latency 3x1-convergence, 0.879, secs, runtime-max/thread 3x1-convergence, 0.810, secs, runtime-min/thread 3x1-convergence, 0.839, secs, runtime-avg/thread 3x1-convergence, 3.911, %, spread-runtime/thread 3x1-convergence, 2.326, GB, data/thread 3x1-convergence, 6.979, GB, data-total 3x1-convergence, 0.378, nsecs, runtime/byte/thread 3x1-convergence, 2.647, GB/sec, thread-speed 3x1-convergence, 7.941, GB/sec, total-speed # Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 4x1-convergence, 0.685, secs, NUMA-convergence-latency 4x1-convergence, 0.685, secs, runtime-max/thread 4x1-convergence, 0.617, secs, runtime-min/thread 4x1-convergence, 0.650, secs, runtime-avg/thread 4x1-convergence, 4.967, %, spread-runtime/thread 4x1-convergence, 1.476, GB, data/thread 4x1-convergence, 5.906, GB, data-total 4x1-convergence, 0.464, nsecs, runtime/byte/thread 4x1-convergence, 2.154, GB/sec, thread-speed 4x1-convergence, 8.616, GB/sec, total-speed # Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 8x1-convergence, 1.158, secs, NUMA-convergence-latency 8x1-convergence, 1.158, secs, runtime-max/thread 8x1-convergence, 1.010, secs, runtime-min/thread 8x1-convergence, 1.060, secs, runtime-avg/thread 8x1-convergence, 6.396, %, spread-runtime/thread 8x1-convergence, 1.745, GB, data/thread 8x1-convergence, 13.959, GB, data-total 8x1-convergence, 0.664, nsecs, runtime/byte/thread 8x1-convergence, 1.507, GB/sec, thread-speed 8x1-convergence, 12.054, GB/sec, total-speed # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1" 16x1-convergence, 2.010, secs, NUMA-convergence-latency 16x1-convergence, 2.010, secs, runtime-max/thread 16x1-convergence, 1.939, secs, runtime-min/thread 16x1-convergence, 1.991, secs, runtime-avg/thread 16x1-convergence, 1.760, %, spread-runtime/thread 16x1-convergence, 2.668, GB, data/thread 16x1-convergence, 42.681, GB, data-total 16x1-convergence, 0.753, nsecs, runtime/byte/thread 16x1-convergence, 1.327, GB/sec, thread-speed 16x1-convergence, 21.237, GB/sec, total-speed # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1" 32x1-convergence, 1.946, secs, NUMA-convergence-latency 32x1-convergence, 1.946, secs, runtime-max/thread 32x1-convergence, 1.850, secs, runtime-min/thread 32x1-convergence, 1.946, secs, runtime-avg/thread 32x1-convergence, 2.479, %, spread-runtime/thread 32x1-convergence, 1.242, GB, data/thread 32x1-convergence, 39.728, GB, data-total 32x1-convergence, 1.568, nsecs, runtime/byte/thread 32x1-convergence, 0.638, GB/sec, thread-speed 32x1-convergence, 20.410, GB/sec, total-speed # Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 2x1-bw-process, 20.146, secs, runtime-max/thread 2x1-bw-process, 20.068, secs, runtime-min/thread 2x1-bw-process, 20.102, secs, runtime-avg/thread 2x1-bw-process, 0.193, %, spread-runtime/thread 2x1-bw-process, 97.174, GB, data/thread 2x1-bw-process, 194.347, GB, data-total 2x1-bw-process, 0.207, nsecs, runtime/byte/thread 2x1-bw-process, 4.824, GB/sec, thread-speed 2x1-bw-process, 9.647, GB/sec, total-speed # Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 3x1-bw-process, 20.177, secs, runtime-max/thread 3x1-bw-process, 20.127, secs, runtime-min/thread 3x1-bw-process, 20.146, secs, runtime-avg/thread 3x1-bw-process, 0.126, %, spread-runtime/thread 3x1-bw-process, 97.711, GB, data/thread 3x1-bw-process, 293.132, GB, data-total 3x1-bw-process, 0.207, nsecs, runtime/byte/thread 3x1-bw-process, 4.843, GB/sec, thread-speed 3x1-bw-process, 14.528, GB/sec, total-speed # Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 4x1-bw-process, 20.165, secs, runtime-max/thread 4x1-bw-process, 20.025, secs, runtime-min/thread 4x1-bw-process, 20.078, secs, runtime-avg/thread 4x1-bw-process, 0.348, %, spread-runtime/thread 4x1-bw-process, 95.295, GB, data/thread 4x1-bw-process, 381.178, GB, data-total 4x1-bw-process, 0.212, nsecs, runtime/byte/thread 4x1-bw-process, 4.726, GB/sec, thread-speed 4x1-bw-process, 18.903, GB/sec, total-speed # Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1" 8x1-bw-process, 20.131, secs, runtime-max/thread 8x1-bw-process, 20.066, secs, runtime-min/thread 8x1-bw-process, 20.090, secs, runtime-avg/thread 8x1-bw-process, 0.161, %, spread-runtime/thread 8x1-bw-process, 67.512, GB, data/thread 8x1-bw-process, 540.092, GB, data-total 8x1-bw-process, 0.298, nsecs, runtime/byte/thread 8x1-bw-process, 3.354, GB/sec, thread-speed 8x1-bw-process, 26.829, GB/sec, total-speed # Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 8x1-bw-process-NOTHP, 20.208, secs, runtime-max/thread 8x1-bw-process-NOTHP, 20.002, secs, runtime-min/thread 8x1-bw-process-NOTHP, 20.067, secs, runtime-avg/thread 8x1-bw-process-NOTHP, 0.509, %, spread-runtime/thread 8x1-bw-process-NOTHP, 56.170, GB, data/thread 8x1-bw-process-NOTHP, 449.361, GB, data-total 8x1-bw-process-NOTHP, 0.360, nsecs, runtime/byte/thread 8x1-bw-process-NOTHP, 2.780, GB/sec, thread-speed 8x1-bw-process-NOTHP, 22.237, GB/sec, total-speed # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1" 16x1-bw-process, 20.068, secs, runtime-max/thread 16x1-bw-process, 20.014, secs, runtime-min/thread 16x1-bw-process, 20.042, secs, runtime-avg/thread 16x1-bw-process, 0.136, %, spread-runtime/thread 16x1-bw-process, 36.742, GB, data/thread 16x1-bw-process, 587.874, GB, data-total 16x1-bw-process, 0.546, nsecs, runtime/byte/thread 16x1-bw-process, 1.831, GB/sec, thread-speed 16x1-bw-process, 29.294, GB/sec, total-speed # Running 4x1-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1" 4x1-bw-thread, 20.053, secs, runtime-max/thread 4x1-bw-thread, 20.003, secs, runtime-min/thread 4x1-bw-thread, 20.025, secs, runtime-avg/thread 4x1-bw-thread, 0.123, %, spread-runtime/thread 4x1-bw-thread, 96.704, GB, data/thread 4x1-bw-thread, 386.815, GB, data-total 4x1-bw-thread, 0.207, nsecs, runtime/byte/thread 4x1-bw-thread, 4.822, GB/sec, thread-speed 4x1-bw-thread, 19.290, GB/sec, total-speed # Running 8x1-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1" 8x1-bw-thread, 20.068, secs, runtime-max/thread 8x1-bw-thread, 20.004, secs, runtime-min/thread 8x1-bw-thread, 20.031, secs, runtime-avg/thread 8x1-bw-thread, 0.160, %, spread-runtime/thread 8x1-bw-thread, 66.203, GB, data/thread 8x1-bw-thread, 529.623, GB, data-total 8x1-bw-thread, 0.303, nsecs, runtime/byte/thread 8x1-bw-thread, 3.299, GB/sec, thread-speed 8x1-bw-thread, 26.391, GB/sec, total-speed # Running 16x1-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1" 16x1-bw-thread, 20.044, secs, runtime-max/thread 16x1-bw-thread, 20.007, secs, runtime-min/thread 16x1-bw-thread, 20.029, secs, runtime-avg/thread 16x1-bw-thread, 0.092, %, spread-runtime/thread 16x1-bw-thread, 37.027, GB, data/thread 16x1-bw-thread, 592.437, GB, data-total 16x1-bw-thread, 0.541, nsecs, runtime/byte/thread 16x1-bw-thread, 1.847, GB/sec, thread-speed 16x1-bw-thread, 29.557, GB/sec, total-speed # Running 32x1-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1" 32x1-bw-thread, 20.029, secs, runtime-max/thread 32x1-bw-thread, 19.975, secs, runtime-min/thread 32x1-bw-thread, 20.015, secs, runtime-avg/thread 32x1-bw-thread, 0.134, %, spread-runtime/thread 32x1-bw-thread, 18.923, GB, data/thread 32x1-bw-thread, 605.523, GB, data-total 32x1-bw-thread, 1.058, nsecs, runtime/byte/thread 32x1-bw-thread, 0.945, GB/sec, thread-speed 32x1-bw-thread, 30.232, GB/sec, total-speed # Running 2x3-bw-thread, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1" 2x3-bw-thread, 20.176, secs, runtime-max/thread 2x3-bw-thread, 20.072, secs, runtime-min/thread 2x3-bw-thread, 20.136, secs, runtime-avg/thread 2x3-bw-thread, 0.257, %, spread-runtime/thread 2x3-bw-thread, 51.540, GB, data/thread 2x3-bw-thread, 309.238, GB, data-total 2x3-bw-thread, 0.391, nsecs, runtime/byte/thread 2x3-bw-thread, 2.555, GB/sec, thread-speed 2x3-bw-thread, 15.327, GB/sec, total-speed # Running 4x4-bw-thread, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1" 4x4-bw-thread, 20.183, secs, runtime-max/thread 4x4-bw-thread, 20.013, secs, runtime-min/thread 4x4-bw-thread, 20.086, secs, runtime-avg/thread 4x4-bw-thread, 0.421, %, spread-runtime/thread 4x4-bw-thread, 35.266, GB, data/thread 4x4-bw-thread, 564.251, GB, data-total 4x4-bw-thread, 0.572, nsecs, runtime/byte/thread 4x4-bw-thread, 1.747, GB/sec, thread-speed 4x4-bw-thread, 27.957, GB/sec, total-speed # Running 4x6-bw-thread, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1" 4x6-bw-thread, 20.298, secs, runtime-max/thread 4x6-bw-thread, 20.061, secs, runtime-min/thread 4x6-bw-thread, 20.184, secs, runtime-avg/thread 4x6-bw-thread, 0.584, %, spread-runtime/thread 4x6-bw-thread, 23.578, GB, data/thread 4x6-bw-thread, 565.862, GB, data-total 4x6-bw-thread, 0.861, nsecs, runtime/byte/thread 4x6-bw-thread, 1.162, GB/sec, thread-speed 4x6-bw-thread, 27.877, GB/sec, total-speed # Running 4x8-bw-thread, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1" 4x8-bw-thread, 20.350, secs, runtime-max/thread 4x8-bw-thread, 20.004, secs, runtime-min/thread 4x8-bw-thread, 20.190, secs, runtime-avg/thread 4x8-bw-thread, 0.851, %, spread-runtime/thread 4x8-bw-thread, 18.086, GB, data/thread 4x8-bw-thread, 578.747, GB, data-total 4x8-bw-thread, 1.125, nsecs, runtime/byte/thread 4x8-bw-thread, 0.889, GB/sec, thread-speed 4x8-bw-thread, 28.439, GB/sec, total-speed # Running 4x8-bw-thread-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 4x8-bw-thread-NOTHP, 20.411, secs, runtime-max/thread 4x8-bw-thread-NOTHP, 19.990, secs, runtime-min/thread 4x8-bw-thread-NOTHP, 20.246, secs, runtime-avg/thread 4x8-bw-thread-NOTHP, 1.032, %, spread-runtime/thread 4x8-bw-thread-NOTHP, 15.989, GB, data/thread 4x8-bw-thread-NOTHP, 511.638, GB, data-total 4x8-bw-thread-NOTHP, 1.277, nsecs, runtime/byte/thread 4x8-bw-thread-NOTHP, 0.783, GB/sec, thread-speed 4x8-bw-thread-NOTHP, 25.067, GB/sec, total-speed # Running 3x3-bw-thread, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1" 3x3-bw-thread, 20.170, secs, runtime-max/thread 3x3-bw-thread, 20.050, secs, runtime-min/thread 3x3-bw-thread, 20.109, secs, runtime-avg/thread 3x3-bw-thread, 0.299, %, spread-runtime/thread 3x3-bw-thread, 48.318, GB, data/thread 3x3-bw-thread, 434.865, GB, data-total 3x3-bw-thread, 0.417, nsecs, runtime/byte/thread 3x3-bw-thread, 2.396, GB/sec, thread-speed 3x3-bw-thread, 21.560, GB/sec, total-speed # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 20.276, secs, runtime-max/thread 5x5-bw-thread, 20.004, secs, runtime-min/thread 5x5-bw-thread, 20.155, secs, runtime-avg/thread 5x5-bw-thread, 0.671, %, spread-runtime/thread 5x5-bw-thread, 21.153, GB, data/thread 5x5-bw-thread, 528.818, GB, data-total 5x5-bw-thread, 0.959, nsecs, runtime/byte/thread 5x5-bw-thread, 1.043, GB/sec, thread-speed 5x5-bw-thread, 26.081, GB/sec, total-speed # Running 2x16-bw-thread, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1" 2x16-bw-thread, 20.465, secs, runtime-max/thread 2x16-bw-thread, 20.004, secs, runtime-min/thread 2x16-bw-thread, 20.284, secs, runtime-avg/thread 2x16-bw-thread, 1.127, %, spread-runtime/thread 2x16-bw-thread, 14.881, GB, data/thread 2x16-bw-thread, 476.204, GB, data-total 2x16-bw-thread, 1.375, nsecs, runtime/byte/thread 2x16-bw-thread, 0.727, GB/sec, thread-speed 2x16-bw-thread, 23.269, GB/sec, total-speed # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1" 1x32-bw-thread, 21.944, secs, runtime-max/thread 1x32-bw-thread, 20.031, secs, runtime-min/thread 1x32-bw-thread, 20.878, secs, runtime-avg/thread 1x32-bw-thread, 4.358, %, spread-runtime/thread 1x32-bw-thread, 13.019, GB, data/thread 1x32-bw-thread, 416.612, GB, data-total 1x32-bw-thread, 1.686, nsecs, runtime/byte/thread 1x32-bw-thread, 0.593, GB/sec, thread-speed 1x32-bw-thread, 18.985, GB/sec, total-speed # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1" numa02-bw, 20.000, secs, runtime-max/thread numa02-bw, 19.967, secs, runtime-min/thread numa02-bw, 19.994, secs, runtime-avg/thread numa02-bw, 0.081, %, spread-runtime/thread numa02-bw, 19.644, GB, data/thread numa02-bw, 628.609, GB, data-total numa02-bw, 1.018, nsecs, runtime/byte/thread numa02-bw, 0.982, GB/sec, thread-speed numa02-bw, 31.431, GB/sec, total-speed # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1" numa02-bw-NOTHP, 20.062, secs, runtime-max/thread numa02-bw-NOTHP, 19.940, secs, runtime-min/thread numa02-bw-NOTHP, 19.988, secs, runtime-avg/thread numa02-bw-NOTHP, 0.304, %, spread-runtime/thread numa02-bw-NOTHP, 18.246, GB, data/thread numa02-bw-NOTHP, 583.881, GB, data-total numa02-bw-NOTHP, 1.100, nsecs, runtime/byte/thread numa02-bw-NOTHP, 0.909, GB/sec, thread-speed numa02-bw-NOTHP, 29.104, GB/sec, total-speed # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1" numa01-bw-thread, 20.106, secs, runtime-max/thread numa01-bw-thread, 19.989, secs, runtime-min/thread numa01-bw-thread, 20.052, secs, runtime-avg/thread numa01-bw-thread, 0.293, %, spread-runtime/thread numa01-bw-thread, 17.975, GB, data/thread numa01-bw-thread, 575.190, GB, data-total numa01-bw-thread, 1.119, nsecs, runtime/byte/thread numa01-bw-thread, 0.894, GB/sec, thread-speed numa01-bw-thread, 28.607, GB/sec, total-speed # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1" numa01-bw-thread-NOTHP, 20.391, secs, runtime-max/thread numa01-bw-thread-NOTHP, 20.010, secs, runtime-min/thread numa01-bw-thread-NOTHP, 20.085, secs, runtime-avg/thread numa01-bw-thread-NOTHP, 0.936, %, spread-runtime/thread numa01-bw-thread-NOTHP, 13.457, GB, data/thread numa01-bw-thread-NOTHP, 430.638, GB, data-total numa01-bw-thread-NOTHP, 1.515, nsecs, runtime/byte/thread numa01-bw-thread-NOTHP, 0.660, GB/sec, thread-speed numa01-bw-thread-NOTHP, 21.119, GB/sec, total-speed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>