On 07/14/2017 03:16 PM, daniel.m.jordan@xxxxxxxxxx wrote:
> Machine:  Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz, 288 cpus, 1T memory
> Test:     Clear a range of gigantic pages
>
> nthread   speedup   size (GiB)   min time (s)   stdev
>       1                    100          41.13    0.03
>       2     2.03x          100          20.26    0.14
>       4     4.28x          100           9.62    0.09
>       8     8.39x          100           4.90    0.05
>      16    10.44x          100           3.94    0.03
...
>       1                    800         434.91    1.81
>       2     2.54x          800         170.97    1.46
>       4     4.98x          800          87.38    1.91
>       8    10.15x          800          42.86    2.59
>      16    12.99x          800          33.48    0.83

What was the actual test here?  Did you just use sysfs to allocate 800GB
of 1GB huge pages?

This test should be entirely memory-bandwidth-limited, right?  Are you
contending here that a single core can only use 1/10th of the memory
bandwidth when clearing a page?

Or does all the gain here come because we are round-robin-allocating the
pages across all 8 NUMA nodes' memory controllers, so the speedup is from
not doing the clearing across the interconnect?
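
For concreteness, this is the kind of harness I have in mind: reserve the
pool through sysfs, then time the first-touch faults of an
mmap(MAP_HUGETLB | MAP_HUGE_1GB) mapping, which is when the kernel clears
each gigantic page.  Just a sketch of my guess at the test, not your
actual program:

/*
 * Guess at the harness, not the actual test.  Reserve the pool first,
 * e.g.:
 *
 *   echo 800 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
 *
 * then fault the pages in and time the clearing done at first touch.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB	(30 << 26)	/* 30 << MAP_HUGE_SHIFT */
#endif

#define GIB	(1024UL * 1024 * 1024)

int main(int argc, char **argv)
{
	size_t nr_gib = argc > 1 ? strtoul(argv[1], NULL, 0) : 100;
	struct timespec start, end;
	size_t off;
	char *p;

	p = mmap(NULL, nr_gib * GIB, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
		 -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (off = 0; off < nr_gib * GIB; off += GIB)
		p[off] = 1;	/* first touch -> kernel clears the page */
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("cleared %zu GiB in %.2f s\n", nr_gib,
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}

If that's roughly what was run, then each timed touch is a single 1GB
clear and the question above about memory bandwidth vs. NUMA placement is
what determines where the speedup comes from.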