On Tue, 15 Aug 2017 16:45:34 +0800 Kemi Wang <kemi.wang@xxxxxxxxx> wrote:

> Each page allocation updates a set of per-zone statistics with a call to
> zone_statistics(). As discussed in 2017 MM submit, these are a substantial
                                             ^^^^^^ should be "summit"
> source of overhead in the page allocator and are very rarely consumed. This
> significant overhead in cache bouncing caused by zone counters (NUMA
> associated counters) update in parallel in multi-threaded page allocation
> (pointed out by Dave Hansen).

Hi Kemi

Thanks a lot for following up on this work. A link to the MM summit slides:
 http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf

> To mitigate this overhead, this patchset separates NUMA statistics from
> zone statistics framework, and update NUMA counter threshold to a fixed
> size of 32765, as a small threshold greatly increases the update frequency
> of the global counter from local per cpu counter (suggested by Ying Huang).
> The rationality is that these statistics counters don't need to be read
> often, unlike other VM counters, so it's not a problem to use a large
> threshold and make readers more expensive.
>
> With this patchset, we see 26.6% drop of CPU cycles(537-->394, see below)
> for per single page allocation and reclaim on Jesper's page_bench03
> benchmark. Meanwhile, this patchset keeps the same style of virtual memory
> statistics with little end-user-visible effects (see the first patch for
> details), except that the number of NUMA items in each cpu
> (vm_numa_stat_diff[]) is added to zone->vm_numa_stat[] when a user *reads*
> the value of NUMA counter to eliminate deviation.

I'm very happy to see that you found my kernel module for benchmarking
useful :-)

> I did an experiment of single page allocation and reclaim concurrently
> using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based server
> (88 processors with 126G memory) with different size of threshold of pcp
> counter.
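
For readers following along, the threshold trade-off described above can be
illustrated with a small userspace toy model. This is only a sketch, not the
kernel code: the names ThresholdCounter, simulate, cpu_diff and
global_updates are made up for illustration, and the fold rule (fold when
the local delta exceeds the threshold) is a simplifying assumption. It shows
that a larger threshold means fewer writes to the shared counter, while an
exact read has to add the pending per-CPU deltas.

```python
class ThresholdCounter:
    """Toy model: one shared counter fed by per-CPU deltas (hypothetical names)."""

    def __init__(self, n_cpus, threshold):
        self.threshold = threshold
        self.global_count = 0         # shared value; writes here model cache bouncing
        self.cpu_diff = [0] * n_cpus  # per-CPU pending deltas, cheap to update
        self.global_updates = 0       # number of writes to the shared value

    def inc(self, cpu):
        """Count one event on `cpu`; fold into the shared value past the threshold."""
        self.cpu_diff[cpu] += 1
        if self.cpu_diff[cpu] > self.threshold:  # assumed fold rule
            self.global_count += self.cpu_diff[cpu]
            self.cpu_diff[cpu] = 0
            self.global_updates += 1

    def read_approx(self):
        """Cheap read: shared value only, may lag behind the true total."""
        return self.global_count

    def read_exact(self):
        """Expensive read: shared value plus every CPU's pending delta."""
        return self.global_count + sum(self.cpu_diff)


def simulate(n_cpus, threshold, incs_per_cpu):
    """Run `incs_per_cpu` increments on each CPU and return the counter."""
    counter = ThresholdCounter(n_cpus, threshold)
    for cpu in range(n_cpus):
        for _ in range(incs_per_cpu):
            counter.inc(cpu)
    return counter


if __name__ == "__main__":
    for threshold in (32, 125, 4096, 32765):
        c = simulate(n_cpus=4, threshold=threshold, incs_per_cpu=10000)
        print(threshold, c.global_updates, c.read_approx(), c.read_exact())
```

In this model the exact read always returns the full total, whatever the
threshold, which mirrors the idea of summing the per-CPU diffs at read time.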
>
> Benchmark provided by Jesper D Broucer(increase loop times to 10000000):
                                 ^^^^^^^
You mis-spelled my last name, it is "Brouer".

> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
>
>    Threshold   CPU cycles    Throughput(88 threads)
>       32          799         241760478
>       64          640         301628829
>       125         537         358906028 <==> system by default
>       256         468         412397590
>       512         428         450550704
>       4096        399         482520943
>       20000       394         489009617
>       30000       395         488017817
>       32765       394(-26.6%) 488932078(+36.2%) <==> with this patchset
>       N/A         342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
>
> Kemi Wang (2):
>   mm: Change the call sites of numa statistics items
>   mm: Update NUMA counter threshold size
>
>  drivers/base/node.c    |  22 ++++---
>  include/linux/mmzone.h |  25 +++++---
>  include/linux/vmstat.h |  33 ++++++++++
>  mm/page_alloc.c        |  10 +--
>  mm/vmstat.c            | 162 +++++++++++++++++++++++++++++++++++++++++++++++--
>  5 files changed, 227 insertions(+), 25 deletions(-)
>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer