Hi Dave,

On Thu, Nov 21, 2013 at 02:57:18PM -0800, Dave Hansen wrote:
> Hey Johannes,
>
> I'm running an open/close microbenchmark from the will-it-scale set:
>
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c
>
> I was seeing some weird symptoms on 3.12 vs 3.11.  The throughput in
> that test was going down from 50 million to 35 million.
>
> The profiles show an increase in CPU time in _raw_spin_lock_irq.  The
> profiles pointed to slub code that hasn't been touched in quite a while.
> I bisected it down to:
>
> 81c0a2bb515fd4daae8cab64352877480792b515 is the first bad commit
> commit 81c0a2bb515fd4daae8cab64352877480792b515
> Author: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date:   Wed Sep 11 14:20:47 2013 -0700
>
> Which also seems a bit weird, but I've tested with this and its
> preceding commit enough times to be fairly sure that I did it right.
>
> __slab_free() and free_one_page() both seem to be spending more time
> spinning on their respective spinlocks, even though the throughput went
> down and we should have been doing fewer actual allocations/frees.  The
> best explanation for this would be if CPUs are tending to go after and
> contend for remote cachelines more often once this patch is applied.
>
> Any ideas?
>
> It's an 8-socket/160-thread (one NUMA node per socket) system that is
> not under memory pressure during the test.  The latencies are also such
> that vm.zone_reclaim_mode=0.

The change will definitely spread allocations out to all nodes then, and
it's plausible that the resulting remote references hurt kernel object
allocations in a tight loop.  Just to confirm, could you rerun the test
with zone_reclaim_mode enabled, so that the allocator stays in the local
zones?

The fairness code was written for reclaimable memory, which is
longer-lived and the only kind of memory where fairness matters.  I
might have to bypass it for unreclaimable allocations...

> Raw perf profiles and .config are in here:
> http://www.sr71.net/~dave/intel/201311-wisregress0/
>
> Here's a chunk of the 'perf diff':
>
>     17.65%  +3.47%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>     13.80%  -0.31%  [kernel.kallsyms]  [k] _raw_spin_lock
>      7.21%  -0.51%  [unknown]          [.] 0x00007f7849058640
>      3.43%  +0.15%  [kernel.kallsyms]  [k] setup_object
>      2.99%  -0.31%  [kernel.kallsyms]  [k] file_free_rcu
>      2.71%  -0.13%  [kernel.kallsyms]  [k] rcu_process_callbacks
>      2.26%  -0.09%  [kernel.kallsyms]  [k] get_empty_filp
>      2.06%  -0.09%  [kernel.kallsyms]  [k] kmem_cache_alloc
>      1.65%  -0.08%  [kernel.kallsyms]  [k] link_path_walk
>      1.53%  -0.08%  [kernel.kallsyms]  [k] memset
>      1.46%  -0.09%  [kernel.kallsyms]  [k] do_dentry_open
>      1.44%  -0.04%  [kernel.kallsyms]  [k] __d_lookup_rcu
>      1.27%  -0.04%  [kernel.kallsyms]  [k] do_last
>      1.18%  -0.04%  [kernel.kallsyms]  [k] ext4_release_file
>      1.16%  -0.04%  [kernel.kallsyms]  [k] __call_rcu.constprop.11

Thanks for the detailed report.
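For context, what the bisected commit changes is zone selection: instead
of filling up the local node's zones first, the allocator round-robins
over all allowed zones in batches proportional to zone size.  The idea
is roughly the following -- a simplified userspace sketch of the
batching policy, not the actual kernel code, and all names in it are
made up:

#include <stddef.h>
#include <stdio.h>

struct zone {
	long alloc_batch;   /* remaining fair share, in pages */
	long batch_pages;   /* replenish amount, proportional to zone size */
};

static void reset_batches(struct zone *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		zones[i].alloc_batch = zones[i].batch_pages;
}

/*
 * Walk the zonelist (local zones first) and take the first zone that
 * still has fair-share budget left; once every zone has exhausted its
 * batch, replenish all of them and start over.  Net effect: allocations
 * are interleaved across all allowed zones instead of filling up the
 * local node first.
 */
static struct zone *fair_pick_zone(struct zone *zonelist, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (zonelist[i].alloc_batch > 0) {
			zonelist[i].alloc_batch--;  /* charge one page */
			return &zonelist[i];
		}
	}
	reset_batches(zonelist, nr);
	zonelist[0].alloc_batch--;
	return &zonelist[0];
}

int main(void)
{
	struct zone zonelist[2] = {
		{ .alloc_batch = 2, .batch_pages = 2 },  /* local node  */
		{ .alloc_batch = 2, .batch_pages = 2 },  /* remote node */
	};

	/* Allocations alternate between the nodes in batches of two. */
	for (int i = 0; i < 8; i++)
		printf("page %d from zone %td\n",
		       i, fair_pick_zone(zonelist, 2) - zonelist);
	return 0;
}

In your open/close loop that would spread the slab pages backing the
file objects over all eight nodes, which fits the extra remote cacheline
traffic you're seeing.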
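As for the rerun, zone_reclaim_mode can be flipped at runtime, assuming
you have root on the test box:

	sysctl -w vm.zone_reclaim_mode=1

or, equivalently:

	echo 1 > /proc/sys/vm/zone_reclaim_mode

If the throughput recovers with that set, the remote references are the
likely culprit.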