Hi Dave,

On Thu, Nov 21, 2013 at 02:57:18PM -0800, Dave Hansen wrote:
> Hey Johannes,
>
> I'm running an open/close microbenchmark from the will-it-scale set:
>
> https://github.com/antonblanchard/will-it-scale/blob/master/tests/open1.c
>
> I was seeing some weird symptoms on 3.12 vs 3.11.  The throughput in
> that test was going down from 50 million to 35 million.
>
> The profiles show an increase in CPU time in _raw_spin_lock_irq.  The
> profiles pointed to slub code that hasn't been touched in quite a while.
> I bisected it down to:
>
> 81c0a2bb515fd4daae8cab64352877480792b515 is the first bad commit
> commit 81c0a2bb515fd4daae8cab64352877480792b515
> Author: Johannes Weiner <hannes@xxxxxxxxxxx>
> Date:   Wed Sep 11 14:20:47 2013 -0700
>
> Which also seems a bit weird, but I've tested with this and its
> preceding commit enough times to be fairly sure that I did it right.
>
> __slab_free() and free_one_page() both seem to be spending more time
> spinning on their respective spinlocks, even though the throughput went
> down and we should have been doing fewer actual allocations/frees.  The
> best explanation for this would be if CPUs are tending to go after and
> contend for remote cachelines more often once this patch is applied.
>
> Any ideas?
>
> It's an 8-socket/160-thread (one NUMA node per socket) system that is
> not under memory pressure during the test.  The latencies are also such
> that vm.zone_reclaim_mode=0.

The change will definitely spread allocations out to all nodes then, and
it's plausible that the resulting remote references hurt kernel object
allocations in a tight loop.  Just to confirm, could you rerun the test
with zone_reclaim_mode enabled, so that the allocator stays in the local
zones?

The fairness code was written for reclaimable memory, which is
longer-lived and the only kind of memory where fairness matters.  I
might have to bypass it for unreclaimable allocations...

> Raw perf profiles and .config are in here:
> http://www.sr71.net/~dave/intel/201311-wisregress0/
>
> Here's a chunk of the 'perf diff':
>
>     17.65%  +3.47%  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>     13.80%  -0.31%  [kernel.kallsyms]  [k] _raw_spin_lock
>      7.21%  -0.51%  [unknown]          [.] 0x00007f7849058640
>      3.43%  +0.15%  [kernel.kallsyms]  [k] setup_object
>      2.99%  -0.31%  [kernel.kallsyms]  [k] file_free_rcu
>      2.71%  -0.13%  [kernel.kallsyms]  [k] rcu_process_callbacks
>      2.26%  -0.09%  [kernel.kallsyms]  [k] get_empty_filp
>      2.06%  -0.09%  [kernel.kallsyms]  [k] kmem_cache_alloc
>      1.65%  -0.08%  [kernel.kallsyms]  [k] link_path_walk
>      1.53%  -0.08%  [kernel.kallsyms]  [k] memset
>      1.46%  -0.09%  [kernel.kallsyms]  [k] do_dentry_open
>      1.44%  -0.04%  [kernel.kallsyms]  [k] __d_lookup_rcu
>      1.27%  -0.04%  [kernel.kallsyms]  [k] do_last
>      1.18%  -0.04%  [kernel.kallsyms]  [k] ext4_release_file
>      1.16%  -0.04%  [kernel.kallsyms]  [k] __call_rcu.constprop.11

Thanks for the detailed report.
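For context, what the bisected commit changes is zone selection: instead
of filling up the local node's zones first, the allocator round-robins
over all allowed zones in batches proportional to zone size.  The idea
is roughly the following -- a simplified userspace sketch of the
batching policy, not the actual kernel code, and all names in it are
made up:

#include <stddef.h>
#include <stdio.h>

struct zone {
	long alloc_batch;   /* remaining fair share, in pages */
	long batch_pages;   /* replenish amount, proportional to zone size */
};

static void reset_batches(struct zone *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		zones[i].alloc_batch = zones[i].batch_pages;
}

/*
 * Walk the zonelist (local zones first) and take the first zone that
 * still has fair-share budget left; once every zone has exhausted its
 * batch, replenish all of them and start over.  Net effect: allocations
 * are interleaved across all allowed zones instead of filling up the
 * local node first.
 */
static struct zone *fair_pick_zone(struct zone *zonelist, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (zonelist[i].alloc_batch > 0) {
			zonelist[i].alloc_batch--;  /* charge one page */
			return &zonelist[i];
		}
	}
	reset_batches(zonelist, nr);
	zonelist[0].alloc_batch--;
	return &zonelist[0];
}

int main(void)
{
	struct zone zonelist[2] = {
		{ .alloc_batch = 2, .batch_pages = 2 },  /* local node  */
		{ .alloc_batch = 2, .batch_pages = 2 },  /* remote node */
	};

	/* Allocations alternate between the nodes in batches of two. */
	for (int i = 0; i < 8; i++)
		printf("page %d from zone %td\n",
		       i, fair_pick_zone(zonelist, 2) - zonelist);
	return 0;
}

In your open/close loop that would spread the slab pages backing the
file objects over all eight nodes, which fits the extra remote cacheline
traffic you're seeing.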
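As for the rerun, zone_reclaim_mode can be flipped at runtime, assuming
you have root on the test box:

	sysctl -w vm.zone_reclaim_mode=1

or, equivalently:

	echo 1 > /proc/sys/vm/zone_reclaim_mode

If the throughput recovers with that set, the remote references are the
likely culprit.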