Re: [GIT PULL] Lockless SLUB slowpaths for v3.1-rc1

On Sun, 2011-07-31 at 14:55 -0700, David Rientjes wrote:
> On Sun, 31 Jul 2011, Pekka Enberg wrote:
> 
> > > And although slub is definitely heading in the right direction regarding 
> > > the netperf benchmark, it's still a non-starter for anybody using large 
> > > NUMA machines for networking performance.  On my 16-core, 4 node, 64GB 
> > > client/server machines running netperf TCP_RR with various thread counts 
> > > for 60 seconds each on 3.0:
> > > 
> > > 	threads		SLUB		SLAB		diff
> > > 	 16		76345		74973		- 1.8%
> > > 	 32		116380		116272		- 0.1%
> > > 	 48		150509		153703		+ 2.1%
> > > 	 64		187984		189750		+ 0.9%
> > > 	 80		216853		224471		+ 3.5%
> > > 	 96		236640		249184		+ 5.3%
> > > 	112		256540		275464		+ 7.4%
> > > 	128		273027		296014		+ 8.4%
> > > 	144		281441		314791		+11.8%
> > > 	160		287225		326941		+13.8%
> > 
> > That looks like a pretty nasty scaling issue. David, would it be
> > possible to see 'perf report' for the 160 case? [ Maybe even 'perf
> > annotate' for the interesting SLUB functions. ] 
> 
> More interesting than the perf report (which just shows kfree, 
> kmem_cache_free, and kmem_cache_alloc dominating) are the statistics 
> exported by slub itself; they show the "slab thrashing" issue that I 
> have described several times over the past few years.  It's difficult 
> to address because it's a result of slub's design.  From the client 
> side of 160 netperf TCP_RR threads for 60 seconds:
> 
> 	cache		alloc_fastpath		alloc_slowpath
> 	kmalloc-256	10937512 (62.8%)	6490753
> 	kmalloc-1024	17121172 (98.3%)	303547
> 	kmalloc-4096	5526281			11910454 (68.3%)
> 
> 	cache		free_fastpath		free_slowpath
> 	kmalloc-256	15469			17412798 (99.9%)
> 	kmalloc-1024	11604742 (66.6%)	5819973
> 	kmalloc-4096	14848			17421902 (99.9%)
> 
> With those stats, there's no way that slub will ever be able to compete 
> with slab because it's not optimized for the slowpath.
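
(Reading those numbers: the percentage is the share of the total going
through that path, i.e. fastpath / (fastpath + slowpath). For the
kmalloc-256 allocations, for example, that's

	10937512 / (10937512 + 6490753) ~= 62.8%

so roughly a third of the kmalloc-256 allocations, about two thirds of
the kmalloc-4096 allocations, and practically all of the kmalloc-256
and kmalloc-4096 frees fall back to the slowpath.)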

Is the slowpath being hit more often with 160 threads than with 16? As
I said, the problem you mentioned looks like a *scaling issue* to me,
which is actually somewhat surprising. I knew the slowpaths were slow,
but I haven't seen this sort of data before.
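
To compare the two runs myself I'd simply snapshot the counters around
each netperf run and diff them. Something like the sketch below is
roughly what I have in mind -- it's not an existing tool, and it assumes
CONFIG_SLUB_STATS=y so that /sys/kernel/slab/<cache>/<stat> exists (the
first number in each of those files is the total across CPUs):

/* slubstat.c: rough sketch, print SLUB fastpath share per cache */
#include <stdio.h>

static unsigned long read_stat(const char *cache, const char *stat)
{
	char path[256];
	unsigned long val = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/kernel/slab/%s/%s", cache, stat);
	f = fopen(path, "r");
	if (!f)
		return 0;
	/* the first field is the sum over all CPUs */
	if (fscanf(f, "%lu", &val) != 1)
		val = 0;
	fclose(f);
	return val;
}

int main(void)
{
	const char *caches[] = {
		"kmalloc-256", "kmalloc-1024", "kmalloc-4096"
	};
	unsigned int i;

	for (i = 0; i < sizeof(caches) / sizeof(caches[0]); i++) {
		unsigned long af = read_stat(caches[i], "alloc_fastpath");
		unsigned long as = read_stat(caches[i], "alloc_slowpath");
		unsigned long ff = read_stat(caches[i], "free_fastpath");
		unsigned long fs = read_stat(caches[i], "free_slowpath");

		printf("%-13s alloc fast %5.1f%%  free fast %5.1f%%\n",
		       caches[i],
		       af + as ? 100.0 * af / (af + as) : 0.0,
		       ff + fs ? 100.0 * ff / (ff + fs) : 0.0);
	}
	return 0;
}

Running that before and after each netperf run and diffing the totals
should tell us whether the slowpath share really grows with the thread
count, or whether it's just the absolute cost of the slowpath that
hurts at 160 threads.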

I snipped the 'SLUB can never compete with SLAB' part because I'm
frankly more interested in raw data I can analyse myself. I'm hoping to
get the per-CPU partial list patch queued for v3.2 soon, and I'd be
interested to know how much I can expect that to help.

			Pekka


