On Fri, 2009-01-23 at 08:52 +0200, Pekka Enberg wrote:
> > 1) If I start CPU_NUM clients and servers, SLUB's result is about 2%
> > better than SLQB's;
> > 2) If I start 1 client and 1 server, and bind them to different
> > physical CPUs, SLQB's result is about 10% better than SLUB's.
> >
> > I don't know why there is still a 10% difference with item 2).
> > Maybe cache misses cause it?
>
> Maybe we can use the perfstat and/or kerneltop utilities of the new
> perf counters patch to diagnose this:
>
> http://lkml.org/lkml/2009/1/21/273
>
> And do oprofile, of course.

Thanks! I assume binding the client and the server to different
physical CPUs also means that the SKB is always allocated on CPU 1 and
freed on CPU 2? If so, we will be taking the __slab_free() slow path
all the time on kfree(), which will cause cache effects, no doubt.

But there's another potential performance hit we're taking because the
object size of the cache is so big. As allocations from CPU 1 keep
coming in, we need to allocate new pages and unfreeze the per-cpu
page. That in turn causes __slab_free() to be more eager to discard
the slab (see the PageSlubFrozen check there).

So before going for cache profiling, I'd really like to see an
oprofile report. I suspect we're still going to see much more page
allocator activity there than with SLAB or SLQB, which is why we're
still behaving so badly here.

			Pekka
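
P.S. To make the remote-free effect concrete, here's a toy userspace
model of the two paths. The names (model_free, cpu_slab) and counters
are mine, not mm/slub.c's, so treat it as a sketch of the mechanism
rather than the real code: a free is fast only when the object sits on
the freeing CPU's own active page, everything else falls into the
__slab_free()-style slow path, and once the owning page is no longer
frozen, the last remote free hands it back to the page allocator.

#include <stdio.h>

#define NOBJS 8				/* objects per slab page in this toy */

struct page {
	int inuse;			/* live objects on this page */
	int frozen;			/* 1 while it is some CPU's active page */
};

struct cpu_slab {
	struct page *page;		/* this CPU's current slab page */
};

static long fast_frees, slow_frees, discards;

/* Free one object that lives on @page, from the CPU owning @c. */
static void model_free(struct cpu_slab *c, struct page *page)
{
	if (page == c->page) {
		/* lockless per-cpu fast path: push onto local freelist */
		fast_frees++;
		page->inuse--;
		return;
	}
	/* __slab_free()-style slow path: locked, touches a remote page */
	slow_frees++;
	if (--page->inuse == 0 && !page->frozen)
		discards++;	/* empty and unfrozen: back to page allocator */
}

int main(void)
{
	/* pages[0] was CPU 1's active slab; pages[1] is CPU 2's */
	struct page pages[2] = { { NOBJS, 1 }, { NOBJS, 1 } };
	struct cpu_slab cpu2 = { &pages[1] };	/* the freeing CPU */
	int i;

	/* CPU 1 filled pages[0] with skbs and, the objects being big,
	 * soon moved on to a fresh page, unfreezing pages[0]... */
	pages[0].frozen = 0;

	/* ...so when CPU 2 frees them all remotely, every free takes
	 * the slow path and the last one discards the page. */
	for (i = 0; i < NOBJS; i++)
		model_free(&cpu2, &pages[0]);

	printf("fast %ld slow %ld discards %ld\n",
	       fast_frees, slow_frees, discards);
	return 0;
}

Compiled and run, it prints "fast 0 slow 8 discards 1": no fast frees
at all, and the page goes straight back to the page allocator, which
is the pattern I'd expect the 1-client/1-server binding to produce for
every skb.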