On Tue, 17 May 2011, Mel Gorman wrote:

> > The fragmentation isn't the only issue with the netperf TCP_RR benchmark,
> > the problem is that the slub slowpath is being used >95% of the time on
> > every allocation and free for the very large number of kmalloc-256 and
> > kmalloc-2K caches.
>
> Ok, that makes sense as I'd full expect that benchmark to exhaust
> the per-cpu page (high order or otherwise) of slab objects routinely
> during default and I'd also expect the freeing on the other side to
> be releasing slabs frequently to the partial or empty lists.
>

That's most of the problem, but it's compounded on this benchmark because
the slab pulled from the partial list to replace the per-cpu page typically
has only a very small number (2 or 3) of free objects, so it can only serve
one allocation and then requires the allocation slowpath to pull yet
another slab from the partial list the next time around.

I had a patchset that addressed that, which I called "slab thrashing", by
only pulling a slab from the partial list when it had a pre-defined
proportion of available objects and otherwise skipping it, and that ended
up helping the benchmark by 5-7%.

Smaller orders will make this worse as well: if there were only 2 or 3
free objects on an order-3 slab before, there's no chance of seeing even
that many on an order-0 slab.
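
To make the heuristic above concrete, here is a rough userspace sketch. It
is not the patchset itself and does not use any real kernel structures:
struct slab_model, pick_partial_slab(), objects_per_slab(), MODEL_PAGE_SIZE
and the percentage threshold are all hypothetical names made up for
illustration. It also assumes 4096-byte pages, ignores any per-slab
metadata, and falls back to the emptiest slab when nothing meets the
threshold, which is an assumption of the sketch and not something stated
above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; real page size depends on the architecture. */
#define MODEL_PAGE_SIZE 4096u

/* Hypothetical model of a slab on a partial list (not struct page). */
struct slab_model {
	unsigned int objects;	/* total objects the slab can hold */
	unsigned int inuse;	/* objects currently allocated */
};

/* Objects that fit in a slab of the given order, ignoring metadata. */
static unsigned int objects_per_slab(unsigned int order, unsigned int obj_size)
{
	return (MODEL_PAGE_SIZE << order) / obj_size;
}

static unsigned int slab_free_objects(const struct slab_model *s)
{
	return s->objects - s->inuse;
}

/*
 * Return the index of the first slab whose free objects make up at least
 * min_free_pct percent of its capacity, skipping slabs below that.
 * Assumed fallback: if none qualifies, return the slab with the most free
 * objects so the allocation can still proceed; -1 if nothing is free.
 */
static int pick_partial_slab(const struct slab_model *partial, size_t n,
			     unsigned int min_free_pct)
{
	int best = -1;
	unsigned int best_free = 0;
	size_t i;

	assert(min_free_pct <= 100);

	for (i = 0; i < n; i++) {
		unsigned int nr_free = slab_free_objects(&partial[i]);

		if (nr_free == 0)
			continue;
		if ((unsigned long)nr_free * 100 >=
		    (unsigned long)partial[i].objects * min_free_pct)
			return (int)i;
		if (nr_free > best_free) {
			best_free = nr_free;
			best = (int)i;
		}
	}
	return best;
}
```

With a partial list of three 32-object slabs holding 2, 3 and 24 free
objects and a 25% threshold, pick_partial_slab() skips the first two and
returns index 2. Under the same assumptions, objects_per_slab() gives 128
objects for a 256-byte cache at order-3 but only 16 at order-0, and 16
versus 2 for a 2048-byte cache, which is why a nearly full order-0 slab has
so little left to hand out.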