On Mon, May 16, 2011 at 02:03:33PM -0700, David Rientjes wrote:
> On Thu, 12 May 2011, Andrea Arcangeli wrote:
>
> > On Wed, May 11, 2011 at 01:38:47PM -0700, David Rientjes wrote:
> > > kswapd and doing compaction for the higher order allocs before falling
> >
> > Note that patch 2 disabled compaction by clearing __GFP_WAIT.
> >
> > What you describe here would be patch 2 without the ~__GFP_WAIT
> > addition (so keeping only ~GFP_NOFAIL).
> >
>
> It's out of context, my sentence was:
>
> "With the previous changes in this patchset, specifically avoiding waking
> kswapd and doing compaction for the higher order allocs before falling
> back to the min order..."
>
> meaning this patchset avoids waking kswapd and avoids doing compaction.
>

Ok.

> > Not clearing __GFP_WAIT when compaction is enabled is possible and
> > shouldn't result in bad behavior (if compaction is not enabled with
> > current SLUB it's hard to imagine how it could perform decently if
> > there's fragmentation). You should try to benchmark to see if it's
> > worth it on the large NUMA systems with heavy network traffic (for
> > normal systems I doubt compaction is worth it but I'm not against
> > trying to keep it enabled just in case).
> >
>
> The fragmentation isn't the only issue with the netperf TCP_RR benchmark,
> the problem is that the slub slowpath is being used >95% of the time on
> every allocation and free for the very large number of kmalloc-256 and
> kmalloc-2K caches.

Ok, that makes sense as I'd fully expect that benchmark to exhaust the
per-cpu page (high order or otherwise) of slab objects routinely during
default and I'd also expect the freeing on the other side to be
releasing slabs frequently to the partial or empty lists.

> Those caches are order 1 and 3, respectively, on my
> system by default, but the page allocator seldomly gets invoked for such a
> benchmark after the partial lists are populated: the overhead is from the
> per-node locking required in the slowpath to traverse the partial lists.
> See the data I presented two years ago: http://lkml.org/lkml/2009/3/30/15.

Ok, I can see how this patch would indeed make the situation worse. I
vaguely recall that there were other patches that would increase the
per-cpu lists of objects but have no recollection as to what happened
to them. Maybe Christoph remembers but one way or the other, it's out
of scope for James' and Colin's bug.

-- 
Mel Gorman
SUSE Labs