Hi David,

On Tue, Jan 18, 2011 at 2:09 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
> Before 0e093d99763e (writeback: do not sleep on the congestion queue if
> there are no congested BDIs or if significant congestion is not being
> encountered in the current zone), preferred_zone was only used for
> statistics and to determine the zoneidx from which to allocate from given
> the type requested.
>
> wait_iff_congested(), though, uses preferred_zone to determine if the
> congestion wait should be deferred because its dirty pages are backed by
> a congested bdi. This incorrectly defers the timeout and busy loops in
> the page allocator with various cond_resched() calls if preferred_zone is
> not allowed in the current context, usually consuming 100% of a cpu.
>
> This patch resets preferred_zone to an allowed zone in the slowpath if
> the allocation context is constrained by current's cpuset. It also
> ensures preferred_zone is from the set of allowed nodes when called from
> within direct reclaim; allocations are always constrained by cpusets
> since the context is always blockable.
>
> Both of these uses of cpuset_current_mems_allowed are protected by
> get_mems_allowed().
> ---
>  mm/page_alloc.c |   12 ++++++++++++
>  mm/vmscan.c     |    3 ++-
>  2 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2034,6 +2034,18 @@ restart:
>          */
>         alloc_flags = gfp_to_alloc_flags(gfp_mask);
>
> +       /*
> +        * If preferred_zone cannot be allocated from in this context, find the
> +        * first allowable zone instead.
> +        */
> +       if ((alloc_flags & ALLOC_CPUSET) &&
> +           !cpuset_zone_allowed_softwall(preferred_zone, gfp_mask)) {
> +               first_zones_zonelist(zonelist, high_zoneidx,
> +                               &cpuset_current_mems_allowed, &preferred_zone);

This patch is one we need, but I have a nitpick.
I am not familiar with cpusets, so I might be wrong.
I think it could have a side effect on the zone VM statistics updated in
buffered_rmqueue, since you intercept and change preferred_zone.
An allocation could be counted as NUMA_HIT instead of NUMA_MISS.
Is that your intention?

-- 
Kind regards,
Minchan Kim
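
To illustrate the statistics concern raised above, here is a minimal userspace sketch in C. It is not kernel code: `struct model_zone`, `classify_alloc()` and the `MODEL_NUMA_*` names are hypothetical stand-ins, and the rule it encodes (a hit is counted when the zone that satisfied the allocation is on the same node as preferred_zone) is an assumption about how the accounting works, simplified for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's counters; not the real definitions. */
enum model_numa_stat { MODEL_NUMA_HIT, MODEL_NUMA_MISS };

/* Hypothetical, stripped-down zone: only the owning NUMA node is modeled. */
struct model_zone {
    int node;
};

/*
 * Assumed accounting rule: the allocation is a hit when the zone that
 * actually satisfied it is on the same node as the preferred zone.
 */
static enum model_numa_stat classify_alloc(const struct model_zone *preferred,
                                           const struct model_zone *actual)
{
    return preferred->node == actual->node ? MODEL_NUMA_HIT : MODEL_NUMA_MISS;
}

/* Walks through the scenario from the review comment. */
static void demo_preferred_zone_reset(void)
{
    struct model_zone node0_zone = { 0 };   /* originally preferred, not allowed */
    struct model_zone node1_zone = { 1 };   /* first zone allowed by the cpuset */

    /* Without the reset: preferred is node 0, page comes from node 1. */
    assert(classify_alloc(&node0_zone, &node1_zone) == MODEL_NUMA_MISS);

    /* With the reset: preferred_zone now points at the node 1 zone. */
    assert(classify_alloc(&node1_zone, &node1_zone) == MODEL_NUMA_HIT);
}
```

Under these assumptions, the same allocation from node 1 changes from a miss to a hit purely because preferred_zone was rewritten before the accounting ran, which is the behavior change being asked about.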