On Tue, Jan 17, 2017 at 07:17:22PM +0100, Vlastimil Babka wrote: > On 01/17/2017 07:07 PM, Jesper Dangaard Brouer wrote: > > > > On Tue, 17 Jan 2017 09:29:51 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote: > > > >> +/* Lock and remove page from the per-cpu list */ > >> +static struct page *rmqueue_pcplist(struct zone *preferred_zone, > >> + struct zone *zone, unsigned int order, > >> + gfp_t gfp_flags, int migratetype) > >> +{ > >> + struct per_cpu_pages *pcp; > >> + struct list_head *list; > >> + bool cold = ((gfp_flags & __GFP_COLD) != 0); > >> + struct page *page; > >> + unsigned long flags; > >> + > >> + local_irq_save(flags); > >> + pcp = &this_cpu_ptr(zone->pageset)->pcp; > >> + list = &pcp->lists[migratetype]; > >> + page = __rmqueue_pcplist(zone, migratetype, cold, pcp, list); > >> + if (page) { > >> + __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); > >> + zone_statistics(preferred_zone, zone, gfp_flags); > > > > Word-of-warning: The zone_statistics() call changed number of > > parameters in commit 41b6167e8f74 ("mm: get rid of __GFP_OTHER_NODE"). > > (Not sure what tree you are based on) > Yes, there's a conflict. The fix is trivial and shouldn't affect the overall series. Not that it matters because of ths next part > Yeah and there will likely be more conflicts with fixes wrt the "getting > oom/stalls for ltp test cpuset01 with latest/4.9 kernel???" thread, > hopefully tomorrow. > It's was on my list to look closer at that thread tomorrow. I only took a quick look for the first time a few minutes ago and it looks bad. There is at least a flaw in the retry sequence if cpusets are disabled during an allocation that fails as it won't retry. That leaves a small window if the last cpuset disappeared during which an allocation could artifically fail but that can't be what's going on here. It could still be the retry logic because the nodemask is not necessarily synced up with cpuset_current_mems_allowed. I'll try reproducing this in the morning. The fix is almost certainly going to conflict with this series but this series can wait until after that gets resolved and I'll rebase on top of mmotm. It's late so I'm fairly tired but assuming I can reproduce this in the morning, the first thing I'll try is something like this to force a reread of mems_allowed; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ebea51cc0135..3fc2b3a8d301 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3774,13 +3774,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, .migratetype = gfpflags_to_migratetype(gfp_mask), }; - if (cpusets_enabled()) { - alloc_mask |= __GFP_HARDWALL; - alloc_flags |= ALLOC_CPUSET; - if (!ac.nodemask) - ac.nodemask = &cpuset_current_mems_allowed; - } - gfp_mask &= gfp_allowed_mask; lockdep_trace_alloc(gfp_mask); @@ -3802,6 +3795,13 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, alloc_flags |= ALLOC_CMA; retry_cpuset: + if (cpusets_enabled()) { + alloc_mask |= __GFP_HARDWALL; + alloc_flags |= ALLOC_CPUSET; + if (!nodemask) + ac.nodemask = &cpuset_current_mems_allowed; + } + cpuset_mems_cookie = read_mems_allowed_begin(); /* Dirty zone balancing only done in the fast path */ If that doesn't work out then I'll start kicking the problem properly unless you've beaten me to the correct solution already :) -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>