+CC linux-api

On 11.4.2017 19:24, Christoph Lameter wrote:
> On Tue, 11 Apr 2017, Vlastimil Babka wrote:
>
>> The root of the problem is that the cpuset's mems_allowed and mempolicy's
>> nodemask can temporarily have no intersection, thus get_page_from_freelist()
>> cannot find any usable zone. The current semantic for empty intersection is to
>> ignore mempolicy's nodemask and honour cpuset restrictions. This is checked in
>> node_zonelist(), but the racy update can happen after we already passed the
>
> The fallback was only intended for a cpuset on which boundaries are not enforced
> in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL)
> should fail the allocation.

Hmm just to clarify - I'm talking about ignoring the *mempolicy's* nodemask on
the basis of cpuset having higher priority, while you seem to be talking about
ignoring a (softwall) cpuset nodemask, right? man set_mempolicy says "... if
required nodemask contains no nodes that are allowed by the process's current
cpuset context, the memory policy reverts to local allocation" which does come
down to ignoring mempolicy's nodemask.

>> This patch fixes the issue by having __alloc_pages_slowpath() check for empty
>> intersection of cpuset and ac->nodemask before OOM or allocation failure. If
>> it's indeed empty, the nodemask is ignored and allocation retried, which mimics
>> node_zonelist(). This works fine, because almost all callers of
>
> Well that would need to be subject to the hardwall flag. Allocation needs
> to fail for a hardwall cpuset.

They still do, if no hardwall cpuset node can satisfy the allocation with
mempolicy ignored.
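To make the semantics under discussion concrete, here is a minimal userspace sketch, not the kernel code. It models node sets as 64-bit bitmasks. The names nodemask_model_t, masks_disjoint(), effective_nodes() and try_alloc() are hypothetical and exist only for this illustration. It assumes the cpuset restriction is always enforced and only the mempolicy nodemask may be dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: bit N set means NUMA node N is in the set. */
typedef uint64_t nodemask_model_t;

/* True when the mempolicy nodemask and the cpuset's allowed nodes share no node. */
static bool masks_disjoint(nodemask_model_t cpuset_allowed,
                           nodemask_model_t policy_nodes)
{
    return (cpuset_allowed & policy_nodes) == 0;
}

/*
 * Nodes an allocation may try in this model. On an empty intersection the
 * mempolicy nodemask is ignored and only the cpuset restriction is kept;
 * otherwise both restrictions apply.
 */
static nodemask_model_t effective_nodes(nodemask_model_t cpuset_allowed,
                                        nodemask_model_t policy_nodes)
{
    if (masks_disjoint(cpuset_allowed, policy_nodes))
        return cpuset_allowed;
    return cpuset_allowed & policy_nodes;
}

/*
 * Modelled allocation attempt: returns the lowest usable node id, or -1 when
 * no node in the effective set has free memory. The cpuset is never widened,
 * so the allocation still fails if no cpuset node can satisfy it.
 */
static int try_alloc(nodemask_model_t cpuset_allowed,
                     nodemask_model_t policy_nodes,
                     nodemask_model_t nodes_with_free_memory)
{
    nodemask_model_t candidates =
        effective_nodes(cpuset_allowed, policy_nodes) & nodes_with_free_memory;

    for (int node = 0; node < 64; node++) {
        if (candidates & ((nodemask_model_t)1 << node))
            return node;
    }
    return -1;
}
```

In this model, a cpuset of nodes {0,1} with a mempolicy of nodes {2,3} falls back to nodes {0,1}. If only nodes {2,3} have free memory, try_alloc() still returns -1, which corresponds to the point that the allocation keeps failing when no cpuset node can satisfy it with the mempolicy ignored.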