On 05/17/2017 04:48 PM, Christoph Lameter wrote:
> On Wed, 17 May 2017, Michal Hocko wrote:
>
>>>> So how are you going to distinguish VM_FAULT_OOM from an empty mempolicy
>>>> case in a raceless way?
>>>
>>> You dont have to do that if you do not create an empty mempolicy in the
>>> first place. The current kernel code avoids that by first allowing access
>>> to the new set of nodes and removing the old ones from the set when done.
>>
>> which is racy and as Vlastimil pointed out. If we simply fail such an
>> allocation the failure will go up the call chain until we hit the OOM
>> killer due to VM_FAULT_OOM. How would you want to handle that?
>
> The race is where? If you expand the node set during the move of the
> application then you are safe in terms of the legacy apps that did not
> include static bindings.

No, that expand/shrink by itself doesn't work against parallel
get_page_from_freelist going through a zonelist. Moving from node 0 to
1, with zonelist containing nodes 1 and 0 in that order:

- mempolicy mask is 0
- zonelist iteration checks node 1, it's not allowed, skip
- mempolicy mask is 0,1 (expand)
- mempolicy mask is 1 (shrink)
- zonelist iteration checks node 0, it's not allowed, skip
- OOM
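
The interleaving above can be replayed deterministically in a toy model.
This is only a sketch, not kernel code: allocate() and its
updates_before_step argument are hypothetical names invented for the
illustration, and the model assumes the walker re-reads the mask at every
zonelist entry, as the steps above describe. The mask is never empty at
any point, yet the walk still finds no allowed node.

```python
def allocate(zonelist, policy, updates_before_step):
    """Return the first node in zonelist allowed by policy["mask"], or None.

    None stands in for the OOM outcome. updates_before_step maps a step
    index to the masks that a concurrent rebind writes just before that
    step's check runs (hypothetical knob to force an interleaving).
    """
    for step, node in enumerate(zonelist):
        # Apply the concurrent writer's updates scheduled before this check.
        for new_mask in updates_before_step.get(step, []):
            policy["mask"] = set(new_mask)
        # The walker reads the current mask for each zonelist entry.
        if node in policy["mask"]:
            return node
    return None


zonelist = [1, 0]  # node 1 is checked first, then node 0

# No concurrent rebind: node 1 is skipped, node 0 is allowed.
quiet = allocate(zonelist, {"mask": {0}}, {})

# Rebind from node 0 to node 1 lands between the two checks:
# expand to {0, 1}, then shrink to {1}.
raced = allocate(zonelist, {"mask": {0}}, {1: [{0, 1}, {1}]})

# Rebind completes before the walk starts: node 1 is allowed.
after = allocate(zonelist, {"mask": {0}}, {0: [{0, 1}, {1}]})

print(quiet, raced, after)
```

Only the middle case fails: each individual check was made against a
non-empty mask, but the two checks saw different masks.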