On Sat, 2 Aug 2014, Johannes Weiner wrote:

> > I see one concern: that panic_on_oom == 1 will not trigger on pagefault
> > when constrained by cpusets. To address that, I'll state that, since
> > cpuset-constrained allocations are the allocation context for pagefaults,
> > panic_on_oom == 1 should not trigger on pagefault when constrained by
> > cpusets.
>
> I expressed my concern pretty clearly above: out_of_memory() wants the
> zonelist that was used during the failed allocation, you are passing a
> non-sensical value in there that only happens to have the same type.
>

It's certainly meaningful; the particular zonelist chosen isn't important
because we don't care about the ordering and pagefaults are not going to be
using __GFP_THISNODE. In this context, we only need to pass a zonelist that
includes all zones because constrained_alloc() tests if the allocation is
cpuset-constrained based on the gfp flags. We'll get CONSTRAINT_CPUSET in
that case.

This is important because the behavior of panic_on_oom differs, as you
pointed out, depending on the constraint. pagefault_out_of_memory(), with
my patch, will always get CONSTRAINT_CPUSET when needed and
check_panic_on_oom() will behave correctly now for cpusets.

> We simply don't have the right information at the end of the page
> fault handler to respect constrained allocations. Case in point:
> nodemask is unset from pagefault_out_of_memory(), so we still kill
> based on mempolicy even though check_panic_on_oom() says it wouldn't.
>

That is, in fact, the last bit of information we need in the pagefault
handler to make correct decisions. It's important, too, since if the vma
of the faulting address is constrained by a mempolicy, we want to avoid
needlessly killing a process that has a mempolicy with a disjoint set of
nodes.

> The code change is not an adequate solution for the problem we have
> here and the changelog is an insult to everybody who wants to make
> sense of this from the git history later on.
>

We can also address mempolicies by modifying the page fault handler and
passing the vma and faulting address, both to make the correct
panic_on_oom decisions and to filter out processes whose mempolicies
consist solely of a disjoint set of nodes. I'll post that patch series as
well.
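
To make the behavior under discussion concrete, here is a minimal userspace sketch of how the constraint type feeds into the panic_on_oom decision and into victim filtering. It is not the kernel code: the enum names mirror the constraints mentioned in the thread, but nodeset_t, classify(), should_panic() and worth_killing() are hypothetical names, and node sets are modeled as plain bitmasks rather than zonelists and nodemask_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constraint names mirror the ones discussed in the thread. */
enum oom_constraint {
	CONSTRAINT_NONE,
	CONSTRAINT_CPUSET,
	CONSTRAINT_MEMORY_POLICY,
};

/* Assumption for this sketch: a node set is a bitmask, bit n = node n. */
typedef uint32_t nodeset_t;

/*
 * Simplified stand-in for constrained_alloc().
 *   all_nodes:    every node that has memory
 *   policy_nodes: mempolicy nodemask, 0 meaning "no nodemask was passed"
 *   cpuset_nodes: nodes the current cpuset allows
 */
static enum oom_constraint classify(nodeset_t all_nodes,
				    nodeset_t policy_nodes,
				    nodeset_t cpuset_nodes)
{
	/* A nodemask that does not cover every node is a mempolicy limit. */
	if (policy_nodes && (all_nodes & ~policy_nodes))
		return CONSTRAINT_MEMORY_POLICY;

	/* Any node the cpuset excludes makes this a cpuset limit. */
	if (all_nodes & ~cpuset_nodes)
		return CONSTRAINT_CPUSET;

	return CONSTRAINT_NONE;
}

/*
 * Model of the panic_on_oom policy: 0 never panics, 2 always panics,
 * 1 panics only when the failure is unconstrained (system-wide).
 */
static bool should_panic(int panic_on_oom, enum oom_constraint constraint)
{
	assert(panic_on_oom >= 0 && panic_on_oom <= 2);

	if (panic_on_oom == 0)
		return false;
	if (panic_on_oom == 2)
		return true;
	return constraint == CONSTRAINT_NONE;
}

/*
 * Victim filtering: killing a task frees nothing useful if its mempolicy
 * nodes are disjoint from the nodes the faulting allocation may use.
 */
static bool worth_killing(nodeset_t fault_nodes, nodeset_t task_nodes)
{
	return (fault_nodes & task_nodes) != 0;
}
```

With two nodes (all_nodes = 0x3), a cpuset limited to node 0 classifies as CONSTRAINT_CPUSET, so should_panic(1, ...) returns false. When no nodemask is passed (policy_nodes = 0), the sketch can never report CONSTRAINT_MEMORY_POLICY, which is the gap the quoted objection points at.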