Michal Hocko wrote:
> On Tue 26-01-16 00:08:28, Tetsuo Handa wrote:
> [...]
> > If it turned out that we are using GFP_NOFS from LSM hooks correctly,
> > I'd expect such GFP_NOFS allocations retry unless SIGKILL is pending.
> > Filesystems might be able to handle GFP_NOFS allocation failures. But
> > userspace might not be able to handle system call failures caused by
> > GFP_NOFS allocation failures; OOM-unkillable processes might unexpectedly
> > terminate as if they are OOM-killed. Would you please add GFP_KILLABLE
> > to list of the topics?
>
> Are there so many places to justify a flag? Isn't it easier to check for
> fatal_signal_pending in the failed path and do the retry otherwise? This
> allows for a more flexible fallback strategy - e.g. drop the locks and
> retry again, sleep for reasonable time, wait for some event etc... This
> sounds much more extensible than a single flag buried down in the
> allocator path.

If you allow any in-kernel code to directly call out_of_memory(), I'm
OK with that.

I think that whether to invoke the OOM killer should not be determined
by the ability to reclaim memory; it should be determined by the
importance and/or purpose of that memory allocation request.

We allocate memory on behalf of userspace processes. If a userspace
process asks for a page via page fault, we are using __GFP_FS. If
in-kernel code does something on behalf of a userspace process, we
should use __GFP_FS. Forcing in-kernel code to use !__GFP_FS allocation
requests is a hack for working around an inconvenient circumstance in
memory allocation (memory reclaim deadlock), which is not the fault of
userspace processes.

Userspace controls oom_score_adj and makes a bet between processes.
If process A wins, the OOM killer kills process B, and process A gets
memory. If process B wins, the OOM killer kills process A, and process
B gets memory.
Not invoking the OOM killer due to lack of __GFP_FS is something like
forcing processes to use oom_kill_allocating_task = 1.

Therefore, since __GFP_KILLABLE does not exist and out_of_memory() is
not exported, I'll change my !__GFP_FS allocation requests to
__GFP_NOFAIL (in order to allow processes to make a bet) if mm people
change small !__GFP_FS allocation requests to fail upon OOM.

Note that there is no need to retry such __GFP_NOFAIL allocation
requests if SIGKILL is pending, but __GFP_NOFAIL does not allow failing
upon SIGKILL. __GFP_KILLABLE (with the current "no-fail unless chosen by
the OOM killer" behavior) would handle it perfectly.
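
For reference, the caller-side fallback suggested in the quoted text
(retry in the failure path unless a fatal signal is pending) could be
sketched roughly as below. This is a userspace simulation, not kernel
code: struct sim_task, sim_alloc_nofs() and alloc_nofs_killable() are
hypothetical names made up for illustration, the sigkill_pending field
only stands in for fatal_signal_pending(current), and the max_attempts
bound exists only so the simulation terminates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-task state; not a real kernel structure. */
struct sim_task {
	bool sigkill_pending;	/* models fatal_signal_pending(current) */
	int failures_left;	/* attempts that fail before one succeeds */
	int attempts;		/* allocation attempts made so far */
};

/* Models a small GFP_NOFS allocation that may fail under memory pressure. */
static void *sim_alloc_nofs(struct sim_task *t, size_t size)
{
	t->attempts++;
	if (t->failures_left > 0) {
		t->failures_left--;
		return NULL;
	}
	return malloc(size);
}

/*
 * Caller-side "killable" allocation: keep retrying after a failure,
 * and give up only when a fatal signal is pending (or, in this
 * simulation only, when max_attempts is reached).
 */
static void *alloc_nofs_killable(struct sim_task *t, size_t size,
				 int max_attempts)
{
	for (;;) {
		void *p = sim_alloc_nofs(t, size);

		if (p)
			return p;
		if (t->sigkill_pending)
			return NULL;	/* the task is dying; failing is harmless */
		if (t->attempts >= max_attempts)
			return NULL;	/* simulation-only guard */
		/* A real caller could drop locks, sleep, or wait for an event here. */
	}
}
```

With failures_left = 3 and no signal pending, the fourth attempt
succeeds; with sigkill_pending set, the call returns NULL after the
first failed attempt. Note that this loop only retries. It does not
invoke the OOM killer, which is the part a caller cannot do while
out_of_memory() is not exported.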