Re: [LSF/MM TOPIC] proposals for topics

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Wed, 27 Jan 2016 22:44:30 +0900

Michal Hocko wrote:
> On Tue 26-01-16 00:08:28, Tetsuo Handa wrote:
> [...]
> > If it turned out that we are using GFP_NOFS from LSM hooks correctly,
> > I'd expect such GFP_NOFS allocations to retry unless SIGKILL is pending.
> > Filesystems might be able to handle GFP_NOFS allocation failures. But
> > userspace might not be able to handle system call failures caused by
> > GFP_NOFS allocation failures; OOM-unkillable processes might unexpectedly
> > terminate as if they are OOM-killed. Would you please add GFP_KILLABLE
> > to the list of topics?
> 
> Are there so many places to justify a flag? Isn't it easier to check for
> fatal_signal_pending in the failed path and do the retry otherwise? This
> allows for a more flexible fallback strategy - e.g. drop the locks and
> retry again, sleep for reasonable time, wait for some event etc... This
> sounds much more extensible than a single flag buried down in the
> allocator path.

If you allow any in-kernel code to directly call out_of_memory(), I'm
OK with that.

I think that whether to invoke the OOM killer should not be determined
by the ability to reclaim memory; it should be determined by the
importance and/or purpose of that memory allocation request.

We allocate memory on behalf of userspace processes. If a userspace process
asks for a page via page fault, we are using __GFP_FS. If in-kernel code
does something on behalf of a userspace process, we should use __GFP_FS.

Forcing in-kernel code to use !__GFP_FS allocation requests is a hack for
working around an inconvenient circumstance in memory allocation (memory
reclaim deadlock) which is not the fault of userspace processes.

Userspace controls oom_score_adj and makes a bet between processes.
If process A wins, the OOM killer kills process B, and process A gets memory.
If process B wins, the OOM killer kills process A, and process B gets memory.
Not invoking the OOM killer due to lack of __GFP_FS is something like forcing
processes to use oom_kill_allocating_task = 1.

Therefore, since __GFP_KILLABLE does not exist and out_of_memory() is not
exported, I'll change my !__GFP_FS allocation requests to __GFP_NOFAIL
(in order to allow processes to make a bet) if mm people change small !__GFP_FS
allocation requests to fail upon OOM. Note that there is no need to retry such
__GFP_NOFAIL allocation requests if SIGKILL is pending, but __GFP_NOFAIL does
not allow failing upon SIGKILL. __GFP_KILLABLE (with the current "no-fail unless
chosen by the OOM killer" behavior) will handle it perfectly.
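
For illustration only, the "retry unless SIGKILL is pending" behavior
discussed above could be modelled in userspace C roughly as follows. This is
a sketch under assumptions, not kernel code: struct alloc_env,
try_alloc_nofs(), fatal_signal_is_pending() and alloc_killable() are
hypothetical names made up for this example. In the kernel the signal check
would be fatal_signal_pending(current), and the allocation would be a real
GFP_NOFS request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct alloc_env {
	int failures_left;	/* attempts that fail before one succeeds */
	int kill_after;		/* attempts after which "SIGKILL" is pending; < 0 = never */
	int attempts;		/* attempts made so far */
};

/* Stand-in for a small GFP_NOFS allocation that may fail under pressure. */
static void *try_alloc_nofs(struct alloc_env *env, size_t size)
{
	env->attempts++;
	if (env->failures_left > 0) {
		env->failures_left--;
		return NULL;
	}
	return malloc(size);
}

/* Stand-in for fatal_signal_pending(current). */
static bool fatal_signal_is_pending(const struct alloc_env *env)
{
	return env->kill_after >= 0 && env->attempts >= env->kill_after;
}

/* Retry until success; give up only when a fatal signal is pending. */
static void *alloc_killable(struct alloc_env *env, size_t size)
{
	assert(size > 0);
	for (;;) {
		void *p = try_alloc_nofs(env, size);

		if (p)
			return p;
		if (fatal_signal_is_pending(env))
			return NULL;	/* the caller is dying anyway */
		/* A real caller could drop locks, sleep, or wait for an event here. */
	}
}
```

The loop lives in the caller here, which is the form Michal suggests. A
__GFP_KILLABLE flag would move the same check into the allocator instead.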