Re: [PATCH 08/10] exit, oom: postpone exit_oom_victim to later

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 2 Aug 2016 13:31:29 +0200

On Tue 02-08-16 19:32:45, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > > It is possible that a user creates a process with 10000 threads
> > > > > and let that process be OOM-killed. Then, this patch allows 10000 threads
> > > > > to start consuming memory reserves after they left exit_mm(). OOM victims
> > > > > are not the only threads who need to allocate memory for termination. Non
> > > > > OOM victims might need to allocate memory at exit_task_work() in order to
> > > > > allow OOM victims to make forward progress.
> > > > 
> > > > this might be possible but unlike the regular exiting tasks we do
> > > > reclaim oom victim's memory in the background. So while they can consume
> > > > memory reserves we should also give some (and arguably much more) memory
> > > > back. The reserves are there to expedite the exit.
> > > 
> > > Background reclaim does not occur on CONFIG_MMU=n kernels. But this patch
> > > also affects CONFIG_MMU=n kernels. If a process with two threads was
> > > OOM-killed and one thread consumed too much memory after it left exit_mm()
> > > before the other thread sets MMF_OOM_SKIP on their mm by returning from
> > > exit_aio() etc. in __mmput() from mmput() from exit_mm(), this patch
> > > introduces a new possibility to OOM livelock. I think it is wild to assume
> > > that "CONFIG_MMU=n kernels can OOM livelock even without this patch. Thus,
> > > let's apply this patch even though this patch might break the balance of
> > > OOM handling in CONFIG_MMU=n kernels."
> > 
> > As I've said if you have strong doubts about the patch I can drop it for
> > now. I do agree that nommu really matters here, though.
> 
> OK. Then, for now let's postpone only the oom_killer_disbale() to later
> rather than postpone the exit_oom_victim() to later.

that would require other changes (basically make oom_killer_disbale
independent on TIF_MEMDIE) which I think doesn't belong to this pile. So
I would rather sacrifice this patch instead and it will not be part of
the v2.

[...]
> > > > > I think that allocations from
> > > > > do_exit() are important for terminating cleanly (from the point of view of
> > > > > filesystem integrity and kernel object management) and such allocations
> > > > > should not be given up simply because ALLOC_NO_WATERMARKS allocations
> > > > > failed.
> > > > 
> > > > We are talking about a fatal condition when OOM killer forcefully kills
> > > > a task. Chances are that the userspace leaves so much state behind that
> > > > a manual cleanup would be necessary anyway. Depleting the memory
> > > > reserves is not nice but I really believe that this particular patch
> > > > doesn't make the situation really much worse than before.
> > > 
> > > I'm not talking about inconsistency in userspace programs. I'm talking
> > > about inconsistency of objects managed by kernel (e.g. failing to drop
> > > references) caused by allocation failures.
> > 
> > That would be a bug on its own, no?
> 
> Right, but memory allocations after exit_mm() from do_exit() (e.g.
> exit_task_work()) might assume (or depend on) the "too small to fail"
> memory-allocation rule where small GFP_FS allocations won't fail unless
> TIF_MEMDIE is set, but this patch can unexpectedly break that rule if
> they assume (or depend on) that rule.

Silent dependency on nofail semantic withtou GFP_NOFAIL is still a bug.
Full stop. I really fail to see why you are still arguing about that.

[...]
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>