Re: [PATCH v2] mm,oom: Re-enable OOM killer using timeout.

Michal Hocko <mhocko@xxxxxxxxxx> · Tue, 26 Apr 2016 16:31:29 +0200

On Tue 26-04-16 23:00:15, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > Hmm, I guess we have already discussed that in the past but I might
> > misremember. The above relies on oom killer to be triggered after the
> > previous victim was selected. There is no guarantee this will happen.
> 
> Why is there no guarantee this will happen?

What happens if you do not even hit the out_of_memory path? E.g. a
GFP_FS allocation stuck somewhere in the shrinkers, waiting for
somebody to make forward progress which never happens. This is
essentially what would block the mmap_sem write holder as well, and it
is what you are trying to work around with the timeout-based approach.

> This OOM livelock is caused by waiting for TIF_MEMDIE threads forever
> unconditionally. If oom_unkillable_task() is not called, it is not
> the OOM killer's problem.

It really doesn't matter whose problem it is, because whoever it is
doesn't have the full picture needed to draw any conclusions.

[...]

> These OOM livelocks are caused by lack of mechanism for hearing administrator's
> policy. We are missing rescue mechanisms which are needed for recovering from
> situations your model did not expect.

I am not opposed to a rescue policy defined by the admin. All I
am saying is that the only safe and reasonably maintainable one with
_predictable_ behavior I can see is to reboot/panic/killall-tasks after
a certain timeout. You consider this to be too harsh, but do you at
least agree that the semantics of this are clear and an admin knows what
the behavior would be? As we are not able to find a consensus on the
go-to-other-victim approach, can we at least agree on the absolute last
resort first?

We will surely hear complaints if this is too coarse and users really
need something more fine-grained.
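
Illustrative sketch (not part of any posted patch): the "absolute last
resort" semantics described above can be modelled in plain userspace C
as a single predictable decision. Every name here (oom_rescue_policy,
oom_rescue_decide, the enum values) is hypothetical and does not
correspond to an existing kernel interface or sysctl; the assumption is
simply that the admin picks one action and one timeout, and that the
clock is monotonic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical admin-selectable last-resort actions (not kernel knobs). */
enum oom_last_resort {
	OOM_KEEP_WAITING,	/* timeout not reached, or nothing to do */
	OOM_REBOOT,
	OOM_PANIC,
	OOM_KILL_ALL_TASKS,
};

/* Hypothetical policy: timeout_secs == 0 disables the last resort. */
struct oom_rescue_policy {
	uint64_t timeout_secs;
	enum oom_last_resort action;
};

/*
 * Decide what to do at now_secs for an OOM situation that began at
 * oom_start_secs. The outcome depends only on elapsed time and on
 * whether the OOM situation has been resolved, so an admin can tell
 * in advance what will happen.
 */
enum oom_last_resort oom_rescue_decide(const struct oom_rescue_policy *policy,
				       uint64_t oom_start_secs,
				       uint64_t now_secs,
				       bool oom_resolved)
{
	assert(now_secs >= oom_start_secs);	/* monotonic clock assumed */

	if (oom_resolved || policy->timeout_secs == 0)
		return OOM_KEEP_WAITING;
	if (now_secs - oom_start_secs < policy->timeout_secs)
		return OOM_KEEP_WAITING;
	return policy->action;
}
```

With a 30 second timeout and OOM_PANIC as the action, the function
keeps waiting at 29 seconds and returns OOM_PANIC from 30 seconds on,
unless the OOM situation was resolved in the meantime.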

> I'm talking about corner cases where your deterministic approach fails. What we
> need is "stop waiting for something forever unconditionally" and "hear what the
> administrator wants to do". You can deprecate and then remove sysctl knobs for
> hearing what the administrator wants to do once you have developed a perfect
> model and mechanism.
> 
> > Why cannot we get back to the timer based solution at least for the
> > panic timeout?
> 
> Use of global timer can cause false positive panic() calls.

A race against a timeout on the order of tens of seconds, which would
be the most likely value to be chosen, doesn't matter that much IMHO.

> Timeout should be calculated on a per task_struct or signal_struct basis.
> 
> Also, although a different problem, global timer based solution does not
> work for OOM livelock without any TIF_MEMDIE thread case (an example
> shown above).

That is a technical detail which can be solved.

-- 
Michal Hocko
SUSE Labs
