Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.

Michal Hocko <mhocko@xxxxxxxxxx> · Fri, 7 Sep 2018 13:10:38 +0200

On Fri 07-09-18 06:13:13, Tetsuo Handa wrote:
> On 2018/09/06 23:16, Michal Hocko wrote:
> > On Thu 06-09-18 23:06:40, Tetsuo Handa wrote:
> >> On 2018/09/06 22:56, Michal Hocko wrote:
> >>> On Thu 06-09-18 22:40:24, Tetsuo Handa wrote:
> >>>> On 2018/09/06 21:05, Michal Hocko wrote:
> >>>>>> If you are too busy, please show "the point of no-blocking" using source code
> >>>>>> instead. If such "the point of no-blocking" really exists, it can be executed
> >>>>>> by allocating threads.
> >>>>>
> >>>>> I would have to study this much deeper but I _suspect_ that we are not
> >>>>> taking any blocking locks right after we return from unmap_vmas. In
> >>>>> other words the place we used to have synchronization with the
> >>>>> oom_reaper in the past.
> >>>>
> >>>> See commit 97b1255cb27c551d ("mm,oom_reaper: check for MMF_OOM_SKIP before
> >>>> complaining"). Since this dependency is inode-based (i.e. irrelevant with
> >>>> OOM victims), waiting for this lock can livelock.
> >>>>
> >>>> So, where is safe "the point of no-blocking" ?
> >>>
> >>> Ohh, right unlink_file_vma and its i_mmap_rwsem lock. As I've said I
> >>> have to think about that some more. Maybe we can split those into two parts.
> >>>
> >>
> >> Meanwhile, I'd really like to use timeout based back off. Like I wrote at
> >> http://lkml.kernel.org/r/201809060703.w8673Kbs076435@xxxxxxxxxxxxxxxxxxx ,
> >> we need to wait for some period after all.
> >>
> >> We can replace timeout based back off after we got safe "the point of no-blocking" .
> > 
> > Why don't you invest your time in the long term solution rather than
> > playing with something that doesn't solve anything just papers over the
> > issue?
> > 
> 
> I am not a MM people. I am a secure programmer from security subsystem.

OK, so let me be completely honest with you. You make pretty strong
statements about the MM code while you do not consider yourself an
MM person. You are suggesting hacks which do not solve the real underlying
problems and I will keep shooting those down.

> You are almost always introducing bugs (like you call dragons) rather
> than starting from safe changes. The OOM killer _is_ always racy. Even
> your what you think the long term solution _shall be_ racy. 

The reason why this area is so easy to get wrong is basically a lack
of comprehensible design. We have historical hacks here and there. I
really do not want to follow that direction and as long as my word has
some weight (which is not my decision of course) I will keep fighting
for simplifications and overall design refinements. If we are to add
heuristics, they should be backed by well understood workloads we do care
about.

You might have your toy workload that hits different corner cases and
testing those is fine. But I absolutely disagree with basing any non-trivial
changes on that kind of workload unless they are a general improvement.

If you disagree then we have to agree to disagree and it doesn't make
much sense to continue the discussion.

> I can't waste my time in what you think the long term solution. Please
> don't refuse/ignore my (or David's) patches without your counter
> patches.

If you do not care about the long term sanity of the code and if you do not
care about the larger picture then I am not interested in any patches from
you. MM code is far from trivial and is not a playground. This attitude of
yours is just dangerous.
-- 
Michal Hocko
SUSE Labs