Re: [merged] mm-memcg-handle-non-error-oom-situations-more-gracefully.patch removed from -mm tree

Johannes Weiner <hannes@xxxxxxxxxxx> · Sat, 30 Nov 2013 10:55:42 -0500

On Sat, Nov 30, 2013 at 02:32:43AM -0800, David Rientjes wrote:
> On Fri, 29 Nov 2013, Johannes Weiner wrote:
> 
> > > You said you have informed stable to not merge these patches until further 
> > > notice, I'd suggest simply avoid ever merging the whole series into a 
> > > stable kernel since the problem isn't serious enough.  Marking changes 
> > > that do "goto nomem" seem fine to mark for stable, though.
> > 
> > These are followup fixes for the series that is upstream but didn't go
> > to stable.  I truly have no idea what you are talking about.
> > 
> 
> I'm responding to your comments[*] that indicate you were going to 
> eventually be sending it to stable.
> 
> > > On the scale that we run memcg, we would see it daily in automated testing 
> > > primarily because we panic the machine for memcg oom conditions where 
> > > there are no killable processes.  It would typically manifest by two 
> > > processes that are allocating memory in a memcg; one is oom killed, is 
> > > allowed to allocate, handles its SIGKILL, exits and frees its memory and 
> > > the second process which is oom disabled races with the uncharge and is 
> > > oom disabled so the machine panics.
> > 
> > So why don't you implement proper synchronization instead of putting
> > these random checks all over the map to make the race window just
> > small enough to not matter most of the time?
> > 
> 
> The oom killer can be quite expensive, so we have found that is 
> advantageous after doing all that work that the memcg is still oom for 
> the charge order before needlessly killing a process.  I am not suggesting 
> that we add synchronization to the uncharge path for such a race, but 
> merely a simple check as illustrated as due diligence.  I think a simple 
> conditional in the oom killer to avoid needlessly killing a user job is 
> beneficial and avoids questions from customers who have a kernel log 
> showing an oom kill occurring in a memcg that is not oom.  We could even 
> do the check in oom_kill_process() after dump_header() if you want to 
> reduce any chance of that to avoid getting bug reports about such cases.

I asked about quantified data of this last-minute check, you replied
with a race condition between an OOM kill victim and a subsequent OOM
kill invocation.

> > If you are really bothered by this race, then please have OOM kill
> > invocations wait for any outstanding TIF_MEMDIE tasks in the same
> > context.
> > 
> 
> The oom killer requires a tasklist scan, or an iteration over the set of 
> processes attached to the memcg for the memcg case, to find a victim.  It 
> already defers if it finds eligible threads with TIF_MEMDIE set.

And now you say that this race does not really exist and repeat the
same ramblings about last-minute checks to avoid unnecessary kills
again.  And again without any supporting data that I already asked
for.

The more I talk to you, the less sense this all makes.  Why do you
insist we merge this patch when you have apparently no idea why and
how it works, and can't demonstrate that it works in the first place?

I only followed you around in circles because I'm afraid that my
shutting up would be interpreted as agreement again and Andrew would
merge this anyway.  But this is unsustainable, the burden of proof
should be on you, not me.  I'm going to stop replying until you
provide the information I asked for.
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html