Re: [PATCH 1/5] mm: Introduce OOM kill timeout.

David Rientjes <rientjes@xxxxxxxxxx> · Mon, 24 Nov 2014 14:29:00 -0800 (PST)

On Mon, 24 Nov 2014, Michal Hocko wrote:

> > The problem described above is one of phenomena which is triggered by
> > a vulnerability which exists since (if I didn't miss something)
> > Linux 2.0 (18 years ago). However, it is too difficult to backport
> > patches which fix the vulnerability.
> 
> What is the vulnerability?
> 

There have historically been issues when oom killed processes fail to 
exit, so this is probably trying to address one of those issues.

The most notable example is when an oom killed process is waiting on a 
lock that is held by another thread that is trying to allocate memory and 
looping indefinitely since reclaim fails and the oom killer keeps finding 
the oom killed process waiting to exit.  This is a consequence of the page 
allocator looping forever for small order allocations.  Memcg oom kills 
typically see this much more often when you do complete kmem accounting: 
any combination of mutex + kmalloc(GFP_KERNEL) becomes a potential 
livelock.  For the system oom killer, I would imagine this would be 
difficult to trigger since it would require a process holding the mutex to 
never be able to allocate memory.

The oom killer timeout is always an attractive remedy to this situation 
and gets proposed quite often.  Several problems: (1) you can needlessly 
panic the machine because no other processes are eligible for oom kill 
after declaring that the first oom kill victim cannot make progress, (2) 
it can lead to unnecessary oom killing if the oom kill victim can exit but 
hasn't be scheduled or is in the process of exiting, (3) you can easily 
turn the oom killer into a serial oom killer since there's no guarantee 
the next process that is chosen won't be affected by the same problem, and 
(4) this doesn't fix the problem if an oom disabled process is wedged 
trying to allocate memory while holding a mutex that others are waiting 
on.

The general approach has always been to fix the actual issue in whatever 
code is causing the wedge.  We lack specific examples in this changelog 
and I agree that it seems to be papering over issues that could otherwise 
be fixed, so I agree with your NACK.

> We had a kind of similar problem in Memory cgroup controller because the
> OOM was handled in the allocation path which might sit on many locks and
> had to wait for the victim . So waiting for OOM victim to finish would
> simply deadlock if the killed task was stuck on any of the locks held by
> memcg OOM killer. But this is not the case anymore (we are processing
> memcg OOM from the fault path).
> 

I'm painfully aware of it happening with complete kmem accounting, however 
:)  I'm sure you can imagine the scenario that is causes and unfortunately 
our complete support isn't upstream so there's no code that I can point 
to.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>