On Wed, 23 Feb 2011, David Rientjes wrote:

> On Wed, 23 Feb 2011, Andrew Morton wrote:
>
> > Your patch still stinks!
> >
> > If userspace can't handle a disabled oom-killer then userspace
> > shouldn't have disabled the oom-killer.
>
> I agree, but userspace may not always be perfect, especially at large
> scale; we in kernel land can easily choose to ignore that, but it's
> only a problem because we're providing an interface where the memcg
> will livelock without userspace intervention.  The global oom killer
> doesn't have this problem, and for years it has even panicked the
> machine outright instead of livelocking EVEN THOUGH other threads,
> those that are OOM_DISABLE, may be getting work done.
>
> This is a memcg-specific issue because memory.oom_control has opened
> up the possibility of a livelock that userspace may have no way of
> correcting on its own, especially when it may be oom itself.  The
> natural conclusion is that you should never set memory.oom_control
> unless you can guarantee a perfect userspace implementation that will
> never be unresponsive.  At our scale, we can't make that guarantee,
> so memory.oom_control is not helpful at all.
>
> If that's the case, then what else do we have at our disposal, other
> than memory.oom_delay_millisecs, that allows us to increase a hard
> limit or kill a job of lower priority, other than setting memory
> thresholds and hoping userspace will schedule and respond before the
> memcg is completely oom?
>
> > How do we fix this properly?
> >
> > A little birdie tells me that the offending userspace oom handler
> > is running in a separate memcg and is not itself running out of
> > memory.
>
> It depends on how you configure your memory controllers, but even if
> it is running in a separate memcg, how can you conclude that it isn't
> oom in parallel?
>
> > The problem is that the userspace oom handler is also taking peeks
> > into processes which are in the stressed memcg and is getting stuck
> > on mmap_sem in the procfs reads.  Correct?
>
> That's outside the scope of this feature and is a separate
> discussion; this patch specifically addresses an issue where a
> userspace job scheduler wants to take action when a memcg is oom
> before deferring to the kernel, and happens to become unresponsive
> for whatever reason.
>
> > It seems to me that such a userspace oom handler is correctly
> > designed, and that we should be looking into the reasons why it is
> > unreliable, and fixing them.  Please tell us about this?
>
> The problem isn't specific to any one cause or implementation; we
> know that userspace programs have bugs: they can stall forever in
> D state, they can be oom themselves, they can get stuck waiting on a
> lock, and so on.

Was there a response to this, or can this patch be merged?
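
For reference, the memory.oom_control interface being debated works
through the cgroup v1 eventfd notification mechanism.  Below is a
minimal sketch of the kind of userspace handler David describes: it
disables the kernel oom killer for a memcg and blocks on an eventfd
until the memcg goes oom.  The /sys/fs/cgroup/memory/jobs path is an
assumption for illustration, and error handling is omitted:

    /*
     * Minimal sketch of a cgroup v1 memcg userspace oom handler.  The
     * MEMCG path below is illustrative and error handling is omitted
     * for brevity.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    #define MEMCG "/sys/fs/cgroup/memory/jobs"

    int main(void)
    {
        char buf[32];
        uint64_t event;
        int efd = eventfd(0, 0);
        int ofd = open(MEMCG "/memory.oom_control", O_RDONLY);
        int cfd = open(MEMCG "/cgroup.event_control", O_WRONLY);
        int dfd = open(MEMCG "/memory.oom_control", O_WRONLY);

        /* Disable the kernel oom killer for this memcg; allocating
         * tasks now sit in the oom waitqueue instead of being killed. */
        write(dfd, "1", 1);

        /* Register the eventfd for oom notifications: "<efd> <ofd>". */
        snprintf(buf, sizeof(buf), "%d %d", efd, ofd);
        write(cfd, buf, strlen(buf));

        /* Blocks until the memcg goes oom.  If this process is wedged
         * or oom itself, nothing ever unsticks the memcg -- the
         * livelock described above. */
        read(efd, &event, sizeof(event));

        /* Act: raise memory.limit_in_bytes or kill a lower-priority
         * job in the memcg. */
        return 0;
    }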
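
The threshold-based fallback David mentions ("setting memory
thresholds and hoping userspace will schedule") uses the same eventfd
mechanism, registered against memory.usage_in_bytes instead, so the
handler is woken before the memcg is completely oom.  A sketch, again
with an illustrative path and an illustrative 512M threshold:

    /*
     * Sketch of threshold notification: wake the handler when usage
     * crosses a byte value, before the memcg is fully oom.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    #define MEMCG "/sys/fs/cgroup/memory/jobs"

    int main(void)
    {
        char buf[64];
        uint64_t event;
        int efd = eventfd(0, 0);
        int ufd = open(MEMCG "/memory.usage_in_bytes", O_RDONLY);
        int cfd = open(MEMCG "/cgroup.event_control", O_WRONLY);

        /* Register: "<event fd> <usage fd> <threshold in bytes>". */
        snprintf(buf, sizeof(buf), "%d %d %llu", efd, ufd,
                 (unsigned long long)(512 << 20));
        write(cfd, buf, strlen(buf));

        /* Fires when usage crosses the threshold; the handler still
         * has to run and respond in time, which is exactly the
         * "hoping userspace will schedule" caveat above. */
        read(efd, &event, sizeof(event));
        return 0;
    }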
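
And the interface the patch itself proposes, memory.oom_delay_millisecs,
would be used roughly like this.  This is a sketch of a proposed,
unmerged interface: per the thread, writing a value defers the kernel
oom killer for that many milliseconds to give userspace a window to
act, so a wedged handler only delays, rather than prevents, kernel
recovery.  The path and the 10-second value are illustrative:

    /*
     * Sketch of the proposed (unmerged) memory.oom_delay_millisecs
     * interface from this thread.
     */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *val = "10000";  /* fall back to the kernel oom
                                     * killer after 10 seconds */
        int fd = open("/sys/fs/cgroup/memory/jobs/"
                      "memory.oom_delay_millisecs", O_WRONLY);

        write(fd, val, strlen(val));
        close(fd);
        return 0;
    }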