Re: [patch -mm 1/2] oom: badness heuristic rewrite

On Mon, 2 Aug 2010 21:20:40 -0700 (PDT)
David Rientjes <rientjes@xxxxxxxxxx> wrote:

> On Tue, 3 Aug 2010, KAMEZAWA Hiroyuki wrote:
> 
> > > Yes, but this is what oom_score_adj is intended to do: an oom_score_adj of 
> > > 300 means task A should be penalized 30% of available memory.  A positive 
> > > oom_score_adj typically means "all other competing tasks should be allowed 
> > > 30% more memory, cumulatively, compared to this task."  Task A uses ~10% 
> > > of available memory and task B uses 50% of available memory.  That's a 40% 
> > > difference, which is greater than task A's penalization of 30%, so B is 
> > > killed.
> > >
> > 
> > This will confuse LXC(Linux Container) guys. oom_score is unusable anymore.
> > 
> 
> From Documentation/filesystems/proc.txt in 2.6.35:
> 
> 	3.2 /proc/<pid>/oom_score - Display current oom-killer score
> 	-------------------------------------------------------------
> 
> 	This file can be used to check the current score used by the 
> 	oom-killer is for any given <pid>. Use it together with 
> 	/proc/<pid>/oom_adj to tune which process should be killed in an 
> 	out-of-memory situation.
> 
> That is unchanged with the rewrite.  /proc/pid/oom_score still exports the 
> badness() score used by the oom killer to determine which task to kill: 
> the highest score will be killed amongst candidate tasks.  The fact that 
> the score can be influenced by cpuset, memcg, or mempolicy constraint is 
> irrelevant, we cannot assume anything about the badness() heuristic's 
> implementation from the score itself.
> 

With the old behavior, the oom_score ordering is consistent between the
system and a container: the task with the highest score is killed.
IOW, oom_score has worked as an oom score.

But after the patch, the user (of LXC et al.) can't trust oom_score.
Especially with memcg, it just shows a _broken_ value.

And the user has to calculate the real oom_score himself as

real_oom_score = (oom_score - oom_score_adj) *
	system_memory/container_memory + oom_score_adj.
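
As a minimal sketch of that conversion (assuming the exported oom_score is
scaled against system memory while the memcg kill decision is scaled against
the container's limit; the function and parameter names here are hypothetical,
not anything the kernel provides):

```python
def real_oom_score(oom_score, oom_score_adj, system_memory, container_memory):
    """Rescale an exported oom_score to the container's memory size.

    Hypothetical helper: it only restates the formula above and does not
    read anything from /proc. Both memory sizes must use the same unit.
    """
    if container_memory <= 0:
        raise ValueError("container_memory must be positive")
    # Strip the user-supplied adjustment so only the memory-usage part
    # of the score is rescaled.
    usage_part = oom_score - oom_score_adj
    # Rescale from "fraction of system memory" to "fraction of container
    # memory", then put the adjustment back.
    return usage_part * system_memory / container_memory + oom_score_adj


# Example with made-up numbers: a 16 GB system, a 4 GB container,
# an exported oom_score of 150 and an oom_score_adj of 100.
example = real_oom_score(150, 100, 16, 4)
```

With these made-up numbers, the usage part of the score (50) is multiplied by
4, so the score that matters inside the container is 300 rather than the 150
shown in /proc.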

Am I wrong? Anyway, I think you should take care of this issue.
Maybe this breaks Google's oom-killer+cpuset system.

Thanks,
-Kame
