Re: oom killer rewrite

On Wed, 26 May 2010, KAMEZAWA Hiroyuki wrote:

> > It's not necessarily the memory quantity that is interesting in this case 
> > (or proportion of available memory), it's how the badness() score is 
> > altered relative to other eligible tasks that end up changing the oom kill 
> > priority list.  If we were to implement a tunable that only took a memory 
> > quantity, it would require specific knowledge of the system's capacity to 
> > make any sense compared to other tasks.  An oom_score_adj of 125MB means 
> > vastly different things on a 4GB system compared to a 64GB system and admins 
> > do not want to update their script anytime they add (or hotadd) memory or 
> > run on a variety of systems that don't have the same capacities. 
> 
> IMHO, the importance of an application is consistent across all hosts in the system.
> (the system here means a system maintained by a team of admins to do a service.)
> 

And it's consistent whether the task is unbound to any cgroup, attached to
a cpuset or a memcg, or has a mempolicy restriction.  Its importance in
those "virtualized environments" is the same when shared with other tasks,
which is why proportions work well: I can decide that my application should
be targeted first by the oom killer when it is using more than 25% of
system memory, for example.  When I attach that task to a cpuset, the
priority is the same; I've just adjusted the amount of available memory
out from under a set of tasks.  The point is that the script or admin that
sets oom_score_adj need not know what resources the application has, only
its memory expectations relative to other applications that share the same
resources.
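
To make the proportion argument concrete, here is a minimal Python sketch.
It assumes a simplified model in which the badness score is the task's share
of available memory in thousandths plus oom_score_adj, clamped to 0..1000.
The function name badness(), the clamping, and the sample sizes are
illustrative assumptions, not the kernel's actual heuristic, which has more
terms.

```python
GIB = 1 << 30


def badness(task_bytes, available_bytes, oom_score_adj=0):
    """Simplified model (an assumption): share of available memory in
    thousandths, plus oom_score_adj, clamped to the range 0..1000."""
    if not -1000 <= oom_score_adj <= 1000:
        raise ValueError("oom_score_adj must be within -1000..1000")
    share = task_bytes * 1000 // available_bytes
    return max(0, min(1000, share + oom_score_adj))


# The same adjustment (+750) on three different capacities: a 4G system,
# a 64G system, and a hypothetical 2G cpuset carved out of either one.
for available in (4 * GIB, 64 * GIB, 2 * GIB):
    quarter = available // 4
    print(available // GIB, "GiB available ->",
          badness(quarter, available, 750))
```

In this model the task reaches the top score once it uses 25% of whatever
memory is available to it, and the script setting +750 never needs to know
whether that is 4G, 64G, or a cpuset's share.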

> It should not be influenced by the amount of memory, other applications, etc.
> If it is influenced, it's chaos for admins.
> It seems that's a fundamental difference in ideas between you and me.
> 

Other than polarizing tasks with an oom_score_adj of -1000 or +1000, you
must consider the importance of an application relative to other
applications.  That's the point of adjusting badness scoring: we can
influence which task the kernel kills instead of simply killing the task
that is hogging the most memory.  When an application is using more memory
than desired (or expected), the cost of business would mandate that the
task be killed, and that threshold is definable in clear units from
userspace via oom_score_adj but not via oom_adj.
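
As a rough sketch of what "definable in clear units" could look like from a
script, the Python below turns a usage threshold (a fraction of available
memory) into an oom_score_adj value and writes it out.  It assumes the same
simplified model as above (score = share in thousandths + oom_score_adj,
top score 1000) and assumes the tunable is exposed as
/proc/<pid>/oom_score_adj; the helper names are hypothetical.

```python
import os


def adj_for_threshold(fraction):
    """Return the oom_score_adj at which a task hits the top score (1000)
    once it uses `fraction` of available memory, under the assumed model."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within 0.0..1.0")
    return 1000 - round(fraction * 1000)


def set_oom_score_adj(pid, adj, proc_root="/proc"):
    """Write adj to <proc_root>/<pid>/oom_score_adj (path is an assumption)."""
    if not -1000 <= adj <= 1000:
        raise ValueError("oom_score_adj must be within -1000..1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(f"{adj}\n")
    return path


# Kill first above 75% of available memory (e.g. 3G of 4G).
print(adj_for_threshold(0.75))
# Kill first above 25% of available memory.
print(adj_for_threshold(0.25))
```

The threshold is stated once as a proportion, so the same script works
unchanged after memory is added or the task is moved into a cpuset.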

> > That's the same if you were to implement a memory quantity instead of a 
> > proportion for oom_score_adj and depends on how you want to protect or 
> > prefer that application.  For a 3G application on a 4G machine, an 
> > oom_score_adj of 250 is legitimate if you want to ensure it never uses 
> > more than 3G and is always killed first when it does.  For the 8G machine, 
> > you can't make the same killing choice if another instance of the same 
> > application is using 5G instead of 3G.  See the difference?  In that case, 
> > it may not be the correct choice for oom kill and we should kill something 
> > else: the 5G memory leaker.  That requires userspace intervention to 
> > identify, but unless we mandate the expected memory use is spelled out for 
> > every single application (which we can't), there's no way to use a fixed 
> > memory quantity to determine relative priority.
> > 
> 
> I just don't believe relative priority ;)

If application A typically consumes 6G on an 8G system and it is an
important task, oom_score_adj makes it possible to protect it from being
oom killed as a result of the memory usage of the remaining applications B
and C.  oom_adj has no clear way to say that A should be killed when it
uses more than 7G but not when it uses 6G.  That's the motivation behind
developing a relative priority like oom_score_adj.
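
A worked version of the A/B/C case, as a minimal Python sketch under the
same assumed model (score = share of memory in thousandths + oom_score_adj,
clamped to 0..1000).  The -700 adjustment and the way B and C split the
remaining memory are hypothetical numbers picked by hand for illustration;
a different split would need a different value.

```python
GIB = 1 << 30
TOTAL = 8 * GIB  # the 8G system from the example


def badness(task_bytes, total_bytes, adj=0):
    """Assumed model: share in thousandths plus adj, clamped to 0..1000."""
    return max(0, min(1000, task_bytes * 1000 // total_bytes + adj))


def pick_victim(usage, adjustments):
    """Return (name of highest-scoring task, all scores)."""
    scores = {name: badness(used, TOTAL, adjustments.get(name, 0))
              for name, used in usage.items()}
    return max(scores, key=scores.get), scores


ADJ = {"A": -700}  # hypothetical value chosen for this example

# A at its expected 6G; B and C share the remaining 2G.
normal = {"A": 6 * GIB, "B": 1 * GIB, "C": 1 * GIB}
# A has grown to 7G; B and C share the remaining 1G.
runaway = {"A": 7 * GIB, "B": GIB // 2, "C": GIB // 2}

print(pick_victim(normal, ADJ))    # A is not the highest scorer here
print(pick_victim(runaway, ADJ))   # A becomes the highest scorer here
print(pick_victim(normal, {}))     # with no adjustment, A is chosen at 6G
```

With these numbers A scores 50 at 6G (below B and C at 125 each) and 175 at
7G (above B and C at 62 each), so a single adjustment expresses both
"protect at 6G" and "kill at 7G".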

> That's why I wrote not to take my words seriously.
> I wonder whether people who want precise control of oom_score_adj should
> use memcg and put apps into containers. In that case, static priority
> will be useful.
> 

Indeed, using a hierarchy of memcgs is another way to do the same thing 
but seems like overkill if I'm running only a webserver and a few 
monitoring applications.  oom_score_adj also applies to cpusets and 
mempolicies.
