On Tue, 3 Aug 2010, KAMEZAWA Hiroyuki wrote:

> > Sure, a task could be killed with a very low /proc/pid/oom_score, but only
> > if its cpuset is oom, for example, and it has the highest score of all
> > tasks attached to that oom_score.  So /proc/pid/oom_score needs to be
> > considered in the context in which the oom occurs: system-wide, cpuset,
> > mempolicy, or memcg.  That's unchanged from the old oom killer.
> >
>
> unchanged ?
>

Oh, I meant that one behavior is unchanged: a task with a low oom_score
compared to other system tasks may still be killed when a cpuset is oom, for
instance, because we only kill tasks that are constrained to that cpuset.

> Assume 2 processes A, B which has oom_score_adj of 300 and 0
> And A uses 200M, B uses 1G of memory under 4G system
>
> Under the system.
> A's score = (200M * 1000)/4G + 300 = 350
> B's score = (1G * 1000)/4G = 250.

Right, A is penalized 30% of system memory and its use is ~5%, resulting in
a score of 350, or 35%.  B's use is 25%, so its score is 250.

> In the cpuset, it has 2G of memory.
> A's score = (200M * 1000)/2G + 300 = 400
> B's score = (1G * 1000)/2G = 500
>
> This priority-inversion don't happen in current system.
>

Yes, but this is what oom_score_adj is intended to do: an oom_score_adj of
300 means task A should be penalized 30% of available memory.  In other
words, an oom_score_adj of 300 means "all other competing tasks should be
allowed 30% more memory, cumulatively, compared to this task."

In the cpuset, task A uses ~10% of available memory and task B uses 50% of
available memory.  That's a 40% difference, which is greater than task A's
penalization of 30%, so B is killed.
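To make the arithmetic above easy to check, here is a minimal sketch in Python.  It only reproduces the simplified formula used in this thread (usage as a proportion of available memory on a 0..1000 scale, plus oom_score_adj); it is not the kernel's actual badness calculation, which may account for other factors.  The names badness_score, scores and tasks are hypothetical, and sizes are assumed to be in binary units (1G = 1024M).

```python
MIB = 1024 * 1024

def badness_score(rss_bytes, available_bytes, oom_score_adj):
    """Simplified score from the thread: memory use scaled to 0..1000
    of the memory available in the oom context, plus oom_score_adj."""
    return rss_bytes * 1000 // available_bytes + oom_score_adj

# Hypothetical tasks from the example: name -> (memory use, oom_score_adj)
tasks = {
    "A": (200 * MIB, 300),
    "B": (1024 * MIB, 0),
}

def scores(available_bytes):
    """Score every task against the memory available in one oom context."""
    return {
        name: badness_score(rss, available_bytes, adj)
        for name, (rss, adj) in tasks.items()
    }

system = scores(4096 * MIB)  # system-wide oom, 4G available
cpuset = scores(2048 * MIB)  # cpuset oom, 2G available

# The task with the highest score in the given context is selected.
print("system:", system, "victim:", max(system, key=system.get))
print("cpuset:", cpuset, "victim:", max(cpuset, key=cpuset.get))
```

With binary units and integer division, A comes out at 348 and 397 rather than the rounded 350 and 400 quoted above; the ordering is the same, so A is selected system-wide and B is selected within the cpuset.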