On Thu 11-10-12 10:57:39, Michal Hocko wrote:
> oom_badness takes a totalpages argument which says how many pages are
> available and it uses it as a base for the score calculation. The value
> is calculated by mem_cgroup_get_limit which considers both limit and
> total_swap_pages (resp. memsw portion of it).
>
> This is usually correct but since fe35004f (mm: avoid swapping out
> with swappiness==0) we do not swap when swappiness is 0 which means
> that we cannot really use up all the totalpages pages. This in turn
> confuses oom score calculation if the memcg limit is much smaller than
> the available swap because the used memory (capped by the limit) is
> negligible compared to totalpages so the resulting score is too small
> if adj!=0 (typically task with CAP_SYS_ADMIN or non-zero oom_score_adj).
> A wrong process might be selected as a result.
>
> The same issue exists for the global oom killer as well but it is not
> that problematic as the amount of the RAM is usually much bigger than
> the swap space.
>
> The problem can be worked around by checking mem_cgroup_swappiness==0
> and not considering swap at all in such a case.
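
To illustrate the arithmetic described above, here is a rough userspace
sketch. It is not kernel code: model_limit_pages() and model_score() are
hypothetical names, everything is counted in pages rather than bytes, and
the scoring formula (usage plus adj scaled by totalpages/1000, floored at
1) is an assumed simplification of what oom_badness does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the limit calculation after the patch:
 * swap is added to the limit only when swappiness is non-zero,
 * and a finite memsw limit caps the result.
 */
static uint64_t model_limit_pages(uint64_t limit, uint64_t memsw,
				  uint64_t swap, unsigned int swappiness)
{
	if (swappiness) {
		limit += swap;
		if (memsw < limit)
			limit = memsw;
	}
	return limit;
}

/*
 * Assumed simplification of the score: pages used by the task plus
 * adj scaled to totalpages/1000, never lower than 1.
 */
static int64_t model_score(int64_t used, int64_t adj, uint64_t totalpages)
{
	assert(totalpages > 0);

	int64_t points = used + adj * (int64_t)(totalpages / 1000);

	return points > 0 ? points : 1;
}
```

With a 100 MiB limit (25600 pages of 4 KiB), 10 GiB of swap (2621440
pages), no memsw limit and a task using 25000 pages with adj == -30, the
model gives a score of 1 when swap is counted into totalpages and 24250
when it is not, which is the skew the changelog describes.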
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> Acked-by: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: stable [3.5+]

I have just realized that fe35004f (introduced in 3.5-rc1) has been
backported to 3.2 and 3.4 stable kernels so this should be [3.2+]

> ---
>  mm/memcontrol.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7acf43b..93a7e36 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1452,17 +1452,26 @@ static int mem_cgroup_count_children(struct mem_cgroup *memcg)
>  static u64 mem_cgroup_get_limit(struct mem_cgroup *memcg)
>  {
>  	u64 limit;
> -	u64 memsw;
>
>  	limit = res_counter_read_u64(&memcg->res, RES_LIMIT);
> -	limit += total_swap_pages << PAGE_SHIFT;
>
> -	memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
>  	/*
> -	 * If memsw is finite and limits the amount of swap space available
> -	 * to this memcg, return that limit.
> +	 * Do not consider swap space if we cannot swap due to swappiness
>  	 */
> -	return min(limit, memsw);
> +	if (mem_cgroup_swappiness(memcg)) {
> +		u64 memsw;
> +
> +		limit += total_swap_pages << PAGE_SHIFT;
> +		memsw = res_counter_read_u64(&memcg->memsw, RES_LIMIT);
> +
> +		/*
> +		 * If memsw is finite and limits the amount of swap space
> +		 * available to this memcg, return that limit.
> +		 */
> +		limit = min(limit, memsw);
> +	}
> +
> +	return limit;
>  }
>
>  void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> --
> 1.7.10.4
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx. For more info on Linux MM,
> see: http://www.linux-mm.org/ .

--
Michal Hocko
SUSE Labs