On Wed 07-11-12 14:53:40, Andrew Morton wrote:
> On Wed, 7 Nov 2012 23:46:40 +0100
> Michal Hocko <mhocko@xxxxxxx> wrote:
>
> > > Realistically, is anyone likely to hurt from this?
> >
> > The primary motivation for the fix was a real report by a customer.
>
> Describe it please and I'll copy it to the changelog.

The original issue (the wrong tasks get killed in a small group with
memcg swappiness=0) was reported on top of our 3.0 based kernel (with
fe35004f backported). I have tried to replicate it with the test case
mentioned in https://lkml.org/lkml/2012/10/10/223.

As David correctly pointed out (https://lkml.org/lkml/2012/10/10/418),
a significant role was played by the fact that all the processes in the
group have CAP_SYS_ADMIN, but oom_score_adj has a similar effect.

Say there is 2G of swap space, which is 524288 pages. If you add the
CAP_SYS_ADMIN bonus then you have a -15728 score for the bias. This
means that all tasks with less than 60M get the minimum score, and it is
the task ordering which determines who gets killed as a result.

To summarize: users of small groups (relative to the swap size) whose
tasks have CAP_SYS_ADMIN or use oom_score_adj are affected the most;
others might see an unexpected oom_badness calculation.

Whether this is a representative workload, I don't know, but I think
that it is worth fixing and pushing to stable as well.
-- 
Michal Hocko
SUSE Labs
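
For anyone wanting to check the numbers in the message above, here is a rough sketch of the arithmetic. It is a simplified model, not kernel code: it assumes 4 KiB pages, takes totalpages to be the 2G of swap alone (as the message does), assumes the CAP_SYS_ADMIN bonus is 30/1000 of totalpages, and assumes a floor of 1 for eligible tasks. The function names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def to_pages(nbytes):
    """Convert a byte count to whole pages."""
    return nbytes // PAGE_SIZE


def root_bonus(totalpages, permille=30):
    """Pages subtracted for a CAP_SYS_ADMIN task (assumed 3% of totalpages)."""
    return totalpages * permille // 1000


def badness(task_pages, totalpages, has_cap_sys_admin):
    """Simplified badness: task usage minus the root bonus, floored at 1."""
    points = task_pages
    if has_cap_sys_admin:
        points -= root_bonus(totalpages)
    return max(points, 1)


MiB = 1 << 20
totalpages = to_pages(2 * 1024 * MiB)   # 2G of swap
bias = root_bonus(totalpages)           # the bias from the message

# Two CAP_SYS_ADMIN tasks of quite different size both end up at the floor,
# so only their ordering in the task list decides which one is picked.
small = badness(to_pages(10 * MiB), totalpages, True)
larger = badness(to_pages(50 * MiB), totalpages, True)

print(totalpages, bias, bias * PAGE_SIZE / MiB, small, larger)
```

In this model the bias works out to about 61 MiB worth of pages, which is where the "less than 60M" figure comes from; any root-owned task below that size collapses to the same minimum score.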