On Thu, 19 Oct 2017, Johannes Weiner wrote: > David would have really liked for this patchset to include knobs to > influence how the algorithm picks cgroup victims. The rest of us > agreed that this is beyond the scope of these patches, that the > patches don't need it to be useful, and that there is nothing > preventing anyone from adding configurability later on. David > subsequently nacked the series as he considers it incomplete. Neither > Michal nor I see technical merit in David's nack. > The nack is for three reasons: (1) unfair comparison of root mem cgroup usage to bias against that mem cgroup from oom kill in system oom conditions, (2) the ability of users to completely evade the oom killer by attaching all processes to child cgroups either purposefully or unpurposefully, and (3) the inability of userspace to effectively control oom victim selection. For (1), the difference in v12 is adding the rss of all processes attached to the root mem cgroup as the evaluation. This is not the same criteria that child cgroups are evaluated on, and they are compared using that bias. It is very trivial to provide a fair comparison as was suggested in v11. For (2), users who do for i in $(cat cgroup.procs); do mkdir $i; echo $i > $i/cgroup.procs; done can completely evade the oom killer and this may be part of their standard operating procedure for restricting resources with other cgroups other than the mem cgroup. Again, it's an unfair comparison to all other cgroups on the system. For (3), users need the ability to protect important cgroups, such as protecting a cgroup that is limited to 50% of system memory. They need the ability to kill processes from other cgroups to protect these important processes. This is nothing new: the oom killer has always provided the ability to bias against important processes. There was follow-up email on all of these points where very trivial changes were suggested to address all three of these issues, and which Roman has implemented in one form or another in previous iterations with the bonus that no accounting to the root mem cgroup needs to be done. > Michal acked the implementation, but on the condition that the new > behavior be opt-in, to not surprise existing users. I *think* we agree > that respecting the cgroup topography during global OOM is what we > should have been doing when cgroups were initially introduced; where > we disagree is that I think users shouldn't have to opt in to > improvements. We have done much more invasive changes to the victim > selection without actual regressions in the past. Further, this change > only applies to mounts of the new cgroup2. Tejun also wasn't convinced > of the risk for regression, and too would prefer cgroup-awareness to > be the default in cgroup2. I would ask for patch 5/6 to be dropped. > I agree with Michal that the new victim selection should be opt-in with CGRP_GROUP_OOM. -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html