On Mon, Sep 25, 2017 at 10:25:21PM +0200, Michal Hocko wrote:
> On Mon 25-09-17 19:15:33, Roman Gushchin wrote:
> [...]
> > I'm not against this model, as I've said before. It feels logical,
> > and will work fine in most cases.
> >
> > In this case we can drop any mount/boot options, because it preserves
> > the existing behavior in the default configuration. A big advantage.
>
> I am not sure about this. We still need an opt-in, regardless, because
> selecting the largest process from the largest memcg != selecting the
> largest task (just consider memcgs with many processes example).

As I understand Johannes, he suggested comparing individual processes
with group_oom mem cgroups. In other words, always select the killable
entity with the biggest memory footprint.

This is slightly different from my v8 approach, where I treat leaf memcgs
as indivisible memory consumers independent of the group_oom setting, so
by default I'm selecting the biggest task in the biggest memcg.

While the approach suggested by Johannes looks clear and reasonable,
I'm slightly concerned about possible implementation issues,
which I've described below:

>
> > The only thing I'm slightly concerned about is that, due to the way we
> > calculate the memory footprint for tasks and memory cgroups, we will
> > have a number of weird edge cases. For instance, putting a single
> > process into a group_oom memcg will alter its oom_score significantly
> > and result in significantly different chances of being killed. An
> > obvious example will be a task with oom_score_adj set to any
> > non-extreme (other than 0 and -1000) value, but it can also happen in
> > case of a constrained alloc, for instance.
>
> I am not sure I understand. Are you talking about root memcg comparing
> to other memcgs?

Not only, but root memcg in this case will be another complication. We can
also use the same trick for all memcgs (define the memcg oom_score as the
maximum oom_score of the tasks belonging to it); it will turn group_oom
into a pure container cleanup solution, without changing the victim
selection algorithm.

But, again, I'm not against the approach suggested by Johannes. I think
that overall it's the best possible semantics, if we're not taking some
implementation details into account.
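
To make the difference between the three selection policies discussed in
this mail concrete, here is a toy model in Python. It is only a sketch, not
kernel code: the Task and Memcg classes, the function names and the numbers
are all hypothetical, "footprint" stands in for the real oom_score
calculation (no oom_score_adj, no root memcg, no constrained allocations),
and it assumes that a group_oom memcg is killed as a whole.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    footprint: int  # toy memory footprint, e.g. in pages


@dataclass
class Memcg:
    name: str
    group_oom: bool
    tasks: List[Task] = field(default_factory=list)  # assumed non-empty

    def footprint(self) -> int:
        # Toy model: a memcg's footprint is the sum of its tasks.
        return sum(t.footprint for t in self.tasks)


def biggest_task(memcg):
    return max(memcg.tasks, key=lambda t: t.footprint)


def select_v8(memcgs):
    """Leaf memcgs are indivisible: pick the biggest memcg first,
    then the biggest task in it (or the whole memcg if group_oom)."""
    victim_cg = max(memcgs, key=lambda m: m.footprint())
    return victim_cg if victim_cg.group_oom else biggest_task(victim_cg)


def select_killable_entity(memcgs):
    """Compare group_oom memcgs (as a whole) with individual tasks
    from all other memcgs; pick the biggest killable entity."""
    candidates = []
    for m in memcgs:
        if m.group_oom:
            candidates.append((m.footprint(), m))
        else:
            candidates.extend((t.footprint, t) for t in m.tasks)
    return max(candidates, key=lambda c: c[0])[1]


def select_max_task_score(memcgs):
    """Memcg score = maximum score of its tasks, so the victim task is
    the same as without memcgs; group_oom only widens the cleanup."""
    victim_cg = max(memcgs, key=lambda m: biggest_task(m).footprint)
    return victim_cg if victim_cg.group_oom else biggest_task(victim_cg)


# Hypothetical workload: many small tasks, one big task, one container.
memcgs = [
    Memcg("A", False, [Task(f"a{i}", 10) for i in range(50)]),  # 500 total
    Memcg("B", False, [Task("b0", 200)]),                       # 200 total
    Memcg("C", True, [Task("c0", 80), Task("c1", 80)]),         # 160 total
]

for policy in (select_v8, select_killable_entity, select_max_task_score):
    print(policy.__name__, "->", policy(memcgs).name)
```

With these made-up numbers the v8-style policy picks a 10-page task from
memcg A, while the other two pick the 200-page task b0. That is the
"largest process from the largest memcg != largest task" case with many
processes in one memcg.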