On Fri 13-07-18 14:59:59, David Rientjes wrote: > On Tue, 5 Jun 2018, Michal Hocko wrote: > > > 1) comparision root with tail memcgs during the OOM killer is not fair > > because we are comparing tasks with memcgs. > > > > This is true, but I do not think this matters much for workloads which > > are going to use the feature. Why? Because the main consumers of the new > > feature seem to be containers which really need some fairness when > > comparing _workloads_ rather than processes. Those are unlikely to > > contain any significant memory consumers in the root memcg. That would > > be mostly common infrastructure. > > > > There are users (us) who want to use the feature and not all processes are > attached to leaf mem cgroups. The functionality can be provided in a > generally useful way that doesn't require any specific hierarchy, and I > implemented this in my patch series at > https://marc.info/?l=linux-mm&m=152175563004458&w=2. That proposal to fix > *all* of my concerns with the cgroup-aware oom killer as it sits in -mm, > in it's entirety, only extends it so it is generally useful and does not > remove any functionality. I'm not sure why we are discussing ways of > moving forward when that patchset has been waiting for review for almost > four months and, to date, I haven't seen an objection to. Well, I didn't really get to your patches yet. The last time I've checked I had some pretty serious concerns about the consistency of your proposal. Those might have been fixed in the lastest version of your patchset I haven't seen. But I still strongly suspect that you are largerly underestimating the complexity of more generic oom policies which you are heading to. Considering user API failures from the past (oom_*adj fiasco for example) suggests that we should start with smaller steps and only provide a clear and simple API. oom_group is such a simple and semantically consistent thing which is the reason I am OK with it much more than your "we can be more generic" approach. I simply do not trust we can agree on sane and consistent api in a reasonable time. And it is quite mind boggling that a simpler approach has been basically blocked for months because there are some concerns for workloads which are not really asking for the feature. Sure your usecase might need to handle root memcg differently. That is a fair point but that shouldn't really block containers users who can use the proposed solution without any further changes. If we ever decide to handle root memcg differently we are free to do so because the oom selection policy is not carved in stone by any api. [...] -- Michal Hocko SUSE Labs