On Thu 24-01-19 13:23:28, Johannes Weiner wrote:
> On Thu, Jan 24, 2019 at 06:01:17PM +0100, Michal Hocko wrote:
> > On Thu 24-01-19 11:00:10, Johannes Weiner wrote:
> > [...]
> > > We cannot fully eliminate a risk for regression, but it strikes me as
> > > highly unlikely, given the extremely young age of cgroup2-based system
> > > management and surrounding tooling.
> >
> > I am not really sure what you consider young but this interface is 4.0+
> > IIRC and the cgroup v2 is considered stable since 4.5 unless I
> > misremember and that is not a short time period in my book.
>
> If you read my sentence again, I'm not talking about the kernel but
> the surrounding infrastructure that consumes this data. The risk is
> not dependent on the age of the interface, but on its adoption.

You really have to assume that a user-visible interface is consumed
shortly after it is exposed or considered stable, especially in this
case, as cgroup v2 was explicitly called unstable for a considerable
period of time. This is a general policy regarding user APIs in the
kernel. I could see such an argument for the release right after the
introduction, or in similar cases, but this was 3 years ago. We already
have distribution kernels based on 4.12, and that is old compared to 5.0.

> > Changing interfaces now represents a non-trivial risk and so far I
> > haven't heard any actual usecase where the current semantic is
> > actually wrong. Inconsistency on its own is not a sufficient
> > justification IMO.
>
> It can be seen either way, and in isolation it wouldn't be wrong to
> count events on the local level. But we made that decision for the
> entire interface, and this file is the odd one out now. From that
> comprehensive perspective, yes, the behavior is wrong.

I do see your point about consistency. But it is also important to
consider the usability of this interface.
As already mentioned, catching an oom event at a level where the oom
doesn't happen, and having a hard time identifying that place without
races, is not a straightforward API to use. So it might really be the
case that the API is actually usable for its purpose.

> It really
> confuses people who are trying to use it, because they *do* expect it
> to behave recursively.

Then we should improve the documentation. But seriously, these are not
strong reasons to change a long-term semantic people might rely on.

> I'm really having a hard time believing there are existing cgroup2
> users with specific expectations for the non-recursive behavior...

I can certainly imagine monitoring tools hooking in at the levels where
limits are set and reporting events as they happen. It would be more than
confusing to receive events for reclaim/ooms that haven't happened at
that level just because a delegated memcg down the hierarchy has decided
to set a more restrictive limit. Really, this is a very unexpected
behavior change for anybody using that interface right now on anything
but leaf memcgs.
-- 
Michal Hocko
SUSE Labs