On Fri, May 17, 2019 at 02:33:10PM +0200, Michal Hocko wrote: > On Thu 16-05-19 15:39:43, Johannes Weiner wrote: > > On Thu, May 16, 2019 at 08:10:42PM +0200, Michal Hocko wrote: > > > On Thu 16-05-19 13:56:55, Johannes Weiner wrote: > > > > On Wed, Feb 13, 2019 at 01:47:29PM +0100, Michal Hocko wrote: > [...] > > > > > FTR: As I've already said here [1] I can live with this change as long > > > > > as there is a larger consensus among cgroup v2 users. So let's give this > > > > > some more time before merging to see whether there is such a consensus. > > > > > > > > > > [1] http://lkml.kernel.org/r/20190201102515.GK11599@xxxxxxxxxxxxxx > > > > > > > > It's been three months without any objections. > > > > > > It's been three months without any _feedback_ from anybody. It might > > > very well be true that people just do not read these emails or do not > > > care one way or another. > > > > This is exactly the type of stuff that Mel was talking about at LSFMM > > not even two weeks ago. How one objection, however absurd, can cause > > "controversy" and block an effort to address a mistake we have made in > > the past that is now actively causing problems for real users. > > > > And now after stalling this fix for three months to wait for unlikely > > objections, you're moving the goal post. This is frustrating. > > I see your frustration but I find the above wording really unfair. Let me > remind you that this is a considerable user visible change in the > semantic and that always has to be evaluated carefuly. A change that would > clearly regress anybody who rely on the current semantic. This is not an > internal implementation detail kinda thing. > > I have suggested an option for the new behavior to be opt-in which > would be a regression safe option. You keep insisting that we absolutely > have to have hierarchical reporting by default for consistency reasons. > I do understand that argument but when I weigh consistency vs. potential > regression risk I rather go a conservative way. This is a traditional > way how we deal with semantic changes like this. There are always > exceptions possible and that is why I wanted to hear from other users of > cgroup v2, even from those who are not directly affected now. I have acknowledged this concern in previous discussions. But the rule is "don't break userspace", not "never change behavior". We do allow the latter when it's highly unlikely that anyone would mind and the new behavior is a much better default for current and future users. Let me try to make the case for exactly this: - Adoption data suggests that cgroup2 isn't really used yet. RHEL8 was just released with cgroup1 per default. Fedora is currently debating a switch. None of the other distros default to cgroup2. There is an article on the lwn frontpage *right now* about Docker planning on switching to cgroup2 in the near future. Kubernetes is on cgroup1. Android is on cgroup1. Shakeel agrees that Facebook is probably the only serious user of cgroup2 right now. The cloud and all mainstream container software is still on cgroup1. - Using this particular part of the interface is a fairly advanced step in the cgroup2 adoption process. We've been using cgroup2 for a while and we've only now started running into this memory.events problem as we're enhancing our monitoring and automation infrastructure. If we're the only serious deployment, and we just started noticing it, what's the chance of regressing someone else? - Violating expectations costs users time and money either way, but the status quo is much more costly: somebody who expects these events to be local could see events that did occur at an unexpectedly higher level of the tree. But somebody who expects these events to be hierarchical will miss real events entirely! Now, for an alarm and monitoring infrastructure, what is worse: to see occurring OOM kills reported at a tree level you don't expect? Or to *miss* occurring OOM kills that you're trying to look out for? Automatic remediation might not be as clear-cut, but for us, and I would assume many others, an unnecessary service restart or failover would have a shorter downtime than missing a restart after a kill. - The status quo is more likely to violate expectations, given how the cgroup2 interface as a whole is designed. We have seen this in practice: memory.current is hierarchical, memory.stat is hierarchical, memory.pressure is hierarchical - users expect memory.events to be hierarchical. This misunderstanding has already cost us time and money. Chances are, even if there were other users of memory.events, that they're using the interface incorrectly and haven't noticed yet, rather than relying on the inconsistency. It's not a hypothetical, we have seen this with our fleet customers. So combining what we know about 1. the current adoption rate 2. likely user expectations 3. the failure mode of missing pressure and OOM kill signals means that going with the conservative option and not fixing this inconsistency puts pretty much all users that will ever use this interface at the risk of pain, outages and wasted engineering hours. Making the fix available but opt-in has the same result for everybody that isn't following this thread/patch along. All that to protect an unlikely existing cgroup2 user from something they are even less likely have noticed, let alone rely on. This sounds like a terrible trade-off to me. I don't think it's a close call in this case. I understand that we have to think harder and be more careful with changes like this. The bar *should* be high. But in this case, there doesn't seem to be a real risk of regression for anybody, while the risk of the status quo causing problems is high and happening. These circumstances should be part of the decision process.