On Fri, Apr 15, 2022 at 4:33 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 15 Apr 2022 14:11:32 -0600 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> > >
> > > I grabbed
> > > https://kojipkgs.fedoraproject.org//packages/kernel/5.18.0/0.rc2.23.fc37/src/kernel-5.18.0-0.rc2.23.fc37.src.rpm
> > > and
> >
> > Yes, Fedora/RHEL is one concrete example of the model I mentioned
> > above (experimental/stable). I added Justin, the Fedora kernel
> > maintainer, and he can further clarify.

We almost split into 3 scenarios. In rawhide we run a standard Fedora
config for rcX releases and .0, but git snapshots are built with debug
configs only. The trade-off is that we can't turn on certain options
which kill performance, but we do get more users running these kernels
which expose real bugs. The rawhide kernel follows Linus' tree and is
rebuilt most weekdays. Stable Fedora is not a full debug config, but
in cases where we can keep a debug feature on without it getting much
in the way of performance, as is the case with CONFIG_DEBUG_VM, I
think there is value in keeping those on, until there is not. And of
course RHEL is a much more conservative config, and a much more
conservative rebase/backport codebase.

> > If we don't want more VM_BUG_ONs, I'll remove them. But (let me
> > reiterate) it seems to me that just defeats the purpose of having
> > CONFIG_DEBUG_VM.
> >
>
> Well, I feel your pain. It was never expected that VM_BUG_ON() would
> get subverted in this fashion.

Fedora is not trying to subvert anything. If keeping the option on
becomes problematic, we can simply turn it off. Fedora certainly has
a more diverse installed base than typical enterprise distributions,
and much more diverse than most QA pools, both in the array of
hardware and in the use patterns, so things do get uncovered that
would not be seen otherwise.

> We could create a new MM-developer-only assertion. Might even call it
> MM_BUG_ON(). With compile-time enablement but perhaps not a runtime
> switch.
>
> With nice simple semantics, please. Like "it returns void" and "if you
> pass an expression with side-effects then you lose". And "if you send
> a patch which produces warnings when CONFIG_MM_BUG_ON=n then you get to
> switch to windows95 for a month".
>
> Let's leave the mglru assertions in place for now and let's think about
> creating something more suitable, with a view to switching mglru over
> to that at a later time.
>
>
>
> But really, none of this addresses the core problem: *_BUG_ON() often
> kills the kernel. So guess what we just did? We killed the user's
> kernel at the exact time when we least wished to do so: when they have
> a bug to report to us. So the thing is self-defeating.
>
> It's much much better to WARN and to attempt to continue. This makes
> it much more likely that we'll get to hear about the kernel flaw.

I agree very much with this. We hear about warnings from users, they
don't go unnoticed, and several of these users are willing to spend
time to help get to the bottom of an issue. They may not know the
code, but plenty are willing to test various patches or scenarios.

Justin
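
For illustration only, the MM_BUG_ON() semantics Andrew describes above
(compile-time switch, no return value, side effects lost when disabled,
warn and continue rather than kill) could look roughly like the
userspace sketch below. This is not code from the tree or from any
posted patch. CONFIG_MM_BUG_ON is the hypothetical option name from
Andrew's mail; mm_bug_count and the fprintf() reporting are made-up
stand-ins for a real WARN path.

```c
#include <stdio.h>

/* Hypothetical knob standing in for a CONFIG_MM_BUG_ON Kconfig option. */
#ifndef CONFIG_MM_BUG_ON
#define CONFIG_MM_BUG_ON 1
#endif

/* Illustrative counter so callers can observe how many assertions fired. */
unsigned long mm_bug_count;

#if CONFIG_MM_BUG_ON
/*
 * Enabled: report the failed assertion and keep running, WARN-style,
 * instead of killing the machine. The do/while form is a statement, so
 * the macro cannot be used as a value ("it returns void").
 */
#define MM_BUG_ON(cond)                                               \
	do {                                                          \
		if (cond) {                                           \
			mm_bug_count++;                               \
			fprintf(stderr, "MM_BUG_ON(%s) at %s:%d\n",   \
				#cond, __FILE__, __LINE__);           \
		}                                                     \
	} while (0)
#else
/*
 * Disabled: sizeof keeps the expression type-checked, so variables used
 * only in assertions do not produce unused warnings, but the expression
 * is never evaluated. Any side effects in cond are lost.
 */
#define MM_BUG_ON(cond) ((void)sizeof((long)(cond)))
#endif
```

With the option on, MM_BUG_ON(nr_pages < 0) would log the failure and
bump the counter; with it off, the same line compiles to nothing while
still being checked by the compiler.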