On Tue, 31 Oct 2023 at 03:57, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> Would it help if we make rcu_stall_chain_notifier_register() print a
> suitably obnoxious message saying that future RCU CPU stall warnings
> might be unreliable?

It's not the future stall warnings I worry about.

It's literally things like somebody thinking they are being clever,
registering an RCU stall notifier that prints out extra information in
order to be helpful, and in the process takes a spinlock or something
without thinking about it.

And that spinlock might be the *reason* for the RCU stall in the first
place.

So now the RCU stall code prints out NOTHING AT ALL, because now the
stall notifier itself has deadlocked.

This is *exactly* what has happened before with these kinds of
"helpful" exception case notifiers.

Because they never trigger in normal loads, they get basically zero
testing - and then when bad things happen, it turns out that the
"helpful" debug code actually just makes things worse.

Or, if they get testing, they get tested in artificial bad cases (e.g.
"let's just write a busy loop that hangs for 30 seconds to trigger an
RCU stall"), which doesn't show any of the issues, because they aren't
real bugs with real existing deadlocks.

See what I'm saying? Having notifiers for "sh*t happened" is
fundamentally questionable to begin with, because they get no testing.
And then calling said notifiers *before* you even have the core
printout for "Look, things are going downhill quickly" just makes a bad
situation even worse.

I really think that we should *never* have any kind of notifiers for
kernel bugs. They cause problems.

The *one* exception is an actual honest-to-goodness kernel debugger,
and then it should literally *only* be the debugger that can register a
notifier, so that you are *never* in the situation that a kernel
without a debugger will just hang because of some bogus debug notifier.

Linus