Re: [GIT PULL] RCU changes for v6.7

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Wed, 1 Nov 2023 10:13:08 -0700

On Tue, Oct 31, 2023 at 06:07:57PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 31, 2023 at 01:06:44PM -1000, Linus Torvalds wrote:

[ . . . ]

> > I really think that we should *never* have any kind of notifiers for
> > kernel bugs. They cause problems. The *one* exception is an actual
> > honest-to-goodness kernel debugger, and then it should literally
> > *only* be the debugger that can register a notifier, so that you are
> > *never* in the situation that a kernel without a debugger will just
> > hang because of some bogus debug notifier.

Here you might have been suggesting that I use gdb and just set a
breakpoint in check_cpu_stall(), and then use gdb commands to read out
the state.  And yes, this works well in some situations.  In fact, there
is a --gdb parameter to the rcutorture scripting for just this purpose.

Except that I normally run a few hundred rcutorture guest OSes spread
across 20 systems, and sometimes more than a thousand guest OSes across
50 systems for hard-to-reproduce bugs.  In my experience, managing that
many remote gdb sessions is cranky and unreliable, which is not helpful
when debugging.  Writing a few tens of lines of C code in the kernel is
much simpler and more reliable.

Assuming of course that I avoid the traps you point out.  Which I have
done thus far.  (Famous last words...)

							Thanx, Paul