Re: [GIT PULL] RCU changes for v6.7

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 1 Nov 2023 07:11:54 -1000

On Tue, 31 Oct 2023 at 15:08, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> Here are the ways forward I can see:
>
> 1.      Status quo.  This has all the issues that you call out.
>         People will hurt themselves with it and consume time and effort.
>         So let's not do this.

Well, at a *minimum*, I really want that notifier chain call to be
done *after* the core printk's.

That way, if it deadlocks or does something else stupid, at least the
core printouts make it out.

IOW, I think the notifier should be done perhaps just before the
"panic_on_rcu_stall()" call, not at the top before you've even
reported any stall conditions at all.

And yes, I think the trace_rcu_stall_warning() might be better off
later too, but at least trace events are things that get regular
testing in nasty conditions (including NMI etc), so I'm *much* less
worried about those than about "random developers who think they know
what they are doing and add a notifier".

And yes, I do think the notifier should be narrowed down a lot, if you
actually want to keep it.

I did not actually hear you say that there is a good use-case for it.
I only saw you say "Those of us who need this", without showing *any*
kind of indication of why anybody would use it in reality.

Why the secrecy? There is certainly no current user, nor any
description of what a user would be and what makes that notifier
useful.

The commit message also just says "It is sometimes helpful" and makes
some strange reference to "the subsystem causing the stall to dump its
state". It all sounds very fishy. Why would anybody ever have a known
subsystem causing RCU stalls? Except, of course, for the rcutorture
testing.

Anyway, that all absolutely SCREAMS to me "this is not something
useful in any normal kernel", and so yes:

> 3.      Add a default-n Kconfig option that depends on RCU_EXPERT
>         and DEBUG_KERNEL, so that these problems can only arise in
>         specially built kernels.
>
> 4.      Same as #3, but use a kernel boot parameter instead of a
>         Kconfig option.

let's make it clear that this is *not* something that any upstream
kernel would ever do, and the *only* possible use for it is some kind
of external temporary debug patch.

See why I so hate things like this? Let's head off any crazy use long
*long* before somebody decides that "Oh, I want to use this".

               Linus