Re: [PATCH] sched/all: Change BUG_ON() instances to WARN_ON()

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 11 Aug 2022 13:43:09 -0700

On Thu, Aug 11, 2022 at 12:13 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> By using a WARN_ON() we at least give the user a chance to report
> any bugs triggered here - instead of getting silent hangs.
>
> None of these WARN_ON()s are supposed to trigger, ever - so we ignore
> cases where a NULL check is done via a BUG_ON() and we let a NULL
> pointer through after a WARN_ON().

May I suggest going one step further, and making these WARN_ON_ONCE() instead.

>From personal experience, once some scheduler bug (or task struct
corruption) happens, ti often *keeps* happening, and the logs just
fill up with more and more data, to the point where you lose sight of
the original report (and the machine can even get unusable just from
the logging).

WARN_ON_ONCE() can help that situation.

Now, obviously

 (a) WARN_ON_ONCE *can* also result in less information, and maybe
there are situations where having more - possibly different - cases of
the same thing triggering could be useful.

 (b) WARN_ON_ONCE historically generated a bit bigger code than
WARN_ON simply due to the extra "did this already trigger" check.

I *think* (b) is no longer true, and it's just a flag these days, but
I didn't actually check.

so it's not like there aren't potential downsides, but in general I
think the sanest and most natural thing is to have BUG_ON() translate
to WARN_ON_ONCE().

For the "reboot-on-warn" people, it ends up being the same thing. And
for the rest of us, the "give me *one* warning" can end up making the
reporting a lot easier.

Obviously, with the "this never actually happens", the whole "once or
many times" is kind of moot. But if it never happens at all, to the
point where it doesn't even add a chance of helping debugging, maybe
the whole test should be removed entirely...

           Linus