David Hildenbrand <david@xxxxxxxxxx> writes: > Linus notes [1] that the introduction of new code that uses VM_BUG_ON() > is just as bad as BUG_ON(), because it will crash the kernel on > distributions that enable CONFIG_DEBUG_VM (like Fedora): > > VM_BUG_ON() has the exact same semantics as BUG_ON. It is literally > no different, the only difference is "we can make the code smaller > because these are less important". [2] > > This resulted in a more generic discussion about usage of BUG() and > friends. While there might be corner cases that still deserve a BUG_ON(), > most BUG_ON() cases should simply use WARN_ON_ONCE() and implement a > recovery path if reasonable: > > The only possible case where BUG_ON can validly be used is "I have > some fundamental data corruption and cannot possibly return an > error". [2] > > As a very good approximation is the general rule: > > "absolutely no new BUG_ON() calls _ever_" [2] > > ... not even if something really shouldn't ever happen and is merely for > documenting that an invariant always has to hold. However, there are sill > exceptions where BUG_ON() may be used: > > If you have a "this is major internal corruption, there's no way we can > continue", then BUG_ON() is appropriate. [3] > > There is only one good BUG_ON(): > > Now, that said, there is one very valid sub-form of BUG_ON(): > BUILD_BUG_ON() is absolutely 100% fine. [2] > > While WARN will also crash the machine with panic_on_warn set, that's > exactly to be expected: > > So we have two very different cases: the "virtual machine with good > logging where a dead machine is fine" - use 'panic_on_warn'. And > the actual real hardware with real drivers, running real loads by > users. [4] > > The basic idea is that warnings will similarly get reported by users > and be found during testing. However, in contrast to a BUG(), there is a > way to actually influence the expected behavior (e.g., panic_on_warn) > and to eventually keep the machine alive to extract some debug info. > > Ingo notes that not all WARN_ON_ONCE cases need recovery. If we don't ever > expect this code to trigger in any case, recovery code is not really > helpful. > > I'd prefer to keep all these warnings 'simple' - i.e. no attempted > recovery & control flow, unless we ever expect these to trigger. > [5] > > There have been different rules floating around that were never properly > documented. Let's try to clarify. > > [1] https://lkml.kernel.org/r/CAHk-=wiEAH+ojSpAgx_Ep=NKPWHU8AdO3V56BXcCsU97oYJ1EA@xxxxxxxxxxxxxx > [2] https://lore.kernel.org/r/CAHk-=wg40EAZofO16Eviaj7mfqDhZ2gVEbvfsMf6gYzspRjYvw@xxxxxxxxxxxxxx > [2] https://lkml.kernel.org/r/CAHk-=wit-DmhMfQErY29JSPjFgebx_Ld+pnerc4J2Ag990WwAA@xxxxxxxxxxxxxx > [4] https://lore.kernel.org/r/CAHk-=wgF7K2gSSpy=m_=K3Nov4zaceUX9puQf1TjkTJLA2XC_g@xxxxxxxxxxxxxx > [5] https://lore.kernel.org/r/YwIW+mVeZoTOxn%2F4@xxxxxxxxx > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> [...] > +Use WARN_ON_ONCE() rather than WARN() or WARN_ON() > +************************************************** > + > +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it > +is common for a given warning condition, if it occurs at all, to occur > +multiple times. This can fill up and wrap the kernel log, and can even slow > +the system enough that the excessive logging turns into its own, additional > +problem. FWIW I have had cases where WARN() messages caused a reboot, maybe mention that here? In my case the logging was so excessive that the watchdog wasn't updated and in the end the device was forcefully rebooted. -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches