On Fri, May 12, 2023 at 4:18 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, May 12, 2023, David Matlack wrote:
> > On Thu, May 11, 2023 at 04:59:13PM -0700, Sean Christopherson wrote:
> > > Convert all "runtime" assertions, i.e. assertions that can be triggered
> > > while running vCPUs, from WARN_ON() to WARN_ON_ONCE(). Every WARN in the
> > > MMU that is tied to running vCPUs, i.e. not contained to loading and
> > > initializing KVM, is likely to fire _a lot_ when it does trigger. E.g. if
> > > KVM ends up with a bug that causes a root to be invalidated before the
> > > page fault handler is invoked, pretty much _every_ page fault VM-Exit
> > > triggers the WARN.
> > >
> > > If a WARN is triggered frequently, the resulting spam usually causes a lot
> > > of damage of its own, e.g. consumes resources to log the WARN and pollutes
> > > the kernel log, often to the point where other useful information can be
> > > lost. In many cases, the damage caused by the spam is actually worse than
> > > the bug itself, e.g. KVM can almost always recover from an unexpectedly
> > > invalid root.
> > >
> > > On the flip side, warning every time is rarely helpful for debug and
> > > triage, i.e. a single splat is usually sufficient to point a debugger in
> > > the right direction, and automated testing, e.g. syzkaller, typically runs
> > > with panic_on_warn=1, i.e. will never get past the first WARN anyways.
> >
> > On the topic of syzkaller, we should get them to test with
> > CONFIG_KVM_PROVE_MMU once it's available.
>
> +1
>
> > > Lastly, when an assertion fails multiple times, the stack traces in KVM
> > > are almost always identical, i.e. the full splat only needs to be captured
> > > once. And _if_ there is value in capturing information about the failed
> > > assert, a ratelimited printk() is sufficient and less likely to rack up a
> > > large amount of collateral damage.
> >
> > These are all good arguments and I think they apply to KVM_MMU_WARN_ON()
> > as well. Should we convert that to _ONCE() too?
>
> Already done in this patch :-) I didn't call it out because that warn also falls
> under the "runtime assertions" umbrella.

Doh! Indeed. I was expecting to see KVM_MMU_WARN_ON() change to
KVM_MMU_WARN_ON_ONCE().
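
For anyone following along, here is a rough userspace sketch of the
difference being discussed. It is not the kernel's implementation:
SKETCH_WARN_ON(), SKETCH_WARN_ON_ONCE(), log_splat() and the two
splats_with_*() helpers are hypothetical names made up for illustration,
and the macros assume GCC/Clang statement expressions. It simulates a
permanently invalid root and counts how many splats each flavor logs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter of logged splats; stands in for kernel log traffic. */
static int splats_logged;

/* Hypothetical stand-in for dumping a WARN splat. */
static void log_splat(const char *cond, const char *file, int line)
{
	fprintf(stderr, "WARNING: %s at %s:%d\n", cond, file, line);
	splats_logged++;
}

/* Logs on every hit. Uses a GNU statement expression (GCC/Clang only). */
#define SKETCH_WARN_ON(cond) ({					\
	bool hit_ = !!(cond);					\
	if (hit_)						\
		log_splat(#cond, __FILE__, __LINE__);		\
	hit_;							\
})

/* Logs only on the first hit at this call site, but still reports every hit. */
#define SKETCH_WARN_ON_ONCE(cond) ({				\
	static bool warned_;					\
	bool hit_ = !!(cond);					\
	if (hit_ && !warned_) {					\
		warned_ = true;					\
		log_splat(#cond, __FILE__, __LINE__);		\
	}							\
	hit_;							\
})

/* Simulate nr_faults page faults against a root that is always invalid. */
int splats_with_warn_on(int nr_faults)
{
	bool root_valid = false;
	int before = splats_logged;

	for (int i = 0; i < nr_faults; i++) {
		if (SKETCH_WARN_ON(!root_valid))
			continue;	/* bail out of the "fault handler" */
	}
	return splats_logged - before;
}

/* Same simulation, but the assertion only logs the first time it fires. */
int splats_with_warn_on_once(int nr_faults)
{
	bool root_valid = false;
	int before = splats_logged;

	for (int i = 0; i < nr_faults; i++) {
		if (SKETCH_WARN_ON_ONCE(!root_valid))
			continue;	/* caller still sees the failure */
	}
	return splats_logged - before;
}
```

In this sketch the once-only flag is a static local, so it is tracked per
call site and stays set for the life of the process; the condition is
still evaluated and returned on every call, so the caller can keep
bailing out even after the logging has gone quiet.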