On Thu, May 11, 2023, Sean Christopherson wrote: > Introduce KVM_BUG_ON_DATA_CORRUPTION() and use it in the low-level rmap > helpers to convert the existing BUG()s to WARN_ON_ONCE() when the kernel > is built with CONFIG_BUG_ON_DATA_CORRUPTION=n, i.e. does NOT want to BUG() > on corruption of host kernel data structures. Environments that don't > have infrastructure to automatically capture crash dumps, i.e. aren't > likely to enable CONFIG_BUG_ON_DATA_CORRUPTION=y, are typically better > served overall by WARN-and-continue behavior (for the kernel, the VM is > dead regardless), as a BUG() while holding mmu_lock all but guarantees > the _best_ case scenario is a panic(). > > Make the BUG()s conditional instead of removing/replacing them entirely as > there's a non-zero chance (though by no means a guarantee) that the damage > isn't contained to the target VM, e.g. if no rmap is found for a SPTE then > KVM may be double-zapping the SPTE, i.e. has already freed the memory the > SPTE pointed at and thus KVM is reading/writing memory that KVM no longer > owns. > > Link: https://lore.kernel.org/all/20221129191237.31447-1-mizhang@xxxxxxxxxx > Suggested-by: Mingwei Zhang <mizhang@xxxxxxxxxx> > Cc: David Matlack <dmatlack@xxxxxxxxxx> > Cc: Jim Mattson <jmattson@xxxxxxxxxx> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> Reviewed-by: Mingwei Zhang <mizhang@xxxxxxxxxx> > +/* > + * Note, "data corruption" refers to corruption of host kernel data structures, > + * not guest data. Guest data corruption, suspected or confirmed, that is tied > + * and contained to a single VM should *never* BUG() and potentially panic the > + * host, i.e. use this variant of KVM_BUG() if and only if a KVM data structure > + * is corrupted and that corruption can have a cascading effect to other parts > + * of the hosts and/or to other VMs. > + */ > +#define KVM_BUG_ON_DATA_CORRUPTION(cond, kvm) \ > +({ \ > + bool __ret = !!(cond); \ > + \ > + if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION)) \ > + BUG_ON(__ret); \ > + else if (WARN_ON_ONCE(__ret && !(kvm)->vm_bugged)) \ > + kvm_vm_bugged(kvm); \ > + unlikely(__ret); \ > +}) > + Previously, my concern was that people might abuse this feature by generating lots of KVM_BUG_ON_DATA_CORRUPTION() in the code, with the execuse that "hey, it is not a BUG_ON(), just turn off CONFIG_BUG_ON_DATA_CORRUPTION." In reality, especially in production, no one will take that risk by completely turning off the KCONFIG, so KVM_BUG_ON_DATA_CORRUPTION() is still a BUG_ON() but with people having execuses to add more. Later I realize that this worry is purely based on hypothesis, so I choose to not worry about that anymore. Overall, making BUG_ON() tunable is still a very good progress. Thank you and David for the help. -Mingwei > static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu) > { > #ifdef CONFIG_PROVE_RCU > -- > 2.40.1.606.ga4b1b128d6-goog >