On Tue, 2023-07-25 at 11:15 -0700, Sean Christopherson wrote:
> On Tue, Jul 25, 2023, Kai Huang wrote:
> > On Fri, 2023-07-21 at 13:18 -0700, Sean Christopherson wrote:
> > > Bail from vmx_emergency_disable() without processing the list of loaded
> > > VMCSes if CR4.VMXE=0, i.e. if the CPU can't be post-VMXON.  It should be
> > > impossible for the list to have entries if VMX is already disabled, and
> > > even if that invariant doesn't hold, VMCLEAR will #UD anyways, i.e.
> > > processing the list is pointless even if it somehow isn't empty.
> > >
> > > Assuming no existing KVM bugs, this should be a glorified nop.  The
> > > primary motivation for the change is to avoid having code that looks like
> > > it does VMCLEAR, but then skips VMXON, which is nonsensical.
> > >
> > > Suggested-by: Kai Huang <kai.huang@xxxxxxxxx>
> > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/vmx/vmx.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 5d21931842a5..0ef5ede9cb7c 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -773,12 +773,20 @@ static void vmx_emergency_disable(void)
> > >
> > >  	kvm_rebooting = true;
> > >
> > > +	/*
> > > +	 * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be
> > > +	 * set in task context.  If this races with VMX is disabled by an NMI,
> > > +	 * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to
> > > +	 * kvm_rebooting set.
> > > +	 */
> >
> > I am not quite following this comment.  IIUC this code path is only called from
> > NMI context in case of emergency VMX disable.
>
> The CPU that initiates the emergency reboot can invoke the callback from process
> context, only responding CPUs are guaranteed to be handled via NMI shootdown.
> E.g. `reboot -f` will reach this point synchronously.
>
> > How can it race with "VMX is disabled by an NMI"?
>
> Somewhat theoretically, a different CPU could panic() and do a shootdown of the
> CPU that is handling `reboot -f`.

Yeah this is the only case I can think of too.

Anyway, LGTM.  Thanks for explaining.
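
For anyone reading along, the ordering discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: `struct model_cpu`, `model_vmx_insn()`, `model_emergency_disable()` and the `nmi_after_check` knob are all hypothetical names made up for the sketch, and the only hardware fact relied on is that CR4.VMXE is bit 13 and that VMX instructions #UD when it is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CR4.VMXE is bit 13 of CR4 on x86. Everything else is a made-up model. */
#define MODEL_CR4_VMXE (1UL << 13)

struct model_cpu {
	unsigned long cr4;     /* modeled CR4 value */
	int loaded_vmcs;       /* entries on the modeled loaded-VMCS list */
	bool rebooting;        /* stands in for kvm_rebooting */
	bool nmi_after_check;  /* inject a shootdown NMI right after the CR4 check */
	int faults_eaten;      /* #UDs swallowed because rebooting was set */
	int faults_fatal;      /* #UDs that nothing would have swallowed */
};

/* A VMX instruction #UDs when CR4.VMXE=0; the fault is eaten only if rebooting. */
static bool model_vmx_insn(struct model_cpu *cpu)
{
	if (cpu->cr4 & MODEL_CR4_VMXE)
		return true;
	if (cpu->rebooting)
		cpu->faults_eaten++;
	else
		cpu->faults_fatal++;
	return false;
}

static void model_emergency_disable(struct model_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->rebooting = true;             /* set before any VMX instruction runs */

	if (!(cpu->cr4 & MODEL_CR4_VMXE))  /* VMX already off: skip the list walk */
		return;

	if (cpu->nmi_after_check)          /* another CPU's shootdown wins the race */
		cpu->cr4 &= ~MODEL_CR4_VMXE;

	for (int i = 0; i < cpu->loaded_vmcs; i++)
		(void)model_vmx_insn(cpu);     /* one modeled VMCLEAR per list entry */

	if (model_vmx_insn(cpu))           /* modeled VMXOFF */
		cpu->cr4 &= ~MODEL_CR4_VMXE;
}
```

In this model, a CPU that enters with CR4.VMXE already clear returns before touching the list, and a CPU that loses the race after the check takes one eaten fault per modeled VMCLEAR plus one for VMXOFF, with none counted as fatal because the rebooting flag is set first.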