On Wed, Jun 07, 2023, Isaku Yamahata wrote: > On Wed, Jun 07, 2023 at 12:27:33PM -0700, > Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > > > On 6/7/23 11:53, Isaku Yamahata wrote: > > >>> VMX enabling, and KVM is the only user of TDX. This implementation > > >>> chooses to make KVM itself responsible for enabling VMX before using > > >>> TDX and let the rest of the kernel stay blissfully unaware of VMX. > > >>> > > >>> The current TDX_MODULE_CALL macro handles neither #GP nor #UD. The > > >>> kernel would hit Oops if SEAMCALL were mistakenly made w/o enabling VMX > > >>> first. Architecturally, there is no CPU flag to check whether the CPU > > >>> is in VMX operation. Also, if a BIOS were buggy, it could still report > > >>> valid TDX private KeyIDs when TDX actually couldn't be enabled. > > >> I'm not sure this is a great justification. If the BIOS is lying to the > > >> OS, we _should_ oops. > > >> > > >> How else can this happen other than silly kernel bugs. It's OK to oops > > >> in the face of silly kernel bugs. > > > TDX KVM + reboot can hit #UD. On reboot, VMX is disabled (VMXOFF) via > > > syscore.shutdown callback. However, guest TD can be still running to issue > > > SEAMCALL resulting in #UD. > > > > > > Or we can postpone the change and make the TDX KVM patch series carry a patch > > > for it. > > > > How does the existing KVM use of VMLAUNCH/VMRESUME avoid that problem? > > extable. From arch/x86/kvm/vmx/vmenter.S > > .Lvmresume: > vmresume > jmp .Lvmfail > > .Lvmlaunch: > vmlaunch > jmp .Lvmfail > > _ASM_EXTABLE(.Lvmresume, .Lfixup) > _ASM_EXTABLE(.Lvmlaunch, .Lfixup) More specifically, KVM eats faults on VMX and SVM instructions that occur after KVM forcefully disables VMX/SVM. E.g. with reboot -f, this will be reached without first stopping VMs: static void kvm_shutdown(void) { /* * Disable hardware virtualization and set kvm_rebooting to indicate * that KVM has asynchronously disabled hardware virtualization, i.e. * that relevant errors and exceptions aren't entirely unexpected. * Some flavors of hardware virtualization need to be disabled before * transferring control to firmware (to perform shutdown/reboot), e.g. * on x86, virtualization can block INIT interrupts, which are used by * firmware to pull APs back under firmware control. Note, this path * is used for both shutdown and reboot scenarios, i.e. neither name is * 100% comprehensive. */ pr_info("kvm: exiting hardware virtualization\n"); kvm_rebooting = true; on_each_cpu(hardware_disable_nolock, NULL, 1); } which KVM x86 (VMX and SVM) then queries when deciding what to do with a spurious fault on a VMX/SVM instruction /* * Handle a fault on a hardware virtualization (VMX or SVM) instruction. * * Hardware virtualization extension instructions may fault if a reboot turns * off virtualization while processes are running. Usually after catching the * fault we just panic; during reboot instead the instruction is ignored. */ noinstr void kvm_spurious_fault(void) { /* Fault while not rebooting. We want the trace. */ BUG_ON(!kvm_rebooting); } EXPORT_SYMBOL_GPL(kvm_spurious_fault);