On Fri, May 17, 2024, Paolo Bonzini wrote:
> On 5/17/24 18:38, Sean Christopherson wrote:
> > > > I've hit this three times now when running KVM-Unit-Tests (I'm pretty sure it's
> > > > the EPT test, unsurprisingly).  And unless I screwed up my testing, I verified it
> > > > still fires with Isaku's fix[*], though I'm suddenly having problems repro'ing.
> > > >
> > > > I'll update tomorrow as to whether I botched my testing of Isaku's fix, or if
> > > > there's another bug lurking.
> > > >
> > > > https://lore.kernel.org/all/20240515173209.GD168153@xxxxxxxxxxxxxxxxxxxxx
> > > I cannot reproduce it on a Skylake (Xeon Gold 5120), with or without Isaku's
> > > fix, with either ./runtests.sh or your reproducer line.
> > >
> > > However I can reproduce it only if eptad=0 and with the following line:
> > >
> > > ./x86/run x86/vmx.flat -smp 1 -cpu max,host-phys-bits,+vmx -m 2560 \
> > >   -append 'ept_access_test_not_present ept_access_test_read_only'
> >
> > FWIW, I tried that on RPL, still no failure.
>
> Ok, so it does look like a CPU issue.  Even with the fixes you identified, I
> don't see any other solution than adding scary text in Kconfig, defaulting
> it to "n", and adding an also-very-scary pr_err_once("...") the first time
> VMPTRLD is executed with CONFIG_KVM_INTEL_PROVE_VE.

I don't think we need to make it super scary, at least not yet.  KVM just needs
to not kill the VM, which thanks to the BUSY flag is trivial: just resume the
guest.  Then the failure is "just" a WARN, which won't be anywhere near as
problematic for KVM developers.

I doubt syzbot will hit this, purely because syzbot runs almost exclusively in
VMs, i.e. won't have #VE support.

If we don't have a resolution by rc6 or so, then maybe consider doing something
more drastic?

I agree that it should be off by default though.  And the help text should be
more clear that this is intended only for developers and testing environments.

I have a handful of patches, including one to not kill the VM.
I'll try to post them later today, mostly just need to write changelogs.

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 75082c4a9ac4..5c22186671e9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -98,15 +98,15 @@ config KVM_INTEL
 
 config KVM_INTEL_PROVE_VE
 	bool "Check that guests do not receive #VE exceptions"
-	default KVM_PROVE_MMU || DEBUG_KERNEL
-	depends on KVM_INTEL
+	depends on KVM_INTEL && KVM_PROVE_MMU
 	help
-
 	  Checks that KVM's page table management code will not incorrectly
 	  let guests receive a virtualization exception.  Virtualization
 	  exceptions will be trapped by the hypervisor rather than injected
 	  in the guest.
 
+	  This should never be enabled in a production environment.
+
 	  If unsure, say N.
 
 config X86_SGX_KVM