On Fri, May 17, 2024, Isaku Yamahata wrote:
> On Thu, May 16, 2024 at 06:40:02PM -0700,
> Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> > On Wed, May 15, 2024, Sean Christopherson wrote:
> > > On Tue, May 07, 2024, Paolo Bonzini wrote:
> > > > @@ -5200,6 +5215,9 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> > > >  	if (is_invalid_opcode(intr_info))
> > > >  		return handle_ud(vcpu);
> > > >
> > > > +	if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm))
> > > > +		return -EIO;
> > >
> > > I've hit this three times now when running KVM-Unit-Tests (I'm pretty sure it's
> > > the EPT test, unsurprisingly). And unless I screwed up my testing, I verified it
> > > still fires with Isaku's fix[*], though I'm suddenly having problems repro'ing.
> > >
> > > I'll update tomorrow as to whether I botched my testing of Isaku's fix, or if
> > > there's another bug lurking.
> >
> > *sigh*
> >
> > AFAICT, I'm hitting a hardware issue. The #VE occurs when the CPU does an A/D
> > assist on an entry in the L2's PML4 (L2 GPA 0x109fff8). EPT A/D bits are disabled,
> > and KVM has write-protected the GPA (hooray for shadowing EPT entries). The CPU
> > tries to write the PML4 entry to do the A/D assist and generates what appears to
> > be a spurious #VE.
> >
> > Isaku, please forward this to the necessary folks at Intel. I doubt whatever
> > is broken will block TDX, but it would be nice to get a root cause so we at least
> > know whether or not TDX is a ticking time bomb.
>
> Sure, let me forward it.
> I tested it lightly myself. but I couldn't reproduce it.

This repros on a CLX and SKX, but not on my client RPL box. I verified that the
same A/D-assist write-protection EPT Violation occurs on RPL, and that PROVE_VE is
enabled, so I don't think RPL is simply getting lucky.

Unless I'm missing something, this really does look like a CPU issue.