Re: [kvm-unit-tests PATCH 05/12] nSVM: Remove NPT reserved bits tests (new one on the way)

Maxim Levitsky <mlevitsk@xxxxxxxxxx> · Thu, 12 Aug 2021 10:58:37 +0300

On Thu, 2021-06-24 at 17:43 +0000, Sean Christopherson wrote:
> On Thu, Jun 24, 2021, Paolo Bonzini wrote:
> > On 22/06/21 23:00, Sean Christopherson wrote:
> > > Remove two of nSVM's NPT reserved bits tests; a soon-to-be-added test will
> > > provide a superset of their functionality, e.g. the current tests are
> > > limited in the sense that they test a single entry and a single bit,
> > > e.g. don't test conditionally-reserved bits.
> > > 
> > > The npt_rsvd test in particular is quite nasty as it subtly relies on
> > > EFER.NX=1; dropping the test will allow cleaning up the EFER.NX weirdness
> > > (it's forced for _all_ tests, presumably to get the desired PFEC.FETCH=1
> > > for this one test).
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > ---
> > >   x86/svm_tests.c | 45 ---------------------------------------------
> > >   1 file changed, 45 deletions(-)
> > 
> > This exposes a KVM bug, reproducible with
> > 
> > 	./x86/run x86/svm.flat -smp 2 -cpu max,+svm -m 4g \
> > 		-append 'npt_rw npt_rw_pfwalk'
> 
> Any chance you're running against an older KVM version?  The test passes if I
> run against a build with my MMU pile on top of kvm/queue, but fails on a random
> older KVM.
> 
> Side topic, these tests all fail to invalidate TLB entries after modifying PTEs.
> I suspect they work in part because KVM flushes and syncs on all nested SVM
> transitions...

I have now also tried to reproduce this, and the test passes.

Best regards,
	Maxim Levitsky

> 
> > While running npt_rw_pfwalk, the #NPF gets an incorrect EXITINFO2
> > (address for the NPF location; on my machine it gets 0xbfede6f0 instead of
> > 0xbfede000).  The same tests work with QEMU from git.
> > 
> > I didn't quite finish analyzing it, but my current theory is
> > that KVM receives a pagewalk NPF for a *different* page walk that is caused
> > by read-only page tables; then it finds that the page walk to 0xbfede6f0
> > *does fail* (after all the correct and wrong EXITINFO2 belong to the same pfn)
> > > and therefore injects it anyway.  I suspect this because the 0x6f0 offset in
> > > the page table corresponds to the 0xde000 part of the faulting address.
> > Maxim will look into it while I'm away.
> > 
> > Paolo
> >
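
For reference, the arithmetic behind the 0x6f0 <-> 0xde000 observation can be
sketched as below.  This is only an illustration, assuming 4 KiB pages, 8-byte
PTEs and a 9-bit index at the last paging level (standard x86-64 long mode
paging); the macro and helper names are made up for this example and are not
taken from kvm-unit-tests or KVM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging parameters (not taken from the test code). */
#define EX_PAGE_SHIFT    12   /* 4 KiB pages */
#define EX_PT_INDEX_BITS 9    /* 512 entries per page table */
#define EX_PTE_SIZE      8    /* bytes per page table entry */

/* Hypothetical helper: index of the last-level PTE that maps 'addr'. */
static inline unsigned int ex_pt_index(uint64_t addr)
{
	return (addr >> EX_PAGE_SHIFT) & ((1u << EX_PT_INDEX_BITS) - 1);
}

/* Hypothetical helper: byte offset of that PTE within its page table page. */
static inline uint64_t ex_pte_offset(uint64_t addr)
{
	return (uint64_t)ex_pt_index(addr) * EX_PTE_SIZE;
}

/* Self-check using the values quoted in this thread. */
static inline void ex_pte_offset_selfcheck(void)
{
	/* Address bits 20:12 equal to 0xde select PTE index 0xde ... */
	assert(ex_pt_index(0xde000) == 0xde);
	/* ... and 0xde * 8 is the 0x6f0 offset seen in the bad EXITINFO2. */
	assert(ex_pte_offset(0xde000) == 0x6f0);
}
```

So an EXITINFO2 of 0xbfede6f0 is what one would expect if the reported address
were that of the PTE itself (page 0xbfede000 plus offset 0x6f0) rather than the
page-aligned 0xbfede000, which is consistent with Paolo's theory but does not
prove it.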