Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 15 May 2024 15:47:18 -0700

On Wed, May 15, 2024, Rick P Edgecombe wrote:
> On Wed, 2024-05-15 at 13:05 -0700, Sean Christopherson wrote:
> > On Wed, May 15, 2024, Rick P Edgecombe wrote:
> > > So rather than try to optimize zapping more someday and hit similar
> > > issues, let userspace decide how it wants it to be done. I'm not sure of
> > > the actual performance tradeoffs here, to be clear.
> > 
> > ...unless someone is able to root cause the VFIO regression, we don't have
> > the luxury of letting userspace give KVM a hint as to whether it might be
> > better to do a precise zap versus a nuke-and-pave.
> 
> Pedantry... I think it's not a regression if something requires a new flag. It
> is still a bug though.

Heh, pedantry denied.  I was speaking in the past tense about the VFIO failure,
which was a regression as I changed KVM behavior without adding a flag.

> The thing I worry about on the bug is whether it might have been due to a guest
> having access to a page it shouldn't have. In which case we can't give the user
> the opportunity to create it.
> 
> I didn't gather there was any proof of this. Did you have any hunch either way?

I doubt the guest was able to access memory it shouldn't have been able to access.
But that's a moot point, as the bigger problem is that, because we have no idea
what's at fault, KVM can't make any guarantees about the safety of such a flag.

TDX is a special case where we don't have a better option (we do have other options,
they're just horrible).  In other words, the choice is essentially to either:

 (a) cross our fingers and hope that the problem is limited to shared memory
     with QEMU+VFIO, i.e. doesn't affect TDX private memory.

or 

 (b) don't merge TDX until the original regression is fully resolved.

FWIW, I would love to root cause and fix the failure, but I don't know how feasible
that is at this point.

> > And more importantly, it would be a _hint_, not the hard requirement that TDX
> > needs.
> > 
> > > That said, a per-vm knob is easier for TDX purposes.
> 
> If we don't want it to be a mandate from userspace, then we need to do some
> per-vm checking in TDX's case anyway. In which case we might as well go with
> the per-vm option for TDX.
> 
> You had suggested up the thread opting all non-normal VMs into the new
> behavior. It will work great for TDX. But why do SEV and others want this
> automatically?

Because I want flexibility in KVM, i.e. I want to take the opportunity to try and
break away from KVM's godawful ABI.  It might be a pipe dream, as keying off the
VM type obviously has similar risks to giving userspace a memslot flag.  The one
sliver of hope is that the VM types really are quite new (though less so for SEV
and SEV-ES), whereas a memslot flag would be easily applied to existing VMs.