Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Wed, 15 May 2024 20:53:26 +0000

On Wed, 2024-05-15 at 13:05 -0700, Sean Christopherson wrote:
> On Wed, May 15, 2024, Rick P Edgecombe wrote:
> > On Wed, 2024-05-15 at 12:09 -0700, Sean Christopherson wrote:
> > > > It's weird that userspace needs to control how KVM zaps page tables for
> > > > memslot delete/move.
> > > 
> > > Yeah, this isn't quite what I had in mind.  Granted, what I had in mind may
> > > not be much better, but I definitely don't want to let userspace dictate
> > > exactly how KVM manages SPTEs.
> > 
> > To me it doesn't seem completely unprecedented at least. Linux has a ton of
> > madvise() flags and other knobs to control this kind of PTE management for
> > userspace memory.
> 
> Yes, but they all express their requests in terms of what behavior userspace
> wants, or communicate userspace's access patterns.  They don't dictate exact
> low-level behavior to the kernel.
> 

There are a few madvise() flags that amount to "don't do this". Also, of course,
some of the implementations take direct action anyway, and that behavior then
becomes ABI. Then there is mlock(). There are so many mm features that it might
actually be more of a cautionary tale.

[snip]

> > So rather than try to optimize zapping more someday and hit similar issues,
> > let userspace decide how it wants it to be done. I'm not sure of the actual
> > performance tradeoffs here, to be clear.
> 
> ...unless someone is able to root cause the VFIO regression, we don't have the
> luxury of letting userspace give KVM a hint as to whether it might be better to
> do a precise zap versus a nuke-and-pave.

Pedantry... I don't think it's a regression if the new behavior only kicks in
when userspace sets a new flag. It is still a bug, though.

What worries me about the bug is whether it might have been caused by a guest
having access to a page it shouldn't have. If so, we can't give userspace the
opportunity to trigger it.

I didn't gather that there was any proof of this. Do you have a hunch either way?

> 
> And more importantly, it would be a _hint_, not the hard requirement that TDX
> needs.
> 
> > That said, a per-vm knob is easier for TDX purposes.

If we don't want it to be a mandate from userspace, then we need some per-VM
checking for TDX anyway, in which case we might as well go with the per-VM
option for TDX.

You suggested up-thread opting all non-normal VMs into the new behavior. That
would work great for TDX, but why would SEV and others want this automatically?