Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 21 Oct 2022 19:40:39 +0000

On Thu, Oct 20, 2022, Alexander Graf wrote:
> 
> On 20.10.22 22:37, Sean Christopherson wrote:
> > On Thu, Oct 20, 2022, Alexander Graf wrote:
> > > On 26.06.20 19:32, Sean Christopherson wrote:
> > > > /cast <thread necromancy>
> > > > 
> > > > On Tue, Aug 20, 2019 at 01:03:19PM -0700, Sean Christopherson wrote:
> > > [...]
> > > 
> > > > I don't think any of this explains the pass-through GPU issue.  But, we
> > > > have a few use cases where zapping the entire MMU is undesirable, so I'm
> > > > going to retry upstreaming this patch with a per-VM opt-in.  I wanted to
> > > > set the record straight for posterity before doing so.
> > > Hey Sean,
> > > 
> > > Did you ever get around to upstreaming or reworking the zap optimization? The way
> > > I read current upstream, a memslot change still always wipes all SPTEs, not
> > > only the ones that were changed.
> > Nope, I've more or less given up hope on zapping only the deleted/moved memslot.
> > TDX (and SNP?) will preserve SPTEs for guest private memory, but they're very
> > much a special case.
> > 
> > Do you have a use case and/or issue that doesn't play nice with the "zap all" behavior?
> 
> 
> Yeah, we're looking at adding support for the Hyper-V VSM extensions which
> Windows uses to implement Credential Guard. With that, the guest gets access
> to hypercalls that allow it to set reduced permissions for arbitrary gfns.
> To ensure that user space has full visibility into those for live migration,
> memory slots to model access would be a great fit. But it means we'd do
> ~100k memslot modifications on boot.

Oof.  100k memslot updates is going to be painful irrespective of flushing.  And
memslots (in their current form) won't work if the guest can drop executable
permissions.

Assuming KVM needs to support a KVM_MEM_NO_EXEC flag, rather than trying to solve
the "KVM flushes everything on memslot deletion" problem, I think we should instead
properly support toggling KVM_MEM_READONLY (and KVM_MEM_NO_EXEC) without forcing
userspace to delete the memslot.  Commit 75d61fbcf563 ("KVM: set_memory_region:
Disallow changing read-only attribute later") was just a quick-and-dirty fix;
there's no fundamental problem that makes it impossible (or even all that difficult)
to support toggling permissions.

The ABI would be that KVM only guarantees the new permissions take effect when
the ioctl() returns, i.e. KVM doesn't need to ensure there are no writable SPTEs
when the memslot is installed, just that there are no writable SPTEs before
userspace regains control.
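
To make that guarantee concrete, here is a rough userspace model of the decision
a flags-only update would have to make.  It is only a sketch: the enum and the
function name are hypothetical and do not exist in KVM, and the two flag values
are redefined locally (mirroring the UAPI bits in <linux/kvm.h>) so the snippet
builds on its own.  It also assumes that clearing READONLY needs no eager work,
since writable SPTEs can simply be created on the next write fault.

```c
#include <stdint.h>

/* Local copies of the UAPI flag bits, mirroring <linux/kvm.h>. */
#define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
#define KVM_MEM_READONLY	(1UL << 1)

/* Hypothetical, illustration only: what a flags-only update must do. */
enum flags_action {
	ACT_NONE,		/* RO -> RW: nothing to do before returning */
	ACT_WRITE_PROTECT,	/* RW -> RO: drop writable SPTEs, flush if any changed */
	ACT_DIRTY_LOG,		/* READONLY unchanged: existing dirty-log handling */
};

static enum flags_action readonly_toggle_action(uint32_t old_flags,
						uint32_t new_flags)
{
	/* Only act on READONLY if the bit actually changed. */
	if ((old_flags ^ new_flags) & KVM_MEM_READONLY) {
		/*
		 * Becoming read-only: writable SPTEs must be gone before
		 * userspace regains control.  Becoming writable: assumed
		 * to need no eager work.
		 */
		return (new_flags & KVM_MEM_READONLY) ? ACT_WRITE_PROTECT
						      : ACT_NONE;
	}

	/* The only other flag is LOG_DIRTY_PAGES. */
	return ACT_DIRTY_LOG;
}
```

The point is that the only eager work happens on the RW -> RO transition, and it
only has to finish before the ioctl() returns, not before the memslot is installed.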

E.g. sans sanity checking and whatnot, I think x86 support would be something like:

@@ -12669,9 +12667,16 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
         * MOVE/DELETE: The old mappings will already have been cleaned up by
         *              kvm_arch_flush_shadow_memslot().
         */
-       if ((change != KVM_MR_FLAGS_ONLY) || (new_flags & KVM_MEM_READONLY))
+       if (change != KVM_MR_FLAGS_ONLY)
                return;
 
+       if ((old_flags ^ new_flags) & KVM_MEM_READONLY) {
+               if ((new_flags & KVM_MEM_READONLY) &&
+                   kvm_mmu_slot_write_protect(kvm, new))
+                       kvm_arch_flush_remote_tlbs_memslot(kvm, new);
+               return;
+       }
+
        /*
         * READONLY and non-flags changes were filtered out above, and the only
         * other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty