Re: [PATCH v2 11/27] KVM: x86/mmu: Zap only the relevant pages when removing a memslot


Hey Sean,

On 21.10.22 21:40, Sean Christopherson wrote:

On Thu, Oct 20, 2022, Alexander Graf wrote:
On 20.10.22 22:37, Sean Christopherson wrote:
On Thu, Oct 20, 2022, Alexander Graf wrote:
On 26.06.20 19:32, Sean Christopherson wrote:
/cast <thread necromancy>

On Tue, Aug 20, 2019 at 01:03:19PM -0700, Sean Christopherson wrote:
[...]

I don't think any of this explains the pass-through GPU issue.  But, we
have a few use cases where zapping the entire MMU is undesirable, so I'm
going to retry upstreaming this patch with a per-VM opt-in.  I wanted to
set the record straight for posterity before doing so.
Hey Sean,

Did you ever get around to upstreaming or reworking the zap optimization? The way
I read current upstream, a memslot change still always wipes all SPTEs, not
only the ones that were changed.
Nope, I've more or less given up hope on zapping only the deleted/moved memslot.
TDX (and SNP?) will preserve SPTEs for guest private memory, but they're very
much a special case.

Do you have a use case and/or issue that doesn't play nice with the "zap all" behavior?

Yeah, we're looking at adding support for the Hyper-V VSM extensions which
Windows uses to implement Credential Guard. With that, the guest gets access
to hypercalls that allow it to set reduced permissions for arbitrary gfns.
To ensure that user space has full visibility into those for live migration,
memory slots to model access would be a great fit. But it means we'd do
~100k memslot modifications on boot.
Oof.  100k memslot updates is going to be painful irrespective of flushing.  And
memslots (in their current form) won't work if the guest can drop executable
permissions.

Assuming KVM needs to support a KVM_MEM_NO_EXEC flag, rather than trying to solve
the "KVM flushes everything on memslot deletion" problem, I think we should instead
properly support toggling KVM_MEM_READONLY (and KVM_MEM_NO_EXEC) without forcing
userspace to delete the memslot.  Commit 75d61fbcf563 ("KVM: set_memory_region:


That would be a cute acceleration for the case where we have to change permissions for a full slot. Unfortunately, the bulk of the changes are slot splits. Let me explain with numbers from a 1 vcpu, 8GB Windows Server 2019 boot:

GFN permission modification requests: 46294
Unique GFNs: 21200

That means on boot, we start off with a few huge memslots for guest RAM. Then down the road, we need to change permissions for individual pages inside these larger regions. The obvious option for that is a memslot split - delete, create, create, create. Now we have 2 large memslots and 1 that only spans a single page.
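
To illustrate the split described above, here is a minimal sketch in Python. It is only a model of the bookkeeping, not the KVM API: the `Memslot` type, its fields, and `split_memslot` are hypothetical names invented for this example, and a single read-only flag stands in for whatever permission bits the guest requests.

```python
from typing import List, NamedTuple


class Memslot(NamedTuple):
    """Hypothetical model of a memslot: a run of guest frames with one permission."""
    base_gfn: int
    npages: int
    readonly: bool = False


def split_memslot(slot: Memslot, gfn: int, readonly: bool) -> List[Memslot]:
    """Return the slots that replace `slot` when a single gfn changes permission.

    Models "delete, create, create, create": the original slot is dropped and
    replaced by up to three slots (head, the single page, tail).
    """
    end = slot.base_gfn + slot.npages
    if not slot.base_gfn <= gfn < end:
        raise ValueError("gfn is outside the slot")

    pieces = []
    head_pages = gfn - slot.base_gfn
    tail_pages = end - (gfn + 1)
    if head_pages:
        pieces.append(Memslot(slot.base_gfn, head_pages, slot.readonly))
    pieces.append(Memslot(gfn, 1, readonly))
    if tail_pages:
        pieces.append(Memslot(gfn + 1, tail_pages, slot.readonly))
    return pieces


# An 8GB slot of 4KiB pages, with one page in the middle made read-only.
ram = Memslot(base_gfn=0, npages=8 * 1024 * 1024 * 1024 // 4096)
for piece in split_memslot(ram, gfn=1000, readonly=True):
    print(piece)
```

In this model every first-time permission change on a page inside a large slot costs one deletion and up to three creations, which is why the number of splits matters so much when each deletion zaps all SPTEs.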

Later in the boot process, Windows sometimes also toggles permissions for pages that it already split off earlier. That's the case we can optimize with the modify optimization you described in the previous email. But that's only about half the requests. The other half are memslot split requests.
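
The "about half" estimate can be roughly checked against the numbers above. The sketch below assumes, purely for illustration, that the first request for each unique GFN forces a split and every later request for the same GFN is a toggle of an already split-off page; the real ratio depends on how the requests are actually ordered and grouped. The function name is made up for this example.

```python
def split_vs_toggle(requests: int, unique_gfns: int):
    """Estimate split and toggle counts under the one-split-per-unique-GFN assumption."""
    splits = unique_gfns              # assumed: first touch of a GFN splits a slot
    toggles = requests - unique_gfns  # assumed: repeat touches only toggle permissions
    return splits, toggles, splits / requests


splits, toggles, split_share = split_vs_toggle(requests=46294, unique_gfns=21200)
print(f"splits={splits} toggles={toggles} split share={split_share:.0%}")
```

Under that assumption, slightly under half of the requests are splits and slightly over half are toggles, which matches the rough half-and-half description.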

We already built a prototype implementation of an atomic memslot update ioctl that allows us to keep other vCPUs running while we do the delete/create/create/create operation. But even with that, we see boot times of up to 30 min for larger guests, most of which is spent zapping pages.

I guess we have 2 options to make this viable:

  1) Optimize memslot splits + modifications to a point where they're fast enough
  2) Add a different, faster mechanism on top of memslots for page granular permission bits

Also sorry for not posting the underlying credguard and atomic memslot patches yet. I wanted to kick off this conversation before sending them out - they're still too raw for upstream review atm :).


Thanks,

Alex




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879





