Re: [RFC PATCH v1 2/9] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page()

David Hildenbrand <david@xxxxxxxxxx> · Thu, 23 Jan 2025 15:18:34 +0100

That said, we could always have a userspace address dedicated to
mapping shared locations, and use that address when the necessity
arises. Or we could always require that memslots have a userspace
address, even if not used. I don't really have a strong preference.

So, the simpler version where user space would simply mmap guest_memfd
to provide the address via userspace_addr would at least work for the
use case of paravirtualized time?
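
To make the "mmap and pass the address as userspace_addr" idea concrete,
here is a minimal userspace sketch. It is not KVM code: an unlinked
temporary file stands in for the guest_memfd, and struct toy_memslot and
the toy_* helpers are hypothetical names invented for illustration. It
only shows that once the fd is mapped shared, a guest-physical access
reduces to an offset into that mapping.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical, trimmed-down memslot; NOT the KVM UAPI structure. */
struct toy_memslot {
    uint64_t guest_phys_addr; /* base guest-physical address of the slot */
    uint64_t memory_size;     /* slot size in bytes */
    uint64_t userspace_addr;  /* address returned by mmap() of the fd */
    int backing_fd;           /* stand-in for the guest_memfd */
};

/* Create the stand-in fd, size it, and map it shared. Returns 0 or -1. */
int toy_memslot_init(struct toy_memslot *slot, uint64_t gpa, size_t size)
{
    char path[] = "/tmp/toy-gmem-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* keep the file anonymous; the fd keeps it alive */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    slot->guest_phys_addr = gpa;
    slot->memory_size = size;
    slot->userspace_addr = (uint64_t)(uintptr_t)addr;
    slot->backing_fd = fd;
    return 0;
}

/* Translate gpa to a pointer inside the mapping, or NULL if out of range. */
static void *toy_gpa_to_hva(const struct toy_memslot *slot, uint64_t gpa,
                            size_t len)
{
    if (gpa < slot->guest_phys_addr)
        return NULL;
    uint64_t off = gpa - slot->guest_phys_addr;
    if (off > slot->memory_size || len > slot->memory_size - off)
        return NULL;
    return (void *)(uintptr_t)(slot->userspace_addr + off);
}

/* Read guest memory through the userspace mapping. Returns 0 or -1. */
int toy_read_guest(const struct toy_memslot *slot, uint64_t gpa, void *dst,
                   size_t len)
{
    void *hva = toy_gpa_to_hva(slot, gpa, len);
    if (!hva)
        return -1;
    memcpy(dst, hva, len);
    return 0;
}

/* Write guest memory through the userspace mapping. Returns 0 or -1. */
int toy_write_guest(const struct toy_memslot *slot, uint64_t gpa,
                    const void *src, size_t len)
{
    void *hva = toy_gpa_to_hva(slot, gpa, len);
    if (!hva)
        return -1;
    memcpy(hva, src, len);
    return 0;
}

/* Unmap and close the stand-in fd. */
void toy_memslot_destroy(struct toy_memslot *slot)
{
    munmap((void *)(uintptr_t)slot->userspace_addr, slot->memory_size);
    close(slot->backing_fd);
}
```

In a real VMM the fd would come from guest_memfd creation and the mapped
address would be handed to KVM in the memslot; whether mmap() of
guest_memfd is available depends on the series under discussion.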

fwiw, I'm currently prototyping something like this for x86 (although
not by putting the gmem address into userspace_addr, but by adding a new
field to memslots, so that memory attributes continue working), based on
what we talked about at the last guest_memfd sync meeting (the whole
"how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
story).

Yes, I recall that discussion. Can you elaborate on why the separate field
is required to keep memory attributes working? (Could it be sorted out
differently, by reusing userspace_addr?)

So I guess if we're going down this route for x86, maybe it
makes sense to do the same on ARM, for consistency?

It would get rid of the immediate need for this patch and patch #4 to
get it flying.

One interesting question is: when would you want shared memory in
guest_memfd and *not* provide it as part of the same memslot?

In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
far without giving KVM a way to internally access shared parts of gmem -
it's why I was probing Fuad for this simplified series, because
KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
working non-CoCo VM on ARM (although I admittedly never looked at clocks
inside the guest - maybe that's one thing that breaks if KVM can't
access gmem. How do guest and host agree on the guest memory range
used to exchange paravirtual timekeeping information? Could that exchange
be intercepted in userspace, and set to shared via memory attributes (e.g.
placed outside gmem)? That's the route I'm going down for paravirtual
time on x86).

Sounds reasonable to me.

One nice thing about the mmap might be that accesses go via user-space
page tables: e.g., __kvm_read_guest_page() can just access the memory
without requiring the folio lock and an additional temporary folio
reference on every access -- it's handled implicitly via the mapcount.

(of course, to map the page we still need that once on the fault path)
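
The cost difference can be illustrated with a toy model. This is not
kernel code: struct toy_folio and the toy_* functions are hypothetical,
and the counters merely record how often a lock or reference would be
taken. The assumption being modelled is the one stated above: a direct
access takes the lock and a temporary reference every time, while the
mapped path does so once at fault time and then relies on the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy stand-in for a folio; counters track how often each step ran. */
struct toy_folio {
    unsigned char data[TOY_PAGE_SIZE];
    int refcount;           /* references currently held */
    int mapcount;           /* userspace mappings currently installed */
    unsigned long lock_ops; /* number of times the lock was taken */
    unsigned long ref_ops;  /* number of times a reference was taken */
};

/* Direct path: lock plus temporary reference on every single access. */
int toy_read_direct(struct toy_folio *f, size_t off, void *dst, size_t len)
{
    if (off > TOY_PAGE_SIZE || len > TOY_PAGE_SIZE - off)
        return -1;
    f->lock_ops++;  /* stand-in for taking the folio lock */
    f->refcount++;  /* temporary reference for this access */
    f->ref_ops++;
    memcpy(dst, f->data + off, len);
    f->refcount--;  /* drop the temporary reference */
    return 0;
}

/* Fault path: lock and reference once, then install the mapping. */
unsigned char *toy_fault_in(struct toy_folio *f)
{
    f->lock_ops++;
    f->refcount++;  /* held for as long as the mapping exists */
    f->ref_ops++;
    f->mapcount++;
    return f->data;
}

/* Mapped path: plain memory access, no per-access lock or reference. */
int toy_read_mapped(const unsigned char *mapping, size_t off, void *dst,
                    size_t len)
{
    if (off > TOY_PAGE_SIZE || len > TOY_PAGE_SIZE - off)
        return -1;
    memcpy(dst, mapping + off, len);
    return 0;
}
```

With N reads, the direct path in this model performs N lock and N
reference operations, while the mapped path performs one of each.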

Doing a direct map access in kvm_{read,write}_guest() and friends will
also get tricky if guest_memfd folios ever lack direct map entries.
On-demand restoration is painful, both complexity- and
performance-wise [1], while going through a userspace mapping of
guest_memfd would "just work".

Indeed.

--
Cheers,

David / dhildenb