On Thu, 2025-01-23 at 12:28 +0000, Fuad Tabba wrote: > Hi Patrick, > > On Thu, 23 Jan 2025 at 11:57, Patrick Roy <roypat@xxxxxxxxxxxx> wrote: >> >> >> >> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote: >>> On 23.01.25 10:48, Fuad Tabba wrote: >>>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@xxxxxxxxxx> wrote: >>>>> >>>>> On 22.01.25 16:27, Fuad Tabba wrote: >>>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest >>>>>> memory for slots that don't have a userspace address, but only if >>>>>> the memory is mappable, which also indicates that it is >>>>>> accessible by the host. >>>>> >>>>> Interesting. So far my assumption was that, for shared memory, user >>>>> space would simply mmap() guest_memdd and pass it as userspace address >>>>> to the same memslot that has this guest_memfd for private memory. >>>>> >>>>> Wouldn't that be easier in the first shot? (IOW, not require this patch >>>>> with the cost of faulting the shared page into the page table on access) >>>> >>> >>> In light of: >>> >>> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@xxxxxxxxxxxxx >>> >>> there can, in theory, be memslots that start at address 0 and have a >>> "valid" mapping. This case is done from the kernel (and on special s390x >>> hardware), though, so it does not apply here at all so far. >>> >>> In practice, getting address 0 as a valid address is unlikely, because >>> the default: >>> >>> $ sysctl vm.mmap_min_addr >>> vm.mmap_min_addr = 65536 >>> >>> usually prohibits it for good reason. >>> >>>> This has to do more with the ABI I had for pkvm and shared memory >>>> implementations, in which you don't need to specify the userspace >>>> address for memory in a guestmem memslot. The issue is there is no >>>> obvious address to map it to. This would be the case in kvm:arm64 for >>>> tracking paravirtualized time, which the userspace doesn't necessarily >>>> need to interact with, but kvm does. >>> >>> So I understand correctly: userspace wouldn't have to mmap it because it >>> is not interested in accessing it, but there is nothing speaking against >>> mmaping it, at least in the first shot. >>> >>> I assume it would not be a private memslot (so far, my understanding is >>> that internal memslots never have a guest_memfd attached). >>> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set >>> on user-created memslots. >>> >>>> >>>> That said, we could always have a userspace address dedicated to >>>> mapping shared locations, and use that address when the necessity >>>> arises. Or we could always require that memslots have a userspace >>>> address, even if not used. I don't really have a strong preference. >>> >>> So, the simpler version where user space would simply mmap guest_memfd >>> to provide the address via userspace_addr would at least work for the >>> use case of paravirtualized time? >> >> fwiw, I'm currently prototyping something like this for x86 (although >> not by putting the gmem address into userspace_addr, but by adding a new >> field to memslots, so that memory attributes continue working), based on >> what we talked about at the last guest_memfd sync meeting (the whole >> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd" >> story). So I guess if we're going down this route for x86, maybe it >> makes sense to do the same on ARM, for consistency? >> >>> It would get rid of the immediate need for this patch and patch #4 to >>> get it flying. >>> >>> >>> One interesting question is: when would you want shared memory in >>> guest_memfd and *not* provide it as part of the same memslot. >> >> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite >> far without giving KVM a way to internally access shared parts of gmem - >> it's why I was probing Fuad for this simplified series, because >> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a >> working non-CoCo VM on ARM (although I admittedly never looked at clocks >> inside the guest - maybe that's one thing that breaks if KVM can't >> access gmem. How to guest and host agree on the guest memory range >> used to exchange paravirtual timekeeping information? Could that exchange >> be intercepted in userspace, and set to shared via memory attributes (e.g. >> placed outside gmem)? That's the route I'm going down the paravirtual >> time on x86). > > For an idea of what it looks like on arm64, here's how kvmtool handles it: > https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c > > Cheers, > /fuad Thanks! In that example, kvmtool actually allocates a separate memslot for the pvclock stuff, so I guess it's always possible to simply put it into a non-gmem memslot, which indeed sidesteps this issue as you mention in your reply to David :D >>> One nice thing about the mmap might be that access go via user-space >>> page tables: E.g., __kvm_read_guest_page can just access the memory >>> without requiring the folio lock and an additional temporary folio >>> reference on every access -- it's handled implicitly via the mapcount. >>> >>> (of course, to map the page we still need that once on the fault path) >> >> Doing a direct map access in kvm_{read,write}_guest() and friends will >> also get tricky if guest_memfd folios ever don't have direct map >> entries. On-demand restoration is painful, both complexity and >> performance wise [1], while going through a userspace mapping of >> guest_memfd would "just work". >> >>> -- >>> Cheers, >>> >>> David / dhildenb >>>