Hi Patrick, On Thu, 23 Jan 2025 at 11:57, Patrick Roy <roypat@xxxxxxxxxxxx> wrote: > > > > On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote: > > On 23.01.25 10:48, Fuad Tabba wrote: > >> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@xxxxxxxxxx> wrote: > >>> > >>> On 22.01.25 16:27, Fuad Tabba wrote: > >>>> Make kvm_(read|/write)_guest_page() capable of accessing guest > >>>> memory for slots that don't have a userspace address, but only if > >>>> the memory is mappable, which also indicates that it is > >>>> accessible by the host. > >>> > >>> Interesting. So far my assumption was that, for shared memory, user > >>> space would simply mmap() guest_memdd and pass it as userspace address > >>> to the same memslot that has this guest_memfd for private memory. > >>> > >>> Wouldn't that be easier in the first shot? (IOW, not require this patch > >>> with the cost of faulting the shared page into the page table on access) > >> > > > > In light of: > > > > https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@xxxxxxxxxxxxx > > > > there can, in theory, be memslots that start at address 0 and have a > > "valid" mapping. This case is done from the kernel (and on special s390x > > hardware), though, so it does not apply here at all so far. > > > > In practice, getting address 0 as a valid address is unlikely, because > > the default: > > > > $ sysctl vm.mmap_min_addr > > vm.mmap_min_addr = 65536 > > > > usually prohibits it for good reason. > > > >> This has to do more with the ABI I had for pkvm and shared memory > >> implementations, in which you don't need to specify the userspace > >> address for memory in a guestmem memslot. The issue is there is no > >> obvious address to map it to. This would be the case in kvm:arm64 for > >> tracking paravirtualized time, which the userspace doesn't necessarily > >> need to interact with, but kvm does. > > > > So I understand correctly: userspace wouldn't have to mmap it because it > > is not interested in accessing it, but there is nothing speaking against > > mmaping it, at least in the first shot. > > > > I assume it would not be a private memslot (so far, my understanding is > > that internal memslots never have a guest_memfd attached). > > kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set > > on user-created memslots. > > > >> > >> That said, we could always have a userspace address dedicated to > >> mapping shared locations, and use that address when the necessity > >> arises. Or we could always require that memslots have a userspace > >> address, even if not used. I don't really have a strong preference. > > > > So, the simpler version where user space would simply mmap guest_memfd > > to provide the address via userspace_addr would at least work for the > > use case of paravirtualized time? > > fwiw, I'm currently prototyping something like this for x86 (although > not by putting the gmem address into userspace_addr, but by adding a new > field to memslots, so that memory attributes continue working), based on > what we talked about at the last guest_memfd sync meeting (the whole > "how to get MMIO emulation working for non-CoCo VMs in guest_memfd" > story). So I guess if we're going down this route for x86, maybe it > makes sense to do the same on ARM, for consistency? > > > It would get rid of the immediate need for this patch and patch #4 to > > get it flying. > > > > > > One interesting question is: when would you want shared memory in > > guest_memfd and *not* provide it as part of the same memslot. > > In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite > far without giving KVM a way to internally access shared parts of gmem - > it's why I was probing Fuad for this simplified series, because > KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a > working non-CoCo VM on ARM (although I admittedly never looked at clocks > inside the guest - maybe that's one thing that breaks if KVM can't > access gmem. How to guest and host agree on the guest memory range > used to exchange paravirtual timekeeping information? Could that exchange > be intercepted in userspace, and set to shared via memory attributes (e.g. > placed outside gmem)? That's the route I'm going down the paravirtual > time on x86). For an idea of what it looks like on arm64, here's how kvmtool handles it: https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c Cheers, /fuad > > One nice thing about the mmap might be that access go via user-space > > page tables: E.g., __kvm_read_guest_page can just access the memory > > without requiring the folio lock and an additional temporary folio > > reference on every access -- it's handled implicitly via the mapcount. > > > > (of course, to map the page we still need that once on the fault path) > > Doing a direct map access in kvm_{read,write}_guest() and friends will > also get tricky if guest_memfd folios ever don't have direct map > entries. On-demand restoration is painful, both complexity and > performance wise [1], while going through a userspace mapping of > guest_memfd would "just work". > > > -- > > Cheers, > > > > David / dhildenb > >