Re: [RFC PATCH v1 2/9] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David,

On Thu, 23 Jan 2025 at 11:39, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 23.01.25 10:48, Fuad Tabba wrote:
> > On Wed, 22 Jan 2025 at 22:10, David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 22.01.25 16:27, Fuad Tabba wrote:
> >>> Make kvm_(read|/write)_guest_page() capable of accessing guest
> >>> memory for slots that don't have a userspace address, but only if
> >>> the memory is mappable, which also indicates that it is
> >>> accessible by the host.
> >>
> >> Interesting. So far my assumption was that, for shared memory, user
> >> space would simply mmap() guest_memdd and pass it as userspace address
> >> to the same memslot that has this guest_memfd for private memory.
> >>
> >> Wouldn't that be easier in the first shot? (IOW, not require this patch
> >> with the cost of faulting the shared page into the page table on access)
> >

Before answering your questions in detail, in light of this
discussion, and in other discussions I've had, I think the more
reasonable thing to do is to let the memslot for in-place shared
memory look the same as the other guest_memfd memslots, i.e., have a
userspace address regardless if it's needed or not. This makes the
interface the same across all guest_memfd implementations, and avoids
the need for patches such as this one, at least initially until we
realize that we need it.

That said...

>
> In light of:
>
> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@xxxxxxxxxxxxx
>
> there can, in theory, be memslots that start at address 0 and have a
> "valid" mapping. This case is done from the kernel (and on special s390x
> hardware), though, so it does not apply here at all so far.
>
> In practice, getting address 0 as a valid address is unlikely, because
> the default:
>
> $ sysctl  vm.mmap_min_addr
> vm.mmap_min_addr = 65536
>
> usually prohibits it for good reason.

Ack.

> > This has to do more with the ABI I had for pkvm and shared memory
> > implementations, in which you don't need to specify the userspace
> > address for memory in a guestmem memslot. The issue is there is no
> > obvious address to map it to. This would be the case in kvm:arm64 for
> > tracking paravirtualized time, which the userspace doesn't necessarily
> > need to interact with, but kvm does.
>
> So I understand correctly: userspace wouldn't have to mmap it because it
> is not interested in accessing it, but there is nothing speaking against
> mmaping it, at least in the first shot.

That's right.

> I assume it would not be a private memslot (so far, my understanding is
> that internal memslots never have a guest_memfd attached).
> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
> on user-created memslots.

Right, it wouldn't be a private memslot, but a user created one.

> >
> > That said, we could always have a userspace address dedicated to
> > mapping shared locations, and use that address when the necessity
> > arises. Or we could always require that memslots have a userspace
> > address, even if not used. I don't really have a strong preference.
>
> So, the simpler version where user space would simply mmap guest_memfd
> to provide the address via userspace_addr would at least work for the
> use case of paravirtualized time?
>
> It would get rid of the immediate need for this patch and patch #4 to
> get it flying.

I agree now.

>
> One interesting question is: when would you want shared memory in
> guest_memfd and *not* provide it as part of the same memslot.
>

My original thinking wasn't that we wouldn't _want_ to, rather that we
don't _need_ to. I mistakenly thought that it would simplify things,
but, especially after this discussion, I now realize that it makes
things more complicated instead.

> One nice thing about the mmap might be that access go via user-space
> page tables: E.g., __kvm_read_guest_page can just access the memory
> without requiring the folio lock and an additional temporary folio
> reference on every access -- it's handled implicitly via the mapcount.
>
> (of course, to map the page we still need that once on the fault path)

Ack.

Thanks,
/fuad


> --
> Cheers,
>
> David / dhildenb
>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux