Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning

On 21.06.24 11:25, Quentin Perret wrote:
On Friday 21 Jun 2024 at 10:02:08 (+0200), David Hildenbrand wrote:
Thanks for the information. IMHO we really should try to find a common
ground here, and FOLL_EXCLUSIVE is likely not it :)

That's OK, IMO at least :-).

Thanks for reviving this discussion with your patch set!

pKVM is interested in in-place conversion, I believe there are valid use
cases for in-place conversion for TDX and friends as well (as discussed, I
think that might be a clean way to get huge/gigantic page support in).

This implies the option to:

1) Have shared+private memory in guest_memfd
2) Be able to mmap shared parts
3) Be able to convert shared<->private in place

and, later on, of particular interest to me:

4) Have huge/gigantic page support in guest_memfd with the option of
    converting individual subpages

We might not want to make use of that model for all of CC -- as you state,
sometimes the destructive approach might be better performance wise -- but
having that option doesn't sound crazy to me (and maybe would solve real
issues as well).
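
Purely as an illustration of point 4), per-subpage tracking inside one huge page could be modelled with a bitmap. This is a toy userspace sketch, not kernel code: the class name, the method names, and the default of 512 subpages (a 2 MiB huge page with a 4 KiB base page size) are all assumptions made for the example, and nothing here reflects an actual guest_memfd interface.

```python
class ToyHugePage:
    """Hypothetical model: one huge page whose subpages convert individually."""

    def __init__(self, nr_subpages: int = 512) -> None:
        # Assumption: 512 subpages, i.e. 2 MiB / 4 KiB.
        self.nr_subpages = nr_subpages
        # Bit i set means subpage i is private; everything starts out shared.
        self.private_bitmap = 0

    def _check(self, idx: int) -> None:
        if not 0 <= idx < self.nr_subpages:
            raise IndexError(f"subpage {idx} out of range")

    def set_private(self, idx: int) -> None:
        self._check(idx)
        self.private_bitmap |= 1 << idx

    def set_shared(self, idx: int) -> None:
        self._check(idx)
        self.private_bitmap &= ~(1 << idx)

    def is_private(self, idx: int) -> bool:
        self._check(idx)
        return bool(self.private_bitmap & (1 << idx))

    def is_mixed(self) -> bool:
        # True when the huge page holds both private and shared subpages.
        all_private = (1 << self.nr_subpages) - 1
        return self.private_bitmap not in (0, all_private)
```

A helper like is_mixed() is only there to make the "some subpages private, some shared" case visible; how a real implementation would track or restrict that case is left open here.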

Cool.

After all, the common requirement here is that "private" pages are not
mapped/pinned/accessible.

Sure, there might be cases like "pKVM can handle access to private pages in
user page mappings", "AMD-SNP will not crash the host if writing to private
pages" but these are not factors that really make a difference for a common
solution.

Sure, there isn't much value in differentiating on these things. One
might argue that we could save one mmap() on the private->shared
conversion path by keeping all of guest_memfd mapped in userspace
including private memory, but that's most probably not worth the
effort of re-designing the whole thing just for that, so let's forget
that.

In a world where we can mmap() the whole (sparse "shared") thing, and dynamically map/unmap the shared parts only, it would save a page fault on private->shared conversion, correct.

But that sounds more like a CC-specific optimization for frequent conversions, which we should just ignore initially.


The ability to handle stage-2 faults in the kernel has implications in
other places however. It means we don't need to punch holes in the
kernel linear map when donating memory to a guest for example, even with
'crazy' access patterns like load_unaligned_zeropad(). So that's good.

private memory: not mapped, not pinned
shared memory: maybe mapped, maybe pinned
granularity of conversion: single pages

Anything I am missing?
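
To make the three rules above concrete, here is a toy userspace model of them. It is a sketch under stated assumptions, not a proposed implementation: all names (ToyGuestMemfd, Page, mapcount, pincount, convert_to_private, ...) are hypothetical, and the choice to refuse a shared->private conversion while the page is still mapped or pinned is an assumption of the example, not something settled in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SHARED = "shared"
    PRIVATE = "private"


@dataclass
class Page:
    # Hypothetical per-page bookkeeping; not a real kernel structure.
    state: State = State.SHARED
    mapcount: int = 0  # number of user mappings
    pincount: int = 0  # number of outstanding pins


class ToyGuestMemfd:
    def __init__(self, nr_pages: int) -> None:
        # Conversion granularity is a single page, so state is per page.
        self.pages = [Page() for _ in range(nr_pages)]

    def map_page(self, idx: int) -> None:
        page = self.pages[idx]
        if page.state is State.PRIVATE:
            raise PermissionError("private pages must not be mapped")
        page.mapcount += 1

    def unmap_page(self, idx: int) -> None:
        page = self.pages[idx]
        if page.mapcount == 0:
            raise ValueError("page is not mapped")
        page.mapcount -= 1

    def pin_page(self, idx: int) -> None:
        page = self.pages[idx]
        if page.state is State.PRIVATE:
            raise PermissionError("private pages must not be pinned")
        page.pincount += 1

    def unpin_page(self, idx: int) -> None:
        page = self.pages[idx]
        if page.pincount == 0:
            raise ValueError("page is not pinned")
        page.pincount -= 1

    def convert_to_private(self, idx: int) -> bool:
        # Assumption: refuse while any mapping or pin remains.
        page = self.pages[idx]
        if page.mapcount or page.pincount:
            return False
        page.state = State.PRIVATE
        return True

    def convert_to_shared(self, idx: int) -> None:
        # A private page is never mapped or pinned, so nothing to check.
        self.pages[idx].state = State.SHARED
```

With this model, converting a page that userspace still has mapped or pinned fails until the last mapping and pin are dropped, and neighbouring pages are unaffected by the conversion.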

That looks good to me. And as discussed in previous threads, we have the
ambition of getting page-migration to work, including for private memory,
mostly to get kcompactd to work better when pVMs are running. Android
makes extensive use of compaction, and pVMs currently stick out like a
sore thumb.

Yes, I think migration for compaction has to be supported at some point (at least for small pages that can be either private or shared, not a mixture), and I suspect we should be able to integrate it with core-mm in a not-too-horrible fashion. For example, we do have a non-lru page migration infrastructure in place already if the LRU-based one is not a good fit.

Memory swapping and all other currently-strictly LRU-based mechanisms should be out of scope for now: as Sean says, we don't want to go down that path.


We can trivially implement a hypercall to have pKVM swap a private
page with another without the guest having to know. The difficulty is
obviously to hook that in Linux, and I've personally not looked into it
properly, so that is clearly longer term. We don't want to take anybody
by surprise if there is a need for some added complexity in guest_memfd
to support this use-case though. I don't expect folks on the receiving
end of that to agree to it blindly without knowing _what_ this
complexity is FWIW. But at least our intentions are clear :-)

Agreed.

--
Cheers,

David / dhildenb




