On Monday 04 Mar 2024 at 22:58:49 (+0100), David Hildenbrand wrote: > On 04.03.24 22:43, Elliot Berman wrote: > > On Mon, Mar 04, 2024 at 09:17:05PM +0100, David Hildenbrand wrote: > > > On 04.03.24 20:04, Sean Christopherson wrote: > > > > On Mon, Mar 04, 2024, Quentin Perret wrote: > > > > > > As discussed in the sub-thread, that might still be required. > > > > > > > > > > > > One could think about completely forbidding GUP on these mmap'ed > > > > > > guest-memfds. But likely, there might be use cases in the future where you > > > > > > want to use GUP on shared memory inside a guest_memfd. > > > > > > > > > > > > (the iouring example I gave might currently not work because > > > > > > FOLL_PIN|FOLL_LONGTERM|FOLL_WRITE only works on shmem+hugetlb, and > > > > > > guest_memfd will likely not be detected as shmem; 8ac268436e6d contains some > > > > > > details) > > > > > > > > > > Perhaps it would be wise to start with GUP being forbidden if the > > > > > current users do not need it (not sure if that is the case in Android, > > > > > I'll check) ? We can always relax this constraint later when/if the > > > > > use-cases arise, which is obviously much harder to do the other way > > > > > around. > > > > > > > > +1000. At least on the KVM side, I would like to be as conservative as possible > > > > when it comes to letting anything other than the guest access guest_memfd. > > > > > > So we'll have to do it similar to any occurrences of "secretmem" in gup.c. > > > We'll have to see how to marry KVM guest_memfd with core-mm code similar to > > > e.g., folio_is_secretmem(). > > > > > > IIRC, we might not be able to de-reference the actual mapping because it > > > could get free concurrently ... > > > > > > That will then prohibit any kind of GUP access to these pages, including > > > reading/writing for ptrace/debugging purposes, for core dumping purposes > > > etc. But at least, you know that nobody was able to optain page references > > > using GUP that might be used for reading/writing later. > > > > Do you have any concerns to add to enum mapping_flags, AS_NOGUP, and > > replacing folio_is_secretmem() with a test of this bit instead of > > comparing the a_ops? I think it scales better. > > The only concern I have are races, but let's look into the details: > > In GUP-fast, we can essentially race with unmap of folios, munmap() of VMAs > etc. > > We had a similar discussion recently about possible races. It's documented > in folio_fast_pin_allowed() regarding disabled IRQs and RCU grace periods. > > "inodes and thus their mappings are freed under RCU, which means the mapping > cannot be freed beneath us and thus we can safely dereference it." > > So if we follow the same rules as folio_fast_pin_allowed(), we can > de-reference folio->mapping, for example comparing mapping->a_ops. > > [folio_is_secretmem should better follow the same approach] Resurecting this discussion, I had discussions internally and as it turns out Android makes extensive use of vhost/vsock when communicating with guest VMs, which requires GUP. So, my bad, not supporting GUP for the pKVM variant of guest_memfd is a bit of a non-starter, we'll need to support it from the start. But again this should be a matter of 'simply' having a dedicated KVM exit reason so hopefully it's not too bad. Thanks, Quentin