On 19.03.24 10:47, Quentin Perret wrote:
On Monday 04 Mar 2024 at 22:58:49 (+0100), David Hildenbrand wrote:
On 04.03.24 22:43, Elliot Berman wrote:
On Mon, Mar 04, 2024 at 09:17:05PM +0100, David Hildenbrand wrote:
On 04.03.24 20:04, Sean Christopherson wrote:
On Mon, Mar 04, 2024, Quentin Perret wrote:
As discussed in the sub-thread, that might still be required.
One could think about completely forbidding GUP on these mmap'ed
guest-memfds. But likely, there might be use cases in the future where you
want to use GUP on shared memory inside a guest_memfd.
(the iouring example I gave might currently not work because
FOLL_PIN|FOLL_LONGTERM|FOLL_WRITE only works on shmem+hugetlb, and
guest_memfd will likely not be detected as shmem; 8ac268436e6d contains some
details)
Perhaps it would be wise to start with GUP being forbidden if the
current users do not need it (not sure if that is the case in Android,
I'll check) ? We can always relax this constraint later when/if the
use-cases arise, which is obviously much harder to do the other way
around.
+1000. At least on the KVM side, I would like to be as conservative as possible
when it comes to letting anything other than the guest access guest_memfd.
So we'll have to do it similar to any occurrences of "secretmem" in gup.c.
We'll have to see how to marry KVM guest_memfd with core-mm code similar to
e.g., folio_is_secretmem().
IIRC, we might not be able to de-reference the actual mapping because it
could get free concurrently ...
That will then prohibit any kind of GUP access to these pages, including
reading/writing for ptrace/debugging purposes, for core dumping purposes
etc. But at least, you know that nobody was able to optain page references
using GUP that might be used for reading/writing later.
Do you have any concerns to add to enum mapping_flags, AS_NOGUP, and
replacing folio_is_secretmem() with a test of this bit instead of
comparing the a_ops? I think it scales better.
The only concern I have are races, but let's look into the details:
In GUP-fast, we can essentially race with unmap of folios, munmap() of VMAs
etc.
We had a similar discussion recently about possible races. It's documented
in folio_fast_pin_allowed() regarding disabled IRQs and RCU grace periods.
"inodes and thus their mappings are freed under RCU, which means the mapping
cannot be freed beneath us and thus we can safely dereference it."
So if we follow the same rules as folio_fast_pin_allowed(), we can
de-reference folio->mapping, for example comparing mapping->a_ops.
[folio_is_secretmem should better follow the same approach]
Resurecting this discussion, I had discussions internally and as it
turns out Android makes extensive use of vhost/vsock when communicating
with guest VMs, which requires GUP. So, my bad, not supporting GUP for
the pKVM variant of guest_memfd is a bit of a non-starter, we'll need to
support it from the start. But again this should be a matter of 'simply'
having a dedicated KVM exit reason so hopefully it's not too bad.
Likely you should look into ways to teach these interfaces that require
GUP to consume fd+offset instead.
Yes, there is no such thing as free lunch :P
--
Cheers,
David / dhildenb