Ah, this was something I hadn't thought about. I think both Fuad and I
need to update our series to check the refcount rather than mapcount
(kvm_is_gmem_mapped for Fuad, gunyah_folio_lend_safe for me).
An alternative might be !folio_mapped() && !folio_maybe_dma_pinned().
But checking for any unexpected references might be better (there are
still some GUP users that don't use FOLL_PIN).
At least concurrent migration/swapout should not be a concern, because
guest_memfd doesn't support that yet. (Both temporarily unmap a folio,
which can give you folio_mapped() "false negatives", and both take a
temporary folio reference and hold the page lock.)
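To illustrate the difference between the two checks, here is a minimal
userspace model. It is not kernel code: struct model_folio, both helper
names and the expected reference count of 2 (one page cache reference
plus one held by the caller) are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	int refcount;    /* all references: page cache, mappings, GUP, caller */
	int mapcount;    /* number of page table mappings */
	bool dma_pinned; /* models a FOLL_PIN-style pin being detectable */
};

/* Assumption: one page cache reference plus one held by the caller. */
enum { MODEL_EXPECTED_REFS = 2 };

/* Models the alternative: !folio_mapped() && !folio_maybe_dma_pinned(). */
static bool model_unmapped_and_unpinned(const struct model_folio *f)
{
	return f->mapcount == 0 && !f->dma_pinned;
}

/* Models the stricter check: no reference we cannot account for. */
static bool model_no_unexpected_refs(const struct model_folio *f)
{
	/* The caller must already hold its own reference. */
	assert(f->refcount >= MODEL_EXPECTED_REFS);
	return f->refcount == MODEL_EXPECTED_REFS;
}
```

A GUP user without FOLL_PIN shows up only as an extra reference, so in
this model the first check passes while the second one refuses.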
Now, regarding the original question (disallow mapping the page), I see the
following approaches:
1) SIGBUS during page fault. There are other cases that can trigger
SIGBUS during page faults: hugetlb when we are out of free hugetlb
pages, userfaultfd with UFFD_FEATURE_SIGBUS.
-> Simple and should get the job done.
2) folio_mmapped() + preventing new mmaps covering that folio
-> More complicated, requires an rmap walk on every conversion.
3) Disallow any mmaps of the file while any page is private
-> Likely not what you want.
Why was 1) abandoned? It looks a lot easier and harder to mess up. Why are
you trying to avoid page faults? What's the use case?
We were chatting whether we could do better than the SIGBUS approach.
SIGBUS/FAULT usually crashes userspace, so I was brainstorming ways to
return errors early. One difference between hugetlb and this use case is
that running out of free hugetlb pages isn't something we could detect
at mmap time. In the guest_memfd use case, we should be able to detect when
SIGBUS becomes possible due to memory being lent to the guest.
With hugetlb reservation one can try detecting it at mmap() time. But as
reservations are not NUMA aware, it's not reliable.
I can't think of a reason why userspace would want/be able to resume
operation after trying to access a page that it isn't allowed to access, so
SIGBUS is functional. The advantage of trying to avoid SIGBUS was
better/easier reporting to userspace.
To me, the following sounds conceptually easier and less error-prone:
1) Converting a page to private only if there are no unexpected
references (no mappings, GUP pins, ...)
2) Disallowing mapping private pages and failing the page fault.
3) Handling that small race window only (page lock?)
Instead of
1) Converting a page to private only if there are no unexpected
references (no mappings, GUP pins, ...) and no VMAs covering it where
we could fault it in later
2) Disallowing mmap when the range would contain any private page
3) Handling races between mmap and page conversion
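The first variant (convert only without unexpected references, fail the
fault on private pages) can be modelled in userspace roughly like this.
It is a single-threaded sketch, not kernel code: struct model_page, the
result codes and all function names are hypothetical, and the lock that
would close the race window is only marked in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for this model only. */
enum model_result { MODEL_OK = 0, MODEL_EBUSY = -1, MODEL_SIGBUS = -2 };

struct model_page {
	int refcount;    /* 1 == only the file's own reference (assumption) */
	bool is_private; /* true once the page has been lent to the guest */
};

/* Step 1: convert only if nobody else holds a reference. */
static enum model_result model_convert_to_private(struct model_page *p)
{
	/* Real code would serialize against faults here (page lock?). */
	if (p->refcount != 1)
		return MODEL_EBUSY;
	p->is_private = true;
	return MODEL_OK;
}

/* Step 2: a fault on a private page fails; a shared page gets mapped. */
static enum model_result model_fault(struct model_page *p)
{
	/* Same serialization as above, so is_private is stable here. */
	if (p->is_private)
		return MODEL_SIGBUS;
	p->refcount++; /* the new mapping holds a reference */
	return MODEL_OK;
}

/* Dropping a mapping releases the reference taken in model_fault(). */
static void model_unmap(struct model_page *p)
{
	assert(p->refcount > 1);
	p->refcount--;
}
```

In this model, whichever of conversion and fault runs first wins: a
mapped page cannot be converted until it is unmapped, and a converted
page cannot be faulted in. No mmap-time bookkeeping is needed.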
--
Cheers,
David / dhildenb