Re: folio_mmapped

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On 28.02.24 11:48, Quentin Perret wrote:
On Tuesday 27 Feb 2024 at 15:59:37 (+0100), David Hildenbrand wrote:

Ah, this was something I hadn't thought about. I think both Fuad and I
need to update our series to check the refcount rather than mapcount
(kvm_is_gmem_mapped for Fuad, gunyah_folio_lend_safe for me).

An alternative might be !folio_mapped() && !folio_maybe_dma_pinned(). But
checking for any unexpected references might be better (there are still some
GUP users that don't use FOLL_PIN).

As a non-mm person I'm not sure to understand to consequences of holding
a GUP pin to a page that is not covered by any VMA. The absence of VMAs
imply that userspace cannot access the page right? Presumably the kernel
can't be coerced into accessing that page either? Is that correct?

Simple example: register the page using an iouring fixed buffer, then unmap the VMA. iouring now has the page pinned and can read/write it using an address in the kernel vitual address space (direct map).

Then, you can happily make the kernel read/write that page using iouring, even though no VMA still covers/maps that page.


Instead of

1) Converting a page to private only if there are no unexpected
    references (no mappings, GUP pins, ...) and no VMAs covering it where
    we could fault it in later
2) Disallowing mmap when the range would contain any private page
3) Handling races between mmap and page conversion

The one thing that makes the second option cleaner from a userspace
perspective (IMO) is that the conversion to private is happening lazily
during guest faults. So whether or not an mmapped page can indeed be
accessed from userspace will be entirely undeterministic as it depends
on the guest faulting pattern which userspace is entirely unaware of.
Elliot's suggestion would prevent spurious crashes caused by that
somewhat odd behaviour, though arguably sane userspace software
shouldn't be doing that to start with.

The last sentence is the important one. User space should not access that memory. If it does, it gets a slap on the hand. Because it should not access that memory.

We might even be able to export to user space which pages are currently accessible and which ones not (e.g., pagemap), although it would be racy as long as the VM is running and can trigger a conversion.

To add a layer of paint to the shed, the usage of SIGBUS for
something that is really a permission access problem doesn't feel

SIGBUS stands for "BUS error (bad memory access)."

Which makes sense, if you try accessing something that can no longer be accessed. It's now inaccessible. Even if it is temporarily.

Just like a page with an MCE error. Swapin errors. Etc. You cannot access it.

It might be a permission problem on the pKVM side, but it's not the traditional "permission problem" as in mprotect() and friends. You cannot resolve that permission problem yourself. It's a higher entity that turned that memory inaccessible.

appropriate. Allocating memory via guestmem and donating that to a
protected guest is a way for userspace to voluntarily relinquish access
permissions to the memory it allocated. So a userspace process violating
that could, IMO, reasonably expect a SEGV instead of SIGBUS. By the
point that signal would be sent, the page would have been accounted
against that userspace process, so not sure the paging examples that
were discussed earlier are exactly comparable. To illustrate that
differently, given that pKVM and Gunyah use MMU-based protection, there
is nothing architecturally that prevents a guest from sharing a page
back with Linux as RO.

Sure, then allow page faults that allow for reads and give a signal on write faults.

In the scenario, it even makes more sense to not constantly require new mmap's from user space just to access a now-shared page.

Note that we don't currently support this, so I
don't want to conflate this use case, but that hopefully makes it a
little more obvious that this is a "there is a page, but you don't
currently have the permission to access it" problem rather than "sorry
but we ran out of pages" problem.

We could user other signals, at least as the semantics are clear and it's documented. Maybe SIGSEGV would be warranted.

I consider that a minor detail, though.

Requiring mmap()/munmap() dances just to access a page that is now shared from user space sounds a bit suboptimal. But I don't know all the details of the user space implementation.

"mmap() the whole thing once and only access what you are supposed to access" sounds reasonable to me. If you don't play by the rules, you get a signal.

But I'm happy to learn more details.


David / dhildenb

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux