Re: folio_mmapped

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04.03.24 13:53, Quentin Perret wrote:
On Friday 01 Mar 2024 at 12:16:54 (+0100), David Hildenbrand wrote:
I don't think that we can assume that only a single VMA covers a page.

But of course, no rmap walk is always better.

We've been thinking some more about how to handle the case where the
host userspace has a mapping of a page that later becomes private.

One idea is to refuse to run the guest (i.e., exit vcpu_run() to back
to the host with a meaningful exit reason) until the host unmaps that
page, and check for the refcount to the page as you mentioned earlier.
This is essentially what the RFC I sent does (minus the bugs :) ) .

The other idea is to use the rmap walk as you suggested to zap that
page. If the host tries to access that page again, it would get a
SIGBUS on the fault. This has the advantage that, as you'd mentioned,
the host doesn't need to constantly mmap() and munmap() pages. It
could potentially be optimised further as suggested if we have a
cooperating VMM that would issue a MADV_DONTNEED or something like
that, but that's just an optimisation and we would still need to have
the option of the rmap walk. However, I was wondering how practical
this idea would be if more than a single VMA covers a page?


Agree with all your points here. I changed Gunyah's implementation to do
the unmap instead of erroring out. I didn't observe a significant
performance difference. However, doing unmap might be a little faster
because we can check folio_mapped() before doing the rmap walk. When
erroring out at mmap() level, we always have to do the walk.

Right. On the mmap() level you won't really have to walk page tables, as the
the munmap() already zapped the page and removed the "problematic" VMA.

Likely, you really want to avoid repeatedly calling mmap()+munmap() just to
access shared memory; but that's just my best guess about your user space
app :)

Ack, and expecting userspace to munmap the pages whenever we hit a valid
mapping in userspace page-tables in the KVM faults path makes for a
somewhat unusual interface IMO. Userspace can munmap, mmap again, and if
it doesn't touch the pages, it can proceed to run the guest just fine,
is that the expectation? If so, it feels like we're 'leaking' internal

It would be weird, and I would not suggest that. It's either

(1) you can leave it mmap'ed, but any access to private memory will SIGBUS. The kernel will try zapping pages inside a VMA itself.

(2) you cannot leave it mmap'ed. In order to convert shared -> private, you have to munmap. mmap will fail if it would cover a currently-private page.

So for (1) you could mmap once in user space and be done with it. For (2) you would have to mmap+munmap when accessing shared memory.

kernel state somehow. The kernel is normally well within its rights to
zap userspace mappings if it wants to e.g. swap. (Obviously mlock is a
weird case, but even in that case, IIRC the kernel still has a certain
amount of flexibility and can use compaction and friends). Similarly,
it should be well within its right to proactively create them. How
would this scheme work if, 10 years from now, something like
Speculative Page Faults makes it into the kernel in a different form?

At least with (2), speculative page faults would never generate a SIGBUS. Just like HW must not generate a fault on speculative access.


Not requiring to userspace to unmap makes the userspace interface a lot
simpler I think -- once a protected guest starts, you better not touch
its memory if it's not been shared back or you'll get slapped on the
wrist. Whether or not those pages have been accessed beforehand for
example is irrelevant.

Yes. So the theory :)

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux