On 29.02.24 20:01, Fuad Tabba wrote:
Hi David,
...
Hi!
"mmap() the whole thing once and only access what you are supposed to
(> > > access" sounds reasonable to me. If you don't play by the rules, you get a
signal.
"... you get a signal, or maybe you don't". But yes I understand your
point, and as per the above there are real benefits to this approach so
why not.
What do we expect userspace to do when a page goes from shared back to
being guest-private, because e.g. the guest decides to unshare? Use
munmap() on that page? Or perhaps an madvise() call of some sort? Note
that this will be needed when starting a guest as well, as userspace
needs to copy the guest payload in the guestmem file prior to starting
the protected VM.
Let's assume we have the whole guest_memfd mapped exactly once in our
process, a single VMA.
When setting up the VM, we'll write the payload and then fire up the VM.
That will (I assume) trigger some shared -> private conversion.
When we want to convert shared -> private in the kernel, we would first
check if the page is currently mapped. If it is, we could try unmapping that
page using an rmap walk.
I had not considered that. That would most certainly be slow, but a well
behaved userspace process shouldn't hit it so, that's probably not a
problem...
If there really only is a single VMA that covers the page (or even mmaps
the guest_memfd), it should not be too bad. For example, any
fallocate(PUNCHHOLE) has to do the same, to unmap the page before
discarding it from the pagecache.
I don't think that we can assume that only a single VMA covers a page.
But of course, no rmap walk is always better.
We've been thinking some more about how to handle the case where the
host userspace has a mapping of a page that later becomes private.
One idea is to refuse to run the guest (i.e., exit vcpu_run() to back
to the host with a meaningful exit reason) until the host unmaps that
page, and check for the refcount to the page as you mentioned earlier.
This is essentially what the RFC I sent does (minus the bugs :) ) .
:)
The other idea is to use the rmap walk as you suggested to zap that
page. If the host tries to access that page again, it would get a
SIGBUS on the fault. This has the advantage that, as you'd mentioned,
the host doesn't need to constantly mmap() and munmap() pages. It
could potentially be optimised further as suggested if we have a
cooperating VMM that would issue a MADV_DONTNEED or something like
that, but that's just an optimisation and we would still need to have
the option of the rmap walk. However, I was wondering how practical
this idea would be if more than a single VMA covers a page?
A few VMAs won't make a difference I assume. Any idea how many "sharers"
you'd expect in a sane configuration?
Also, there's the question of what to do if the page is gupped? In
this case I think the only thing we can do is refuse to run the guest
until the gup (and all references) are released, which also brings us
back to the way things (kind of) are...
Indeed. There is no way you could possibly make progress.
--
Cheers,
David / dhildenb