Re: folio_mmapped


 



Hi!

[...]


Any state I am missing?

So there is probably state (0) which is 'owned only by the host'. It's a
bit obvious, but I'll make it explicit because it has its importance for
the rest of the discussion.

Yes, I treated it as "simply not mapped into the VM".


And while at it, there are other cases (memory shared/owned with/by the
hypervisor and/or TrustZone) but they're somewhat irrelevant to this
discussion. These pages are usually backed by kernel allocations, so
much less problematic to deal with. So let's ignore those.

Which transitions are possible?

Basically a page must be in the 'exclusively owned' state for an owner
to initiate a share or donation. So e.g. a shared page must be unshared
before it can be donated to someone else (that is true regardless of the
owner: host, guest, hypervisor, ...). That significantly simplifies the
state tracking in pKVM.

Makes sense!


(1) <-> (2) ? Not sure if the direct transition is possible.

Yep, not possible.

(2) <-> (3) ? IIUC yes.

Actually it's not directly possible as is. The ballooning procedure is
essentially a (1) -> (0) transition. (We also tolerate (3) -> (0) in a
single hypercall when doing ballooning, but it's technically just a
(3) -> (1) -> (0) sequence that has been micro-optimized).

Note that state (2) is actually never used for protected VMs. It's
mainly used to implement standard non-protected VMs. The biggest

Interesting.

difference in pKVM between protected and non-protected VMs is basically
that in the former case, in the fault path KVM does a (0) -> (1)
transition, but in the latter it's (0) -> (2). That implies that in the
unprotected case, the host remains the page owner and is allowed to
decide to unshare arbitrary pages, to restrict the guest permissions for
the shared pages etc, which paves the way for implementing migration,
swap, ... relatively easily.

I'll have to digest that :)

... does that mean that for pKVM with protected VMs, "shared" pages are also never migratable/swappable?


(1) <-> (3) ? IIUC yes.

Yep.
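
For readers following along, the transitions discussed above can be sketched
as a small table. This is only an illustration, not pKVM code. The state
names are made up here, and the meanings of (1) and (3) are assumptions
inferred from the replies, since the original definitions are elided in this
mail: (0) host-owned, (1) guest-owned, (2) host-owned and shared with the
guest, (3) guest-owned and shared with the host.

```python
from collections import deque
from enum import Enum


class PageState(Enum):
    # Hypothetical names; only (0) is spelled out in the thread itself.
    HOST_OWNED = 0               # (0) owned only by the host
    GUEST_OWNED = 1              # (1) assumed: exclusively owned by the guest
    HOST_SHARED_WITH_GUEST = 2   # (2) host stays owner, page shared with guest
    GUEST_SHARED_WITH_HOST = 3   # (3) assumed: guest stays owner, shared with host


# Direct transitions confirmed in the thread: (0)<->(1), (0)<->(2), (1)<->(3).
_PAIRS = [
    (PageState.HOST_OWNED, PageState.GUEST_OWNED),
    (PageState.HOST_OWNED, PageState.HOST_SHARED_WITH_GUEST),
    (PageState.GUEST_OWNED, PageState.GUEST_SHARED_WITH_HOST),
]
DIRECT = {(a, b) for a, b in _PAIRS} | {(b, a) for a, b in _PAIRS}


def can_transition(src, dst):
    """True if src -> dst is a single direct transition."""
    return (src, dst) in DIRECT


def shortest_path(src, dst):
    """Breadth-first search for the shortest chain of direct transitions."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] is dst:
            return path
        for nxt in PageState:
            if (path[-1], nxt) in DIRECT and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With this table, the (3) -> (0) ballooning case comes out as the
(3) -> (1) -> (0) sequence described above, and (1) <-> (2) or (2) <-> (3)
are only reachable by going through an exclusively owned state first.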

<snip>
I agree on all of these and, yes, (3) is the problem for us. We've also
been thinking a bit about CoW recently and I suspect the use of
vm_normal_page() in do_wp_page() could lead to issues similar to those
we hit with GUP. There are various ways to approach that, but I'm not
sure what's best.

Would COW be required or is that just the nasty side-effect of trying to use
anonymous memory?

That'd qualify as an undesirable side effect I think.

Makes sense!



I'm curious, may there be a requirement in the future that shared memory
could be mapped into other processes? (thinking vhost-user and such things).

It's not impossible. We use crosvm as our VMM, and that has a
multi-process sandbox mode which I think relies on just that...


Okay, so basing the design on anonymous memory might not be the best choice
... :/

So, while we're at this stage, let me throw another idea at the wall to
see if it sticks :-)

One observation is that a standard memfd would work relatively well for
pKVM if we had a way to enforce that all mappings to it are MAP_SHARED.

It should be fairly easy to enforce, I wouldn't worry too much about that.

KVM would still need to take an 'exclusive GUP' from the fault path
(which may fail in case of a pre-existing GUP, but that's fine), but
then CoW and friends largely become a non-issue by construction I think.
Is there any way we could enforce that cleanly? Perhaps introducing a
sort of 'mmap notifier' would do the trick? By that I mean something a
bit similar to an MMU notifier offered by memfd that KVM could register
against whenever the memfd is attached to a protected VM memslot.

One of the nice things here is that we could retain a mapping of the
whole of guest memory in userspace, and conversions wouldn't require any
additional effort from userspace. A bad thing is that a process that is
being passed such a memfd may not expect the new semantics and the
inability to map !MAP_SHARED. But I guess a process that receives a

I wouldn't worry about the !MAP_SHARED requirement. vhost-user and friends all *must* map it MAP_SHARED to do anything reasonable, so that's what they do.

handle to private memory must be enlightened regardless of the type of
fd, so maybe it's not so bad.

Thoughts?

The whole reason I brought up the guest_memfd+memfd pair idea is that you would similarly be able to do the conversion in the kernel, BUT, you'd never be able to mmap+GUP encrypted pages.

Essentially you're using guest_memfd for what it was designed for: private memory that is inaccessible.

--
Cheers,

David / dhildenb




