Re: [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support

Hi David,

On Tue, Feb 27, 2024 at 2:41 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> Hi,
>
> >> Can you elaborate (or point to a summary) why pKVM has to be special
> >> here? Why can't you use guest_memfd only for private memory and another
> >> (ordinary) memfd for shared memory, like the other confidential
> >> computing technologies are planning to?
> >
> > Because the same memory location can switch back and forth between
> > being shared and private in-place. The host/vmm doesn't know
> > beforehand which parts of the guest's private memory might be shared
> > with it later, therefore, it cannot use guest_memfd() for the private
> > memory and anonymous memory for the shared memory without resorting to
>
> I don't remember the latest details about the guest_memfd incarnation in
> user space, but I thought we'd be using guest_memfd for private memory
> and an ordinary memfd for shared memory. But maybe it also works with
> anon memory instead of the memfd and that was just an implementation
> detail :)
>
> > copying. Even if it did know beforehand, it wouldn't help much since
> > that memory can change back to being private later on. Other
> > confidential computing proposals like TDX and Arm CCA don't share in
> > place, and need to copy shared data between private and shared memory.
>
> Right.
>
> >
> > If you're interested, there was also a more detailed discussion about
> > this in an earlier guest_memfd() thread [1]
>
> Thanks for the pointer!
>
> >
> >> What's the main reason for that decision and can it be avoided?
> >> (s390x also shares in-place, but doesn't need any special-casing like
> >> guest_memfd provides)
> >
> > In our current implementation of pKVM, we use anonymous memory with a
> > long-term gup, and the host ends up with valid mappings. This isn't
> > just a problem for pKVM, but also for TDX and Gunyah [2, 3]. In TDX,
> > accessing guest private memory can be fatal to the host and the system
> > as a whole since it could result in a machine check. In arm64 it's not
> > necessarily fatal to the system as a whole if a userspace process were
> > to attempt the access. However, a userspace process could trick the
> > host kernel to try to access the protected guest memory, e.g., by
> > having a process A strace a malicious process B which passes protected
> > guest memory as argument to a syscall.
>
> Right.
>
> >
> > What makes pKVM and Gunyah special is that both can easily share
> > memory (and its contents) in place, since it's not encrypted, and
> > convert memory locations between shared/unshared. I'm not familiar
> > with how s390x handles sharing in place, or how it handles memory
> > donated to the guest. I assume it's by donating anonymous memory. I
> > would be also interested to know how it handles and recovers from
> > similar situations, i.e., host (userspace or kernel) trying to access
> > guest protected memory.
>
> I don't know all of the s390x "protected VM" details, but it is pretty
> similar. Take a look at arch/s390/kernel/uv.c if you are interested.
>
> There are "ultravisor" calls that can convert a page
> * from secure (inaccessible by the host) to non-secure (encrypted but
>    accessible by the host)
> * from non-secure to secure
>
> Once the host tries to access a "secure" page -- either from the kernel
> or from user space, the host gets a page fault and calls
> arch_make_page_accessible(). That will encrypt page content such that
> the host can access it (migrate/swapout/whatsoever).
>
> The host has to set aside some memory area for the ultravisor to
> "remember" page state.
>
> So you can swapout/migrate these pages, but the host will only read
> encrypted garbage. In contrast to disallowing access to these pages.
>
> So you don't need any guest_memfd games to protect from that -- and one
> doesn't have to travel back in time to have memory that isn't
> swappable/migratable and only comes in one page size.
>
> [I'm not up to date on which obscure corner-case CCA requirements the
> s390x implementation cannot fulfill -- like replacing pages in page
> tables and such; I suspect pKVM also cannot cover all these corner cases]

Thanks for this. I'll do some more reading on how things work with s390x.

Right, and of course one key difference is that pKVM doesn't encrypt
anything; it relies only on stage-2 protection to protect the guest.
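
To make the contrast concrete, here is a toy Python model. It is not
kernel code, only a sketch of the two behaviours as described in this
thread. All class and method names (S390LikePage, PkvmLikePage,
host_read, share, unshare, HostAccessFault) are made up for
illustration, the XOR is just a placeholder for real encryption, and the
exception stands in for a stage-2 fault.

```python
class HostAccessFault(Exception):
    """Hypothetical stand-in for a stage-2 fault on a host access."""


class S390LikePage:
    """Toy model: a secure page is made accessible (encrypted) on host access."""

    def __init__(self, data: bytes):
        self.data = data
        self.secure = True

    def host_read(self) -> bytes:
        if self.secure:
            # Stand-in for the fault path that calls arch_make_page_accessible().
            # XOR is only a placeholder for the real encryption.
            self.data = bytes(b ^ 0xA5 for b in self.data)
            self.secure = False
        # The host only ever sees the encrypted form of a formerly secure page.
        return self.data


class PkvmLikePage:
    """Toy model: nothing is encrypted; host access is denied while private."""

    def __init__(self, data: bytes):
        self.data = data
        self.shared = False

    def share(self) -> None:
        # In-place conversion: same location, same contents, now host-visible.
        self.shared = True

    def unshare(self) -> None:
        # In-place conversion back to private.
        self.shared = False

    def host_read(self) -> bytes:
        if not self.shared:
            raise HostAccessFault("host touched private guest memory")
        return self.data
```

In this model, reading an S390LikePage always succeeds but yields only
the encrypted bytes, whereas reading a PkvmLikePage fails unless the
guest has shared it, in which case the host sees the plaintext in place.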

>
> Extending guest_memfd (the one that was promised initially to not be
> mmappable) to be mmappable just to avoid some crashes in corner cases is
> the right approach. But I'm pretty sure that has all been discussed
> before, that's why I am asking about some details :)

Thank you very much for your reviews and comments. They've already
been very helpful. I noticed gmap.h in the s390 source, which might
also be something that we could learn from. So please do ask for as
many details as you like.

Cheers,
/fuad

> --
> Cheers,
>
> David / dhildenb
>




