Hi David,

On Mon, Feb 26, 2024 at 9:47 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 22.02.24 17:10, Fuad Tabba wrote:
> > This series adds restricted mmap() support to guest_memfd [1], as
> > well as support guest_memfd on pKVM/arm64.
> >
> > This series is based on Linux 6.8-rc4 + our pKVM core series [2].
> > The KVM core patches apply to Linux 6.8-rc4 (patches 1-6), but
> > the remainder (patches 7-26) require the pKVM core series. A git
> > repo with this series applied can be found here [3]. We have a
> > (WIP) kvmtool port capable of running the code in this series
> > [4]. For a technical deep dive into pKVM, please refer to Quentin
> > Perret's KVM Forum Presentation [5, 6].
> >
> > I've covered some of the issues presented here in my LPC 2023
> > presentation [7].
> >
> > We haven't started using this in Android yet, but we aim to move
> > away from anonymous memory to guest_memfd once we have the
> > necessary support merged upstream. Others (e.g., Gunyah [8]) are
> > also looking into guest_memfd for similar reasons as us.
> >
> > By design, guest_memfd cannot be mapped, read, or written by the
> > host userspace. In pKVM, memory shared between a protected guest
> > and the host is shared in-place, unlike the other confidential
> > computing solutions that guest_memfd was originally envisaged for
> > (e.g, TDX).
>
> Can you elaborate (or point to a summary) why pKVM has to be special
> here? Why can't you use guest_memfd only for private memory and another
> (ordinary) memfd for shared memory, like the other confidential
> computing technologies are planning to?

Because the same memory location can switch back and forth between
being shared and private in-place. The host/vmm doesn't know
beforehand which parts of the guest's private memory might be shared
with it later, therefore, it cannot use guest_memfd() for the private
memory and anonymous memory for the shared memory without resorting
to copying.
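
As a rough illustration of the difference between sharing in place and
sharing by copying, here is a toy Python model. It is not pKVM or
guest_memfd code: GuestPage, State, share_in_place() and share_by_copy()
are names made up for this sketch, and the "host access" check only
stands in for what the hypervisor would enforce.

```python
from enum import Enum


class State(Enum):
    PRIVATE = "private"  # only the guest may access the page
    SHARED = "shared"    # guest and host may both access the page


class GuestPage:
    """Toy stand-in for one page of guest memory (hypothetical)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.state = State.PRIVATE

    def host_read(self) -> bytes:
        # The host may only touch the page while it is shared.
        if self.state is not State.SHARED:
            raise PermissionError("host access to private guest memory")
        return bytes(self.data)

    def share_in_place(self) -> None:
        # In-place model: same backing storage, only the state changes.
        self.state = State.SHARED

    def unshare_in_place(self) -> None:
        # The same location becomes private again; nothing is moved.
        self.state = State.PRIVATE


def share_by_copy(private_page: GuestPage) -> GuestPage:
    # Copy model: the private page stays private and its contents are
    # duplicated into a separate page that is always shared.
    bounce = GuestPage(bytes(private_page.data))
    bounce.state = State.SHARED
    return bounce
```

In the in-place model the host cannot tell ahead of time which GuestPage
will flip to SHARED, so a fixed split into "private backing" and "shared
backing" only works with something like share_by_copy().
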
Even if it did know beforehand, it wouldn't help much since that memory
can change back to being private later on. Other confidential
computing proposals like TDX and Arm CCA don't share in place, and
need to copy shared data between private and shared memory.

If you're interested, there was also a more detailed discussion about
this in an earlier guest_memfd() thread [1].

> What's the main reason for that decision and can it be avoided?
> (s390x also shares in-place, but doesn't need any special-casing like
> guest_memfd provides)

In our current implementation of pKVM, we use anonymous memory with a
long-term gup, and the host ends up with valid mappings. This isn't
just a problem for pKVM, but also for TDX and Gunyah [2, 3]. In TDX,
accessing guest private memory can be fatal to the host and the system
as a whole since it could result in a machine check. On arm64, it's not
necessarily fatal to the system as a whole if a userspace process were
to attempt the access. However, a userspace process could trick the
host kernel into trying to access the protected guest memory, e.g., by
having a process A strace a malicious process B which passes protected
guest memory as an argument to a syscall.

What makes pKVM and Gunyah special is that both can easily share
memory (and its contents) in place, since it's not encrypted, and
convert memory locations between shared/unshared. I'm not familiar
with how s390x handles sharing in place, or how it handles memory
donated to the guest. I assume it's by donating anonymous memory. I
would also be interested to know how it handles and recovers from
similar situations, i.e., host (userspace or kernel) trying to access
guest protected memory.

Thank you,
/fuad

[1] https://lore.kernel.org/all/YkcTTY4YjQs5BRhE@xxxxxxxxxx/
[2] https://lore.kernel.org/all/20231105163040.14904-1-pbonzini@xxxxxxxxxx/
[3] https://lore.kernel.org/all/20240222-gunyah-v17-0-1e9da6763d38@xxxxxxxxxxx/

>
> --
> Cheers,
>
> David / dhildenb
>