On Sat, Sep 05, 2020 at 08:27:29PM +0200, Paolo Bonzini wrote:
> On 05/09/20 01:17, Andy Lutomirski wrote:
> > There's sev_pin_memory(), so QEMU must have at least some idea of
> > which memory could potentially be encrypted. Is it in fact the case
> > that QEMU doesn't know that some SEV pinned memory might actually be
> > used for DMA until the guest tries to do DMA on that memory? If so,
> > yuck.
>
> Yes. All the memory is pinned, all the memory could potentially be used
> for DMA (of garbage if it's encrypted). And it's the same for pretty
> much all protected VM extensions (SEV, POWER, s390, Intel TDX).
>
> >> The primary VM and the enclave VM(s) would each get a different memory
> >> access file descriptor. QEMU would treat them no differently from any
> >> other externally-provided memory backend, say hugetlbfs or memfd, so
> >> yeah they would be mmap-ed to userspace and the host virtual address
> >> passed as usual to KVM.
> >
> > Would the VM processes mmap() these descriptors, or would KVM learn
> > how to handle that memory without it being mapped?
>
> The idea is that the process mmaps them, QEMU would treat them just the
> same as a hugetlbfs file descriptor for example.
>
> >> The manager can decide at any time to hide some memory from the parent
> >> VM (in order to give it to an enclave). This would actually be done on
> >> request of the parent VM itself [...] But QEMU is
> >> untrusted, so the manager cannot rely on QEMU behaving well. Hence the
> >> privilege separation model that was implemented here.
> >
> > How does this work? Is there a revoke mechanism, or does the parent
> > just munmap() the memory itself?
>
> The parent has ioctls to add and remove memory from the pidfd-mem. So
> unmapping is just calling the ioctl that removes a range.

I would strongly suggest we move away from ioctl() patterns. If
something like this comes up in the future, just propose the operations
as proper system calls (rough sketch in the P.S. below).

>
> >> So what you are suggesting is that KVM manages its own address space
> >> instead of host virtual addresses (and with no relationship to host
> >> virtual addresses, it would be just a "cookie")?
> >
> > [...] For this pidfd-mem scheme in particular, it might avoid the nasty
> > corner case I mentioned. With pidfd-mem as in this patchset, I'm
> > concerned about what happens when process A maps some process B
> > memory, process B maps some of process A's memory, and there's a
> > recursive mapping that results. Or when a process maps its own
> > memory, for that matter.
> >
> > Or memfd could get fancier with operations to split memfds, remove
> > pages from memfds, etc. Maybe that's overkill.
>
> Doing it directly with memfd is certainly an option, especially since
> MFD_HUGE_* exists. Basically you'd have a system call to create a
> [...]

I like that idea way better, to be honest.

Christian
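
P.S.: To illustrate what I mean by proposing the operations as system
calls: a minimal sketch of what the pidfd-mem add/remove ioctls could
look like as first-class syscalls. These prototypes are entirely
hypothetical, nothing like this exists today:

	/*
	 * Hypothetical syscalls, purely illustrative: expose a range of
	 * the task's memory through the pidfd-mem fd, or revoke it
	 * again. Removing a range would unmap it from all importers,
	 * which gives the manager an explicit revoke mechanism.
	 */
	long pidfd_mem_add(int pidfd_mem, unsigned long offset,
			   unsigned long size, unsigned int flags);
	long pidfd_mem_remove(int pidfd_mem, unsigned long offset,
			      unsigned long size, unsigned int flags);

Compared to an ioctl, such an interface would be self-describing, show
up by name in seccomp filters and audit logs, and get reviewed as
proper ABI.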
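
P.P.S.: For reference, hugetlb-backed memfds already work today.
Something along these lines (a minimal userspace sketch; it needs
hugepages reserved on the system) creates one and maps it, just like a
hugetlbfs file, and the fd could then be handed to another process over
a unix socket:

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		/*
		 * Anonymous memory file backed by hugepages. The
		 * MFD_HUGE_* flags (e.g. MFD_HUGE_2MB) can select an
		 * explicit page size.
		 */
		int fd = memfd_create("guest-mem",
				      MFD_CLOEXEC | MFD_HUGETLB);
		if (fd < 0) {
			perror("memfd_create");
			exit(EXIT_FAILURE);
		}

		size_t len = 2UL * 1024 * 1024; /* one 2M hugepage */
		if (ftruncate(fd, len) < 0) {
			perror("ftruncate");
			exit(EXIT_FAILURE);
		}

		/* Map it like any other file-backed shared mapping. */
		char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
		if (mem == MAP_FAILED) {
			perror("mmap");
			exit(EXIT_FAILURE);
		}
		mem[0] = 1;

		munmap(mem, len);
		close(fd);
		return 0;
	}

The new system call Paolo mentions would then, presumably, only need to
layer the "hide from the parent" semantics on top of an object like
this.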