> From: Andy Lutomirski <luto@xxxxxxxxxx>
> > > As it stands, the way that KVM memory mappings are created seems to
> > > be convenient, but it also seems to be resulting in increasing
> > > bizarre userspace mappings. At what point is the right solution to
> > > decouple KVM’s mappings from QEMU’s?

Convenience is one of the drivers of code reuse.

> > So what you are suggesting is that KVM manages its own address space
> > instead of host virtual addresses (and with no relationship to host
> > virtual addresses, it would be just a "cookie")? It would then need a
> > couple ioctls to mmap/munmap (creating and deleting VMAs) into the
> > address space, and those cookies would be passed to
> > KVM_SET_USER_MEMORY_REGION. QEMU would still need access to these
> > VMAs, would it mmap a file descriptor provided by KVM? All in all the
> > implementation seems quite complex, and I don't understand why it
> > would avoid incoherent SEV mappings; what am I missing?
>
> It might not avoid incoherent SEV mappings in particular, but it would
> certainly enable other, somewhat related usecases. For example, QEMU
> could have KVM map a memfd without itself mapping that memfd, which
> would reduce the extent to which the memory would be exposed to an
> attacker who can read QEMU memory.

Isn't this security through obscurity?

> For this pidfd-mem scheme in particular, it might avoid the nasty corner case
> I mentioned. With pidfd-mem as in this patchset, I'm concerned about what
> happens when process A maps some process B memory, process B maps
> some of process A's memory, and there's a recursive mapping that results.
> Or when a process maps its own memory, for that matter. If KVM could map
> fd's directly, then there could be a parallel mechanism for KVM to import
> portions of more than one process's address space, and this particular
> problem would be avoided.
> So a process would create pidfd-mem-like
> object and pass that to KVM (via an intermediary process or directly) and
> KVM could map that, but no normal process would be allowed to map it. This
> avoids the recursion problems.
>
> Or memfd could get fancier with operations to split memfds, remove pages
> from memfds, etc. Maybe that's overkill.
>
> (Or a fancy recursion detector could be built, but this has been a pain point in
> AF_UNIX, epoll, etc in the past. It may be solvable, but it won't be pretty.)
>
> I admit that allowing KVM to map fd's directly without some specific
> vm_operations support for this could be challenging, but ISTM kvm could
> plausibly own an mm_struct and pagetables at the cost of some wasted
> memory. The result would, under the hood, work more or less like the
> current implementation, but the API would be quite different.

This looks like an attempt to pass memory-related concerns to KVM
developers. The userspace mapping mechanism is good as it is. Probably
not perfect, just good. The problem is that it's stuck with a few VMA
models and needs to evolve towards more bizarre/sketchy/weird/fragile
patterns.

Also, the memory code is some of the most tightly coupled code I have
seen. That probably explains the maintainers' fear of trying something
new.

Mircea