On Fri, Apr 21, 2023 at 06:33:26PM -0700, Sean Christopherson wrote:
>
> Code is available here if folks want to take a look before any kind of formal
> posting:
>
>   https://github.com/sean-jc/linux.git x86/kvm_gmem_solo

Hi Sean,

I've been working on getting the SNP patches ported to this, but I'm having
some trouble working out a reasonable scheme for how to work the RMPUPDATE
hooks into the proposed design.

One of the main things is kvm_gmem_punch_hole(): this can free pages back to
the host whenever userspace feels like it. Pages that are still marked private
in the RMP table will blow up the host if they aren't returned to the normal
state before handing them back to the kernel. So I'm trying to add a hook,
orchestrated by kvm_arch_gmem_invalidate(), to handle that, e.g.:

  static long kvm_gmem_punch_hole(struct file *file, int mode, loff_t offset,
                                  loff_t len)
  {
          struct kvm_gmem *gmem = file->private_data;
          pgoff_t start = offset >> PAGE_SHIFT;
          pgoff_t end = (offset + len) >> PAGE_SHIFT;
          struct kvm *kvm = gmem->kvm;

          /*
           * Bindings must be stable across invalidation to ensure the
           * start+end are balanced.
           */
          filemap_invalidate_lock(file->f_mapping);
          kvm_gmem_invalidate_begin(kvm, gmem, start, end);

          /* Handle arch-specific cleanups before releasing pages */
          kvm_arch_gmem_invalidate(kvm, gmem, start, end);

          truncate_inode_pages_range(file->f_mapping, offset, offset + len);

          kvm_gmem_invalidate_end(kvm, gmem, start, end);
          filemap_invalidate_unlock(file->f_mapping);

          return 0;
  }

But there's another hook, kvm_arch_gmem_set_mem_attributes(), needed to put
the page in its intended state in the RMP table prior to mapping it into the
guest's NPT. Currently I'm calling that hook via
kvm_vm_ioctl_set_mem_attributes(), just after kvm->mem_attr_array is updated
based on the ioctl. The reasoning there is that the KVM MMU can then rely on
the existing mmu_invalidate_seq logic to ensure both the state in the
mem_attr_array and the RMP table are in sync and up-to-date once the MMU lock
is acquired and the MMU is ready to map the page, or to retry the #NPF
otherwise.

But if I implement things as above, a hole-punch racing with
kvm_vm_ioctl_set_mem_attributes() can potentially result in something like
the following sequence:

  CPU0: kvm_gmem_punch_hole():
          kvm_gmem_invalidate_begin()
          kvm_arch_gmem_invalidate()          // set pages to default/shared
                                              // state in RMP table before
                                              // free'ing
  CPU1: kvm_vm_ioctl_set_mem_attributes():
          kvm_arch_gmem_set_mem_attributes()  // maliciously set pages to
                                              // private in RMP table
  CPU0: truncate_inode_pages_range()          // HOST BLOWS UP TOUCHING
                                              // PRIVATE PAGES
        kvm_gmem_invalidate_end()

One quick and lazy solution is to rely on the fact that
kvm_vm_ioctl_set_mem_attributes() holds the kvm->slots_lock throughout the
entire begin()/end() portion of the invalidation sequence, and to similarly
hold the kvm->slots_lock throughout the begin()/end() sequence in
kvm_gmem_punch_hole() to prevent any interleaving.

But I'd imagine overloading kvm->slots_lock is not the proper approach here.
Would introducing a similar mutex to keep these operations grouped/atomic
(rough sketch below) be a reasonable approach to you, or should we be doing
something else entirely?

Keep in mind that RMP updates can't be done while holding the kvm->mmu_lock
spinlock, because we also need to unmap pages from the directmap, which can
lead to scheduling-while-atomic BUG()s[1], so that's another constraint we
need to work around.
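To make the mutex idea concrete, here's a rough sketch of what I have in
mind. The names (gmem_attr_lock) and the kvm_arch_gmem_set_mem_attributes()
arguments are made up for illustration, and the context lines are
abbreviated:

  struct kvm {
          ...
          /* Illustrative: serializes RMP-affecting operations */
          struct mutex gmem_attr_lock;
  };

  static long kvm_gmem_punch_hole(struct file *file, int mode, loff_t offset,
                                  loff_t len)
  {
          ...
          filemap_invalidate_lock(file->f_mapping);

          /*
           * Hold the mutex across the entire begin()/end() window so that
           * kvm_arch_gmem_set_mem_attributes() can't flip pages back to
           * private in the RMP table before truncation frees them.
           */
          mutex_lock(&kvm->gmem_attr_lock);

          kvm_gmem_invalidate_begin(kvm, gmem, start, end);
          kvm_arch_gmem_invalidate(kvm, gmem, start, end);
          truncate_inode_pages_range(file->f_mapping, offset, offset + len);
          kvm_gmem_invalidate_end(kvm, gmem, start, end);

          mutex_unlock(&kvm->gmem_attr_lock);
          filemap_invalidate_unlock(file->f_mapping);

          return 0;
  }

and correspondingly in kvm_vm_ioctl_set_mem_attributes():

          mutex_lock(&kvm->gmem_attr_lock);
          /* ...update kvm->mem_attr_array for the range... */
          kvm_arch_gmem_set_mem_attributes(kvm, start, end, attrs);
          mutex_unlock(&kvm->gmem_attr_lock);

Since it's a sleepable mutex taken outside of mmu_lock, it shouldn't run
afoul of the directmap-unmapping constraint above, though it does nest
inside filemap_invalidate_lock() on the hole-punch side.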
Thanks!

-Mike

[1] https://lore.kernel.org/linux-coco/20221214194056.161492-7-michael.roth@xxxxxxx/T/#m45a1af063aa5ac0b9314d6a7d46eecb1253bba7a