From: Sean Christopherson <seanjc@xxxxxxxxxx> Explicitly disallow creating more memslot pages than can fit in an unsigned long, KVM doesn't correctly handle a total number of memslot pages that doesn't fit in an unsigned long and remedying that would be a waste of time. For a 64-bit kernel, this is a nop as memslots are not allowed to overlap in the gfn address space. With a 32-bit kernel, userspace can at most address 3gb of virtual memory, whereas wrapping the total number of pages would require 4tb+ of guest physical memory. Even with x86's second address space for SMM, userspace would need to alias all of guest memory more than one _thousand_ times. And on older x86 hardware with MAXPHYADDR < 43, the guest couldn't actually access any of those aliases even if userspace lied about guest.MAXPHYADDR. On 390 and arm64, this is a nop as they don't support 32-bit hosts. On x86, practically speaking this is simply acknowledging reality as the existing kvm_mmu_calculate_default_mmu_pages() assumes the total number of pages fits in an "unsigned long". On PPC, this is likely a nop as every flavor of PPC KVM assumes gfns (and gpas!) fit in unsigned long. arch/powerpc/kvm/book3s_32_mmu_host.c goes a step further and fails the build if CONFIG_PTE_64BIT=y, which presumably means that it does't support 64-bit physical addresses. On MIPS, this is also likely a nop as the core MMU helpers assume gpas fit in unsigned long, e.g. see kvm_mips_##name##_pte. And finally, RISC-V is a "don't care" as it doesn't exist in any release, i.e. there is no established ABI to break. Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx> --- include/linux/kvm_host.h | 1 + virt/kvm/kvm_main.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a745efe389ab..a6830966f8eb 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -554,6 +554,7 @@ struct kvm { */ struct mutex slots_arch_lock; struct mm_struct *mm; /* userspace tied to this vm */ + unsigned long nr_memslot_pages; struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM]; struct xarray vcpu_array; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 863112783ed9..4a1b484518a9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1640,6 +1640,15 @@ static int kvm_set_memslot(struct kvm *kvm, update_memslots(slots, new, change); slots = install_new_memslots(kvm, as_id, slots); + /* + * Update the total number of memslot pages before calling the arch + * hook so that architectures can consume the result directly. + */ + if (change == KVM_MR_DELETE) + kvm->nr_memslot_pages -= old.npages; + else if (change == KVM_MR_CREATE) + kvm->nr_memslot_pages += new->npages; + kvm_arch_commit_memory_region(kvm, mem, &old, new, change); /* Free the old memslot's metadata. Note, this is the full copy!!! */ @@ -1670,6 +1679,9 @@ static int kvm_delete_memslot(struct kvm *kvm, if (!old->npages) return -EINVAL; + if (WARN_ON_ONCE(kvm->nr_memslot_pages < old->npages)) + return -EIO; + memset(&new, 0, sizeof(new)); new.id = old->id; /* @@ -1753,6 +1765,13 @@ int __kvm_set_memory_region(struct kvm *kvm, if (!old.npages) { change = KVM_MR_CREATE; new.dirty_bitmap = NULL; + + /* + * To simplify KVM internals, the total number of pages across + * all memslots must fit in an unsigned long. + */ + if ((kvm->nr_memslot_pages + new.npages) < kvm->nr_memslot_pages) + return -EINVAL; } else { /* Modify an existing slot. */ if ((new.userspace_addr != old.userspace_addr) || (new.npages != old.npages) ||