Hi,

On 2/5/20 5:01 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:57 +0000
> Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>
> Hi,
>
>> When we want to map a device region into the guest address space, first
>> we perform an mmap on the device fd. The resulting VMA is a mapping
>> between host userspace addresses and physical addresses associated with
>> the device. Next, we create a memslot, which populates the stage 2 table
>> with the mappings between guest physical addresses and the device
>> physical addresses.
>>
>> However, when we want to unmap the device from the guest address space,
>> we only call munmap, which destroys the VMA and the stage 2 mappings,
>> but doesn't destroy the memslot and kvmtool's internal mem_bank
>> structure associated with the memslot.
>>
>> This has been perfectly fine so far, because we only unmap a device
>> region when we exit kvmtool. This will change when we add support for
>> reassignable BARs, and we will have to unmap vfio regions as the guest
>> kernel writes new addresses in the BARs. This can lead to two possible
>> problems:
>>
>> - We refuse to create a valid BAR mapping because of a stale mem_bank
>>   structure which belonged to a previously unmapped region.
>>
>> - It is possible that the mmap in vfio_map_region returns the same
>>   address that was used to create a memslot, but was unmapped by
>>   vfio_unmap_region. Guest accesses to the device memory will fault
>>   because the stage 2 mappings are missing, and this can lead to
>>   performance degradation.
>>
>> Let's do the right thing and destroy the memslot and the mem_bank struct
>> associated with it when we unmap a vfio region. Set host_addr to NULL
>> after the munmap call so we won't try to unmap an address which is
>> currently used if vfio_unmap_region gets called twice.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>
>> ---
>>  include/kvm/kvm.h |  2 ++
>>  kvm.c             | 65 ++++++++++++++++++++++++++++++++++++++++++++---
>>  vfio/core.c       |  6 +++++
>>  3 files changed, 69 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
>> index 50119a8672eb..c7e57b890cdd 100644
>> --- a/include/kvm/kvm.h
>> +++ b/include/kvm/kvm.h
>> @@ -56,6 +56,7 @@ struct kvm_mem_bank {
>>  	void *host_addr;
>>  	u64 size;
>>  	enum kvm_mem_type type;
>> +	u32 slot;
>>  };
>>
>>  struct kvm {
>> @@ -106,6 +107,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
>> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>>  		      enum kvm_mem_type type);
>>  static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
>> diff --git a/kvm.c b/kvm.c
>> index 57c4ff98ec4c..afcf55c7bf45 100644
>> --- a/kvm.c
>> +++ b/kvm.c
>> @@ -183,20 +183,75 @@ int kvm__exit(struct kvm *kvm)
>>  }
>>  core_exit(kvm__exit);
>>
>> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>> +		     void *userspace_addr)
>> +{
>> +	struct kvm_userspace_memory_region mem;
>> +	struct kvm_mem_bank *bank;
>> +	int ret;
>> +
>> +	list_for_each_entry(bank, &kvm->mem_banks, list)
>> +		if (bank->guest_phys_addr == guest_phys &&
>> +		    bank->size == size && bank->host_addr == userspace_addr)
>> +			break;
>
> Shouldn't we protect the list with some lock?
> I am actually not sure we have this problem already, but at least now
> a guest could reassign BARs concurrently on different VCPUs, in which
> case multiple kvm__destroy_mem() and kvm__register_dev_mem() calls
> might race against each other.
> I think so far we got away with it because of the currently static
> nature of the memslot assignment.

And the fact that I haven't tested PCI passthrough with more than one device :)
I'll protect changes to the memory banks with a lock.

>
>> +
>> +	if (&bank->list == &kvm->mem_banks) {
>> +		pr_err("Region [%llx-%llx] not found", guest_phys,
>> +		       guest_phys + size - 1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (bank->type == KVM_MEM_TYPE_RESERVED) {
>> +		pr_err("Cannot delete reserved region [%llx-%llx]",
>> +		       guest_phys, guest_phys + size - 1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mem = (struct kvm_userspace_memory_region) {
>> +		.slot = bank->slot,
>> +		.guest_phys_addr = guest_phys,
>> +		.memory_size = 0,
>> +		.userspace_addr = (unsigned long)userspace_addr,
>> +	};
>> +
>> +	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
>> +	if (ret < 0)
>> +		return -errno;
>> +
>> +	list_del(&bank->list);
>> +	free(bank);
>> +	kvm->mem_slots--;
>> +
>> +	return 0;
>> +}
>> +
>>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>>  		      void *userspace_addr, enum kvm_mem_type type)
>>  {
>>  	struct kvm_userspace_memory_region mem;
>>  	struct kvm_mem_bank *merged = NULL;
>>  	struct kvm_mem_bank *bank;
>> +	struct list_head *prev_entry;
>> +	u32 slot;
>>  	int ret;
>>
>> -	/* Check for overlap */
>> +	/* Check for overlap and find first empty slot.
>> +	 */
>> +	slot = 0;
>> +	prev_entry = &kvm->mem_banks;
>>  	list_for_each_entry(bank, &kvm->mem_banks, list) {
>>  		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
>>  		u64 end = guest_phys + size - 1;
>>
>> -		if (guest_phys > bank_end || end < bank->guest_phys_addr)
>> +		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
>> +			/*
>> +			 * Keep the banks sorted ascending by slot, so it's
>> +			 * easier for us to find a free slot.
>> +			 */
>> +			if (bank->slot == slot) {
>> +				slot++;
>> +				prev_entry = &bank->list;
>> +			}
>>  			continue;
>> +		}
>>
>>  		/* Merge overlapping reserved regions */
>>  		if (bank->type == KVM_MEM_TYPE_RESERVED &&
>> @@ -241,10 +296,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>>  	bank->host_addr = userspace_addr;
>>  	bank->size = size;
>>  	bank->type = type;
>> +	bank->slot = slot;
>>
>>  	if (type != KVM_MEM_TYPE_RESERVED) {
>>  		mem = (struct kvm_userspace_memory_region) {
>> -			.slot = kvm->mem_slots++,
>> +			.slot = slot,
>>  			.guest_phys_addr = guest_phys,
>>  			.memory_size = size,
>>  			.userspace_addr = (unsigned long)userspace_addr,
>> @@ -255,7 +311,8 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>>  			return -errno;
>>  	}
>>
>> -	list_add(&bank->list, &kvm->mem_banks);
>> +	list_add(&bank->list, prev_entry);
>> +	kvm->mem_slots++;
>>
>>  	return 0;
>>  }
>> diff --git a/vfio/core.c b/vfio/core.c
>> index 0ed1e6fee6bf..73fdac8be675 100644
>> --- a/vfio/core.c
>> +++ b/vfio/core.c
>> @@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>>
>>  void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>>  {
>> +	u64 map_size;
>> +
>>  	if (region->host_addr) {
>> +		map_size = ALIGN(region->info.size, PAGE_SIZE);
>>  		munmap(region->host_addr, region->info.size);
>> +		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
>> +				 region->host_addr);
>
> Shouldn't we destroy the memslot first, then unmap?
> Because in the current version we are giving a no longer valid userland
> address to the ioctl. I actually wonder how that passes the access_ok()
> check in the kernel's KVM_SET_USER_MEMORY_REGION handler.

Yes, you're right. From Documentation/virt/kvm/api.txt, section 4.35
KVM_SET_USER_MEMORY_REGION:

"[..] Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for the
entire memory slot size."

I'll put the munmap after the ioctl.

Thanks,
Alex

>
> Cheers,
> Andre
>
>> +		region->host_addr = NULL;
>>  	} else if (region->is_ioport) {
>>  		ioport__unregister(kvm, region->port_base);
>>  	} else {