On 10.02.20 13:26, David Hildenbrand wrote: > On 07.02.20 12:39, Christian Borntraeger wrote: >> From: Ulrich Weigand <Ulrich.Weigand@xxxxxxxxxx> >> >> The adapter interrupt page containing the indicator bits is currently >> pinned. That means that a guest with many devices can pin a lot of >> memory pages in the host. This also complicates the reference tracking >> which is needed for memory management handling of protected virtual >> machines. >> We can reuse the pte notifiers to "cache" the page without pinning it. >> >> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@xxxxxxxxxx> >> Suggested-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> >> [borntraeger@xxxxxxxxxx: patch merging, splitting, fixing] >> Signed-off-by: Christian Borntraeger <borntraeger@xxxxxxxxxx> >> --- > > So, instead of pinning explicitly, look up the page address, cache it, > and glue its lifetime to the gmap table entry. When that entry is > changed, invalidate the cached page. On re-access, look up the page > again and register the gmap notifier for the table entry again. I think I might want to split this into two parts. part 1: a naive approach that always does get_user_pages_remote/put_page part 2: do the complex caching Ulrich mentioned that this actually could make the map/unmap a no-op as we have the address and bit already in the irq route. In the end this might be as fast as todays pinning as we replace a list walk with a page table walk. Plus it would simplify the code. Will have a look if that is the case. > > [...] > >> #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8) >> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c >> index c06c89d370a7..4bfb2f8fe57c 100644 >> --- a/arch/s390/kvm/interrupt.c >> +++ b/arch/s390/kvm/interrupt.c >> @@ -28,6 +28,7 @@ >> #include <asm/switch_to.h> >> #include <asm/nmi.h> >> #include <asm/airq.h> >> +#include <linux/pagemap.h> >> #include "kvm-s390.h" >> #include "gaccess.h" >> #include "trace-s390.h" >> @@ -2328,8 +2329,8 @@ static int register_io_adapter(struct kvm_device *dev, >> return -ENOMEM; >> >> INIT_LIST_HEAD(&adapter->maps); >> - init_rwsem(&adapter->maps_lock); >> - atomic_set(&adapter->nr_maps, 0); >> + spin_lock_init(&adapter->maps_lock); >> + adapter->nr_maps = 0; >> adapter->id = adapter_info.id; >> adapter->isc = adapter_info.isc; >> adapter->maskable = adapter_info.maskable; >> @@ -2375,19 +2376,15 @@ static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr) >> ret = -EFAULT; >> goto out; >> } >> - ret = get_user_pages_fast(map->addr, 1, FOLL_WRITE, &map->page); >> - if (ret < 0) >> - goto out; >> - BUG_ON(ret != 1); >> - down_write(&adapter->maps_lock); >> - if (atomic_inc_return(&adapter->nr_maps) < MAX_S390_ADAPTER_MAPS) { >> + spin_lock(&adapter->maps_lock); >> + if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) { >> + adapter->nr_maps++; >> list_add_tail(&map->list, &adapter->maps); > > I do wonder if we should check for duplicates. The unmap path will only > remove exactly one entry. But maybe this can never happen or is already > handled on a a higher layer. This would be a broken userspace, but I also do not see a what would break in the host if this happens. > >> } >> @@ -2430,7 +2426,6 @@ void kvm_s390_destroy_adapters(struct kvm *kvm) >> list_for_each_entry_safe(map, tmp, >> &kvm->arch.adapters[i]->maps, list) { >> list_del(&map->list); >> - put_page(map->page); >> kfree(map); >> } >> kfree(kvm->arch.adapters[i]); > > Between the gmap being removed in kvm_arch_vcpu_destroy() and > kvm_s390_destroy_adapters(), the entries would no longer properly get > invalidated. AFAIK, removing/freeing the gmap will not trigger any > notifiers. > > Not sure if that's an issue (IOW, if we can have some very weird race). > But I guess we would have similar races already :) This is only called when all file descriptors are closed and this also closes all irq routes. So I guess no I/O should be going on any more. > >> @@ -2690,6 +2685,31 @@ struct kvm_device_ops kvm_flic_ops = { >> .destroy = flic_destroy, >> }; >> >> +void kvm_s390_adapter_gmap_notifier(struct gmap *gmap, unsigned long start, >> + unsigned long end) >> +{ >> + struct kvm *kvm = gmap->private; >> + struct s390_map_info *map, *tmp; >> + int i; >> + >> + for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) { >> + struct s390_io_adapter *adapter = kvm->arch.adapters[i]; >> + >> + if (!adapter) >> + continue; > > I have to ask very dumb: How is kvm->arch.adapters[] protected? We only add new ones and this is removed at guest teardown it seems. [...] Let me have a look if we can simplify this.