This series improves the scalability of userfaultfd-based postcopy live
migration. It implements the no-slow-gup approach which James Houghton
described in his earlier RFC ([1]). A new capability,
KVM_CAP_MEM_FAULT_NOWAIT, is introduced, which causes KVM to exit to
userspace if fast get_user_pages (GUP) fails while resolving a page
fault. The motivation is to allow (most) EPT violations to be resolved
without going through userfaultfd, which involves serializing faults on
internal locks: see [1] for more details.

After receiving the new exit, userspace can check whether it has
previously UFFDIO_COPY/CONTINUEd the faulting address. If not, then it
knows that fast GUP could not possibly have succeeded, and so the fault
has to be resolved via UFFDIO_COPY/CONTINUE. In these cases a
UFFDIO_WAKE is unnecessary, as the vCPU thread hasn't been put to sleep
waiting on the uffd.

If userspace *has* already COPY/CONTINUEd the address, then it must take
some other action to make fast GUP succeed, such as swapping in the page
(for instance, via MADV_POPULATE_WRITE for writable mappings).

This feature should only be enabled during userfaultfd postcopy, as it
prevents the generation of async page faults.

The kernel changes needed to implement this on arm64/x86 are small: most
of this series just adds support for the new feature to the demand
paging self test. Performance samples (rates reported in thousands of
pages/s, average of five runs each), generated using [2] on an x86
machine with 256 cores, are shown below.

vCPUs   Paging Rate (w/o new cap)   Paging Rate (w/ new cap)
1       150                         340
2       191                         477
4       210                         809
8       155                         1239
16      130                         1595
32      108                         2299
64      86                          3482
128     62                          4134
256     36                          4012

[1] https://lore.kernel.org/linux-mm/CADrL8HVDB3u2EOhXHCrAgJNLwHkj2Lka1B_kkNb0dNwiWiAN_Q@xxxxxxxxxxxxxx/
[2] ./demand_paging_test -b 64M -u MINOR -s shmem -a -v <n> -r <n> [-w]

A quick rundown of the new flags (also detailed in later commits):

  -a registers all of guest memory to a single uffd.
  -r specifies the number of reader threads for polling the uffd.
  -w is what actually enables memory fault exits.

All data was collected after applying the entire series.

This series is based on the latest kvm/next (7cb79f433e75).

Anish Moorthy (8):
  selftests/kvm: Fix bug in how demand_paging_test calculates paging rate
  selftests/kvm: Allow many vcpus per UFFD in demand paging test
  selftests/kvm: Switch demand paging uffd readers to epoll
  kvm: Allow hva_pfn_fast to resolve read-only faults.
  kvm: Add cap/kvm_run field for memory fault exits
  kvm/x86: Add mem fault exit on EPT violations
  kvm/arm64: Implement KVM_CAP_MEM_FAULT_NOWAIT for arm64
  selftests/kvm: Handle mem fault exits in demand paging test

 Documentation/virt/kvm/api.rst               |  42 ++++
 arch/arm64/kvm/arm.c                         |   1 +
 arch/arm64/kvm/mmu.c                         |  14 +-
 arch/x86/kvm/mmu/mmu.c                       |  23 +-
 arch/x86/kvm/x86.c                           |   1 +
 include/linux/kvm_host.h                     |  13 +
 include/uapi/linux/kvm.h                     |  13 +-
 tools/include/uapi/linux/kvm.h               |   7 +
 .../selftests/kvm/aarch64/page_fault_test.c  |   4 +-
 .../selftests/kvm/demand_paging_test.c       | 237 ++++++++++++++----
 .../selftests/kvm/include/userfaultfd_util.h |  18 +-
 .../selftests/kvm/lib/userfaultfd_util.c     | 160 +++++++-----
 virt/kvm/kvm_main.c                          |  48 +++-
 13 files changed, 442 insertions(+), 139 deletions(-)

-- 
2.39.1.581.gbfd45094c4-goog
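
For illustration, the userspace decision described in the cover letter
(resolve through the uffd on the first fault for a page, otherwise make
fast GUP succeed) could be sketched roughly as below. This is a minimal
model and not code from the series: struct populated_map,
handle_mem_fault(), enum fault_action, SKETCH_PAGE_SHIFT and
SKETCH_NR_PAGES are all hypothetical names, the 4K page size and guest
size are assumptions, and the real ioctl/madvise calls are only named in
comments. It is also single-threaded; a VMM with many vCPU threads would
presumably need an atomic test-and-set here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4K pages */
#define SKETCH_NR_PAGES   1024 /* assumption: size of the guest region */

/* What userspace should do after a memory fault exit (hypothetical enum). */
enum fault_action {
	/* First fault on this page: UFFDIO_COPY/CONTINUE, no UFFDIO_WAKE. */
	RESOLVE_WITH_UFFD,
	/* Already populated: e.g. madvise(MADV_POPULATE_WRITE) and retry. */
	MAKE_FAST_GUP_SUCCEED,
};

/* One bit per guest page: set once the page was COPY/CONTINUEd. */
struct populated_map {
	uint64_t bits[SKETCH_NR_PAGES / 64];
};

/*
 * Decide how to handle a fault at the given byte offset into the region,
 * and record that the page is (about to be) populated.
 */
static enum fault_action handle_mem_fault(struct populated_map *map,
					   uint64_t offset)
{
	size_t page = (size_t)(offset >> SKETCH_PAGE_SHIFT);
	uint64_t mask;
	bool was_populated;

	assert(page < SKETCH_NR_PAGES);

	mask = UINT64_C(1) << (page % 64);
	was_populated = (map->bits[page / 64] & mask) != 0;
	map->bits[page / 64] |= mask;

	return was_populated ? MAKE_FAST_GUP_SUCCEED : RESOLVE_WITH_UFFD;
}
```

With a zero-initialized map, the first call for a given page returns
RESOLVE_WITH_UFFD, and any later call for an address within the same page
returns MAKE_FAST_GUP_SUCCEED.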