When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd based demand paging implementation to demonstrate that
contention. The test serves both as a functional test of userfaultfd and
as a microbenchmark of demand paging performance with a variable number
of vCPUs and amount of memory per vCPU.

The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. Each of the N userfaultfd polling threads is set
up to serve faults on the region of memory corresponding to one of the
vCPUs. Each vCPU is then started and touches each page of its disjoint
memory region sequentially. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention: most of the contention between the
userfaultfd threads has been removed, and no time is spent fetching the
contents of guest memory. (A minimal sketch of the fault handling loop
appears below the changelog.)

This test was run successfully on Intel Haswell, Broadwell, and Cascade
Lake hosts with a variety of vCPU counts and memory sizes. The test was
adapted from the dirty_log_test.

The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov@xxxxxxxxxx> for setting up the Gerrit
instance)

v4 (Responding to feedback from Andrew Jones, Peter Xu, and Peter Shier):
- Tested this revision by running demand_paging_test at each commit in
  the series on an Intel Haswell machine. Also ran
  demand_paging_test -u -v 8 -b 8M -d 10 on the same machine at the
  last commit in the series.
- Re-added partial aarch64 support, though aarch64 and s390 remain
  untested
- Implemented pipefd polling to reduce UFFD thread exit latency (see
  the fault handling sketch below)
- Added variable unit input for memory size so users can pass command
  line arguments of the form -b 24M instead of the raw number of bytes
  (see the parsing sketch below)
- Moved a missing break from a patch later in the series to an earlier
  one
- Moved to syncing per-vCPU global variables to the guest and looking
  up per-vCPU arguments based on a single CPU ID passed to each guest
  vCPU. This allows future patches to pass more than the architecturally
  supported number of arguments to each vCPU. (See the sketch below.)
- Implemented vcpu_args_set for s390 and aarch64 [UNTESTED]
- Changed vm_create to always allocate memslot 0 at 4G instead of only
  when the number of pages required is large
- Renamed vcpu_wss to vcpu_memory_size for clarity
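For illustration, here is a minimal sketch of the per-vCPU fault
handling loop described above, including the pipefd based exit signal
added in this revision. This is not the code from the series; the
function name handle_uffd_page_requests(), the exit protocol, and the
error handling are simplified assumptions:

#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Serve faults on one vCPU's region until the main thread writes to
 * pipefd. Illustrative only; names and error handling are simplified.
 */
static int handle_uffd_page_requests(int uffd, int pipefd,
                                     void *copy_buf, uint64_t page_size)
{
        struct pollfd pollfd[2] = {
                { .fd = uffd,   .events = POLLIN },
                { .fd = pipefd, .events = POLLIN },     /* exit signal */
        };

        for (;;) {
                struct uffdio_copy copy;
                struct uffd_msg msg;
                ssize_t r;

                if (poll(pollfd, 2, -1) < 0)
                        return -1;
                if (pollfd[1].revents & POLLIN)
                        return 0;       /* main thread signaled exit */

                r = read(uffd, &msg, sizeof(msg));
                if (r != sizeof(msg) || msg.event != UFFD_EVENT_PAGEFAULT)
                        continue;

                /* Resolve the fault by copying in a static buffer. */
                copy.src = (uint64_t)(unsigned long)copy_buf;
                copy.dst = msg.arg.pagefault.address & ~(page_size - 1);
                copy.len = page_size;
                copy.mode = 0;
                ioctl(uffd, UFFDIO_COPY, &copy);
        }
}

The idea behind the pipe is that, once a vCPU finishes touching its
region, the main thread can write a byte to the corresponding pipe and
join the handler thread promptly instead of waiting on a poll timeout.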
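The variable unit memory size input could be parsed roughly as below.
The helper name parse_size() and the exact suffix handling are
assumptions for illustration, not necessarily what the series
implements:

#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Accept sizes like "24M" or "8G"; a bare number is taken as bytes. */
static uint64_t parse_size(const char *arg)
{
        char *end;
        uint64_t size = strtoull(arg, &end, 0);

        switch (tolower(*end)) {
        case 'g':
                size <<= 10;
                /* fall through */
        case 'm':
                size <<= 10;
                /* fall through */
        case 'k':
                size <<= 10;
                break;
        default:
                break;
        }
        return size;
}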
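And a rough sketch of the per-vCPU argument scheme from the changelog:
a global array is synced into guest memory before the vCPUs start, and
each vCPU is handed only its ID. The names (vcpu_args, guest_code,
MAX_VCPUS) and the 4K guest page size are illustrative assumptions:

#include <stdint.h>

#define MAX_VCPUS       512
#define GUEST_PAGE_SIZE 4096

/* Synced from host to guest before the vCPUs start. */
struct vcpu_args {
        uint64_t gva;   /* base of this vCPU's memory region */
        uint64_t pages; /* number of pages in the region */
};

static struct vcpu_args vcpu_args[MAX_VCPUS];

/* Each vCPU receives a single argument, its ID, and looks up the rest,
 * so no architecture needs to pass more than one native argument.
 */
static void guest_code(uint32_t vcpu_id)
{
        struct vcpu_args *args = &vcpu_args[vcpu_id];
        volatile char *base = (volatile char *)(unsigned long)args->gva;
        uint64_t i;

        /* Touch each page of the disjoint region sequentially. */
        for (i = 0; i < args->pages; i++)
                (void)base[i * GUEST_PAGE_SIZE];
}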
Ben Gardon (10):
  KVM: selftests: Create a demand paging test
  KVM: selftests: Add demand paging content to the demand paging test
  KVM: selftests: Add configurable demand paging delay
  KVM: selftests: Add memory size parameter to the demand paging test
  KVM: selftests: Pass args to vCPU in global vCPU args struct
  KVM: selftests: Add support for vcpu_args_set to aarch64 and s390x
  KVM: selftests: Support multiple vCPUs in demand paging test
  KVM: selftests: Time guest demand paging
  KVM: selftests: Stop memslot creation in KVM internal memslot region
  KVM: selftests: Move memslot 0 above KVM internal memslots

 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   5 +-
 .../selftests/kvm/demand_paging_test.c        | 680 ++++++++++++++++++
 .../testing/selftests/kvm/include/test_util.h |   2 +
 .../selftests/kvm/lib/aarch64/processor.c     |  33 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  27 +-
 .../selftests/kvm/lib/s390x/processor.c       |  35 +
 tools/testing/selftests/kvm/lib/test_util.c   |  61 ++
 8 files changed, 839 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
 create mode 100644 tools/testing/selftests/kvm/lib/test_util.c

--
2.25.0.341.g760bfbb309-goog