On 23/01/20 19:04, Ben Gardon wrote:
> When handling page faults for many vCPUs during demand paging, KVM's MMU
> lock becomes highly contended. This series creates a test with a naive
> userfaultfd based demand paging implementation to demonstrate that
> contention. This test serves both as a functional test of userfaultfd
> and a microbenchmark of demand paging performance with a variable number
> of vCPUs and memory per vCPU.
>
> The test creates N userfaultfd threads, N vCPUs, and a region of memory
> with M pages per vCPU. The N userfaultfd polling threads are each set up
> to serve faults on a region of memory corresponding to one of the vCPUs.
> Each of the vCPUs is then started, and touches each page of its disjoint
> memory region, sequentially. In response to faults, the userfaultfd
> threads copy a static buffer into the guest's memory. This creates a
> worst case for MMU lock contention as we have removed most of the
> contention between the userfaultfd threads and there is no time required
> to fetch the contents of guest memory.
>
> This test was run successfully on Intel Haswell, Broadwell, and
> Cascadelake hosts with a variety of vCPU counts and memory sizes.
>
> This test was adapted from the dirty_log_test.
>
> The series can also be viewed in Gerrit here:
> https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
> (Thanks to Dmitry Vyukov <dvyukov@xxxxxxxxxx> for setting up the Gerrit
> instance)
>
> v4 (Responding to feedback from Andrew Jones, Peter Xu, and Peter Shier):
> - Tested this revision by running
>     demand_paging_test
>   at each commit in the series on an Intel Haswell machine. Ran
>     demand_paging_test -u -v 8 -b 8M -d 10
>   on the same machine at the last commit in the series.
> - Re-added partial aarch64 support, though aarch64 and s390 remain
>   untested
> - Implemented pipefd polling to reduce UFFD thread exit latency
> - Added variable unit input for memory size so users can pass command
>   line arguments of the form -b 24M instead of the raw number of bytes
> - Moved a missing break from a patch later in the series to an earlier
>   one
> - Moved to syncing per-vCPU global variables to guest and looking up
>   per-vCPU arguments based on a single CPU ID passed to each guest
>   vCPU. This allows for future patches to pass more than the supported
>   number of arguments for each arch to the vCPUs.
> - Implemented vcpu_args_set for s390 and aarch64 [UNTESTED]
> - Changed vm_create to always allocate memslot 0 at 4G instead of only
>   when the number of pages required is large.
> - Changed vcpu_wss to vcpu_memory_size for clarity.
>
> Ben Gardon (10):
>   KVM: selftests: Create a demand paging test
>   KVM: selftests: Add demand paging content to the demand paging test
>   KVM: selftests: Add configurable demand paging delay
>   KVM: selftests: Add memory size parameter to the demand paging test
>   KVM: selftests: Pass args to vCPU in global vCPU args struct
>   KVM: selftests: Add support for vcpu_args_set to aarch64 and s390x
>   KVM: selftests: Support multiple vCPUs in demand paging test
>   KVM: selftests: Time guest demand paging
>   KVM: selftests: Stop memslot creation in KVM internal memslot region
>   KVM: selftests: Move memslot 0 above KVM internal memslots
>
>  tools/testing/selftests/kvm/.gitignore        |   1 +
>  tools/testing/selftests/kvm/Makefile          |   5 +-
>  .../selftests/kvm/demand_paging_test.c        | 680 ++++++++++++++++++
>  .../testing/selftests/kvm/include/test_util.h |   2 +
>  .../selftests/kvm/lib/aarch64/processor.c     |  33 +
>  tools/testing/selftests/kvm/lib/kvm_util.c    |  27 +-
>  .../selftests/kvm/lib/s390x/processor.c       |  35 +
>  tools/testing/selftests/kvm/lib/test_util.c   |  61 ++
>  8 files changed, 839 insertions(+), 5 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
>  create mode 100644 tools/testing/selftests/kvm/lib/test_util.c
>

Queued patches 1-9, thanks.

Paolo
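
For readers following the v4 note about unit suffixes on the memory size
(-b 24M rather than a raw byte count), here is a minimal sketch of how such
an argument could be parsed. It is not the code from the series: the name
parse_size_sketch, its return convention, the accepted suffixes, and the
choice of binary multiples (K = 1024) are all assumptions made for
illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, NOT taken from the series: parse a size such as
 * "4096", "8M" or "1G" into a byte count. Suffixes are assumed to be
 * binary multiples (K = 2^10, M = 2^20, G = 2^30).
 * Returns 0 and stores the result in *out on success, -1 on bad input.
 */
static int parse_size_sketch(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long val;
	unsigned int shift = 0;

	/* Require a leading digit so signs and whitespace are rejected. */
	if (s == NULL || !isdigit((unsigned char)*s))
		return -1;

	errno = 0;
	val = strtoull(s, &end, 10);
	if (errno != 0 || end == s)
		return -1;

	/* Optional single-letter unit suffix. */
	switch (*end) {
	case '\0':
		break;
	case 'k': case 'K':
		shift = 10; end++;
		break;
	case 'm': case 'M':
		shift = 20; end++;
		break;
	case 'g': case 'G':
		shift = 30; end++;
		break;
	default:
		return -1;
	}

	/* Reject trailing characters after the suffix, e.g. "8MB". */
	if (*end != '\0')
		return -1;

	/* Reject values that would overflow 64 bits once scaled. */
	if (shift != 0 && val > (UINT64_MAX >> shift))
		return -1;

	*out = (uint64_t)val << shift;
	return 0;
}
```

Under these assumptions, "24M" yields 24 * 1024 * 1024 bytes, a plain
"4096" is taken as bytes, and an unknown suffix such as "12Q" is rejected.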