Re: [RFC PATCH 00/28] kvm: mmu: Rework the x86 TDP direct mapped case

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Thu, 17 Oct 2019 11:50:02 -0700

On Thu, Sep 26, 2019 at 04:17:56PM -0700, Ben Gardon wrote:
> Over the years, the needs for KVM's x86 MMU have grown from running small
> guests to live migrating multi-terabyte VMs with hundreds of vCPUs. Where
> we previously depended upon shadow paging to run all guests, we now have
> the use of two-dimensional paging (TDP). This RFC proposes and
> demonstrates two major changes to the MMU. First, an iterator abstraction
> that simplifies traversal of TDP paging structures when running an L1
> guest. This abstraction takes advantage of the relative simplicity of TDP
> to simplify the implementation of MMU functions. Second, this RFC changes
> the synchronization model to enable more parallelism than the monolithic
> MMU lock. This "direct mode" MMU is currently in use at Google and has
> given us the performance necessary to live migrate our 416 vCPU, 12TiB
> m2-ultramem-416 VMs.
> 
> The primary motivation for this work was to handle page faults in
> parallel. When VMs have hundreds of vCPUs and terabytes of memory, KVM's
> MMU lock suffers from extreme contention, resulting in soft-lockups and
> jitter in the guest. To demonstrate this, I have also written, and will
> submit, a demand paging test to KVM selftests. The test creates N vCPUs, which
> each touch disjoint regions of memory. Page faults are picked up by N
> user fault FD handlers, one for each vCPU. Over a 1 second profile of
> the demand paging test, with 416 vCPUs and 4G per vCPU, 98% of the
> execution time was spent waiting for the MMU lock! With this patch
> series the total execution time for the test was reduced by 89% and the
> execution was dominated by get_user_pages and the user fault FD ioctl.
> As a secondary benefit, the iterator-based implementation does not use
> the rmap or struct kvm_mmu_pages, saving ~0.2% of guest memory in KVM
> overheads.
> 
> The goal of this RFC is to demonstrate and gather feedback on the
> iterator pattern, the memory savings it enables for the "direct case"
> and the changes to the synchronization model. Though they are interwoven
> in this series, I will separate the iterator from the synchronization
> changes in a future series. I recognize that some feature work will be
> needed to make this patch set ready for merging. That work is detailed
> at the end of this cover letter.

Diving into this series is on my todo list, but realistically that's not
going to happen until after KVM Forum.  Sorry I can't provide timely
feedback.