Re: [RFC PATCH 00/28] kvm: mmu: Rework the x86 TDP direct mapped case

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Fri, 18 Oct 2019 15:42:54 +0200

On 17/10/19 20:50, Sean Christopherson wrote:
> On Thu, Sep 26, 2019 at 04:17:56PM -0700, Ben Gardon wrote:
>> Over the years, the needs for KVM's x86 MMU have grown from running small
>> guests to live migrating multi-terabyte VMs with hundreds of vCPUs. Where
>> we previously depended upon shadow paging to run all guests, we now have
>> the use of two-dimensional paging (TDP). This RFC proposes and
>> demonstrates two major changes to the MMU. First, an iterator abstraction
>> that simplifies traversal of TDP paging structures when running an L1
>> guest. This abstraction takes advantage of the relative simplicity of TDP
>> to simplify the implementation of MMU functions. Second, this RFC changes
>> the synchronization model to enable more parallelism than the monolithic
>> MMU lock. This "direct mode" MMU is currently in use at Google and has
>> given us the performance necessary to live migrate our 416 vCPU, 12TiB
>> m2-ultramem-416 VMs.
>>
>> The primary motivation for this work was to handle page faults in
>> parallel. When VMs have hundreds of vCPUs and terabytes of memory, KVM's
>> MMU lock suffers from extreme contention, resulting in soft-lockups and
>> jitter in the guest. To demonstrate this, I have also written, and will submit,
>> a demand paging test to KVM selftests. The test creates N vCPUs, which
>> each touch disjoint regions of memory. Page faults are picked up by N
>> user fault FD handlers, one for each vCPU. Over a 1 second profile of
>> the demand paging test, with 416 vCPUs and 4G per vCPU, 98% of the
>> execution time was spent waiting for the MMU lock! With this patch
>> series the total execution time for the test was reduced by 89% and the
>> execution was dominated by get_user_pages and the user fault FD ioctl.
>> As a secondary benefit, the iterator-based implementation does not use
>> the rmap or struct kvm_mmu_pages, saving ~0.2% of guest memory in KVM
>> overheads.
>>
>> The goal of this RFC is to demonstrate and gather feedback on the
>> iterator pattern, the memory savings it enables for the "direct case"
>> and the changes to the synchronization model. Though they are interwoven
>> in this series, I will separate the iterator from the synchronization
>> changes in a future series. I recognize that some feature work will be
>> needed to make this patch set ready for merging. That work is detailed
>> at the end of this cover letter.
> 
> Diving into this series is on my todo list, but realistically that's not
> going to happen until after KVM forum.  Sorry I can't provide timely
> feedback.

Same here.  I was very lazily waiting to get the big picture from Ben's
talk.

Paolo