On 17/10/19 20:50, Sean Christopherson wrote:
> On Thu, Sep 26, 2019 at 04:17:56PM -0700, Ben Gardon wrote:
>> Over the years, the needs for KVM's x86 MMU have grown from running small
>> guests to live migrating multi-terabyte VMs with hundreds of vCPUs. Where
>> we previously depended upon shadow paging to run all guests, we now have
>> the use of two-dimensional paging (TDP). This RFC proposes and
>> demonstrates two major changes to the MMU. First, an iterator abstraction
>> that simplifies traversal of TDP paging structures when running an L1
>> guest. This abstraction takes advantage of the relative simplicity of TDP
>> to simplify the implementation of MMU functions. Second, this RFC changes
>> the synchronization model to enable more parallelism than the monolithic
>> MMU lock. This "direct mode" MMU is currently in use at Google and has
>> given us the performance necessary to live migrate our 416 vCPU, 12TiB
>> m2-ultramem-416 VMs.
>>
>> The primary motivation for this work was to handle page faults in
>> parallel. When VMs have hundreds of vCPUs and terabytes of memory, KVM's
>> MMU lock suffers from extreme contention, resulting in soft-lockups and
>> jitter in the guest. To demonstrate this, I have also written, and will
>> submit, a demand paging test to KVM selftests. The test creates N vCPUs,
>> which each touch disjoint regions of memory. Page faults are picked up by
>> N user fault FD handlers, one for each vCPU. Over a 1 second profile of
>> the demand paging test, with 416 vCPUs and 4G per vCPU, 98% of the
>> execution time was spent waiting for the MMU lock! With this patch
>> series the total execution time for the test was reduced by 89% and the
>> execution was dominated by get_user_pages and the user fault FD ioctl.
>> As a secondary benefit, the iterator-based implementation does not use
>> the rmap or struct kvm_mmu_pages, saving ~0.2% of guest memory in KVM
>> overheads.
>>
>> The goal of this RFC is to demonstrate and gather feedback on the
>> iterator pattern, the memory savings it enables for the "direct case"
>> and the changes to the synchronization model. Though they are interwoven
>> in this series, I will separate the iterator from the synchronization
>> changes in a future series. I recognize that some feature work will be
>> needed to make this patch set ready for merging. That work is detailed
>> at the end of this cover letter.
>
> Diving into this series is on my todo list, but realistically that's not
> going to happen until after KVM forum. Sorry I can't provide timely
> feedback.

Same here. I was very lazily waiting to get the big picture from Ben's talk.

Paolo