On Thu, Sep 26, 2019 at 04:17:56PM -0700, Ben Gardon wrote:
> Over the years, the needs for KVM's x86 MMU have grown from running small
> guests to live migrating multi-terabyte VMs with hundreds of vCPUs. Where
> we previously depended upon shadow paging to run all guests, we now have
> the use of two dimensional paging (TDP). This RFC proposes and
> demonstrates two major changes to the MMU. First, an iterator abstraction
> that simplifies traversal of TDP paging structures when running an L1
> guest. This abstraction takes advantage of the relative simplicity of TDP
> to simplify the implementation of MMU functions. Second, this RFC changes
> the synchronization model to enable more parallelism than the monolithic
> MMU lock. This "direct mode" MMU is currently in use at Google and has
> given us the performance necessary to live migrate our 416 vCPU, 12TiB
> m2-ultramem-416 VMs.
>
> The primary motivation for this work was to handle page faults in
> parallel. When VMs have hundreds of vCPUs and terabytes of memory, KVM's
> MMU lock suffers from extreme contention, resulting in soft-lockups and
> jitter in the guest. To demonstrate this I have also written, and will
> submit, a demand paging test to KVM selftests. The test creates N vCPUs,
> which each touch disjoint regions of memory. Page faults are picked up by
> N user fault FD handlers, one for each vCPU. Over a 1 second profile of
> the demand paging test, with 416 vCPUs and 4G per vCPU, 98% of the
> execution time was spent waiting for the MMU lock! With this patch
> series the total execution time for the test was reduced by 89% and the
> execution was dominated by get_user_pages and the user fault FD ioctl.
> As a secondary benefit, the iterator-based implementation does not use
> the rmap or struct kvm_mmu_pages, saving ~0.2% of guest memory in KVM
> overheads.
>
> The goal of this RFC is to demonstrate and gather feedback on the
> iterator pattern, the memory savings it enables for the "direct case"
> and the changes to the synchronization model. Though they are interwoven
> in this series, I will separate the iterator from the synchronization
> changes in a future series. I recognize that some feature work will be
> needed to make this patch set ready for merging. That work is detailed
> at the end of this cover letter.

Diving into this series is on my todo list, but realistically that's not
going to happen until after KVM forum. Sorry I can't provide timely feedback.