On 26/03/21 03:19, Sean Christopherson wrote:
The end goal of this series is to optimize the MMU notifiers to take mmu_lock if and only if the notification is relevant to KVM, i.e. the hva range overlaps a memslot. Large VMs (hundreds of vCPUs) are very sensitive to mmu_lock being taken for write at inopportune times, and such VMs also tend to be "static", e.g. backed by HugeTLB with minimal page shenanigans. The vast majority of notifications for these VMs will be spurious (for KVM), and eliding mmu_lock for spurious notifications avoids an otherwise unacceptable disruption to the guest.

To get there without potentially degrading performance, e.g. due to multiple memslot lookups, especially on non-x86 where the use cases are largely unknown (from my perspective), first consolidate the MMU notifier logic by moving the hva->gfn lookups into common KVM.

Applies on my TDP MMU TLB flushing bug fixes[*], which conflict horribly with the TDP MMU changes in this series. That code applies on kvm/queue (commit 4a98623d5d90, "KVM: x86/mmu: Mark the PAE roots as decrypted for shadow paging").

Speaking of conflicts, Ben will soon be posting a series to convert a bunch of TDP MMU flows to take mmu_lock only for read. Presumably there will be an absurd number of conflicts; Ben and I will sort out the conflicts in whichever series loses the race.

Well tested on Intel and AMD. Compile tested for arm64, MIPS, PPC, PPC e500, and s390. Absolutely needs to be tested for real on non-x86, I give it even odds that I introduced an off-by-one bug somewhere.

[*] https://lkml.kernel.org/r/20210325200119.1359384-1-seanjc@xxxxxxxxxx

Patches 1-7 are x86 specific prep patches to play nice with moving the hva->gfn memslot lookups into common code. There ended up being waaay more of these than I expected/wanted, but I had a hell of a time getting the flushing logic right when shuffling the memslot and address space loops. In the end, I was more confident I got things correct by batching the flushes.
Patch 8 moves the existing API prototypes into common code. It could technically be dropped since the old APIs are gone in the end, but I thought the switch to the new APIs would suck a bit less this way.

Patch 9 moves arm64's MMU notifier tracepoints into common code so that they are not lost when arm64 is converted to the new APIs, and so that all architectures can benefit.

Patch 10 moves x86's memslot walkers into common KVM. I chose x86 purely because I could actually test it. All architectures use nearly identical code, so I don't think it actually matters in the end.

Patches 11-13 move arm64, MIPS, and PPC to the new APIs.

Patch 14 yanks out the old APIs.

Patch 15 adds the mmu_lock elision, but only for unpaired notifications.

Patch 16 adds mmu_lock elision for paired .invalidate_range_{start,end}(). This is quite nasty and no small part of me thinks the patch should be burned with fire (I won't spoil it any further), but it's also the most problematic scenario for our particular use case. :-/

Patches 17-18 are additional x86 cleanups.
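
For readers following along, the core idea of the elision (take mmu_lock only when the notified hva range overlaps a memslot) can be sketched roughly as below. This is a simplified userspace illustration, not code from the series: struct demo_memslot, struct demo_vm, demo_range_overlaps_slot() and demo_handle_notification() are all hypothetical names, the lock is modeled as a plain flag plus a counter, and half-open [start, end) ranges are an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for a memslot: a host virtual address
 * range [hva_start, hva_end) that backs guest memory.
 */
struct demo_memslot {
	uint64_t hva_start;
	uint64_t hva_end;	/* exclusive */
};

/* Hypothetical VM model; the lock is a flag so the sketch stays portable. */
struct demo_vm {
	const struct demo_memslot *slots;
	size_t nr_slots;
	bool mmu_lock_held;
	unsigned long nr_lock_acquisitions;	/* for illustration only */
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool demo_range_overlaps_slot(const struct demo_memslot *slot,
				     uint64_t start, uint64_t end)
{
	return start < slot->hva_end && slot->hva_start < end;
}

/*
 * Handle a notification for [start, end).  The lock is taken lazily, on the
 * first overlapping slot, so a notification that hits no slot never touches
 * it.  Returns true if the notification was relevant to the VM.
 */
static bool demo_handle_notification(struct demo_vm *vm,
				     uint64_t start, uint64_t end)
{
	bool locked = false;
	size_t i;

	assert(start < end);

	for (i = 0; i < vm->nr_slots; i++) {
		if (!demo_range_overlaps_slot(&vm->slots[i], start, end))
			continue;

		if (!locked) {
			vm->mmu_lock_held = true;
			vm->nr_lock_acquisitions++;
			locked = true;
		}
		/* Real code would act on the overlapping gfn range here. */
	}

	if (locked)
		vm->mmu_lock_held = false;

	return locked;
}
```

With slots at [0x1000, 0x3000) and [0x8000, 0x9000), a notification for [0x3000, 0x8000) falls entirely in the gap and leaves the lock untouched, while [0x2fff, 0x3001) overlaps the first slot by one byte and takes it once. The exclusive end is exactly where an off-by-one would bite.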
Queued 1-9 and 18, thanks. There's a small issue in patch 10 that prevented me from committing 10-15, but they mostly look good.
Paolo