On Fri, Apr 15, 2022 at 5:04 PM Oliver Upton <oupton@xxxxxxxxxx> wrote:
>
> On Fri, Apr 15, 2022 at 04:35:24PM -0700, David Matlack wrote:
> > On Fri, Apr 15, 2022 at 2:59 PM Oliver Upton <oupton@xxxxxxxxxx> wrote:
> > >
> > > Presently KVM only takes a read lock for stage 2 faults if it believes
> > > the fault can be fixed by relaxing permissions on a PTE (write unprotect
> > > for dirty logging). Otherwise, stage 2 faults grab the write lock, which
> > > predictably can pile up all the vCPUs in a sufficiently large VM.
> > >
> > > The x86 port of KVM has what it calls the TDP MMU. Basically, it is an
> > > MMU protected by the combination of a read-write lock and RCU, allowing
> > > page walkers to traverse in parallel.
> > >
> > > This series is strongly inspired by the mechanics of the TDP MMU,
> > > making use of RCU to protect parallel walks. Note that the TLB
> > > invalidation mechanics are a bit different between x86 and ARM, so we
> > > need to use the 'break-before-make' sequence to split/collapse a
> > > block/table mapping, respectively.
> >
> > An alternative (or perhaps "v2" [1]) is to make x86's TDP MMU
> > arch-neutral and port it to support ARM's stage-2 MMU. This is based
> > on a few observations:
> >
> > - The problems that motivated the development of the TDP MMU are not
> >   x86-specific (e.g. parallelizing faults during the post-copy phase of
> >   Live Migration).
> > - The synchronization in the TDP MMU (read/write lock, RCU for PT
> >   freeing, atomic compare-exchanges for modifying PTEs) is complex, but
> >   would be equivalent across architectures.
> > - Eventually RISC-V is going to want similar performance (my
> >   understanding is RISC-V MMU is already a copy-paste of the ARM MMU),
> >   and it'd be a shame to re-implement TDP MMU synchronization a third
> >   time.
> > - The TDP MMU includes support for various performance features that
> >   would benefit other architectures, such as eager page splitting,
> >   deferred zapping, lockless write-protection resolution, and (coming
> >   soon) in-place huge page promotion.
> > - And then there's the obvious wins from less code duplication in KVM
> >   (e.g. get rid of the RISC-V MMU copy, increased code test coverage,
> >   ...).
>
> I definitely agree with the observation -- we're all trying to solve the
> same set of issues. And I completely agree that a good long term goal
> would be to create some common parts for all architectures. Less work
> for us ARM folks it would seem ;-)
>
> What's top of mind is how we paper over the architectural differences
> between all of the architectures, especially when we need to do entirely
> different things because of the arch.
>
> For example, I whine about break-before-make a lot throughout this
> series which is somewhat unique to ARM. I don't think we can do eager
> page splitting on the base architecture w/o doing the TLBI for every
> block. Not only that, we can't do a direct valid->valid change without
> first making an invalid PTE visible to hardware. Things get even more
> exciting when hardware revisions relax break-before-make requirements.

Gotcha, so porting the TDP MMU to ARM would require adding
break-before-make support. That seems feasible, and we could guard it
behind e.g. a static_key so there is no runtime overhead for
architectures (or ARM hardware revisions) that do not require it.

Anything else come to mind as major architectural differences?

>
> There's also significant architectural differences between KVM on x86
> and KVM for ARM. Our paging code runs both in the host kernel and the
> hyp/lowvisor, and does:
>
> - VM two dimensional paging (stage 2 MMU)
> - Hyp's own MMU (stage 1 MMU)
> - Host kernel isolation (stage 2 MMU)
>
> each with its own quirks.
> The 'not exactly in the kernel' part will make
> instrumentation a bit of a hassle too.

Ah, interesting. It'd probably make sense to start with the VM
2-dimensional paging use-case and leave the other use-cases using the
existing MMU, and then investigate transitioning the other use-cases.
Similarly in x86 we still have the legacy MMU for shadow paging (e.g.
hosts with no stage-2 hardware, and nested virtualization).

>
> None of this is meant to disagree with you in the slightest. I firmly
> agree we need to share as many parts between the architectures as
> possible. I'm just trying to call out a few of the things relating to
> ARM that will make this annoying so that way whoever embarks on the
> adventure will see it.
>
> > The side of this I haven't really looked into yet is ARM's stage-2
> > MMU, and how amenable it would be to being managed by the TDP MMU. But
> > I assume it's a conventional page table structure mapping GPAs to
> > HPAs, which is the most important overlap.
> >
> > That all being said, an arch-neutral TDP MMU would be a larger, more
> > complex code change than something like this series (hence my "v2"
> > caveat above). But I wanted to get this idea out there since the
> > rubber is starting to hit the road on improving ARM MMU scalability.
>
> All for it. I cc'ed you on the series for this exact reason, I wanted to
> grab your attention to spark the conversation :)
>
> --
> Thanks,
> Oliver
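
To make the break-before-make guard discussed above a bit more concrete,
here is a minimal user-space C sketch. It is only an illustration under
stated assumptions, not KVM code: `needs_bbm` is a plain flag standing in
for a static key, `PTE_INVALID_LOCKED`, `pte_update()` and
`tlb_invalidate()` are hypothetical names, and the TLB invalidation is
simulated with a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy PTE encoding (hypothetical): bit 0 marks a valid entry. */
#define PTE_VALID          ((uint64_t)1 << 0)
/* Hypothetical invalid marker telling other walkers to back off. */
#define PTE_INVALID_LOCKED ((uint64_t)1 << 1)

/* Stand-in for a static key: true when break-before-make is required. */
bool needs_bbm = true;

/* Counts simulated TLB invalidations so the behaviour can be observed. */
unsigned int tlbi_count;

static void tlb_invalidate(void)
{
	tlbi_count++;
}

/*
 * Replace a PTE that is expected to hold old_pte with new_pte.
 * Returns false if another walker changed the PTE first.
 */
bool pte_update(_Atomic uint64_t *ptep, uint64_t old_pte, uint64_t new_pte)
{
	if (needs_bbm && (old_pte & PTE_VALID) && (new_pte & PTE_VALID)) {
		/* Break: publish an invalid entry before anything else. */
		if (!atomic_compare_exchange_strong(ptep, &old_pte,
						    PTE_INVALID_LOCKED))
			return false;

		/* Flush stale translations while the entry is invalid. */
		tlb_invalidate();

		/* Make: install the new valid entry. */
		atomic_store(ptep, new_pte);
		return true;
	}

	/* No break-before-make needed: a single compare-exchange. */
	return atomic_compare_exchange_strong(ptep, &old_pte, new_pte);
}
```

With `needs_bbm` cleared, a valid->valid change collapses to one
compare-exchange, which is the "no runtime overhead" case. In the kernel
the flag would presumably be a real static key, so the branch is patched
out rather than tested at runtime.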